Difference between revisions of "Main Page"

From NMReDATA
= Format and content of .sdf tags including NMReDATA =
 
See nmredata.org for details
 
Version 0.98, preparing version 1.0, which will be decided at the "Round table" of the Smash 2017 conference in Baveno, Italy.
 
  
For more information on the SDF format, see XXX

'''Quick links'''
  
For more information on SDF tags and how to read/write them, see XXX

Link to the [https://onlinelibrary.wiley.com/doi/abs/10.1002/mrc.4737 MRC article] on the NMReDATA format.
  
= Comments and changes to version 0.98 =
Direct link to page describing the [[NMReDATA tag format|format of the NMREDATA tags]].
  
Concerning the intensity in 2D spectra, strictly requiring the intensity at the coordinates of the peak was not a good idea: these coordinates may fall in the middle of a doublet (in an HSQC, for example). This requirement was therefore slightly rephrased in this version.

List of [[compatible software]]
  
1) For all 2D spectra, the intensity of the spectrum at (or very close to) the coordinates of the correlated peaks should be given when the spectrum is available. If the signal has a shape such that the intensity is zero at its center (phase-sensitive COSY, or the middle of a well-resolved doublet in HSQC, for example), the intensity can be measured at the maximum amplitude of the multiplet. This intensity does not claim to be “quantitative”. Optional integration of the volume is possible using the "E=" attribute.

Tentative instructions for [[Submission NMReDATA|journal submission of NMReDATA]].
  
= General structure of SDF files =
Discussion on [[Future version| future versions of the NMReDATA format ]]
== Structure ==
 
The first part of an SDF file contains the structure (in .mol format).
 
== Tags ==
 
A number of tags follow. When opening .sdf files, most chemical structure editors ignore the tags, but specialized software can manage them.
 
... to be updated ...
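Here is a minimal Python sketch that illustrates this layout (structure first, then tags) by splitting one SDF record into its .mol part and a dictionary of tag contents. It assumes the conventions used in the examples on this page:

* the structure ends with the "M  END" line;
* each tag starts with a header line such as ">  <NMREDATA_SOLVENT>";
* a tag's content stops at the first blank line;
* the record ends with "$$$$".

The function name parse_sdf_record is hypothetical. In practice, cheminformatics toolkits such as RDKit or CDK provide their own SDF readers.

```python
def parse_sdf_record(text):
    """Split one SDF record into its .mol block and a dict {tag name: content lines}."""
    lines = text.splitlines()
    # The structure (.mol part) ends with the "M  END" line.
    end = next((i for i, line in enumerate(lines) if line.startswith("M  END")), None)
    if end is None:
        raise ValueError("no 'M  END' line found")
    molblock = "\n".join(lines[: end + 1])

    tags = {}
    current = None  # name of the tag whose content is being read
    for line in lines[end + 1:]:
        if line.startswith("$$$$"):
            break  # end of this record
        if line.startswith(">") and "<" in line:
            # Header line such as ">  <NMREDATA_VERSION>"
            start = line.index("<") + 1
            current = line[start:line.index(">", start)].strip()
            tags[current] = []
        elif current is not None:
            if line.strip() == "":
                current = None  # a blank line closes the tag content
            else:
                tags[current].append(line)
    return molblock, tags


record = "\n".join([
    "ethanol", "  sketch", "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END",
    ">  <NMREDATA_VERSION>", "1.0", "",
    ">  <NMREDATA_SOLVENT>", "CDCl3", "",
    "$$$$",
])
molblock, tags = parse_sdf_record(record)
print(tags)  # {'NMREDATA_VERSION': ['1.0'], 'NMREDATA_SOLVENT': ['CDCl3']}
```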
 
= Definition of the NMReDATA tags (V. 0.98) =
 
[[V0.98]]
 
== Header tags ==
 
The header tags include only the information about the NMR dataset that is absolutely necessary.
 
  
Many more details about the acquisition conditions of the spectra could be added, but these are the only ones that are really essential when describing a spectrum (in the sense that journals have required them for a long time and that they are commonly given in reports, papers, etc.). Additional information can be retrieved from the experimental parameters associated with the NMR spectra if needed: since there should always be a link to the spectra (see below), this link can be “followed” to get the other parameters.
= Introduction =
The [http://www.nmredata.org NMReDATA working group] decided to include data extracted from NMR spectra of small molecules in SDF files using SD tags.  
  
==== <NMREDATA_VERSION> ====
[[sdf files| More details about SDF files!]]
  
This tag specifies the version of the file format. Current version: 1.0

An important task of the group is to define the format of the content of the "<NMREDATA_...>" tags. [[NMReDATA tag format|More details here!]]
  
>  <NMREDATA_VERSION>;mandatory
Version 1.0 will be decided in September at the "Round table" of the Smash 2017 conference in Baveno, Italy.
1.0
 
  
==== <NMREDATA_LEVEL> ====
The SDF file alone (that is without the spectra) cannot be used to verify that the assignment corresponds to the spectra. It is therefore important to always have the spectra with the SDF file! We call ''"NMR Record"'' the combination of the spectra and the SDF file.
This tag specifies the level of complexity of the data.
 
In most cases, level 1 will be used when an assignment is complete.
 
When the assignment contains no ambiguities, use:
 
  
<NMREDATA_LEVEL>
= NMR records =
0
<!-- [[commented_example_nmredata|Example of commented and simplified NMREDATA tags ]] -->
Levels 2 and 3 are used for ambiguous assignments (see below).
 
When the list of signals includes interchangeable assignments (see .... for examples), use:
 
<NMREDATA_LEVEL>
 
1
 
When using ambiguously assigned signals in 1D or 2D spectra (see .... for examples), use:
 
<NMREDATA_LEVEL>
 
2
 
When using both interchangeable and ambiguous assignments, use:
 
<NMREDATA_LEVEL>
 
3
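The choice of level described above can be summarised as a small decision function. This is only a sketch of the rules stated on this page. The function name nmredata_level is hypothetical.

```python
def nmredata_level(has_interchangeable, has_ambiguous):
    """Return the <NMREDATA_LEVEL> value following the rules listed above."""
    if has_interchangeable and has_ambiguous:
        return 3
    if has_ambiguous:
        return 2
    if has_interchangeable:
        return 1
    return 0  # no ambiguity in the assignment


print(nmredata_level(has_interchangeable=True, has_ambiguous=False))  # 1
```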
 
  
==== <NMREDATA_ID> (for database) ====
We call "NMR record" a folder (or a .zip file including the folder) or a database record including:

Optional, but required when the data originate from a database. When data are copied from one database to another, multiple IDs may be included. These will be defined by database managers and software.
 
  
DB_ID= the code or number assigned by the hosting database

1) All the NMR spectra (including FIDs and acquisition and processing parameters). These data are in the format produced by the manufacturer of the instrument that acquired them. This means that the software generating the record either has these raw data available or asks the user to point to them in order to include them in the ''NMR record''.

Title= Full analysis of whatever from methanol extract of leaves
 
Comment= Here more details could be given on the record.
 
Comment1= Here more details could be given on the record.
 
Comment2= Here more details could be given on the record.
 
Comment3= Here more details could be given on the record.
 
AUTHOR=Doe John, University of Tougalpa, Swinerland (optional)
 
ORIGIN_ONE=2345627486 (could be about the sample name)
 
ORIGIN_TWO=323212KKDKKS (could give a date or other reference)
 
Title_L1=after sep. hplc (this could be extracted from the first line of the title in the 1H spectrum)
 
  
One or more identifiers can be given under "ID". The ID will be generated by the software generating the data and/or by the database storing the data, etc. There may be more than one ID (for example one from the software generating it, one from the university where the data originate, one from the database, one from the publisher of the associated data, etc.). It is up to the “generator” of the file to decide whether and how to make it unique. An InChIKey/SMILES could be given if the software generating the data is able to produce it, and a CAS number if one already exists.

2) The SDF file including the NMReDATA (''.nmredata.sdf'' file)
  
==== <NMREDATA_SMILES>; optional but strongly encouraged ====
[[File:nmr_record.png|600px|center|NMR record]]
Here comes the SMILES code (mandatory with explicit H ... see with JMN for more details ...)

A more detailed [http://www.nmredata.org/test-2.pdf pictorial representation of NMR record and example of SDF file] was shown in the [http://nmredata.org/euromar_2017_v5_optimized.pdf poster] presented in July at Euromar 2017. '''Note:''' The NMREDATA tag "SIGNALS" was renamed "ASSIGNMENT" in Version 0.98.

The SMILES is included in the NMReDATA because, when given with explicit protons, it can be used to generate a plain-text NMR description of the spectra (under development with JMN).
 
  
NMR records will be [http://onlinelibrary.wiley.com/doi/10.1002/mrc.4631/full requested by ''Magnetic Resonance in Chemistry''] from 2018 on. The software editors (ACD/Labs, Bruker, cheminfo, Mestrelab) will be ready by the end of 2017 to produce NMR records for submission to MRC.
  
The solvent is specified with the “SOLVENT” tag.

Records will either be analysed on web pages or downloaded, in which case the nmredata.sdf file is opened by software that automatically accesses the associated spectra.
====  <NMREDATA_SOLVENT> ====
 
The solvent is specified using this tag.
 
>  <NMREDATA_SOLVENT>
 
CDCl3
 
For solvent mixtures, the most abundant solvent is listed first. The solvents are separated by "/" and followed by their ratio in %, with the values separated by ":"
 
>  <NMREDATA_SOLVENT>
 
CDCl3/DMSO 80:20
 
  
>  <NMREDATA_SOLVENT>
+
The full description can be found [[NMReDATA tag format|in the NMReDATA tag format page]].
CDCl3/DMSO/D2O 80:10:10
 
  
The proportions are given in % by volume.

An example of .nmredata.sdf files with the spectra can be found [https://www.dropbox.com/sh/hu0qudy2bt56ix0/AACc8UiUoeEskSDVhYnP-cZna?dl=0 here]
  
In the case of RDC measurements, the medium used can be specified in the line following the name of the solvent.
= Current version of the format of NMReDATA =
The format can be found here: [[NMReDATA tag format]]
  
>  <NMREDATA_SOLVENT>
A small set of simple examples of .nmredata.sdf files can be found [https://www.dropbox.com/sh/sqk1583gyaj52yz/AADzTUQQwXALlm5Geu3swYPqa?dl= here] ('''Version 0.95'''). ('''Note:''' the field called "NMREDATA_SIGNALS" was renamed "NMREDATA_ASSIGNMENT" in V 0.98).
CDCl3
It includes .nmredata.sdf files for ethanol with diverse formats (explicit or implicit hydrogen atoms, etc.)
PBLG
 
  
The quantity of the orientation medium should be given using the usual units (this is vague, but we cannot do better). For example, agarose used at a 1% mass ratio:
= Versions =
> <NMREDATA_SOLVENT>
== Changes in version 1.1 ==
D2O
 
Agarose 1%
 
For solid-state samples:
 
>  <NMREDATA_SOLVENT>
 
solid
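Here is a possible reader for the <NMREDATA_SOLVENT> content described above, given as a hedged sketch. It assumes that:

* solvent names contain no spaces;
* the ratios follow after a single space;
* an optional second line holds the orientation medium as free text (as in the RDC examples).

The function name parse_solvent and the choice of 100 % for a single solvent are assumptions.

```python
def parse_solvent(lines):
    """Return ([(solvent, percent), ...], medium) for an <NMREDATA_SOLVENT> tag."""
    first = lines[0].strip()
    names_part, _, ratio_part = first.partition(" ")
    names = names_part.split("/")  # the most abundant solvent comes first
    if ratio_part:
        ratios = [float(r) for r in ratio_part.split(":")]
        if len(ratios) != len(names):
            raise ValueError("number of ratios does not match number of solvents")
    elif len(names) == 1:
        ratios = [100.0]  # assumption: a single solvent is taken as 100 %
    else:
        ratios = [None] * len(names)  # mixture given without proportions
    # Optional second line: orientation medium (e.g. "PBLG", "Agarose 1%")
    medium = lines[1].strip() if len(lines) > 1 else None
    return list(zip(names, ratios)), medium


print(parse_solvent(["CDCl3/DMSO 80:20"]))  # ([('CDCl3', 80.0), ('DMSO', 20.0)], None)
```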
 
  
==== <NMREDATA_CONCENTRATION> (optional) ====
Addition of a backslash at the end of each line of text in the NMReDATA tags. (This avoids the problem that the CDK libraries ignore the newline character (ASCII 10). Since a line separator is needed, NMReDATA tags use backslash + newline.)
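Here is a minimal sketch of how a reader could cope with this convention. Since the newline after the backslash may be lost, it splits the tag text on the backslash itself and discards leftover line breaks. It assumes that backslashes never occur inside the data. The function name split_tag_lines is hypothetical.

```python
def split_tag_lines(content):
    """Split NMReDATA tag text into logical lines ending with a backslash."""
    # Split on the backslash itself, so it works whether or not the newline
    # after each backslash survived; then drop leftover line breaks and blanks.
    return [part.strip() for part in content.split("\\") if part.strip()]


with_newlines = "Larmor=500.13\\\n4.8, S=q, E=2, L=a, J=7\\\n"
without_newlines = "Larmor=500.13\\4.8, S=q, E=2, L=a, J=7\\"
print(split_tag_lines(with_newlines) == split_tag_lines(without_newlines))  # True
```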
When known, the concentration should be given. Only “mM” is allowed, but the unit must still be written.
 
<NMREDATA_CONCENTRATION>
 
12.3 mM
 
  
====  <NMREDATA_TEMPERATURE> (optional)====
= Comparison of assigned data =
When available, the temperature of the sample should be given (only K is allowed, but the unit must still be written).

<span style="color:#FF8C00">''This is a working proposition''</span>
>  <NMREDATA_TEMPERATURE>
 
298.0 K
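Both tags above carry a number followed by a fixed unit. A small hypothetical helper (parse_quantity) could read them and reject unexpected units. It assumes a single space between the value and the unit, as in the examples.

```python
def parse_quantity(text, expected_unit):
    """Read "12.3 mM" or "298.0 K" and check that the unit is the allowed one."""
    value, _, unit = text.strip().partition(" ")
    if unit.strip() != expected_unit:
        raise ValueError(f"expected unit {expected_unit!r}, got {unit.strip()!r}")
    return float(value)


print(parse_quantity("12.3 mM", "mM"))  # 12.3
print(parse_quantity("298.0 K", "K"))   # 298.0
```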
 
  
== Assignment tags ==
A tag comparing the data in the current file with those from other files (using the same labels for the assignment):
Two properties can be "assigned":
>  <NMREDATA_COMPARISON>

Externaldata1=... ref to an external .sdf file from the current or external record.

Externaldata2=...

H1 1.54 1.50 1.66 (the first value is the reference value from the NMREDATA_ASSIGNMENT tag of the current file, the second is from Externaldata1, etc.)

H2 1.54 1.50 1.66

Chi2 0.3 0.5 (these could be the chi-squared values of the reference against the external data)
  
-Chemical shifts with the <NMREDATA_SIGNALS> tag.
Couplings could also be compared...
  
-Scalar couplings with the <NMREDATA_J> tag.  
>  <NMREDATA_COMPARISON>

Externaldata1=... ref to an external .sdf file from the current or external record.

Externaldata2=...

J(H1,H2) 1.54 1.50 1.66 (the first value is the reference value from the NMREDATA_ASSIGNMENT tag of the current file, the second is from Externaldata1, etc.)

J(H1,H3) 1.54 1.50 1.66

Chi2 0.3 0.5 (these could be the chi-squared values of the reference against the external data)
  
In most cases, only chemical shifts are assigned. Scalar couplings are usually not systematically measured and/or assigned, but when they are, the values measured in the spectra should be compiled in the <NMREDATA_J> tag.

Other comparisons could be made, for example of signal intensities in HSQC spectra:
  
Important note: in general, ", " (comma + space) is used as the separator. The space after the comma may become optional in a future version. Please write your code so that it does not depend on the space: store the separator ", " in a variable, so that you can easily switch to "," (no space), and be ready to read and write files with and without the space.
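Here is a sketch of the recommendation above. The separator used for writing lives in one variable (FIELD_SEPARATOR, a hypothetical name), while the reader accepts "," with or without a following space. It also assumes that commas inside parentheses must not split fields, as in ambiguous assignments such as L=(a, b).

```python
FIELD_SEPARATOR = ", "  # may become "," (no space) in a future version


def join_fields(fields, sep=FIELD_SEPARATOR):
    """Write fields with a configurable separator."""
    return sep.join(fields)


def split_fields(line):
    """Read fields separated by "," with or without a following space.

    Commas inside parentheses, as in L=(a, b), do not split fields.
    """
    fields, current, depth = [], [], 0
    for ch in line:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    fields.append("".join(current).strip())
    return fields


print(split_fields("4.8, L=(a, b)"))          # ['4.8', 'L=(a, b)']
print(join_fields(["a", "b", "7"], sep=","))  # a,b,7
```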
>  <NMREDATA_COMPARISON>

Externaldata1=... ref to an external .sdf file from the current or external record.

Externaldata2=...

I(NMREDATA_13C_1J_1H,HC1,H1) 122 154 143 (the first value is the reference value from the NMREDATA_ASSIGNMENT tag of the current file, the second is from Externaldata1, etc.)

I(NMREDATA_13C_1J_1H,HC2,H2) 132 151 163

Chi2 5.3 3.5 (these could be the chi-squared values)
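The page does not define how "Chi2" is computed. As one possible, hypothetical choice, the sketch below reads the rows of an <NMREDATA_COMPARISON> tag and computes, for each external data set, the plain sum of squared deviations from the reference value. A true chi-squared would also divide each term by a variance. The sketch assumes that comments start at the first token that is not a number and that lines containing "=" are references to external data.

```python
def parse_comparison(lines):
    """Collect rows such as "H1 1.54 1.50 1.66 (comment)" into {key: [values]}."""
    rows = {}
    for line in lines:
        tokens = line.split()
        if not tokens or "=" in tokens[0] or tokens[0] == "Chi2":
            continue  # Externaldata lines and previously computed scores
        values = []
        for token in tokens[1:]:
            try:
                values.append(float(token))
            except ValueError:
                break  # start of a trailing comment
        rows[tokens[0]] = values
    return rows


def sum_squared_deviations(rows):
    """For each external data set k, sum (value_k - reference)^2 over all rows."""
    n_external = min(len(values) for values in rows.values()) - 1
    return [
        sum((values[k + 1] - values[0]) ** 2 for values in rows.values())
        for k in range(n_external)
    ]


rows = parse_comparison([
    "Externaldata1=... ref to an external .sdf file",
    "H1 1.54 1.50 1.66 (the first value is the reference value)",
    "H2 1.54 1.50 1.66",
])
print(rows)  # {'H1': [1.54, 1.5, 1.66], 'H2': [1.54, 1.5, 1.66]}
```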
  
====  <NMREDATA_SIGNALS> ====
= Certification=
  
Labelling/linking signals to atoms of molecules

When the assignment is made in a computer-assisted manner, the software may add a certification of the validity of the data. It is up to the manufacturers to encode it so that the certification cannot be forged (using a hash, etc.?).

Certificate tags could be listed at the end of the .sdf file. They can originate from the CASE software, from the database hosting the data and spectra, or from the journal (to state that the data were peer-reviewed). They can be cumulated. If the text of the .sdf file needs to be hashed for certification, the list of tags used for hashing could be given.
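A plain hash of the tags does not prevent forgery, because anyone can recompute it after editing the data. One possible approach, given here only as a hypothetical sketch, is a keyed hash (HMAC-SHA256 from the Python standard library) over the tags listed for hashing, in the listed order. The certifying party keeps the key, so only it can issue or check the certificate. A certificate that anyone can verify would instead require a public-key signature. The function names and the way tag contents are serialized are assumptions.

```python
import hashlib
import hmac


def certify_tags(tags, tag_names, secret_key):
    """Hypothetical certificate: HMAC-SHA256 over the listed tags, in order.

    tags maps tag names to lists of content lines; tag_names is the list of
    tags used for hashing (which could itself be written in the certification tag).
    """
    mac = hmac.new(secret_key, digestmod=hashlib.sha256)
    for name in tag_names:
        # Include the tag name and separators so that moving text between
        # tags also changes the certificate.
        mac.update(f"<{name}>\n".encode("utf-8"))
        mac.update(("\n".join(tags[name]) + "\n\n").encode("utf-8"))
    return mac.hexdigest()


def verify_certificate(tags, tag_names, secret_key, certificate):
    """Recompute the HMAC and compare it in constant time."""
    expected = certify_tags(tags, tag_names, secret_key)
    return hmac.compare_digest(expected, certificate)
```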
  
In liquid-state NMR, some signals are degenerate: multiple atoms can contribute to the same signal, for example all protons of a methyl group. We call a "signal" (each signal having its own “label”) the group of atoms sharing a common chemical shift because of symmetry (not by accident). In principle, the labels could be included in the .mol file part (this is part of the definition of .mol files), but this may cause compatibility problems. For example, ChemDraw does not recognize atom types well when the atoms have been given a name, and marks them in red (as if the atoms were unknown, causing hybridization problems). Because of this problem, we will NOT use the labels of the “.mol” file, but list them in a specific SD tag called NMREDATA_SIGNALS.

'''To be refined by the specialists!'''
  
For each signal, we first give the label of the signal, followed by its chemical shift, and finally the atom(s) it refers to in the structure file. Atom numbers start at 1 and run through all the atoms in the molfile. An NMR signal always has a chemical shift associated with it. Not all atoms of the molecule will be listed in the NMREDATA_SIGNALS tag, for example if they were not assigned or have no NMR signal (like O, N, or other isotopes for which the spectra were not recorded).

>  <NMREDATA_CERTIFICATION>

Software=CMC_assist<span style="color:#0808F8">'''\'''</span>

Author=Bruker<span style="color:#0808F8">'''\'''</span>

Full_report_tag=CMC4.6_REPOR<span style="color:#0808F8">'''\'''</span>

Confidence_level=4.6<span style="color:#0808F8">'''\'''</span>

Confidence_level_certificate=”ADFS678AG67DFG6SD5F7GS5DFGSD8F5GSD7FG7”<span style="color:#0808F8">'''\'''</span>

Verification_certificate=http:/...<span style="color:#0808F8">'''\'''</span>

Software=LSD_V3.0<span style="color:#0808F8">'''\'''</span>

Author=JMN<span style="color:#0808F8">'''\'''</span>

Unique_solution=YES<span style="color:#0808F8">'''\'''</span>

Unique_solution_certificate=”ADFS678AG67DFG6SD5F7GS5DFGSD8F5GSD7FG7”<span style="color:#0808F8">'''\'''</span>

etc.
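Since several certifications share the same tag, a reader has to split them. Here is a minimal sketch, assuming that:

* each certification starts with a "Software=" line;
* the trailing backslash of each line (shown in blue above) is just the line separator.

The function name split_certifications is hypothetical.

```python
def split_certifications(lines):
    """Group key=value lines into one dict per certification."""
    certifications = []
    for raw in lines:
        line = raw.strip().rstrip("\\").strip()  # drop the trailing line separator
        if "=" not in line:
            continue  # e.g. "etc."
        key, value = [part.strip() for part in line.split("=", 1)]
        if key == "Software" or not certifications:
            certifications.append({})  # "Software=" starts a new certification
        certifications[-1][key] = value
    return certifications


certs = split_certifications([
    "Software=CMC_assist\\", "Author=Bruker\\", "Confidence_level=4.6\\",
    "Software=LSD_V3.0\\", "Author=JMN\\", "Unique_solution=YES\\", "etc.",
])
print(len(certs))  # 2
```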
  
For example, for HO-CH2-CH3, with protons labelled a, b and c and carbons A and B, the tag would be as follows. The names could be different: labels do not have to follow this lettering, this is just an example.

This is only a very vague example. The uniqueness of the proposed structure may be understood in the sense of J.-M. Nuzillard’s ''Logic for Structure Determination'' (LSD) tool. Software producers can include the specifications of their product here. Multiple certifications can be listed one after the other; the “Software=...” line separates them within the same <CERTIFICATION> tag.
  
>  <NMREDATA_SIGNALS>; ethanol with explicit hydrogen atoms
= Role(s) and scope of the “assignment records” =
A, 48.301, 1 ;A corresponds to the carbon of CH2
 
B, 20.322, 2 ;B corresponds to the carbon of CH3
 
a, 2.610, 3 ;a corresponds to the hydrogen atom of the OH
 
b, 4.802, 4, 5; b corresponds to the hydrogen atoms of the CH2
 
c, 1.401, 6, 7, 8
 
Ex, 3.6, 9
 
(optional: list of interchangeable labels – when level>1 see below)
 
  
We recommend providing a label (for example "Ex") designating all the H of OH, NH, etc. groups that exchange quickly (for example the OH groups of glucose in D2O).

The NMR record can be generated from experimental data (this is how the format was designed), but data may also originate from simulations, predictions, etc.
  
For structures where the hydrogen atoms are implicit, a hydrogen atom is referenced by adding "H" right before the number of the heavy atom it is bound to. This distinguishes the hydrogen atom from the heavy atom itself. When several implicit hydrogen atoms are bound to the same heavy atom, they are all referenced with the same "H" reference. With implicit hydrogen atoms, these hydrogen atoms are therefore not distinguished, and assignment of diastereotopic protons is not possible. To avoid this problem, use explicit mol structures with defined chirality.

Tools to compare, evaluate, validate, and check the consistency of “assignment records” will certainly be developed.
  
>  <NMREDATA_SIGNALS>; ethanol with implicit hydrogen atoms

Assignment records can be generated by commercial software, but also by diverse tools analysing NMR data, homemade processing tools, simulation software, etc. This is why the data format should include as many options as possible, to be as flexible as possible, even if not all possible uses are clearly defined and used immediately. Ideally, it should be possible to convert the .sdf files into other file formats or spectral descriptions without loss.

A, 48.301, 1 ;the label "A" corresponds to atom 1, which is the carbon of the CH2
== Multiple records ==
B, 20.322, 2 ;atom 2 is the carbon of the CH3

Databases including multiple "assignment records" associated with the same molecule or the same set of NMR spectra should be seen as an advantage. Some records could be old, originating from incomplete literature data. Others could include errors because they originate from bulk data processed automatically. Ultimately, a computer could verify these records or create a robustly validated record combining all the other data. Aggregated records could be generated by NMR software/databases scoring the available data for consistency, calculated chemical shifts and spectral simulations. They could refine chemical shifts and couplings, etc.

a, 2.610, H3 ;"H3" refers to the hydrogen atom of atom 3 (the oxygen)
 
b, 4.802, H1 ;"H1" refers to the hydrogen atoms of atom 1 (of the CH2)
 
c, 1.401, H2
 
(atom 3 would be the oxygen of ethanol)
 
Only explicit H are allowed. No monomer unit, no “R” group.
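A minimal reader for the <NMREDATA_SIGNALS> lines above could look as follows. It covers both explicit atom numbers and implicit "H" references. It is a sketch that assumes:

* every non-empty line is either a signal (label, chemical shift, atom references) or an "Interchangeable=" line;
* text after ";" is a comment;
* labels contain no commas.

The function name parse_signals is hypothetical.

```python
def parse_signals(lines):
    """Parse <NMREDATA_SIGNALS> lines into {label: (shift, atoms)}.

    atoms holds ints for explicit atom numbers and strings such as "H3" for
    the implicit hydrogen atom(s) bound to heavy atom 3.
    """
    signals = {}
    for raw in lines:
        line = raw.split(";", 1)[0].strip()  # text after ";" is a comment
        if not line or line.startswith("Interchangeable="):
            continue  # interchangeable groups are handled separately
        label, shift, *refs = [field.strip() for field in line.split(",")]
        atoms = [ref if ref.startswith("H") else int(ref) for ref in refs]
        signals[label] = (float(shift), atoms)
    return signals


print(parse_signals(["b, 4.802, 4, 5; b corresponds to the hydrogen atoms of the CH2"]))
# {'b': (4.802, [4, 5])}
```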
 
  
Labels can, in principle, contain any character and be of any length. We normally exclude the comma (,), because it is used as the field separator (see below), as well as <EOL> and <LF>. Labels can be defined as the chemist wishes. If a database manager wants to rename the labels to make them “canonical” or to follow some norm, this is up to them, but any label is accepted. In principle, the labels should correspond to the ones used in the submitted manuscript (with reviewing in mind). With simple numbering, they could be C1, etc. and H-C1, H’-C1, etc. It is up to the author/file generator to satisfy:

* the format of the journal/database;
* the readability of the numbering (possibly IUPAC);
* the need to distinguish what needs to be distinguished (like diastereotopic protons).

No ranges are allowed: only a single floating-point chemical shift should be specified here. This is necessary because 1D 1H spectra may not be included in the set of spectra. Even if they are, overlapping signals may result in vague descriptions, such as a region analysed as an “m” containing many signals. We propose to call “Ex” the spins that exchange with the solvent (typically OH, NH, etc. in protic solvents), but this is not mandatory. Atoms that are not assigned to at least one NMR signal in the spectra are not listed. These include H and C that were not assigned, and heteronuclei that are usually not observed (say S, O, N) or for which the spectrum was not recorded (19F, 31P, etc.).
==SDF files generated from experimental data==
  
When using a level higher than 0, ambiguous assignments are allowed. We distinguish two types of ambiguities. If two or more labels may be interchanged, they are listed at the end of the list, before the empty line, with:

When the NMR data originate from experimental spectra, they may be quite crude (simple automated integration, peak-picking). At the other extreme, the data may follow complex automated or careful and manual expert analysis. The NMReDATA must have the flexibility to code data of diverse quality: they may be partial, incomplete, contain inconsistencies, impossible features, etc.
Interchangeable=a, b
 
means that the assignment of a and b may be interchanged.
 
Interchangeable=(a, CA), (b, CB)
 
means that “a” together with “CA” may be interchanged with “b” together with “CB”. This may happen if two O-Me groups are not assigned unambiguously. We may know from the HSQC that carbon “CA” is bound to proton “a” and carbon “CB” to “b”, but not which of the two Me groups is bound where (no HMBC signal). There may be multiple “Interchangeable” lines.
 
  
Note that displaying/working with data containing such interchangeable signals may be a challenge. When software cannot take interchangeable assignments into account, it should generate a warning message and read the standard form or, better, offer a choice among the different possibilities.
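Here is a hedged sketch of how software might "offer a choice among the different possibilities". The code reads an "Interchangeable=" value and lists the label mappings obtained by permuting the groups. It assumes that any permutation of the listed groups is allowed; for more than two groups, this may be more permissive than intended. The function names are hypothetical.

```python
import re
from itertools import permutations


def parse_interchangeable(value):
    """"a, b" -> [["a"], ["b"]]; "(a, CA), (b, CB)" -> [["a", "CA"], ["b", "CB"]]."""
    groups = re.findall(r"\(([^)]*)\)", value)
    if groups:
        return [[label.strip() for label in group.split(",")] for group in groups]
    return [[label.strip()] for label in value.split(",")]


def alternative_assignments(groups):
    """List the label mappings obtained by permuting interchangeable groups.

    Each mapping sends every label of group i to the label at the same
    position in group j; the identity permutation is the assignment as written.
    """
    alternatives = []
    for perm in permutations(range(len(groups))):
        mapping = {}
        for i, j in enumerate(perm):
            for source, target in zip(groups[i], groups[j]):
                mapping[source] = target
        alternatives.append(mapping)
    return alternatives


groups = parse_interchangeable("(a, CA), (b, CB)")
print(alternative_assignments(groups)[1])  # {'a': 'b', 'CA': 'CB', 'b': 'a', 'CB': 'CA'}
```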
- only 1D <sup>1</sup>H NMR data (with or without integration, coupling, etc.).
  
====  <NMREDATA_J> ====
- only 1D <sup>13</sup>C data (just from simple peak picking)

If scalar couplings were measured and assigned, a <NMREDATA_J> tag should be generated to list the couplings (the coupling network). It can contain only JHH, or only 1JCH, or a mix of any type of scalar coupling (the label indicates the isotope through the NMREDATA_SIGNALS table, see above).
 
>  <NMREDATA_J>
 
a, b, 7
 
A, a, 150.3
 
B, a, 7.5
 
This may include JHH (from 1D 1H or from COSY) and/or JCH from RDC measurements, long-range JCH from HMBC, etc. These values may also be listed in the fields of the individual spectra they originate from. In this “J” tag, they are “compiled”. For example, they are averaged when JAX is not the same on the A and X multiplets, or when couplings are present in both the 1D 1H and the COSY. This tag can also include couplings to 19F or 31P if the label corresponds to such an atom (see the NMREDATA_SIGNALS table). Note that this table compiles data from other spectra; this does not mean that couplings should not be listed in each spectrum where they are observed.
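The compilation step described above can be sketched as follows: measurements of the same coupling from different spectra are averaged, and the result is written as "label1, label2, J" lines. Averaging is only one possible choice (the page mentions it as an example). The function names are hypothetical.

```python
from collections import defaultdict


def compile_couplings(measurements):
    """Average the J values (Hz) measured for each pair of labels.

    measurements: iterable of (label1, label2, j) collected from the
    individual spectra. The pair is unordered, so (a, b) and (b, a)
    describe the same coupling.
    """
    values = defaultdict(list)
    for label1, label2, j in measurements:
        values[tuple(sorted((label1, label2)))].append(j)
    return {pair: sum(js) / len(js) for pair, js in values.items()}


def format_j_tag(compiled, sep=", "):
    """One "label1, label2, J" line per coupling, in a stable (sorted) order."""
    return sorted(sep.join([a, b, f"{j:g}"]) for (a, b), j in compiled.items())


compiled = compile_couplings([
    ("a", "b", 7.1),     # e.g. from the 1D 1H spectrum
    ("b", "a", 6.9),     # e.g. from the COSY
    ("A", "a", 150.3),
    ("B", "a", 7.5),
])
print(format_j_tag(compiled))  # ['A, a, 150.3', 'B, a, 7.5', 'a, b, 7']
```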
 
  
== Spectral tags ==
- only 1D data, but for multiple isotopes (from NMRshiftDB?)
  
There is one SDF tag per spectrum in the record.

- full analysis based on computer-assisted software (such as ACD/Labs ''Structure Elucidator'', Bruker ''CMC-se'' or Mestrelab ''Mnova'') or a web platform (such as cheminfo.org).

The type of spectrum follows the "NMREDATA_" prefix in the tag name:
 
<NMREDATA_1D_1H>
 
... ''(tag content)''
 
 
<NMREDATA_2D_13C_J_1H>
 
... ''(tag content)''
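Software reading a record has to recognize spectrum tags from their names. The sketch below assumes the naming pattern used in the examples on this page:

* NMREDATA_1D_<isotope>;
* NMREDATA_2D_<indirect isotope>_<mixing>_<detected isotope>.

A hypothetical decoder could be:

```python
def describe_spectrum_tag(name):
    """Decode spectrum tag names such as NMREDATA_1D_1H or NMREDATA_2D_13C_1J_1H.

    Returns None for tags that are not spectra (e.g. NMREDATA_SOLVENT).
    """
    parts = name.split("_")
    if parts[0] != "NMREDATA" or len(parts) < 3:
        return None
    if parts[1] == "1D" and len(parts) == 3:
        return {"dimension": 1, "isotope": parts[2]}
    if parts[1] == "2D" and len(parts) == 5:
        return {
            "dimension": 2,
            "first_isotope": parts[2],     # indirectly detected (F1)
            "mixing": parts[3],            # e.g. 1J, NJ, D
            "detected_isotope": parts[4],
        }
    return None


print(describe_spectrum_tag("NMREDATA_1D_13C"))  # {'dimension': 1, 'isotope': '13C'}
```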
 
==== 1D spectra ====
 
  
The names of the 1D spectrum tags are:

- 1D and 2D data processed automatically, with partial (for example, not all signals assigned) and/or ambiguous (due to lack of resolution or other problems) signal assignments

<NMREDATA_1D_''isotope''>
 
Examples:
 
<NMREDATA_1D_1H>
 
... ''(tag content)''
 
 
<NMREDATA_1D_13C>
 
... ''(tag content)''
 
 
<NMREDATA_1D_19F>
 
... ''(tag content)''
 
  
The tag first lists some properties:

- The file may not contain the actual assignment, only the structure and the list of chemical shifts (the assignment could be added by NMR tools).

Larmor=double (mandatory): Larmor frequency of the detected isotope (in MHz)
 
....
 
The tag then lists the extracted signals. Each signal starts with a single chemical shift (x.xxxx) or a chemical shift range (x.xxxx-y.yyyy) (note: use four decimal digits!).
 
 
 
Then follow the [[1D attributes|signal attributes]] characterizing the signal.
 
 
 
 
 
 
 
More details: [[1D attributes]]
 
 
 
===== <NMREDATA_1D_1H> =====
 
The tag of a 1D 1H spectrum is named “NMREDATA_1D_1H”; the entire tag can look like this:
 
 
 
>  <NMREDATA_1D_1H>
 
Larmor=500.13
 
4.8, S=q, E=2, L=a, J=7
 
2.1, S=bs, E=1, L=b
 
1.5, S=t, E=3, L=c, J=7
 
For multiple couplings:
 
4.8, S=dd, E=1, L=a, J=9.3, 4.8
 
See the list of fields for more details.
 
 
 
 
 
In fact, only the first number (the chemical shift) is mandatory; all other fields are optional. More than one signal may be assigned to a chemical shift (or range of chemical shifts). The labels are simply listed with "," as separator:
 
 
 
7.2-7.6, E=5, L=H-C1, H-C2, C-C4
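Here is a sketch of a reader for the content of a 1D tag, following the rules above:

* lines that start with a chemical shift or a range are signals;
* other lines (Larmor=, Decoupled=, ...) are properties;
* a field without "=" continues the previous attribute, as in J=9.3, 4.8 or L=H-C1, H-C2;
* ambiguous labels such as L=(a, b) are kept in one piece.

The function names are hypothetical, and text after ";" is assumed to be a comment.

```python
import re

# A single shift ("4.8") or a range ("7.2-7.6"); a leading minus is allowed.
SHIFT_RE = re.compile(r"^(-?\d+(?:\.\d+)?)(?:-(-?\d+(?:\.\d+)?))?$")


def split_top_level(line):
    """Split on commas outside parentheses, so L=(a, b) stays in one field."""
    fields, current, depth = [], [], 0
    for ch in line:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    fields.append("".join(current).strip())
    return fields


def parse_fields(fields):
    """Group key=value fields; a field without "=" continues the previous key."""
    attributes, key = {}, None
    for field in fields:
        if "=" in field:
            key, value = [part.strip() for part in field.split("=", 1)]
            attributes[key] = [value]
        elif key is not None:
            attributes[key].append(field)
        else:
            raise ValueError(f"value {field!r} has no attribute name")
    return attributes


def parse_1d_tag(lines):
    """Return (properties, signals) for the content lines of an <NMREDATA_1D_...> tag."""
    properties, signals = {}, []
    for raw in lines:
        line = raw.split(";", 1)[0].strip()
        if not line:
            continue
        fields = split_top_level(line)
        match = SHIFT_RE.match(fields[0])
        if match is None:
            properties.update(parse_fields(fields))  # e.g. Larmor=500.13
            continue
        low, high = match.groups()
        shift = float(low) if high is None else (float(low), float(high))
        signals.append((shift, parse_fields(fields[1:])))
    return properties, signals


properties, signals = parse_1d_tag(["Larmor=500.13", "4.8, S=dd, E=1, L=a, J=9.3, 4.8"])
print(signals[0])  # (4.8, {'S': ['dd'], 'E': ['1'], 'L': ['a'], 'J': ['9.3', '4.8']})
```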
 
 
 
One reason for listing chemical shifts in the <NMREDATA_SIGNALS> tag is that signals may overlap and be given as a range in the 1D 1H spectrum. Their shifts may nevertheless be clearly determined from HSQC, COSY, etc.
 
 
 
 
 
For the results of diffusion measurements, D, etc. can be given in standard (SI) units:
 
4.8, S=q, E=2, L=a, Diff=1.7e-11
 
 
 
Chemical shifts are given in ppm with four digits after the decimal point (the usual three are not enough at high field and high resolution!).
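When writing files, the four-digit rule is easy to enforce with a fixed-precision format. Here is a minimal, hypothetical writer for one signal line, assuming attributes are given as (key, value) pairs.

```python
FIELD_SEPARATOR = ", "


def format_shift(shift):
    """Write a shift, or a (low, high) range, with four digits after the decimal point."""
    if isinstance(shift, tuple):
        low, high = shift
        return f"{low:.4f}-{high:.4f}"
    return f"{shift:.4f}"


def format_1d_signal(shift, attributes, sep=FIELD_SEPARATOR):
    """attributes: list of (key, value) pairs, written as key=value."""
    return sep.join([format_shift(shift)] + [f"{key}={value}" for key, value in attributes])


print(format_1d_signal(4.8, [("S", "q"), ("E", "2"), ("L", "a"), ("J", "7")]))
# 4.8000, S=q, E=2, L=a, J=7
```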
 
 
 
 
 
 
 
Integrals can use any relative numbers, but they are preferably rescaled to correspond to the number of protons (1.0 for the reference) or to the concentration when quantification was made (in mM). This will be very useful when analysing mixtures. Integrals should not be rounded to the number of atoms: the values should reflect the experimental values. If the 1D spectrum is homodecoupled at a given chemical shift or fully decoupled (pure shift), one of the two following lines should be added:
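Rescaling raw integrals so that a chosen reference signal has a given number of protons is a single division. Here is a hypothetical helper, which deliberately does not round the results.

```python
def rescale_integrals(integrals, reference_index, reference_protons=1.0):
    """Scale raw integrals so the reference signal equals reference_protons (no rounding)."""
    reference = integrals[reference_index]
    if reference == 0:
        raise ValueError("the reference integral must be non-zero")
    return [value * reference_protons / reference for value in integrals]


print(rescale_integrals([236.0, 118.0, 354.0], reference_index=1))  # [2.0, 1.0, 3.0]
```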
 
 
 
Decoupled=1H, 1.2
 
Decoupled=1H
 
 
 
If other nuclei are decoupled, add:
 
Decoupled=19F
 
...
 
 
 
For many reasons, we need to be able to report ambiguous assignments. In this case, instead of a single label, the candidate labels are listed in parentheses. For example:
 
4.8, L=(a, b)
 
means that the signal at 4.8 is assigned either to a or to b. This option is needed because ambiguous data are quite common, in particular in 2D spectra. It will cause difficulties when reading data for display, structure verification, etc. Programmers will decide what to do: ignore ambiguous assignments, try to resolve them, report them as such, warn about their presence, etc.
 
 
 
===== <NMREDATA_1D_13C> etc. =====
 
1D 13C spectra and other heteronuclear spectra (nuclei X)
 
>  <NMREDATA_1D_13C>
 
Larmor=125
 
Decoupled=1H
 
51.812, I=118.0
 
20.123, I=123.1
 
 
 
When the 1D X spectrum is not obtained with a simple pulse-acquisition sequence (e.g. DEPT, APT, etc.), this is specified with an additional “Sequence” field:
 
>  <NMREDATA_1D_13C> or other isotope
 
Larmor=125
 
Decoupled=1H
 
Sequence=DEPT135 (or DEPT45)
 
51.812, I=-80.1
 
20.123, I=123.1
 
 
 
The sign of the peak intensity gives the sign of the DEPT-135 signal.
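For reference, in a DEPT-135 spectrum CH and CH3 carbons give positive signals, CH2 carbons give negative ones, and quaternary carbons give none. Here is a hypothetical sketch that uses the sign of I= to classify the signals of such a tag. It assumes that property lines are the only ones whose first field contains "=".

```python
def dept135_classes(lines):
    """Classify the signals of a DEPT-135 1D 13C tag by the sign of I=."""
    classes = []
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if "=" in fields[0]:
            continue  # property line: Larmor=, Decoupled=, Sequence=
        intensity = next((float(f[2:]) for f in fields[1:] if f.startswith("I=")), None)
        if intensity is None or intensity == 0:
            kind = "undetermined"
        elif intensity > 0:
            kind = "CH or CH3"  # positive in DEPT-135
        else:
            kind = "CH2"        # negative in DEPT-135
        classes.append((float(fields[0]), kind))
    return classes


print(dept135_classes(["Larmor=125", "Decoupled=1H", "Sequence=DEPT135 (or DEPT45)",
                       "51.812, I=-80.1", "20.123, I=123.1"]))
# [(51.812, 'CH2'), (20.123, 'CH or CH3')]
```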
 
 
 
=== 2D spectra ===
 
 
 
The tag label indicates that it is a 2D spectrum, the two isotopes involved (starting with the indirectly detected one) and the type of mixing involved (see section 5).
 
==== 2D HSQC <NMREDATA_2D_13C_1J_1H>====
 
>  <NMREDATA_2D_13C_1J_1H>
 
Larmor=500
 
CorType=HSQC (COSY; HSQC; HMBC; ?? exact list is still to be defined. Contact Damien Jeannerat)
 
Pulseprogram=XXX ; this is optional
 
a/C1, I=1.2
 
b/48.43, I=1.2
 
 
 
The Larmor frequency is that of the detected isotope (last in the tag label). “CorType” and “Pulseprogram” can be specified. When signals are assigned, only their labels are given (no chemical shifts).
 
If a cross peak is reported without assignment or with a partial assignment, the chemical shift replaces the label of each unassigned signal.
 
The intensity of the signals (following “I=”) simply corresponds to the intensity of the spectrum at (or very close to) the coordinates of the signals. It should be provided when possible (that is, when the software has it accessible), but it is not a mandatory part of the format: one should be able to read the .sdf files even in the absence of intensities. The intensities are given in any arbitrary unit (integer or floating point). If the signal has a shape such that the intensity is zero at its center (phase-sensitive COSY, or the middle of a doublet in HSQC, for example), the intensity can be the one at the maximum amplitude of the multiplet. This intensity does not claim to be “quantitative”. Optional integration of the volume is encouraged using “S=”, but this requires some “analysis” of the peak shape.
 
 
 
==== 2D HMBC <NMREDATA_2D_13C_NJ_1H>====
 
Here is an example of HMBC data with examples of ambiguous assignments (these can also occur in clusters of peaks):
 
>  <NMREDATA_2D_13C_NJ_1H>
 
Larmor=500.13
 
C1/a; optional comment will be visible in the spectrum’s view
 
(C2,C3)/b, I=1.2
 
C2/(b,c), I=1.2
 
(C4,C5,C6)/(e,f), I=1.2
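Here is a sketch of a reader for the 2D cross-peak lines above. It returns the two label groups and the attributes. Parenthesized groups are ambiguous assignments, and a number in place of a label is an unassigned chemical shift. The sketch assumes labels never contain "/" and never look like numbers. The function names are hypothetical. The HSQC example lists the proton label first while the HMBC example lists the carbon first, so the sketch simply returns "first" and "second" groups rather than guessing the dimensions.

```python
def split_top_level(line):
    """Split on commas outside parentheses."""
    fields, current, depth = [], [], 0
    for ch in line:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    fields.append("".join(current).strip())
    return fields


def parse_label_group(text):
    """'(C2,C3)' -> ['C2', 'C3']; 'C1' -> ['C1']; '48.43' -> [48.43]."""
    text = text.strip()
    items = text[1:-1].split(",") if text.startswith("(") and text.endswith(")") else [text]
    group = []
    for item in items:
        item = item.strip()
        try:
            group.append(float(item))  # unassigned: a chemical shift replaces the label
        except ValueError:
            group.append(item)
    return group


def parse_2d_peak(line):
    """Return (first_group, second_group, attributes) for a line such as "(C2,C3)/b, I=1.2"."""
    line = line.split(";", 1)[0].strip()  # text after ";" is a comment
    fields = split_top_level(line)
    first, second = fields[0].split("/", 1)
    attributes, key = {}, None
    for field in fields[1:]:
        if "=" in field:
            key, value = [part.strip() for part in field.split("=", 1)]
            attributes[key] = [value]
        elif key is not None:
            attributes[key].append(field)  # continuation, e.g. J1=4(M), 6.1(K)
    return parse_label_group(first), parse_label_group(second), attributes


print(parse_2d_peak("(C2,C3)/b, I=1.2"))  # (['C2', 'C3'], ['b'], {'I': ['1.2']})
```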
 
 
 
For HETCOR, the tag label would be
 
<NMREDATA_2D_1H_1J_13C>
 
to indicate that the first (indirect) dimension is 1H and the detected dimension is 13C.
 
 
 
By default, “2D” heteronuclear spectra are assumed to be decoupled from isotope 1 during the evolution of isotope 2, and vice versa. If an HSQC spectrum is recorded without a 180° pulse during t1, or without 13C decoupling during t2 (for RDC measurements, for example), the following lines should be added, respectively:
 
Nondecoupled=t1
 
Nondecoupled=t2
 
If the spectrum is not decoupled, or if it allows couplings to be measured, the heteronuclear couplings may be listed as:
 
a/C1, Ja=155
 
  
==== 2D COSY <NMREDATA_2D_1H_NJ_1H> ====
- The data may come from a scientific report, i.e. the text describing the spectra.
 
 
See the discussion of COSY spectra (below) for why “Ja” is used.
 
 
 
A COSY spectrum will be coded as:
 
>  <NMREDATA_2D_1H_NJ_1H>
 
CorType=COSY (COSY; HSQC; HMBC; ?? exact list is still to be defined. Consider this one tentative)
 
Larmor=500
 
a/b, I=1.2, S=100 ;
 
b/a, I=1.2
 
b/c, I=1.2
 
c/b, I=1.2
 
 
 
If couplings were measured from the COSY, their values should be specified:
 
a/b, Ja=5
 
where Ja means active coupling (present in f1 and f2)
 
where J1 means passive coupling(s) (present in f1).
 
where J2 means passive coupling(s) (present in f2).
 
 
 
Any number of couplings can be added. For example, the A-X cross peak, with A and X both coupled to M, could be described as:
 
A/X, Ja=5, J1=4, J2=2
 
This means that the active coupling is JAX = 5 Hz and the passive couplings are JAM = 4 Hz and JXM = 2 Hz. If the assignment was made (as for 1D 1H), the assigned couplings are specified:
 
A/X, Ja=5, J1=4(M), J2=2(M)
 
A/X, Ja=5, J1=4(M), 6.1(K), J2=2(M), 3.1(K)
 
 
 
This allows reporting the results of the analysis of high-resolution DQF-COSY or soft-COSY spectra. In this manner, high-resolution COSY spectra with coupling structures can be simulated for coupling-constant verification (refinement, including second-order effects, etc.). Note that couplings can be obtained from the description of the 1D spectrum if they were determined there; they should be included in the 2D fields only if they were measured in the 2D spectrum. Note also that the program generating these data should generate an <NMREDATA_J> tag (see above) that compiles the J-couplings and presents them in an easier-to-read manner.
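Here is a hedged sketch of how the Ja, J1 and J2 fields could be read back, including the optional partner label in parentheses. It assumes the cross-peak part (such as "A/X") contains no commas, so it does not handle parenthesized ambiguous groups. The function name and the returned structure are hypothetical.

```python
import re

# A coupling value with an optional partner label: "5", "4(M)", "6.1(K)".
COUPLING_RE = re.compile(r"^(-?\d+(?:\.\d+)?)\s*(?:\((.+)\))?$")


def parse_cosy_couplings(line):
    """Return (peak, {"Ja": [...], "J1": [...], "J2": [...]}) for one COSY line.

    Each list holds (value_in_Hz, partner_label_or_None) tuples.
    """
    line = line.split(";", 1)[0].strip()  # text after ";" is a comment
    peak, *fields = [field.strip() for field in line.split(",")]
    couplings, key = {}, None
    for field in fields:
        if "=" in field:
            key, field = [part.strip() for part in field.split("=", 1)]
        if key not in ("Ja", "J1", "J2"):
            continue  # intensities and other attributes are ignored here
        match = COUPLING_RE.match(field)
        if match is None:
            raise ValueError(f"cannot read coupling {field!r}")
        couplings.setdefault(key, []).append((float(match.group(1)), match.group(2)))
    return peak, couplings


print(parse_cosy_couplings("A/X, Ja=5, J1=4(M), 6.1(K), J2=2(M), 3.1(K)"))
# ('A/X', {'Ja': [(5.0, None)], 'J1': [(4.0, 'M'), (6.1, 'K')], 'J2': [(2.0, 'M'), (3.1, 'K')]})
```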
 
 
 
==== 2D NOESY <NMREDATA_2D_1H_D_1H>====
 
NOESY spectra will be described as:
 
>  <NMREDATA_2D_1H_D_1H>
 
Larmor=500
 
a/b, I=1.2
 
b/c, I=1.2
 
 
 
== Other ==
 
Heteronuclear 1H-19F data, with 1H detected and 19F in F1, would be in a tag called:
 
>  <NMREDATA_2D_19F_D_1H>
 
 
 
For all experimental spectra, the last line refers to the spectrum stored in an openly and electronically accessible NMR database. (By spectrum, we mean the actual data of the spectrum (“2rr”), but also the raw data (“fid/ser”) and the acquisition and processing parameters (“acqus”, “procs”, “proc2s”, etc.).)
 
Spectrum_ID=HJK33HKJ22342 (mandatory - given by the database where it is stored!)
 
Spectrum_Location=http://... (when the NMReDATA and spectra are in a database)
 
Spectrum_Location=file:./nmr/10/1/pdata/1 (when the files of the spectra and the NMReDATA are in a folder)
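Here is a possible way to turn a Spectrum_Location value into something a program can open, given as a sketch. "file:" locations are assumed to be relative to the folder containing the .nmredata.sdf file, and any other URL is returned unchanged. The function name is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def resolve_spectrum_location(location, record_dir):
    """Return a path for "file:" locations (relative to record_dir) or the URL itself."""
    location = location.strip()
    parsed = urlparse(location)
    if parsed.scheme != "file":
        return location  # e.g. http://... when the spectra are in a database
    # An absolute path replaces record_dir; a relative one is joined to it.
    return str(PurePosixPath(record_dir) / parsed.path)


print(resolve_spectrum_location("file:./nmr/10/1/pdata/1", "record"))  # record/nmr/10/1/pdata/1
```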
 
To be refined by specialists. This deserves a real effort so that it gets done and works.
 
 
 
= Origin of the sample (to be abandoned) =
 
 
 
In phytochemical papers, the origin of the sample should be specified, but defining how to do this is not our task: it is not part of the NMReDATA initiative. In principle, authors can add any tag, provided they have the tools to do it and the journals request it. Such data could have the following form:
 
 
 
=<NATURAL_ORIGIN>=
 
Common_Name_English=flower
 
Genus=Amaranthus %the genus name starts with a capital letter

Species=retroflexus %the species name is written without capital letters
 
Class =
 
Type =
 
 
 
Similarly, for synthesis, the source could be requested:
 
 
 
=<NMREDATA_SYNTHETIC_ORIGIN>=
 
Reaction Reference=http:// link to a database of reactions ??
 
 
 
 
 
 
 
 
 
= Certification=
 
 
 
When the assignment is made in a computer-assisted manner, the software may add a certification of the validity of the data. It is up to the manufacturers to encode it so that the certification cannot be forged (using a hash, etc.?).
 
Certificate tags could be listed at the end of the .sdf file. They can originate from the CASE software, from the database hosting the data and spectra, or from the journal (to state that the data were peer-reviewed). They can be cumulated. If the text of the .sdf file needs to be hashed for certification, the list of tags used for hashing could be given. (What needs to be done to certify the validity of certificates remains to be refined by certificate specialists.)
 
 
 
>  <NMREDATA_CERTIFICATION>
 
Software=CMC_assist
 
Author=Bruker
 
Confidence_level=4.6
 
Confidence_level_certificate=”ADFS678AG67DFG6SD5F7GS5DFGSD8F5GSD7FG7”
 
Unique_solution=YES
 
Unique_solution_certificate=”ADFS678AG67DFG6SD5F7GS5DFGSD8F5GSD7FG7”
 
ETC...
 
 
 
This is only a very vague example. The uniqueness of the proposed structure may be understood in the sense of J.-M. Nuzillard’s LSD tool. Software producers can specify what needs to be done for their format. Multiple certifications can be listed one after the other in the same <CERTIFICATION> tag; each “Software=...” line starts a new one.
 
= Role(s) and scope of the “assignment records” =
 
 
 
The NMR record can be generated from experimental data but also from simulations, predictions, etc. Tools to compare, evaluate, validate, and check the consistency of “assignment records” will certainly be developed. Assignment records can be generated by commercial software, but also by diverse tools analysing NMR data, homemade processing tools, simulation software, etc. This is why the data format should include as many options as possible, to be as flexible as possible, even if not all possible uses are clearly defined or used immediately. Ideally, it should be possible to convert the .sdf files into other file formats or spectral descriptions without loss.
 
 
 
It should be seen as an advantage if the databases include multiple "assignment records" associated with the same molecule. Some could be old, originating from incomplete literature data. Others could include errors because they originate from bulk data processed automatically. But finally, a computer could verify or create a robustly validated record combining all the other data. Aggregated records could be generated by NMR software/databases scoring the available data for consistency, using calculated chemical shifts and spectral simulations. They could refine chemical shifts, couplings, etc.
 
 
 
==Experimental data==
 
 
 
When the NMR data originate from experimental spectra, they may be quite crude (simple automated integration, peak picking) or result from complex automated or manual analysis. The data may be partial or incomplete, or contain inconsistencies, impossible features, etc. The complexity of the content depends on the origin of the data:
 
- only 1D 1H NMR data (with or without integration, couplings, etc.).

- only 1D 13C data (just from simple peak picking)

- only 1D data, but for multiple isotopes (from NMRshiftDB?)

- full analysis based on computer-assisted software (such as Bruker CMC-se, ACD/Labs Structure Elucidator or Mestrelab Mnova) or a web platform (cheminfo.org)

- 1D and 2D data processed automatically, with a partial (for example, not all signals are assigned) and/or ambiguous (due to lack of resolution or other problems) signal assignment

- The file may not contain the actual assignment, only the structure and the list of chemical shifts (the assignment could be added by NMR tools).

- The data may come from a scientific report, i.e. the text describing the spectra, such as the text shown in the following figure

(from http://onlinelibrary.wiley.com/doi/10.1002/mrc.4527/full).
 
 
   
 
   
 
Scripts could be written to convert such a "pure text" description into an .sdf file that includes the .mol file.

- For assignment work made with only "paper and pencil", a simple web tool allowing users to draw a molecule and enter lists of signal names and 2D correlations could easily be made. We could consider accepting .pdf files or pictures of the spectra when the original files no longer exist.

==Synthetic/predicted data==

The NMR data may originate from DFT calculations or any other type of predictor of chemical shifts and/or couplings. In such a case, a general tag is added to provide information about the software. For example:

  ...

  Version=...
  
==Literature data==

When the NMR data originate from publications, a reference to the published paper/book/thesis is given in the NMREDATA_LITERATURE tag.

  >  <NMREDATA_LITERATURE>% This tag was renamed from NMREDATA_ORIGN

  Source=Journal

  DOI=DOI_HERE (if the Reference field is a DOI, specify it here)

  CompoundNumber=label used in the reference to designate the compound (typically a number in boldface)

  >  <NMREDATA_LITERATURE>

  Source=Book

  ISBN=ISBN_HERE (specify the ISBN of the book here)

  CompoundNumber=label used in the reference to designate the compound (typically a number in boldface)

  >  <NMREDATA_LITERATURE>

  Source=Thesis

  Thesis=HTML link here (if available; if not, "LastName, Firstname(s), institution providing the degree, city, country, year of publication")

  CompoundNumber=label used in the reference to designate the compound (typically a number in boldface)
  
==Revised/updated data==

Assignment records may be generated after revision of experimental, literature, prediction data, etc. Ideally, the original .sdf files should also be generated to facilitate comparison, or exist somewhere and be referred to. In both cases, a reference should be given.

  >  <NMREDATA_UPDATE>

  ...

  Correction="fixed assignments of C(13) and C(15)"

This is also to be refined according to future developments.
 
  
= Problems related to symmetry =

This section is tentative and will be worked on in the future.

For symmetrical molecules, it is difficult to encode couplings and 2D correlations.

1) Problem for couplings: in the 1H spectrum of 1,2-dichlorobenzene, there are two signals (two different protons in an AA’XX’ system). If the SDF file includes two signals (one for A and one for X), in principle only one coupling can be given: J(A,X). But we should be able to give the other coupling constants involving the primed protons. When only one symmetry element is present, it may not be too difficult to include in the format a way to provide pairs of couplings instead of a single one, but with more than one symmetry element it would become complicated.

2) Problem for correlations: consider 1,4-dichlorobenzene. A 3J(C,H) HMBC correlation will be visible between a proton and what seems to be its directly bound carbon, because the carbons one bond and three bonds away from the proton are symmetry-equivalent.

We have three possible approaches:

1) Ignore the problem. It may in fact not be so serious. In systems with magnetically non-equivalent spins, the coupling patterns are complex and the couplings will probably not be measured in routine experiments. The HMBC correlation will be ambiguous, and it will be up to the person or software checking consistency to recognize that, when a signal corresponds to more than one proton or more than one carbon, it suffices that one of the possible combinations of Hortho, H’ortho, Cortho and C’ortho corresponds to a 3J(C,H) for the check to pass, even if another combination also corresponds to a 1J(C,H).

2) Duplicate all signals (or the subset affected by symmetry). If we list two signals (with two different labels) for Hortho and H’ortho, the problems disappear both for couplings (one can give both J(A,X) and J(A,X’)) and for ambiguous correlations. But for such chemical shifts there will always be two labels/spins, which may confuse assignment software. The resulting complications may be worse than the problem itself.

3) Face the problem and develop a rigorous way to include symmetry. This could be the object of future work.
 
= Tag names for spectra =

== Structure of tag names ==

The name of the SD tag of a spectrum is constructed as follows; it describes the pulse sequence.

1) The number of dimensions is given (e.g. “2D_...”).

2) Then comes the isotope of the first indirect dimension (e.g. “..._13C_...”).

3) Then comes the code of the mixing to the next dimension (e.g. “..._1J_...”).

4) Finally, the detected isotope is given (e.g. “..._1H”).

The tag of the HSQC is therefore “<NMREDATA_2D_13C_1J_1H>”.

Mixing can be:

1J for one-bond J (typically HSQC)

NJ for multiple-bond J (COSY, HMBC)

TJ for TOCSY
 
etc. (see list below for more details).
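The construction rule above can be sketched as a small Python function. `spectrum_tag_name` is a hypothetical helper, and its set of mixing codes only holds those appearing on this page (1J, NJ, TJ, and D as used in the NOESY and 1H-19F tag names); it is not the full list. When no indirect isotope is given, the name reduces to the number of dimensions and the detected isotope, which is the form the page gives for J-resolved spectra.

```python
# Mixing codes that appear on this page; the full list is not reproduced here.
MIXING_CODES = {"1J", "NJ", "TJ", "D"}


def spectrum_tag_name(dimensions, detected, indirect=None, mixing=None):
    """Build a tag name such as NMREDATA_2D_13C_1J_1H (hypothetical helper)."""
    parts = ["NMREDATA", f"{dimensions}D"]
    if indirect is not None:
        if mixing not in MIXING_CODES:
            raise ValueError(f"unknown mixing code: {mixing!r}")
        parts += [indirect, mixing]  # indirect isotope, then mixing to the next dimension
    parts.append(detected)  # the detected isotope always comes last
    return "_".join(parts)


print(spectrum_tag_name(2, "1H", indirect="13C", mixing="1J"))  # prints NMREDATA_2D_13C_1J_1H
```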
 
  
For J-resolved and related experiments (DIAG, δ-resolved) where the indirect dimension is not a chemical shift (no correlation present), only the detected isotope is given (<2D_1H>). The spectrum is described as a 1D 1H spectrum (providing chemical shift, couplings, etc.).
  
All tags have “NMREDATA_” before the tag names listed below.
  
P.S. Some names are somewhat tentative. We don’t necessarily mean to define 3D spectra here or projections of 3D to 2D (HSQC-TOCSY). The list is mostly to test the ability of the format to list as many experiments as possible with the same logic.

Revision as of 15:36, 18 January 2019

Quick links

Link to the MRC article on the NMReDATA format.

Direct link to page describing the format of the NMREDATA tags.

List of compatible software

Tentative instruction for journal submission of NMReDATA.

Discussion on future versions of the NMReDATA format

Introduction

The NMReDATA working group decided to include data extracted from NMR spectra of small molecules in SDF files using SD tags.

More details about SDF files!

An important task of the group is to define the format of the content of the "<NMREDATA_...>" tags. More details here!

Version 1.0 will be decided in September at the "Round table" of the Smash 2017 conference at Baveno, Italy.

The SDF file alone (that is without the spectra) cannot be used to verify that the assignment corresponds to the spectra. It is therefore important to always have the spectra with the SDF file! We call "NMR Record" the combination of the spectra and the SDF file.

NMR records

We call "NMR record" a folder (or a .zip file including the folder) or a database record including:

1) All the NMR spectra (including FID, acquisition and processing parameters). The format of these data is the one produced by the manufacturer of the instrument that acquired them. This means that the software generating the record either has these raw data available or asks the user to point to them in order to include them in the NMR record.

2) The SDF file including the NMReDATA (.nmredata.sdf file)

NMR record

A more detailed pictorial representation of an NMR record and an example of an SDF file are given in the poster presented in July at Euromar 2017. Note: the NMREDATA tag "SIGNALS" was renamed "ASSIGNMENT" in version 0.98.

NMR records will be requested by Magnetic Resonance in Chemistry from 2018 on. The software developers (ACD/Labs, Bruker, cheminfo, Mestrelab) will be ready by the end of 2017 to produce NMR records for submission to MRC.

Records will be either analysed on web pages or downloaded; in the latter case, the .nmredata.sdf file is opened by software that automatically accesses the associated spectra.

The full description can be found in the NMReDATA tag format page.

An example of .nmredata.sdf files with the spectra can be found here

Current version of the format of NMReDATA

The format can be found here: NMReDATA tag format

A small set of simple examples of .nmredata.sdf files can be found here (Version 0.95). (Note: the field called "NMREDATA_SIGNALS" was renamed "NMREDATA_ASSIGNMENT" in V 0.98). It includes .nmredata.sdf files for ethanol with diverse formats (explicit or implicit hydrogen atoms, etc.)

Versions

Changes in version 1.1

A backslash is added at the end of each line of text in the NMReDATA tags. (This avoids a problem with the CDK libraries, which ignore the newline character (ASCII 10). Since a line separator is needed, NMReDATA tags use backslash + newline.)
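A minimal Python sketch of writing and reading tag bodies under this convention. It assumes that every line, including the last, ends with a backslash, and that backslashes never occur inside values. Because lines are split on the backslash itself, the text is still recovered when a reader (such as the CDK case described above) has dropped the newlines. The function names are hypothetical.

```python
def encode_tag_body(lines):
    """Terminate every line of an NMReDATA tag body with a backslash (version 1.1)."""
    return "\n".join(line.rstrip() + "\\" for line in lines)


def decode_tag_body(body):
    """Recover the lines of a tag body, whether or not the newlines survived.

    Assumption: backslashes only occur as line terminators, never inside a value.
    """
    return [part.strip() for part in body.split("\\") if part.strip()]


encoded = encode_tag_body(["Larmor=500", "a/b, I=1.2"])
flattened = encoded.replace("\n", "")  # what a reader that drops newlines would see
print(decode_tag_body(flattened))
```

Trailing spaces after a backslash, or a last line without one, are also tolerated by this decoder.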

Comparison of assigned data

This is a working proposition

A tag compares the data in the current file with those from other files (using the same labels for the assignment):

> <NMREDATA_COMPARISON>

Externaldata1=... ref to an external .sdf file from the current or external record.
Externaldata2=...  
H1 1.54 1.50 1.66 (the first value is the reference value in NMREDATA_ASSIGNMENT tag of the current file the second from Externaldata1, etc.)
H2 1.54 1.50 1.66
Chi2 0.3 0.5 (these could be the chi squared of the reference with the external data )

Couplings could also be compared:

> <NMREDATA_COMPARISON>

Externaldata1=... ref to an external .sdf file from the current or external record.
Externaldata2=...
J(H1,H2) 1.54 1.50 1.66 (the first value is the reference value in NMREDATA_ASSIGNMENT tag of the current file the second from Externaldata1, etc.)
J(H1,H3) 1.54 1.50 1.66
Chi2 0.3 0.5 (these could be the chi squared of the reference with the external data)

Other comparisons could be made, for example of signal intensities in HSQC spectra:

> <NMREDATA_COMPARISON>

Externaldata1=... ref to an external .sdf file from the current or external record.
Externaldata2=... 
I(NMREDATA_13C_1J_1H,HC1,H1) 122 154 143 (the first value is the reference value in NMREDATA_ASSIGNMENT tag of the current file the second from Externaldata1, etc.)
I(NMREDATA_13C_1J_1H,HC2,H2) 132 151 163 
Chi2 5.3 3.5 (these could be the chi squared)
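The page does not define how the Chi2 values are computed. One plausible reading, sketched in Python under that assumption: for each external dataset, sum over all rows the squared deviation from the reference column divided by an assumed uncertainty `sigma`. The helper names and the use of a single `sigma` for all rows are hypothetical choices for illustration.

```python
def parse_rows(text):
    """Read lines such as 'H1 1.54 1.50 1.66' into {label: [values...]}."""
    rows = {}
    for line in text.strip().splitlines():
        label, *numbers = line.split()
        rows[label] = [float(x) for x in numbers]
    return rows


def comparison_chi2(rows, sigma):
    """Return one chi-squared value per external dataset (columns 2, 3, ...).

    Assumption: chi2 = sum over rows of ((external - reference) / sigma) ** 2,
    with the same sigma for every row.
    """
    n_external = len(next(iter(rows.values()))) - 1
    chi2 = [0.0] * n_external
    for label, values in rows.items():
        if len(values) != n_external + 1:
            raise ValueError(f"row {label} has {len(values)} values, expected {n_external + 1}")
        reference = values[0]
        for k, value in enumerate(values[1:]):
            chi2[k] += ((value - reference) / sigma) ** 2
    return chi2


rows = parse_rows("""
H1 1.54 1.50 1.66
H2 1.54 1.50 1.66
""")
print(comparison_chi2(rows, sigma=0.1))  # sigma = 0.1 ppm is an arbitrary choice
```

The resulting values depend entirely on the chosen `sigma`; they are not meant to reproduce the placeholder Chi2 numbers in the examples above.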

Certification

When the assignment is made in a computer-assisted manner, the software may add a certification of the validity of the data. It is up to the manufacturers to encode it so that the certification cannot be forged (using a hash, etc.?). Certificate tags could be listed at the end of the .sdf file. They can originate from the CASE software, from the database hosting the data and spectra, or from the journal (to state that the data were peer-reviewed). They can be cumulated. If the text of the .sdf file needs to be hashed for certification, the list of tags used for hashing could be given.

To be refined by the specialists!

>  <NMREDATA_CERTIFICATION>
Software=CMC_assist\
Author=Bruker\
Full_report_tag=CMC4.6_REPOR \
Confidence_level=4.6\
Confidence_level_certificate=”ADFS678AG67DFG6SD5F7GS5DFGSD8F5GSD7FG7”  \   
Verification_certificate=http:/...\
Software=LSD_V3.0\
Author=JMN\
Unique_solution=YES\
Unique_solution_certificate=”ADFS678AG67DFG6SD5F7GS5DFGSD8F5GSD7FG7”\
etc.

This is only a very vague example. The uniqueness of the proposed structure may be understood in the sense of J.-M. Nuzillard’s Logic for Structure Determination (LSD) tool. Software producers can include the specifications of their product here. Multiple certifications can be listed one after the other in the same <CERTIFICATION> tag; each “Software=...” line starts a new one.
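The page leaves the certification mechanism to software producers. As one possible approach, sketched in Python: compute an HMAC-SHA256 over the listed tags with a key held by the certifying party. A plain hash alone would not prevent forgery, since anyone editing the file could recompute it; an HMAC can only be checked by holders of the key, so publicly verifiable certificates would need a digital signature instead. The function names, the way tags are concatenated and the tag contents in the usage lines are assumptions.

```python
import hashlib
import hmac


def certify(tags, tag_names, secret_key):
    """Return a hex HMAC-SHA256 over the tags listed in tag_names, in that order.

    tags: {tag_name: tag_text}; secret_key: bytes held by the certifying party.
    """
    message = "".join(f"<{name}>\n{tags[name]}\n" for name in tag_names)
    return hmac.new(secret_key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(tags, tag_names, secret_key, certificate):
    """Recompute the certificate and compare it in constant time."""
    return hmac.compare_digest(certify(tags, tag_names, secret_key), certificate)


tags = {"NMREDATA_VERSION": "1.1\\", "NMREDATA_SOLVENT": "CDCl3\\"}  # illustrative contents
covered = ["NMREDATA_VERSION", "NMREDATA_SOLVENT"]  # would be listed in the certification tag
certificate = certify(tags, covered, b"key-held-by-the-certifier")
print(verify(tags, covered, b"key-held-by-the-certifier", certificate))  # True
```

Any change to a covered tag, or to the order of the covered list, makes verification fail.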

Role(s) and scope of the “assignment records”

The NMR record can be generated from experimental data (this is how the format was designed), but data may also originate from simulations, predictions, etc.

Tools to compare, evaluate, validate, and check consistency of “assignment records” will certainly be developed.

Assignment records can be generated by commercial software, but also by diverse tools analysing NMR data, homemade processing tools, simulation software, etc. This is why the data format should include as many options as possible, to be as flexible as possible, even if not all possible uses are clearly defined or used immediately. Ideally, it should be possible to convert the .sdf files into other file formats or spectral descriptions without loss.

Multiple records

It should be seen as an advantage if databases include multiple "assignment records" associated with the same molecule or the same set of NMR spectra. Some could be old, originating from incomplete literature data. Others could include errors because they originate from bulk data processed automatically. But finally, a computer could verify or create a robustly validated record combining all the other data. Aggregated records could be generated by NMR software/databases scoring the available data for consistency, using calculated chemical shifts and spectral simulations. They could refine chemical shifts, couplings, etc.

SDF files generated from experimental data

When the NMR data originate from experimental spectra, they may be quite crude (simple automated integration, peak picking). At the other extreme, they may result from complex automated or careful manual expert analysis. The NMReDATA format must be flexible enough to encode data of diverse quality: they may be partial or incomplete, or contain inconsistencies, impossible features, etc. Depending on their origin, the files may contain:

- only 1D 1H NMR data (with or without integration, couplings, etc.).

- only 1D 13C data (just from simple peak picking).

- only 1D data, but for multiple isotopes (from NMRshiftDB?).

- a full analysis based on computer-assisted software (such as ACD/Labs Structure Elucidator, Bruker CMC-se or Mestrelab Mnova) or a web platform (such as cheminfo.org).

- 1D and 2D data processed automatically, with a partial (for example, not all signals are assigned) and/or ambiguous (due to lack of resolution or other problems) signal assignment.

- only the structure and the list of chemical shifts, without the actual assignment (which could be added later by NMR tools).

- data taken from a scientific report, i.e. the text describing the spectra.

Scripts could be written to convert such a "pure text" description into an .sdf file that includes the .mol file.

For assignment work made with only "paper and pencil", a tool allowing users to draw a molecule and enter lists of signal names and 2D correlations could easily be made. We could consider accepting .pdf files or pictures of the spectra when the original files no longer exist.

SDF files generated from calculated data

This is a working proposition to include JCAMP spectra in NMR Records / Discussion using Slack

The NMR data may originate from DFT calculations or any other type of predictor of chemical shifts and/or couplings. In such a case, a general tag is added to provide information about the software. For example:

>  <NMREDATA_ORIGIN>
Source=Calculation
method=DFT
Geometry=method/basis set
Shielding=method_basis set
Coupling=method_basis set
Software=...
Version=...

References to additional files located in a folder called "DFT_GIAO_calculations" could be added. They could include:

- the equations used to convert shielding into chemical shifts (for 1H, 13C):

csH=((Sxx+Syy+Szz)/3-60)*0.98+5
csC=((Sxx+Syy+Szz)/3-130)*1.1+3.1
JHH=((???)/3-130)*1.1+3.1

- the list of calculated conformers, their energies(?) and the Boltzmann ratios(?)

- all 3D structures of the conformations in a conformations.sdf file (not the main NMReDATA .sdf file). The main .sdf file can contain only the 3D structure of the lowest-energy conformation and the flat structure.

- all the output files of the shielding (and coupling) calculations(?), and possibly also those of the geometry optimization(?)
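The tentative conversion equations listed above share one linear form: the isotropic shielding (the mean of Sxx, Syy and Szz) minus an offset, times a scale factor, plus an intercept. A minimal Python sketch, assuming Sxx, Syy and Szz are the diagonal elements of the calculated shielding tensor. The coefficients are copied from the example and are placeholders, not a calibration; real calibrations usually have a negative slope, since the chemical shift decreases as shielding increases. The JHH equation is left out because its input is unspecified.

```python
def isotropic_shielding(sxx, syy, szz):
    """Mean of the diagonal elements of the shielding tensor (trace / 3)."""
    return (sxx + syy + szz) / 3


def shift_from_shielding(sxx, syy, szz, offset, scale, intercept):
    """Linear form of the example equations: ((Sxx+Syy+Szz)/3 - offset) * scale + intercept."""
    return (isotropic_shielding(sxx, syy, szz) - offset) * scale + intercept


# Placeholder coefficients copied from the tentative csH and csC equations above.
EXAMPLE_COEFFICIENTS = {
    "1H": (60, 0.98, 5),     # offset, scale, intercept for csH
    "13C": (130, 1.1, 3.1),  # offset, scale, intercept for csC
}


def example_shift(isotope, sxx, syy, szz):
    offset, scale, intercept = EXAMPLE_COEFFICIENTS[isotope]
    return shift_from_shielding(sxx, syy, szz, offset, scale, intercept)


print(example_shift("13C", 130.0, 130.0, 130.0))  # 3.1
```

Keeping the coefficients in a table, rather than hard-coding them, matches the idea of storing the conversion equations alongside the calculation files.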

Examples (tentative) of .nmredata.sdf files originating from Gaussian can be found here: androsten data

SDF files generated from literature data

This is a working proposition under review

When the NMR data originate from publications, a reference to the published paper/book/thesis is given in the NMREDATA_LITERATURE tag.

>  <NMREDATA_LITERATURE>
Source=Journal
DOI=DOI_HERE (if Reference field is DOI specify it here)
CompoundNumber=label used in the reference to designate the compound (typically a number in boldface)
>  <NMREDATA_LITERATURE>
Source=Book
ISBN=ISBN_HERE (specify the ISBN of the book here)
CompoundNumber=label used in the reference to designate the compound (typically a number in boldface)
>  <NMREDATA_LITERATURE> 
Source=Thesis
Thesis=HTML link here (if available; if not, "LastName, Firstname(s), institution providing the degree, city, country, year of publication")
CompoundNumber=label used in the reference to designate the compound (typically a number in boldface)

SDF files generated after revision of existing SDF files

Assignment records may be generated after revision of experimental, literature, prediction data, etc. Ideally, the original .sdf files should also be generated to facilitate comparison, or exist somewhere and be referred to. In both cases, a reference should be given.

>  <NMREDATA_UPDATE>
Source=Record
Record_number=ref_to_the_original_record (multiple references, separated by “,”, are allowed for aggregation of records)
Date=date (in a standard date format)
Correction="fixed assignments of C(13) and C(15)"

This is also to be refined according to future developments.

Concerning symmetry

Magnetic non-equivalence

For symmetrical molecules, it may be difficult to encode couplings and 2D correlations.

Reminder: Couplings are not directly associated to atoms, but to labels (in the NMREDATA_ASSIGNMENT tag). Labels are associated to one or more atoms (in case of symmetry/fast rotation, etc.).

Example of a difficulty and its solution for scalar couplings: the 1D 1H spectrum of 1,2-dichlorobenzene shows two multiplets (two different protons in an AA’XX’ system). If the SDF file includes two labels (one for A and one for X, each pointing to two atoms), in principle only one coupling can be given: JA,X (no JA,A' or JA,X'). If one wants to specify all the couplings, A and A' (and likewise X and X') must be given different labels, each pointing to only one atom, so that different couplings can be given for JA,X, JA',X, JA,A' and JX,X'. This may be desired so that the 1D spectrum can be simulated with the correct non-equivalence effect.

HMBC correlations in symmetrical molecules

Consider the following molecules:

[Figure: Hmbc sym.png]

A 3JC,H HMBC correlation will be visible between proton a and C(1), which seems to be its directly bound carbon, because the carbons one bond and three bonds away from the proton are symmetry-equivalent. Software may see the correlation as 1J, but it should be able to analyse the NMREDATA_ASSIGNMENT tag, see that a and C(1) each point to two atoms, and conclude that the correlation may correspond to any of the four possible atom pairs: two of them correspond to the actual 3J and two to 1J.
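The check described above can be sketched in Python: count the bonds along the shortest path for every (H, C) atom pair behind the two labels, and see whether any pair is three bonds apart. The molecule is a hypothetical para-disubstituted benzene ring with atom names chosen for illustration (the numbering of the figure is not reproduced): label a points to the equivalent protons h2 and h6, and the carbon label to the carbons r2 and r6 bearing them.

```python
from collections import deque
from itertools import product


def bond_count(bonds, start, goal):
    """Number of bonds on the shortest path between two atoms (breadth-first search)."""
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if atom == goal:
            return distance[atom]
        for other in neighbours[atom]:
            if other not in distance:
                distance[other] = distance[atom] + 1
                queue.append(other)
    return None  # the two atoms are not connected


def possible_nj(bonds, h_atoms, c_atoms):
    """Bond counts for every (H, C) pair behind one H label and one C label."""
    return sorted(bond_count(bonds, h, c) for h, c in product(h_atoms, c_atoms))


# Hypothetical para-disubstituted benzene ring r1..r6 (substituents on r1 and r4 omitted).
ring = [("r1", "r2"), ("r2", "r3"), ("r3", "r4"), ("r4", "r5"), ("r5", "r6"), ("r6", "r1")]
bonds = ring + [("r2", "h2"), ("r3", "h3"), ("r5", "h5"), ("r6", "h6")]
pairs = possible_nj(bonds, ["h2", "h6"], ["r2", "r6"])
print(pairs, 3 in pairs)  # [1, 1, 3, 3] True
```

A consistency checker following the first approach would accept the HMBC correlation because at least one pair is three bonds apart, even though two pairs are also one bond apart.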

Why not add more data in NMReDATA tags?

We consider that our task is to focus on NMR data. But SDF files could (and probably should!) also include other experimental data such as:

1) The origin of the molecule. In phytochemistry, this may include the plant it originates from and the extraction method; in synthesis, the reaction producing it.

2) MS data

3) other spectral data

In principle, authors can add any tag, provided they have the tools to do it and the journals request it.

Software producing SDF files that include NMReDATA should, when it reads and rewrites an SDF file, only add (or modify/review) the NMReDATA data. Any other SDF tags should be passed unchanged from the file that is read to the file that is generated.
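A minimal Python sketch of this rule for a single-record SD file, assuming the usual layout: a molfile block ending with "M  END", data items introduced by a "> <NAME>" header line and terminated by a blank line, and the record terminated by "$$$$". The helper names are hypothetical; established toolkits such as RDKit or CDK provide their own SD readers and writers.

```python
import re

# Header line of an SD data item, e.g. ">  <NMREDATA_VERSION>".
TAG_HEADER = re.compile(r"^>.*<([^>]+)>")


def split_sd_record(text):
    """Split one SD record into its molfile lines and an ordered list of (tag, value)."""
    lines = text.splitlines()
    end = next(k for k, line in enumerate(lines) if line.startswith("M  END"))
    items = []
    i = end + 1
    while i < len(lines):
        header = TAG_HEADER.match(lines[i])
        i += 1
        if header is None:
            continue  # blank separator lines and the "$$$$" record terminator
        value = []
        while i < len(lines) and lines[i] != "":
            value.append(lines[i])
            i += 1
        items.append((header.group(1), "\n".join(value)))
    return lines[: end + 1], items


def update_nmredata(text, new_tags):
    """Rewrite a record, adding or replacing only NMREDATA_ tags; other tags pass through."""
    if any(not name.startswith("NMREDATA_") for name in new_tags):
        raise ValueError("only NMREDATA_ tags may be added or modified")
    molfile, items = split_sd_record(text)
    existing = {name for name, _ in items}
    merged = [(name, new_tags.get(name, value)) for name, value in items]
    merged += [(name, value) for name, value in new_tags.items() if name not in existing]
    out = list(molfile)
    for name, value in merged:
        out += [f">  <{name}>", value, ""]
    out.append("$$$$")
    return "\n".join(out) + "\n"
```

Existing tags keep their original order and new NMReDATA tags are appended after them; multi-record files would need a loop over "$$$$"-separated records.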

THIS SITE HAS BEEN VANDALIZED. SINCE THEN, AS A PRECAUTION, I HAVE BLOCKED CONFIRMED USERS FROM MODIFYING THE CONTENT OF THIS SITE AND ADDING NEW PAGES.

PLEASE LET ME KNOW BY E-MAIL IF/WHEN YOU WANT TO MAKE CHANGES TO THIS WIKI SO THAT I ALLOW (TEMPORARILY) MODIFICATIONS AGAIN.