Difference between revisions of "Main Page"

From NMReDATA

Revision as of 10:03, 13 July 2017

Format and content of .sdf tag including NMReDATA

See nmredata.org for details Version 0.98

New section

For more information on SDF format see: For more information on SDF tags and how to read/write them see:

We could add much more information about the acquisition conditions of the spectra, but these are the only ones that are really essential (in the sense that journals have required them for a long time and that they are commonly given in reports, papers, etc.) when describing a spectrum. Additional information can be retrieved from the experimental parameters associated with the NMR spectra if needed. Indeed, there should always be a link to the spectra (see below), which can be “followed” to get the other parameters.

Comments and changes in this version:

Concerning the intensity in 2D spectra, it was not a good idea to ask for the intensity strictly at the coordinates of the peak, because this point may fall in the middle of a doublet (in an HSQC, for example). The requirement was therefore slightly rephrased in this version:

1) For all 2D spectra, the intensity of the spectrum at (or very close to) the coordinates of the correlated peaks should be given when the spectrum is available. If the signal has a shape such that the intensity is zero at its center (phase-sensitive COSY, or the middle of a doublet in HSQC, for example), the intensity can be measured at the maximum amplitude of the multiplet. This intensity does not pretend to be “quantitative”. Optional integration of the volume is possible.

General structure of SDF files

Structure

The first part of an SDF file contains the structure (in the .mol format).

Tags

A number of tags follow the structure. When opening .sdf files, most chemical structure editors ignore the tags, but specialized software can manage them. ... to be updated ...
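As an illustration of how software could collect the tags that follow the structure, here is a minimal Python sketch. It assumes the usual SDF layout (a header line containing `<TAG_NAME>`, data lines, a blank line closing the item, and `$$$$` closing the record); the function name `read_sd_tags` is hypothetical and not part of the format.

```python
import re

# A data header starts with ">" and carries the tag name between "<" and ">".
TAG_HEADER = re.compile(r"^>.*?<([^>]+)>")

def read_sd_tags(sdf_text):
    """Return {tag_name: [data lines]} for the first record of an SDF text."""
    tags = {}
    current = None
    for line in sdf_text.splitlines():
        if line.startswith("$$$$"):        # end of the record
            break
        match = TAG_HEADER.match(line)
        if match:
            current = match.group(1).strip()
            tags[current] = []
        elif current is not None:
            if line.strip() == "":         # a blank line closes the data item
                current = None
            else:
                tags[current].append(line)
    return tags

example = """\
  (mol block omitted in this sketch)
M  END
>  <NMREDATA_VERSION>
1.0

>  <NMREDATA_SOLVENT>
CDCl3

$$$$
"""
tags = read_sd_tags(example)
print(tags)
```

Lines of the mol block are skipped simply because no tag header has been seen yet.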

General structure of SDF files including NMReDATA

NMReDATA tags

NMReDATA are included using a set of tags.

header tags

<NMREDATA_VERSION>

This tag is used to specify the “VERSION” of the file format. Current version : 1.0

>  <NMREDATA_VERSION>;mandatory
1.0

<NMREDATA_LEVEL>

This tag is used to specify the level of complexity of the data:

1: not using ambiguities 
2: using lists of interchangeable assignment 
3: using ambiguous assignment (with or without ambiguities)

<NMREDATA_ID> (for database)

Optional, but required when the data originate from a database. When data are copied from database to database, multiple IDs may be included. These will be defined by the database manager and software.

DB_ID: the code or number is assigned by the hosting database
Title: Full analysis of whatever from methanol extract of leafs  
Comment: Here more details could be given on the record.
Comment1: Here more details could be given on the record.
Comment2: Here more details could be given on the record.
Comment3: Here more details could be given on the record.
AUTHOR:Doe John, University of Tougalpa, Swinerland (optional)
ORIGIN_ONE=2345627486 (could be about the sample name)
ORIGIN_TWO=323212KKDKKS (could give a date or other reference)
Title_L1=after sep. hplc (this could be extracted from the first line of the title in the 1H spectrum)

One or more identifiers can be given under "ID". The ID will be generated by the software generating the data and/or the database storing the data, etc. There may be more than one ID (for example one from the software generating it, one from the university labelling the origin of the data, one from the database, one from the publisher of the associated data, etc.). It is up to the “generator” of the file to decide if and how to make it unique. InChIKey/SMILES could be given if the software generating the data is able to specify them, and the CAS number if it already exists.

<NMREDATA_SMILES>;optional but strongly encouraged

here comes the SMILES code ;mandatory with explicit H... see with JMN for more details ... The reason to include the SMILES in the NMReDATA is that, when given with protons, it can be used to generate pure-text NMR descriptions of spectra (under elaboration with JMN)


The solvent is specified with the “SOLVENT” tags.

<NMREDATA_SOLVENT>;mandatory

CDCl3

For mixtures of solvents, the tag would be:

>  <NMREDATA_SOLVENT>
CDCl3/DMSO 80:20
CDCl3/DMSO/D2O 80:10:10

The proportions are given in % volumes.

In the case of RDC measurements, the medium used can be specified in the line following the name of the solvent.

>  <NMREDATA_SOLVENT>
CDCl3
PBLG

The quantity of the orientation medium should be given using usual units (this is vague, but we cannot do better). Agarose used at 1% mass ratio:

>  <NMREDATA_SOLVENT>
D2O
Agarose 1%

For solid-states samples:

>  <NMREDATA_SOLVENT>
solid
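The solvent conventions above could be read with a sketch like the following. It assumes that the first data line names the solvent(s), optionally followed by a space and the volume proportions, and that an optional second line names the orientation medium; `parse_solvent` is a hypothetical helper, not part of the format.

```python
def parse_solvent(lines):
    """Parse the data lines of an NMREDATA_SOLVENT tag.

    Returns (solvents, medium). `solvents` maps each solvent name to its
    volume percentage (None when no proportions are given); `medium` is the
    optional second line (orientation medium) or None.
    """
    first = lines[0].strip()
    names_part, _, ratio_part = first.partition(" ")
    names = names_part.split("/")
    if ratio_part:
        ratios = [float(r) for r in ratio_part.split(":")]
        if len(ratios) != len(names):
            raise ValueError("proportions do not match the number of solvents")
    else:
        ratios = [None] * len(names)
    medium = lines[1].strip() if len(lines) > 1 else None
    return dict(zip(names, ratios)), medium

print(parse_solvent(["CDCl3/DMSO/D2O 80:10:10"]))
print(parse_solvent(["D2O", "Agarose 1%"]))
```

The quantity of the orientation medium is kept as free text, since the format only asks for "usual units".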

<NMREDATA_CONCENTRATION>; optional

(only “mM” is allowed, but the unit is still specified)

>  <NMREDATA_CONCENTRATION>
12.3 mM

<NMREDATA_TEMPERATURE>

Optional but very strongly encouraged (only K is allowed, but the unit is still given).

>  <NMREDATA_TEMPERATURE>
298.0 K

Assignment tag

Labelling/linking signals to atoms of molecules

In liquid-state NMR, some signals are degenerate: multiple atoms can contribute to the same signal, for example all protons of a methyl group. We call "signal" each group of atoms having a common chemical shift because of symmetry properties (not by accident), and each signal has its own “label”. In principle, the labels could be included in the .mol file part (this is part of the definition of .mol files), but this may cause compatibility problems. For example, ChemDraw does not recognize the atom types well when they have been given a name and marks them in red (as if the atoms were unknown, causing hybridization problems). Because of this problem, we will NOT use the labels of the “.mol” file, but list them in a specific RD tag called NMREDATA_SIGNALS.

For each signal, we first give the label of the signal. The chemical shift follows the label. Finally, we list the atom(s) it refers to in the structure file. The atom numbers start with 1 and go through all the atoms in the molfile. An NMR signal always has a chemical shift associated with it. Not all atoms of the molecule will be listed in the NMREDATA_SIGNALS tag, for example if they were not assigned or have no NMR signal (like O, N, or other isotopes for which the spectra were not recorded). For example, for HO-CH2-CH3, with protons labelled a, b and c, and carbons A and B (names could be different; labels do not have to be letters for protons and numbers for carbons, this is just an example), the tag would be:

<NMREDATA_SIGNALS>

>  <NMREDATA_SIGNALS>; ethanol with explicit hydrogen atoms
A, 48.301, 1 ;A corresponds to the carbon of CH2
B, 20.322, 2 ;B corresponds to the carbon of CH3
a, 2.610, 3 ;a corresponds to the hydrogen atom of the OH
b, 4.802, 4, 5; b corresponds to the hydrogen atoms of the CH2
c, 1.401, 6, 7, 8
Ex, 3.6, 9
(optional: list of interchangeable labels – when level>1 see below)

We recommend providing a label (for example "Ex") designating all the H of the OH, NH, etc. that are quickly exchanging (for example the OH of glucose in D2O).

Important note: In general we use ", " (comma + space) as separator. The space after the “,” may become optional in a future version. Please write your code so that it does not depend on the space: store the separator “, ” in a variable so that you can easily change it to “,” (no space), and be ready to read or write files with and without the space.
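The separator advice above can be sketched as follows: the reader accepts a comma with or without trailing spaces, and the writer takes its separator from a single variable. This is only an illustration; `FIELD_SEP`, `parse_signal_line` and `format_signal_line` are hypothetical names. Atom references are kept as strings so that forms like "H3" also pass through.

```python
import re

FIELD_SEP = ", "                    # change to "," if the space is dropped
SEPARATOR = re.compile(r",\s*")     # reading: comma with or without spaces

def parse_signal_line(line):
    """Parse 'label, shift, atom[, atom...] ;comment' into a dict."""
    body, _, comment = line.partition(";")
    fields = [f for f in SEPARATOR.split(body.strip()) if f]
    return {
        "label": fields[0],
        "shift": float(fields[1]),
        "atoms": fields[2:],        # kept as text: "4", "5" or "H3"
        "comment": comment.strip(),
    }

def format_signal_line(label, shift, atoms):
    """Write one signal line, with four digits after the period."""
    return FIELD_SEP.join([label, f"{shift:.4f}"] + [str(a) for a in atoms])

print(parse_signal_line("b, 4.802, 4, 5; b corresponds to the hydrogen atoms of the CH2"))
print(parse_signal_line("c,1.401,6,7,8"))
print(format_signal_line("c", 1.401, [6, 7, 8]))
```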

For structures where the hydrogen atoms are implicit, the reference to a hydrogen is made by adding "H" right before the atom number, to distinguish the heavy atom from the H bound to it. When more than one hydrogen atom is implicit on the same atom, the second is designated with the same "H" reference. It is understood that, with implicit hydrogens, the two hydrogen atoms will not be distinguished, so assignment of diastereotopic protons will not be possible. To avoid this problem, use explicit mol structures with defined chirality.

>  <NMREDATA_SIGNALS>;ethanol
A, 48.301, 1 ;the label "A" corresponds to atom one, which is the carbon of the CH2
B, 20.322, 2 ;atom two is the carbon of the CH3
a, 2.610, H3 ;"H3" refers to the hydrogen atom of atom 3 (the oxygen)
b, 4.802, H1 ;"H1" refers to the hydrogen atoms of atom 1 (of the CH2)
c, 1.401, H2

(atom 3 would be the oxygen of ethanol) Only explicit H are allowed. No monomer unit, no “R” group.
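A reader has to tell a plain atom number from an implicit-hydrogen reference such as "H3". A minimal sketch, with the hypothetical helper `parse_atom_ref`:

```python
def parse_atom_ref(ref):
    """Return (atom_number, is_implicit_hydrogen).

    '4'  -> (4, False): atom 4 of the molfile
    'H1' -> (1, True):  the implicit hydrogen(s) bound to atom 1
    """
    ref = ref.strip()
    implicit_h = ref.startswith("H")
    number = int(ref[1:] if implicit_h else ref)
    return number, implicit_h

print(parse_atom_ref("4"))
print(parse_atom_ref("H1"))
```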

Labels can, in principle, contain any character and be of any length. We normally exclude the comma (,), because it is used as field separator (see above), as well as <EOL> and <LF>. These labels can be defined as the chemist wishes. If a database manager wants to change the names of the labels to make them “canonical” or to follow some norm, that is up to them, but we accept anything. In principle, the labels should correspond to the ones used in the manuscript of the submitted paper (with reviewing in mind). In the case of simple numbering, they could be C1, etc. and H-C1, H’-C1, etc. But it is up to the author/file generator to satisfy the format of the journal/database, the readability of the numbering (possibly IUPAC) and the need to distinguish what needs to be distinguished (like diastereotopic protons).

No ranges are allowed for the chemical shift: only a single floating-point value should be specified here. This is necessary because 1D 1H spectra may not be included in the set of spectra and, if they are, signal overlap may result in vague descriptions, such as a region analysed as “m” containing many signals. We propose to call “Ex” the spins that are exchanging with the solvent (typically OH, NH, etc. in protic solvents), but this is not mandatory. The atoms that are not assigned to at least one NMR signal in the spectra are not listed: H and C that were not assigned, heteronuclei that are NMR passive (say S, O, N), or nuclei for which the spectrum was not recorded (19F, 31P, etc.).

When using level>1, we allow ambiguous assignments. We distinguish two types of ambiguity. If two or more labels may be interchanged, they are listed at the end of the list, before the empty line, with

Interchangeable=a, b

means that the assignment of a and b may be interchanged.

Interchangeable=(a, CA), (b, CB)

means that “a” together with “CA” may be interchanged with “b” together with “CB”. This may happen if two O-Me groups are not assigned unambiguously: we may know that carbon “CA” is bound to proton “a” and carbon “CB” to “b” (from HSQC), but not which of the two Me groups is bound where (no HMBC signal). There may be multiple “Interchangeable” lines.

Note that displaying or working with data containing such interchangeable signals may be a challenge. When software cannot take interchangeable assignments into account, it should generate a warning message, read the standard form or, better, offer the choice among the different possibilities.
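The two forms of the “Interchangeable” line shown above could be read with a sketch like this one. It returns a list of groups, each group being the list of labels that move together; `parse_interchangeable` is a hypothetical name.

```python
import re

# One item is either a parenthesised group "(a, CA)" or a bare label "a".
ITEM = re.compile(r"\(([^)]*)\)|([^,()\s][^,()]*)")

def parse_interchangeable(line):
    """'Interchangeable=(a, CA), (b, CB)' -> [['a', 'CA'], ['b', 'CB']]"""
    key, _, value = line.partition("=")
    if key.strip() != "Interchangeable":
        raise ValueError("not an Interchangeable line")
    groups = []
    for match in ITEM.finditer(value):
        if match.group(1) is not None:
            groups.append([lab.strip() for lab in match.group(1).split(",")])
        else:
            groups.append([match.group(2).strip()])
    return groups

print(parse_interchangeable("Interchangeable=a, b"))
print(parse_interchangeable("Interchangeable=(a, CA), (b, CB)"))
```

Software that cannot use this information can still detect such lines this way and issue the warning suggested above.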

<NMREDATA_J>

If scalar couplings were measured and assigned, a <J> tag should be generated to list the couplings (the coupling network). It can contain only JHH, or only 1JCH, or a mix of any type of scalar coupling (the label indicates the isotope through the NMREDATA_SIGNALS table, see above).

>  <NMREDATA_J>
a, b, 7
A, a, 150.3
B, a, 7.5

This may include JHH (from 1D 1H, or from COSY) and/or JCH from RDC measurements, long-range JCH from HMBC, etc. These values may also be listed in the fields of the individual spectra they originate from. In this “J” tag, they are “compiled” (averaged, for example, when a JAX is not the same on the A and X multiplets, or when couplings are present in both 1D 1H and COSY). This can also include couplings to 19F or 31P if the label corresponds to such an atom (see the NMREDATA_SIGNALS table). Note that this table compiles data from other spectra. It does not mean that couplings should not be listed in each spectrum where they are observed.
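One possible way to “compile” couplings is a plain average of all measurements of the same pair of labels. The sketch below makes that assumption (a real tool might weight or filter the values differently); `compile_couplings` is a hypothetical name.

```python
from collections import defaultdict

FIELD_SEP = ", "

def compile_couplings(measurements):
    """Average repeated measurements of the same coupling.

    `measurements` is an iterable of (label1, label2, J_in_Hz) collected
    from the individual spectra; J(a,b) and J(b,a) count as the same pair.
    Returns the data lines of an NMREDATA_J tag.
    """
    grouped = defaultdict(list)
    for first, second, value in measurements:
        grouped[tuple(sorted((first, second)))].append(value)
    lines = []
    for (first, second), values in grouped.items():
        mean = sum(values) / len(values)
        lines.append(FIELD_SEP.join([first, second, f"{mean:.1f}"]))
    return lines

measured = [("a", "b", 7.0), ("b", "a", 7.2), ("A", "a", 150.3)]
print(compile_couplings(measured))
```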

Spectral tags

There is one RD tag per spectrum. (See section 5 for nomenclature.)

1D spectra

<NMREDATA_1D_1H>

The tag of a 1D 1H spectrum is named “NMREDATA_1D_1H”. The entire tag can look like this:

>  <NMREDATA_1D_1H>
Larmor=500.13
4.8, S=q, E=2, L=a, J=7; optional comment will be visible in the spectrum’s view
2.1, S=bs, E=1, L=b
1.5, S=t, E=3, L=c, J=7
for multiple coupling:
4.8, S=dd, E=1, L=a, J=9.3, 4.8

where S is the structure of the signal (multiplicity: ‘s’, ‘d’, ‘dd’, ‘td’, etc.; add ‘b’ for broad, e.g. ‘bs’ or ‘sb’ means broad singlet; ‘m’ for multiplet), E the integral (normalized or not; this may be a crude experimental integral), L the label (according to the list in the <NMREDATA_SIGNALS> tag), J the coupling(s) in Hz, and I the intensity (in arbitrary units). In fact, only the first number (the chemical shift) is mandatory; the other fields are all optional. There may be more than one signal assigned to a chemical shift (or range of chemical shifts). They are simply listed with "," as separator.

7.2-7.6, E=5, L=H-C1, H-C2, C-C4

One reason for listing chemical shifts in the <NMREDATA_SIGNALS> tag is that signals may overlap and be given as a range in the 1D 1H spectrum, while being clearly determined from HSQC, COSY, etc.

The order of the fields is not fixed (“L=...” may come before “E=...”). If the couplings are assigned, the assignment can be specified using the label of the coupling partner in parentheses. (See also the <J> field above.)

4.8, S=q, E=2, L=a, J=7(b) ;meaning J(a,b)=7 Hz
4.8, S=dd, E=1, L=a, J=9.3(b), 4.8(c) 
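Because a field such as “J=” or “L=” can hold several comma-separated values, a reader cannot simply split on “,” and expect one value per field. The sketch below treats every token without “=” as a continuation of the previous field. The helper names are hypothetical, and labels written in parentheses (ambiguous assignments) are not handled here.

```python
import re

COUPLING = re.compile(r"([-+]?\d*\.?\d+)(?:\((.+)\))?")

def parse_peak_line(line):
    """Parse one line of a 1D spectrum tag into a dict of value lists."""
    body, _, comment = line.partition(";")
    tokens = [t.strip() for t in body.split(",") if t.strip()]
    # The shift stays as text because it may be a range such as "7.2-7.6".
    peak = {"shift": tokens[0], "comment": comment.strip()}
    key = None
    for token in tokens[1:]:
        if "=" in token:
            key, _, value = token.partition("=")
            key = key.strip()
            peak[key] = [value.strip()]
        elif key is not None:
            peak[key].append(token)      # extra value of the previous field
    return peak

def parse_coupling(text):
    """'9.3(b)' -> (9.3, 'b'); '7' -> (7.0, None)"""
    match = COUPLING.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a coupling: {text!r}")
    return float(match.group(1)), match.group(2)

peak = parse_peak_line("4.8, S=dd, E=1, L=a, J=9.3(b), 4.8(c)")
print(peak["J"], [parse_coupling(j) for j in peak["J"]])
```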

Results of relaxation measurements (T1, T2, etc.) can be given in seconds as:

4.8, S=q, E=2, L=a, T1=0.7

Results of diffusion measurements (D, etc.) can be given in standard units:

4.8, S=q, E=2, L=a, Diff=1.7e-11

Chemical shifts are given in ppm with four digits after the period (the usual three are not enough at high field and high resolution!), and couplings in Hz with one digit after the period. Integrals can use any relative numbers, but are preferably rescaled to correspond to the number of protons (1.0 for the reference) or to concentrations when quantification was made (in mM). This will be very useful when analysing mixtures. Integrals should not be rounded up/down to the number of atoms: the values should reflect the experimental values.

If the 1D spectrum is homodecoupled at a given chemical shift or totally decoupled (pure shift), one of the two following lines should be added, respectively:

Decoupled=1H, 1.2
Decoupled=1H

If other spins are decoupled, add:

Decoupled=19F ...

For many reasons, we need to be able to report ambiguous assignments. In this case, instead of the signal’s label alone, we list the candidate labels in parentheses. For example:

4.8, L=(a, b)

means that the signal at 4.8 is assigned either to a or to b. Ambiguous data are quite common, in particular in 2D spectra, so the format must allow them. This possibility will cause difficulties when reading data for display, structure verification, etc. Programmers will decide what to do: ignore ambiguous assignments, try to resolve them, report them as such, warn about their presence, etc.
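Reading such a field means splitting on commas only when they are outside parentheses. A minimal sketch follows, where each item of the result is the list of alternative labels for one signal; `split_top_level` and `parse_labels` are hypothetical names.

```python
def split_top_level(text):
    """Split on commas that are not inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]

def parse_labels(value):
    """Return one list of alternatives per assigned signal.

    'a'                -> [['a']]
    '(a, b)'           -> [['a', 'b']]  (either a or b)
    'H-C1, H-C2, C-C4' -> [['H-C1'], ['H-C2'], ['C-C4']]
    """
    result = []
    for item in split_top_level(value):
        if item.startswith("(") and item.endswith(")"):
            result.append([x.strip() for x in item[1:-1].split(",")])
        else:
            result.append([item])
    return result

print(parse_labels("(a, b)"))
print(parse_labels("H-C1, H-C2, C-C4"))
```

A list with more than one alternative is what a program would flag, resolve or ignore, as discussed above.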

<NMREDATA_1D_13C> etc.

1D 13C spectra and other heteronuclear spectra (nuclei X)

When the 1D X spectrum is not a simple 1H-decoupled 13C spectrum (e.g. DEPT, APT, etc.), this is specified using an additional field “Sequence”:

>  <NMREDATA_1D_13C> or other isotope
Larmor=125
Decoupled=1H
Sequence=DEPT135 (or DEPT45)
51.812, I=-80.1
20.123, I=123.1

The peak intensity provides the signs of the DEPT-135 signal.
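As an illustration of how the sign can be used, the sketch below classifies DEPT-135 peaks under the usual phasing convention (CH and CH3 positive, CH2 negative). The convention is an assumption about how the spectrum was phased, and `classify_dept135` is a hypothetical name.

```python
def classify_dept135(line):
    """Return (shift, kind) for one peak line such as '51.812, I=-80.1'."""
    body = line.split(";")[0]
    fields = [f.strip() for f in body.split(",")]
    shift = float(fields[0])
    intensity = None
    for field in fields[1:]:
        if field.startswith("I="):
            intensity = float(field[2:])
    if intensity is None or intensity == 0:
        kind = "unknown"          # no usable sign information
    elif intensity < 0:
        kind = "CH2"
    else:
        kind = "CH or CH3"
    return shift, kind

for peak in ["51.812, I=-80.1", "20.123, I=123.1"]:
    print(classify_dept135(peak))
```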

2D spectra

The tag label indicates that it is a 2D spectrum, the two isotopes involved (starting with the indirectly detected one) and the type of mixing involved (see section 5).

2D HSQC <NMREDATA_2D_13C_1J_1H>

>  <NMREDATA_2D_13C_1J_1H>
Larmor=500
CorType=HSQC (COSY; HSQC; HMBC; ?? exact list is still to be defined. Contact Damien Jeannerat)
Pulseprogram=XXX ; this is optional
a/C1, I=1.2
b/48.43, I=1.2

The Larmor frequency is that of the detected isotope (last in the tag label). “CorType” and “Pulseprogram” can be specified. When signals are assigned, only their labels are given (no chemical shifts). If a crosspeak is reported without assignment or with a partial assignment, the chemical shift replaces the signal’s label. The intensity of a signal follows “I=”; it simply corresponds to the intensity of the spectrum at (or very close to) the coordinates of the signal. Intensities should be provided when possible (that is, when the software has access to them), but they are not a mandatory part of the format, i.e. one should be able to read the .sdf files even in the absence of intensities. The intensities are given in arbitrary units (integer or floating point). If the signal has a shape such that the intensity is zero at its center (phase-sensitive COSY, or the middle of a doublet in HSQC, for example), the intensity can be the one at the maximum amplitude of the multiplet. This intensity does not pretend to be “quantitative”. Optional integration of the volume is encouraged using “S=”, but this requires some “analysis” of the peak shape.

2D HMBC <NMREDATA_2D_13C_NJ_1H>

Here is an example of HMBC data with two examples of ambiguous assignment (could also occur in clusters of peaks):

>  <NMREDATA_2D_13C_NJ_1H>
Larmor=500.13
C1/a; optional comment will be visible in the spectrum’s view
(C2,C3)/b, I=1.2
C2/(b,c), I=1.2
(C4,C5,C6)/(e,f), I=1.2


For HETCOR, the tag label would be <2D_1H_1J_13C>. By default, “2D” heteronuclear spectra are assumed to be isotope1-decoupled during evolution of isotope2 and vice versa. If an HSQC spectrum is recorded without 180 pulse during t1, or without 13C decoupling during t2 (for RDC measurements, for example), the following lines should be added, respectively:

Nondecoupled=t1
Nondecoupled=t2

If the spectrum is not decoupled, or if it allows couplings to be measured, the heteronuclear couplings may be listed as:

a/C1, Ja=155
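A correlation line of the HSQC and HMBC tags could be read along the following lines. This is a sketch under several assumptions: labels contain no “/”, each field after the correlation is a single `key=value` pair (multi-valued coupling fields are not handled), and the two sides of the “/” are simply called "left" and "right". All function names are hypothetical.

```python
def split_top_level(text):
    """Split on commas that are not inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]

def alternatives(item):
    """'(C2,C3)' -> ['C2', 'C3']; 'b' -> ['b']; '48.43' -> ['48.43']"""
    item = item.strip()
    if item.startswith("(") and item.endswith(")"):
        return [x.strip() for x in item[1:-1].split(",")]
    return [item]

def parse_correlation(line):
    """Parse one 2D correlation line into its two sides and attributes."""
    body, _, comment = line.partition(";")
    fields = split_top_level(body)
    left, right = fields[0].split("/")
    attributes = {}
    for field in fields[1:]:
        key, _, value = field.partition("=")
        attributes[key.strip()] = value.strip()
    return {"left": alternatives(left), "right": alternatives(right),
            "attributes": attributes, "comment": comment.strip()}

print(parse_correlation("(C2,C3)/b, I=1.2"))
print(parse_correlation("b/48.43, I=1.2"))
```

An entry that parses as a number (such as "48.43") is a chemical shift standing in for a missing label.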

2D COSY <NMREDATA_2D_1H_NJ_1H>

See the discussion of COSY spectra (below) for why “Ja” is used.

A COSY spectrum will be coded as:

>  <NMREDATA_2D_1H_NJ_1H>
CorType=COSY (COSY; HSQC; HMBC; ?? exact list is still to be defined. Consider this one tentative)
Larmor=500
a/b, I=1.2, S=100
b/a, I=1.2
b/c, I=1.2
c/b, I=1.2

If couplings were measured from the COSY, the values should be specified:

a/b, Ja=5

where
Ja means the active coupling (present in f1 and f2),
J1 means passive coupling(s) present in f1,
J2 means passive coupling(s) present in f2.

Any number of couplings can be added. For example, the A-X crosspeak with A and X both coupling with M could be described as:

A/X, Ja=5, J1=4, J2=2

meaning that the active coupling is JAX=5 Hz and the passive couplings are JAM=4 Hz and JXM=2 Hz. If the assignment was made (as for 1D 1H), the assigned couplings are specified:

A/X, Ja=5, J1=4(M), J2=2(M)
A/X, Ja=5, J1=4(M), 6.1(K), J2=2(M), 3.1(K)
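Following the reading given above (J1 belongs to the first spin of the crosspeak, J2 to the second), a crosspeak line can be expanded into individual couplings. The sketch below assumes simple labels without commas, parentheses or “/”; `crosspeak_couplings` is a hypothetical name.

```python
import re

VALUE = re.compile(r"\s*([-+]?\d*\.?\d+)\s*(?:\((.+)\))?\s*")

def crosspeak_couplings(line):
    """Expand a COSY crosspeak line into (spin, partner, J_in_Hz) triples.

    The partner is None when a passive coupling is not assigned.
    """
    tokens = [t.strip() for t in line.split(",")]
    left, right = tokens[0].split("/")
    owner = {"J1": left, "J2": right}
    triples, key = [], None
    for token in tokens[1:]:
        if "=" in token:                 # a new field starts here
            key, _, token = token.partition("=")
            key = key.strip()
        match = VALUE.fullmatch(token)
        if match is None or key not in ("Ja", "J1", "J2"):
            continue                     # e.g. I=1.2 is not a coupling
        value, partner = float(match.group(1)), match.group(2)
        if key == "Ja":
            triples.append((left, right, value))
        else:
            triples.append((owner[key], partner, value))
    return triples

print(crosspeak_couplings("A/X, Ja=5, J1=4(M), 6.1(K), J2=2(M), 3.1(K)"))
```

The resulting triples are the kind of input from which a compiled <J> tag could be produced.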

This allows reporting the result of the analysis of high-resolution DQF-COSY or soft-COSY spectra. In this manner, high-resolution COSY spectra with coupling structures can be simulated for coupling constant verification (refinement, including second-order effects, etc.). Note that couplings can be obtained from the description of the 1D spectrum if they were determined there; they should be included in the 2D fields only if they were measured in the 2D spectrum. Note also that the program generating these data should generate a <J> tag (see above) to compile the J-couplings and present them in a form that is easier to read.

2D NOESY <NMREDATA_2D_1H_D_1H>

NOESY spectra will be described as:

>  <NMREDATA_2D_1H_D_1H>
Larmor=500
a/b, I=1.2 
b/c, I=1.2 

other

Heteronuclear 1H-19F data, with 1H detected and 19F in F1, would be in a tag called:

>  <NMREDATA_2D_19F_D_1H>

For all experimental spectra, the last line refers to the spectrum stored in an openly and electronically accessible NMR database. (By spectrum, we mean the actual data of the spectrum (“2rr”), but also the acquisition and processing parameters: “fid/ser”, “acqus”, “procs”, “proc2s”, etc.)

Spectrum_ID=HJK33HKJ22342 (mandatory - given by the database where it is stored!)
Spectrum_Location=http://... (when NMReDATA and spectra are on database)
Spectrum_Location=file:./nmr/10/1/pdata/1 (when the files of the spectra and the NMReDATA are in a folder)

To be refined by specialists. But I think we should really make some efforts for this to be done and work!!

2.4 Origin of the sample To be abandoned....

In phytochemical papers, the origin of the sample should be specified but this cannot be our task to define this – this is not part of the NMReDATA initiative. In principle authors can add any tag provided they have tools to do it and requests from the Journals... such data could have the following form...

>  <NATURAL_ORIGIN>
Common_Name_English=flower
Genus=Amaranthus %usually the part with a first capital letter
Species=retroflexus %usually the part with no capital first letter
Class=
Type=

Similarly, for synthesis, the source could be requested :

>  <NMREDATA_SYNTHETIC_ORIGIN>
Reaction Reference=http:// link to a database of reactions ??



3 Certification

When the assignment is made in a computer-assisted manner, the software may want to add a certification of the validity of the data. It should be up to the manufacturers to encode it so that the certification cannot be forged (using a hash, etc. ?). Certificate tags could be listed at the end of the .sdf file. They can originate from the CASE software, from the database hosting the data and spectra, or from the journal (to say that the data were peer reviewed). They can be cumulated. If the text of the .sdf file needs to be hashed for certification, the tags used for hashing could be listed. (I’m not sure what needs to be done to certify the validity of certificates. To be refined by the certificate specialists.)

>  <NMREDATA_CERTIFICATION>
Software=CMC_assist
Author=Bruker
Confidence_level=4.6
Confidence_level_certificate="ADFS678AG67DFG6SD5F7GS5DFGSD8F5GSD7FG7"
Unique_solution=YES
Unique_solution_certificate="ADFS678AG67DFG6SD5F7GS5DFGSD8F5GSD7FG7"
ETC...

This is only a very vague example. The uniqueness of the proposed structure may be understood in the sense of J.-M. Nuzillard’s LSD tool. Software producers can tell what needs to be done for their format. Multiple certifications can be listed one after the other; each “Software=...” assignment starts a new one within the same <CERTIFICATION> tag.

4 Role(s) and scope of the “assignment records”

The NMR record can be generated from experimental data but also from simulations, predictions, etc. Tools to compare, evaluate, validate, and check consistency of “assignment records” will certainly be developed. Assignment records can be generated by commercial software, but also by diverse tools analysing NMR data, homemade processing tools, simulation software, etc. This is why it is important to have a format of data including a maximum of options to be as flexible as possible, even if not all possible uses are clearly defined and used immediately. Ideally, the .sdf files should be converted into other file format or spectral description without loss.

We should see it as an advantage if databases include multiple "assignment records" associated with the same molecule. Some could be old, originating from incomplete literature data. Others could include errors because they originate from bulk data processed automatically. Finally, a computer could verify these records and produce a nicely validated record combining all the other data. Aggregated records could be generated by NMR software/databases scoring the available data for consistency, using calculated chemical shifts and spectral simulations. They could refine chemical shifts, couplings, etc.

Experimental data

When the NMR data originate from experimental spectra, they may be quite crude (simple automated integration, peak-picking) or follow complex automated or manual analysis. The data may be partial, incomplete, contain inconsistencies, impossible features, etc. The content may be diversely complex depending on the origin of the data:
- only 1D 1H NMR data (with or without integration, coupling, etc.)
- only 1D 13C data (just from a simple peak picking)
- only 1D data but for multiple isotopes (from NMRshiftDB ?)
- full analysis based on computer-assisted software (such as Bruker cmc-se, ACDLabs Structure Elucidator or Mestrelab Mnova) or web-platform (cheminfo.org)
- 1D and 2D data processed automatically, with an assignment that is partial (for example not all signals are assigned) and/or ambiguous (due to lack of resolution, or other problems)
- The file may not contain the actual assignment, only the structure and the list of chemical shifts (the assignment could be added by NMR tools).
- The data may come from a scientific report, i.e. the text providing the description of the spectra. It could be like the text of the following figure (from http://onlinelibrary.wiley.com/doi/10.1002/mrc.4527/full). Scripts could be written to convert such a "pure text" description into an .sdf file and include the .mol file.
- For assignment work made with only "paper and pencil", a simple web tool allowing one to draw a molecule and enter lists of signal names and 2D correlations could easily be made. We could consider accepting .pdf files or pictures of the spectra when the original files no longer exist.

Synthetic/predicted data

The NMR data may originate from DFT calculations or any other type of predictor of chemical shifts, and/or coupling. In such a case, a general tag is added to provide information about the software. For example:

>  <NMREDATA_ORIGIN>
Source=Calculation
method=DFT
Geometry=method/basis set
Shielding=method_basis set
Coupling=method_basis set
Software=...
Version=...

Literature data

When the NMR data originate from publications, a reference to the published paper/book/thesis is given in the NMREDATA_LITERATURE tag.

>  <NMREDATA_LITERATURE>% This was renamed from NMREDATA_ORIGIN
Source=Journal
DOI=DOI_HERE (if Reference field is DOI specify it here)
CompoundNumber=label used in the reference to designate the compound (typically a number in boldface)

>  <NMREDATA_LITERATURE>%
Source=Book
ISBN=ISBN_HERE (if Reference field is DOI specify it here)
CompoundNumber=label used in the reference to designate the compound (typically a number in boldface)

>  <NMREDATA_LITERATURE>%
Source=Thesis
Thesis=HTML link here (if available; if not, "LastName, Firstname(s), institution providing the degree, city, country, year of publication")
CompoundNumber=label used in the reference to designate the compound (typically a number in boldface)

For revised/update data

Assignment records may be generated after revision from experimental, literature, prediction data, etc. Ideally, the original .sdf files should be also generated to facilitate comparison or exists somewhere and be referred to. In both cases reference should be given.

>  <NMREDATA_UPDATE>
Source=Record
Record_number=ref_to_the_original_record (multiple references are allowed for aggregation of records, separated by “,”)
Date=date.... standard format for date
Correction="fixed assignments of C(13) and C(15)"

This is also to be refined according to future developments.

4 Problems related to symmetry

This section is tentative... to be worked on in the future....

For symmetrical molecules, a difficulty arises when coding couplings and 2D correlations.
1) Problem for couplings: For the 1H spectrum of 1,2-dichlorobenzene, we have two signals (two different protons in an AA’XX’ system), so if the SDF file includes two signals (one for A and one for X), in principle one can only give one coupling: J(A,X). But we should be able to give the other coupling constants with respect to the primed H. When only one symmetry property is present, it may not be too difficult to include in the format a way to provide pairs of couplings instead of only one, but with more than one symmetry, it would become complicated…
2) Problem for correlations: Consider 1,4-dichlorobenzene. A 3J(C,H) HMBC correlation will be visible between a proton and what seems to be its directly bound carbon, because the carbons one bond (1J) and three bonds (3J) away from a proton are related by symmetry.

We have three different possibilities:
1) Ignore the problem. It may not be so serious in fact. In systems with non-magnetically equivalent spins, the coupling structures are complex and the couplings will probably not be measured in routine experiments. Concerning HMBC, the correlation will be ambiguous, and it will be up to the person/software checking consistency to deal with it: when signals correspond to more than one proton or more than one carbon, it suffices that one of the possible combinations of Hortho, H’ortho, Cortho and C’ortho corresponds to 3JCH for the check to pass… even if it also passes a check for 1JCH.
2) Duplicate all signals (or the subset with symmetry). If we list two signals (with two different labels) for Hortho and H’ortho, then we will no longer have a problem with couplings (one will be able to give both J(A,X) and J(A,X’)) or with ambiguous correlations (they will be OK). But for any chemical shift there will then always be two labels/spins, which may confuse assignment software. The complications may be worse than the problem.
3) Try to face the problem and develop a serious method to include symmetry… This could be the object of future work.

5 TAG names for spectra

The name of the SD tag of a spectrum is constructed as follows. It describes the pulse sequence.
1) The number of dimensions is given (e.g. “2D_...”).
2) The isotope of the first indirect dimension follows (e.g. “..._13C_...”).
3) The code of the mixing to the next dimension follows (e.g. “..._1J_...”).
4) Finally, the detected isotope is given (e.g. “..._1H”).
The tag of the HSQC is therefore “<NMREDATA_2D_13C_1J_1H>”.
Mixing can be:
1J for one bond (typ. HSQC)
NJ for multiple-bond J (COSY, HMBC)
TJ for TOCSY
etc. (see list below for more details).
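The construction rule above can be sketched as a pair of helper functions that build and decompose tag names. The names `spectrum_tag` and `parse_spectrum_tag` are hypothetical, and the sketch does not cover the J-resolved case, where only the detected isotope is given.

```python
PREFIX = "NMREDATA_"

def spectrum_tag(isotopes, mixings=()):
    """Build a spectrum tag name.

    `isotopes` runs from the first indirect dimension to the detected one;
    `mixings` holds the mixing code between each consecutive pair.
    """
    if len(mixings) != len(isotopes) - 1:
        raise ValueError("need one mixing code between consecutive isotopes")
    parts = [f"{len(isotopes)}D", isotopes[0]]
    for mixing, isotope in zip(mixings, isotopes[1:]):
        parts += [mixing, isotope]
    return PREFIX + "_".join(parts)

def parse_spectrum_tag(tag):
    """Return (number_of_dimensions, isotopes, mixings) for a tag name."""
    if not tag.startswith(PREFIX):
        raise ValueError("not an NMReDATA tag")
    parts = tag[len(PREFIX):].split("_")
    dimensions = int(parts[0].rstrip("D"))
    return dimensions, parts[1::2], parts[2::2]

print(spectrum_tag(["13C", "1H"], ["1J"]))
print(parse_spectrum_tag("NMREDATA_2D_1H_D_1H"))
```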

For J-resolved and related experiments (DIAG, δ-resolved) where the indirect dimension is not a chemical shift (no correlation present), only the detected isotope is given (<2D_1H>). The spectrum is described as a 1D 1H spectrum (providing chemical shift, couplings, etc.).

All tags have “NMREDATA_” before the TAG names listed below


P.S. Some names are somewhat tentative. We don’t necessarily mean to define 3D spectra here or projections of 3D to 2D (HSQC-TOCSY). The list is mostly to test the ability of the format to list as many experiments as possible with the same logic.
