Difference between revisions of "NMReDATA tag format"

From NMReDATA

Revision as of 07:09, 28 July 2017

Definition of the NMReDATA tags (V. 0.98)

Header tags

We included in the header tags only the absolutely necessary information about the NMR dataset.

Many more details about the acquisition conditions of the spectra could be added, but the header is limited to those that are really essential when describing a spectrum (in the sense that journals have long required them and that they are commonly given in reports, papers, etc.). Additional information can be retrieved from the experimental parameters associated with the NMR spectra if needed: there should always be a link to the spectra (see below), which can be followed to get the other parameters.

<NMREDATA_VERSION>

This tag specifies the “VERSION” of the file format. Current version: 1.0

>  <NMREDATA_VERSION>;mandatory
1.0

<NMREDATA_LEVEL>

This tag specifies the level of complexity of the data. In most cases, level 1 will be used when an assignment is complete. When the data contain no ambiguities in the assignment, use:

<NMREDATA_LEVEL>
0

When using lists of signals that include interchangeable assignments (see .... for examples), use:

<NMREDATA_LEVEL>
1

When using ambiguously assigned signals in 1D or 2D spectra (see .... for examples), use:

<NMREDATA_LEVEL>
2

When using interchangeable and ambiguous assignment, use:

<NMREDATA_LEVEL>
3
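
The four level values above can be read as a combination of two independent flags. The following is a minimal sketch of that reading; the function name `nmredata_level` is hypothetical and not part of the format.

```python
def nmredata_level(has_interchangeable: bool, has_ambiguous: bool) -> int:
    """Return the NMREDATA_LEVEL value implied by the kinds of ambiguity present.

    Assumption: interchangeable assignments contribute 1 and ambiguous
    assignments contribute 2, which reproduces the levels 0 to 3 listed above.
    """
    level = 0
    if has_interchangeable:
        level += 1
    if has_ambiguous:
        level += 2
    return level
```

For example, a file with interchangeable labels but no ambiguously assigned signals would be written with level 1.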

<NMREDATA_ID> (for database)

Optional, but required when the data originate from a database. When data are copied from database to database, multiple IDs may be included. These will be defined by the database manager and software.

DB_ID= the code or number is assigned by the hosting database
Title= Full analysis of whatever from methanol extract of leafs  
Comment= Here more details could be given on the record.
Comment1= Here more details could be given on the record.
Comment2= Here more details could be given on the record.
Comment3= Here more details could be given on the record.
AUTHOR=Doe John, University of Tougalpa, Swinerland (optional)
ORIGIN_ONE=2345627486 (could be about the sample name)
ORIGIN_TWO=323212KKDKKS (could give a date or other reference)
Title_L1=after sep. hplc (this could be extracted from the first line of the title in the 1H spectrum)

One or more identifiers can be given under "ID". The ID is generated by the software producing the data and/or the database storing the data, etc. There may be more than one ID (for example one from the software generating it, one from the university labelling the origin of the data, one from the database, one from the publisher of the associated data, etc.). It is up to the “generator” of the file to decide if and how to make it unique, if desired. InChIKey/SMILES could be given if the software generating the data is able to specify it, and the CAS number if it already exists.

<NMREDATA_SMILES>;optional but strongly encouraged

Here comes the SMILES code; explicit H are mandatory (see with JMN for more details). The reason to include the SMILES in the NMReDATA is that, when given with protons, it can be used to generate pure-text NMR descriptions of spectra (under elaboration with JMN).


The solvent is specified with the “SOLVENT” tags.

<NMREDATA_SOLVENT>

The solvent is specified using this tag.

>  <NMREDATA_SOLVENT>
CDCl3

For mixtures of solvents, the most abundant comes first. The solvent names are separated by "/" and followed by the ratio in %, with the values separated by ":".

>  <NMREDATA_SOLVENT>
CDCl3/DMSO 80:20
>  <NMREDATA_SOLVENT>
CDCl3/DMSO/D2O 80:10:10

The proportions are given in % volumes.
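
As an illustration of the mixture syntax, here is a minimal parsing sketch. The function name `parse_solvent_mixture` is hypothetical, and returning 100 % for a single solvent is an assumption made for convenience, not something the format states.

```python
def parse_solvent_mixture(line: str) -> dict:
    """Parse the first line of a SOLVENT tag, e.g. 'CDCl3/DMSO 80:20'.

    Returns a dict mapping each solvent name to its percentage by volume.
    """
    parts = line.split()
    names = parts[0].split("/")
    if len(parts) == 1:
        if len(names) != 1:
            raise ValueError("a mixture needs a ratio such as 80:20")
        # Assumption: a single solvent is reported as 100 %.
        return {names[0]: 100.0}
    ratios = [float(value) for value in parts[1].split(":")]
    if len(ratios) != len(names):
        raise ValueError("number of ratios does not match number of solvents")
    return dict(zip(names, ratios))
```

The sketch only handles the first line of the tag; an orientation medium given on a following line would need separate handling.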

In the case of RDC measurements, the medium used can be specified in the line following the name of the solvent.

>  <NMREDATA_SOLVENT>
CDCl3
PBLG

The quantity of the orientation medium should be given using usual units (this is vague, but we cannot do better). Agarose used at 1% mass ratio:

>  <NMREDATA_SOLVENT>
D2O
Agarose 1%

For solid-state samples:

>  <NMREDATA_SOLVENT>
solid

<NMREDATA_CONCENTRATION> (optional)

When known, the concentration should be given. Only “mM” is allowed, but the unit is still written explicitly.

<NMREDATA_CONCENTRATION>
12.3 mM

<NMREDATA_TEMPERATURE> (optional)

When available, the temperature of the sample should be given (only K is allowed, but the unit is still written explicitly).

>  <NMREDATA_TEMPERATURE>
298.0 K
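
Since both the concentration and the temperature are a number followed by a fixed unit, a reader could check the unit while extracting the value. A minimal sketch, with the hypothetical helper name `parse_quantity`:

```python
def parse_quantity(text: str, expected_unit: str) -> float:
    """Parse a value such as '12.3 mM' or '298.0 K' and verify its unit."""
    fields = text.split()
    if len(fields) != 2:
        raise ValueError(f"expected '<value> <unit>', got {text!r}")
    value, unit = fields
    if unit != expected_unit:
        raise ValueError(f"unit must be {expected_unit!r}, got {unit!r}")
    return float(value)
```

Rejecting any other unit, rather than converting it, follows the statement that only mM and K are allowed.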

Assignment tags

Two properties can be "assigned":

-Chemical shifts with the <NMREDATA_SIGNALS> tag.

-Scalar couplings with the <NMREDATA_J> tag.

In most cases, only chemical shifts are assigned. Scalar couplings are usually not systematically measured and/or assigned, but when they are, the values measured in the spectra should be compiled in the <NMREDATA_J> tag.

Important note: In general we use ", " (comma + space) as separator. The space after the “,” may become optional in a future version. Please write code that does not depend on it: store “, ” in a variable so that you can easily change it to “,” (no space), and be ready to read or write files with and without the space.
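
One way to follow this advice is to keep the separator in a single constant for writing and to accept both forms when reading. The names `SEPARATOR`, `split_fields` and `join_fields` below are hypothetical; this is a sketch, not a reference implementation.

```python
import re

# The separator used when writing; change to "," if the space becomes optional.
SEPARATOR = ", "

# When reading, accept a comma followed by any amount of whitespace (or none).
_FIELD_SPLIT = re.compile(r",\s*")


def split_fields(line: str) -> list:
    """Split a line into fields, with or without a space after each comma."""
    return [field.strip() for field in _FIELD_SPLIT.split(line.strip())]


def join_fields(fields, separator: str = SEPARATOR) -> str:
    """Join fields using the configured separator."""
    return separator.join(fields)
```

With this arrangement, switching the written form is a one-line change, and files in either form remain readable.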

<NMREDATA_SIGNALS>

Labelling/linking signals to atoms of molecules

In liquid-state NMR, some signals are degenerate: multiple atoms can contribute to the same signal, for example all protons of a methyl group. We call a "signal" the group of atoms that have a common chemical shift because of symmetry properties (not by accident); each signal has its own “label”. In principle the labels could be included in the .mol file part (this is part of the definition of .mol files), but this may cause compatibility problems. For example, Chemdraw does not handle atom types well when they have been given a name and marks them in red (as if the atoms were unknown, causing hybridization problems). Because of this problem, we will NOT use the labels of the “.mol” file, but list them in a specific RD tag called NMREDATA_SIGNALS.

For each signal, we first give the label of the signal. The chemical shift follows the label. Finally, we list the atom(s) it refers to in the structure file. Atom numbers start at 1 and run through all the atoms in the molfile. An NMR signal always has a chemical shift associated with it. Not all atoms of the molecule will be listed in the NMREDATA_SIGNALS tag, for example if they were not assigned or have no NMR signal (like O, N, or other isotopes for which the spectra were not recorded).

For example, for HO-CH2-CH3, with protons labelled a, b and c, and carbons A and B (the names could be different; labels do not have to use lowercase letters for protons and uppercase letters for carbons, this is just an example), the tag would be:

>  <NMREDATA_SIGNALS>; ethanol with explicit hydrogen atoms
A, 48.301, 1 ;A corresponds to the carbon of CH2
B, 20.322, 2 ;B corresponds to the carbon of CH3
a, 2.610, 3 ;a corresponds to the hydrogen atom of the OH
b, 4.802, 4, 5; b corresponds to the hydrogen atoms of the CH2
c, 1.401, 6, 7, 8
Ex, 3.6, 9
(optional: list of interchangeable labels – when level>1 see below)

We recommend providing a label (for example "Ex") that designates all the H of the OH, NH, etc. that are quickly exchanging (for example the OH of glucose in D2O).

For structures where the hydrogen atoms are implicit, the reference to a hydrogen is made by adding "H" right before the atom number, to distinguish the heavy atom from the H bound to it. When more than one hydrogen atom is implicit, the second is also referred to with the same "H". It is understood that with implicit hydrogens, the two hydrogen atoms will not be distinguished, so assignment of diastereotopic protons will not be possible. To avoid this problem, use explicit mol structures with defined chirality.

>  <NMREDATA_SIGNALS>;ethanol
A, 48.301, 1 ;the label "A" corresponds to atom one, which is the carbon of the CH2
B, 20.322, 2 ;atom two is the carbon of the CH3
a, 2.610, H3 ;"H3" refers to the hydrogen atom of atom 3 (the oxygen)
b, 4.802, H1 ;"H1" refers to the hydrogen atoms of atom 1 (of the CH2)
c, 1.401, H2

(atom 3 would be the oxygen of ethanol) Only explicit H are allowed. No monomer unit, no “R” group.
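
To make the line layout concrete, here is a minimal sketch that reads one line of the signals table. The function name `parse_signal_line` is hypothetical, and treating everything after ";" as a comment is an assumption based on the examples above.

```python
def parse_signal_line(line: str):
    """Parse one signals line into (label, shift, atoms).

    Each atom is returned as (atom_number, is_implicit_hydrogen), where the
    flag is True for references written with an 'H' prefix such as 'H3'.
    """
    # Assumption: a ';' starts a free-text comment that runs to the end of line.
    content = line.split(";", 1)[0]
    fields = [field.strip() for field in content.split(",")]
    label = fields[0]
    shift = float(fields[1])
    atoms = []
    for reference in fields[2:]:
        if not reference:
            continue
        implicit_h = reference.startswith("H")
        number = int(reference[1:] if implicit_h else reference)
        atoms.append((number, implicit_h))
    return label, shift, atoms
```

For the line `b, 4.802, 4, 5` this gives the label `b`, the shift 4.802 and the explicit atoms 4 and 5. The sketch does not handle the optional "Interchangeable" lines that may close the table.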

Labels can, in principle, contain any character and be of any length. We normally exclude the comma (,), because it is used as field separator (see below), as well as <EOL> and <LF>. The labels can be defined as the chemist wishes. If a database manager wants to change the names of the labels to make them “canonical” or to follow any norm, that is up to them, but we accept anything. In principle, the labels should correspond to the ones used in the manuscript of the submitted paper (with reviewing in mind). In the case of simple numbering, they could be C1, etc. and H-C1, H’-C1, etc. It is up to the author or file generator to satisfy the format of the journal or database and to keep the numbering readable (possibly following IUPAC), while distinguishing what needs to be distinguished (like diastereotopic protons).

For the chemical shift, no ranges are allowed: only a single floating-point value should be specified here. This is necessary because 1D 1H spectra may not be included in the set of spectra and, if they are, signal overlap may result in vague descriptions, such as a region analysed as “m” with many signals in it.

We propose to call “Ex” the spins that are exchanging with the solvent (typically OH, NH, etc. in protic solvents), but this is not mandatory. The atoms that are not assigned to at least one NMR signal in the spectra are not listed (H and C that were not assigned, heteronuclei that are NMR passive (say S, O, N) or for which the spectrum was not recorded (19F, 31P, etc.)).

When using level>1, we have to allow for the possibility of ambiguous assignment. We distinguish two types of ambiguity. If two or more labels may be interchanged, they are listed at the end of the list, before the empty line, with

Interchangeable=a, b

means that the assignment of a and b may be interchanged.

Interchangeable=(a, CA), (b, CB)

means that “a” together with “CA” may be interchanged with “b” together with “CB”. This may occur if two O-Me groups are not assigned unambiguously: we may know that carbon “CA” is bound to proton “a” and carbon “CB” to “b” (from HSQC), but not which of the two Me groups is bound where (no HMBC signal). There may be multiple “Interchangeable” lines.
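
Both forms of the “Interchangeable” line can be read into the same structure, a list of groups of labels. The sketch below uses the hypothetical function name `parse_interchangeable` and assumes that labels contain no parentheses, commas or spaces.

```python
import re

# Matches either a parenthesised group "(a, CA)" or a bare label "a".
_GROUP_OR_LABEL = re.compile(r"\(([^)]*)\)|([^,()\s]+)")


def parse_interchangeable(line: str) -> list:
    """Return the interchangeable groups, each as a list of labels."""
    key, _, value = line.partition("=")
    if key.strip() != "Interchangeable":
        raise ValueError("not an Interchangeable line")
    groups = []
    for grouped, single in _GROUP_OR_LABEL.findall(value):
        if grouped:
            groups.append([label.strip() for label in grouped.split(",")])
        elif single:
            groups.append([single])
    return groups
```

A bare label becomes a group of one, so software can treat `a, b` and `(a, CA), (b, CB)` uniformly as groups that may be swapped.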

Note that displaying or working with data containing such interchangeable signals may be a challenge. When software cannot take interchangeable assignments into account, it should generate a warning message and read the standard form or, better, offer the choice among the different possibilities.

<NMREDATA_J>

If scalar couplings were measured and assigned, a <NMREDATA_J> tag should be generated to list the couplings (the coupling network). It can contain only JHH, or only 1JCH, or a mix of any type of scalar coupling (the label indicates the isotope through the NMREDATA_SIGNALS table; see above).

>  <NMREDATA_J>
a, b, 7
A, a, 150.3
B, a, 7.5

This may include JHH (from 1D 1H, or from COSY) and/or JCH from RDC measurements, long-range JCH from HMBC, etc. These values may also be listed in the fields of the individual spectra they originate from. In this “J” tag, they are “compiled” (with an average, for example, when a JAX is not the same on the A and X multiplets, or if couplings are present in both 1D 1H and COSY). This can also include couplings to 19F or 31P if the label corresponds to such an atom (see the NMREDATA_SIGNALS table). Note that this table compiles data from other spectra; it does not mean that couplings should not be listed in each spectrum where they are observed.
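
A reader for the coupling table only needs two labels and a value per line. The following is a minimal sketch with the hypothetical function name `parse_j_table`; skipping empty lines and stripping ";" comments are assumptions.

```python
def parse_j_table(body: str) -> list:
    """Parse the content of a J tag into (label_1, label_2, coupling_hz) tuples."""
    couplings = []
    for raw_line in body.splitlines():
        # Assumption: ';' starts a comment, and blank lines are ignored.
        content = raw_line.split(";", 1)[0].strip()
        if not content:
            continue
        first, second, value = [field.strip() for field in content.split(",")]
        couplings.append((first, second, float(value)))
    return couplings
```

The isotope of each partner is not stored in this table, so a complete reader would look up each label in the signals table.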

Spectral tags

There is one SDF tag per spectrum in the record. The type of spectrum follows the "NMREDATA_":

<NMREDATA_1D_1H>
... (tag content)

<NMREDATA_2D_13C_J_1H>
... (tag content)

1D spectra

The tags of 1D spectra are named:

<NMREDATA_1D_isotope> 

Examples:

<NMREDATA_1D_1H> 
... (tag content)

<NMREDATA_1D_13C>
... (tag content)

<NMREDATA_1D_19F>
 ... (tag content)

The tag first lists some properties:

Larmor=double (Mandatory):Larmor frequency of the detected isotope
....

The tag then lists the extracted signals. Each line starts with a single chemical shift (x.xxxx) or a chemical shift range (x.xxxx-y.yyyy) (note: use four digits!).

Then follow the attributes characterizing the signal.


More details : 1D attributes

<NMREDATA_1D_1H>

The tag of a 1D 1H spectrum will be named “NMREDATA_1D_1H”. The entire tag can look like this:

>  <NMREDATA_1D_1H>
Larmor=500.13
4.8, S=q, E=2, L=a, J=7
2.1, S=bs, E=1, L=b
1.5, S=t, E=3, L=c, J=7

for multiple coupling:

4.8, S=dd, E=1, L=a, J=9.3, 4.8

see the list of fields for more details


In fact only the first number (the chemical shift) is mandatory. The other fields are all optional. There may be more than one signal assigned to a chemical shift (or range of chemical shifts). They are simply listed with "," as separator.

7.2-7.6, E=5, L=H-C1, H-C2, C-C4
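
The 1D signal lines can be read by treating each `key=value` field as the start of an attribute and every following field without "=" as a further value of that attribute. This is a minimal sketch under that assumption; `parse_1d_signal` is a hypothetical name, and labels written in parentheses are not handled here.

```python
import re

# A single shift "4.8" or a range "7.2-7.6"; negative shifts are accepted.
_SHIFT = re.compile(r"^(-?\d+(?:\.\d+)?)(?:-(-?\d+(?:\.\d+)?))?$")


def parse_1d_signal(line: str):
    """Parse a 1D signal line into ((low_ppm, high_ppm), attributes)."""
    content = line.split(";", 1)[0]
    fields = [field.strip() for field in content.split(",") if field.strip()]
    match = _SHIFT.match(fields[0])
    if match is None:
        raise ValueError(f"invalid chemical shift: {fields[0]!r}")
    low = float(match.group(1))
    high = float(match.group(2)) if match.group(2) is not None else low
    attributes = {}
    current = None
    for field in fields[1:]:
        if "=" in field:
            current, _, value = field.partition("=")
            attributes[current] = [value]
        elif current is not None:
            # A field without '=' continues the previous attribute.
            attributes[current].append(field)
        else:
            raise ValueError(f"value without attribute name: {field!r}")
    return (low, high), attributes
```

For `4.8, S=dd, E=1, L=a, J=9.3, 4.8` the attribute `J` receives both 9.3 and 4.8, and for a single shift the low and high ends of the range are equal.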

One reason for listing chemical shifts in the <NMREDATA_SIGNALS> tag is that signals may overlap and be given as a range in the 1H 1D spectrum, but may be clearly determined from HSQC, COSY, etc.


For the results of diffusion measurements, D, etc. can be given in standard unit:

4.8, S=q, E=2, L=a, Diff=1.7e-11

Chemical shifts are given in ppm with four digits after the period (the usual 3 is not enough at high field and high resolution!)


Integrals can use any relative numbers, but should preferably be rescaled to correspond to the number of protons (1.0 for the reference) or to the concentration when quantification was made (in mM). This will be very useful when analysing mixtures. Integrals should not be rounded up or down to the number of atoms: the values should reflect the experimental values.

If the 1D spectrum is homodecoupled at a given chemical shift or totally decoupled (pure shift), one of the two following lines should be added:

Decoupled=1H, 1.2
Decoupled=1H

If other spins are decoupled, add: Decoupled=19F ...

For many reasons, we need to be able to report ambiguous assignments. In this case, instead of the signal’s label alone, we list the candidate labels in parentheses. For example: 4.8, L=(a, b) means that the signal at 4.8 is assigned either to a or to b. This option is needed because ambiguous data are quite common, in particular in 2D spectra. It will cause difficulties when reading data for display, structure verification, etc. Programmers will decide what to do: ignore ambiguous assignments, try to resolve them, report them as such, warn about their presence, etc.
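
Because the candidate labels are themselves separated by commas, a reader cannot simply split the line on every comma. One possible approach, sketched here with the hypothetical names `split_outside_parentheses` and `parse_label_field`, is to split only on commas that are not enclosed in parentheses.

```python
def split_outside_parentheses(text: str, separator: str = ",") -> list:
    """Split on the separator, ignoring separators inside parentheses."""
    parts, current, depth = [], [], 0
    for char in text:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        if char == separator and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    parts.append("".join(current).strip())
    return parts


def parse_label_field(value: str):
    """Return (labels, is_ambiguous) for the value of an 'L=' attribute."""
    value = value.strip()
    if value.startswith("(") and value.endswith(")"):
        return [label.strip() for label in value[1:-1].split(",")], True
    return [value], False
```

The returned flag lets the calling program decide whether to ignore, resolve or warn about the ambiguous assignment.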

<NMREDATA_1D_13C> etc.

1D 13C spectra and other heteronuclear spectra (nuclei X)

>  <NMREDATA_1D_13C> 
Larmor=125
Decoupled=1H
51.812, I=118.0
20.123, I=123.1

When the 1D X spectrum is not obtained from a simple pulse-detection sequence (e.g. DEPT, APT, etc.), this is specified using an additional label “Sequence”:

>  <NMREDATA_1D_13C> or other isotope
Larmor=125
Decoupled=1H
Sequence=DEPT135 (or DEPT45)
51.812, I=-80.1
20.123, I=123.1

The peak intensity provides the signs of the DEPT-135 signal.

2D spectra

The tag label indicates that it is a 2D spectrum, the two isotopes involved (starting with the indirectly detected one) and the type of mixing involved. (See section 5).
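
The tag label can be decomposed mechanically into its parts. The sketch below uses the hypothetical function name `parse_spectrum_tag` and assumes that the parts are always separated by "_" and never contain an underscore themselves.

```python
def parse_spectrum_tag(tag: str) -> dict:
    """Decompose a spectrum tag such as '<NMREDATA_2D_13C_1J_1H>'."""
    name = tag.strip().strip("<>")
    parts = name.split("_")
    if parts[0] != "NMREDATA":
        raise ValueError(f"not an NMReDATA tag: {tag!r}")
    if len(parts) == 3 and parts[1] == "1D":
        return {"dimension": 1, "detected": parts[2]}
    if len(parts) == 5 and parts[1] == "2D":
        # Order in the label: indirectly detected isotope, mixing, detected isotope.
        return {
            "dimension": 2,
            "indirect": parts[2],
            "mixing": parts[3],
            "detected": parts[4],
        }
    raise ValueError(f"unrecognised spectrum tag: {tag!r}")
```

For `<NMREDATA_2D_13C_1J_1H>` this gives 13C as the indirect isotope, 1J as the mixing and 1H as the detected isotope.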

2D HSQC <NMREDATA_2D_13C_1J_1H>

>  <NMREDATA_2D_13C_1J_1H>
Larmor=500
CorType=HSQC (COSY; HSQC; HMBC; ?? exact list is still to be defined. Contact Damien Jeannerat)
Pulseprogram=XXX ; this is optional
a/C1, I=1.2
b/48.43, I=1.2

The Larmor frequency is that of the detected isotope (last in the tag label). “CorType” and “Pulseprogram” can be specified. When signals are assigned, only their labels are given (no chemical shifts). If a crosspeak is reported without assignment or with partial assignment, the chemical shift replaces the signal’s label.

The intensity of a signal follows “I=”. It simply corresponds to the intensity of the spectrum at (or very close to) the coordinates of the signal. Intensities should be provided when possible (that is, when the software has access to them), but they are not a required part of the format: one should be able to read the .sdf files even in the absence of intensities. The intensities are given in arbitrary units (integer or floating point). If the signal has a shape such that the intensity is zero at its center (phase-sensitive COSY, or the middle of a doublet in HSQC, for example), the intensity can be the one at the maximum amplitude of the multiplet. This intensity does not pretend to be “quantitative”. Optional integration of the volume is encouraged using “S=”, but this requires some analysis of the peak shape.

2D HMBC <NMREDATA_2D_13C_NJ_1H>

Here is an example of HMBC data with two examples of ambiguous assignment (could also occur in clusters of peaks):

>  <NMREDATA_2D_13C_NJ_1H>
Larmor=500.13
C1/a; optional comment will be visible in the spectrum’s view
(C2,C3)/b, I=1.2
C2/(b,c), I=1.2
(C4,C5,C6)/(e,f), I=1.2
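
A crosspeak line can be read by splitting on commas and on "/" only where they are not enclosed in parentheses. The following is a minimal sketch; the names `split_top_level` and `parse_crosspeak` are hypothetical, and unassigned sides (chemical shifts) are simply returned as text.

```python
def split_top_level(text: str, separator: str) -> list:
    """Split on the separator, ignoring separators inside parentheses."""
    parts, current, depth = [], [], 0
    for char in text:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        if char == separator and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    parts.append("".join(current).strip())
    return parts


def candidates(side: str) -> list:
    """Return the candidate labels of one side of a crosspeak."""
    if side.startswith("(") and side.endswith(")"):
        return [label.strip() for label in side[1:-1].split(",")]
    return [side]


def parse_crosspeak(line: str):
    """Parse a 2D line into (first_side, second_side, attributes)."""
    # Assumption: ';' starts a comment that runs to the end of the line.
    content = line.split(";", 1)[0]
    fields = split_top_level(content, ",")
    first, second = split_top_level(fields[0], "/")
    attributes = dict(field.split("=", 1) for field in fields[1:] if field)
    return candidates(first), candidates(second), attributes
```

A side with more than one candidate indicates an ambiguous assignment, as in `(C2,C3)/b`.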

For HETCOR, the tag label would be

<NMREDATA_2D_1H_1J_13C>

to indicate that the first dimension is 1H and the detected dimension is 13C.

By default “2D” heteronuclear spectra are assumed to be isotope1-decoupled during evolution of isotope2 and vice versa. If an HSQC spectrum is recorded without 180 pulse during t1, or without 13C decoupling during t2, (for RDC measurements for example) the following lines should be added respectively:

Nondecoupled=t1
Nondecoupled=t2

If the spectrum is not decoupled, or if it allows couplings to be measured, the heteronuclear couplings may be listed as:

a/C1, Ja=155

2D COSY <NMREDATA_2D_1H_NJ_1H>

See the discussion of COSY spectra (below) for the reason for using “Ja”.

A COSY spectrum will be coded as:

>  <NMREDATA_2D_1H_NJ_1H>
CorType=COSY (COSY; HSQC; HMBC; ?? exact list is still to be defined. Consider this one tentative)
Larmor=500
a/b, I=1.2, S=100 ;
b/a, I=1.2
b/c, I=1.2 
c/b, I=1.2 

If couplings were measured from the COSY, the values should be specified:

a/b, Ja=5

where Ja means active coupling (present in f1 and f2),
J1 means passive coupling(s) (present in f1),
and J2 means passive coupling(s) (present in f2).

Any number of couplings can be added. For example the A-X crosspeak with A and X coupling with M could be described as:

A/X, Ja=5, J1=4, J2=2

Meaning that the active coupling JAX=5 Hz, the passive couplings JAM=4 Hz and JXM=2 Hz. If the assignment was made (as for 1D 1H), the assigned couplings are specified:

A/X, Ja=5, J1=4(M), J2=2(M)
A/X, Ja=5, J1=4(M), 6.1(K), J2=2(M), 3.1(K)

This allows reporting the result of the analysis of high-resolution DQF-COSY spectra or soft-COSY spectra. In this manner, high-resolution COSY with coupling structures can be simulated for coupling constant verification (refinement, including second-order effects, etc.). Note that couplings can be obtained from the description of the 1D spectrum if they were determined there. They should be included in the 2D fields only if they were measured in the 2D spectrum. Note that the program generating these data should also generate a <NMREDATA_J> tag (see above) to compile the J-couplings and present them in a manner that is easier to read.
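
The Ja, J1 and J2 attributes can be collected into lists of (value, partner) pairs, where the partner is the label in parentheses when the coupling was assigned. This is a minimal sketch with the hypothetical name `parse_cosy_couplings`; it ignores all other attributes of the line.

```python
import re

# A coupling value with an optional assigned partner, e.g. "4" or "6.1(K)".
_COUPLING = re.compile(r"^(-?\d+(?:\.\d+)?)(?:\(([^)]+)\))?$")


def parse_cosy_couplings(line: str) -> dict:
    """Extract Ja, J1 and J2 from a COSY crosspeak line.

    Returns a dict mapping 'Ja', 'J1' or 'J2' to a list of
    (coupling_hz, partner_label_or_None) tuples.
    """
    content = line.split(";", 1)[0]
    fields = [field.strip() for field in content.split(",")]
    couplings = {}
    current = None
    for field in fields[1:]:
        if "=" in field:
            key, _, field = field.partition("=")
            # Only the three coupling attributes are collected.
            current = key if key in ("Ja", "J1", "J2") else None
        if current is None:
            continue
        match = _COUPLING.match(field)
        if match is None:
            raise ValueError(f"invalid coupling: {field!r}")
        couplings.setdefault(current, []).append(
            (float(match.group(1)), match.group(2))
        )
    return couplings
```

A field without "=" is read as a further value of the preceding coupling attribute, which is how `J1=4(M), 6.1(K)` yields two passive couplings in f1.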

2D NOESY <NMREDATA_2D_1H_D_1H>

NOESY spectra will be described as:

>  <NMREDATA_2D_1H_D_1H>
Larmor=500
a/b, I=1.2 
b/c, I=1.2

2D HOESY <NMREDATA_2D_19F_D_1H>

If 1H is detected and 19F is in F1, heteronuclear 19F-1H data would be in a tag called:

>  <NMREDATA_2D_19F_D_1H>