Difference between revisions of "Main Page"

Discussion and complete list on [[Future version| future versions of the NMReDATA format ]]
= Molblock (2D/3D) structures =
<span style="color:#FF8C00">  ''This has to be refined - comments are welcome! '' </span>
The SDF file format allows multiple structures/models/frames to be included in a single SDF file. They are separated by a line containing "$$$$".
For the NMReDATA format, there is always one (first) structure representing the "flat" 2D structure. By flat, we do not mean that chirality is unspecified, but that all ''z''-coordinates are set to zero.
For version 2.0, we will introduce the possibility to include a 3D structure (in addition to the first one, not replacing it!).
The second structure (3D, with non-zero ''z''-coordinates) may be added by simply appending a molblock to the SDF file and terminating it, as usual, with "$$$$".
It should fulfil the following conditions: the order of atoms and bonds should be the same as in the main (first) structure. The "only" difference should be the ''x, y, z'' coordinates, which will correspond to the determined 3D structure instead of having ''z'' set to zero as in the "flat" structure.
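To make these conditions concrete, here is a minimal Python sketch that splits an SDF file at the "$$$$" lines, checks that the first structure is flat, and checks that the 3D molblock lists the same elements in the same order and the same bonds as the 2D one. The function names are hypothetical (not part of any NMReDATA tool), and the sketch assumes V2000 molblocks with the usual fixed-column counts, atom and bond lines; V3000 molblocks are not handled.

```python
def split_records(sdf_text):
    """Split SDF text into records; each record ends at a "$$$$" line."""
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return records


def parse_molblock(record):
    """Return (atoms, bonds) from a V2000 molblock.

    atoms: list of (symbol, x, y, z); bonds: list of (atom1, atom2, bond_type).
    """
    lines = record.splitlines()
    counts = lines[3]  # the counts line follows the three header lines
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        atoms.append((line[31:34].strip(), x, y, z))
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


def is_flat(record):
    """True if every z coordinate is zero (the "flat" 2D structure)."""
    atoms, _ = parse_molblock(record)
    return all(z == 0.0 for _, _, _, z in atoms)


def same_atoms_and_bonds(record_2d, record_3d):
    """True if both molblocks list the same elements in the same order
    and the same bonds in the same order; only coordinates may differ."""
    atoms_2d, bonds_2d = parse_molblock(record_2d)
    atoms_3d, bonds_3d = parse_molblock(record_3d)
    symbols_2d = [atom[0] for atom in atoms_2d]
    symbols_3d = [atom[0] for atom in atoms_3d]
    return symbols_2d == symbols_3d and bonds_2d == bonds_3d
```

Following the convention described on this page, a reader would treat the first record as the flat structure and the second (if any) as the 3D one, e.g. `is_flat(records[0]) and same_atoms_and_bonds(records[0], records[1])`.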
To comply with the [http://accelrys.com/products/informatics/cheminformatics/ctfile-formats/no-fee.php official specification] of the MOLfile format, and hence ensure compatibility of the files with other software, the second line in the header of each molblock should include either "2D" or "3D" (the 'dimensional codes') in columns 21 and 22 (the ''dd'' below):
<pre>Line 2 has the format:
IIPPPPPPPPMMDDYYHHmmddSSssssssssssEEEEEEEEEEEERRRRRR
A2<--A8--><---A10-->A2I2<--F10.5-><---F12.5--><-I6->
User's first and last initials (I), program name (P), date/time (M/D/Y,H:m),
dimensional codes (d), scaling factors (S, s), energy (E) if modelling program input,
internal registry number (R) if input through MDL form.</pre>
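As an illustration of where the dimensional code sits, a small Python sketch that writes columns 1 to 22 of header line 2 (initials, program name, date/time, dimensional code) and reads the code back from columns 21-22. Padding the initials to 2 and the program name to 8 characters follows the A2 and A8 fields above; the optional fields after column 22 (scaling factors, energy, registry number) are simply left out here, and the function names are hypothetical.

```python
from datetime import datetime


def header_line2(initials, program, when, dimension):
    """Columns 1-22 of molblock header line 2: IIPPPPPPPPMMDDYYHHmmdd."""
    if dimension not in ("2D", "3D"):
        raise ValueError("dimensional code must be '2D' or '3D'")
    # A2 initials, A8 program name, MMDDYYHHmm date/time, then the code
    return f"{initials[:2]:<2}{program[:8]:<8}{when:%m%d%y%H%M}{dimension}"


def dimensional_code(molblock):
    """Read the dimensional code from columns 21-22 of header line 2."""
    lines = molblock.splitlines()
    return lines[1][20:22].strip() if len(lines) > 1 else ""
```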
Note that future developments may require including additional structures (for example, multiple conformations for DFT/GIAO data...). We will need to make sure that software can unambiguously find the correct 3D structure. We may therefore have to add a flag indicating the 3D structure corresponding to the main structure of the NMReDATA. For now, we can consider that the second structure in the file is the 3D structure and ignore any additional ones (third, fourth, ''etc.'').
We strongly recommend associating all the NMReDATA tags with the first structure, i.e. placing them before the first "$$$$" line. This is because current readers may stop reading the SDF file at the first occurrence of "$$$$" and would miss any tags listed after the 3D structure.
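A minimal Python sketch of the kind of simple reader described above: it collects SD data items (a "> <NAME>" header line, data lines, then a blank line) and stops at the first "$$$$", so tags placed after the 3D structure are never seen. It only illustrates the risk; it is not a reference reader, and the function name is hypothetical.

```python
import re

# An SD data header line starts with ">" and contains "<NAME>"
DATA_HEADER = re.compile(r"^>.*?<([^>]+)>")


def first_record_tags(sdf_text):
    """Return {tag name: content} for the data items of the first record only."""
    tags, name = {}, None
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            break  # a simple reader stops here: later tags are lost
        header = DATA_HEADER.match(line)
        if header:
            name = header.group(1)
            tags[name] = []
        elif name is not None:
            if line.strip() == "":
                name = None  # a blank line ends the data item
            else:
                tags[name].append(line)
    return {key: "\n".join(value) for key, value in tags.items()}
```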
== 2D to 3D conversion ==
When a 3D visualizer does not find a 3D structure, it could generate one and add it to the output, BUT it should ask the user for permission and warn them about the consequences and/or guide them through the process:
-Transforming 2D into 3D is not harmless. If two diastereotopic hydrogen atoms are drawn with regular bonds (simple straight lines) and assigned to two different signals in the spectrum, this may be for the good reason that the assignment is not known. Introducing a 3D structure will erase the "unknown" and introduce a risk of error. When there is a risk of this occurring, one should use the [http://nmredata.org/wiki/NMReDATA_tag_format#Interchangeable_assignment_.28Only_for_Level.3E0.29 "ambiguous" statement in the "NMREDATA_ASSIGNMENT" tag].
-Other problems of this type probably exist...
In principle, transforming 2D into 3D is important and useful, but it has to be done carefully to avoid introducing errors or removing information!
= Comparison of assigned data=
<span style="color:#FF8C00">  ''This is a working proposition '' </span>
A tag comparing the data in the current file with those from other files (using the same labels for the assignment):
Externaldata1=... ref to an external .sdf file from the current or an external record.
H1 1.54 1.50 1.66 (the first value is the reference value from the NMREDATA_ASSIGNMENT tag of the current file, the second is from Externaldata1, etc.)
H2 1.54 1.50 1.66
Chi2 0.3 0.5 (these could be the chi-squared values of the reference compared with each set of external data)
Couplings could also be compared...
Externaldata1=... ref to an external .sdf file from the current or an external record.
J(H1,H2) 1.54 1.50 1.66 (the first value is the reference value from the NMREDATA_ASSIGNMENT tag of the current file, the second is from Externaldata1, etc.)
J(H1,H3) 1.54 1.50 1.66
Chi2 0.3 0.5 (these could be the chi-squared values of the reference compared with each set of external data)
Other comparisons could be made, for example of signal intensities in HSQC spectra:
Externaldata1=... ref to an external .sdf file from the current or an external record.
I(NMREDATA_13C_1J_1H,HC1,H1) 122 154 143 (the first value is the reference value from the NMREDATA_ASSIGNMENT tag of the current file, the second is from Externaldata1, etc.)
I(NMREDATA_13C_1J_1H,HC2,H2) 132 151 163
Chi2 5.3 3.5 (these could be the chi-squared values)
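The proposal does not define how Chi2 would be computed. One plausible choice, sketched here in Python as an assumption, is, for each external data set, the sum over labels of the squared deviation from the reference value divided by an assumed tolerance σ (a hypothetical 0.1 ppm in the usage note). The parser expects bare value lines, without the parenthesised comments or the Chi2 line; the function names are hypothetical. The Chi2 numbers in the examples above are placeholders, so this sketch is not expected to reproduce them.

```python
def parse_comparison(lines):
    """Map each label (e.g. "H1", "J(H1,H2)") to its values:
    reference first, then Externaldata1, Externaldata2, ..."""
    table = {}
    for line in lines:
        label, *values = line.split()
        table[label] = [float(v) for v in values]
    return table


def chi_squared(table, sigma):
    """One value per external data set: the sum over labels of
    ((external - reference) / sigma) ** 2. This definition is an assumption."""
    lengths = {len(values) for values in table.values()}
    if len(lengths) != 1:
        raise ValueError("every line needs the same number of values")
    n_external = lengths.pop() - 1
    return [
        sum(((values[k] - values[0]) / sigma) ** 2 for values in table.values())
        for k in range(1, n_external + 1)
    ]
```

With σ = 0.1 ppm, the two lines "H1 1.54 1.50 1.66" and "H2 1.54 1.50 1.66" give about 0.32 for Externaldata1 and about 2.88 for the second external data set.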
= Certification=
''This is under development... contact administrators if you are considering using this tag...''
When the assignment is made in a computer-assisted manner, the software may add a certification of the validity of the data. It is up to the manufacturers to encode it in a way that makes the certification impossible to forge (using a hash, etc.?).
Certificate tags could be listed at the end of the .sdf file. They can originate from the CASE software, from the database hosting the data and spectra, or from the journal (to state that the data were peer-reviewed). They can be cumulated. If the text of the .sdf file needs to be hashed for certification, the list of tags used for hashing could be given.
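Note that a plain hash only detects accidental changes: anyone who edits the tags can simply recompute it. Making a certificate hard to forge requires a secret or a private key. A minimal Python sketch using HMAC-SHA256 from the standard library over a listed set of tags, assuming that the certifier and the verifier share a (hypothetical) secret key and agree on how the tag texts are concatenated; a public-key signature, which would let anyone verify without the secret, is not shown. The function names and the concatenation layout are assumptions, not part of the NMReDATA format.

```python
import hashlib
import hmac


def certify(tags, tag_names, key):
    """HMAC-SHA256 over the listed tags, taken in the listed order.

    Each tag is written as "<NAME>" on one line followed by its content;
    this layout is an assumed convention shared by certifier and verifier.
    """
    message = "\n".join(f"<{name}>\n{tags[name]}" for name in tag_names)
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(tags, tag_names, key, certificate):
    """Recompute the HMAC and compare it in constant time."""
    return hmac.compare_digest(certify(tags, tag_names, key), certificate)
```

Listing the tag names used for hashing, as suggested above, is what allows the verifier to rebuild exactly the same message.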
'''To be refined by the  specialists!'''
Software=CMC_assist<span style="color:#0808F8">'''\'''</span>
Author=Bruker<span style="color:#0808F8">'''\'''</span>
Full_report_tag=CMC4.6_REPOR <span style="color:#0808F8">'''\'''</span>
Confidence_level=4.6<span style="color:#0808F8">'''\'''</span>
Confidence_level_certificate="ADFS678AG67DFG6SD5F7GS5DFGSD8F5GSD7FG7"  <span style="color:#0808F8">'''\'''</span> 
Verification_certificate=http:/...<span style="color:#0808F8">'''\'''</span>
Software=LSD_V3.0<span style="color:#0808F8">'''\'''</span>
Author=JMN<span style="color:#0808F8">'''\'''</span>
Unique_solution=YES<span style="color:#0808F8">'''\'''</span>
Unique_solution_certificate="ADFS678AG67DFG6SD5F7GS5DFGSD8F5GSD7FG7"<span style="color:#0808F8">'''\'''</span>
This is only a very rough example. The uniqueness of the proposed structure may be understood in the sense of J.-M. Nuzillard’s ''Logic for Structure Determination'' (LSD) tool. Software producers can include here the specifications of their product. Multiple certifications can be listed one after the other; each “Software=...” line starts a new certification within the same <CERTIFICATION> tag.
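A small Python sketch of how a reader could split such a tag into one entry per certification, starting a new entry at each "Software=" line, removing the trailing backslashes and stripping the quotes around certificate values. The function name is hypothetical.

```python
def split_certifications(tag_text):
    """Return one dict per certification; a "Software=" line starts a new one."""
    entries = []
    for raw in tag_text.splitlines():
        line = raw.strip().rstrip("\\").strip()  # drop the trailing backslash
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        value = value.strip('"\u201c\u201d')  # straight or curly quotes
        if key == "Software" or not entries:
            entries.append({})
        entries[-1][key] = value
    return entries
```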
= Role(s) and scope of the “assignment records”  =
The NMR record can be generated from experimental data (this is how the format was designed), but data may also originate from simulations, predictions, etc.
Tools to compare, evaluate, validate, and check consistency of “assignment records” will certainly be developed.
Assignment records can be generated by commercial software, but also by diverse tools analysing NMR data, homemade processing tools, simulation software, etc. This is why it is important for the data format to include as many options as possible and to be as flexible as possible, even if not all possible uses are clearly defined or used immediately. Ideally, it should be possible to convert the .sdf files into other file formats or spectral descriptions without loss.
== Multiple records ==
It should be seen as an advantage if databases include multiple "assignment records" associated with the same molecule or the same set of NMR spectra. Some could be old, originating from incomplete literature data. Others could include errors because they originate from bulk data processed automatically. But ultimately, a computer could verify them or create a robustly validated record combining all the other data. Aggregated records could be generated by NMR software/databases scoring the available data for consistency, using calculated chemical shifts and spectral simulations. They could refine chemical shifts and couplings, etc.
==SDF files generated from experimental data==
When the NMR data originate from experimental spectra, they may be quite crude (simple automated integration, peak-picking). At the other extreme, the data may result from complex automated analysis or from careful manual expert analysis. The NMReDATA format must be flexible enough to encode data of diverse quality: they may be partial, incomplete, or contain inconsistencies, impossible features, etc. Typical cases include:
- only 1D <sup>1</sup>H NMR data (with or without integration, coupling, etc.).
- only 1D <sup>13</sup>C data (just from simple peak picking)
- only 1D data  but for multiple isotopes (from NMRshiftDB ?)
- full analysis based on computer-assisted software (such as ACD/Labs ''Structure Elucidator'', Bruker ''CMC-se'' or Mestrelab ''Mnova'') or a web platform (such as cheminfo.org).
- 1D and 2D data processed automatically, with a partial assignment (for example, not all signals are assigned) and/or an ambiguous one (due to lack of resolution or other problems)
- The file may not contain the actual assignment, only the structure and the list of chemical shifts (the assignment could be added by NMR tools).
- The data may come from a scientific report, i.e. the text describing the spectra.
Scripts could be written to convert such a "pure text" description into an .sdf file including the .mol file.
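Such a script could start by extracting the signals from a typical literature description. A minimal Python sketch, assuming the common "shift (multiplicity, J = ... Hz, nH)" style; it returns plain dictionaries rather than NMReDATA tag text, and real descriptions (ranges, overlapping multiplets, other separators) would need more work. The function name is hypothetical and the input string in the usage note is made up.

```python
import re

SIGNAL = re.compile(r"(\d+(?:\.\d+)?)\s*\(([^)]*)\)")  # "3.71 (q, ...)"
COUPLING = re.compile(r"J\s*=\s*([\d.,\s]+?)\s*Hz")     # "J = 8.0, 2.0 Hz"
PROTONS = re.compile(r"(\d+)\s*H\b")                    # "2H"


def parse_1h_text(text):
    """Extract shift, multiplicity, J values and proton count for each signal."""
    text = text.split("δ", 1)[-1]  # skip a "1H NMR (400 MHz, CDCl3)" header
    signals = []
    for shift, inner in SIGNAL.findall(text):
        j_match = COUPLING.search(inner)
        n_match = PROTONS.search(inner)
        couplings = []
        if j_match:
            couplings = [float(j) for j in re.split(r"[,\s]+", j_match.group(1)) if j]
        signals.append({
            "shift": float(shift),
            "multiplicity": inner.split(",")[0].strip(),
            "J": couplings,
            "nH": int(n_match.group(1)) if n_match else None,
        })
    return signals
```

For example, `parse_1h_text("1H NMR (400 MHz, CDCl3) δ 3.71 (q, J = 7.0 Hz, 2H), 1.24 (t, J = 7.0 Hz, 3H)")` returns one dictionary per signal, the first being `{"shift": 3.71, "multiplicity": "q", "J": [7.0], "nH": 2}`.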
For assignment work done with only "paper and pencil", a tool for drawing a molecule and entering lists of signal names and 2D correlations could easily be made. We could consider accepting .pdf files or pictures of the spectra when the original files no longer exist.
==SDF files generated from calculated data==
<span style="color:#FF8C00">  ''This is a working proposition to include JCAMP spectra in NMR Records / Discussion using Slack '' </span>
The NMR data may originate from DFT calculations or any other type of predictor of chemical shifts and/or couplings. In such a case, a general tag is added to provide information about the software. For example:
Geometry=method/basis set
Shielding=method/basis set
Coupling=method/basis set
References to additional files located in a folder called "DFT_GIAO_calculations" could be added. They could include:
-the equation used to convert shieldings into chemical shifts (for <sup>1</sup>H, <sup>13</sup>C)
-the list of conformers calculated, their energies ?, the Boltzmann ratios ?
-all 3D structures of the conformers in a conformations.sdf file (not the main NMReDATA .sdf file). The main .sdf file can contain only the flat structure and the 3D structure of the lowest-energy conformer.
-all the output files of the shielding (and coupling) calculations ? Also the geometry optimization ?
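Two of the steps listed above, converting shieldings into chemical shifts and Boltzmann weighting over conformers, can be sketched in Python as follows. The sketch assumes a linear scaling of the form σ = slope·δ + intercept fitted beforehand for the chosen method/basis set (the slope and intercept in the usage note are hypothetical, not recommended values) and relative conformer energies in kJ/mol; the function names are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def shielding_to_shift(sigma, slope, intercept):
    """Assumes sigma = slope * delta + intercept, so delta = (sigma - intercept) / slope."""
    return (sigma - intercept) / slope


def boltzmann_weights(energies_kj, temperature=298.15):
    """Population of each conformer from its relative energy (kJ/mol)."""
    e_min = min(energies_kj)
    factors = [math.exp(-(e - e_min) / (R_KJ * temperature)) for e in energies_kj]
    total = sum(factors)
    return [f / total for f in factors]


def averaged_shift(shifts_per_conformer, energies_kj, temperature=298.15):
    """Boltzmann-weighted average of one nucleus' shift over the conformers."""
    weights = boltzmann_weights(energies_kj, temperature)
    return sum(w * d for w, d in zip(weights, shifts_per_conformer))
```

For example, with a hypothetical slope of -1.0 and intercept of 31.8 ppm, a shielding of 24.5 ppm converts to a shift of about 7.3 ppm.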
Examples (tentative) of .nmredata.sdf files originating from Gaussian can be found here: [https://www.dropbox.com/sh/yfx82u28qpitiuz/AACVJ8J1Ozgzq82I4GQzXrwUa?dl=0 androsten data]
==SDF files generated from literature data==
<span style="color:#FF8C00">  ''This is a working proposition under review '' </span>
When the NMR data originate from publications, a reference to the published paper/book/thesis is given in the NMREDATA_LITERATURE tag.
DOI=DOI_HERE (if Reference field is DOI specify it here)
CompoundNumber=label used in the reference to designate the compound (typically a number in boldface)
ISBN=ISBN_HERE (if Reference field is ISBN specify it here)
CompoundNumber=label used in the reference to designate the compound (typically a number in boldface)
Thesis=HTML link here (if available; if not, "LastName, Firstname(s), institution providing the degree, city, country, year of publication")
CompoundNumber=label used in the reference to designate the compound (typically a number in boldface)
==SDF files generated after revision of existing SDF files==
Assignment records may be generated after revision of experimental, literature or prediction data, etc. Ideally, the original .sdf files should also be provided to facilitate comparison, or they should exist somewhere and be referred to. In both cases, a reference should be given.
Record_number=ref_to_the_original_record (multiple references are allowed for aggregation of records, separated by ",").
Date=date (in a standard date format, e.g. ISO 8601: YYYY-MM-DD)
Correction="fixed assignments of C(13) and C(15)"
This is also to be refined according to future developments.
= Concerning symmetry =
== Magnetic non-equivalence ==
For symmetrical molecules, a difficulty may arise when encoding couplings and 2D correlations.
Reminder: couplings are not directly associated with atoms, but with labels (in the NMREDATA_ASSIGNMENT tag). Labels are associated with one or more atoms (in case of symmetry, fast rotation, etc.).
Example of a difficulty and its solution concerning scalar couplings: the 1D <sup>1</sup>H spectrum of 1,2-dichlorobenzene shows two multiplets (two different types of protons in an AA'XX' system). If the SDF file includes two labels (one for A and one for X, each pointing to two atoms), in principle only one coupling can be given: J<sub>A,X</sub> (no J<sub>A,A'</sub> or J<sub>A,X'</sub>). If one wishes to specify all the couplings, A and A' should be given two different labels (each pointing to only one atom), and likewise X and X', so that different couplings can be given for J<sub>A,X</sub>, J<sub>A',X</sub>, J<sub>A,A'</sub> and J<sub>X,X'</sub>. This may be desired so that the 1D spectrum can be simulated with the correct magnetic non-equivalence effect.
== HMBC correlations in symmetrical molecules ==
Consider the following molecules:
[[File:Hmbc sym.png]]
A <sup>3</sup>J<sub>C,H</sub> HMBC correlation will be visible between proton '''''a''''' and C(1), which appears to be the directly bound carbon. This is because the carbons one bond (<sup>1</sup>J) and three bonds (<sup>3</sup>J) away from a given proton are symmetry-equivalent. Software may interpret the correlation as <sup>1</sup>J, but it should be able to analyse the NMREDATA_ASSIGNMENT tag, see that labels '''''a''''' and C(1) each point to two atoms, and recognise that the correlation may correspond to any of the four possible atom pairs. Two of these pairs correspond to the actual <sup>3</sup>J correlation and two to <sup>1</sup>J.
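A Python sketch of this analysis: given the atoms behind each label and the bond list, it enumerates every (H, C) atom pair and counts the bonds separating each pair with a breadth-first search. The fragment in the usage note is hypothetical (two equivalent CH carbons, atoms 1 and 2, both bonded to a central atom 3, carrying protons 4 and 5), not the molecule in the figure, and the function names are hypothetical.

```python
from collections import deque
from itertools import product


def bond_distances(bonds, start):
    """Number of bonds from atom `start` to every reachable atom (BFS)."""
    graph = {}
    for a, b in bonds:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in graph.get(atom, []):
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def candidate_pairs(h_atoms, c_atoms, bonds):
    """For a cross peak between two labels, list every (H, C, n) where n is
    the number of bonds between the atoms, i.e. the possible nJ(C,H)."""
    return [(h, c, bond_distances(bonds, h).get(c))
            for h, c in product(h_atoms, c_atoms)]
```

With label '''''a''''' = (4, 5), C(1) = (1, 2) and bonds (1,3), (2,3), (1,4), (2,5), it returns two one-bond pairs and two three-bond pairs, as described above.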
= Why not add more data in NMReDATA tags? =
We consider that our task is to focus on NMR data. But SDF files could (and probably should!) also include other experimental data such as:
1) The origin of the molecule. This may include the extraction method and the plant it originates from, in phytochemistry, or the reaction producing it.
2) MS data
3) other spectral data
In principle, authors can add any tag, provided they have tools to do it and requests from the journals... such data could have the following form...
Software producing SDF files that include NMReDATA should read SDF files and write SDF files, only adding (or modifying/reviewing) the NMReDATA data. Any other SDF tags should be passed unchanged from the file that is read to the file that is generated.
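A minimal Python sketch of this pass-through rule for a single record (given without its "$$$$" line): the molblock and every data item whose name does not start with NMREDATA_ are copied unchanged, old NMREDATA_ items are dropped, and the new ones are appended. Treating every NMREDATA_-prefixed item as owned by the writing software is an assumption of this sketch, and the function name is hypothetical.

```python
import re

DATA_HEADER = re.compile(r"^>.*?<([^>]+)>")  # "> <NAME>" data header line


def update_nmredata_tags(record, new_tags):
    """Rewrite one SDF record (without its "$$$$" line).

    The molblock and all non-NMREDATA_ data items are copied verbatim;
    existing NMREDATA_* items are dropped and new_tags are appended.
    """
    lines = record.splitlines()
    # the molblock ends with the "M  END" line
    end = next((i + 1 for i, line in enumerate(lines) if line.startswith("M  END")), 0)
    out = lines[:end]
    i = end
    while i < len(lines):
        header = DATA_HEADER.match(lines[i])
        if not header:
            i += 1
            continue
        j = i + 1
        while j < len(lines) and lines[j].strip() != "":
            j += 1  # data lines run until the blank line
        if not header.group(1).startswith("NMREDATA_"):
            out.extend(lines[i:j] + [""])  # foreign tag passed through unchanged
        i = j + 1
    for name, content in new_tags.items():
        out.extend([f"> <{name}>", *content.splitlines(), ""])
    return "\n".join(out)
```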
<span style="color:#FF0000"> <small><small>
<span style="color:#FF0000"> <small><small>

Revision as of 17:04, 1 April 2019

Quick links

Important announcement: On Sept 26, 2019 the First NMReDATA symposium will take place in Porto, Portugal.

Direct link to the page describing the format of the NMREDATA tags.

List of compatible software

Tentative instructions for journal submission of NMReDATA.

Discussion on future versions of the NMReDATA format


The NMReDATA working group decided to include data extracted from NMR spectra of small molecules in SDF files using SD tags.

Link to the MRC article on the NMReDATA format.

An important task of the group is to define the format of the content of the "<NMREDATA_...>" tags. More details here.

Version 1.0 will be decided in September at the "Round table" of the Smash 2017 conference in Baveno, Italy.

The SDF file alone (that is without the spectra) cannot be used to verify that the assignment corresponds to the spectra. It is therefore important to always have the spectra with the SDF file! We call "NMR Record" the combination of the spectra and the SDF file.

NMR records

We call "NMR record", a folder (or .zip file including the folder) or a database record including:

1) All the NMR spectra (including FID, acquisition and processing parameters). These data are kept in the format produced by the manufacturer of the instrument that acquired them. This means that software generating the NMR record either has these raw data available or asks the user to point to them so that they can be included in the NMR record.

2) The SDF file including the NMReDATA (.nmredata.sdf file)

NMR record

A more detailed pictorial representation of an NMR record, and an example of an SDF file, can be found in the poster presented in July at Euromar 2017. Note: the NMREDATA tag "SIGNALS" was renamed "ASSIGNMENT" in Version 0.98.

NMR records will be requested by Magnetic Resonance in Chemistry from 2018 on. The software vendors (ACD/Labs, Bruker, cheminfo, Mestrelab) will be ready by the end of 2017 to produce NMR records for submission to MRC.

Records will either be analysed on web pages, or downloaded and the .nmredata.sdf file opened by the software, which will automatically access the associated spectra.

The full description can be found in the NMReDATA tag format page.

An example of .nmredata.sdf files with the spectra can be found here

Current version of the format of NMReDATA

The format can be found here: NMReDATA tag format

A small set of simple examples of .nmredata.sdf files can be found here (Version 0.95). (Note: the field called "NMREDATA_SIGNALS" was renamed "NMREDATA_ASSIGNMENT" in V 0.98). It includes .nmredata.sdf files for ethanol with diverse formats (explicit or implicit hydrogen atoms, etc.)


Changes in Version 1.1

Addition of a backslash at the end of each line of text in the NMReDATA tags. This avoids the problem that the CDK libraries ignore the NewLine character (ASCII 10): since we need a line separator, we use backslash + NewLine in NMReDATA tags. (More details.)
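A Python sketch of writing and reading tag content with this convention. Decoding splits on the backslash, so it still recovers the lines when a library has dropped the NewLine characters. It assumes the content itself contains no other backslashes, and the function names are hypothetical.

```python
def encode_tag_lines(lines):
    """Join tag content lines, ending each one with a backslash."""
    return "\n".join(line + "\\" for line in lines)


def decode_tag_lines(text):
    """Recover the content lines whether or not the NewLines survived:
    split on the backslash and drop the NewLine that may follow it."""
    return [part.strip("\r\n") for part in text.split("\\") if part.strip()]
```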

Changes in Version 2.0

Version 2 is under development. It will probably include the following changes:

  • Possibility to add a 3D structure (on top of the "flat" 2D structure).
  • Possibility to add author and institution information (to facilitate integration of these data in repositories).
  • etc.

Discussion and complete list on future versions of the NMReDATA format