Difference between revisions of "Main Page"

From NMReDATA
(Publications)
 
(270 intermediate revisions by 10 users not shown)
Line 1: Line 1:
= Introduction =
 
The [http://www.nmredata.org NMReDATA working group] decided to include data extracted from NMR spectra of small molecules using "tags" in SDF files.
 
  
== SDF Files ==
+
'''Quick links'''
In their simplest form, SDF files include a molecular structure in the .mol format.
 
  
SDF files offer more possibilities, however: in particular, they allow metadata called "tags" to be added below the ".mol" part of the file.
+
Direct link to the page describing the [[NMReDATA tag format|format of the NMREDATA tags (version 1.0/1.1)]] and [[NMReDATA tag format 2.0|format of the NMREDATA tags (version 2.0)]].
  
For more information on SDF format see ...
+
List of [[compatible software|compatible software and webtools]]
  
== SDF tags ==
+
Tentative instruction for [[Submission NMReDATA|journal submission of NMReDATA]].
An SDF tag always has the following structure:
 
 
 
>  <TAG_NAME>
 
... tag content line 1 ...
 
... tag content line 2 ...
 
... tag content line 3 ...
 
[Empty line to indicate the end of the tag]
 
 
 
For more information on SDF tags and how to read/write them see ...
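The tag structure described above can be read with a few lines of code. The following is only a minimal sketch: the function name `parse_sd_tags` is hypothetical, the parser only handles the first record of a file, and it assumes that a tag ends at the first empty line, as stated above.

```python
def parse_sd_tags(sdf_text):
    """Return {tag_name: [content lines]} for the first record of an SDF text.

    Hypothetical helper, not part of any NMReDATA tool.
    """
    tags = {}
    name = None
    for line in sdf_text.splitlines():
        if line.startswith("$$$$"):      # record separator: stop after first record
            break
        if line.startswith(">"):         # header line such as ">  <TAG_NAME>"
            start = line.find("<")
            end = line.find(">", start + 1)
            if start != -1 and end != -1:
                name = line[start + 1:end]
                tags[name] = []
            continue
        if name is None:
            continue                     # still in the .mol part or between tags
        if line.strip() == "":
            name = None                  # an empty line closes the current tag
        else:
            tags[name].append(line)
    return tags


example = """  (mol part omitted in this sketch)
M  END
>  <TAG_NAME>
... tag content line 1 ...
... tag content line 2 ...

>  <NMREDATA_ORIGIN>
Source=Calculation
method=DFT

$$$$
"""

for tag, lines in parse_sd_tags(example).items():
    print(tag, lines)
```

Returning the content as a list of lines keeps the parser independent of what each NMREDATA tag stores; interpreting the content is left to the caller.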
 
 
 
== SDF tags including nmr data ==
 
The working group decided to use a set of tags to include the NMR data extracted from the NMR spectra in SDF files.
 
 
 
The labels of these tags all start with "NMREDATA_".
 
 
 
[[File:example.pdf]]
 
 
 
[[NMReDATA tag format|Format of the content of the "<NMREDATA_...>" tags]].
 
 
 
Version 1.0 will be decided in September at the "Round table" of the SMASH 2017 conference in Baveno, Italy.
 
 
 
= Comments and changes to version 0.98 =
 
 
 
Concerning the intensity in 2D spectra, it was not a good idea to require the intensity strictly at the coordinates of the peak, because this point may fall in the middle of a doublet (in an HSQC, for example). The requirement was therefore slightly rephrased in this version:
 
 
 
1) For all 2D spectra, the intensity of the spectrum at (or very close to) the coordinates of the correlated peaks should be given when the spectrum is available. If the signal has a shape such that the intensity is zero at its center (phase-sensitive COSY, or the middle of a well-resolved doublet in HSQC, for example), the intensity can be measured at the maximum amplitude of the multiplet. This intensity does not pretend to be “quantitative”. Optional integration of the volume is possible using the "E=" attribute.
 
 
 
= General structure of SDF files =
 
== Structure ==
 
The first part of an SDF file contains the structure (in the .mol format).
 
== Tags ==
 
A number of tags follow. When opening .sdf files, most chemical structure editors ignore the tags, but specialized software can manage them.
 
... to be updated ...
 
[[NMReDATA tag format]]
 
 
 
For all experimental spectra, the last line refers to the spectrum stored in an openly and electronically accessible NMR database. By spectrum, we mean the actual data of the spectrum (“2rr”), but also the acquisition data and the acquisition and processing parameters (“fid/ser”, “acqus”, “procs”, “proc2s”, etc.).
 
Spectrum_ID=HJK33HKJ22342 (mandatory - given by the database where it is stored!)
 
Spectrum_Location=http://... (when NMReDATA and spectra are on database)
 
Spectrum_Location=file:./nmr/10/1/pdata/1 (when the files of the spectra and the NMReDATA are in a folder)
 
To be refined by specialists. But I think we should really make some efforts for this to be done and work!!
 
 
 
= Origin of the sample To be abandoned....=
 
 
 
In phytochemical papers, the origin of the sample should be specified but this cannot be our task to define this – this is not part of the NMReDATA initiative. In principle authors can add any tag provided they have tools to do it and requests from the Journals... such data could have the following form...
 
 
 
=<NATURAL_ORIGIN>=
 
Common_Name_English=flower
 
Genus=Amaranthus %usually the part with a first capital letter
 
Species=retroflexus %usually the part with no initial capital letter
 
Class =
 
Type =
 
 
 
Similarly, for synthesis, the source could be requested :
 
 
 
=<NMREDATA_SYNTHETIC_ORIGIN>=
 
Reaction Reference=http:// link to a database of reactions ??
 
  
 +
Link to the [https://github.com/NMReDATAInitiative GitHub page] of the NMReDATA Initiative.
  
 +
= Introduction =
 +
The [http://www.nmredata.org/partners.html NMReDATA working group] decided to include data extracted from NMR spectra of small molecules in [[sdf files|  SDF files]] using SD tags.
  
 +
Link to the [https://onlinelibrary.wiley.com/doi/abs/10.1002/mrc.4737 MRC article] on the NMReDATA format.
  
= Certification=
+
An important task of the group is to define the format of the content of the "<NMREDATA_...>" tags. [[NMReDATA tag format|More details here!]]
  
When the assignment is made in a computer-assisted manner, the software may add a certification of the validity of the data. It should be up to the manufacturers to encode it so that the certification cannot be forged (using a hash, etc.?).
+
Version 1.0 was decided in September 2017 at the "Round table" of the SMASH 2017 conference in Baveno, Italy.
Certificate tags could be listed at the end of the .sdf file. They can originate from the CASE software, from the database hosting the data and spectra, or from the journal (to state that the data were peer reviewed). They can be cumulated. If the text of the .sdf file needs to be hashed for certification, the list of tags used for hashing could be listed. (I’m not sure what needs to be done to certify the validity of certificates. To be refined by the certificate specialists.)
 
  
>  <NMREDATA_CERTIFICATION>
+
The SDF file alone (that is without the spectra) cannot be used to verify that the assignment corresponds to the spectra. It is therefore important to always have the spectra with the SDF file! We call ''"NMR Record"'' the combination of the spectra and the SDF file.
Software=CMC_assist
 
Author=Bruker
 
Confidence_level=4.6
 
Confidence_level_certificate=”ADFS678AG67DFG6SD5F7GS5DFGSD8F5GSD7FG7”
 
Unique_solution=YES
 
Unique_solution_certificate=”ADFS678AG67DFG6SD5F7GS5DFGSD8F5GSD7FG7”
 
ETC...
 
  
This is only a very vague example. The uniqueness of the proposed structure may be understood in the sense of J.-M. Nuzillard’s LSD tool. Software producers can tell what needs to be done for their format. Multiple certifications can be listed one after the other in the same <CERTIFICATION> tag; each “Software=...” assignment starts a new one.
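As an illustration of hashing a listed set of tags, here is a minimal sketch. Everything in it is an assumption: the function name `hash_tags`, the choice of SHA-256, and the way tag names and content lines are serialized before hashing are not defined anywhere in the format. A plain hash only detects modification; making a certificate impossible to forge would additionally require a digital signature, which is not shown here.

```python
import hashlib


def hash_tags(tags, tag_names):
    """Return a SHA-256 hex digest over the selected tags.

    tags:      dict mapping tag name -> list of content lines
    tag_names: ordered list of the tag names included in the hash
    (Hypothetical helper; the serialization below is an assumption.)
    """
    digest = hashlib.sha256()
    for name in tag_names:
        digest.update(name.encode("utf-8"))
        digest.update(b"\n")
        for line in tags[name]:
            digest.update(line.encode("utf-8"))
            digest.update(b"\n")
    return digest.hexdigest()


tags = {
    "NMREDATA_ORIGIN": ["Source=Calculation", "method=DFT"],
    "NMREDATA_CERTIFICATION": ["Software=CMC_assist", "Author=Bruker"],
}
# Only the tags named in the list contribute to the hash.
print(hash_tags(tags, ["NMREDATA_ORIGIN"]))
```

Listing the hashed tag names explicitly, as suggested above, lets a verifier recompute the digest without guessing which parts of the file were covered.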
+
A 1.1 version was released to fix a problem in 1.0.
= 4 Role(s) and scope of the “assignment records”  =
 
  
The NMR record can be generated from experimental data but also from simulations, predictions, etc. Tools to compare, evaluate, validate, and check the consistency of “assignment records” will certainly be developed. Assignment records can be generated by commercial software, but also by diverse tools analysing NMR data, homemade processing tools, simulation software, etc. This is why it is important to have a data format that includes a maximum of options and is as flexible as possible, even if not all possible uses are clearly defined and used immediately. Ideally, it should be possible to convert the .sdf files into other file formats or spectral descriptions without loss.
+
Version 2.0 adds extensions and was released after the [[Symposium2019|'''1st NMReDATA symposium''']] (Sept. 16, 2019, Porto, Portugal).
  
We should see it as an advantage if databases include multiple "assignment records" associated with the same molecule. Some could be old, originating from incomplete literature data. Others could include errors because they originate from bulk data processed automatically. But eventually a computer could produce a verified and validated record combining all the other data. Aggregated records could be generated by NMR software or databases scoring the available data for consistency, using calculated chemical shifts and spectral simulations. They could refine chemical shifts, couplings, etc.
+
= NMR records =
 +
<!-- [[commented_example_nmredata|Examples of commented and simplified NMREDATA tags ]] -->
  
==Experimental data==
+
We call "NMR record" a folder (or a .zip file containing the folder) or a database record that includes:
  
When the NMR data originate from experimental spectra, they may be quite crude (simple automated integration, peak-picking) or follow complex automated or manual analysis. The data may be partial, incomplete, contain inconsistencies, impossible features, etc.  The content may be diversely complex depending on the origin of the data:
+
1) All the NMR spectra (including FID, acquisition and processing parameters). The format of these data is as produced by the manufacturer of the instrument which acquired the data. That means that software generating the data either has these crude data available or it will ask the user to point to the crude data in order to include them in the ''NMR record''.  
- only 1D 1H NMR data (with or without integration, coupling, etc).
 
- only 1D 13C data (just from a simple peak picking)
 
- only 1D data but for multiple isotopes (from NMRshiftDB ?)
 
- full analysis based on computer-assisted software (such as Bruker CMC-se, ACD/Labs Structure Elucidator or Mestrelab Mnova) or a web platform (cheminfo.org)
 
- 1D and 2D data processed automatically with ambiguities on the signal assignment and partial (for example not all signals are assigned) and/or ambiguous (due to lack of resolution, or other problems)
 
- The file may not contain the actual assignment, only the structure and the list of chemical shifts (the assignment could be added by NMR tools).
 
- The data may come from a scientific report, i.e. the text describing the spectra. It could be like the text of the following figure
 
( from http://onlinelibrary.wiley.com/doi/10.1002/mrc.4527/full).
 
 
Scripts could be written to convert such a "pure text" description into an .sdf file and include the .mol file.

- For assignment work made with only "paper and pencil", a simple webtool for drawing a molecule and entering lists of signal names and 2D correlations could easily be made. We could consider accepting .pdf files or pictures of the spectra when the original files no longer exist.
 
  
Synthetic/predicted data
+
2) The SDF file(s) including the NMReDATA (''.nmredata.sdf'' file)
  
The NMR data may originate from DFT calculations or any other type of predictor of chemical shifts, and/or coupling. In such a case, a general tag is added to provide information about the software. For example:
 
  
>  <NMREDATA_ORIGIN>
+
[[File:nmr_record.png|600px|center|NMR record]]
Source=Calculation
+
A more detailed picture (see [[File:test-2.pdf|600px|center|pictorial representation of NMR record and example of SDF file]]) was shown in the [http://nmredata.org/euromar_2017_v5_optimized.pdf poster] presented in July at Euromar 2017. '''Note:''' The NMREDATA tag "SIGNALS" was renamed "ASSIGNMENT" in Version 0.98.
method=DFT
 
Geometry=method/basis set
 
Shielding=method_basis set
 
Coupling=method_basis set
 
Software=...
 
Version=...
 
  
==Literature data==
 
  
When the NMR data originate from publications, a reference to the published paper/book/thesis is given in the NMREDATA_LITERATURE tag.
+
NMR records will be [http://onlinelibrary.wiley.com/doi/10.1002/mrc.4631/full requested by ''Magnetic Resonance in Chemistry''] from 2018 on. The software producers (ACD/Labs, Bruker, cheminfo, Mestrelab) are able to generate NMR records.
  
>  <NMREDATA_LITERATURE>% This was renamed from NMREDATA_ORIGIN
+
Records will be either analysed on web pages, or downloaded, with the nmredata.sdf file opened by software that will automatically access the associated spectra.
Source=Journal
 
DOI=DOI_HERE (if Reference field is DOI specify it here)
 
CompoundNumber=label used in the reference to designate the compound (typically a number in boldface)
 
  
>  <NMREDATA_LITERATURE>%
+
The full description can be found [[NMReDATA tag format|in the NMReDATA tag format page]].
Source=Book
 
ISBN=ISBN_HERE (if Reference field is ISBN specify it here)
 
CompoundNumber=label used in the reference to designate the compound (typically a number in boldface)
 
  
>  <NMREDATA_LITERATURE>%
+
An example ('''probably obsolete''') of .nmredata.sdf files with the spectra can be found [https://www.dropbox.com/sh/hu0qudy2bt56ix0/AACc8UiUoeEskSDVhYnP-cZna?dl=0 here]
Source=Thesis
 
Thesis=HTML link here (if available; if not: "LastName, Firstname(s), institution providing the degree, city, country, year of publication")
 
CompoundNumber=label used in the reference to designate the compound (typically a number in boldface)
 
  
For revised/update data
+
A better source is the [https://github.com/NMReDATAInitiative/Examples-of-NMR-records GitHub example page] on the [https://github.com/nmredatainitiative GitHub page of the Initiative].
  
Assignment records may be generated after revision of experimental, literature or prediction data, etc. Ideally, the original .sdf files should also be generated to facilitate comparison, or should exist somewhere and be referred to. In both cases a reference should be given.
+
== Mixtures of compounds ==
 +
When more than one compound is present in a sample, multiple .sdf files are produced and named compound1.nmredata.sdf, compound2.nmredata.sdf, etc.
 +
= Current version of the format of NMReDATA =
 +
The format can be found here: [[NMReDATA tag format]]
  
>  <NMREDATA_UPDATE>
+
Various examples of NMReDATA files can be found in the [https://github.com/NMReDATAInitiative/Examples-of-NMR-records github repository].
Source=Record
+
= Database =
Record_number=ref_to_the_original_record (multiple references are allowed for aggregation of records, separated by “,”).
+
The NMReDATA Initiative defines the format and the rules for hosting and providing data, but hosting data is not part of our mission.
Date =date.... standard format for date
 
Correction="fixed assignments of C(13) and C(15)"
 
This is also to be refined according to future developments.
 
 
  
= Problems related to symmetry =
+
Discussion [[database_policy | about database providers ]]
  
This section is tentative... to be worked on in the future....
+
=  Versions =
 +
==Changes to Versions 1.1 ==
  
For symmetrical molecules a difficulty arises to code coupling and 2D correlations.
+
*[[end-of-line | Addition of backslash]] at the end of the line of NMReDATA tags.
1) Problem for coupling: For the 1H spectrum of 1,2-dichlorobenzene, we have two signals (two different protons in an AA’XX’ system), so if the SDF file includes two signals (one for A and one for X), in principle one can only give one coupling: J(A,X). But we should be able to give the other coupling constants involving the primed protons. When only one symmetry property is present, it may not be too difficult to include in the format a way to provide pairs of couplings instead of only one, but with more than one symmetry, it would become complicated…
 
2) Problem for correlations. Consider 1,4-dichlorobenzene: a 3J(C,H) HMBC correlation will be visible between a proton and what seems to be its directly bound carbon, because the carbons one bond (1J) and three bonds (3J) away from a proton are related by symmetry.
 
  
We have three different possibilities:
+
==Changes to Versions 2.0 ==
1) Ignore the problem. It may not be so serious in fact. In systems with magnetically non-equivalent spins, the coupling structures are complex and the couplings will probably not be measured in routine experiments. The HMBC correlation will be ambiguous, and it will be up to the person or software checking consistency to handle it: when signals correspond to more than one proton or more than one carbon, it suffices that one of the possible combinations of Hortho, H’ortho, Cortho and C’ortho corresponds to 3JCH for the check to pass, even if it also passes a check for 1JCH.
 
2) Duplicate all signals (or the subset with symmetry). If we list two signals (with two different labels) for Hortho and H’ortho, then there is no longer a problem with couplings (one can give both J(A,X) and J(A,X’)) or with ambiguous correlations (they will be OK). But the problem may be that for any chemical shift there will always be two labels/spins, which may confuse the assignment software. The complications may be worse than the problem.
 
3) Try to face the problem and develop a serious method to include symmetry… Could be the object of future work.
 
5 TAG names for spectra
 
= structure of tags =
 
The structure of the name of the SD tag of spectra is constructed as follows. It describes the pulse sequence.
 
1) The number of dimensions is given (e.g. “2D_...”)
 
2) Follows, the isotope of the first indirect dimension (e.g. “..._13C_...”)
 
3) Follows the code of the mixing to the next dimension (e.g. “..._1J_...”).
 
4) Finally the detected isotope is given. (e.g. “..._1H”).
 
The tag of the HSQC is therefore “<NMREDATA_2D_13C_1J_1H>”.
 
Mixing can be:
 
1J for one-bond J (typically HSQC)

NJ for multiple-bond J (COSY, HMBC)

TJ for TOCSY
 
etc. (see list below for more details).
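The four construction steps above can be sketched in code. The function name `spectrum_tag_2d` is hypothetical, and the sketch only covers 2D experiments with a single mixing step. Because the list of mixing codes is open-ended ("etc."), the code is not validated.

```python
def spectrum_tag_2d(indirect_isotope, mixing, detected_isotope):
    """Build the SD tag name of a 2D spectrum from its pulse-sequence description.

    indirect_isotope: isotope of the indirect dimension, e.g. "13C"
    mixing:           mixing code to the next dimension, e.g. "1J", "NJ", "TJ"
    detected_isotope: detected isotope, e.g. "1H"
    (Hypothetical helper following the rules listed above.)
    """
    parts = ["NMREDATA", "2D", indirect_isotope, mixing, detected_isotope]
    return "_".join(parts)


print(spectrum_tag_2d("13C", "1J", "1H"))  # HSQC
print(spectrum_tag_2d("13C", "NJ", "1H"))  # HMBC
print(spectrum_tag_2d("1H", "NJ", "1H"))   # COSY
```

The first call reproduces the HSQC tag name given above; the other two simply apply the same rule with the NJ mixing code.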
 
  
For J-resolved and related experiments (DIAG, δ-resolved) where the indirect dimension is not a chemical shift (no correlation present), only the detected isotope is given (<2D_1H>). The spectrum is described as a 1D 1H spectrum (providing chemical shift, couplings, etc.).
+
*Possibility to include Jcamp spectra on top of original data from the instrument.
 +
*Possibility to add a 3D structure (on top of the "flat" 2D structure).
 +
*Possibility to add author and institution information (to facilitate integration of these metadata in repositories).
  
All tags have “NMREDATA_” before the tag names listed below.
+
For details, see [[NMReDATA_tag_format_2.0|the specification]].
 
  
P.S. Some names are somewhat tentative. We don’t necessarily mean to define 3D spectra here or projections of 3D to 2D (HSQC-TOCSY). The list is mostly to test the ability of the format to list as many experiments as possible with the same logic.
+
==Changes to Versions 2.1 (future version)==
 +
<span style="color:#FF8C00">  ''Future version of the format! Suggestions are welcome! '' </span>
 +
*Possibility to include [[Jassign|J-coupling assignment data]] (coupling network and J-graph data).
  
== New section ==
+
Discussion and complete list on [[Future version| future versions of the NMReDATA format ]]
'''MediaWiki has been successfully installed.'''
 
  
Consult the [http://meta.wikimedia.org/wiki/Help:Contents User's Guide] for information on using the wiki software.
+
= Jmol integration in this wiki =
 +
NMReDATA is being integrated with [[Jmol |  Jmol/JSpecView ]]
  
== Getting started ==
+
= Publications =
* [http://www.mediawiki.org/wiki/Manual:Configuration_settings Configuration settings list]
+
* NMReDATA, a standard to report the NMR assignment and parameters of organic compounds. M. Pupier et al. (2018) Magnetic Resonance in Chemistry 56:703-715. doi: [https://doi.org/10.1002/mrc.4737 10.1002/mrc.4737]  PMID: 29656574
* [http://www.mediawiki.org/wiki/Manual:FAQ MediaWiki FAQ]
+
* NMReDATA: Tools and applications. S. Kuhn et al. (2021) Magnetic Resonance in Chemistry 59:792-803. doi: [https://doi.org/10.1002/mrc.5146 10.1002/mrc.5146] PMID: 33729627
* [https://lists.wikimedia.org/mailman/listinfo/mediawiki-announce MediaWiki release mailing list]
 

Latest revision as of 21:42, 17 February 2024

Quick links

Direct link to the page describing the format of the NMREDATA tags (version 1.0/1.1) and format of the NMREDATA tags (version 2.0).

List of compatible software and webtools

Tentative instruction for journal submission of NMReDATA.

Link to the GitHub page of the NMReDATA Initiative.

Introduction

The NMReDATA working group decided to include data extracted from NMR spectra of small molecules in SDF files using SD tags.

Link to the MRC article on the NMReDATA format.

An important task of the group is to define the format of the content of the "<NMREDATA_...>" tags. More details here!

Version 1.0 was decided in September 2017 at the "Round table" of the SMASH 2017 conference in Baveno, Italy.

The SDF file alone (that is without the spectra) cannot be used to verify that the assignment corresponds to the spectra. It is therefore important to always have the spectra with the SDF file! We call "NMR Record" the combination of the spectra and the SDF file.

A 1.1 version was released to fix a problem in 1.0.

Version 2.0 adds extensions and was released after the 1st NMReDATA symposium (Sept. 16, 2019, Porto, Portugal).

NMR records

We call "NMR record" a folder (or a .zip file containing the folder) or a database record that includes:

1) All the NMR spectra (including FID, acquisition and processing parameters). The format of these data is as produced by the manufacturer of the instrument which acquired the data. That means that software generating the data either has these crude data available or it will ask the user to point to the crude data in order to include them in the NMR record.

2) The SDF file(s) including the NMReDATA (.nmredata.sdf file)
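A quick check that a zipped NMR record contains at least one NMReDATA file could look like the sketch below. The function name `find_nmredata_files` is hypothetical; the only convention it relies on is the ''.nmredata.sdf'' file extension mentioned above. It does not verify that the spectra are present.

```python
import zipfile


def find_nmredata_files(zip_source):
    """List the .nmredata.sdf files found in a zipped NMR record.

    zip_source may be a path or a binary file-like object.
    (Hypothetical helper, not part of any NMReDATA tool.)
    """
    with zipfile.ZipFile(zip_source) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.endswith(".nmredata.sdf")
        )


# Usage, assuming a file called record.zip exists:
# sdf_files = find_nmredata_files("record.zip")
# if not sdf_files:
#     print("No .nmredata.sdf file found: this is not a complete NMR record")
```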


NMR record

A more detailed picture (see File:Test-2.pdf) was shown in the poster presented in July at Euromar 2017. Note: The NMREDATA tag "SIGNALS" was renamed "ASSIGNMENT" in Version 0.98.


NMR records will be requested by Magnetic Resonance in Chemistry from 2018 on. The software producers (ACD/Labs, Bruker, cheminfo, Mestrelab) are able to generate NMR records.

Records will be either analysed on web pages, or downloaded, with the nmredata.sdf file opened by software that will automatically access the associated spectra.

The full description can be found in the NMReDATA tag format page.

An example (probably obsolete) of .nmredata.sdf files with the spectra can be found here

A better source is the GitHub example page on the GitHub page of the Initiative.

Mixtures of compounds

When more than one compound is present in a sample, multiple .sdf files are produced and named compound1.nmredata.sdf, compound2.nmredata.sdf, etc.
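The naming scheme for mixtures can be generated mechanically. In this short sketch the function name `mixture_file_names` is hypothetical; only the file name pattern comes from the text above.

```python
def mixture_file_names(number_of_compounds):
    """Return the .nmredata.sdf file names for a sample with several compounds."""
    return [
        f"compound{index}.nmredata.sdf"
        for index in range(1, number_of_compounds + 1)
    ]


print(mixture_file_names(3))
```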

Current version of the format of NMReDATA

The format can be found here: NMReDATA tag format

Various examples of NMReDATA files can be found in the github repository.

Database

The NMReDATA Initiative defines the format and the rules for hosting and providing data, but hosting data is not part of our mission.

Discussion about database providers

Versions

Changes to Versions 1.1

  • Addition of backslash at the end of the line of NMReDATA tags.

Changes to Versions 2.0

  • Possibility to include Jcamp spectra on top of original data from the instrument.
  • Possibility to add a 3D structure (on top of the "flat" 2D structure).
  • Possibility to add author and institution information (to facilitate integration of these metadata in repositories).

For details, see the specification.

Changes to Versions 2.1 (future version)

Future version of the format! Suggestions are welcome!

Discussion and complete list on future versions of the NMReDATA format

Jmol integration in this wiki

NMReDATA is being integrated with Jmol/JSpecView

Publications

  • NMReDATA, a standard to report the NMR assignment and parameters of organic compounds. M. Pupier et al. (2018) Magnetic Resonance in Chemistry 56:703-715. doi: 10.1002/mrc.4737 PMID: 29656574
  • NMReDATA: Tools and applications. S. Kuhn et al. (2021) Magnetic Resonance in Chemistry 59:792-803. doi: 10.1002/mrc.5146 PMID: 33729627