Parser JSmol

From NMReDATA
Revision as of 15:38, 12 May 2019 by Angel.herraez (parsing J from NMREDATA_1D_1H)

Parser for NMReDATA zipfiles in a web page using JSmol

This page copies the contents of Parser and portions of NMReDATA tag format, with added annotations of what is implemented in the NMReDATA file reader, an HTML application that uses JSmol.

Reading the NMReDATA zipfile

Data-parser-ok.png The content of the input is kept in memory so that it can be written later.

Data-parser-ok.png Files contained in the zipfile are listed in a folder and file tree in the left column, sorted alphabetically. The tree is interactive, so that the contents of those files can be displayed.

Reading the SDF file

Several models

The SDF file may contain more than one structure (molblock), each with its own tags. We expect the NMReDATA tags to be included only in the first model, but it might be otherwise.

Data-parser-ok.png All tags for all models are displayed in the central section, first the group of NMReDATA tags and then the group with all other tags. If needed, a number is added in front to identify the model number.

The first model will usually be 2D. A 3D structure is preferable, but it will not always be available.

We expect that a second model might be the 3D structure.

Data-parser-ok.png All structure models are transferred to the JSmol panel and displayed there one at a time. Buttons are added below the panel to switch the display from one model to another. The 2D or 3D nature is detected using two methods: the dimensional code in line 2 of the molblock (characters 21-22, according to the MOL/SDF V2000 format specification) and the Z coordinates of all atoms. The code takes precedence, but any discrepancy between the two raises a warning.
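The two detection methods can be sketched as follows. This is only an illustration in Python, not the code of the JSmol application; the function name detect_dimension and the tolerance used to decide that a Z coordinate is zero are assumptions. The column positions follow the V2000 fixed-width layout (dimensional code in characters 21-22 of line 2, atom count in characters 1-3 of line 4, Z coordinate in characters 21-30 of each atom line).

```python
def detect_dimension(molblock: str):
    """Return (dimension, warning) for one V2000 molblock (hypothetical helper)."""
    lines = molblock.splitlines()

    # Method 1: dimensional code, characters 21-22 (1-based) of header line 2.
    code = lines[1][20:22].upper()
    from_code = code if code in ("2D", "3D") else None

    # Method 2: Z coordinates. The counts line (line 4) gives the number of
    # atoms in characters 1-3; each atom line holds z in characters 21-30.
    n_atoms = int(lines[3][0:3])
    zs = [float(line[20:30]) for line in lines[4:4 + n_atoms]]
    from_coords = "3D" if any(abs(z) > 1e-6 for z in zs) else "2D"

    # The header code takes precedence; a discrepancy only raises a warning.
    dimension = from_code or from_coords
    warning = None
    if from_code and from_code != from_coords:
        warning = f"header says {from_code} but Z coordinates suggest {from_coords}"
    return dimension, warning
```

A planar molecule stored with real 3D coordinates would be classified as 2D by the coordinate test alone, which is one reason to let the header code take precedence.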

It is important to keep any stereochemical information. Even in 2D data the atoms may carry extra information; e.g. for explicit hydrogen atoms the chirality is given by the fourth column of the bond list in the molblock (more precisely, characters 10 to 12, according to the MOL/SDF V2000 format specification).

Data-parser-ok.png Explicit H atoms with stereo information in their bond are coloured light salmon (bisque, bond up) or light blue (powderBlue, bond down) in the JSmol structure. They are also moved 0.4 Å up or down along the Z axis for better viewing.

Tags

The SDF files may contain diverse tags (not only the NMREDATA tags). Be ready for the tags to appear in any order.

Data-parser-ok.png They are accepted in any order and are displayed as they arrive, except that NMREDATA tags are grouped first. (We might sort them if needed.)

All tags should be written to the output SDF file. We recommend writing the tags in the same order in which they were read.

Determine how to read the NMREDATA tags

Scan the tags and list the indices of those that include NMREDATA_ in their name. Then read the NMReDATA tags.

Data-parser-ok.png Both NMReDATA tags and any others are displayed, with their contents.
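A minimal sketch of scanning the tags of one SDF record and grouping the NMREDATA ones first, written in Python for illustration only (the reader itself is an HTML application). The names read_tags and nmredata_first are hypothetical. The sketch assumes the usual SDF layout: a header line starting with ">" that carries the tag name in angle brackets, value lines, and a blank line closing the tag.

```python
import re

# Header of an SDF data item, e.g. ">  <NMREDATA_SOLVENT>".
_HEADER = re.compile(r"^>.*?<([^<>]+)>")

def read_tags(record: str):
    """Return the tags of one SDF record as (name, value) pairs, in file order."""
    tags, name, value = [], None, []
    for line in record.splitlines():
        header = _HEADER.match(line)
        if header:
            if name is not None:              # previous tag had no closing blank line
                tags.append((name, "\n".join(value)))
            name, value = header.group(1), []
        elif line.strip() == "$$$$":          # end of this record
            break
        elif name is not None:
            if line.strip() == "":            # blank line closes the current tag
                tags.append((name, "\n".join(value)))
                name = None
            else:
                value.append(line)
    if name is not None:
        tags.append((name, "\n".join(value)))
    return tags

def nmredata_first(tags):
    """Group NMREDATA tags first; the stable sort keeps arrival order within each group."""
    return sorted(tags, key=lambda tag: "NMREDATA_" not in tag[0])
```

Keeping the original list from read_tags untouched and sorting only a copy for display makes it easy to write the tags back in the order in which they were read.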

Data-parser-ok.png Keep in mind the end of line problem.

  1. If a backslash+<EOL> pair is present, it is replaced with <EOL> for display.
  2. If only a backslash is present (without <EOL>), it is replaced with <EOL> for display.
  3. When saving, a backslash will have to be added before every <EOL>.
  4. There seems to be no need to check the value of VERSION (although it is possible).
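Rules 1 to 3 above could be sketched like this in Python. The function names for_display and for_saving are hypothetical. One point goes beyond the rules as stated and is an assumption of this sketch: quoted labels such as <"Ha\Hb"> may contain a backslash that is not a line terminator, so text inside <"..."> is passed through unchanged.

```python
import re

# Either a quoted label (left untouched) or a backslash with an optional EOL.
_TOKEN = re.compile(r'<"[^"]*">|\\(?:\r?\n)?')

def for_display(value: str) -> str:
    """Rules 1 and 2: backslash+EOL, or a lone backslash, becomes a plain EOL."""
    return _TOKEN.sub(
        lambda m: m.group(0) if m.group(0).startswith("<") else "\n", value
    )

def for_saving(value: str) -> str:
    """Rule 3: add a backslash before every EOL (expects text in display form)."""
    return re.sub(r"\r?\n", lambda m: "\\" + m.group(0), value)
```

for_saving is meant to be applied to text that has already gone through for_display; applying it to text that still carries backslashes would double them.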

Read the NMREDATA tags

Many simple tags have no particular format. (NMREDATA_SOLVENT, NMREDATA_VERSION, etc.)

But the "complex" tags (NMREDATA_ASSIGNMENT, NMREDATA_J, NMREDATA_1D_1H, etc.) share a common general structure:

Two types of lines should be distinguished:

  • Property lines
  • Items from a list

The "property lines" contain a series of characters (letters) followed by the "=" sign, followed by the value of the variable. They can be used directly as the name and value of object properties. The value should be read as text, because it has to be rewritten later when writing the .sdf file.

The "property lines" should be identified as such to distinguish them from the elements of a list. Property lines should be located before the list, but some may follow the list (not recommended, but possible).

IMPORTANT: Note that a property may appear more than once. In this case, it should be read into an array (and be written in the same order when writing the file).
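As an illustration, a single test for property lines might look like the Python sketch below. The name split_tag and the exact pattern for a property name (a letter followed by letters, digits or underscores, immediately followed by "=") are assumptions; the text above only says "a series of characters (letters)". Values are kept as text, and every property is stored as a list so that repeated properties keep their order.

```python
import re

# Assumed shape of a property line: name, "=", then the value as free text.
_PROPERTY = re.compile(r"^([A-Za-z][A-Za-z0-9_]*)=(.*)$")

def split_tag(lines):
    """Split the lines of one tag into properties and list items.

    Returns (properties, items): properties maps each name to the list of
    its values in file order; items holds the remaining non-empty lines.
    """
    properties, items = {}, []
    for line in lines:
        match = _PROPERTY.match(line.strip())
        if match:
            # Values stay as text so they can be written back unchanged.
            properties.setdefault(match.group(1), []).append(match.group(2))
        elif line.strip():
            items.append(line)
    return properties, items
```

A list item such as "4.1823, S=dd, N=1, L=a" also contains "=" signs, but it is not taken for a property line because it does not start with a name directly followed by "=".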

The "items from a list" have a format that depends on the tag. There are four types of tags:

  • NMREDATA_ASSIGNMENT
  • NMREDATA_J
  • NMREDATA_1D_ ...
  • NMREDATA_2D_ ...

For NMREDATA_1D_ and NMREDATA_2D_ see 1D_attributes and 2D_attributes.

We recommend storing the list as an array of lines (each an array of characters) and analysing it later. We suggest that, when analysing a tag, each line is tested to see whether it is a "property line" (see the format above). If it is not a property line, it is an item from a list.

This allows a single function to test for property lines in all tags.

Analyse the individual NMR tag

The data extracted will vary from program to program, but all properties (whether they are understood or not) should be sent to the output when writing the file. They do not need to be analysed. The same applies to the properties of peaks: they should all be stored (even if not understood) so that they can be written later.

<NMREDATA_ASSIGNMENT>

Data-parser-ok.png The list of assignments is read and displayed on the structure using labels on each referenced atom. Labels for implicit hydrogens are added to the label of the heavy atom, as 2nd and 3rd lines.

Data-parser-ok.png Labels are enclosed in <""> when they include "," "/" "\" "|" or "&" characters. All these have been tested and are properly working:

a
H-C(1)
Proton_1
H'
H
Ha or Hb
<"Ha,Hb">
<"Ha/Hb">
<"Ha\Hb">
<"Ha|Hb">
<"Ha&Hb">
<"Ha & Hb">
<"Ha,Hb,Hc">
<"Ha/Hb/Hc">
<"Ha\Hb\Hc">
<"Ha|Hb|Hc">
<"Ha&Hb&Hc">
<"Ha & Hb & Hc">
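The quoting rule tested with the labels above can be sketched in Python as follows. The helper names are hypothetical, and the assumption that the fields of an assignment line are separated by commas, with the label first, is made only for this illustration.

```python
import re

# Characters that force a label to be enclosed in <"...">.
_SPECIAL = set(',/\\|&')

def quote_label(label: str) -> str:
    """Enclose the label in <"..."> when it contains a special character."""
    return f'<"{label}">' if _SPECIAL & set(label) else label

def unquote_label(token: str) -> str:
    """Remove the <"..."> wrapper, if present."""
    if token.startswith('<"') and token.endswith('">'):
        return token[2:-2]
    return token

# A field is either a quoted label (which may contain commas) or a run of
# characters other than commas.
_FIELD = re.compile(r'\s*(<"[^"]*">|[^,]+)')

def split_fields(line: str):
    """Split a comma-separated line without breaking quoted labels."""
    return [unquote_label(field.strip()) for field in _FIELD.findall(line)]
```

For example, split_fields applied to a line beginning with <"Ha,Hb"> keeps "Ha,Hb" as one field instead of splitting it at the inner comma.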

Data-parser.png The assignment may be vague: interchangeable or ambiguous assignments are marked with the keyword "Interchangeable=".

  • Pre-implemented: it is not displayed, but it is detected and a warning alert is displayed (with a count).

Data-parser.png Equivalent spins are marked with the keyword "Equivalent=".

  • Pre-implemented: it is not displayed, but it is detected and a warning alert is displayed (with a count).

<NMREDATA_J> (couplings)

Data-parser.png This tag includes atom labels as defined in <NMREDATA_ASSIGNMENT>. Proposal:

  • Display couplings on the structure using a line that connects the two atoms involved.
    • Problem: when implicit hydrogens are involved in the coupling, the line may go to the respective heavy atom, but it needs to be differentiated.
    • Problem: the heavy atom may not have a label in the Assignment tag.
    • Problem: too crowded.
  • Show the value of the J constant as a text label near the line.
  • If present, display the "number of bonds" using different colours for the lines and labels.


<NMREDATA_1D_1H>

Spectrum data; this tag also contains atom labels and couplings.

Data-parser.png

  • If there is content about couplings in the NMREDATA_J tag, should we ignore the coupling information inside the 1D_1H tag?
  • If the NMREDATA_J tag is empty (unassigned couplings),
    • and the coupling information in 1D_1H does not include atom labels, what to display?
    example: 4.1823, S=dd, N=1, L=a J=9.30,4.80;
    • and the coupling information in 1D_1H includes atom labels, try to display the coupled pairs of atoms on the structure.
    example: 4.1823, S=dd, N=1, L=a J=9.30(b), 4.80(c);
  • How to manage the display of couplings when there are several spectrum tags? (e.g. 1D_1H, 1D_13C, 2D_13C_1J_1H)
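To make the two example lines above concrete, here is a hedged Python sketch that extracts the J values, and the coupled atom labels when present, from one peak line of the 1D_1H tag. The function name couplings is hypothetical, and the patterns were written only against the two examples shown above, not against the full tag specification.

```python
import re

# The J field runs from "J=" up to the next ", Name=" attribute, a ";" or the end.
_J_FIELD = re.compile(r"\bJ=(.*?)(?=,\s*[A-Za-z]\w*=|;|$)")
# One coupling: a number, optionally followed by an atom label in parentheses.
_COUPLING = re.compile(r"([-+]?\d+(?:\.\d+)?)\s*(?:\(([^)]*)\))?")

def couplings(peak_line: str):
    """Return a list of (J value, label or None) found in one peak line."""
    field = _J_FIELD.search(peak_line)
    if not field:
        return []
    return [
        (float(value), label or None)
        for value, label in _COUPLING.findall(field.group(1))
    ]
```

A line whose couplings all come back with None as label corresponds to the first case above (nothing to connect on the structure), while labelled couplings give the pairs of atoms to display.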

<NMREDATA_ALATIS> (optional but desired)

This tag holds the ALATIS code of the compound. It should be included whenever possible.

Data-parser.png Pre-implemented: access the ALATIS server, send the 2D SDF moldata and retrieve the 3D structure, the ALATIS Key code and the InChI. These might be stored by adding them to new tags in the file.