Difference between revisions of "Parser"

From NMReDATA

Revision as of 09:26, 2 May 2019

Reading the SDF file

Libraries for reading SDF files exist in various languages. This section is only relevant if you write your own reader/writer.

When reading, we recommend keeping in mind that an SDF file will have to be written at some point later. We recommend writing the tags back in the same order, but do not expect all writers to do so: be ready for the tags to appear in any order.

If not too large, the content of the input should be kept in memory so that it can be written later. Since SDF files may contain various tags (not only the NMREDATA tags), they should all be written to the output SDF file.

We recommend the following steps:

  • Open the SDF file
  • Read/store the molblock as a string of characters
  • Read/store the tags as strings of characters
  • Close the file
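The steps above can be sketched in Python as follows. This is only a minimal illustration, not a reference implementation: the function names `read_sdf_file` and `split_sdf_record` are hypothetical, and the sketch assumes a single-record file in which the molblock ends with an "M  END" line, each tag starts with a "> <NAME>" header line, and a blank line closes each tag.

```python
def split_sdf_record(text):
    """Split one SDF record into (molblock, tags), both kept as raw strings."""
    lines = text.split("\n")
    # Assumption: the molblock ends with the "M  END" line.
    end = next(i for i, line in enumerate(lines) if line.startswith("M  END"))
    molblock = "\n".join(lines[: end + 1])

    tags = {}    # a dict keeps the order in which the tags appear
    name = None  # name of the tag currently being read, if any
    for line in lines[end + 1:]:
        if line.startswith("$$$$"):      # end of the record
            break
        if line.startswith(">"):         # header line such as "> <NMREDATA_VERSION>"
            name = line[line.index("<") + 1 : line.rindex(">")]
            tags[name] = []
        elif name is not None:
            if line == "":               # a blank line closes the current tag
                name = None
            else:
                tags[name].append(line)
    # Store every tag as one string; the analysis happens later.
    return molblock, {key: "\n".join(value) for key, value in tags.items()}


def read_sdf_file(path):
    """Open the file, read everything into memory, close it, then split."""
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    return split_sdf_record(text)
```

Keeping every tag (not only the NMREDATA ones) as an untouched string makes it possible to write them all back later, whether or not they were understood.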

Analyse/check the molblock if needed (see possible object structure of NMReDATA).

Determine how to read the NMREDATA tags

Scan the tags and list the indices of those whose name includes NMREDATA_.
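As a small illustration, the scan can be a single pass over the tag names. The function name `nmredata_tag_indices` is hypothetical, and the tag names are assumed to be available as a list in file order.

```python
def nmredata_tag_indices(tag_names):
    """Return the positions of the tags whose name includes "NMREDATA_"."""
    return [i for i, name in enumerate(tag_names) if "NMREDATA_" in name]
```

For example, `nmredata_tag_indices(["OTHER", "NMREDATA_VERSION", "NMREDATA_1H"])` gives `[1, 2]`.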

Read the NMReDATA tags (see below), keeping in mind the end-of-line problem.

First read and analyse the NMREDATA_VERSION tag to

  • determine what character should be ignored (ASCII 10, except for version 1)
  • determine the line separator
    • For version 1: ASCII 10. Analyse the tag directly (hoping there is no missing/spurious ASCII 10 in the text).
    • For version >1: "\" + ASCII 10. Before analysing the text of the tag, ignore ASCII 10 and replace "\" with ASCII 10.
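The version-dependent line handling can be sketched as below. The names `parse_version` and `split_tag_lines` are hypothetical, and the sketch assumes that the value of NMREDATA_VERSION is a plain decimal number (such as 1.0 or 1.1), possibly followed by a "\"; other version formats would need different parsing.

```python
def parse_version(raw):
    """Turn the raw NMREDATA_VERSION text into a number (assumed decimal)."""
    return float(raw.strip().rstrip("\\").strip())


def split_tag_lines(raw, version):
    """Return the logical lines of a tag according to the format version."""
    if version > 1:
        # Version >1: ignore ASCII 10, then replace "\" with ASCII 10.
        raw = raw.replace("\n", "").replace("\\", "\n")
    # Version 1: ASCII 10 is already the separator, so the text is split as is.
    lines = raw.split("\n")
    # Drop the empty strings left by a trailing separator.
    while lines and lines[-1] == "":
        lines.pop()
    return lines
```

With this approach a stray ASCII 10 inside a version >1 tag is harmless, because only the "\" characters decide where the lines end.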

Read the NMREDATA tags

Many simple tags (NMREDATA_SOLVENT, NMREDATA_VERSION, etc.) have no particular format.

But most "complex" tags (NMREDATA_ASSIGNMENT, NMREDATA_J, NMREDATA_1H, etc.) share a common general structure.

Two types of lines should be distinguished:

  • Property lines
  • Items of a list

The "property lines" contain a series of characters (letters) followed by the "=" sign, followed by the value of the variable.

The "property lines" should be identified as such to distinguish them from the elements of a list. Property lines should be located before the list, but some may follow the list (not recommended, but possible).

Note that a property may appear more than once. Example:

Author=John
Author=Paul

In this case it should be stored as an array.

We recommend storing the list as an array of arrays of characters and analysing it later.

This allows a single function to read many types of tags.
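A generic reader along these lines might look like the sketch below. The name `parse_tag` and the regular expression are assumptions made for illustration: a property line is taken to be a run of letters (underscores are also accepted here) directly followed by "=", and every other non-empty line is kept as a raw list item. Each property is always stored as an array, so that repeated properties such as the two Author lines need no special case.

```python
import re

# Assumption: a property name consists of letters and underscores only.
PROPERTY = re.compile(r"^([A-Za-z_]+)=(.*)$")


def parse_tag(lines):
    """Separate the lines of a tag into properties and raw list items."""
    properties = {}  # property name -> list of values, in order of appearance
    items = []       # list items kept as strings, to be analysed later
    for line in lines:
        if not line.strip():
            continue
        match = PROPERTY.match(line)
        if match:
            # Works whether the property comes before or after the list.
            properties.setdefault(match.group(1), []).append(match.group(2))
        else:
            items.append(line)
    return properties, items
```

Because the items are not interpreted at this stage, the same function can be applied to any tag sharing this general structure; the tag-specific analysis comes afterwards.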

Analyse the individual NMR tag

What data is extracted depends on the program, but all properties (whether they are understood or not) should be sent to the output when writing the file. They do not need to be analysed. The same applies to the properties of peaks: they should all be stored (even if not understood) so that they can be written later.
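One way to make sure nothing is lost is to write every stored tag back from its raw string, in the order in which it was read. The sketch below is only an illustration: the function name `format_sdf_record` is hypothetical, and it assumes the tags were kept as a mapping from tag name to raw text, with a blank line closing each tag and "$$$$" closing the record.

```python
def format_sdf_record(molblock, tags):
    """Rebuild the text of one SDF record from the stored raw strings."""
    parts = [molblock]
    for name, value in tags.items():   # same order as when the tags were stored
        parts.append(f">  <{name}>")   # tag header line
        parts.append(value)            # raw content, understood or not
        parts.append("")               # blank line closing the tag
    parts.append("$$$$")               # end of the record
    return "\n".join(parts) + "\n"
```

Tags that the program modified would be regenerated before this step; all the others pass through unchanged.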