InChiKey insertion technique for compound-specific and any-compound proximity search

presentation · 8 years ago
by Stephen Boyer (IBM Research)
JChem Base Naming

Combining text analytics with name-to-structure conversion for reading and processing molecular structures lets researchers build large databases of structures and derive important relationships that were previously inaccessible, a capability important to discovery and innovation. Our previous work used this approach to produce SMILES strings representing chemical structures, which served as input for subsequent applications and made the scientific and patent literature searchable by structure and substructure programs. We now report the additional ability to detect, normalize, and replace chemical names in documents with InChIKeys, and then to index the combined text and embedded InChIs using Solr, a Lucene-based full-text indexing engine.
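
The detect, normalize, and replace step might look roughly like the following minimal sketch. It is not the presenters' implementation: the lookup table `NAME_TO_INCHIKEY`, the function names, and the optional `marker` token (a generic word placed before each key so that a query could match any compound) are all assumptions made for illustration. A real pipeline would use chemical entity recognition and name-to-structure software rather than a hand-written dictionary.

```python
import re

# Hypothetical lookup table standing in for name detection and
# name-to-structure conversion. Synonyms map to the same key, which is
# what normalizes different names for one compound.
NAME_TO_INCHIKEY = {
    "aspirin": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
    "acetylsalicylic acid": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
    "caffeine": "RYYVLZVUVIJVGH-UHFFFAOYSA-N",
}


def build_pattern(names):
    """Compile one case-insensitive regex matching any of the given names."""
    # Longest names first, so multi-word names win over shorter overlaps.
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)


def insert_inchikeys(text, table=NAME_TO_INCHIKEY, marker=None):
    """Replace each known chemical name in `text` with its InChIKey.

    If `marker` is given (an assumption of this sketch), it is written
    before every key as a generic "any compound" token.
    """
    if not table:
        return text
    pattern = build_pattern(table)

    def replace(match):
        key = table[match.group(1).lower()]
        return f"{marker} {key}" if marker else key

    return pattern.sub(replace, text)


if __name__ == "__main__":
    sample = "Acetylsalicylic acid (aspirin) was combined with caffeine."
    print(insert_inchikeys(sample))
    print(insert_inchikeys(sample, marker="CHEMICAL"))
```

The rewritten text can then be passed to a full-text indexer like any other document. How the keys are tokenized at index time (for example, whether hyphens split a key into several terms) depends on the analyzer configuration, which this sketch does not address.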
