ChemAxon EU User Group Meeting, May 19-20, Budapest
The 2015 ChemAxon European User Group Meeting was held at Hotel Novotel Budapest Centrum May 19 – 20. The weather cooperated for a lovely garden party in the business park where the offices are located. On the evening of May 19 we boarded a boat for a beautiful dinner cruise on the Danube. We boarded in daylight, but ended the cruise in the dark with the many city lights providing an enchanting view of the city.
One hundred eight people represented ChemAxon, and their name badges indicated the projects they work on. ChemAxon developers were also much in evidence during the ample coffee breaks, where they were eager to answer questions and demonstrate the latest software. The 76 guests represented 51 institutions. In addition to large and small pharmaceutical companies, there were representatives from patent offices, a juice company, publishers, and several universities. Several partners provided demonstrations at the coffee breaks.
The changed format of the meeting reflected the company's broad vision and highlighted the scope of ChemAxon capabilities well beyond the more familiar Marvin, chemical calculators, and JChem. The Biomolecule Toolkit extends ChemAxon capabilities to native and modified macromolecules; Name to Structure is now a key component of several text-mining applications; Plexus integrates key functionalities; KNIME nodes bring ChemAxon capabilities into a pipelining environment; and Compliance Checker identifies compounds that are subject to government restrictions. Together these capabilities add to the value of ChemAxon within a company's software portfolio.
Further evidence of the utility of ChemAxon software came from its incorporation into partners' and users' software. The new collaboration with BSSN will provide tools for incorporating analytical data into JChem. Use of JChem remains strong: IDBS is moving its chemistry infrastructure to ChemAxon, joining Biochemfusion, Schrödinger, Quattro Research, GSK, Boehringer Ingelheim, AstraZeneca, Novartis, and SureChEMBL. Name to Structure is a key component of text-mining capabilities in KNIME, Linguamatics I2E, Ontochem software, and patent curation at Boehringer Ingelheim and SureChEMBL.
The outside view of ChemAxon software is well described by Gartner research as “providing clients with a technology widely admired as flexible and easy to use.” https://www.gartner.com/doc/3035217/idbs-gets-chemaxons-flexible-chemistry
The keynote conversation, entitled “Toward Analytics for Everybody,” was presented by Michael Berthold, President of KNIME AG, with comments from Mihály Medzihradszky, Head of European Sales at ChemAxon. Berthold said that whereas coders and workflow wizards produce analytics, the results are consumed by workflow recyclers and by consumers of reports and visualizations. Those producing the tools must remember the ultimate users. To foster this ease of use, KNIME integrates tools donated by companies, contributed by research institutions, generated by technology partners, and maintained by KNIME itself. Templates and abstractions hide complexity from casual users, while supervised analytics are available from the KNIME Web Portal. In summary, the vision of “Analytics for the Masses” includes guided access to complex analytics processes at just the right level of detail. Achieving this vision requires an open analytics platform that combines integration of tools and data, transparency, collaboration, and agility.
In a departure from previous UGMs, and to provide a more comprehensive look at the ChemAxon offerings, the ChemAxon team presented as a block on the first morning of the meeting. For each product category, the team gave a brief overview of its capabilities, what is new in the current release, and plans for future development.
Iván Solt reported that the overarching goal of ChemAxon development is to move from the current small-molecule toolkits and separate single-user desktop applications to a web-based platform that unites various capabilities, fosters collaboration, and supports biomolecules and analytical data. Coupled with this vision is a new policy of more frequent releases, perhaps weekly, to catch and fix bugs immediately and release new features quickly.
Efi then summarized the recent improvements to MarvinSketch for producing publication-quality drawings. The user can specify IUPAC standard abbreviations in the label editor, set drawing parameters, and apply several built-in and custom-defined journal styles.
András Strácz followed with a discussion of Marvin Live, a chemical drawing tool that supports a virtual chemical research meeting room. It tracks the entire project while maintaining a personal idea repository. Ad-hoc discussions and information from web services make the meeting more productive. The associated Meeting Assistant automatically captures all information and saves important structures for use in other applications. By using Meeting Room Service the user specifies the typical authentication, data retention, and backup policies.
Csaba Peltz then presented the ChemAxon Compound Registration system. Hidden smart automation is a key component of a robust system: Standardizer and Structure Checker ensure that the chemical structures meet standards. An easy-to-use interface simplifies registration, search, and export. Integration with other tools allows data to be fed into the registration system and the registered data to be used and analyzed elsewhere. One such integration is a ChemAxon Auto Register component for Pipeline Pilot.
Roland Knispel described the web-based Biomolecule Toolkit, which bridges the gap between biology and chemistry, particularly for complex biomolecular entities that contain chemical modifications of biological structures. A key capability is switching between multiple views of a structure while atom-level information about it is stored. With the toolkit one can index, store, and query complex biomolecules and integrate them with existing resources for small molecules. The toolkit handles registration of standard peptide or nucleic acid sequences, sequences with modifications including cyclization, structural ambiguity, and imprecision or lack of detail. Integration of the toolkit with Instant JChem and Plexus Design is in development. Later this year will bring a biomolecule renderer and combined small- and large-molecule registration.
András Volford presented JChem Base, which adds chemical intelligence to relational databases. It supports substructure, superstructure, full fragment, duplicate, similarity, formula, reaction, polymer, and Markush searches. It is easily integrated into discovery tools such as IJC, Plexus Connect, Registration, JChem Oracle Cartridge, JChem Web Services, JChem for Office, JChem for SharePoint, and Compliance Checker. In a similar manner, JChem Web Services provides chemical intelligence for web applications, such as chemical structure searching, property calculations, and library enumeration. The Oracle cartridge is in use at GSK, Boehringer Ingelheim, Novartis, and Takeda. The PostgreSQL cartridge, launched officially at the 2015 Budapest UGM, opens up the cost-effective open-source RDBMS world. It currently provides substructure, full fragment, duplicate, and similarity searches and handles tautomers and stereoisomers.
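Fingerprint similarity searching of the kind JChem Base offers is commonly scored with the Tanimoto coefficient: the number of bits set in both fingerprints divided by the number set in either. The Python sketch below assumes fingerprints stored as plain integer bitsets and a hypothetical `similarity_search` helper with a 0.7 default cutoff. It illustrates the scoring idea only and is not JChem's API or fingerprint scheme.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integer bitsets."""
    common = bin(fp_a & fp_b).count("1")  # bits set in both
    union = bin(fp_a | fp_b).count("1")   # bits set in either
    return common / union if union else 1.0  # two empty fingerprints: treat as identical


def similarity_search(query: int, library: dict, threshold: float = 0.7) -> list:
    """Hypothetical helper: return (id, score) pairs at or above threshold, best first."""
    scored = [(cid, tanimoto(query, fp)) for cid, fp in library.items()]
    hits = [hit for hit in scored if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy 8-bit fingerprints; real chemical fingerprints are typically hundreds to thousands of bits.
library = {"cpd-1": 0b10110110, "cpd-2": 0b10110000, "cpd-3": 0b01001001}
query = 0b10110110
print(similarity_search(query, library, threshold=0.5))
```

With the sample data, a 0.5 threshold returns `cpd-1` (score 1.0) and `cpd-2` (score 0.6); the 0.7 default keeps only `cpd-1`.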
Max Šauer discussed the features of Instant JChem (IJC) that enrich Plexus Suite. In IJC one can import structures and assay data, apply Chemical Terms and substructure filtering, and design forms for easy interpretation of the results. Searching and filtering are enhanced by integration with Spotfire. The newly developed Mass Spectra Display Widget supports similarity search by spectrum as well as retrieval of spectra by structure or chemical query terms. To align with Plexus Connect features, three new query operators have been added to IJC: “Does not start with”, “Does not end with”, and “Does not contain”.
Ákos Papp summarized the features of JChem for Office. Using JChem for Excel one can draw structures inside cells, use calculator functions and filtering, import from JChem databases, and perform structure-activity relationship (SAR) analyses. In Word, PowerPoint, and Outlook one can add or edit structures, import from a database, perform property calculations, and paste tables from Excel. JChem for OneNote is still a prototype. Continued development has produced a 5-10X speed-up in chemical filtering. The SAR table in Excel now supports an unlimited number of R-groups in the Markush structure: each pair of R-groups forms a table, and if there are multiple hits for a particular combination of substituents, the average value is shown. A fixed bond-length scale has been implemented for Office documents. The .NET-compliant JChem API wrapper has been used in internal applications at several clients. Plans include import using web services, Biomolecule integration, improving the SAR table with charts, other aggregate functions in cells, and a visual R-group selector, and integration with Office 365 and Office Online.
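The averaging rule for the Excel SAR table can be illustrated with a small Python sketch. It assumes each hit is a dict of hypothetical R-group labels mapped to substituent names plus an `activity` value; for every pair of R-groups it builds one table whose cells average all hits sharing that substituent pair. This illustrates the described behavior and is not JChem for Excel's implementation.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def sar_tables(hits, rgroups):
    """Build one table per pair of R-groups.

    hits: list of dicts mapping R-group labels to substituents, plus an "activity" value.
    Returns {(Ra, Rb): {(sub_a, sub_b): mean activity}}.
    """
    tables = {}
    for ra, rb in combinations(rgroups, 2):
        cells = defaultdict(list)
        for hit in hits:
            # Hits that share this substituent pair land in the same cell.
            cells[(hit[ra], hit[rb])].append(hit["activity"])
        # Multiple hits in one cell are summarized by their average.
        tables[(ra, rb)] = {pair: mean(values) for pair, values in cells.items()}
    return tables


# Hypothetical hits for a Markush scaffold with three R-group positions.
hits = [
    {"R1": "Me", "R2": "Cl", "R3": "OH", "activity": 6.0},
    {"R1": "Me", "R2": "Cl", "R3": "NH2", "activity": 7.0},
    {"R1": "Et", "R2": "F", "R3": "OH", "activity": 5.5},
]
tables = sar_tables(hits, ["R1", "R2", "R3"])
print(tables[("R1", "R2")])  # {('Me', 'Cl'): 6.5, ('Et', 'F'): 5.5}
```

Here the two Me/Cl hits differ only at R3, so the R1/R2 table shows their mean, 6.5, while three R-groups yield three pairwise tables.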
József Dávid described JChem for SharePoint, which adds chemistry-aware capture and search, as well as calculations in lists. Chemical indexing of data sources provides the basis for chemistry searching. The application supports drawing with Marvin JS, MarvinSketch, ChemDraw, or Accelrys Draw. All configuration tasks are done through the SharePoint Central Administration interface. In one use case, a company uses JChem for Excel and SharePoint, and its collaborators enter structures directly into SharePoint, from where they flow back into the Excel files.
Miklós Szabó presented a roadmap for Plexus Suite, a web-based application that integrates Instant JChem, web services, biomolecules, and many more services along with ChemAxon's back-end technologies. It provides spreadsheets, forms, charts, and reporting for data analysis, plus reagent search and calculations for compound design. While Instant JChem will continue to serve as a Swiss Army knife, Plexus Suite is meant to be an alternative platform for cases where simple web technologies play an important role. It combines ChemAxon's strong chemistry knowledge with the technical benefits of a browser-based environment. Integration of the external vendor applications LiveDesign and StarDrop with Plexus Design provides enhanced capabilities. Access points to pipelining tools such as KNIME and Pipeline Pilot will also be available.
Daniel Bonniot discussed Plexus Mining. One might mine for chemical structures in internal documents in Microsoft Office (such as Word) or PDF format stored on a network drive or in a repository system such as Documentum; in journal articles in PDF format; or in patents in XML, HTML, or PDF format. Plexus Mining creates an index of the mined documents that includes metadata as well as the chemical structures. Because it has a web interface, end users need not install any software. Its flexible architecture supports customization and integration with existing systems.
Max Šauer discussed Plexus Connect, the central module of Plexus Suite. It is a modular, configurable, web-services-based toolkit deployed through thin-client services, and it provides the main access point to the user's chemistry data. Plexus Connect can now export structures to Excel, SDF, or text; supports clipboard operations and similarity searches; and provides visualization of images. Soon users will see early results for lengthy searches and be able to cancel searches.
Ágnes Peragovics presented Plexus Design, the virtual library design module of Plexus Suite, which also serves as an online collaboration tool. Both reaction-based and scaffold-based enumeration methods are supported, as is calculation of the properties of the enumerated molecules.
Dániel Szisz summarized the ChemAxon Discovery Tools. He first discussed changes to similarity searching that increase the speed of concurrent descriptor generation, searches with multiple queries, and overlap analysis of chemical libraries. These are available as an API toolkit and in JChem. The ChemAxon Calculator Plugins, available in all main ChemAxon products, now include a solubility predictor that is also available as a KNIME node. In collaboration with Alex Avdeef, the pKa calculator has been fine-tuned to account for stereoelectronic effects on pKa. The influence of cis-trans isomerization and tautomerization has also been implemented, and tautomer generation and canonicalization have been improved. The Hydrophilic-Lipophilic Balance (HLB) number is currently available in cxcalc but will be integrated into the Calculator Plugins.
Attila Szabó described the ChemAxon nodes in KNIME, the pipelining platform. These cover inputting structures from MarvinSketch, performing chemical reactions, calculating chemical descriptors, filtering by Chemical Terms, viewing structures with MarvinView, Markush enumeration, and reporting via the KNIME Web Portal.
Daniel Bonniot addressed naming chemical structures and recognizing chemical names in documents. Document to Structure can annotate PDF, text, patent XML, and HTML documents, producing a chemically annotated HTML document. The naming suite is also available in Plexus. As always, there have been improvements to Structure to Name, as well as to English, Chinese, and Japanese Name to Structure and to Document to Structure.
Árpád Figyelmesi described ChemCurator, which provides computer-assisted chemical data extraction. It includes English, Chinese, and Japanese Name to Structure, the Markush editor, Structure Checker, and chemical search and representation. Input can come from files, Google Patents, IFI Claims, and structure images recognized by CliDE and OSRA. The Markush representation contains R-groups, atom and bond lists, position variations, link nodes, repeating units, and homology groups. ChemCurator provides search, enumeration, hit and non-hit visualization, overlap, and Markush Composer. Non-hit identification is based on Markush Overlap and Maximal Common Substructure technologies; it does not require enumeration and has no limit on the number of structures represented by the Markush structure. Markush Composer generates a Markush structure from a set of molecules.
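Why non-enumerative methods matter becomes clear from simple arithmetic: for a Markush structure built only from independent R-group lists, the number of specific structures is the product of the list sizes, so it grows exponentially with the number of positions. The Python sketch below uses a hypothetical three-position Markush definition and ignores position variation, repeating units, and homology groups, which can make the count far larger or even unbounded. It is not ChemCurator's enumeration engine.

```python
from itertools import product
from math import prod

# Hypothetical Markush definition: each R-group position lists its allowed substituents.
markush = {
    "R1": ["H", "Me", "Et", "Pr"],
    "R2": ["F", "Cl", "Br"],
    "R3": ["OH", "OMe", "NH2", "NMe2", "CN"],
}


def enumeration_size(rgroups):
    """Number of specific structures represented by independent R-group lists."""
    return prod(len(options) for options in rgroups.values())


def enumerate_markush(rgroups):
    """Lazily yield every combination of substituent choices as a dict."""
    labels = list(rgroups)
    for choice in product(*(rgroups[label] for label in labels)):
        yield dict(zip(labels, choice))


print(enumeration_size(markush))          # 60
print(next(enumerate_markush(markush)))   # {'R1': 'H', 'R2': 'F', 'R3': 'OH'}
```

Twenty positions with ten options each already give 10^20 structures, far beyond what can be enumerated, which is why overlap and non-hit analysis that avoid enumeration are attractive.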
Norbert Sas presented Compliance Checker, an application that checks whether a compound falls under any laws on controlled substances. It detects generic clauses such as “salt or ether of” or “any stereoisomer of” while also recognizing exclusions. It is available as a web client, command-line tool, Windows client, or SOAP/REST interface. The application currently covers the laws of nineteen countries.
Nóra Lapusnyik introduced the partner session by reminding the group that ChemAxon has more than fifty integration partners, approximately 3000 academic users, and installations in 600 commercial organizations.
Andreas Nicklas presented anic’s software for improving process performance, particularly in the context of preparing larger amounts of compounds discovered in research and development. They have incorporated MarvinSketch.
Jan Holst Jensen described their software for registration and analysis of modified peptides and proteins. By incorporating JChem Cartridge, JChem for Office, JChem PostgreSQL Cartridge, and MarvinSketch into their Proteax toolkit they provide a registration system that spans small molecules to macromolecules.
Alexander Steudle described their D360 product, which integrates research data, analysis, and collaboration tools for use in drug discovery, preclinical testing, and clinical studies. A key feature of the software is that any scientist can construct a data view without knowing the data format, the data location, Oracle, and so on. ChemAxon tools used include the Biomolecule Toolkit, JChem Cartridge, Compound Registration, MarvinSketch, JChem for Office, and property calculators.
Michael Dippolito reminded the audience that ChemCart from DeltaSoft integrates software from various sources and makes them available across the institution. Pre-packaged customizable applications include ELN, Reagent Inventory, Registration, BioAssay, Sample Tracking, and Structure Activity Browsing. ChemAxon tools used include JChem Base and MarvinSketch.
Anthony Barnado described the newly announced collaboration between IDBS and ChemAxon. IDBS previously used its own chemistry software; as part of the new collaboration it will integrate ChemAxon software into its E-WorkBook ELN and ActivityBase and use it to store all chemical structure data across the IDBS platform. The ChemAxon tools to be used include Calculator Plugins, JChem Base, and Reactor.
Jon Fuller reported that over 90% of ChemAxon's cheminformatics functionality is included in KNIME nodes.
Andrew Hinton presented Linguamatics I2E, which provides context for chemistry. The Linguamatics platform uses natural language processing to mine text; for example, it uses linguistics to establish relationships such as name to structure and enzyme name to Entrez Gene ID. The text mining is enhanced by access to multiple data sources. ChemAxon tools used include JChem for Office, MarvinSketch, and Naming. The technology also supports linking within documents, for example between the synthesis of a compound and its reported biological activity.
Róbert Kiss described their offerings as an online drug discovery platform. A key component is their compound procurement service. The software provides ChemAxon calculators for filtering out unwanted compounds and a diversity selection option to provide the most diverse compounds. A further option is to identify compounds by virtual structure-based screening.
Lutz Weber described their use of ChemAxon’s Document Annotation and Naming capabilities as part of their semantic knowledge discovery.
Andreas Witte described LiveDesign, an interactive tool that records all proposed chemical structures, supports property calculations and KNIME workflows, and provides validated 3D models in a PyMOL browser. It incorporates ChemAxon Markush library enumeration, as well as JChem Base, Naming, Calculator Plugins, and Plexus Suite.
Cathrin Mayer presented the use of ChemAxon small-molecule tools and HELM at Quattro Research, a software and service company, whose main focus has been on the management and registration of biologics. They have incorporated JChem Base, JChem Cartridge, and MarvinSketch into their products. In addition, Quattro developed Exchangeable HELM in coordination with ChemAxon and under contract from the Pistoia Alliance. Exchangeable HELM provides an unambiguous description of modified macromolecules. Most recently they have developed, with Stefan Klostermann from Roche, a HELM antibody editor.
Richard Bolton discussed the lack of consistent results from stereochemical searches. This problem needs to be solved because they plan to eliminate registrar intervention as users mark up and register compounds with complex stereochemistry. In the current system, based on the V2000 molfile format, they see different stereochemical search results depending on the search system, as well as misidentification of molecules that contain complex stereochemistry. To solve these problems they will move all applications to the V3000 molfile format and reconfigure IJC to search across and return this new structure field.
Anna Pelliccioli presented their use of the Markush cartridge to track chemical series for project teams. The goal was not only to make it easier for project teams to organize their compounds but also to capture how series progressed from hit to lead to clinical candidate, or from hit to lead to abandonment. The application has a web-based front end using ChemDraw, a service layer that uses RDKit for structure validation, and an Oracle database running JChem with Markush extensions. Each series has an overview page, detail pages illustrating the thinking behind particular compounds, and tags highlighting the results of key biological tests. The application is integrated with Spotfire.
Gábor Pőcze (ComCix) and András Danscó (Egis) discussed the migration, carried out by ComCix, of Egis's data from ISIS/Base. Egis is a Hungarian pharmaceutical company, whereas ComCix is the innovation development arm of Darholding Network, a holding company with corporations in Hungary, the US, the UK, and Ireland. At the start of the project ComCix had already developed Kamilla, laboratory management software that includes an ELN, reagent and product inventories, and more; it was being used in several research laboratories. For the Egis project it was important not only to migrate the legacy data but also to support Egis's workflows and provide Hungarian-language support. The data migration was not smooth but was finally successful. The main causes of migration problems were old undetected errors and stereochemical issues: how to specify compounds with a racemic atom, or two possible resolved substances of unknown stereochemistry; axial chirality, as in allenes or ortho-substituted biphenyls; planar chirality in substituted ferrocenes; and helical chirality in polycyclic hydrocarbons. In their experience the best training is to have the end users test the software.
Matt Segall reminded the audience that StarDrop provides data visualization and R-group analysis to aid the design of compounds with the optimum balance of properties, and that it is easy to add customers' own applications. Integrating StarDrop with ChemAxon's Plexus platform provides a seamlessly integrated workflow and user interface, enabling the use of Plexus for scaffold- and reaction-based enumeration of compound libraries. The enumerated libraries are returned as data sets within StarDrop for detailed analysis, visualization, and selection of compounds with the balance of properties required for success. A thin WebKit client helped meet the challenge of merging StarDrop's C++/Qt architecture and Python scripting with the Plexus Java and web interface.
Thierry Kogej discussed the issues that arise when one uses collaborations, partnerships, and open innovation to increase the value and diversity of a company's compound collection. In one project, designed to increase the value of existing compound collections, AstraZeneca and Bayer are each screening the other's library against targets that the other is not pursuing. In the process they discovered that 3.3% of the total compounds involved are in both collections, and that 95% of these shared compounds are in the public domain. A second effort allows academics and CROs to submit (blinded) compounds for testing against AstraZeneca targets. Profiling of the submitted compounds has been outsourced to ChemAxon, which performs novelty and compliance checks and physical property calculations and reports back to AstraZeneca. In the third effort, AstraZeneca profiles external chemical libraries for novelty relative to the AstraZeneca collection, absence of structural alerts, attractive physical properties, and novel 2D substructures or 3D shapes. Structures that pass are then clustered and reviewed by chemists.
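The overlap figures above reduce to set arithmetic once each compound has a canonical structure key. The Python sketch below assumes hypothetical keys (standing in for, e.g., InChIKeys) and takes the union of both collections as "the total compounds involved"; the report does not say which denominator was used, and the sketch does not model how the companies actually compared their collections.

```python
def collection_overlap(collection_a, collection_b, public):
    """Return (shared / total distinct, public / shared) for sets of structure keys."""
    shared = collection_a & collection_b
    total = collection_a | collection_b  # assumption: "total" = distinct compounds in either collection
    overlap_fraction = len(shared) / len(total) if total else 0.0
    public_fraction = len(shared & public) / len(shared) if shared else 0.0
    return overlap_fraction, public_fraction


# Hypothetical structure keys; in practice these might be InChIKeys or registry IDs.
company_a = {"k1", "k2", "k3", "k4", "k5", "k6"}
company_b = {"k5", "k6", "k7", "k8", "k9", "k10"}
public_domain = {"k5", "k9"}
print(collection_overlap(company_a, company_b, public_domain))  # (0.2, 0.5)
```

In this toy example 2 of 10 distinct compounds are shared (0.2), and 1 of those 2 is in the public set (0.5).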
Burkhard Schaefer reported that BSSN and ChemAxon have partnered to bring analytical data support to ChemAxon products such as Plexus and Instant JChem. BSSN's signature product is Seahorse Scientific Workbench, a vendor-neutral software suite for capturing, analyzing, and sharing analytical data from multiple analytical techniques. The challenge is to integrate this analytical data with the putative chemical structure of the substance analyzed. The first deliverable of the collaboration, an Instant JChem plugin for mass spectrometry data, has been developed and deployed at a customer site. The solution supports a number of instrument data formats as well as the open ASTM AnIML format. With this widget, users can perform spectrum similarity searches and retrieve spectra by structure or other chemical query terms. Search results and library browsing results can be displayed alongside chemical information in an IJC display widget that can zoom, identify peaks, and show query and result spectra side by side. As well as integrating with Plexus, the plan is to support other analytical methods such as NMR, IR, UV/Vis, LC, and GC.
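Spectrum similarity search is often implemented by binning peaks on the m/z axis and comparing spectra with a cosine score. The Python sketch below takes that common approach, with a hypothetical 1.0 m/z bin width and a made-up two-entry reference library; it is not necessarily the scoring used in the BSSN/ChemAxon widget.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into m/z bins; peaks is a list of (m/z, intensity) pairs."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)  # index of the nearest bin
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(spec_a, spec_b):
    """Cosine of the angle between two binned spectra (1.0 = same peak pattern)."""
    dot = sum(value * spec_b.get(key, 0.0) for key, value in spec_a.items())
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def spectrum_search(query_peaks, library, top_n=3):
    """Rank library spectra by cosine similarity to the query, best first."""
    query = bin_spectrum(query_peaks)
    scored = [(name, cosine_similarity(query, bin_spectrum(peaks)))
              for name, peaks in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


# Made-up peak lists for illustration only.
library = {
    "ref-A": [(77.0, 30.0), (105.0, 100.0), (182.1, 45.0)],
    "ref-B": [(43.0, 100.0), (58.1, 20.0)],
}
query_peaks = [(77.0, 28.0), (105.0, 100.0), (182.0, 50.0)]
for name, score in spectrum_search(query_peaks, library):
    print(f"{name}: {score:.3f}")
```

Binning makes small m/z differences (182.0 vs 182.1) fall into the same bin, so `ref-A` scores close to 1 while `ref-B`, which shares no bins with the query, scores 0.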
Jens Fracke reported that most departments within the Bayer Group use SharePoint. This presents a challenge when a specific user group, such as R&D, needs to add capabilities to the global platform, and ChemAxon has been very helpful in overcoming this problem. The current solution is to install a dedicated server, Chemical Intelligence 4 SharePoint, that incorporates JChem for SharePoint, rather than installing it within the global SharePoint enterprise. The new server also hosts a collection of some twenty thousand documents. The downsides of this solution are that the cost is higher than if it had been possible to add to the global SharePoint, that the servers require constant attention, and that customer support now has different needs. So far JChem4SP has approximately 200 users, who mostly use it to search for documents that contain a molecule with a specific substructure. Their near-term plans are to support order management tracking.
Michael J. Bodkin from Evotec discussed how network modeling can lead to understanding the relationships between disease networks and the properties of the small molecules that affect them. He presented musings on how algorithms, evolutions, and network-based approaches can add knowledge to the data routinely captured in molecular discovery. He reminded us that most known drugs interact with more than one target, as do most experimental compounds. The connections between compounds and biological targets form a very complex network, as can be seen on the ChEMBL website. A further extension is the Similarity Ensemble Approach of Keiser et al., in which the mapping between drugs and targets suggests new targets that might be hit by the drugs. Mapping the targets back to a pharmacological profile can be aided by recursive partitioning. Because chemical space is enormous and current screening databases contain approximately 3 × 10⁻⁶ % of possible compounds, automated structure design is attractive. Reactant vectors are useful for this task. Networks in molecular discovery include phenotypic hits with target and pathway annotations and hit expansion by target.
Matthias Negril (Boehringer Ingelheim Pharma) and Árpád Figyelmesi (ChemAxon) presented their efforts to automate chemistry-enriched patent curation. The goal is to build a database of chemical structures in patents with their associated bioactivity data vis-à-vis targets and diseases. Central to this effort are KNIME and various ChemAxon tools, particularly ChemCurator. The chemistry in patents appears in text as names, in images, in attached structure files, in tables, and as Markush structures. In addition to ChemAxon tools, KNIME provides text- and data-mining capabilities from Linguamatics and optical structure recognition with CliDE. The ChemCurator and KNIME branches work together, with ChemCurator creating SDF output from the patents. In the KNIME branch, OCR errors are cleaned up and the ChemAxon Standardizer and Structure Checker are applied. Structure images are replaced by IUPAC names, and text mining with Linguamatics I2E extracts bioactivity data from tables, claims, and disease mentions. The combined data are then visualized in ChemCurator. Unresolved issues include the lack of homogeneity in patent data and the error rate in chemistry recognition, particularly
George Papadatos updated the group on SureChEMBL, the open-access patent database found at www.surechembl.org. It currently contains 16 million chemical structures from the patent literature. Digital Science/Macmillan previously built and distributed this database; when it no longer fit Macmillan's company vision, they donated it to EMBL-EBI. SureChEMBL covers WO patents, EP and US applications and granted patents, and JP abstracts. The automated curation process takes 2-7 days and identifies not only the chemical structures in the patents but also the context of the structures. One can search the database by keyword, patent number, patent authority, or chemical structure; filter the results by date or document section; and export the results.
Alexandre Varnek discussed work to develop decision-tree models that predict whether a Michael reaction will proceed under specified conditions. They described each reaction with both condensed reaction graphs, converted into ISIDA fragment descriptors, and electronic-effect descriptors. They modeled 222 different reactions under eight conditions, plus 24 cases in which no reaction is observed. In general the random-forest models that used the ISIDA descriptors performed best, with ROC scores ranging from 0.7 to 1.0. Some of the models predicted a test set of 52 reactions quite well, but others failed, perhaps because more data are needed to train the original models. The work is published: G. Marcou, et al. J. Chem. Inf. Model. 2015, 55, 239-250.
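Assuming the reported ROC scores are areas under the ROC curve (AUC), they can be read as the probability that a randomly chosen reaction that proceeds is scored above one that does not: 0.5 is chance and 1.0 is perfect separation. The Python sketch below computes AUC directly from that pairwise definition; the labels and scores are invented for illustration and do not come from the published study.

```python
def roc_auc(labels, scores):
    """ROC AUC via its pairwise definition: the fraction of (positive, negative)
    pairs in which the positive gets the higher score; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented example: 1 = reaction proceeds, 0 = no reaction; scores are model outputs.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(roc_auc(labels, scores), 3))  # 0.889
```

For the sample data, 8 of the 9 positive/negative pairs are ranked correctly, giving 8/9. The double loop is quadratic but transparent; rank-based (Mann-Whitney) formulas give the same value faster on large sets.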
Zoltán Simon from Printnet described their Drug Profile Matching research, which builds on the idea that most drugs exert their effects by interacting with more than one target. They calculated the strength of the interactions of each of 1,200 FDA-approved drugs with a set of non-target proteins and demonstrated a statistically significant relationship between the interaction profiles and 177 pharmacological effects. The pharmacological effects in turn map to 77 drug target categories. The predictive power of the method was validated by a detailed analysis of four effect categories. This work is published: Simon, et al. J. Chem. Inf. Model. 2012, 52, 134-145.