Meeting Report from Yvonne Martin - ChemAxon User Group Meeting, Budapest, May 17-18, 2011
In summary, the European ChemAxon User Group Meeting again showed the enormous talent, openness, and enthusiasm that characterize the company. The event at the Széchenyi Bath and Spa exemplified the spirit of the meeting: organized but not stuffy; purposeful but not rigid; original but not irrelevant. I especially appreciated hearing about such unique efforts as enhancing Marvin to support drawing complex inorganic structures and www.chemicalize.org, whose goal is to provide chemical structures wherever a chemical or common name appears in a document. ChemAxon continues to show the openness, enthusiasm and talent that typify the company and have made it a leading cheminformatics software vendor.
The 2011 European ChemAxon User Group Meeting was held May 17 – 19th at the Danubius Thermal and Grand Hotel Margitsziget in Budapest. One hundred and ten non-ChemAxon participants registered for the meeting. This is a large increase from last year. In addition, 83 ChemAxon employees were present. The attendees represented 54 companies, the European and Hungarian patent/intellectual property offices, and 18 universities. Non-European delegates came from Japan (6), the USA (5) and India (4). Pharmaceutical, agrichemical, and contract research companies represented include: Abbott, Almirall, AstraZeneca, Bayer CropScience, Boehringer Ingelheim Pharma, Chiesi Farmaceutici, Daiichi Sankyo, Eli Lilly, F. Hoffmann-La Roche, Evotec, Gedeon Richter, GlaxoSmithKline, H. Lundbeck, Merck Serono, Sanofi-Aventis, and Teva. The following ChemAxon partners were represented: AKos, Aureus Sciences, Biochemfusion, Chemonaut, Contur, Delta Informatika, DeltaSoft, eAdmet, Elsevier, IDBS, Infocom, John Wiley & Sons, KineMatik, KNIME.com, Lhasa Limited, Linguamatics, Patcore, Schrödinger, Sysment, TIBCO Spotfire and Vans Information Ltd. In short, the participants included a broad range of perspectives on the utility of the ChemAxon software and of the company in general.
The meeting was preceded by a day with separate tracks for user and developer training, and followed by a half-day meeting devoted to Markush patent searching. The end-user training included half-hour introductions to each application, rather than concentrating on just a few. Because I have been using the software I was familiar with most of the capabilities, but also learned some hints and new features.
There were 30 talks in the formal meeting itself, 12 of them by ChemAxon. Eight major themes were discussed: integration of ChemAxon software within a company, Name to Structure and related capabilities such as Structure Checker and Standardizer, structure searching, tools for structure-activity analysis, SharePoint, desktop tools for end users, Marvin, and the lightning-fast partner presentations.
The meeting started with Alex Drijver asking Ashley George from GlaxoSmithKline about the challenges facing pharma and how ChemAxon can help. Of the many challenges, Ashley singled out externalization of capabilities and the patent cliffs that will limit resources as the major ones. In addition, IT must embrace the newer technologies that are or will be used to access IT resources. To emphasize this latter point, Ashley used an iPad for his formal presentation. (Learn more about the presentation here.)
GlaxoSmithKline chose to meet these challenges by limiting the number of software vendors that it uses. For chemical structure handling it uses ChemAxon. This choice was based on a common philosophy between ChemAxon and GlaxoSmithKline, mutual trust, and ChemAxon’s agility and responsiveness. GlaxoSmithKline has adopted the philosophy that as much as possible, the applications will be outside their firewall so that vendors can support them. Only confidential data will be inside the firewall. He then described their effort to make GSK’s antimalarial testing structure-activity data available to outside researchers. Rejecting the original idea of supplying an sdf file with all of the information, they decided to place it on an external site, specifically the Amazon cloud. It uses Instant JChem for access to the structure-activity information. In addition other files, communications, schedules, etc. can be stored and accessed there. The goal is to open this site to 60-80 users who might access it from an internet cafe or with a 3G enabled device. This pilot project is also a template for how GlaxoSmithKline will share information with other outside collaborators.
Continuing with GlaxoSmithKline, Jonathan Lee presented the ChemAxon Registration Service that was developed in collaboration with GlaxoSmithKline. The goal was to replace and improve the current registration service. It is designed as a service module that is responsible for the submission and administration of compounds into a registration store. Input to the service can be directly from CambridgeSoft’s ELN, from a chemist web client, or bulk loading. Once potential compounds are identified, the registrar uses a web client to validate or correct the entries. The registration service accepts the compounds, applies the business rules, and deposits them in the data store. Special features of the service include the ability to change structures, an audit history of changes, a salt/solvent dictionary, the ability to assign users and roles, and the ability to configure Standardizer and Structure Checker to meet a company’s business rules. Tautomer recognition and stereochemistry intelligence are built in. As well as single compounds (salt and solvent removed), the application can register multicomponent and other exotic species such as alternative structures, formulations, mixtures, small Markush structures and indefinite locants. The Registration Service uses structures in the MOL V3000 format and chemically significant metadata. It uses Service Oriented Architecture to store the structures in an Oracle or MySQL database using a Tomcat Web Container with JChem Base. The application will see various enhancements including Marvin input format, full Oracle DB support, pluggable registration ID generation, and still more stereochemistry features. Registration of biologicals is also on the horizon. (Learn more about the presentation here.)
Catherine Reisser presented the first of several Evotec talks. Entitled “Migrating to ChemAxon – the Good, the Bad, and the Ugly”, it summarized a seven-to-eight-year project. The case for migration was to optimize resources while providing access to more users, preferably with a multi-site license. Any replacement application was required to make it easier to transfer data between sites or with clients, provide good support, and have the same or better performance, functionality and user-friendliness. The first project was to replace their library enumeration application, which was based on SMIRKS and SMILES. For this they selected the KNIME/Infocom application that uses ChemAxon nodes. In contrast to the Reactor GUI or command line, the KNIME application provides support for multistep reactions. This was a smooth transition. The second project migrated a small activity database from a local ISIS Base to Instant JChem. This is currently the preferred tool for chemistry and biology project databases. They then tackled migrating Chemical Stores, their application that contains information on 10k bottles associated with 4k structures. Although it was not possible to directly export from the current application, with some effort they were able to export from the database. This application is about to go live, even though the new application at the moment does not allow tracking of compounds in the lab. It will provide access to more catalog data and be integrated with Evotec’s ordering and quote system. Lastly, JChem4XL was chosen for migrating a chemically aware spreadsheet. In spite of some resistance from chemists, the users like being able to calculate chemical properties, use R-group decomposition, and do simple enumerations. Users complain about switching to Marvin, copy/paste to PowerPoint, and difficulty in installing and upgrading. ChemAxon has been on the ball to help solve these problems.
In summary, key to a successful migration is to focus on a small team of computer-literate scientists, to provide equivalent or better functionality especially if the new software facilitates daily tasks and integrates with other tools that they already use, and to allow plenty of time for testing, training, and support. (Learn more about the presentation here.)
Iván Solt from ChemAxon presented an update on Structure Checker, in silico surgery for molecules. Structure Checker deals with drawing and scanning errors, inconsistent and ambiguous representations, aliases, errors in reaction mapping and many other problems that plague the recognition of a structure from a drawing. Structure Checker is a configurable program that can be run via MarvinSketch, a standalone GUI, in batch, and via Chemical Terms in Instant JChem, JChem for Excel, JChem Cartridges, and JChem Web Services. There are nine flagged problems that definitely need fixing: errors in aromaticity, chiral flags, a coordination system, a metallocene, OCR, ring strain, a reaction map, a valence error, and wedge bonds. A total of 30 other issues are checked and flagged as possible errors. New in the current version is the ability for the user to define substructures that need checking, and checkers for OCR errors, empty structures, lone pairs, molecular charge due to missing or added protons, possible racemates, and rare and star atoms. The configuration settings allow everything from totally automatic to totally manual fixing of structures. Plans include enhancements to Reaction Checker to identify unbalanced reactions, an Invalid R-group Checker, and a Polymer Checker. (Learn more about the presentation here.)
Daniel Bonniot from ChemAxon described their applications that convert names to structures. This includes recognizing names in documents. They now provide support for generating names featuring isotopes and pseudo-asymmetric stereo centers. A dictionary is used to convert traditional names to structures and to generate traditional names from a drawn structure; systematic, IUPAC and CAS names can be generated as well. The current version shows continuous improvements in the number of names that can be converted to structures and the fraction of these that are correct. Since version 5.4 names can be extracted from PDF documents; it is planned to support extraction from MS Word and Excel. (Learn more about the presentation here.)
Continuing on this theme, Nicko Goncharoff and Andrew Hinton from Digital Science SureChem presented their benchmarking of name-to-structure tools. SureChem provides a structure and text searchable database of 12 million compounds in patents. They compared the ability of four name-to-structure tools to discover these structures from patent text. Common errors include OCR problems, ambiguous inorganic names, and incorrect nomenclature (in 26% of the molecules). They use an automated “spell checking” program to identify OCR errors. When four name-to-structure tools are used, they are able to convert 2/3 of the names. Using all four tools increases the conversions by 40%. Approximately 26% of the structures are generated from only one tool. It is a challenge that 20% of the names in a patent generate more than one structure. They indicated that the ChemAxon tool is comparable to its competitors, improving rapidly, and easiest to use. However, there are still challenges in the name-to-structure arena. (Learn more about the presentation here.)
Sorel Muresan from AstraZeneca presented their evaluation of name-to-structure tools as well as CaffeineFix, a program from Roger Sayle that identifies errors in OCR of chemical names. They focus on the IBM and GVKBIO databases of patents as an early source of competitor structure-activity relationships. IUPAC names are especially difficult to recognize because of the need to match nesting brackets and parentheses. Pre-processing by CaffeineFix and the use of several name-to-structure programs greatly improves the identification of chemical structures in patent databases. (Learn more about the presentation here.)
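To make the bracket-matching difficulty concrete, here is a minimal sketch of the kind of check that can flag an OCR-damaged name before it reaches a name-to-structure program. The function name and the example names are illustrative assumptions; this is not how CaffeineFix or any ChemAxon tool is implemented.

```python
# Hypothetical helper for illustration only; not taken from CaffeineFix
# or from any name-to-structure tool mentioned in the talk.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def brackets_balanced(name):
    """Return True if every bracket in `name` is closed in the right order."""
    stack = []
    for ch in name:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    # Anything left on the stack was opened but never closed.
    return not stack


if __name__ == "__main__":
    print(brackets_balanced("2-[4-(2-methylpropyl)phenyl]propanoic acid"))
    print(brackets_balanced("2-[4-(2-methylpropyl]phenyl)propanoic acid"))
```

A name that fails such a check is a candidate for correction or for being passed to a more tolerant parser; a name that passes can of course still be wrong in other ways.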
Alex Allardyce from ChemAxon described chemicalize.org, a free web application that converts names to structures. As well, it supports structure and text searching and property calculations on the structures. The web site provides structure images as well as plots such as of pH-log D profiles. Results are downloadable. Chemicalize.org uses MySQL as the database engine, Apache Tomcat as a servlet container, and the ChemAxon tools - Marvin, Name to Structure, Document to Structure, JChem Base, JChem Web Services, Standardizer, MCES (for hit highlighting) and Calculator Plugins. They have chemicalized all Wikipedia pages that contain chemboxes. So far, 54,000 unique visitors have visited 310,000 urls. Four million chemical names have yielded 250,000 unique names from which 180,000 structures have been extracted. Most users view only a few pages, but some explore more than 20. In the future chemicalize.org will remain free. ChemAxon is working on sorting and ordering results and improving text search. They intend to provide personalization and login to include personal search history, profiles, dictionaries, and calculation/search parameter settings. In addition improvements will be made to interact better with web sites using browser plugins. (Learn more about the presentation here.)
The Partner Lightning Round Session included talks from AKos GmbH; Aureus Sciences; Biochemfusion ApS; Contur; eADMET; IDBS; Infocom; KineMatik; KNIME GmbH; Linguamatics Ltd.; Sysment and TIBCO Spotfire.
Aureus has been a partner of ChemAxon for more than seven years. It provides a database of >3 M biological data on 775,900 ligands for 1626 human targets. AurPROFILER uses JChem Base, Calculator, and similarity based structure search. It then shows the pharmacological profile of the similar compounds identified. (Learn more about the presentation here.) Biochemfusion integrates information on native and chemically modified proteins. It is based on the Sysment Notebook, which integrates ChemAxon with sequence information. This allows one to switch between sequence-oriented and structure-oriented molecular representations. (Learn more about the presentation here.) CONTUR uses ChemAxon components in its ELN. Although there had been some problems with Mac and Linux, this was solved by switching to a Rich Internet Application. (Learn more about the presentation here.) eADMET is a new company that focuses on supporting the development of QSAR models. ChemAxon tools are used to preprocess molecules and calculate their properties to add to descriptors calculated with 18 other programs. Capabilities are provided to select descriptors and build and validate models. (Learn more about the presentation here.) IDBS (InforSense) provides ActivityBase, an automated test management system; the scientific notebook, E-WorkBook; and InforSense Suite for R&D analytics and visualizations. E-WorkBook provides support for parallel synthesis by integrating ChemAxon Reactor and structure-to-name capability. IDBS also provides its users with ChemAxon tools for logP and logD calculations. (Learn more about the presentation here.) Infocom provides JChem Extensions for KNIME workflow. These support over 90% of ChemAxon’s cheminformatics functionality (everything except Instant JChem, JChem for Excel, and JChem Web services). (Learn more about the presentation here.) 
KineMatik’s eNovator includes their Electronic Laboratory Notebook (ELN) that uses an OpenText Content Server and Microsoft SharePoint 2010 complete with ChemAxon Marvin integration. (Learn more about the presentation here.) The KNIME presentation emphasized that besides the rich functionality in KNIME supported by ChemAxon the Marvin Chemistry Extensions are now available to all KNIME users free of charge. (Learn more about the presentation here.) Linguamatics I2E is a text mining natural-language processing platform that supports interactive querying of web documents to reveal structured relationships. It uses ChemAxon’s tools to extract chemical information from text. It has partnered with ChemAxon for the EU research project ChiKEL that will provide an automated generation of ontology hierarchy for structures found as well as integrating chemical knowledge for text mining. (Learn more about the presentation here.) Sysment presented their ELN that combines the small molecules of traditional medicinal chemistry using ChemAxon tools and biopharmaceuticals handled by Biochemfusion Proteax. It is designed to accept many different plug-in modules. (Learn more about the presentation here.) The final partner presentation described the integration of MarvinView and JChem Cartridge into TIBCO Spotfire. It shows chemical structures in multiple linked visualizations and SAR tables. (Learn more about the presentation here.)
Szabolcs Csepregi from ChemAxon presented an overview of the many new and planned developments in JChem Base, Cartridge, Web Services, and Markush search. Notable features from the overview: Structure search is faster; for example, a search of 19.5 million compounds that returns two hits completes in 0.91 seconds and one that returns 6,000 hits in 1.3 seconds. Searching of molecule, reaction, Markush, query, and mixed tables is possible. Of special interest is the ability to do a similarity search in a reaction database. Web applications can use Java Server Pages with Marvin applets or AJAX with JChem Web Services without needing Java. New is the ability to use a Composite database engine. In addition, homology groups can now be searched by properties. Search in version 5.4 uses MCS routines to color similarity search hits for easier visualization, yields more consistent R-tables from symmetrical scaffolds, and enhances multi-threading. There is now an enantiomer stereo search option, and ECFP and FCFP similarity searches are available. JChem Base and Web Services now allow one to filter out duplicate hit structures as a table option. New JChem Web Services include molecule search in lists, retrieval or export of table data, and Markush search and enumeration. Within Markush, one can include properties for homology atoms such as cycloalkyl (C3-7, SAT). For Markush there are also new homology groups and improved R-group hit visualization, for example to show only relevant R-groups. In the just-released version 5.5 Markush search performance has been further improved. It also supports sophisticated formula searches, such as excluded atom types, polymers, and isotopes, and supports new stereochemistry types such as syn/anti and cis/trans of cumulenes.
In 5.5 Markush search performance is enhanced up to 7-fold over 5.4, there is support for large combinatorial Markush structures with up to thousands of R-group definitions, and simple R-group queries of Markush databases are supported. Work is underway to further speed up Markush searches and to add new query features such as full support of explicit hydrogen atoms, additional atom query properties, and the option to switch on or off translation of homology groups. Relevancy ranking of Markush search hits is also on the priority list as is MCS search type for Markush searches. Plans for JChem Base, Cartridge, and Web Services include supporting a computational cluster, R-group decomposition on GUIs and cartridge, pivot layout of R-group decomposition, and API support for arbitrary table structure using JChem Cartridge index tables. (Learn more about the presentation here.)
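As background for the ECFP and FCFP similarity searches mentioned in this overview, the sketch below shows how two fingerprints are commonly compared. It assumes the Tanimoto coefficient and represents each fingerprint as a set of "on" bit positions. The talk summary does not say which metric JChem uses, and the bit sets here are made up for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices.

    Assumption for this sketch: two empty fingerprints score 0.0.
    """
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    # Bits set in either fingerprint = |A| + |B| - |A and B|
    return shared / (len(fp_a) + len(fp_b) - shared)


if __name__ == "__main__":
    # Invented bit positions, not real ECFP output.
    query = {1, 4, 9, 16}
    candidate = {1, 4, 9, 25}
    print(tanimoto(query, candidate))  # 3 shared bits out of 5 distinct bits
```

A similarity search then amounts to scoring every stored fingerprint against the query and keeping the hits above a chosen threshold.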
Brian Larner from Thomson Reuters reminded the audience that patents are written to be as obscure as possible. However, they contain the first release of important technical information about a company’s invention—sometimes years before a scientific paper is published, if it ever is. (“70-90% of the information in patents is never published anywhere else.”) The task of Thomson Reuters is to translate the information in a patent into a concise summary of the invention written in plain English so that it will be useful to others. Thomson Reuters is the provider of the Markush structures for the Markush Structure Searching project. (Learn more about the presentation here.)
Tímea Polgár then provided an update on ChemAxon Markush structure enumeration and search of the Thomson Reuters Merged Markush Service (MMS) database, the Derwent World Patent Index of patent data, and the Derwent Chemistry Resource of exemplified structures. All three are accessed through Instant JChem. Marvin is the query builder that supports ten special query features including atom and bond lists, atom and bond topology, stereochemistry, link nodes, and position variation bonds. More query features will be in the Summer 2011 release. Hit lists can be viewed with relevant R-group visualization and conditional formatting, and can be sorted, archived, and combined with other hit lists. Hits from substructure searching of Markush structures are displayed aligned with the query, with the query colored in the hit, and as a new Markush that summarizes the variable positions. Markush enumeration in both Marvin and Instant JChem supports full enumeration, enumeration of selected parts only, random enumeration, and property filters for enumeration. It can also be used to calculate the exact size of huge Markush libraries. JChem Markush enumeration, search, and structure enumeration are also available in KNIME. (Learn more about the presentation here.)
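The exact library-size calculation can be illustrated with a deliberately simplified sketch. It assumes a Markush structure whose R-groups vary independently and have no nested definitions, so the number of enumerated structures is simply the product of the number of alternatives at each position. The R-group names and counts are invented, and real Markush structures (homology groups, position variation, nesting) need considerably more bookkeeping than this.

```python
from math import prod


def library_size(alternatives_per_r_group):
    """Exact number of structures for independent, non-nested R-groups.

    `alternatives_per_r_group` maps an R-group label to the number of
    definitions it can take. Python integers have arbitrary precision,
    so the result stays exact however large the library is.
    """
    return prod(alternatives_per_r_group.values())


if __name__ == "__main__":
    # Hypothetical Markush with three variable positions.
    print(library_size({"R1": 12, "R2": 250, "R3": 40}))  # 12 * 250 * 40
```

Because the count is obtained by multiplication rather than by generating structures, it remains cheap even when full enumeration would be impractical.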
Dr. Guy de Weck from the pRED Informatics group of F. Hoffmann-La Roche described the workflow to establish whether a chemical structure is novel. This involves searching MMS as well as MARPAT and Reaxys. The previous workflow was extremely labor-intensive, with a maximum throughput of 60 Markush structures per day. He then showed how ChemAxon technology helps to assess structural proximity in patent space. A substructure search of the demo database of 579 Markush structures with ChemAxon tools produced 97 hits. The Markush of one of the hits contained 41310 structures, which, when enumerated with the “Markush reduction according to the hit”, produced eight structures, which were all close to the exemplified structures in the patent. The Markush of the second hit contained 1042 structures, which, when enumerated according to the hit, yielded six structures, which were distantly related to the exemplified structures. The Markush of a third hit contained 1016 structures, which enumerated to two Markush structures. These had to be partially enumerated in order to assess how close they are to the exemplified structures. The use of ChemAxon tools saved time in the novelty searches. He also suggested improvements to the Markush Enumeration tool, many of which had been implemented or planned by ChemAxon: Supply the ability to enumerate only one R group at a time. When the option “show R groups” is selected, the partially enumerated structure should be visible even if the R group list extends over one page. Provide an explorer-like display of enumerated structures. Display search and filtering steps in the results window. Display the exemplified structures of the patent highlighting the search query. Provide a federated novelty search over all relevant sources. (Learn more about the presentation here.)
I opened the Wednesday morning session with a talk entitled “Perspiration, Inspiration, and Happenstance in Scientific Discoveries”. Although the origin of QSAR at Pomona College has been described many times, these tales ignore the fact that 14 years elapsed from the start of the collaboration between Hansch and Muir until the first QSAR publication appeared. Happenstance or luck played a role in that Toshio Fujita was very knowledgeable about the structure-activity relationships of plant growth regulators when he became a post-doc with Corwin Hansch. An even more remarkable piece of luck was that Donald McIntyre, a geology professor at Pomona, was enamored with computers in 1960 and even programmed the computer to solve the first QSAR equation. Fujita had the inspiration to change the focus from relationships of bioactivity with one property to the simultaneous influences of several properties. The invention of SMILES stemmed from Dave Weininger’s inspiration to provide a tool for chemical structure entry into a computer that does not involve drawing the structure, but merely typing some characters. The development of the CLOGP program required a lot of work (perspiration) but also the inspiration that the computer can also work on partial structures. The happenstance that led to Aladdin, one of the first 3D searching programs, was that John van Drie was available as a part-time contractor. In this case the inspiration was that we were already storing coordinates of molecules in a chemical information database and that medicinal chemists had been asking for new templates that would hold pharmacophoric atoms in the correct geometry. Pure happenstance led Bob Pearlman to invent CONCORD for his solvation studies without knowing that others were developing programs to do 3D searching of chemical structures.
Lastly, happenstance led me to read a paper by Brint and Willett on algorithms that identify 3D maximum substructure; inspiration led me to realize that a pharmacophore is a 3D maximum substructure defined by the distances between pharmacophoric atoms rather than between bonded ones. This led to DISCO, the first computer program that recognizes 3D pharmacophores within a set of unrelated active molecules. (Learn more about the presentation here.)
Zoltán Simon from Delta Informatika described DrugPredict, an online service that predicts polypharmacology. To develop the method they first predicted the docking affinity of each of the FDA set of small-molecule drugs to each of a set of 154 carefully selected protein structures from the Protein Data Bank. This set of 154 descriptors (Interaction Profile Matrix) is used to develop models for each of the 181 FDA-recorded effect/no-effect classifications of each drug. The result is an effect probability matrix. They show that 40% of the models produce areas under the ROC curve of 0.99 and that all of the models produce areas greater than 0.85. Delta Informatika has also generated effect probability matrices for 100,000 drug-like molecules and offers automated processing of user-entered structures. The system facilitates the identification of candidates for drug repositioning as well as the discovery of novel bioactive compounds. The chemical information system is based on JChem Base. (Learn more about the presentation here.)
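For readers unfamiliar with the area under the ROC curve quoted above, the following sketch computes it directly from its rank interpretation: the probability that a randomly chosen "effect" drug receives a higher predicted score than a randomly chosen "no effect" drug, with ties counted as one half. The scores and labels are invented, and the pairwise loop is chosen for clarity rather than speed. It is not the DrugPredict implementation.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    `labels` uses 1 for "effect" and 0 for "no effect".
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    # Count a win when the positive outranks the negative, half a win on a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Invented predicted probabilities and recorded effects for five drugs.
    scores = [0.9, 0.8, 0.7, 0.3, 0.2]
    labels = [1, 1, 0, 1, 0]
    print(roc_auc(scores, labels))  # 5 of the 6 positive/negative pairs are ordered correctly
```

An area of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported values of 0.85 to 0.99 in context.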
Steve Muchmore from Abbott Laboratories presented a talk entitled “Blobs of Hope and Other Flights of Fancy”. The work stems from the observation that the Rule of Five is too permissive and yet the hard cutoffs of it and other rules lead to unnecessary reduction in information content and discontinuities. The approach taken by the Abbott group is to generate probabilities of a desirable effect for each combination of properties. Probabilities for properties such as permeability (PAMPA), stability in human liver microsomes, and rat bioavailability have been developed from in-house data. Medicinal chemists can then see their compounds plotted in the physicochemical space of interest superimposed with contour plots that give the probability of a favorable result. Such a plot reveals not only the position of the compound(s) of interest, but the direction of property change that will increase the probability of a more favorable result. Another advantage of the approach is that probabilities of different effects can be combined, using Belief Theory for example. (Learn more about the presentation here.)
Aleksander Mendyk from Jagiellonian University Medical College described their work on predicting hERG channel inhibition of compounds. They used the data from the CompTox project, which lists 1969 records that describe the hERG effects of 200 drugs under various conditions. The test set was 193 records for 25 substances. To model the response they used 30 parameters that describe the experimental conditions, and 107 chemical properties calculated with ChemAxon cxcalc software. Models of the concentration-inhibition curves were built with various Random Forest and Artificial Neural Network algorithms from the Weka software package. Both algorithms produced models from ten-fold cross-validation with an RMSE of 0.2 on a scale of 0 – 1 for the fraction of hERG channel inhibition. (Learn more about the presentation here.)
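The evaluation scheme described here, ten-fold cross-validation scored by RMSE, can be sketched in a few lines. The sketch is an assumption-laden stand-in: it replaces the Random Forest and neural network models from Weka with a trivial predictor that returns the mean of the training responses, and it assigns records to folds by position. It only shows how the folds and the error measure fit together, not how the authors built their models.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def k_fold_rmse(y, k=10):
    """Cross-validated RMSE of a mean-value predictor (placeholder model).

    Assumes len(y) > 1 so that every training set is non-empty.
    """
    folds = [list(range(i, len(y), k)) for i in range(k)]
    squared_errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [v for i, v in enumerate(y) if i not in held_out]
        prediction = sum(train) / len(train)  # placeholder for a fitted model
        squared_errors.extend((y[i] - prediction) ** 2 for i in test_idx)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Invented inhibition fractions on the 0-1 scale.
    inhibition = [0.05, 0.10, 0.20, 0.35, 0.50, 0.60, 0.75, 0.80, 0.90, 0.95] * 2
    print(round(k_fold_rmse(inhibition, k=10), 3))
```

On a 0 to 1 response scale, an RMSE of 0.2 means the typical prediction is off by about 20 percentage points of inhibition.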
Ian Berry from Evotec described their vision for the use of SharePoint with cheminformatics. He first pointed out that before leaping into a SharePoint project one must be certain of its role. At Evotec after spending 10 – 12 months to set it up, SharePoint has replaced their old intranet. It will be the collaboration tool for all projects at all sites. They plan to make it available to clients. For cheminformatics it is easy to connect to the JChem Cartridge, use query forms to extract and filter data, and display the results. Their wish-list is to be able to draw structures in lists, libraries, discussion groups, and Wikis; to calculate properties; to name molecules; to do structure search in lists and documents; and to generate simple plots. A clear advantage of SharePoint is that it is one place for all types of information. This supports rapid communication and the ability to build on previous experience. (Learn more about the presentation here.)
Tamás Pelcz from ChemAxon presented two talks on SharePoint. JChem for SharePoint covers all ChemAxon components not related to Microsoft SharePoint Search. Originally, JChem for SharePoint included an editable ChemAxon structure field, a Chemical Terms field, and a structure filter web part that does an atom-by-atom search. The new version has full Firefox and Google Chrome support. Structure editing will now be in-place using a connected MarvinSketch or via OLE to edit structures with a desktop editor. There are two types of structure fields: the linked structure field and a calculated structure field that could contain, for example, the Bemis-Murcko molecular framework of the structure. The demonstration showed importing an SDF file while adding an atom count column, changing a structure to trigger changes in calculated properties, exporting to an SDF file, and importing and exporting JChem for Excel workbooks. Visualization is accomplished with Visifire–Silverlight or Microsoft Charts. The new, soon-to-be-released version 1.0 will also address currently unanswered questions, including scalability, response time, and whether chemistry applications will overload the infrastructure and lead to crashes. Plans include the support of third-party editors, improved structure filters such as and/or, the ability to save and load queries, and integration of Structure Checker and Standardizer. (Learn more about this presentation here.)
In his second talk Tamás Pelcz discussed JChem Search for SharePoint. It is an extension to Microsoft SharePoint Search and FAST Search that provides the ability to index and query chemical structures. So far they support indexing of JChem content in lists, blogs, discussion boards, and Wikis; of structure files; of IUPAC names or SMILES in documents; of words and synonyms naming structures; of OLE objects for structure drawing; and of JChem for Excel. The objective is a pluggable, extensible indexing architecture that includes storage of structure indices, parsers for keywords and text, and recognizers of structures in images. The plans for indexing also include a custom connector to external applications such as ELNs and enhancements to the Excel add-in to identify the exact cell location of the hit and to support ISIS for Excel. Querying in JChem Search for SharePoint uses web services to hide the complexity of merging structure and keyword hits. The results show not only the structure hit, but also the document source. One can refine the hits based on the type of file in which the structure was found or the format of the stored structure. Plans for querying include a structure-only search, a structure property refiner, and a document preview. ChemAxon now plans to benchmark indexing and querying and to test large-scale deployment. This will also include providing installers and configuration tools. (Learn more about this presentation here.)
Dragos Horvath of CNRS-Université de Strasbourg described the integration of ChemAxon tools into an interactive structure standardization editor to be used for careful building of QSARs. It is especially important that molecules that are used for a QSAR are represented in the computer consistently. They used ChemAxon tools to remove counterions, neutralize charged molecules, and to highlight problematic structures. It was also necessary to remove a covalent bond in structures that should be salts. Tautomerization presented the biggest problem. In their hands the major or dominant tautomer of a molecule produced by ChemAxon depended on the exact representation of the input SMILES: A structure presented with alternating double and single bonds yielded a different major tautomer than did the same structure presented with aromatic bonds. Additionally, they felt that keto-enolization was overdone, particularly in cases of amides. In some cases Standardizer changed stereochemistry at double bonds. (Learn more about the presentation here.)
Iván Solt from ChemAxon presented Reactor. He reminded the audience that it is accessible as a stand-alone application, in JChem for Excel, and in Instant JChem, as well as an API. Reactor provides highly automated enumeration using reagents loaded from structure files and chemical intelligence by smart reaction rules that are evaluated with calculated or imported properties. Recently the reaction library of 145 commonly used organic chemical reactions has been augmented with approximately 100 additional reactions. Although the reaction rules are helpful, sometimes the chemist wants to select compounds manually, an option supported by Reactor. As well as the structure of a product, Reactor carries along user-selected properties of the reactants. This allows the chemist to avoid syntheses that use expensive or hazardous reagents and to locate the reagents. In addition, the product is given an IUPAC name and the identification of the reaction used. A new feature is the ability to include reactions that yield no products. Also new is the capability to specify the syn or anti addition to a double bond. Reactor in Instant JChem provides direct access to reagent databases and straightforward post-processing and visualization. Reactor in JChem for Excel also supports sequential and combinatorial enumeration as well as direct post-processing of products. Future releases will show continued expansion of the reaction library and improved prochiral reaction support. (Learn more about the presentation here.)
Bob Marmon from Evotec described how they use KNIME to provide desktop tools for chemists. KNIME helps streamline routine processes such as updating and querying databases and sharing SAR reports. It also supports data mining and SAR analysis that ultimately lead to QSAR models or provide data for more sophisticated CADD tools. One example of a KNIME workflow gathers data from Excel spreadsheets, calculates properties, and loads the structures and properties into an Instant JChem database for sharing with others on the same project. At Evotec every project uses an Instant JChem database for its data. This is handled by a shared JChem cartridge database that uses one Oracle schema per drug discovery project, with built-in Instant JChem security and roles used to manage users. Users at both the Abingdon and Hamburg sites contribute to this database. Evotec also uses Instant JChem to store synthetic ideas, which KNIME extracts monthly to present the ideas in PowerPoint. The SAR data tree contains joined chemistry and biology tables, allowing many assay results for each assay type for a particular compound. Evotec has also developed KNIME workflows to create SAR tables and reports from databases as well as specialized reports that concentrate on potent compounds based on a particular scaffold. In addition they have KNIME workflows for searching the ChEMBL database, for enumerating libraries limited by reagents available in-house, for ordering compounds that were hits in virtual screening, and for auditing users of the ELN. They use SharePoint to share workflows, tips, and reports with other users. KNIME nodes are being developed to run Linux-based CADD programs, to replace the Evotec desktop modeling tool, and to integrate modeling tools into workflows. (Learn more about the presentation here.)
Tim Dudgeon from ChemAxon described how enhancements to the already powerful Instant JChem desktop application provide more ways to view and work with data. Chart widgets for histograms, scatter plots, and radar charts are fully integrated with both selection and query. Color-coding based on other properties is also supported. One can now use configurable conditional formatting in grid and form views to identify cells that match a user-defined rule. More form widgets are now available: structure matrix, multifield sheet, and tabbed pane. Form Builder has been made more configurable and easier to use. A further improvement is the addition of scripted calculated fields, which can do anything from simple things such as adding values from two columns to more complex actions such as aggregating data from related tables or calling out to external services. Also new in Instant JChem is support for scripts that act on data trees or schema. This allows custom import and export, data migration, and custom data processing. Work on scripting continues with plans for sharing of scripts, better editor support, and support for using external libraries. Reactor in Instant JChem is also improved with the use of Chemical Terms and charts and by copying fields from reactants to products. Instant JChem now supports training a computational model. With 5.4 and 5.5 it is now faster to search Markush structures in Instant JChem. Markush enumeration can be filtered using Chemical Terms expressions, and homology group expansions are available. Along with performance improvements, versions 5.4 and 5.5 have more options for security, including user roles. Beyond 5.5 users will see an Instant JChem server, improved visualization widgets, cherry picking, scripting, manipulation of data, and forms display. On the chemistry side, plans are to include clustering and grouping, R-group analysis, chemical space analysis, and library design. (Learn more about the presentation here.)
Tamás Pelcz from ChemAxon described the challenges of migrating from other Excel add-ins to JChem for Excel. With Excel shape-based add-ins, such as ChemDraw and Accord for Excel, structures are visible even to those who don't have the application. However, such add-ins can be slow and support only a few hundred structures. On the other hand, Excel window-rendering add-ins, such as ISIS, Isentris for Excel, and JChem for Excel, are fast and support an almost unlimited number of structures. However, structures are not visible to those without the add-ins, and copy/paste and printing have conversion overheads. In the first case study, JChem for Excel was integrated with GlaxoSmithKline's own add-in, Helium. An important issue was that OLE conversions didn't work; ChemAxon improved the Excel add-in so that structures and data are copied to the clipboard in a single step. ChemAxon also provided ActiveX integration with ChemDraw at GlaxoSmithKline's request. Many other requests were also accommodated. The second case study involved supporting Evotec's migration from ISIS for Excel. Many issues were solved, and coexistence with ISIS for Excel was implemented. Evotec requested some new features, including directing import from a file to start at the current cell and exporting the selection only. A final case study was GNF migrating from Accord for Excel. There were issues with the structure sheet, structure drawing, and printing. GNF requested the ability to turn drawing on and off in the API and to perform R-group decomposition on the same page. This migration was a partial success. In summary, ChemAxon is open to new feature requests and will work closely with the customer on the features or issues that have the highest value. During such a migration customers will see a new release every two months and will also have access to development releases.
In the past year rendering in JChem for Excel has moved from .NET to C++, support for Office 2010 has become available, and new functions such as MCS, Structure Checker, and MarkushMatch have been added. In addition, import from databases is improved. Plans include an installation validation checker, improved load times, caching of function results, and asynchronous structure and other functions. In addition, plans are to use a database to store structures and to convert the add-in to support both Excel shapes and rendering. The plans also include the ability to copy and paste between other Office applications. Importing from databases, web services, and Instant JChem will also be possible. SharePoint integration is also planned. (Learn more about the presentation here.)
Karen Worsford from GlaxoSmithKline described Helium, a data-driven user tool for SAR analysis. As part of their effort to decrease the IT resources that support SAR in Discovery, they needed a tool to gather and explore the data, with the objective of replacing ISIS Base. The first effort used TIBCO Spotfire, which successfully demonstrated data retrieval and functionality. However, Spotfire met with resistance from the average bench chemist. The ultimate decision was to integrate Helium into JChem for Excel, Instant JChem, and TIBCO Spotfire. The success of the integration was attributed to the close working relationship with a Lead End User and a larger End User Group, and to weekly End User Group meetings. The project's success can also be attributed to the support, responsiveness, and excellent communication from ChemAxon. Helium uses Web Services to supply non-biological data and Oracle to supply biological data to users at any site. Helium is kept up to date in Excel using Microsoft ClickOnce: client code regularly checks for updates and installs them. Helium is kept up to date in Spotfire with the server checking the status at each log-on. The Helium user interface provides a datatype-sensitive task panel. Helium integrates structure retrieval from IDs; biological data from IDs; identity, substructure, and similarity searches of compounds; calculating properties of compounds; and standardizing compounds. Although it is a significant improvement over previous tools, Helium's retrieval of >30,000 structures from compound IDs takes approximately half an hour. The benefit of Helium merged with JChem for Excel is that it brings new capabilities for SAR analysis in a familiar and portable program. Conditional formatting, column filtering, and R-group decomposition are all powerful tools added to Excel. Helium for Excel replaces six legacy applications and is currently used by more than 1200 researchers. Approximately 500 users have installed Helium for Spotfire. (Learn more about the presentation here.)
Ian Berry from Evotec summarized their seven-year experience with ChemAxon. The first migration was of their supplier database from Daylight. This was driven by cost and the loss of UK support for Daylight. They now use ChemAxon tools in all of their applications because of the breadth of ChemAxon's offerings and its excellent support. Their registration system enforces exclusivity of each compound, is available to all chemists, has project and user security, and supports preferences for displaying and exporting data. EVOsource includes more than 14 million structures contained in 391 catalogues from 121 suppliers. The database includes not only supplier information but also hazard data. Selected compounds can be ordered from stores or a supplier or sent out for a quote. EVOsource uses the JChem Cartridge, Marvin Sketch and View, Structure to Name, Standardizer, and Instant JChem. They use Instant JChem for chemistry and biology project databases. They plan to migrate the remaining ISIS databases to Instant JChem this year. It was simple to move from ISIS4Excel to Instant JChem. The move has the additional advantage that calculator plug-ins are available. Evotec also provides JChem for Excel, as this is the preferred application for chemists. They incorporated Reactor to provide rows from one reagent and columns from the second, with each cell containing the resultant product. After an initial search in Instant JChem the structures and identifiers are exported and loaded into JChem for Excel, property calculations are made, and a table is formatted. He concluded by reporting that they have now purchased perpetual licenses for the entire ChemAxon suite. This recognizes the great support and the way that ChemAxon helps Evotec deliver value to their clients. (Learn more about the presentation here.)
Heike Nau from Elsevier Information Systems described how collaboration with ChemAxon expanded Marvin to support structure indexing for the Reaxys database. This database combines the information in CrossFire Beilstein, CrossFire Gmelin, and the Patent Chemistry Database. It is a web-based workflow solution for chemists to find chemical information. The database is built from manual procedures that extract chemical structures, properties, and reaction details from documents. The challenge is that the database covers a broad variety of compound classes including inorganic compounds; the latter include coordination compounds, catalysts, clusters, biosensors, and solid-state compounds. Hence, there are heavy demands on the structure editor. MarvinSketch met some of these needs, but fell short in its support for molecules that can only be drawn and displayed in 3D and for stereochemistry beyond tetrahedral. ChemAxon took on the task of improving MarvinSketch. Now it can generate 3D structures and conformations, support editing in 3D, and provide 3D depth cueing. As a result Marvin has been integrated into Elsevier software. It is used from .NET and integrated as a component DLL. Although at first there were issues with synchronizing the Marvin rendering of a structure with the information in the MOL file, these were solved. The sketcher has been used in production for eight months without problems. The launch was smooth, with very little training needed and high acceptance by users. (Learn more about the presentation here.)
The meeting concluded with Alex Drijver reminding the group that the ChemAxon brand is informal and approachable. Its philosophy is to develop and enhance products in partnership with customers, responding to their needs and so continually improving the ChemAxon product line. It prides itself on the continuous enhancement and bug fixes of its software. It also takes great pleasure in the collaborations with customers, who provide valuable feedback and suggestions for enhanced functionality. Certainly, the fruitful collaboration with users was reinforced by the comments of many of the speakers at the meeting. Although headquartered in Hungary, ChemAxon is an international company with personnel in Hungary, the US, India, China, and several European countries; customers worldwide; and partnerships with both large companies and small biotechs. (Learn more about the presentation here.)
The social program also lived up to expectations. Both conference hotels feature thermal spas. They are located on Margitsziget, an island that sits in the Danube not far from the city center. The island is named for Saint Margaret, who lived in a convent on the island. Landscape parks cover most of the island, but it also includes medieval ruins of various nunneries, chapels, and cloisters and a beautiful Japanese garden. Preceding the training day, those delegates present gathered at the Grand Hotel for a buffet dinner of Hungarian delicacies as well as local beverages. Preceding the meeting itself, buses transported the delegates to the ChemAxon offices for the traditional garden party on the terrace. Although the temperature was on the cool side, the enthusiastic exchange of ideas by the delegates warmed the occasion. We appreciated that the indoor room in which the wine was served had apparently been cleared of desks and computers: witness the hand-written timetable for new features that was taped to the wall, on which some features were checked off and some were still to be accomplished. The Gala Dinner was held in perfect summer evening weather at the Széchenyi Bath and Spa. This magnificent 1913 building features high ceilings, tile mosaics, and marble floors. After a quick drink, we entered a large room that had tables of flip-flops and racks of terry robes, both emblazoned with the ChemAxon logo. Dinner was served on a terrace overlooking the large outdoor swimming pool and was followed by music from a lively band. Chemoinfomaniacs know how to have a good time! The social program concluded after the formal meeting for those delegates remaining. They took taxis to downtown Budapest, went on a short walk narrated by Alex Drijver, and ended up at a local restaurant. In short, the social program gave delegates a taste of Budapest and many chances to talk with other attendees.
In summary, the European ChemAxon User Group Meeting showed again the enormous talent, openness, and enthusiasm that characterize the company. The event at the Széchenyi Bath and Spa exemplified the spirit of the meeting: organized but not stuffy; purposeful but not rigid; original but not irrelevant. I especially appreciated hearing about such unique efforts as enhancing Marvin to support drawing complex inorganic structures and www.chemicalize.org, whose goal is to provide chemical structures wherever a chemical or common name appears in a document. ChemAxon continues to show the openness, enthusiasm, and talent that typify the company and have propelled it into a leading cheminformatics software vendor.