Robots, fish and integration at first US UGM
To acknowledge its expanding world-wide user base and in recognition of the fact that 41% of its business is in the US, ChemAxon held its first North American User Group Meeting to complement the highly successful User Group Meetings that are held annually in Hungary. We met at the Seaport Hotel and World Trade Center on Boston’s newly redeveloped waterfront. The conference center is literally on the water and right next to a pier that serves ferries that transport passengers to Provincetown. The smell of sea air was definitely present. Fittingly, the conference banquet was held a short walk down the pier at an excellent seafood restaurant. To celebrate the company’s Eastern European identity, Becherovka, a liqueur popular in Hungary, flowed freely at the dinner.
Preceding the meeting itself was a mixer at the Massachusetts Institute of Technology Museum where we explored the exhibit "Robots and Beyond: Exploring Artificial Intelligence at MIT". The venue was well-suited to networking, as was the excellent food, which was served buffet style at stations spread out in the exhibit space. After dinner a cozy lounge area provided another venue for chatting.
The twelve ChemAxon employees present (of a total of 40) provided the enthusiastic spirit for which the company is known. For example, the requisite meeting T-shirt brands the participants as Cheminfomaniacs, which is a fitting description. The sixty non-ChemAxon attendees represented approximately fifty organizations - large pharmaceutical and biotech companies as well as universities and other non-profits. Although most were from North America, one participant traveled from Australia specifically to attend the meeting. In contrast to many other user group meetings I have attended, this one was characterized by enthusiasm for the software and a high level of respect for the company. There were no hostile questions or back-biting comments from the audience.
Exactly half of the presentations were by users. The unwritten theme of the meeting was integration of multiple sources of information. Sometimes the integration was of chemical databases from various sources into one user interface, other times the integration was of software from various vendors, and other times the integration was of disparate types of biological data with chemical structures. The encouraging sign is that all of these types of integration are facile and that ChemAxon recognizes that its software can play an important role in the effort.
The meeting was preceded by two simultaneous day-long workshops. The first involved end-user training, which I attended. Each student had a computer to use to follow the training. The set-up was excellent - the ChemAxon tools were on a local network and access was flawless. Because there were only six students in this session, everyone was able to perform the exercises and get help with questions or problems. Additionally, every step was well documented in the training material. The training focused on the Marvin user interface including Calculator Plugins and Instant JChem. Most of the session involved query building and management. Approximately twenty students attended the developer workshop, which I understand was not interactive but nevertheless very helpful.
The day and a half meeting had 30 presentations divided approximately equally between ChemAxon staff and users or partners. The focus of the first day was on enterprise platform development and the second (short) day on end user tools and data analysis.
The presentations started with a strategic overview from Alex Drijver, the newly appointed CEO of ChemAxon, who returned to ChemAxon after stints at other Hungarian life science services companies. He emphasized the customer focus of the company. It now has two sales staff in the US and one of the developers has relocated from Hungary to San Diego to better serve the US customers and partners. Alex’s other emphasis was on the performance of the ChemAxon products: scalability, speed, quality, and ease of deployment. None of the participants argued with his claim of best in class products and best in class customer support, and evidence of ease of deployment was provided by the various user presentations. ChemAxon’s goal is to be recognized as "best of breed" in cheminformatics and an ideal partner for specialized development. It must accomplish this without losing its reputation for quality, customer orientation, and responsiveness. ChemAxon is known as both a provider of end-user solutions and of tool-kits, with its end-user applications built on its own tool-kits.
First on the schedule of user presentations, but moved to a different time-slot because of technical problems, was Peter Condron of the Agency for Science and Technology in Singapore who repeated his remote presentation that had been presented earlier this summer at the European UGM. More details are available in the report of that meeting. The bottom line is that with 1.5 people in 1.5 months they designed, developed, tested, and deployed a chemical registry system based on the JChem Cartridge. It is integrated with the eNovator electronic laboratory notebook and in-house developed programs to manage HTS plates and results.
Lutz Weber of OntoChem GmbH, also presenting remotely, repeated his presentation from the European UGM. He emphasized the challenge of searching their database of 10 billion synthesizable, drug-like molecules that cannot be described by Markush definitions. The molecules are generated by ChemAxon’s Reactor and filtered to remove those that are not drug-like. They successfully built their database using JChem and the Oracle cartridge. He tested JChem 3.2 with 200 million compounds and the results returned in seconds. He demonstrated this by showing a search on the fly. Of special interest is their claim that topological torsions, or ToTos (first published by Lederle and Merck scientists several years ago), better describe the similarities of molecules vis-à-vis their biological properties. Hence OntoChem uses ToTo searching to discover new series of biologically active molecules. Although ToTo searches are approximately three-fold slower than a JChem similarity search, it takes only two minutes to search a 20 million compound database. They discovered novel Mdm2 inhibitors by this method.
Trung Nguyen from the NIH Chemical Genomics Center described a platform for HTS data analysis and visualization that is based on the JChem API. It will be open sourced when launched, probably later this year. The database integrates multiple sources for compounds and associated data including not only their results but also vendor catalogues and public repositories. The data can be viewed by assay or as tables with details of the activity measurements in columns of compounds active in a particular assay, of compounds from one cluster, or of compounds that contain a specific fragment. A particularly novel display shows pie charts of the activity in different assays of compounds that contain specific fragments. Structural representations for hierarchical clustering include MCS, path-based fingerprints, PubChem substructure keys, and atom pairs. He showed a very impressive circular dendrogram of the clustering hierarchy of chemical compounds.
Jim Bullington presented the Palatin Technologies, Inc. cheminformatics solution that was developed with DeltaSoft using the JChem chemistry cartridge. In six months they moved from hand-drawn structures and biological data in many formats to an integrated registration system that includes structure-linked views of the biological data. The system includes unique numbers for every compound with notations for salt and lot information; inventory information; filters so that collaborating companies see only what is allowed in their particular contract; text summaries and images of dose-response curves and pharmacokinetics curves; and summary views of the biological properties of all salt and batch variants of a compound.
Kaisheng Chen from the Genomics Institute of the Novartis Research Foundation described an automated system that annotates chemical structures with information from their internal database, SureChem (patents), NCBI databases, WDI (drugs), Wikipedia, KEGG (gene annotation), and DrugBank. He showed an example of a set of compounds with similar biological profiles. Because some of these are dihydrofolate reductase inhibitors and the molecules are structurally similar, the remaining compounds are also likely to be inhibitors (Plouff et al. PNAS, 2008, in press).
Carlos Faerman from Vertex Pharmaceuticals described how Vertex uses Instant JChem to deploy commercial databases to medicinal chemists. With the ChemAxon software it was easy to provide a variety of vendor databases for searching by medicinal chemists and biologists. The Vertex users are extremely pleased with the ability to perform a federated identity, substructure, or similarity search across all or just selected databases. The advantage to the user is that the search query is entered only once rather than being repeated for each database. The advantage to the cheminformatics person is that each database is maintained separately, which makes it easy to deploy a new version.
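The federated search idea described above can be illustrated with a minimal sketch. It is not the Instant JChem API or the Vertex implementation: the database names, compound IDs, and the `identity_search` and `federated_search` helpers are all hypothetical, and plain string equality stands in for a real structure comparison. The sketch only shows the design point that the query is entered once and dispatched to each separately maintained database.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical stand-in data: each "database" maps a compound ID to a SMILES
# string. The names and contents are invented for illustration only.
DATABASES: Dict[str, Dict[str, str]] = {
    "vendor_a": {"A-1": "CCO", "A-2": "c1ccccc1"},
    "vendor_b": {"B-1": "CCO", "B-2": "CC(=O)O"},
    "vendor_c": {"C-1": "CCN"},
}


def identity_search(db: Dict[str, str], query: str) -> List[str]:
    """Return IDs whose stored structure string equals the query exactly.

    String equality is a placeholder (an assumption of this sketch); a real
    system would compare standardized structures with a chemistry toolkit.
    """
    return [cid for cid, smiles in db.items() if smiles == query]


def federated_search(
    query: str, selected: Optional[Iterable[str]] = None
) -> List[Tuple[str, str]]:
    """Run one query against all databases, or only the selected ones.

    Each database stays a separate object, so one of them can be replaced
    by a new version without touching the others.
    """
    names = list(DATABASES) if selected is None else list(selected)
    hits: List[Tuple[str, str]] = []
    for name in names:
        for cid in identity_search(DATABASES[name], query):
            hits.append((name, cid))
    return hits


all_hits = federated_search("CCO")
some_hits = federated_search("CCO", selected=["vendor_b"])
```

Here `all_hits` collects matches from every database, while `some_hits` is limited to the one database the user selected.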
Oleg Ursu from the Division of Biocomputing at the University of New Mexico also integrated different sources of biological data on the same compound. He used Instant JChem to integrate structures and biological data from the NIH Molecular Libraries-Small Molecule Repository, in-house screening data, and Wombat. The work included implementing a custom search application based on the JChem API. As an example he showed pipelining to generate a cluster of compounds identified in an in-house GTPase assay that showed that four were also active in PubChem GTPase assays, four were active in other PubChem assays, and five were reported in Wombat against still other biological targets. The pipeline also generated the Omega conformers for ROCS searching. His plans include automatic synchronization with PubChem and integration with other databases.
Matthew Pustelnik of Takeda San Diego, which is the center for the company’s structure-based drug design, discussed the development of a compound registration system using ChemAxon’s Oracle Cartridge. The system was built in web 2.0 technologies and was developed entirely on ChemAxon toolkits because the alternative commercial systems did not meet their requirements. The system is being deployed globally throughout Takeda.
In the poster session Junfeng Gao described The University of Minnesota Biocatalysis and Biodegradation Database that can be used to predict the pathway of degradation of chemicals in the environment. It uses ChemAxon's Reactor to compare input structures with their 250 biotransformation rules. Sophisticated logic prunes the output to the most likely structures. A poster from Qiner Yang of Kalypsys described their in-house system to manage structure-activity data from their in-house ultra high-throughput screening system. They initially started with the IDBS cartridge, but recently have added the JChem cartridge to the system. Although usually only one chemical search cartridge is active, they built a special application to compare the search results of the two systems to teach users about the subtleties of specifying a query in these systems. The poster sessions also included ChemAxon’s presentations on Markush structure representation, MCS clustering, Chemical Terms, and tautomer generation.
The integration theme was emphasized by the short partner presentations. Mike Burke from Agilent Technologies described their Kalabie Electronic Notebook and its integration with ChemAxon and other tools. The electronic notebook documents and manages experiments while facilitating collaboration between multiple R&D disciplines. Various vendor chemical cartridges are supported for use by customers to plan syntheses; to register compounds and batches; and to calculate properties.
Sean Ekins from Collaborative Drug Discovery, Inc. (CDD) emphasized that the company fosters collaboration by providing data security but also inter-scientist networking. The software integrates chemical, biological screening, ADMET data, docking, QSAR, and systems biology. It uses JChemBase with Rails via a Ruby-Java bridge and the Marvin applet. Ongoing collaborations include those for malaria, tuberculosis, and a GPCR Ki database. Although CDD’s current emphasis is on neglected infectious diseases, its infrastructure is also suited to for-profit organizations.
Yvonne Shimshock described DeltaSoft’s ChemCart, which provides a web interface to not only structures and data but also documents and images - such a form can be created and deployed worldwide in less than five minutes. This has facilitated development of applications such as compound registration, sample or reagent inventory, or a structure-activity browser. It is built on the interchangeable component approach; hence, any of the chemical cartridges or drawing tools may be used.
Dean Misenhimer from Kinematik described their eNovator product suite that includes an electronic lab notebook; project, portfolio, quality and learning management systems; and modules for clinical trials. ChemAxon tools are integrated into the electronic laboratory notebook.
Nicko Goncharoff described SureChem’s system that contains the full text of 8.6 million patents that contain 8.9 million unique structures but 500 million occurrences of structures. It is available as a patent search portal or as a web service that can be integrated into other tools. SureChem selected ChemAxon as the database tool after a 3-way comparative benchmark. It uses JChemBase, Standardizer, and MarvinSketch. Substructure or similarity searching against 9 million structures takes a few seconds.
James Baxendale described the Seurat system from Synaptic Science. Seurat was developed and used at Celera for approximately three years and then spun out when Celera abandoned small-molecule drug discovery. Seurat is designed to integrate many data sources to provide coherent information, including sophisticated visualizations, for all disciplines working on a discovery project. It is easy for the IT organization to adopt and for users to build "live views" such as smart spreadsheets that can change as the issues become clearer. Seurat uses JChemBase and the ChemAxon clustering and property calculation tools.
Derek Debe from Abbott Laboratories discussed Abbott's integration of Synaptic Science's Seurat with ChemAxon's LibMCS clustering tools. Enhancements were made to the LibMCS API to enable it to return a single column of float numbers conveying the cluster number and cluster member. This format allows the LibMCS clustering to be recovered within a spreadsheet environment with a single column sort. Seurat's native ability to cluster very large arrays of sparse assay data was also discussed. Abbott has collaborated closely with Synaptic Science, helping them evolve Seurat's functionality and usability prior to a significant roll-out to more than 500 users. He also seconded the claims of ease of integration of the Seurat/ChemAxon system with existing databases and tools.
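The single-column idea can be sketched as follows. The actual encoding used by the LibMCS API was not given in the talk summary, so this is only one plausible scheme, chosen as an assumption: the integer part of the float carries the cluster number and the fractional part carries the member index. `MEMBER_SCALE`, `encode`, and `decode` are hypothetical names.

```python
from typing import List, Tuple

# Assumption of this sketch: fewer than 1000 members per cluster, so the
# member index fits in three decimal places of the fractional part.
MEMBER_SCALE = 1000


def encode(cluster: int, member: int) -> float:
    """Pack cluster number and member index into one float, e.g. (3, 7) -> 3.007."""
    if cluster < 0 or not 0 <= member < MEMBER_SCALE:
        raise ValueError("cluster must be >= 0 and member in [0, MEMBER_SCALE)")
    return cluster + member / MEMBER_SCALE


def decode(value: float) -> Tuple[int, int]:
    """Recover (cluster, member); round() absorbs floating-point error."""
    cluster = int(value)
    member = round((value - cluster) * MEMBER_SCALE)
    return cluster, member


# Rows of (molecule name, cluster number, member index) in arbitrary order.
rows: List[Tuple[str, int, int]] = [
    ("mol_c", 2, 1),
    ("mol_a", 1, 2),
    ("mol_b", 1, 1),
    ("mol_d", 2, 2),
]

# One sortable column: sorting on it groups molecules by cluster and orders
# them within each cluster, as a spreadsheet column sort would.
keyed = [(encode(cluster, member), name) for name, cluster, member in rows]
sorted_names = [name for _, name in sorted(keyed)]
```

A single ascending sort on the encoded column is enough to bring the members of each cluster together, which is the property the report describes.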
Gregory Smith from (the new) Tripos talked enthusiastically about its use of the Eclipse open-source software (http://www.eclipse.org/) to provide customers with flexible options for deploying its core discovery informatics technologies and applications. The Eclipse community is focused on building an open development platform composed of extensible frameworks, tools and runtimes for building, deploying and managing software. It is extended and supported by a large group of technology vendors, start-ups, universities, research institutions and individuals.
On a totally different subject, Kevin Hebbel from Pfizer and Matthias Nolte from chemITment presented the basis of the Pistoia Alliance, an Open Source initiative to streamline non-competitive elements of drug discovery IT. Other partnership members are GSK, Novartis, and AstraZeneca. The partnership will develop a common foundation of data standards, ontologies and web-services to foster interoperability between diverse software implementations. Inquiries about participating are welcome.
The presentations from the ChemAxon staff usually included a description of the software as well as a discussion of the new features and development plans. JChem Base, JChem Cartridge for Oracle and Instant JChem are the three products for dealing with chemical databases. New features announced at the meeting, available in JChem 5.1, include position variation bonds in Markush structures and queries, the option to search for diastereomers, and an option to check for sp-hybridization in sub-structure searching. Promised for later this year are a web services interface for JChem Base, an API for a compound registration system, flexible 3D pharmacophore searching, and JChem for Excel.
JChem for Excel, approaching launch, is being implemented using .NET open source technologies. The initial features will be searching JChem databases and cartridges, R-group decomposition, calculation of molecular properties such as log D and IUPAC name, and import and export of SDF and MRV (a CML-based Marvin file format) files. A graph of log D as a function of pH can be produced. JChem for Excel will also use Reactor functions to populate the cell with the product, the reaction scheme, the SMILES or IUPAC name of the reactants and product, or just the SMILES or IUPAC name of the product. It will, for example, populate a table grid of the products of a reaction in which each column refers to one of the lists of starting reagents and each row refers to the other list.
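The grid layout described above can be sketched in a few lines. This is not JChem for Excel or Reactor code: the `combine` function is a hypothetical placeholder that merely joins the two reactant SMILES with a dot, where a real reaction engine would return the product structure. Only the row-by-column arrangement is the point.

```python
from typing import Callable, List


def combine(row_reagent: str, column_reagent: str) -> str:
    """Hypothetical placeholder for a reaction engine call.

    It joins the two reactant SMILES with '.', which denotes a mixture and
    is NOT a reaction product; swap in a real engine to get products.
    """
    return f"{row_reagent}.{column_reagent}"


def product_grid(
    row_reagents: List[str],
    column_reagents: List[str],
    react: Callable[[str, str], str] = combine,
) -> List[List[str]]:
    """Build a table with one row per reagent in the first list and one
    column per reagent in the second list; each cell holds react(row, col)."""
    return [[react(r, c) for c in column_reagents] for r in row_reagents]


acids = ["CC(=O)O", "OC(=O)c1ccccc1"]  # rows
amines = ["CN", "CCN"]                 # columns
grid = product_grid(acids, amines)
```

With two acids and two amines the result is a 2 x 2 table; `grid[1][0]` is the cell for the second acid and the first amine.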
The Markush capabilities have been expanded in version 5.1 of the software: drawing of Markush structures, position variation bonds in the Markush query and database, and coloring and alignment in the display of search results. Further developments are planned.
Although the capabilities of Reactor are well developed, future plans include continuing to improve the reaction library and the types of reactions that can be included. A new development is the use of Reactor to predict metabolic transformations. The first phase of the project is a library of P450 biotransformations. A key component of the calculations will be the relative rate of each biotransformation rule - development of estimates for this factor is in progress. Before release each reaction must be tested and refined and the predictions validated with published data.
Although Marvin is a well-established program that is familiar to users, ChemAxon has continued to develop it. The user interface of Marvin can now be reconfigured to more closely resemble that of other chemical drawing programs - this should help new users make the transition to ChemAxon tools. There was much excitement with respect to the name-to-structure component. Currently, IUPAC names are processed either in batch mode or entered into Marvin Sketch; later versions will also process common names. A handy new feature is the option to preview a structure file before it is opened in Marvin. Plans for future releases include a .NET version, ChemDraw import/export, multistep reaction support, trainable pKa, and shape descriptors in topology analysis.
ChemAxon has taken over the maintenance and support of the ChemAxon Pipeline Pilot Component Collection. This will result in improved error reporting. The collection currently includes Marvin sketcher and viewer; Standardizer; Chemical Terms filter; Reactor; insertion, search, and retrieval of structures from JChem chemical database; calculation of major protonation form and distribution of microspecies; and the following calculator plugins: HBA, HBD, isoelectric point, logP, logD, pKa, polarizability, refractivity, topological polar surface area and Burden eigenvalue descriptor (BCUT). ChemAxon plans to add the following Pipeline Pilot components: JChem Oracle Cartridge, Instant JChem, name to structure conversion, tautomer and conformer generation and maximum common substructure clustering.
LibMCS, the maximum common substructure-based clustering method of ChemAxon’s JKlustor, continues to be improved. Searches are now considerably faster than last year. For example, although the timing will depend on the diversity of the set, average structure size, and MCS/LibMCS parameter settings, datasets of 10,000 molecules can now be clustered by LibMCS in approximately two seconds and those of 30,000 molecules in 10 seconds. These improved timings mean that it can now be used for the interactive analysis of HTS results. Amazingly, LibMCS is now somewhat faster than Jarvis-Patrick for clustering a library of 100,000 molecules, taking less than an hour whereas Jarvis-Patrick takes 85 minutes. Development plans include integration into Spotfire and Instant JChem as well as a new viewer. Longer-term goals are to recognize disconnected maximum common substructures and to allow molecules to be members of more than one class.
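For readers unfamiliar with the Jarvis-Patrick method used as the comparison above, here is a minimal sketch of one common formulation: two items join the same cluster if each appears in the other's list of k nearest neighbors and the two lists share at least `k_min` entries. It is not the ChemAxon implementation, and the variant and parameters ChemAxon used are not stated in the report. Fingerprints are modeled as plain sets of integer feature IDs, similarity is the Tanimoto coefficient, and the neighbor search is a naive all-pairs loop that is only suitable for toy data.

```python
from typing import Dict, FrozenSet, List, Set


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two feature sets: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_neighbors(fps: List[FrozenSet[int]], k: int) -> List[Set[int]]:
    """For each item, the indices of its k most similar other items.

    Ties are broken by index so the result is deterministic.
    """
    neighbors: List[Set[int]] = []
    for i, fp in enumerate(fps):
        others = [j for j in range(len(fps)) if j != i]
        others.sort(key=lambda j: (-tanimoto(fp, fps[j]), j))
        neighbors.append(set(others[:k]))
    return neighbors


def jarvis_patrick(fps: List[FrozenSet[int]], k: int, k_min: int) -> List[List[int]]:
    """Cluster items by shared nearest neighbors (one Jarvis-Patrick variant)."""
    nn = nearest_neighbors(fps, k)
    parent = list(range(len(fps)))  # union-find forest over item indices

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(fps)):
        for j in nn[i]:
            # Merge only mutual neighbors that share enough neighbors.
            if j > i and i in nn[j] and len(nn[i] & nn[j]) >= k_min:
                parent[find(j)] = find(i)

    clusters: Dict[int, List[int]] = {}
    for i in range(len(fps)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy fingerprints: two groups with no features in common.
fingerprints = [
    frozenset({1, 2, 3}), frozenset({1, 2, 3, 4}), frozenset({1, 2, 4}),
    frozenset({10, 11, 12}), frozenset({10, 11, 12, 13}), frozenset({10, 11, 13}),
]
loose = jarvis_patrick(fingerprints, k=2, k_min=1)
strict = jarvis_patrick(fingerprints, k=2, k_min=2)
```

Raising `k_min` makes the criterion stricter: with `k_min=1` the toy data splits into its two groups, while with `k_min=2` no pair shares enough neighbors and every item stays a singleton.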
Instant JChem now boasts form-based queries and query-by-example. It now supports saving both hit lists and queries, list logic on hit lists, editing hit lists, and restricting queries to a particular list. Current developments include support for JChem Cartridge for Oracle and more flexible and more numerous field and widget types.
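List logic on hit lists, and restricting a query to a list, can be illustrated with a short sketch. None of this is Instant JChem code: the function names, the record layout, and the molecular weight values are hypothetical, and hit lists are assumed to be ordered lists of unique compound IDs.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]


def list_and(a: List[int], b: List[int]) -> List[int]:
    """IDs present in both hit lists, in the order of the first list."""
    keep = set(b)
    return [cid for cid in a if cid in keep]


def list_or(a: List[int], b: List[int]) -> List[int]:
    """IDs present in either hit list, first list first, no repeats."""
    seen = set(a)
    return list(a) + [cid for cid in b if cid not in seen]


def list_not(a: List[int], b: List[int]) -> List[int]:
    """IDs in the first hit list that are absent from the second."""
    drop = set(b)
    return [cid for cid in a if cid not in drop]


def restricted_query(
    records: Dict[int, Record],
    predicate: Callable[[Record], bool],
    hit_list: List[int],
) -> List[int]:
    """Evaluate a query only over the compounds named in a saved hit list."""
    return [cid for cid in hit_list if cid in records and predicate(records[cid])]


# Hypothetical records keyed by compound ID; the values are invented.
records: Dict[int, Record] = {
    1: {"mw": 180.2},
    2: {"mw": 320.4},
    3: {"mw": 410.5},
    4: {"mw": 95.1},
}
hits_a = [1, 2, 3]
hits_b = [2, 3, 4]
heavy_in_a = restricted_query(records, lambda r: r["mw"] > 300, hits_a)
```

Because every operation returns another hit list, the results can be saved, edited, or combined again, which matches the workflow the report describes.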
In summary, the first ChemAxon US User Group Meeting provided attendees with not only a view of the existing and new capabilities of the company's software but also inspiration from other users as to how the software might be used in their own institution. Hopefully the next user meeting will diverge from the emphasis on structure-data integration to highlight user experiences with the powerful capabilities in Reactor, the Calculator Plugins & Chemical Terms Language, Screen, and Fragmenter. Nonetheless, the experiences reported by users and the advances reported by ChemAxon support the view that ChemAxon provides "The Solution" for many cheminformatics problems.