Wendy Warr's Report: Cheminfo Stories 2020
ChemAxon’s annual user group meetings in Europe and the United States are always a highlight of my calendar, so I was dismayed when the pandemic crisis forced me to cancel my trip to Budapest in May. All was not lost, however, since this year’s meetings did go ahead virtually, and I have had the pleasure of viewing the virtual meeting and writing this report. It was not the same as meeting in person and enjoying the networking in Budapest, Boston, and San Diego, but it did have the advantage that speakers were able to join from all around the globe, which would have been nearly impossible at a multiday, in-person conference. The format of the event was a series of eight webinars broadcast between May 26 and June 9. Each episode was broadcast twice a day, with a live question and answer session after the presentations.
ChemAxon Roadmap and Product Portfolio
Overview and Roadmap
Tamás Mihalovits gave an introduction. ChemAxon has successfully adapted to the new COVID-19 situation. The transition to home offices was smooth and there were no business continuity issues. The annual growth rate for January to May 2020 has been over 19% instead of the expected eight or nine percent. ChemAxon is supporting education and research with free software for studying COVID-19. ChemAxon’s focus for 2020-2021 is to give customers even more value than before. ChemAxon has offices and distributors in 11 locations, customers in 130 countries, 600 clients, and more than 600,000 users worldwide. A new office is being opened in Basel, Switzerland. ChemAxon is nearly 22 years old. It supports more than 30 products and has 160 employees, 60% of whom work in R&D. It is privately owned, and Ferenc Csizmadia is still in charge. It recently obtained ISO 9001:2015 quality management system certification.
The company aims to build its future on three equally important pillars: toolkits, integrated solutions, and professional services. In the field of toolkits, ChemAxon want to maintain and enhance their market-leading position. That means further investment in toolkit technology, new toolkit components, and a focus on easy-to-use APIs to fulfill the needs of integrators. ChemAxon will continue to offer ready-to-use, end-to-end integrated solutions with the goal of covering the complete early-phase discovery workflow. They also continue to ease the transition from traditional solutions to hosted services. The third pillar is professional services: offering help and expert knowledge from project scoping, through project management and implementation, to the maintenance of production systems.
Jan Christopherson and David Malatinszky gave an overview of ChemAxon’s product portfolio. Table 1 shows each solution’s position in the lifecycle: young solutions whose full set of valuable features is still being explored; established tools whose features are still growing; and tools that, for technological or other reasons, are focused on sustaining the current feature set.
Table 1. ChemAxon Portfolio
The second-generation JChem engine is the backend to the JChem PostgreSQL cartridge and the Chemical Oracle Language (Choral) cartridge. Choral helps bring searches to the cloud, with AWS Relational Database Service (RDS) compatibility. This cloud-readiness is mirrored in JChem Microservices, which deliver the power of the JChem engine in a modular, distributed fashion, allowing users to set up highly available web applications.
In the chemical database management suites, Plexus Connect builds on the power of Instant JChem and delivers content via a thin web client. It now has an integrated form editor, introducing a greater ability to modify the layout of the application, as well as the possibility to create forms directly on the client. Compliance Checker now includes a Harmonized Tariff Schedule (HTS) code generator which is also available as a standalone product.
Property calculation and fingerprinting technologies (Screen Suite) are attracting greater interest with the growing application of AI and machine learning to cheminformatics, where they provide a basis for training models. ChemAxon have recently made the calculators easier to train and have released an accuracy reporting tool. There has been an explosion of interest in DNA-encoded libraries (DELs) and huge virtual libraries. Reactor provides a powerful virtual synthesis engine for rule-based library generation. Work is underway to turn the desktop tool into a web-accessible application. A new product, Design Hub, has recently been released; there is much to be said about it later in this report.
The field of cheminformatics is slowly, but surely, embracing cloud deployment and software-as-a-service (SaaS) product models. Synergy, ChemAxon’s cloud platform, fits the bill. The data visualization tool in Synergy has been replaced by Tableau.
Rule-based search
Large diverse libraries such as ZINC (containing 730 million molecules) are used widely for high throughput screening (HTS) and virtual HTS (not to be confused with the Harmonized Tariff Schedule). Combinatorial libraries are even bigger. Massive DELs are growing in importance. Meg McCarrick of ChemAxon gave more detail in a later talk. DELs and other combinatorial libraries can be represented as reactants plus reactions, or as Markush (R-group) space, or, in theory, as enumerated compound space, but it is impractical to enumerate massive libraries fully: the cost of storing all combinations is very high while search speed is still low. Thus, ChemAxon has experimented with rule-based search. Tamás Varga presented some proofs of concept.
The rules use a reaction center and a list of reactants. The products can also be represented as a Markush structure, in which the reaction center is the scaffold and the reactants are “clipped” to R-groups. A rules-based dataset (Table 2) can be searched in product space by substructure or similarity search. Searches return the most relevant R-group combinations, sorted by a relevance measure. So far, ChemAxon has tested only scaffolds with a single core and two or three R-groups.
Table 2. Proof of concepts by ChemAxon
The LEAP2 method[1] was first reported in 2011, but ChemAxon have done a new implementation. For each R-group position (R1, R2, and R3), the first N hits from a similarity search of the corresponding reactants, using Chemical Hashed Fingerprints (CHFP), are taken, and only those hits are used in enumeration. In addition to the LEAP2 method, there are two rules-based search methods: a Markush analysis method and a substructure search method. The current performance and capabilities of the methodologies are compared in Table 2.
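The top-N-per-position idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ChemAxon's implementation or the published LEAP2 algorithm: fingerprints are modeled as sets of "on" bit positions (a stand-in for CHFP), similarity is Tanimoto, and combinations are ranked by the mean per-position similarity, which is an assumed scoring rule. All function names are hypothetical.

```python
from itertools import product


def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit positions."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def top_n(query_fp: set, reactants: list, n: int) -> list:
    """Return the n (name, fingerprint) reactants most similar to the query fragment."""
    ranked = sorted(reactants, key=lambda r: tanimoto(query_fp, r[1]), reverse=True)
    return ranked[:n]


def leap_style_enumerate(query_fps: list, reactant_lists: list, n: int) -> list:
    """Enumerate only combinations of the top-n reactants at each R-group position.

    query_fps: one fingerprint per R-group position (the query clipped at the core).
    reactant_lists: one list of (name, fingerprint) pairs per position.
    Returns (score, names) tuples, best first; score is the mean per-position similarity.
    """
    shortlists = [top_n(q, rs, n) for q, rs in zip(query_fps, reactant_lists)]
    scored = []
    for combo in product(*shortlists):
        score = sum(tanimoto(q, fp) for q, (_, fp) in zip(query_fps, combo)) / len(combo)
        scored.append((score, tuple(name for name, _ in combo)))
    scored.sort(reverse=True)
    return scored
```

Because only N reactants survive at each position, a three-position search enumerates at most N³ products, however long the reactant lists are.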
AWS Lambdas are cool
AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume. András Stracz of ChemAxon explained that Lambdas are cool because they are serverless: a function is packaged as a zip file with a single handler method and is automatically deployed to a worker instance. Each call can run for a maximum of 15 minutes, on a single CPU core, with a maximum of 3GB RAM. Unused or idle functions are stopped. By default, scaling goes up to 1000 parallel executors. Lambdas are ideal for jobs where the load fluctuates wildly and is made up of small pieces. Java, NodeJS, Python, Ruby, Go, and C# are supported.
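For readers new to Lambda, a Python function has a single entry point that the runtime calls with the event payload and a context object. The sketch below assumes an API Gateway proxy-style event whose `body` is a JSON string; `compute_property` is a hypothetical placeholder for a toolkit calculation. ChemAxon's toolkits are Java-based, so a real deployment of them would look different.

```python
import json


def compute_property(smiles: str) -> dict:
    """Hypothetical stand-in for a toolkit calculation (e.g. a plugin call)."""
    return {"smiles": smiles, "length": len(smiles)}


def lambda_handler(event, context):
    """Entry point that the Lambda runtime calls once per invocation.

    `event` carries the request payload; `context` holds runtime information
    and is unused here.
    """
    # Assumes an API Gateway proxy-style event whose body is a JSON string.
    body = json.loads(event.get("body") or "{}")
    smiles = body.get("smiles")
    if not smiles:
        return {"statusCode": 400, "body": json.dumps({"error": "missing 'smiles'"})}
    result = compute_property(smiles)
    return {"statusCode": 200, "body": json.dumps(result)}
```

Each invocation is stateless, which is why the pattern suits many small, independent plugin calls with fluctuating load.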
Lambdas do, however, present three potential problems. One is the 50MB maximum deployment size, including the AWS SDK. The second is “cold start” and “secondary cold start”. The third is the cost: up to $0.000048 per second, or $4.14 per day. (Bear in mind that an Amazon t2.small instance costs less than $1 a day, or $156 a year upfront.) András discussed the applicability of Lambda to a typical Design Hub application, which runs on one CPU using 300-400MB RAM for 0.2-5.0 seconds per call. Two thirds of plugin calls could be implemented with Lambda. Consider 2 million plugin calls in one year by 150 chemists, or 400,000 plugin calls in one year by 25 chemists, or 12,000 plugin calls in one year by five chemists. One million plugin calls is a lot.
Firstly, there is the problem of deployment size. The ChemAxon implementation (12KB), plus the AWS SDK (8MB), plus JChem (60MB without the Marvin JS editor or database drivers) adds up to 69MB, but there is a 50MB limit. The solution is to upload to Amazon Simple Storage Service (Amazon S3) and then deploy from there: the size limit in this method is 250MB. The solution to the start-up delay is AWS Lambda provisioned concurrency, a feature that keeps functions initialized and ready to respond in double-digit milliseconds.
The problem of performance and cost is illustrated by a single conformer sampling plugin run on 1 million molecules. Locally, on one CPU core, a call used 3GB on average and took 195ms. On AWS, where 2GB of memory is required on average, a call is billed as 230ms. ChemAxon has developed a proof-of-concept that costs $1.25 per year per plugin for warm-keeping, $6.84 for 1 million executions, and $0.20 per 1 million requests on the API gateway. This could be 20 times cheaper than expected.
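The proof-of-concept figures can be combined into a rough yearly estimate for the call volumes András mentioned. This sketch assumes that execution and gateway costs scale linearly with the number of calls and that warm-keeping is a flat per-plugin fee; the figures are those quoted in the talk, not current AWS prices.

```python
# Per-plugin cost figures quoted in the talk (USD).
WARM_KEEPING_PER_YEAR = 1.25    # provisioned concurrency, per plugin per year
EXECUTION_PER_MILLION = 6.84    # compute, per 1 million executions
API_GATEWAY_PER_MILLION = 0.20  # API gateway, per 1 million requests


def yearly_cost(calls_per_year: int) -> float:
    """Estimated yearly cost of one plugin, assuming costs scale linearly with calls."""
    millions = calls_per_year / 1_000_000
    return WARM_KEEPING_PER_YEAR + millions * (EXECUTION_PER_MILLION + API_GATEWAY_PER_MILLION)


for calls in (2_000_000, 400_000, 12_000):
    print(f"{calls:>9,} calls/year -> ${yearly_cost(calls):.2f}")
```

On these assumptions, even 2 million calls a year comes to about $15.33 per plugin, and 12,000 calls to about $1.33, almost all of it warm-keeping.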
Thus, the three apparent problems can all be overcome. ChemAxon toolkits can be used in Lambda with ease, bringing benefits for software architecture and scalability.
ChemAxon is well-recognized as a provider of a toolkit for calculated physicochemical properties and descriptors. Ákos Tarcsay said that ChemAxon is now responding to user suggestions for enhancements. Users now want to use in-house data to augment predictions of physicochemical properties and to build models for new targets, and they would like ready-to-use models for ADMET end-points. ChemAxon therefore decided to improve its readily available training functionality, to achieve higher local accuracy than general models, and to build a module that predicts molecular properties by actively learning from input data, thus providing models for novel end-points.
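One common, generic way to get higher local accuracy from a general model is to recalibrate its predictions against in-house measurements. The sketch below fits a simple linear correction by least squares; it illustrates the idea of local training only and is not a description of ChemAxon's training engine. The function names are hypothetical.

```python
def fit_linear_correction(predicted, measured):
    """Least-squares fit of measured ~ slope * predicted + intercept."""
    n = len(predicted)
    if n < 2 or n != len(measured):
        raise ValueError("need at least two paired data points")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    if sxx == 0:
        raise ValueError("predicted values must not all be equal")
    sxy = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    slope = sxy / sxx
    intercept = mean_m - slope * mean_p
    return slope, intercept


def corrected(prediction, slope, intercept):
    """Apply the fitted correction to a new general-model prediction."""
    return slope * prediction + intercept
```

A correction like this only adjusts systematic bias within the chemical space covered by the in-house data; models for genuinely new end-points need the kind of learning module described above.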
As an example, ChemAxon selected the challenging hERG off-target end-point. Ákos presented satisfactory preliminary results obtained with the training engine prototype. Early adopters are now invited to test the new interface to train existing models (solubility, pKa, logP, and logD), and to train and run models for custom end-points. They will also be able to test a new hERG model and help ChemAxon shape the whole engine and its application.
Library Enumeration and the future of Reactor and Plexus Design
Árpád Figyelmesi explained why the Reactor product family is ripe for enhancement. The Reactor desktop GUI is not very intuitive, has a long learning curve, and uses outdated UI technology. The current web-based alternative to Reactor, Plexus Design, has limited Reactor functionality and limited UI configurability. It is based on JChem Web Services Classic, which is technologically behind modern web solutions, causing performance and database maintainability problems. New challenges include scalability to enumerate billions of compounds, handling of consecutive reactions, and compatibility with modern web architectures, while still supporting aging desktop environments. ChemAxon has thus made Library Enumeration plans for the third and fourth quarters of 2020. These include a web-based alternative to desktop Reactor, with a completely new, simple and intuitive interface, designed to be an integrable and customizable web component. It will support consecutive reactions and large-scale enumeration. All Reactor interfaces, including desktop Reactor, will be replaced with the new component. User feedback is invited.
Marvin, the next generation of chemical drawing
In 2019, ChemAxon started to work on “Marvin NG”, the successor to both Marvin JS and MarvinSketch. ChemAxon hopes to release the new software at the end of 2021. Efi Hoffmann reported that in 2019 the main focus was to find a web-based solution for publication-quality drawing. In the first two quarters of 2020, the focus was on drawing big mechanisms, to meet the requirements of medicinal chemists. A beta version of the new software will be available in Chemicalize. In the third quarter of 2020, the focus will be on customization, and on smart helpers for mechanism drawing (e.g., the catalytic cycle of [NiFe] hydrogenase). A beta version of the new software will be integrated with JChem for Office. In the fourth quarter of 2020, there will be a version of the editor that helps analytical chemists with basic calculations in their everyday work.
To avoid misclassification of imported goods, a general nomenclature is needed. The Harmonized Tariff Schedule (HTS) was introduced in 1988, assigning a tariff code to each commodity, from live animals through human hair to pharmaceutical products. The Harmonized Commodity Description and Coding System is maintained by the World Customs Organization in Brussels. It is available in hard copy and as an online database containing explanatory notes and classification opinions. Ákos Papp outlined its many sections, chapters, headings, and subheadings. U.S. HTS codes are controlled by the United States International Trade Commission. ChemAxon have converted all these HTS codes into a tree representation, and created a knowledge base editor to assign structural queries to each of the branches and leaves. ChemAxon’s HTS code generator is called cHemTS. It is now included in Compliance Checker, and is also available as a standalone product.
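Attaching structural queries to the branches and leaves of a code tree suggests a simple classification walk. A minimal sketch, assuming each query is a boolean predicate and that classification descends to the deepest matching node, might look like this. The feature flags are hypothetical stand-ins for substructure queries, and the three codes are a small illustrative fragment, not a substitute for the official schedule and its legal notes. This is not how cHemTS is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical molecule representation: a dict of precomputed structural flags.
# In a real system each query would be a substructure search on the structure.
Molecule = dict


@dataclass
class HtsNode:
    code: str
    description: str
    query: Callable[[Molecule], bool]  # structural query attached to this node
    children: list = field(default_factory=list)


def classify(node: HtsNode, mol: Molecule) -> Optional[str]:
    """Walk down the tree and return the most specific matching code, or None."""
    if not node.query(mol):
        return None
    for child in node.children:
        code = classify(child, mol)
        if code is not None:
            return code  # first matching child wins, so sibling order matters
    return node.code


# Illustrative fragment of the tree (abbreviated descriptions).
tree = HtsNode("29", "Organic chemicals", lambda m: m.get("organic", False), [
    HtsNode("2921", "Amine-function compounds", lambda m: m.get("amine", False), [
        HtsNode("2921.41", "Aniline and its salts", lambda m: m.get("aniline", False)),
    ]),
])
```

Real classification also depends on exclusions and chapter notes, which a first-match descent like this does not capture.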
Enterprise Chemistry Backend
András Volford introduced this session. Several interfaces share the same JChem search engine: JChem Base, JChem cartridges, and JChem web services. JChem Web Services Classic will be withdrawn in March 2022. JChem Microservices can handle large datasets. The architecture is modular: the modules that you need find each other. JChem Microservices is scalable and can be embedded in any cloud provider’s infrastructure. ChemAxon’s second-generation Oracle cartridge, Chemical Oracle Language (Choral), can handle “no structures” and supports enhanced stereochemistry (as opposed to a chiral flag). The same applies to the JChem PostgreSQL cartridge. A “no structure” is an entity for which no chemical structure is stored but whose corresponding data are still searchable. The V2000 molfile chiral flag representation is converted to a V3000 enhanced stereochemical representation on the fly. Tamás Csizmazia had outlined some search engine improvements in the previous session. The second-generation JChem Base engine used in the JChem PostgreSQL cartridge, JChem Choral, and the JChem Microservices backend offers a distributed environment and rule-based search. Making distributed search available reduces development costs through simplification, makes it easier to introduce new features, and allows optimization of code and data. The Hazelcast platform provides transaction handling, a distribution algorithm, and high speed with huge datasets. Rule-based search is discussed elsewhere in this report.
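Distributing a search typically means partitioning the data across nodes, running the query on every partition in parallel, and merging the hits. The following is a generic scatter-gather sketch, not the JChem or Hazelcast implementation: threads stand in for cluster nodes, and `matches` is a hypothetical placeholder for a substructure or similarity test (plain substring matching is used in the usage example only to keep it self-contained).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def partition(records: Sequence, n_parts: int) -> list:
    """Split records round-robin into n_parts shards (one per node)."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    shards = [[] for _ in range(n_parts)]
    for i, rec in enumerate(records):
        shards[i % n_parts].append(rec)
    return shards


def distributed_search(shards: list, matches: Callable[[object], bool]) -> list:
    """Scatter the query to every shard in parallel, then gather and merge the hits."""
    def search_shard(shard):
        return [rec for rec in shard if matches(rec)]

    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        hits = [hit for shard_hits in pool.map(search_shard, shards) for hit in shard_hits]
    # A real engine would merge by relevance; sorting keeps the result deterministic.
    return sorted(hits)


records = ["c1ccccc1N", "CCO", "c1ccccc1O", "CCN"]
print(distributed_search(partition(records, 3), lambda s: "c1ccccc1" in s))
```

Because each shard is searched independently, adding nodes adds capacity without changing the query logic, which is the main attraction of the distributed design.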
Migration of a central compound management system to state-of-the-art technology
Quattro research, founded in 2004, is a company with 35 employees in Munich and San Francisco. The company develops IT solutions for R&D. Markus Weisser, the managing director, discussed a solution developed for one of their customers. The customer wanted to replace a BIOVIA Isentris and BIOVIA Direct compound management system being used by hundreds of people worldwide. The system was to migrate from the desktop to the web, powered by ChemAxon technology. There was to be a central repository, integrated with a storage solution and laboratory hardware, and catalog data (e.g., the BIOVIA Available Chemicals Directory (ACD) or Molport) for sample ordering. Quattro Web/CM is a system for management of chemical and biological inventories. An ELN designed for multiple scientific disciplines within R&D, quattro/LJ, is linked to Web/CM. Quattro research also offers solutions for registration and management of biomolecules based on the open Hierarchical Editing Language for Macromolecules (HELM) notation and editor. Quattro Web/CM allows users to register and manage chemical and biological concepts in a web application, handle batches and containers, and export data with a few clicks. It is easy to use and lightweight, with in-built interfaces to liquid handlers, scales, and automated storage systems. It offers a configurable ordering system, configurable storage locations, and plate handling. Screening and standard compounds are distinguished, and hazard information, shopping carts, well plates and barcodes are handled. In Web/CM, the parent “concept” (e.g., a chemical structure, antibody, cell line, protein, or plasmid) has batches as children, and each batch has containers as children. Marvin JS is the default editor. There is support for all major cartridges, not just the JChem cartridge. There are advantages to using a web application rather than the desktop. No installation is required. 
The solution is platform-independent and responsive, and minimal hardware is required. Quattro’s web solution is cloud-ready, and a chemical editor is included. The customer’s in-house compound inventory is searchable by structure (or substructure), text, field, or list. Hit lists from searches can be sorted. For catalog compounds, regulatory information, vendor information, availability, and site dependence are handled. A shopping cart can be assembled for external orders if the compound is not available in-house. Regulated or toxic compounds are highlighted. The first cartridge tried by quattro research was JChem for Oracle. It was difficult to use for “trivial” compounds in large datasets. Complex joins in query optimization were tricky. Therefore, a few months ago, quattro research started to evaluate JChem Choral, the next generation JChem engine available as an Oracle cartridge. It imposes no limitations on the result set, and data returned are sorted by relevance. In a substructure search for aniline, aniline itself was in the top five hits. (The top four were radiolabeled versions of aniline.) Joining 40 million ACD articles with 13 million ACD structures gave results for aniline in less than a second. In summary, quattro research’s customer changed from Isentris to quattro Web/CM. Marvin JS is tightly coupled into Web/CM. JChem Choral solved some cartridge problems. Returning results in order of relevancy is a great improvement in the cartridge technology. Data transfer between partners is another typical challenge.
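The parent concept, batch, and container hierarchy described for Web/CM maps naturally onto nested records. The sketch below is a hypothetical data model for illustration only; the class and field names (`amount_mg`, `location`, and so on) are assumptions, not quattro research's schema.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Container:
    barcode: str
    amount_mg: float
    location: str


@dataclass
class Batch:
    batch_id: str
    containers: list = field(default_factory=list)


@dataclass
class Concept:
    """Parent entity: a chemical structure, antibody, cell line, protein, or plasmid."""
    concept_id: str
    kind: str
    batches: list = field(default_factory=list)

    def total_amount_mg(self) -> float:
        """Sum the stock held in every container of every batch."""
        return sum(c.amount_mg for b in self.batches for c in b.containers)

    def find_container(self, barcode: str) -> Optional[Tuple[Batch, Container]]:
        """Locate a container by barcode, returning it with its parent batch."""
        for b in self.batches:
            for c in b.containers:
                if c.barcode == barcode:
                    return b, c
        return None
```

Keeping containers under batches, and batches under a concept, lets inventory questions such as "how much of this compound do we hold, and where?" be answered by walking a single tree.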
Using ChemAxon tools to automate Nimbus’ SDfile curation
Nimbus Therapeutics is a virtual biotech. It uses a structure-based drug discovery engine to design potent and selective small molecule compounds targeting proteins which are known to be fundamental drivers of pathology in highly prevalent human diseases, and which have proven difficult for other drug-makers to tackle. The company’s LLC and subsidiary architecture enables diverse and synergistic partnerships to deliver breakthrough medicines. Because of its distributed resource model, it gathers data from dozens of CROs and other partners. Rebecca Carazza (Director, Research Informatics) described Nimbus’ automated workflow NIMBEye. CROs deposit data in Egnyte. (Egnyte is a software company that provides a cloud platform for enterprise file synchronization and sharing. It offers storage, collaboration, and sharing capabilities using a cloud infrastructure, and users can access files from on-premises and cloud environments.) AWS Lambda finds content on Egnyte, creates a Jira ticket, and adds metadata. (Jira is a proprietary issue tracking product developed by Atlassian that allows bug tracking and agile project management.) Metadata added from the configuration file drive scripts for preprocessing. If preprocessing is successful, the data are moved to validation by ChemAxon software. If preprocessing fails, warnings and errors from a parser drive tags, and tickets are pushed to Nimbus and the Zifo R&D support team for fixing. Errors are fixed and the file is saved as a new version. The process is repeated until the data have been preprocessed successfully, then validated by ChemAxon software (or fixed again), and loaded into the Assay Capture and Analysis System (ACAS). The ticket is then closed. SDfile curation presents a number of challenges. There are different ways of representing the data because they come from multiple sources. Capturing salts accurately is an issue. Structures may have multiple tautomer forms.
Applying quality control to hundreds or thousands of structures is time consuming. Also, the V2000 format for SDfiles lacks enhanced stereochemistry. Automation has many benefits: the cost of human resource is minimized; content is standardized for internal use and prepared for transactions; a flexible configuration allows for additional sources and mapping; the assignment of a stereo category for a molecule is streamlined; and the whole curation process is less error prone. Nimbus chose to use ChemAxon nodes in KNIME to automate SDfile curation. JChem for Office is used to view SDfiles in Excel. Rebecca showed an Excel record for a vendor-supplied SDfile in which a salt was embedded in the structure, and the structure was not the major tautomer. Stereo details were missing. In the longer NIMBEye record, ChemAxon software had extracted the salt from the structure and captured it in a data field, changed the structure to the major tautomer, updated the field name, and assigned a stereo category. Additional fields required by the database (supplier, chemist, and notebook data) were added via the vendor and project configuration. The SDfile was thus standardized and ready for registration. The key ChemAxon nodes used are Standardizer (to strip salts, perform 2D cleaning, and enhance stereochemistry); MolConverter (to produce the V3000 SDfile format, and SMILES); MolSearch; Set, Invert, and Extract Atom Selection; Chemical Terms (to identify chiral centers and perform stereo analysis); Tautomer; and Elemental Analysis (to produce a molecular formula). Variables and links to configuration files are passed from Jira to KNIME using AWS Lambda scripts. To detect tautomer changes, MolConverter converts the original molecule into SMILES; the major tautomer is identified and converted to a SMILES string, which is compared with the original SMILES. If they differ, the updated tautomer form is tagged to notify the end user that the structure has changed.
MolSearch indexes the atoms. Set Atom Selection identifies the parent molecule. “Invert then Extract” separates the salt.
Elemental analysis of the extracted salt converts it to a formula. Automatic assignment of the required stereo category is based on ChemAxon stereo analysis and Java code (Figure 1).
Figure 1. Assigning parent stereo category.
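The "Invert then Extract" step for separating a salt is essentially set arithmetic on atom indices, and the formula step can follow the Hill convention (carbon first, then hydrogen, then the other elements alphabetically; with no carbon, everything alphabetical). This sketch assumes a deliberately simple molecule representation, a list of element symbols with explicit hydrogens in which the parent's atom indices are already known; the function names are hypothetical stand-ins for the KNIME nodes, not ChemAxon's API.

```python
from collections import Counter


def invert_selection(n_atoms: int, selected: set) -> set:
    """Atom indices NOT in the selection (the 'Invert' step)."""
    return set(range(n_atoms)) - set(selected)


def extract(atoms: list, indices: set) -> list:
    """Element symbols of the chosen atoms (the 'Extract' step)."""
    return [atoms[i] for i in sorted(indices)]


def hill_formula(elements: list) -> str:
    """Molecular formula in Hill order: C, then H, then the rest alphabetically;
    if there is no carbon, all elements (including H) are listed alphabetically."""
    counts = Counter(elements)
    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


# Aniline hydrochloride with explicit hydrogens: atoms 0-13 are the parent.
atoms = ["C"] * 6 + ["N"] + ["H"] * 7 + ["Cl", "H"]
parent = set(range(14))
salt = extract(atoms, invert_selection(len(atoms), parent))
print(hill_formula(extract(atoms, parent)), hill_formula(salt))  # C6H7N ClH
```

The extracted salt formula can then be written to its own data field, as in the NIMBEye record, while the parent structure goes forward for registration.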
Rebecca concluded by acknowledging Zifo R&D for masterminding the KNIME workflow, with support from ChemAxon, and Bolt Engineered for NIMBEye integration of the KNIME workflow.
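The NIMBEye ticket loop (preprocess, fix and save a new version, validate, load, close) can be summarized as a small control loop. In this sketch the Lambda, Jira, KNIME, and ACAS steps are abstracted as hypothetical callables that return lists of error messages; it illustrates the retry logic only and is not Nimbus's code.

```python
from enum import Enum, auto


class Status(Enum):
    NEEDS_FIX = auto()
    PREPROCESSED = auto()
    VALIDATED = auto()
    LOADED = auto()
    CLOSED = auto()


def process_ticket(versions, preprocess, validate, load):
    """Push successive file versions through the pipeline until one is loaded.

    versions: successive file versions (each fix saves a new version).
    preprocess, validate: callables returning a list of errors (empty list = pass).
    load: callable that loads a validated version into the target database.
    Returns a log of (version, status, errors) entries.
    """
    log = []
    for version in versions:
        for status, step in ((Status.PREPROCESSED, preprocess), (Status.VALIDATED, validate)):
            errors = step(version)
            if errors:
                # Tag the ticket and send it back for fixing.
                log.append((version, Status.NEEDS_FIX, errors))
                break
            log.append((version, status, []))
        else:
            # Both steps passed: load the data and close the ticket.
            load(version)
            log.append((version, Status.LOADED, []))
            log.append((version, Status.CLOSED, []))
            return log
    return log  # ran out of versions without a successful load
```

The `for ... else` construct runs the load only when neither step reported errors, mirroring the rule that data reach ACAS only after both preprocessing and validation pass.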
Navigating massive virtual (and real) libraries
Large diverse libraries such as ZINC (containing 730 million molecules) are used widely for high throughput screening (HTS) and virtual HTS (not to be confused with the Harmonized Tariff Schedule). Samples are expensive to acquire, analyze, and screen. Combinatorial libraries are even bigger: Enamine REAL, for example, offers 1.2 billion druglike molecules. It is impractical to enumerate massive libraries fully. DELs are growing in importance. DEL technology involves the conjugation of chemical compounds or building blocks to short DNA fragments that serve as identification barcodes, and in some cases also direct and control the chemical synthesis. The technique enables the mass creation and interrogation of libraries via affinity selection, typically on an immobilized protein target. A full library of millions of compounds can be screened, highly efficiently, in one mixture. Scientists need to manage DEL information as more libraries are designed. DELs and other combinatorial libraries can be represented as reactants plus reactions, or as Markush (R-group) space, or as enumerated compound space. Markush technology can handle enormous chemical spaces. Meg McCarrick summarized work that ChemAxon has been carrying out in this field. In an exploratory workflow in medicinal chemistry the core of a molecule with nanomolar activity in a PubChem assay may be explored in ChEMBL. The hits are filtered and a similarity search for the Markush structure is carried out to find similar structures in existing DELs. The similarity search may be carried out using ChemAxon’s MadFast, LEAP2, or rules-based search (RBS) technologies. Enumeration is necessary for MadFast to be applicable. ChemAxon have further refined and improved upon the idea of searching in nonenumerated space with two new methods aimed at dealing with huge combinatorial libraries, as described by Tamás Varga earlier. 
Both methods search the R-groups and core (or capped reactants and reaction center), with enumeration restricted to a small subset including the most relevant R-groups. Rules-based MCS similarity search gives hits sorted by MCS similarity relevance; rules-based substructure search gives hits sorted by CHFP similarity relevance. LEAP2 and RBS MCS are not off-the-shelf ChemAxon products but are available through ChemAxon consulting services. DELs present some specific challenges. The DNA copy number may not track with binding affinity, and there can be false negatives and false positives. It can, therefore, be useful not only to retrieve hits from the DEL but also to expand beyond those hits for potential follow-up. This means looking for similar reactants, or similar compounds (using MadFast, RBS MCS, or LEAP2), or performing RBS substructure search to search across DELs for similarities. In the example Meg discussed, RBS MCS seemed better than LEAP2 at picking the most relevant structures. Searching large libraries can be a challenge, but ChemAxon has several tools to help. They include not just the technologies mentioned in this talk but also Markush tools to design, edit, view, and search huge libraries. Having the right tools can make it much easier to navigate huge libraries.
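Enumeration cost grows as the product of the R-group list sizes, which is why the methods above avoid full enumeration. A minimal sketch, assuming a combinatorial space is represented simply as a core label plus one list of building blocks per R-group position (a hypothetical representation, not ChemAxon's Markush format):

```python
from itertools import islice, product
from math import prod


def library_size(rgroup_lists) -> int:
    """Size of the fully enumerated space: the product of the list lengths."""
    return prod(len(rs) for rs in rgroup_lists)


def enumerate_lazily(core, rgroup_lists):
    """Yield (core, r1, r2, ...) tuples one at a time instead of storing them all."""
    for combo in product(*rgroup_lists):
        yield (core, *combo)


# Three positions with 1,000 building blocks each already give a billion products.
blocks = [range(1000)] * 3
print(library_size(blocks))  # 1000000000
print(list(islice(enumerate_lazily("core", blocks), 2)))
```

A generator avoids the storage cost, but visiting a billion products is still slow, so a search that scores building blocks first and enumerates only a shortlist scales far better.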
Chemical Data on your Desktop
JChem for Office and Instant JChem
JChem for Office Lite has been released. It has functions for copying, pasting, and editing structures in Word, PowerPoint, and Outlook. There is no JChem ribbon, so it loads faster than the full version, but the copy-paste function still works with all editors currently supported in JChem for Office. You can copy and paste a structure into Excel. Ákos Papp gave a demonstration. He also demonstrated Biomolecule Toolkit integration in Excel, with HELM to Structure and Structure to HELM. Finally, he demonstrated how two people can co-edit with desktop Excel in Office 365 even if they are using a mixed environment (32-bit Office and 64-bit Office).
Tamás Juhász of ChemAxon gave a talk on impurity identification using LC-MS and in silico reaction enumeration. He summarized the benefits of using JChem for Excel for impurity identification. Chemists are familiar with using Excel. Developers need to send fewer samples for detailed structural identification, so they get answers faster. Authorities might require mass balance data; the ChemAxon approach could help with the mass balance calculation. Complicated cases might require Instant JChem or other ChemAxon products.
Lukáš Marek said that Instant JChem now has support for JChem PostgreSQL and the Choral cartridge, and search hits can be relevance ranked. Instant JChem also has form editor improvements; improved query, export, and schema loading performance; and some IT improvements “under the hood”. Lukáš went on to outline the future of Plexus Connect. Initially, Plexus Suite was intended to integrate multiple ChemAxon tools into a single coherent package, but as time went on, all the products developed in their own way and the whole Plexus idea was revised into a cloud solution currently called Synergy. Plexus Connect is now a web-based query and reporting tool for Instant JChem forms.
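In the impurity identification approach Tamás Juhász described, LC-MS peaks are compared against structures enumerated in silico, which comes down to matching masses within a tolerance. This sketch assumes singly protonated [M+H]+ ions only and a 5 ppm tolerance; the function names are hypothetical and not part of JChem for Excel, and the candidate names and masses in the usage example are made up.

```python
PROTON_MASS = 1.007276  # Da


def ppm_error(observed: float, expected: float) -> float:
    """Mass error of an observed m/z relative to the expected value, in ppm."""
    return (observed - expected) / expected * 1e6


def match_impurities(observed_mz, candidates, tol_ppm=5.0):
    """Match observed [M+H]+ ions to enumerated candidate structures.

    candidates: dict of name -> monoisotopic neutral mass (Da).
    Returns (observed m/z, candidate name, ppm error) for every match within tol_ppm.
    """
    matches = []
    for mz in observed_mz:
        for name, mass in candidates.items():
            err = ppm_error(mz, mass + PROTON_MASS)
            if abs(err) <= tol_ppm:
                matches.append((mz, name, round(err, 2)))
    return matches


candidates = {"dimer": 300.0, "hydrolysis product": 150.0}  # hypothetical masses
print(match_impurities([151.007276, 250.0], candidates))
```

A real workflow would also consider other adducts, charge states, and isotope patterns, and would shortlist rather than confirm structures; the remaining ambiguity is why only some samples still need detailed structural identification.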
A shared Instant JChem database to improve the drug discovery workflow
Mel Manalo, who leads the Research Operations Team at MyoKardia, and Kevin Sayo, Research Informatics Analyst (Chemistry), presented a typical workflow, its issues, and the solutions put forward for a medicinal chemistry project supporting programs to design small molecules for synthesis. When a new project begins, a team is formed to support it. Individual chemists look at a target to design molecules. Project lead optimization teams meet weekly to review new compound ideas. Project teams nominate and select compounds for synthesis. Selected compounds are assigned “external” IDs for internal synthesis or for sending to a CRO. Compounds successfully synthesized are finally registered in ChemAxon’s Compound Registration for corporate ID assignment. By the end of 2019, this workflow had been operational for over 6 years, but it was still very manual. Presenting structure ideas for progression was daunting due to the number of structures that the team needed to assess every week. The metadata that support a project (such as biochemical potencies, and chemical properties such as logP and pKa) come with each structure and need to be presented efficiently so that the team can review the many compounds proposed. The team also needs a platform, easily incorporated into their workflow, to search for compounds that have already been submitted for consideration for synthesis. In the manual workflow, chemists would often propose ideas that had already been presented by others months before. It was difficult to present an individual’s ideas: ChemDraw files were collected into Box, but were not searchable. It was difficult to capture team decisions on potential compounds. Also, external CRO compound IDs had to be generated manually. To develop a solution, the research informatics team set up a project to capture all concepts in a central database. The solution had to have added metadata (tracking scientists, dates, etc.) to ensure data integrity.
It also needed methods of reviewing concepts in a group setting, and tracking group decisions to progress a concept to synthesis. Other requirements were automated generation and assignment of external IDs; easy query and reporting of all the concepts generated; and integration with the current research informatics architecture. Finally, the solution needed to be implemented quickly because MyoKardia was taking on more CROs. An early proof-of-concept was created using an out-of-the-box Instant JChem project. End users tested the proof-of-concept and saw huge potential to consolidate and manage concepts, but expertise was needed to customize the project (with NetBeans, Groovy scripting, Spring Framework, script hooks, Active Directory authentication, and row level security). So, the MyoKardia Research Informatics team met with ChemAxon in December 2019 to define the scope of the work. The project was kicked off in January 2020 and the system (Figure 2) was put into production in February 2020. Mel paid tribute, in particular, to the skills of Norbert Sas and Lukáš Dopan of ChemAxon.
Figure 2. Instant JChem as a concepts database.
The system was then customized. An out-of-the-box “molecule matrix” and other widgets were configured for use in a project team setting. Concepts are now referred to by ID and can be displayed according to project, date, scientist, and status. Molecules are depicted in grid view and users are given visual cues (with green highlighting) to compounds chosen for progression. There are custom views for teams and individuals. At the bottom of the screen are decision buttons to automate generation of sequential external IDs; buttons to control external, in-house built applications; and a control for making a compound obsolete. Instant JChem is integrated with web applications by Java pipes and NetBeans links. The collaboration with ChemAxon was successfully completed. Mel and Kevin praised John Yucel and Tim Parrot of ChemAxon for the part they played in the project. This resulted in a cohesive platform in which all MyoKardia ideas are curated in a client-server system that is backed by an Oracle database for structure and data integrity, security, permissions, indexing, and reporting. Medicinal chemistry project teams do transactional activities in the system on a weekly basis. The MyoKardia chemistry department recognizes the system as an easy way to capture and manage new concepts in real time. The next steps are to track ideas conceived by multiple scientists; to track the status of concepts being synthesized at CROs; to achieve full integration with ChemAxon’s Compound Registration application; and to determine the feasibility of integration with an electronic laboratory notebook.
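Two of the requirements, spotting ideas that have already been proposed and generating sequential external IDs, can be illustrated with a small in-memory registry. It assumes each concept arrives with a canonical structure key (for example, a canonical SMILES produced upstream); the class, the `EXT-` prefix, and the six-digit padding are hypothetical, and the real system stores concepts in Instant JChem backed by Oracle.

```python
class ConceptRegistry:
    """Central store of design ideas keyed by a canonical structure string."""

    def __init__(self, prefix: str = "EXT", width: int = 6):
        self.prefix = prefix
        self.width = width
        self._next = 1
        self.concepts = {}      # canonical key -> record dict
        self.external_ids = {}  # canonical key -> external ID

    def submit(self, canonical_key: str, scientist: str, project: str) -> dict:
        """Add a concept unless an identical one exists; return the stored record."""
        if canonical_key in self.concepts:
            return self.concepts[canonical_key]  # duplicate: surface the earlier proposal
        record = {"key": canonical_key, "scientist": scientist,
                  "project": project, "status": "proposed"}
        self.concepts[canonical_key] = record
        return record

    def progress(self, canonical_key: str) -> str:
        """Select a concept for synthesis and assign the next sequential external ID."""
        record = self.concepts[canonical_key]  # KeyError if never submitted
        if canonical_key not in self.external_ids:
            ext_id = f"{self.prefix}-{self._next:0{self.width}d}"
            self._next += 1
            self.external_ids[canonical_key] = ext_id
            record["status"] = "selected"
        return self.external_ids[canonical_key]
```

Returning the existing record on a duplicate submission is what lets a chemist see that an idea was already proposed months earlier, and calling `progress` twice never consumes a second ID.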
Designing New Molecules
Exploring activity cliffs using graph databases
Jan Christopherson of ChemAxon presented some of his work exploring matched molecular pairs (MMPs),2 and in particular the concept of activity cliffs,3 using ChemAxon tools in graph databases. The MMPs were generated using the ChemAxon JChem Extensions in KNIME. Neo4j was the primary interface used to interact with the graph networks generated in the analysis. Visualizations were produced in Cytoscape, with the extension apps “chemviz2” and “Cypher queries” used to depict chemical structures. Cypher is Neo4j’s graph query language. The data studied were pIC50 values for JNK1 and JNK2 inhibitors from ChEMBL. While graph databases may be limited relative to relational database systems in certain aspects of scaling, they are excellent for exploring highly related data. ChemAxon have developed a proof-of-concept search cartridge that interfaces with Neo4j and provides chemical searchability. The highly relational nature of matched molecular pairs makes them a natural fit for representation in a graph database. In one image Jan showed a set of activity cliffs detected simply by assessing the activity values on the relationships. Having identified an MMP that leads to activity cliff behavior, he used the features of the graph database, together with chemical similarity searches, to explore the space around the cliff more fluidly. Jan also did some work on activity pairs3 but his limited dataset did not reveal any concrete examples. While a reaction similarity search has not yet been implemented in the Neo4j cartridge, a good workaround is to set up a separate graph database using the line graph of the original database. The same effect was achieved for this dataset by creating nodes that represent the transformations.
If Jan found an interesting transformation, he could run a substructure search with this transformation as the query to find other, similar transformations of interest. In a typical MMP database, you have to run all the searches first and then create a graph based on the results. Jan’s methodology can be used for general MMP analysis. You can carry out simple exploration of related chemical entities, and add multiple activities and activity changes in one or more relationships. You could also add Structure-Activity Landscape Index (SALI) relationships.
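The activity cliff idea can be illustrated without a graph database. The minimal Python sketch below is not Jan’s Neo4j/Cypher workflow: it simply treats each MMP as an edge between two compounds, flags edges whose pIC50 difference meets a threshold, and scores each flagged edge with the Structure-Activity Landscape Index, SALI = |A_i − A_j| / (1 − sim(i, j)). The `MMPEdge` type, the 2-log-unit (100-fold) threshold, the similarity values, and the compound IDs are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MMPEdge:
    """One matched molecular pair: two compound IDs linked by a transformation."""
    a: str
    b: str
    transformation: str  # illustrative text form, e.g. "[H]>>Cl"
    similarity: float    # structural similarity of a and b, between 0 and 1


def sali(act_a: float, act_b: float, similarity: float) -> float:
    """Structure-Activity Landscape Index: |activity difference| / (1 - similarity)."""
    delta = abs(act_a - act_b)
    if delta == 0:
        return 0.0
    if similarity >= 1.0:
        return float("inf")  # identical fingerprints but different activity
    return delta / (1.0 - similarity)


def find_cliffs(edges, pic50, min_delta=2.0):
    """Return (edge, delta, SALI) for pairs whose pIC50 differs by >= min_delta log units."""
    cliffs = []
    for e in edges:
        delta = abs(pic50[e.a] - pic50[e.b])
        if delta >= min_delta:
            cliffs.append((e, delta, sali(pic50[e.a], pic50[e.b], e.similarity)))
    # Steepest cliffs (highest SALI) first
    return sorted(cliffs, key=lambda t: t[2], reverse=True)


if __name__ == "__main__":
    pic50 = {"C1": 5.0, "C2": 7.5, "C3": 5.3}  # made-up values, not ChEMBL data
    edges = [MMPEdge("C1", "C2", "[H]>>Cl", 0.80), MMPEdge("C1", "C3", "[H]>>F", 0.85)]
    for edge, delta, score in find_cliffs(edges, pic50):
        print(edge.a, edge.b, edge.transformation, round(delta, 2), round(score, 1))
```

In a graph database the same test would be a query over relationship properties; the sketch only makes the arithmetic explicit.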
Marvin Live: the collaborative design platform at UCB
UCB (Union Chimique Belge) is a global biopharmaceutical company focused on transforming the lives of people living with severe diseases in immunology and neurology. UCB employs more than 7,600 people in 40 countries. More than 25% of the company’s revenues are plowed back into R&D. There are two main research centers, in Slough, United Kingdom, and Braine-l’Alleud, Belgium. UCB has also acquired three sites in the United States and one in London, United Kingdom. Collaborative work across all these sites and time zones is a challenge.
Judi Neuss, an IT specialist at Slough, and Karine Poullennec, a medicinal chemist at Braine-l’Alleud, described ChemAxon’s role in a solution addressing this challenge. Between 2014 and 2017, UCB used ChemAxon’s Compound Registration and JChem Engines. In 2018-2019, the two companies embarked upon a new collaboration around the design-make-test-analyze (DMTA) cycle. At UCB the “hypothesis” is the key to the cycle. Once a hypothesis has been established, new ideas to address it are tested. Molecules are then assessed for progression, before prioritization for synthesis. Tracking and biological testing follow. The data are then analyzed, closing the cycle.
UCB required a live discovery platform to enable scientists to share hypotheses and together create ideas across teams located in different time zones. Marvin Live was the preferred solution for a number of reasons. It has a simple and intuitive interface; strong underlying chemistry and the ability to integrate a virtual registration system; a good selection of plugins and web services that could be configured to suit UCB; nice collaboration features; and a realistic price. A few features UCB required were missing in the version ChemAxon originally demonstrated, but Marvin Live was evolving fast and UCB could take advantage of new product developments.
So, UCB committed to running a six-month pilot project, beginning in January 2019. The dual aims were to assess the value of using a collaborative design platform for real projects, for a period long enough to cover several design cycles; and to initiate improvements in cross-team, cross-site, and cross-functional collaboration. The team for the trial consisted of 22 people (medicinal chemists, computational chemists, biologists, ADME specialists, and structural biologists) in two research project teams. Touchscreens were installed in the meeting rooms to enhance the collaborative design experience. At the end of the pilot project, the participants were asked for feedback through a survey and the results were used to evaluate Marvin Live against success criteria.
UCB configured a virtual registration system for assigning IDs to virtual compounds. They set up idea properties (project, hypothesis, author, status, priority, etc.); plugins and web services (structure checker, preferred property calculation, regulatory status check, submission for docking, search of the virtual registry system, and retrieval of assay data); an Oracle database; and LDAP authentication. New product developments included table view; an editable view showing the key challenge that the ideas in a room were designed to solve; and new functionality to make Marvin Live work better with touchscreens. Table view is a tabular overview for tracking design status and prioritization as ideas are progressed to synthesis.
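The talk did not describe how UCB’s virtual registration system assigns IDs. The sketch below shows one common pattern, assuming structures arrive already canonicalized (for example as canonical SMILES or InChIKeys produced by a cheminformatics toolkit): a new structure gets the next sequential ID, and a structure seen before keeps its existing ID, so duplicate ideas are recognized. The class name, the `VIRT-` prefix, and the zero-padded width are hypothetical.

```python
class VirtualRegistry:
    """Assigns sequential IDs to distinct virtual structures.

    Structures are identified by a canonical key (e.g. canonical SMILES or an
    InChIKey) that is assumed to be computed upstream by a chemistry toolkit.
    """

    def __init__(self, prefix: str = "VIRT-", width: int = 6) -> None:
        self.prefix = prefix  # hypothetical ID prefix
        self.width = width    # zero-padding width of the numeric part
        self._next_number = 1
        self._ids: dict[str, str] = {}

    def register(self, canonical_key: str) -> tuple[str, bool]:
        """Return (virtual_id, is_new); a known structure keeps its first ID."""
        if canonical_key in self._ids:
            return self._ids[canonical_key], False
        virtual_id = f"{self.prefix}{self._next_number:0{self.width}d}"
        self._next_number += 1
        self._ids[canonical_key] = virtual_id
        return virtual_id, True


if __name__ == "__main__":
    registry = VirtualRegistry()
    print(registry.register("CCO"))        # ('VIRT-000001', True)
    print(registry.register("c1ccccc1"))   # ('VIRT-000002', True)
    print(registry.register("CCO"))        # ('VIRT-000001', False)
```

Keeping deduplication separate from canonicalization means the registry logic stays the same whichever toolkit produces the keys.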
There are many ways a scientist can use Marvin Live. At UCB the rule is “one project, one public room per chemical series”, and that room is the only idea repository: Excel is not allowed. The rooms are used and updated daily. There are regular, dedicated Marvin Live design meetings where ideas are discussed and prioritized.
Centralization and the one-stop shop were liked. Over 90% of users thought that Marvin Live was easy to use; 60% found it useful. Other positive feedback was that Marvin Live enables everyone, even juniors, to contribute to designs, and that its use triggers discussion. On the negative side, users said that it was time-consuming to update rooms; table view was not flexible or customizable; filter functions were limited; and Marvin Live does not capture generic ideas or have enough links to assay data.
On balance, the steering group decided that the pilot project was a great success. All features implemented in the pilot project were moved into production, and more ADME and local model predictions were added. Master Data Management (MDM) was used for discovery project names.
There are some bugs, and it is still slow to update the rooms. UCB would like better ways to record the hypothesis and to be able to attach files to the hypothesis. They want some improvements to make the table more like Excel (e.g., better sorting and filtering, and the ability to move the columns around). They would also like more integration with data analysis, molecular modeling, and other software; global search; and a tree map to follow the evolution of an idea and see whether it worked.
Nevertheless, Marvin Live has created new ways of working in UCB; the numbers of users and of designs have increased; and there is an increasing “ping pong” of ideas. The system improves collaboration, transparency, and progression of ideas.
Workshop on Design Hub
Marvin Live has evolved into ChemAxon’s Design Hub. A two-hour workshop focused on how solutions in the new Design Hub can be used in lead optimization and compound tracking. Dóra Barna of ChemAxon gave an overview of the application. In the hypothesis part of the DMTA cycle, scientists need to analyze previous cycle data, prioritize compounds using ideas from the literature, check the status of a project, find out why a certain compound was made and what was learned, trace elusive files, and so on. Marvin Live already featured molecule design but Design Hub adds many more facilities (Figure 3). One is handling the hypothesis: all the evidence supporting an idea (the scientific rationale behind it). Plugins for docking and clustering are new. Teams, projects, registration, status tracking, and assignment have been added. Commenting, file attachments, and tags are new. Universal search and role-based access control are other added features.
Figure 3. Design Hub technology.
The design history from legacy projects can be searched based on description, keywords, and structural information. A hypothesis is made from the scientific rationale and tested by the design of compounds. For the plugins, Node.js and Java starter kits, a Python helper library, and serverless support (AWS Lambda and AWS Fargate) are available. More than 50 examples have been published, with source code on GitHub. Searches of eMolecules can be carried out. The hERG assistant has a knowledge base built from matched molecular pairs. The model is trained on data from a patch clamp assay in ChEMBL. There is a REST API to JChem Base. The hERG assistant will soon be extended with the new ChemAxon hERG predictor. There is also a REST API to Compliance Checker. An open source docking engine, with a REST API to ChemAxon property predictors, produces a protein-ligand interaction display. You can also use your own docking engine. Enamine REAL substructure and (very fast) similarity searches are carried out with a REST API to JChem Microservices. Soon there will be an “all public database search as a service”. After design, compounds are prioritized for synthesis, and the progress of projects can be followed. Design Hub has features for projects, virtual registration, graphical hypotheses, virtual and real data, and compound grouping, plus kanban (a framework used to implement agile software development) for productivity. Design Hub Basic and Professional subscriptions are cloud-only and vary in the number of plugins and users allowed. Design Hub Enterprise is on-premises or hosted.
Capture, Retrieve, and Analyze Chemical Data
ChemAxon Synergy. Research data management in the cloud
ChemAxon Synergy is an evolving cloud platform (Figure 4). Csaba Peltz ran through some of the current modules: capturing compound data, checking and fixing structures, similarity search, uploading and normalizing assay results, and using Excel-based templates for automation.
Figure 4. ChemAxon Synergy.
Tableau, with embedded chemical intelligence, is used for visual data exploration. There are plans for R-group decomposition in future, and for expansion into biologics. Design Hub is in the Synergy network. Moving forward, a gap will be plugged by connecting laboratory workflows: managing samples, capturing experiments, exchanging data, and integrating instruments and other applications. Katarina Kubasch of ChemAxon demonstrated chemical intelligence in Tableau powered by JChem Microservices. SMILES, InChI, and .MRV files can be read, and SAR tables can be constructed. Katarina showed a compound view, dynamic scatter plots, and heat maps. She also showed a comparison of two compounds, the choice of what to visualize for them, and the ability to change axes. Potential users can request access to the demonstration online.
A new approach to an ELN for the chemical enterprise and ChemAxon solutions for its chemical functionality
ADAMA Agricultural Solutions is one of the world’s leading crop protection companies, with headquarters in Tel Aviv, Israel. Michael Grabarnik of ADAMA gave a presentation and demonstration of ADAMA’s Skyline Electronic Laboratory Environment (ELE), developed by Comply. Skyline supports all R&D functions, from feasibility studies of chemical synthesis up to final formulated product development and its delivery to the manufacturing plant. It supports the whole development workflow and communication between different functions without external tools (Figure 5). It includes safety and equipment management.
Figure 5. Structure and workflow in R&D.
Skyline is a project-oriented system: all the information about the project is concentrated in one place. “Private” information was eliminated from ELE: all information is available to all users according to their access rights and is easily sharable and searchable. The Inventory system relies on Marvin JS and JChem Web Services for chemical structural drawing, presentation, and search. ELE includes an interface to query and retrieve chemical structures and physical properties of the compounds from the Chemical Abstracts Service (CAS) database. Chemical reactions in experiments are drawn using Marvin JS and may be drawn automatically by uploading chosen materials and their parameters from Inventory. Skyline has very effective search functionality in the ELE database and attached documents, facilitated by ChemLocator. Michael demonstrated substructure search in the inventory system. Relevant compounds were chosen from the presented list and a new search in the entire database for these specific compounds gave a list of results with links to specific items (experiments, samples, analysis etc.) in the database.
József Dávid said that ChemAxon customers had not found SharePoint easy to use, so ChemAxon developed ChemLocator. Every year more than 20,000 new compounds are published in medicinal and biological chemistry journals.4 Soon it will be very difficult to find and digest them all manually. ChemLocator is a web-based search tool to find chemistry in unstructured data, using ChemAxon’s Chemical Name and Structure Conversion, and related technologies, and free text indexing. It creates a database of chemical structures and related data, and it does not duplicate documents. It solves the problem of locating and registering structures from multiple documents in multiple locations. The documents and user interface are linked to an API in the ChemLocator Docker architecture allowing business logic to be applied before construction of a PostgreSQL database.
Use of ChemAxon Marvin JS and JChem library to support the development of a new web application for iPPI-DB
Protein-protein interaction (PPI) targets present a challenge: it is difficult to find nonpeptide compounds that modulate them. Not all PPIs are suitable as therapeutic targets; not all PPI targets are equally suited to small molecule modulation; and chemical libraries are not designed for PPIs. There have, however, been some successes: venetoclax, for example, has been approved by the FDA. A talk by Olivier Sperandio of the Pasteur Institute concerned recent enhancements to iPPI-DB, a database of PPI modulators.5,6 It contains only small molecules (no peptides). The data are retrieved from peer-reviewed scientific articles or patents. The data stored are very varied and include structural and pharmacological information, binding and activity profiles, pharmacokinetic and cytotoxicity information when available, and some data about the PPI targets themselves. Olivier presented a new web application, a query interface using Marvin JS and JChem. Compounds can be displayed in grid or table view. Search fields are classified according to physicochemical properties, chemical similarity, chemistry rules, activity and efficiencies (ligand efficiency (LE) and ligand-lipophilicity efficiency (LLE)), mechanism of action, and external links. Physicochemical properties are calculated using ChemAxon Calculators and Predictors. Olivier showed plots of the distribution of druglike properties for all the compounds in the database. In table view, columns can be added and sorted. In the card view for each compound there are tabs for compound data (identifiers and external links); physicochemistry (property values, colored by value range, and a radar plot); pharmacology (LE versus LLE, and assay and activity data); and similarity (a table of similar compounds in DrugBank). Results can be exported as a .csv file. The web application also allows contributors to log in and submit data to iPPI-DB.
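The talk did not give the exact formulas iPPI-DB uses for its efficiency fields. The sketch below uses the commonly cited definitions, LE = 2.303·R·T·pIC50 / N_heavy (kcal/mol per heavy atom, with pIC50 standing in for the binding constant) and LLE = pIC50 − logP, so treat it as an illustration of the metrics rather than of the database’s implementation. The example compound is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def ligand_efficiency(pic50: float, heavy_atoms: int, temp_k: float = 298.15) -> float:
    """LE in kcal/mol per heavy atom, using pIC50 as a stand-in for pKd."""
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be positive")
    # -dG = ln(10) * R * T * pKd  (about 1.364 kcal/mol per log unit at 298 K)
    return math.log(10) * R_KCAL * temp_k * pic50 / heavy_atoms


def lipophilic_ligand_efficiency(pic50: float, logp: float) -> float:
    """LLE (also called LipE): potency minus lipophilicity."""
    return pic50 - logp


if __name__ == "__main__":
    # Hypothetical compound: pIC50 8.0, 30 heavy atoms, cLogP 3.0
    print(round(ligand_efficiency(8.0, 30), 2))      # 0.36
    print(lipophilic_ligand_efficiency(8.0, 3.0))    # 5.0
```

Plotting LE against LLE, as in the pharmacology tab, separates compounds that are potent for their size from those that are potent mainly because they are lipophilic.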
A curator validates contributions before they are added to the database. PubChem and DrugBank are automatically updated once new entries are added to iPPI-DB. In future the database will also consider the target itself. Better functionality to mine the information will be offered. Targets and compounds will be cross-referenced even more efficiently.
Chemicalize Professional. Hosted services and web components to enhance cheminformatics on your own website
Chemicalize Professional provides hosted search, chemical drawing, compliance checking, and calculations as embeddable web components and hosted backend services. Users do not need to worry about deployment, maintenance, or updates: all services are managed by ChemAxon and hosted on the highly available AWS infrastructure of the Chemicalize platform. József Dávid presented a real-life use case: a chemical supplier’s web-based sample ordering system. Benefits of Chemicalize Professional include bringing chemical features right into your site; faster site building; popular ChemAxon features, with the newest developments and latest versions always available; freedom from IT and operating costs; subscription-based, pay-as-you-go SaaS; a minimal learning curve; easy integration; and scalability and security.
Compliance Checker goes hosted
Ákos Papp of ChemAxon was the speaker. The Compliance Checker system relies on a web service-based technology: it can be accessed from the client computer through its GUI, or it can be integrated into an internal system to check the compliance of the molecules in a compound database automatically. The deployment can be on-premises, or the web service and the related computation resources can be moved to the cloud. Compliance Checker may be available as a hosted solution, where the licensing of the software is separate from the hosting costs, or it can be provided as SaaS, where users choose between different levels of performance and service through the corresponding subscription types, which include the hosting, the software license, and maintenance costs. ChemAxon will provide two major subscription categories: Standard and Lite. In the Standard category, the basic subscription is for small and medium enterprises (SMEs) with up to 10 users; the medium subscription (with higher performance than the basic option) is for SMEs with up to 25 users; and the professional subscription enables an unlimited number of users from a single company site. The premium level is for global, unlimited access, with higher uptime and performance, plus premium support. Lite subscription types allow for checking 20,000 compounds a month, 80,000 compounds quarterly, or 500,000 annually. cHemTS will also be available by SaaS subscription soon.
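The three Lite quotas differ in billing period as well as volume: used in full every period, they correspond to 240,000, 320,000, and 500,000 compounds a year. The hypothetical planning helper below (not anything ChemAxon provides) checks that arithmetic and tests whether a given monthly usage profile would stay within each quota. It assumes that quotas reset at the start of each period with no carry-over, which the talk did not specify; the plan names are ours.

```python
# Lite quotas as reported in the talk: (compounds per billing period, periods per year)
LITE_QUOTAS = {
    "monthly": (20_000, 12),
    "quarterly": (80_000, 4),
    "annual": (500_000, 1),
}


def annual_capacity(plan: str) -> int:
    """Maximum compounds per year if every period's quota is used in full."""
    quota, periods = LITE_QUOTAS[plan]
    return quota * periods


def plans_covering(monthly_usage: list[int]) -> list[str]:
    """Plans whose per-period quota is never exceeded by a 12-month usage profile.

    Assumes quotas reset at the start of each period with no carry-over.
    """
    if len(monthly_usage) != 12:
        raise ValueError("expected 12 monthly values")
    fitting = []
    for plan, (quota, periods) in LITE_QUOTAS.items():
        span = 12 // periods  # months per billing period
        totals = [sum(monthly_usage[i:i + span]) for i in range(0, 12, span)]
        if max(totals) <= quota:
            fitting.append(plan)
    return fitting


if __name__ == "__main__":
    print({plan: annual_capacity(plan) for plan in LITE_QUOTAS})
    # {'monthly': 240000, 'quarterly': 320000, 'annual': 500000}
    bursty = [60_000] + [5_000] * 11  # one large batch, then routine checks
    print(plans_covering(bursty))  # ['quarterly', 'annual']
```

A bursty profile is the case where the longer billing periods matter: a single large batch would exceed the monthly quota but not the quarterly one.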
ChemAxon has 60 partners. There were 20 partner presentations at this meeting. Greg Landrum of KNIME extracted chemical structures from the PDF of a J. Med. Chem. paper, selected relevant ones using descriptors from RDKit, carried out Library MCS clustering, found the centroid of an interesting cluster, and picked all the compounds from the paper with that substructure. He made a Markush structure from them, carried out Markush enumeration, and identified 685 new compounds that could have been in the publication.
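Markush enumeration itself is a combinatorial product: a scaffold with R-group positions expands into every combination of the allowed substituents, so the library size is the product of the list sizes. The toy sketch below illustrates this with plain SMILES string templates. It is not the KNIME and ChemAxon workflow Greg used: it performs no valence checking, and comparing against known compounds by string only works if all SMILES are written canonically (a real workflow would canonicalize them with a toolkit). The scaffold and substituents are made up.

```python
from itertools import product
from math import prod


def enumerate_markush(scaffold: str, rgroups: dict[str, list[str]]) -> list[str]:
    """Fill the {R1}, {R2}, ... placeholders in scaffold with every substituent combination."""
    labels = sorted(rgroups)
    return [
        scaffold.format(**dict(zip(labels, choice)))
        for choice in product(*(rgroups[label] for label in labels))
    ]


def library_size(rgroups: dict[str, list[str]]) -> int:
    """Number of enumerated structures: the product of the R-group list sizes."""
    return prod(len(options) for options in rgroups.values())


def novel_members(enumerated: list[str], known: set[str]) -> list[str]:
    """Enumerated structures not already exemplified (exact string match)."""
    return [smi for smi in enumerated if smi not in known]


if __name__ == "__main__":
    scaffold = "Oc1ccc({R1})cc1{R2}"  # hypothetical phenol scaffold
    rgroups = {"R1": ["C", "Cl", "F"], "R2": ["C", "OC"]}
    library = enumerate_markush(scaffold, rgroups)
    print(library_size(rgroups), len(library))  # 6 6
    exemplified = {"Oc1ccc(C)cc1C", "Oc1ccc(Cl)cc1OC"}
    print(novel_members(library, exemplified))
```

The “new compounds” step in the demonstration corresponds to the last function: enumerated members minus those already exemplified in the paper.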
Jane Reed and Andrew Hinton from Linguamatics illustrated how I2E adds context to the chemical names and structures extracted from documents by ChemAxon software. A talk at the Linguamatics user meeting showed how Roche established compound-target-disease relationships, reducing two weeks’ work to a few hours’ effort. Andrew gave a demonstration of how I2E associated structures from some MEDLINE abstracts with progesterone receptor activity, dosage, treatments, and causes.
Bérénice Wulbrecht of ONTOFORCE demonstrated how a DISQOVER knowledge graph combined with ChemAxon chemistry gives a unique navigation experience. She connected substances with clinical trials, phases, assays, diseases, countries, patents, proteins, phenotypes, and so on. DISQOVER connects siloed data and turns it into linked data.
PatCore and ChemAxon have been collaborating for 15 years. PatCore wrote the original Compliance Checker software. Abe Yuichiro presented PatCore’s Transformer2 tool for matched molecular pair analysis. It suggests compound modification ideas from input structures by using bioisosteric transformation rule bases such as EMIL and BIOSTER, as well as in-house rules. Feasible candidates can be filtered based on properties, structure frameworks, and substructures.
Tableau has been in the machine learning and AI business for 17 years. The company helps people to see and understand data. Alex Bougrov of Tableau said that ChemAxon has added chemistry to Tableau. Katarina Kubasch of ChemAxon repeated the demonstration that she had given earlier in the meeting.
Michael Boruta of ACD/Labs described the Katalyst D2D web-based software application that offers a single interface for high throughput experimentation “from design to decide”. It uses Marvin JS, and ChemAxon software for enumerating reactions.
BioSymetrics is a biomedical AI company founded in 2015. Mikalai Malinouski outlined the company’s ContingentAI product and, in particular, the mechanism of action prediction platform.
DeltaSoft has been run by scientists for scientists for 20 years and has thousands of users worldwide. The company offers both solutions and consultancy. Diana Soto outlined the suite of ChemCart products.
Inquiro by Dexstr is an insight engine for life sciences which incorporates Marvin JS, JChem, and Document to Structure. It places data from all types of files and locations into a single repository and enriches them with an ontology. Stéphane Rouillé presented a COVID-19 use case, visualizing data in a relation graph and heat map.
Ian Peirson of IDBS illustrated stoichiometry calculations, design of experiments, and registration of chemicals and biologicals in E-WorkBook. These features benefit from Marvin JS, JChem, Compliance Checker, Plexus Design, Compound Registration, Biomolecule Toolkit, and BioEddie.
Jessica Baycroft said the combination of IFI Claims with ChemCurator offers the best tools, data and results when it comes to extracting chemical information from patents. IFI Claims’ algorithms ensure high quality for the data. Machine translation has been improved.
Iktos applies AI to chemistry. The company’s Spaya retrosynthesis software, a free online application, is based on the Pistachio dataset from NextMove Software and uses Mcule compound sourcing for commercial compounds. Vinicius Barros discussed rule extraction, atom mapping, and the Monte Carlo tree search algorithm.
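Spaya’s actual search policy, reward function, and rule-application step were not described in detail. The sketch below shows only the generic selection and backpropagation steps of Monte Carlo tree search using the standard UCT rule (UCB1 applied to trees); expansion with retrosynthetic rules and the scoring of candidate routes are left out. The node contents, rewards, and exploration constant are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Search-tree node. In retrosynthesis, state might name a set of precursor molecules."""
    state: str
    visits: int = 0
    total_reward: float = 0.0
    children: list["Node"] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """UCB1 applied to trees: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # try unvisited routes before revisiting others
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(parent: Node, c: float = math.sqrt(2)) -> Node:
    """Selection step: descend to the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))


def backpropagate(path: list[Node], reward: float) -> None:
    """Backpropagation step: credit the rollout reward to every node on the path."""
    for node in path:
        node.visits += 1
        node.total_reward += reward


if __name__ == "__main__":
    root = Node("target", visits=10)
    good = Node("route A", visits=5, total_reward=4.0)
    poor = Node("route B", visits=5, total_reward=1.0)
    root.children = [good, poor]
    print(select_child(root).state)  # route A
    root.children.append(Node("route C"))
    print(select_child(root).state)  # route C (unvisited)
```

The exploration constant trades off refining routes that already look promising against trying disconnections that have rarely been visited.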
The Mcule compound sourcing service incorporates ChemAxon Calculators and Predictors and Compliance Checker. Gergő Prikler of Mcule described this and the EU-supported ULTIMATE database project, which offers 122 million novel, synthetically feasible compounds with an 80% synthesis success rate that can be shipped within two to six weeks. API access through Marvin Live is possible.
Santiago Dominguez of Mestrelab Research presented Mbook Analytical: a web-based, cloud environment to configure instruments, request analyses and update results in the Mbook ELN. Mnova Gears connects the analytical chemistry data software Mnova to Mbook Analytical. The ELN uses Marvin JS and JChem.
MolPort offers a database of 7.6 million verified, commercially available compounds. Andrii Lozoniuk emphasized the importance of high quality data. MolPort has been a ChemAxon software user for more than 12 years and recommends Marvin JS, JChem, Compliance Checker, and standardizers. MolPort data are available on the web, by FTP, and by API in KNIME, Pipeline Pilot, Excel, Marvin Live, and other applications.
Lutz Weber of OntoChem demonstrated SciWalker Open Data, a web search engine that implements advanced information retrieval and extraction from abstracts, full text articles, patents, and web pages. It uses a set of multihierarchical dictionaries for annotation and ontological concept searching. It can be used to annotate any public or internal repository of heterogeneous documents. The search interface allows queries with logical combinations of free text and ontological terms.
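The combination of dictionary annotation and ontological concept search can be illustrated with a toy example. The sketch below is not SciWalker’s implementation: it tags a text by whole-word synonym lookup, then treats a query for a broad concept as matching any narrower concept beneath it in a multihierarchical (multiple-parent) ontology, combined by logical AND with free-text words. The ontology, synonyms, and example sentence are hypothetical.

```python
import re

# Toy multihierarchical ontology: a concept may have several parents.
PARENTS = {
    "aspirin": {"NSAID", "salicylate"},
    "ibuprofen": {"NSAID"},
    "NSAID": {"anti-inflammatory agent"},
    "salicylate": set(),
    "anti-inflammatory agent": set(),
}

# Dictionary mapping lower-case synonyms to preferred concept names.
SYNONYMS = {
    "aspirin": "aspirin",
    "acetylsalicylic acid": "aspirin",
    "ibuprofen": "ibuprofen",
    "nsaid": "NSAID",
}


def ancestors(concept: str) -> set[str]:
    """All broader concepts reachable through any parent chain."""
    seen, stack = set(), [concept]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def annotate(text: str) -> set[str]:
    """Concepts mentioned in text, found by whole-word synonym matching."""
    lowered = text.lower()
    return {
        concept
        for synonym, concept in SYNONYMS.items()
        if re.search(r"\b" + re.escape(synonym) + r"\b", lowered)
    }


def matches(text: str, concept: str, required_words: tuple[str, ...] = ()) -> bool:
    """True if text mentions concept or a narrower concept AND every free-text word."""
    found = annotate(text)
    concept_hit = any(c == concept or concept in ancestors(c) for c in found)
    words_hit = all(w.lower() in text.lower() for w in required_words)
    return concept_hit and words_hit


if __name__ == "__main__":
    doc = "Acetylsalicylic acid reduced pain scores in the trial."
    print(annotate(doc))                                         # {'aspirin'}
    print(matches(doc, "anti-inflammatory agent", ("trial",)))   # True
    print(matches(doc, "anti-inflammatory agent", ("mouse",)))   # False
```

The document never mentions “anti-inflammatory agent”, yet it is retrieved because its annotated concept sits below that term in the hierarchy.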
Kevin Cramer presented the Exemplar LIMS and ELN systems from Sapio Sciences, which use MarvinSketch and other ChemAxon software to register and search compounds and reactions. He demonstrated stoichiometry calculation, the construction of a chain of reactions, and linking a reaction to a new experiment.
SciBite technology unlocks data and knowledge from life science texts. Its semantic platform makes extracting and analyzing data easier. Sam Shelton said that VOCabs and SciBite’s term identification, tagging and extraction (TERMite) are integrated in ChemLocator. SciBite’s VOCabs contain more than 20 million synonyms across more than 80 life science topics, including genes, drugs, diseases, and adverse events.
Matjaž Hrèn demonstrated the SciNote ELN which keeps all project data safely in one place and interconnected. It is flexible enough to accommodate both small organizations and those with many users. Marvin JS is integrated.
ChemAxon Compound Registration is integrated by RESTful API with the Mosaic product suite. Marcus Oxer of Titian described the use of Mosaic for sample management, covering inventory tracking (with real-time information on sample location, container information and substance metadata), ordering, sample processing, assay requesting, integration with automated storage and sample handling systems, tracked shipping, and connectivity.
IP and Markush Technology
Cheminformatics and IP
Árpád Figyelmesi reported that a new Markush search engine, to be released in the third quarter of 2020, will be 100 times faster in full structure search. Memory requirements will be reduced by 90% and there will no longer be impossible structures. This will open up the way for new representation features. ChemAxon has also reported a novel algorithm for automatically generating Markush structures from series of specific compounds.7 This method can be used effectively to assist patent drafting or to compose combinatorial libraries based on several molecules of interest. This Markush Editor algorithm is available in multiple ChemAxon software products.
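The algorithm in reference 7 is not reproduced here. As a much simpler illustration of the underlying idea, the sketch below assumes that an R-group decomposition has already been done, so each compound arrives as a shared scaffold plus one substituent per R label, and it simply collects the substituents seen at each position. Every input must supply every R label; the real algorithm also has to find the common scaffold itself. The resulting Markush structure covers every combination, which is usually more compounds than the input series. The data structures and example are hypothetical.

```python
from collections import defaultdict
from math import prod


def build_markush(decomposed):
    """Merge (scaffold, {R-label: substituent}) records into one Markush description.

    Returns (scaffold, {R-label: sorted list of substituents}).
    """
    scaffolds = {scaffold for scaffold, _ in decomposed}
    if len(scaffolds) != 1:
        raise ValueError("compounds do not share a single scaffold")
    options = defaultdict(set)
    for _, substituents in decomposed:
        for label, fragment in substituents.items():
            options[label].add(fragment)
    return scaffolds.pop(), {label: sorted(frags) for label, frags in sorted(options.items())}


if __name__ == "__main__":
    core = "Oc1ccc({R1})cc1{R2}"  # hypothetical shared scaffold
    series = [
        (core, {"R1": "C", "R2": "C"}),
        (core, {"R1": "Cl", "R2": "C"}),
        (core, {"R1": "C", "R2": "OC"}),
    ]
    scaffold, rgroups = build_markush(series)
    print(rgroups)  # {'R1': ['C', 'Cl'], 'R2': ['C', 'OC']}
    print(prod(len(v) for v in rgroups.values()), "structures covered by", len(series), "inputs")
```

The gap between covered structures and inputs is what makes such a generalization useful for drafting claims or designing a combinatorial library.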
See intellectual property (IP) differently. The power of using visualized IP strategy and intelligence to guide molecular research and drug discovery
Two cofounders of Accencio, Kevin Brown and Kevin Brogle, knew from their own experience in the biopharmaceutical industry before 2017 that patent prosecution, freedom-to-operate, and medicinal chemistry strategy processes are overly burdensome and inefficient, and that traditional methods of patent landscaping, data gathering, and explaining the findings to the various groups that need them are also inefficient. Kevin Brown told ChemAxon users that the solution is to move away from traditional methods of information analysis and delivery by creating easy-to-use tools that synthesize all the information necessary for strategic decision making. Data must be visualized through an IP lens. To create a lens, Accencio find the IP relevant to a specific area of interest (e.g., a biological target, a drug product, or a therapeutic area) and start by developing visualizations showing the structural and other relationships. They add other curated information as necessary, and end users then interact with these visualizations, with or without add-on, in-depth expert analysis, to guide their decision making. Accencio’s IP-GeoScape is a multidimensional lens offering a visual landscape of molecular IP space, revealing structural relationships between molecules within a biological target area of interest.
Kevin Brown presented a case study in small molecule analysis and licensing. The medicinal chemistry department of an Accencio client was asked to analyze a targeted IP space to assist the business development and legal functions in partnering one of its chemistry programs. Analyses were performed using IP-GeoScape (Figure 6). The client owned 40 patents or patent applications and more than 500 unpublished research molecules in the medicinal chemistry space. The client was able to understand the space visually and identified high-value, and in some cases previously unknown, potential joint venture (JV) partners.
Figure 6. IP-GeoScape lens prepared for the purpose of identifying potential licensing partners for an Accencio client.
In Figure 6, each marking is one of the colored “dots”. Each marking represents a specific molecule exemplified within a particular patent. If the image were enlarged, it would be clearer that each marking has a specific shape: the color denotes the company that holds the patent, and the shape denotes the specific patent in which the molecule is exemplified. Each marking is interactive: the user can click on one or more markings to make the full molecular structure or structures pop up on the screen. One of the potential partners (“Company 1”, Figure 7) had a large footprint, was dominant in the southern hemisphere, and had robust R&D activity in 2009-2014. There was some overlap with the client’s unpublished IP, especially with IP recently published by Company 1. There was a significant structural difference between Company 1’s clinical candidate and the client’s clinical candidate. Company 2 (Figure 8) had a relatively small footprint. Overlap with the client’s unpublished IP was significant, especially with the most recently published IP of Company 2. Infringement analysis showed that Company 2 exemplified molecules claimed by older client patents. The client entered into a JV agreement with Company 2.
Figure 7. Company 1 compared with client.
Figure 8. Company 2 compared with client.
To create its IP-GeoScape lens, Accencio identify all patent publications associated with a biological target area of interest and extract the molecular structures exemplified within the patent claims using ChemCurator. They then use their proprietary mathematical approach to cluster the molecules and create the lens, an interactive visual molecular landscape. Users access the lens and the analysis generated through the AccencioView innovation platform. In many cases, Accencio extract thousands of compounds from hundreds of patents for an IP-GeoScape. Extracting structures from so many patents without ChemCurator would take so long that most projects would be impossible, and purchasing the necessary structures from a data provider would be cost-prohibitive. With ChemCurator, extraction times are reduced significantly compared with various manual methods. This efficiency has been critical to the success of the IP-GeoScape product in general, and to Accencio’s current work on COVID-19.
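Accencio’s clustering method is proprietary and was not described. For readers who want a concrete picture of similarity-based clustering of extracted structures, the sketch below implements a different, publicly documented technique, Taylor-Butina clustering on Tanimoto similarity, with fingerprints represented as sets of on-bit positions. The 0.6 threshold and the toy bit sets are assumptions; a real pipeline would compute fingerprints from the extracted structures with a cheminformatics toolkit.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def butina(fps: dict[str, frozenset], threshold: float = 0.6) -> list[list[str]]:
    """Taylor-Butina clustering; each cluster is listed centroid first."""
    ids = sorted(fps)
    neighbours = {
        i: {j for j in ids if j != i and tanimoto(fps[i], fps[j]) >= threshold}
        for i in ids
    }
    unassigned = set(ids)
    clusters = []
    # Molecules with the most neighbours become centroids first
    for centroid in sorted(ids, key=lambda k: (-len(neighbours[k]), k)):
        if centroid not in unassigned:
            continue
        members = [centroid] + sorted(neighbours[centroid] & unassigned)
        unassigned -= set(members)
        clusters.append(members)
    return clusters


if __name__ == "__main__":
    fps = {  # hypothetical bit sets, not real fingerprints
        "A": frozenset({1, 2, 3, 4}),
        "B": frozenset({1, 2, 3, 5}),
        "C": frozenset({1, 2, 3, 4, 5}),
        "D": frozenset({10, 11, 12}),
    }
    print(butina(fps))  # [['A', 'B', 'C'], ['D']]
```

Because every molecule is assigned to exactly one cluster, the clusters can be drawn as regions of a landscape and compared across patent holders.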
IncoPat together with ChemAxon: your partners on the road of innovation
incoPat, a patent database vendor from China, has formed a partnership with ChemAxon that will enable incoPat to provide a better service for its clients, especially those in the pharmaceutical industry. Since 2011 the incoPat database has grown into a global collection of patents from 120 countries, with integrated functions including patent search, analysis, and monitoring. A module called incoFolder allows users to save patents in a folder, make comments, and index and share the folders with team members or clients who have incoPat accounts. June Tian explained what the company offers. The backbone of the database is the full text of patents, with English translations. There are also value-added data such as legal, operational, and market data. Moreover, incoPat collects some unique China-related data such as Patent Review Procedure information, including annuity payments, office actions, and all notification files of Chinese patents. These can be difficult for foreigners to access on other platforms. Also included are Chinese Customs registration information for patents and National Declassification patents. There are more than 295 searchable fields. You can search for Chinese patents with awards issued by the government; analyze the type of Chinese patent applicants (e.g., academic, corporate, or individual); analyze Chinese patents by province, city, and even county; check the patent life or examination duration; and check whether any Chinese patents are for sale. For patents covering drugs approved by the U.S. FDA, incoPat provides the FDA application number, approval date, Chinese and English names of the drug, active ingredient, brand name, company, highest status, dosage form and route, strength, patent expiration, marketing status, target, and indication. All these fields can be searched.
Patent invalidity or validity searches are conducted to validate the claims made by a patent or to invalidate one or more claims of a competitor’s patent. They are the first step a company takes when faced with a patent infringement lawsuit. June’s example concerned CN2894703Y, “Sterilizing-disinfecting Device with Automatic Delivery and Assembling-Disassembling Unit”. She entered the patent number into incoPat’s AI search module, and the system generated a map showing the relationships between the different parts of the device. The user can change the relationships between parts, or add new parts, to make the results more accurate. Next, relevant keywords can be selected to rank the results by relevance. In the example, the system returned 2000 similar patents, and the patent that was the subject of the invalidation search was ranked number one. AI search is very useful for invalidation, novelty and clearance searches. Image-based design search is also possible: if you upload a picture to incoPat and select the Locarno classification (LOC) number, the system retrieves all related designs. June illustrated this with a novelty search for a dosing bottle. Using incoPat you can also carry out industry analysis, look for potential technology licensing partners, investigate legal or operational information, search for R&D collaborators, look for patents for sale, and monitor competitors. June presented a competitor intelligence study of Eli Lilly. Since incoPat has standardized company names, you can find information on all Lilly-related companies, including branches and subsidiaries, and you can also search the patents purchased by Lilly. Moreover, you can set up your own company name database, adding and saving all Lilly-related company names. You can then obtain a full list of Lilly patents, read the details, and get a general picture of, for example, the validity of those patents.
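How incoPat standardizes company names internally was not described. The sketch below only illustrates the user-side idea of a company name database: an alias table maps spelling variants of an assignee to one canonical name, so that a full portfolio can be collected under that name. The normalisation rule, the alias list and the patent numbers are all hypothetical placeholders, not incoPat data or functionality.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def normalise(name: str) -> str:
    """Crude normalisation: lower-case, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(cleaned.split())


def build_alias_table(groups: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Map every normalised name variant (and the canonical name itself) to the canonical name."""
    table: Dict[str, str] = {}
    for canonical, variants in groups.items():
        for variant in [canonical, *variants]:
            table[normalise(variant)] = canonical
    return table


def portfolio(records: Iterable[Tuple[str, str]], aliases: Dict[str, str]) -> Dict[str, List[str]]:
    """Group (patent number, assignee) records under canonical company names.

    Assignees missing from the alias table are kept under their original spelling.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for patent_no, assignee in records:
        grouped[aliases.get(normalise(assignee), assignee)].append(patent_no)
    return dict(grouped)


# Placeholder patent numbers and a hand-built alias list (hypothetical data).
aliases = build_alias_table({"Eli Lilly and Company": ["Eli Lilly & Co.", "ELI LILLY AND CO"]})
records = [
    ("P1", "Eli Lilly & Co."),
    ("P2", "ELI LILLY AND CO"),
    ("P3", "Other Corp"),
    ("P4", "eli lilly and company"),
]
print(portfolio(records, aliases))
# {'Eli Lilly and Company': ['P1', 'P2', 'P4'], 'Other Corp': ['P3']}
```

A real name database would also need to cover subsidiaries and acquired patents, which cannot be inferred from spelling alone. That is why a curated alias list, rather than string normalisation, does most of the work here.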
You can visualize the patent portfolio and technologies to see where the hotspots and blank areas are, and check for high-value patents or patents involved in litigation. Thanks to technological support from ChemAxon and Marvin JS, incoPat recently released a chemical structure search module. Users can now search chemical patents by Chinese or English names, CAS RN, molecular formula, or chemical structure or substructure. They can also convert names and structures into SMILES for patent matching. Potential users are welcome to contact incoPat for a free trial.
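The talk did not say how incoPat matches molecular formula queries. A common approach, assumed here, is to rewrite every formula in Hill order (carbon first, then hydrogen, then the other elements alphabetically; purely alphabetical if there is no carbon) so that queries match however the element order was written. This minimal sketch handles only simple formulas without brackets, charges or isotopes. Structure and substructure search, and name-to-SMILES conversion, need a cheminformatics toolkit and are not attempted. The patent identifiers and formulas are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Set

ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")
SIMPLE_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C9H8O4' (no brackets, charges or isotopes)."""
    if not SIMPLE_FORMULA.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts: Counter = Counter()
    for element, n in ELEMENT.findall(formula):
        counts[element] += int(n) if n else 1
    return counts


def hill(counts: Counter) -> str:
    """Write atom counts in Hill order: C, then H, then the rest alphabetically.

    Without carbon, all elements (including H) are listed alphabetically.
    """
    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


def index_by_formula(patents: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each Hill formula to the set of patents whose compounds have that formula."""
    index: Dict[str, Set[str]] = {}
    for patent_no, formulas in patents.items():
        for formula in formulas:
            index.setdefault(hill(parse_formula(formula)), set()).add(patent_no)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return matching patent numbers, whatever element order the query uses."""
    return sorted(index.get(hill(parse_formula(query)), set()))


# Hypothetical patents and compound formulas.
index = index_by_formula({
    "PAT-A": ["C9H8O4", "CH3COOH"],
    "PAT-B": ["O4H8C9"],
    "PAT-C": ["H2O"],
})
print(search(index, "C9H8O4"))  # ['PAT-A', 'PAT-B']
```

Because the same normalisation is applied at indexing and at query time, “CH3COOH” and “H4C2O2” both reduce to C2H4O2 and find the same patent.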
This year’s virtual meeting was rather longer than the usual in-person one, and that is reflected in my slightly longer report. The partner session was particularly long (three hours) compared with its face-to-face equivalent, but it is always a useful addition to the meeting. I have reported it in less detail, however, in order to concentrate on the all-important user talks and the large number of software enhancements and strategic announcements from ChemAxon as a company. I was particularly interested in ChemAxon’s prompt response to user feedback and to the rapidly changing software technology environment. What I missed, of course, was meeting everyone in person in Budapest. I very much hope that ChemAxon will invite me again, and that we will all have a real meeting in 2021.