MadFast Similarity Search
Blazing fast similarity searching tool
MadFast is a high-end toolkit for ultra-fast chemical similarity search. It relies on an optimized multi-threaded implementation and in-memory data storage. Its search performance extends the chemical space available for live search to hundreds of millions of compounds. Rapid fingerprint generation and short initialization time, along with a large set of comparison methods, let you optimize the similarity space. MadFast is a Java application available through several interfaces: command line, REST API and Web UI.
- Web UI client library and extension point added
- Web UI improvements
- Raw file handling as a layer for exposing custom content
- Asynchronous server loading to follow server startup through the REST API / Web UI
- Asynchronous search calls for slower (> 200 ms) operations, so that progress can be followed and the call cancelled through the REST API / Web UI
See History of changes for the description of all changes.
Similarity based overlap analysis
Similarity based overlap analysis (full matrix calculation) of large libraries of up to millions of compounds is possible with the fast multi-query similarity search implementation. Additional properties of the input molecules can also be used in the overlap visualization. Find out more in the documentation on storing additional data and on overlap analysis.
A 1M by 1M exhaustive similarity search using 1024-bit binary fingerprints takes:
- ~30 minutes on c3.8xlarge AWS instance
- ~8 minutes on x1.32xlarge AWS instance
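To put these figures in perspective: a 1M by 1M full matrix contains 10^12 query/target pairs, so if every pair is compared, the timings above correspond to roughly 5.6 × 10^8 and 2.1 × 10^9 comparisons per second. The following is a minimal sketch of what such a multi-query calculation looks like in principle; it is not MadFast's implementation or API. The class and method names are hypothetical, and it assumes Tanimoto similarity on binary fingerprints packed into `long` words, with a parallel stream standing in for multi-threading.

```java
import java.util.stream.IntStream;

// Illustrative sketch only; names are hypothetical and not part of MadFast.
final class OverlapSketch {

    /** Tanimoto similarity of two equal-length binary fingerprints packed into long words. */
    static double tanimoto(long[] a, long[] b) {
        int common = 0;
        int union = 0;
        for (int i = 0; i < a.length; i++) {
            common += Long.bitCount(a[i] & b[i]);
            union += Long.bitCount(a[i] | b[i]);
        }
        // Convention chosen here: two all-zero fingerprints count as identical.
        return union == 0 ? 1.0 : (double) common / union;
    }

    /** For each query, the highest similarity found among all targets (the row maximum of the full matrix). */
    static double[] nearestNeighborSimilarities(long[][] queries, long[][] targets) {
        return IntStream.range(0, queries.length)
                .parallel() // rows are independent, so they can be processed on several threads
                .mapToDouble(i -> {
                    double best = 0.0;
                    for (long[] target : targets) {
                        best = Math.max(best, tanimoto(queries[i], target));
                    }
                    return best;
                })
                .toArray();
    }

    /** Fraction of queries whose nearest target is at or above the similarity threshold. */
    static double overlapFraction(double[] nearestNeighborSimilarities, double threshold) {
        if (nearestNeighborSimilarities.length == 0) {
            return 0.0;
        }
        int covered = 0;
        for (double s : nearestNeighborSimilarities) {
            if (s >= threshold) {
                covered++;
            }
        }
        return (double) covered / nearestNeighborSimilarities.length;
    }
}
```

Keeping only the row maximum avoids storing the full matrix, which would not fit in memory at the 1M by 1M scale.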
Real time similarity search
Visualization of the similarity search results on a web interface lets users experience real-time responsiveness when searching a large number of structures.
To showcase how fast you can get results, MadFast delivers the 40 most similar structures in:
- ~80 ms for 16 M structures (using 1024-bit binary fingerprints on an Amazon r3.8xlarge machine)
- ~5 sec for 1 billion structures (on the same machine)
Resource requirements:
- 250-350 MB of memory usage per million molecules (using 1024-bit binary fingerprints)
- Preparation (import) speed of 1 million structures per minute
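For readers unfamiliar with this kind of query, the sketch below shows the general shape of a "k most similar structures" search over in-memory binary fingerprints. It is an assumption-laden illustration, not MadFast code: the class `TopKTanimotoSketch`, its `Hit` type and the choice of Tanimoto similarity are hypothetical, and a plain linear scan with a bounded min-heap stands in for whatever optimized strategy the product actually uses.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch only; names are hypothetical and not part of MadFast.
final class TopKTanimotoSketch {

    /** A search hit: index of the target fingerprint and its similarity to the query. */
    static final class Hit {
        final int index;
        final double similarity;

        Hit(int index, double similarity) {
            this.index = index;
            this.similarity = similarity;
        }
    }

    /** Tanimoto similarity of two equal-length binary fingerprints packed into long words. */
    static double tanimoto(long[] a, long[] b) {
        int common = 0;
        int union = 0;
        for (int i = 0; i < a.length; i++) {
            common += Long.bitCount(a[i] & b[i]);
            union += Long.bitCount(a[i] | b[i]);
        }
        // Convention chosen here: two all-zero fingerprints count as identical.
        return union == 0 ? 1.0 : (double) common / union;
    }

    /** Returns up to k most similar targets, best first, keeping only k candidates in memory. */
    static List<Hit> mostSimilar(long[] query, long[][] targets, int k) {
        List<Hit> hits = new ArrayList<>();
        if (k <= 0) {
            return hits;
        }
        // Min-heap ordered by similarity: the weakest retained hit sits at the head.
        PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.similarity));
        for (int i = 0; i < targets.length; i++) {
            double s = tanimoto(query, targets[i]);
            if (heap.size() < k) {
                heap.add(new Hit(i, s));
            } else if (s > heap.peek().similarity) {
                heap.poll(); // drop the weakest hit to make room for a better one
                heap.add(new Hit(i, s));
            }
        }
        hits.addAll(heap);
        hits.sort(Comparator.comparingDouble((Hit h) -> h.similarity).reversed());
        return hits;
    }
}
```

With 1024-bit fingerprints each `long[]` would hold 16 words, that is 128 bytes of raw fingerprint data per structure.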
Ad-hoc focused chemical space analysis
MadFast supports a variety of descriptors, descriptor configurations, and comparison metrics. The web-based interface displays search results from multiple data sets together with dissimilarity distribution histograms.
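As a rough illustration of how the choice of metric shapes such a histogram, the sketch below computes two common dissimilarities for binary fingerprints and bins a set of values into equal-width buckets. Everything here is an assumption made for the example: the class name, the selection of Tanimoto and Dice as metrics, and the binning scheme are hypothetical and do not describe MadFast's own descriptors or histogram code.

```java
// Illustrative sketch only; names and metric choices are hypothetical and not part of MadFast.
final class DissimilarityHistogramSketch {

    /** Tanimoto dissimilarity: 1 - |A and B| / |A or B|. */
    static double tanimotoDissimilarity(long[] a, long[] b) {
        int common = 0;
        int union = 0;
        for (int i = 0; i < a.length; i++) {
            common += Long.bitCount(a[i] & b[i]);
            union += Long.bitCount(a[i] | b[i]);
        }
        // Convention chosen here: two all-zero fingerprints have dissimilarity 0.
        return union == 0 ? 0.0 : 1.0 - (double) common / union;
    }

    /** Dice dissimilarity: 1 - 2 |A and B| / (|A| + |B|). */
    static double diceDissimilarity(long[] a, long[] b) {
        int common = 0;
        int total = 0;
        for (int i = 0; i < a.length; i++) {
            common += Long.bitCount(a[i] & b[i]);
            total += Long.bitCount(a[i]) + Long.bitCount(b[i]);
        }
        return total == 0 ? 0.0 : 1.0 - 2.0 * common / total;
    }

    /** Counts values in [0, 1] per equal-width bin; a value of exactly 1.0 falls into the last bin. */
    static int[] histogram(double[] dissimilarities, int bins) {
        int[] counts = new int[bins];
        for (double d : dissimilarities) {
            int bin = (int) (d * bins);
            bin = Math.max(0, Math.min(bins - 1, bin)); // clamp out-of-range values to the edge bins
            counts[bin]++;
        }
        return counts;
    }
}
```

For the same pair of fingerprints the Dice dissimilarity is never larger than the Tanimoto dissimilarity, so switching metrics shifts the whole distribution rather than just rescaling it.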