Hierarchical clustering of chemical structures by maximum common substructures

Posted by Miklós Vargyas on 13 September 2012

Cluster analysis has been shown to be successful in the categorization of physico-chemical and biological properties of compounds. However, conventional approaches to clustering molecular structures, where chemical graphs are transformed into sequences of numbers, seldom meet chemists' expectations. Graph-based techniques that cluster compounds with respect to common structural motifs are gaining in popularity, as they can better mimic human categorization. This presentation introduces one such graph-based method, LibraryMCS, which clusters compounds hierarchically according to their maximum common substructures (MCS). Unlike some other graph-based clustering methods, LibraryMCS neither involves a similarity-based pre-clustering step nor relies on predefined fragments. Recent evaluations by different research groups indicated that LibraryMCS was capable of producing high-quality clusters that agree with human categorization within practicable time (approximately 1000 structures/s). The presentation will recount and demonstrate typical uses of LibraryMCS: virtual HTS hit set profiling, R-group decomposition by learned scaffolds, perception of novel scaffolds, reverse engineering of combinatorial libraries, diversity assessment of large chemical libraries, and compound acquisition.
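To make the idea of hierarchical grouping by a shared core more concrete, here is a minimal toy sketch in Python. It is not LibraryMCS and does not reflect its actual algorithm, which the abstract does not describe. As a loudly simplified stand-in for a real maximum common substructure search on chemical graphs, it treats SMILES-like strings purely as text and uses the longest common substring as the "core". The function names and the `min_core` threshold are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def longest_common_substring(a: str, b: str) -> str:
    """Toy stand-in for an MCS search: longest contiguous run shared by a and b."""
    best = ""
    # prev[j] holds the length of the common suffix of a[:i-1] and b[:j].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > len(best):
                    best = a[i - cur[j]:i]
        prev = cur
    return best


def cluster_by_common_core(structures, min_core=3):
    """Greedy agglomerative clustering.

    Each node is a (core, children) tuple; leaves have no children.
    Returns the roots of the resulting hierarchy.
    min_core is a hypothetical cut-off: merging stops once no two
    clusters share a core of at least that many characters.
    """
    clusters = [(s, ()) for s in structures]
    while len(clusters) > 1:
        # Find the pair of clusters whose cores share the largest common part.
        best_pair, best_core = None, ""
        for i, j in combinations(range(len(clusters)), 2):
            core = longest_common_substring(clusters[i][0], clusters[j][0])
            if len(core) > len(best_core):
                best_pair, best_core = (i, j), core
        if best_pair is None or len(best_core) < min_core:
            break  # the remaining clusters have too little in common
        i, j = best_pair
        merged = (best_core, (clusters[i], clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


if __name__ == "__main__":
    # Strings are compared character by character, not as chemical graphs.
    roots = cluster_by_common_core(["c1ccccc1O", "CCCCO", "c1ccccc1N", "CCCCN"])
    for core, children in roots:
        print(core, [child[0] for child in children])
```

In this toy run the two ring-like strings end up under one shared core and the two chain-like strings under another, with no similarity-based pre-clustering and no predefined fragment list. A real implementation would replace the string comparison with a graph-based MCS search and would need a far more efficient strategy than comparing all pairs at every step.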

Download the presentation in pdf