Clustering by scaffolds

Posted by Filip Sedlák on 03 09 2013

This post is a tutorial on a new component in IJC, the Tree Table. Do you have some interesting use cases? Share them in the comments below this article.

So what can we do with this Tree Table? We will cluster structures based on their scaffolds. The Tree Table doesn't perform any chemical calculations itself. It simply groups rows by the values of a field, whether those are molecular scaffolds, dates of sample measurements or any other data, so we first have to generate the scaffolds.

Let's do it!

Generating scaffolds

Select the data tree or entity containing the structures and edit it. Add a new Chemical Terms field to the structure entity. Chemical Terms is a simple computation language that lets you transform structures or calculate their properties. We'll enter the formula for generating the Bemis-Murcko framework of the molecule.

bmf()

This algorithm keeps only the rings and their linkers. If there are none, a simple methane structure is returned. There are also other variants.

There is still one small change we have to make to the formula. The .mrv format in which the structures are stored by default is not canonical. That means we can have multiple representations of the same molecule. To be able to group, we need a canonical format of the structure. ChemAxon unique SMILES (smiles:u) guarantees a single SMILES representation for a given structure. Our Chemical Terms formula will look like this:

molFormat(bmf(), "smiles:u")
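To see why the canonical form matters for grouping, here is a minimal Python sketch run outside IJC. It groups rows by a text key, which is all a grouping component can do with a text field. The SMILES strings and the CANONICAL lookup table are illustrative assumptions: the table is only a stand-in for a real canonicalizer such as the smiles:u export and does not show actual ChemAxon output.

```python
from collections import defaultdict

# Hypothetical stand-in for a canonicalizer: maps different valid SMILES
# spellings of the same molecule to one agreed form. A real toolkit computes
# the canonical form; it does not use a lookup table.
CANONICAL = {
    "c1ccccc1": "c1ccccc1",
    "C1=CC=CC=C1": "c1ccccc1",  # benzene, Kekule spelling
    "c1ccncc1": "c1ccncc1",
    "n1ccccc1": "c1ccncc1",     # pyridine, different starting atom
}

def group_by(keys):
    """Group row indices by exact string equality of their key."""
    groups = defaultdict(list)
    for index, key in enumerate(keys):
        groups[key].append(index)
    return dict(groups)

# Four rows, but only two distinct scaffolds (benzene and pyridine).
scaffolds = ["c1ccccc1", "C1=CC=CC=C1", "n1ccccc1", "c1ccncc1"]

raw_groups = group_by(scaffolds)
canonical_groups = group_by(CANONICAL[s] for s in scaffolds)

print(len(raw_groups), len(canonical_groups))
```

Grouping on the raw strings splits each scaffold into two groups, while grouping on the canonical strings puts rows with the same scaffold together.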

Screenshot of interface with Chemical Terms

Once the field is created, we should tell IJC to treat its contents as structures, not as plain text. We do this by setting an appropriate mime-type in the field's extra options. Don't forget to hit Apply in the bottom right. This ensures the field is rendered as structures. If we didn't do this, we would have to change the renderer in the widget settings.

Configuring Tree Table

Once we have the scaffolds, we can start grouping. In a Form view for the Entity containing the scaffolds, add the Tree Table widget.

A configuration dialog appears. Don't panic! This is not rocket science! IJC asks you which fields it should use as categories (grouping values) and which fields should be displayed as regular columns. We'll select only the scaffold field for grouping. This will create groups of structures that have the same scaffold.

Add the fields which interest you as "Value fields" and you're done. When you switch to the Browse mode, you should see a Tree Table grouped by structural framework.

If you see SMILES strings instead of the frameworks, right-click within the Tree Table, select "Customize Widget Settings" and switch the column's renderer.

In that customizer, you can also choose to compute statistics for the regular fields, such as sum, mean, minimum and maximum.

You can set conditional formatting for Tree Table in the same way as for the other widgets, so the result might look like this:

The left column shows the groups, which can be expanded. The members of a group share the same scaffold. The number in parentheses next to the scaffold is the count of group members (the number of structures with that scaffold). We compute the mean of "logP / MW" for each group as well as the sum of the number of assay values. These values are displayed in the row with the scaffold.
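The per-group values described above can be mimicked in a few lines of Python. This is only a sketch of the idea, not how IJC computes them: the rows, their numbers and the names summarize, mean_ratio and assay_sum are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows: (scaffold key, "logP / MW" value, number of assay values).
rows = [
    ("c1ccccc1", 0.5, 3),
    ("c1ccccc1", 1.5, 1),
    ("c1ccncc1", 2.0, 4),
]

def summarize(rows):
    """Return member count, mean ratio and assay sum for each scaffold."""
    groups = defaultdict(list)
    for scaffold, ratio, assays in rows:
        groups[scaffold].append((ratio, assays))

    summary = {}
    for scaffold, members in groups.items():
        summary[scaffold] = {
            "count": len(members),
            "mean_ratio": mean(r for r, _ in members),
            "assay_sum": sum(a for _, a in members),
        }
    return summary

summary = summarize(rows)
for scaffold, stats in summary.items():
    # One line per group row: scaffold (count), mean, sum.
    print(f"{scaffold} ({stats['count']})", stats["mean_ratio"], stats["assay_sum"])
```

Each printed line corresponds to one collapsed group row: the scaffold, the member count in parentheses, and the two aggregated values.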

TL;DR

  • Use Tree Table for grouping by values of one or more columns.
  • For grouping by structures, you need to use a canonical format such as smiles:u.
  • You can compute aggregated values for groups and use conditional formatting.
  • Step-by-step instructions are given in the article above.