FAQs: Chemical Space & Selection

How do you assess chemical similarity?

The chemical similarity of compounds can be based on molecular structure, on any measured or predicted properties of the molecules, or on a combination of the two. The most common approach is to use the molecular structure to assess the chemical similarity within a set of compounds. This approach defines similarity in terms of the patterns of atoms present in the structures. The patterns of atoms along ‘paths’ through the 2D chemical structure are encoded in a binary ‘fingerprint’, and the similarity of two compounds is defined in terms of the Tanimoto index. The advantage of a path-based fingerprint approach to similarity and diversity is that it provides a ‘generic’ method of comparing compounds. No assumption is made regarding the characteristics of molecules that will correlate most strongly with the biological activities of interest. Also, similarity assessed in this way usually corresponds well to a chemist’s view; compounds from within a chemical series will typically have a high Tanimoto index.
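As an illustration, the Tanimoto index of two binary fingerprints is the number of bits set in both, divided by the number of bits set in either. The sketch below assumes each fingerprint is represented as a set of 'on' bit positions; the bit values are invented for the example and do not come from StarDrop's own fingerprints.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto index of two fingerprints given as sets of 'on' bit positions."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


# Hypothetical fingerprints: 3 shared bits out of 6 distinct bits in total.
fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 8, 9, 12}
print(tanimoto(fp_a, fp_b))  # 0.5
```

A value of 1 means the two fingerprints are identical, and a value of 0 means they share no bits.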

 

What are the axes of the ‘chemical structure’ space plot?

It is not possible to quantify the axes of the structural chemical space plot, as they do not correspond to a particular property or descriptor. The structural chemical space plot is essentially a two-dimensional approximation of a multi-dimensional space, in which the similarity to each molecule in the data set represents a single dimension. Using an approximate form of Principal Component Analysis (PCA) (full PCA would be prohibitively slow for large numbers of molecules), this is reduced to a two-dimensional space that maximizes the visible variation. As a result, the two dimensions are effectively a function of the molecules’ similarities and do not have an explicit meaning. The dimensions are best considered as distance measures, such that the closer molecules are in the plot, the greater their similarity. Bear in mind that the more molecules there are in the plot, the more approximate the visible distance will be for any two individual points, although the overall trend covers the diversity of the entire set.
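The idea can be sketched on a toy similarity matrix, where each row holds one molecule's similarity to every molecule in the set. The code below uses plain PCA computed by power iteration, which is an assumption made for illustration and not the approximate method StarDrop uses; the function name `project_2d` and the similarity values are hypothetical.

```python
import math
import random


def project_2d(rows, iters=500, seed=0):
    """Project rows onto their first two principal components."""
    n, d = len(rows), len(rows[0])
    # Centre each column (each "similarity to molecule j" dimension).
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(x[a] * x[b] for x in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    axes = []
    for _axis in range(2):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        length = math.sqrt(sum(c * c for c in v))
        v = [c / length for c in v]
        eigval = 0.0
        for _step in range(iters):
            # Power iteration: repeatedly apply the covariance matrix.
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:
                break
            v = [c / norm for c in w]
            eigval = norm
        axes.append(v)
        # Deflate so the next pass finds the next-largest component.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return [tuple(sum(x[j] * ax[j] for j in range(d)) for ax in axes)
            for x in centred]


# Hypothetical similarities: molecules 0 and 1 form one series, 2 and 3 another.
similarity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
for point in project_2d(similarity):
    print(point)
```

The printed coordinates have no units or property meaning; only the distances between points are informative, with the two members of each series landing close together.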

Last Updated on Tuesday, 29 September 2009 20:11
 

How is the ‘property space’ plot created?

Property chemical space plots employ a simple PCA approach and enable visualization of the distribution of compounds with respect to their properties. Groups of compounds with a similar profile of properties will cluster together in this type of space. Additionally, if specific descriptors are expected to correlate with an important property, these may be imported into StarDrop and used to define a chemical space. Thus, the diversity of selections with respect to these descriptors may be visualized to aid in library design.

 

Can you define chemical space using your own descriptors?

Yes, StarDrop can build chemical space projections using any combination of continuous, categorical or structure data, including imported data.

Last Updated on Tuesday, 29 September 2009 19:56
 

What is the benefit of a ‘mixed’ structure/property space plot over the other two options?

Space plots defined using both chemical similarity and user-selected properties tend to better separate different chemotypes if they exhibit a similar spectrum of properties.

 

How many compounds can I use in a chemical space plot?

The chemical space display has no hard limit. However, generating the underlying chemical space will take a prohibitively long time for more than ~10,000 molecules. For larger sets, use the Selection tool to make a random selection of up to 10,000 molecules and generate the chemical space from this selection. Any number of molecules can then be projected into this space, although the plot has been designed to work most efficiently when displaying sets of up to about 100,000 molecules on typical desktop hardware.
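Outside StarDrop, the same subsampling step could be sketched as follows. The function name `random_subset` and the molecule identifiers are hypothetical; within StarDrop the Selection tool performs this step.

```python
import random


def random_subset(molecule_ids, max_size=10_000, seed=None):
    """Return at most max_size identifiers, chosen uniformly without replacement."""
    ids = list(molecule_ids)
    if len(ids) <= max_size:
        return ids
    return random.Random(seed).sample(ids, max_size)


# Hypothetical library of 25,000 molecules, reduced to 10,000 for building the space.
subset = random_subset(range(25_000), seed=1)
print(len(subset))  # 10000
```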

 

Can you run Chemical Space without a server?

Yes, this runs on the local machine.

 

What are the requirements to run a ‘biased’ selection?

The data set should contain structures and have had a scoring profile applied. This enables molecules to be selected that have the best possible balance between performance against the scoring profile and structural diversity. In this case the user can determine the bias between 'Rank' and 'Diversity'.

 

How is chemical diversity assessed?

The chemical diversity of a set of compounds is defined in terms of the patterns of atoms present in their chemical structures. The patterns of atoms along ‘paths’ through the 2D chemical structure of a compound are encoded in a binary ‘fingerprint’, and the similarity of two compounds, A and B, can then be defined in terms of the Tanimoto coefficient. The advantage of a path-based fingerprint approach to similarity and diversity is that it provides a ‘generic’ method of comparing compounds.
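One simple way to turn pairwise Tanimoto coefficients into a single diversity figure is the mean pairwise distance, 1 minus the Tanimoto coefficient, averaged over all pairs. This is an assumed measure chosen for illustration; the source does not state the exact formula StarDrop uses, and the fingerprints below are invented.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def mean_pairwise_distance(fingerprints):
    """Average of (1 - Tanimoto) over all pairs; 0.0 if fewer than two items."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical fingerprints: a close series versus three unrelated compounds.
series = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6}]
unrelated = [{1, 2}, {3, 4}, {5, 6}]
print(mean_pairwise_distance(series))
print(mean_pairwise_distance(unrelated))
```

The series scores lower than the unrelated set, matching the intuition that compounds from one chemical series are not diverse.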

 

Why is the compound selection based on a genetic algorithm?

The number of possible selections increases extremely rapidly with the size of a virtual library, e.g. there are 2.6 × 10^23 ways of choosing 10 compounds from a library of 1,000. Therefore, when considering diversity, it rapidly becomes impossible to perform an exhaustive search for the optimal selection for a given set of criteria. Instead, a ‘stochastic’ approach must be taken, which cannot guarantee to identify the optimal solution but will find the optimal or a near-optimal selection with high probability. Genetic algorithms are a well-known and robust approach commonly used in this context.
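The quoted figure is the binomial coefficient "1,000 choose 10", which can be checked with Python's standard library:

```python
import math

# Number of distinct ways to choose 10 compounds from a library of 1,000.
n_selections = math.comb(1000, 10)
print(f"{n_selections:.1e}")  # roughly 2.6e+23
```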

 

Should we wait for the selection algorithm to reach 1?

The maximum value that can be achieved for the optimal selection will depend on the balance of rank and diversity requested and on the characteristics of the compound set from which the selection is being made. Commonly, this optimal value will be less than 1, unless your data set is small. Even in cases where a value of 1 is possible, it could take a long time to achieve, so we recommend that you wait until the plot reaches a plateau.
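In StarDrop the plateau is judged by eye from the plot. If the best value found so far were logged at each iteration, a plateau check could be sketched as below; the function name, the `window` and `tolerance` parameters and their defaults are all hypothetical.

```python
def has_plateaued(history, window=20, tolerance=1e-3):
    """True if the best-so-far value improved by less than tolerance
    over the last `window` iterations."""
    if len(history) <= window:
        return False  # not enough iterations to judge
    return history[-1] - history[-1 - window] < tolerance


# Hypothetical traces of the best value found so far.
still_rising = [0.01 * i for i in range(50)]
levelled_off = [0.1 * i for i in range(5)] + [0.62] * 30
print(has_plateaued(still_rising))  # False
print(has_plateaued(levelled_off))  # True
```

Note that the second trace levels off at 0.62, well below 1, which is the common situation described above.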

 

What is the appropriate balance of quality and diversity?

It is usually beneficial to explore the sensitivity of a selection to the degree of bias chosen before making a final decision. Often, a significant degree of added diversity can be gained for a small decrease in the overall quality of the compounds selected. In this case, it is advisable to spread the risk across diverse compounds, provided synthetic resources permit. Conversely, in some cases, the selection of compounds will remain the same until a large bias toward diversity is selected. In this case, the selection of a diverse set may require an unacceptable decrease in the overall quality of the compounds.

As a general rule, at the earlier stages of a project, where little is known about the SAR of the target, it is advisable to bias a selection in favour of diversity. Typically, a diversity:rank ratio of 80:20 will sample across the extremes of chemical diversity, whilst still ensuring that top-scoring compounds are represented within the selection. As the project moves towards the candidate stage, it will become more important to bias the selection towards ‘good’ compounds. In this case, a diversity:rank ratio of 20:80 may be more appropriate.

Note that a diversity:rank ratio of 100:0 will select molecules purely on the basis of their structural diversity, and the newly selected set will mirror the diversity of the original set (assuming a reasonable sample size of molecules has been selected). A diversity:rank ratio of 0:100 will select molecules entirely on their 'Rank', so the top-ranked molecules will be selected.
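The effect of the ratio can be illustrated with a toy objective that mixes a diversity term and a quality term. This is a sketch under stated assumptions, not StarDrop's scoring: quality is taken as the mean of scores between 0 and 1 (standing in for 'Rank'), diversity as the mean pairwise Tanimoto distance, and `diversity_weight` of 0.8 represents an 80:20 diversity:rank ratio. All names and data are hypothetical.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def selection_objective(selection, scores, fingerprints, diversity_weight):
    """Weighted mix of mean pairwise distance and mean score for a selection."""
    mean_score = sum(scores[m] for m in selection) / len(selection)
    pairs = list(combinations(selection, 2))
    diversity = 0.0
    if pairs:
        diversity = sum(1.0 - tanimoto(fingerprints[a], fingerprints[b])
                        for a, b in pairs) / len(pairs)
    return diversity_weight * diversity + (1.0 - diversity_weight) * mean_score


# Hypothetical data: "a" and "b" score well but are similar; "c" is different but weaker.
scores = {"a": 0.9, "b": 0.85, "c": 0.4}
fingerprints = {"a": {1, 2, 3}, "b": {1, 2, 4}, "c": {7, 8, 9}}

for weight in (0.8, 0.2):
    similar_pair = selection_objective(("a", "b"), scores, fingerprints, weight)
    diverse_pair = selection_objective(("a", "c"), scores, fingerprints, weight)
    print(weight, round(similar_pair, 3), round(diverse_pair, 3))
```

With the weight at 0.8 the diverse pair scores higher, while at 0.2 the two top-scoring but similar compounds win, which mirrors the early-stage versus late-stage guidance above.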

 

Can I do a random selection?

Yes, there is an option for random selection. Unlike the biased options, the random selection option needs neither chemical structure information nor scoring profile results in order to run.

 

Can I save the selection?

Yes, any selection automatically creates a new data set that can in turn be saved as a StarDrop file or exported as .sd or .csv files.

 

What is the maximum number of compounds that can be selected?

Selection of compounds with a consideration of diversity can be computationally demanding. The computational cost scales as the square of the number of compounds being selected. In practice, a reasonable limit is approximately 1,000 compounds.
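One way to see where quadratic scaling comes from, assuming the cost is dominated by comparing every selected compound with every other, is to count the distinct pairs:

```python
def pairwise_comparisons(n):
    """Number of distinct pairs among n compounds: n(n-1)/2."""
    return n * (n - 1) // 2


for n in (100, 1_000, 10_000):
    print(n, pairwise_comparisons(n))
# 100 4950
# 1000 499500
# 10000 49995000
```

A tenfold increase in the number of selected compounds gives roughly a hundredfold increase in the number of pairs.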

 

