FAQs



How can I define a Scoring Profile for a property?

Tuesday, 29 September 2009 21:22
Ed Champness

A Scoring Profile contains a list of properties and, for each property, a function defining the ideal value or range of values. The scoring function can be a simple threshold criterion, a desired range of values, a trend across a range, or a combination of these. Scoring Profiles can be easily created, saved and visualised within StarDrop. An intuitive user interface provides a key for interpreting the results of scoring compounds and makes it easy to adjust the criteria and importance of each property within a Scoring Profile.


Do I need to create the Scoring Profile every time I want to score the data?

Tuesday, 29 September 2009 21:22
Ed Champness

No. You can save Scoring Profiles for later use, either as individual files or as part of StarDrop projects. Keeping a saved copy of a Scoring Profile in a shared folder helps to ensure that everyone working on the project uses the same profile to score the data.


How do you account for the uncertainty in Scoring Profiles?

Tuesday, 29 September 2009 21:21
Ed Champness

The StarDrop Probabilistic Scoring algorithm accounts for the uncertainty in experimental results and in silico predictions. Although this effect is complicated to visualise for multi-dimensional data, it can be illustrated using a simple threshold criterion for a single property. As a prediction or measurement approaches the threshold from below, the probability that the ‘true’ value of the property actually exceeds the threshold becomes significant, as shown in the diagram. As this happens, the score begins to increase along with the probability of success. Similarly, the maximum score is only achieved when the value is far enough above the threshold that there is a negligible chance, even when accounting for the errors in the measurement or prediction, that the true value will fail to exceed the threshold.
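
The single-property threshold case described above can be sketched in a few lines of Python. This is only an illustration, not StarDrop's actual algorithm: it assumes the error on the measurement or prediction is Gaussian with a known standard deviation, and the function name `prob_exceeds` is hypothetical.

```python
import math


def prob_exceeds(value, threshold, sd):
    """Probability that the 'true' value exceeds the threshold.

    Assumption (not from StarDrop): the true value is normally
    distributed around the observed value with standard deviation sd.
    """
    if sd <= 0:
        # No uncertainty: the criterion is simply met or not met.
        return 1.0 if value > threshold else 0.0
    z = (value - threshold) / sd
    # Standard normal cumulative distribution function evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# The probability rises smoothly as the value approaches and passes
# the threshold, rather than jumping from 0 to 1 at the threshold.
for v in (3.0, 4.5, 5.0, 5.5, 7.0):
    print(v, round(prob_exceeds(v, threshold=5.0, sd=0.5), 3))
```

Under this assumption, a value exactly on the threshold has a 50% chance of success, and the probability only approaches 1 once the value is several standard deviations above the threshold.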


Can we compare two compounds based on their score values?

Tuesday, 29 September 2009 21:19
Ed Champness

The quality of two compounds can be compared using their scores. However, as a rule of thumb, one cannot differentiate between two compounds unless their scores differ by more than the sum of their standard deviations. One approach to visualizing whether compounds can be confidently differentiated is to plot a graph, as shown to the right. In this figure, known as a snake plot, the compounds are plotted in order of decreasing priority along the x-axis and their scores, with error bars illustrating the standard deviation of the scores, on the y-axis. In this example, we can clearly see that there are groups of compounds that can be confidently separated in terms of their likelihood of success, but within these groups the available data do not distinguish between compounds.
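
The rule of thumb above translates directly into a small check. The following Python sketch is illustrative only; the function name `distinguishable` and the example scores are hypothetical, not taken from StarDrop.

```python
def distinguishable(score_a, sd_a, score_b, sd_b):
    """Rule of thumb: two compounds can be told apart only if their
    scores differ by more than the sum of their standard deviations."""
    return abs(score_a - score_b) > (sd_a + sd_b)


# Hypothetical scores and standard deviations for illustration.
print(distinguishable(0.80, 0.05, 0.50, 0.10))  # gap 0.30 > 0.15
print(distinguishable(0.60, 0.10, 0.50, 0.10))  # gap 0.10 < 0.20
```

The first pair is separated by more than the combined uncertainty, so the compounds can be ranked with some confidence; the second pair cannot.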


How does StarDrop deal with missing data when applying Probabilistic Scoring?

Tuesday, 29 September 2009 21:18
Ed Champness

If data are missing for a particular compound, it can still be scored. The algorithm treats the missing value as highly uncertain, resulting in an average contribution to the score for that property but with a very high degree of uncertainty. This helps to highlight compounds where data on a critical decision-making criterion are absent but that otherwise have a good balance of properties.


Why do you advise against a sequential filtering approach to compound prioritisation?

Tuesday, 29 September 2009 21:17
administrator

One common approach to prioritizing compounds is sequential filtering, whereby each property value is compared in turn with a required threshold value. Those compounds that fall on the wrong side of the threshold are rejected. Those compounds that pass then progress to the next filter in the series and hence the number of compounds is iteratively reduced. However, this process has a number of serious shortcomings:

  • As the number of compounds considered at each filter is always reduced, often very few, or even no, compounds emerge from the sequence of filters, having passed all of the thresholds. In this situation, it is difficult to select alternative compounds, as incomplete information has been generated on the full compound set.
  • Filters make artificial distinctions between compounds with property values that cannot be resolved within the uncertainties of a prediction or experimental measurement.
  • Errors rapidly accumulate as the number of filters increases. For example, if we consider 5 filters, each with an accuracy of 90%, the probability of a compound with ideal properties being correctly passed through all filters is only 59%. However, if 10 such filters are applied, this drops to 35%, meaning that this process is more likely to incorrectly reject an optimal compound than to pass it.
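
The figures in the last point can be reproduced with a short calculation. This Python sketch assumes that the errors of the individual filters are independent, so the per-filter accuracies multiply; the function name `pass_all_probability` is hypothetical.

```python
def pass_all_probability(accuracy, n_filters):
    """Probability that an ideal compound is correctly passed by every
    filter, assuming each filter errs independently of the others."""
    return accuracy ** n_filters


print(round(pass_all_probability(0.9, 5), 2))   # about 0.59
print(round(pass_all_probability(0.9, 10), 2))  # about 0.35
```

With ten filters the probability falls below 50%, which is why the process becomes more likely to reject an optimal compound than to pass it.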




