Tuesday, 29 September 2009 21:22
administrator
A scoring profile can define any variation of score with property value that can be expressed as a series of linear segments, e.g. a simple threshold criterion, a desired range of values, a trend across a range, or combinations of these. Scoring profiles can be easily created, saved and visualised within StarDrop. An intuitive user interface provides a key for interpreting the results of scoring compounds and makes it easy to adjust the criteria and the importance of each property within a Scoring Profile.
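As an illustration only, a profile built from linear segments can be modelled as a list of (property value, score) breakpoints with linear interpolation between them. This is a minimal sketch and not StarDrop's implementation. The function name `piecewise_linear_score`, the assumption that scores lie in [0, 1], and the example profiles are all hypothetical. A repeated property value creates a vertical step, which is how a simple threshold is expressed here.

```python
from bisect import bisect_right

def piecewise_linear_score(value, points):
    """Interpolate a score from (property value, score) breakpoints (hypothetical helper).

    `points` must be sorted by property value. Values outside the covered range
    take the score of the nearest end point. A repeated property value makes a step.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    if value <= xs[0]:
        return ys[0]
    if value >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, value)  # guarantees xs[i - 1] <= value < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (value - x0) / (x1 - x0)

# Hypothetical example profiles for a single property:
threshold_profile = [(3.0, 1.0), (3.0, 0.0)]                         # full score at or below 3, zero above
range_profile = [(1.0, 0.0), (2.0, 1.0), (3.0, 1.0), (4.0, 0.0)]     # desired range 2 to 3 with linear shoulders
trend_profile = [(0.0, 0.0), (10.0, 1.0)]                            # score rises steadily across 0 to 10
```

For example, a value of 2.5 scores 1.0 against `range_profile`, and a value of 1.5 scores 0.5 because it lies halfway up the left shoulder.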
Last Updated on Tuesday, 29 September 2009 21:26
Tuesday, 29 September 2009 21:22
administrator
No. You can save the Scoring Profiles for later use. Keeping the profile in a shared project folder can help to ensure that everyone working on the project is using the same profile to score the data.
Tuesday, 29 September 2009 21:21
administrator
The StarDrop Probabilistic Scoring algorithm accounts for the uncertainty in experimental results or in silico predictions. Although this effect is difficult to visualise for multi-dimensional data, it can be illustrated using a simple threshold criterion for a single property. As a prediction or measurement approaches the threshold, there is a significant probability that the ‘true’ value of the property actually exceeds the threshold, as shown in the diagram. As this happens, the score begins to increase, reflecting the increasing probability of success. Similarly, the maximum score is only achieved when the value is far enough above the threshold that, even allowing for the errors in the model or measurement, there is a negligible chance that the true value fails to exceed it.
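The threshold behaviour described above can be sketched for one property by assuming that the uncertainty is Gaussian. Under that assumption, the score is the probability that the true value exceeds the threshold, given the predicted or measured value and its standard deviation. The Gaussian error model, the helper name `prob_exceeds`, and the example numbers are assumptions for illustration. This is not StarDrop's published algorithm.

```python
import math

def prob_exceeds(value, threshold, sd):
    """P(true value > threshold), assuming Gaussian error with standard deviation sd (hypothetical helper)."""
    if sd <= 0:
        # No uncertainty: fall back to a hard pass/fail threshold.
        return 1.0 if value > threshold else 0.0
    z = (value - threshold) / sd
    # Standard normal CDF expressed with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical example: threshold 5.0, model error sd = 0.5.
for v in (4.0, 4.5, 5.0, 5.5, 6.0, 6.5):
    print(f"value {v}: probability above threshold = {prob_exceeds(v, 5.0, 0.5):.3f}")
```

A value sitting exactly on the threshold scores 0.5. The score only approaches 1 once the value is several standard deviations above the threshold, which matches the behaviour described in the paragraph above.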
Last Updated on Tuesday, 29 September 2009 21:58
Tuesday, 29 September 2009 21:19
administrator
The quality of two compounds can be compared using their scores. However, as a rule of thumb, one cannot differentiate between the compounds unless their scores differ by more than the sum of their standard deviations. One way of visualizing whether compounds can be confidently differentiated is to plot a graph as shown to the right. In this figure, the compounds are plotted in order of decreasing priority along the x-axis, and their scores, with error bars showing the standard deviation of each score, are plotted on the y-axis. In this example, we can clearly see groups of compounds that can be confidently separated in terms of their likelihood of success, but within each group the available data do not distinguish between compounds.
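The rule of thumb above translates directly into a comparison function. The grouping step is one possible heuristic and is our assumption, not something the FAQ specifies. Compounds are sorted by score, and a new group starts whenever a compound can be confidently separated from the top-ranked member of the current group. The compound names, scores and standard deviations below are hypothetical.

```python
def distinguishable(score_a, sd_a, score_b, sd_b):
    """Rule of thumb: scores differ by more than the sum of their standard deviations."""
    return abs(score_a - score_b) > sd_a + sd_b

def group_indistinguishable(compounds):
    """Group (name, score, sd) tuples; a heuristic sketch, comparing each compound with its group's leader."""
    ranked = sorted(compounds, key=lambda c: c[1], reverse=True)
    groups = []
    for name, score, sd in ranked:
        if groups:
            _, lead_score, lead_sd = groups[-1][0]
            if not distinguishable(lead_score, lead_sd, score, sd):
                groups[-1].append((name, score, sd))
                continue
        groups.append([(name, score, sd)])
    return groups

# Hypothetical scores with standard deviations.
compounds = [("A", 0.80, 0.05), ("B", 0.77, 0.04), ("C", 0.50, 0.06),
             ("D", 0.46, 0.05), ("E", 0.20, 0.05)]
groups = group_indistinguishable(compounds)
print([[name for name, _, _ in g] for g in groups])
```

Comparing against the group leader rather than the previous compound avoids chaining, where many small steps could merge compounds whose scores are clearly different.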
Last Updated on Tuesday, 29 September 2009 21:25
Tuesday, 29 September 2009 21:18
administrator
If data are missing for a particular compound, the compound can still be scored. The algorithm treats a missing value as highly uncertain, so that property makes an average contribution to the score but with a very high degree of uncertainty. This helps to highlight compounds that have an otherwise good balance of properties but lack data on a critical decision-making criterion. To get the best out of the Probabilistic Scoring algorithm, we advise that fewer than 20% of the data points in each column of a set should be missing, and that no compound should have more than 20% of its data missing.
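Before scoring, a data set can be checked against the 20% guidelines above. This is a minimal sketch under assumed conventions: the data are a dict mapping each compound name to a list of property values, `None` marks a missing value, and the helper name `missing_data_report` and the example properties are hypothetical. The comparisons mirror the wording of the advice: columns are flagged at 20% or more missing, and compounds are flagged above 20%.

```python
def missing_data_report(properties, data, limit=0.2):
    """Return (columns, compounds) that break the missing-data guidelines (hypothetical helper)."""
    n_compounds = len(data)
    flagged_columns = []
    for j, prop in enumerate(properties):
        missing = sum(1 for values in data.values() if values[j] is None)
        if missing / n_compounds >= limit:  # guideline: fewer than 20% missing per column
            flagged_columns.append(prop)
    flagged_compounds = [
        name for name, values in data.items()
        if sum(v is None for v in values) / len(values) > limit  # guideline: at most 20% per compound
    ]
    return flagged_columns, flagged_compounds

# Hypothetical data set: 5 compounds x 5 properties, None = missing.
properties = ["logS", "logP", "pIC50", "prop_D", "prop_E"]
data = {
    "cpd1": [1.1, 2.0, 6.5, 0.3, 0.9],
    "cpd2": [0.8, None, 7.1, 0.5, 0.7],
    "cpd3": [None, None, 5.9, 0.2, 0.8],
    "cpd4": [1.4, 3.1, 6.0, 0.6, 0.4],
    "cpd5": [0.9, 2.6, 6.8, 0.1, 0.6],
}
print(missing_data_report(properties, data))
```

Here `cpd2` (one value missing out of five, i.e. 20%) is still acceptable, while `cpd3` and the `logS` and `logP` columns are flagged.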
Tuesday, 29 September 2009 21:17
administrator
One common approach to prioritizing compounds is sequential filtering, whereby each property value is compared in turn with a required threshold value. Those compounds that fall on the wrong side of the threshold are rejected. Those compounds that pass then progress to the next filter in the series and hence the number of compounds is iteratively reduced. However, this process has a number of serious shortcomings:
- Because the number of compounds under consideration is reduced at every filter, often very few, or even no, compounds emerge from the sequence of filters having passed all of the thresholds. In this situation, it is difficult to select alternative compounds, because incomplete information has been generated on the full compound set.
- Filters make artificial distinctions between compounds with property values that cannot be resolved within the uncertainties of a prediction or experimental measurement.
- Errors rapidly accumulate as the number of filters increases. For example, if we consider 5 filters, each with an accuracy of 90%, the probability of a compound with ideal properties being correctly passed through all filters is only 59%. If 10 such filters are applied, this drops to 35%, meaning that the process is more likely to incorrectly reject an optimal compound than to pass it.
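The sketch below covers both sequential filtering and the error arithmetic. The filter representation, the property names and the example compounds are hypothetical. The accumulation calculation assumes that each filter errs independently. Under that assumption, the chance of an ideal compound surviving n filters of accuracy p is p to the power n.

```python
def sequential_filter(compounds, filters):
    """Apply (property, threshold, keep_if_above) filters in turn (hypothetical representation)."""
    remaining = list(compounds)
    for prop, threshold, keep_above in filters:
        # Compounds on the wrong side of the threshold are rejected outright.
        remaining = [c for c in remaining if (c[prop] > threshold) == keep_above]
    return remaining

def prob_ideal_passes_all(accuracy, n_filters):
    """Probability an ideal compound passes every filter, assuming independent errors."""
    return accuracy ** n_filters

compounds = [
    {"name": "A", "logS": 1.2, "logP": 2.5},
    {"name": "B", "logS": 0.8, "logP": 2.0},
    {"name": "C", "logS": 1.5, "logP": 3.4},
]
filters = [("logS", 1.0, True), ("logP", 3.0, False)]  # keep logS > 1.0, then logP <= 3.0
survivors = sequential_filter(compounds, filters)

for n in (5, 10):
    print(f"{n} filters: {prob_ideal_passes_all(0.9, n):.0%}")  # prints 59% and 35%

# Smallest number of 90%-accurate filters at which rejection becomes more likely than passing.
break_even = next(n for n in range(1, 50) if prob_ideal_passes_all(0.9, n) < 0.5)
```

With 90% accuracy per filter, `break_even` is 7. From seven filters onward, an ideal compound is more likely to be wrongly rejected than passed.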