Welcome to the Optibrium Community






FAQs



Where do the data used to build the QSAR models come from?

Tuesday, 29 September 2009 21:12
administrator

The QSAR models are based on either in-house measurements or literature data. Larger data sets are often available from the published literature, particularly for properties requiring in vivo measurements. Regardless of the source of data, strict quality control by both experimental and computational scientists is always applied to any data sets used to develop StarDrop models.


What kind of molecular descriptors are used?

Tuesday, 29 September 2009 21:11
administrator

The QSAR models are based on chemically meaningful descriptors, so that they provide guidance on how chemical modifications affect the predicted property. 1D and 2D descriptors such as molecular weight, volume, lipophilicity terms, atom counts and fragment counts are used in all models. 3D descriptors are not used in the current model suite. Because of their additional computational cost, they would only be included if they offered a significant improvement in the accuracy of a model.


How do you validate your models?

Tuesday, 29 September 2009 21:11
administrator

The QSAR models are validated with independent test sets, i.e. sets of molecules not used to train the models. Such validation results are statistically rigorous tests of the predictive ability of the model, as opposed to cross correlation or training set results, which are often reported for other models. These latter values measure the ability of the model to fit the data in the training set, which is typically better than the true predictive ability of the model.


How do StarDrop models compare to other models?

Tuesday, 29 September 2009 21:02
Nick Foster

In independent tests, the StarDrop predictive ADME models have matched or exceeded the performance of other commercially available or published models. The results for the StarDrop models are shown in the tables below. When comparing the StarDrop models with other in silico models, it is important to note that the validation results reported here are for independent test sets (see “How do you validate your models?” above). It is also important, when comparing StarDrop predictions with experimental data, to ensure that the comparison is like-for-like. For example, HIA predictions cannot be compared directly with oral bioavailability data, which are affected by both absorption and first-pass clearance. Similarly, the StarDrop predictions for aqueous solubility are unlikely to correlate well with data obtained in a high-throughput kinetic assay based on dilution of DMSO compound stocks, and they will not predict the solubilities of different salt forms.

Summary of Statistical Results for Continuous QSAR models

| Model | Definition | N | R² | RMSE |
| --- | --- | --- | --- | --- |
| logP | Predicts the logarithm of the octanol/water partition coefficient for neutral compounds | 2950 | 0.92 | 0.44 |
| logD@pH7.4 | Predicts the logarithm of the octanol/buffer distribution coefficient at pH 7.4 | 257 | 0.88 | 0.58 |
| logS | Predicts the logarithm of the intrinsic aqueous solubility, S in µM, for neutral compounds | 663 | 0.82 | 0.70 |
| logS@pH7.4 | Predicts the logarithm of the solubility, S in µM, in phosphate buffered saline at pH 7.4 | 96 | 0.74 | 0.61 |
| log([brain]:[blood]) | Predicts the logarithm of the brain/blood ratio | 75 | 0.72 | 0.36 |
| hERG pIC50 | Predicts the pIC50 values for inhibition of hERG K+ channels expressed in mammalian cells | 33 | 0.72 | 0.64 |
| 2C9 pKi | Predicts the pKi values for CYP2C9 affinity | 25 | 0.64 | 0.60 |

a) N = number of compounds in independent test set.
b) R² gives the correlation between calculated and experimental values for the compounds in the independent test set.
c) The root mean squared error (RMSE) gives the size of the prediction error for the same compounds, in the units of the predicted property. When possible, the RMSE values are calculated for compounds within (IN) or outside (OUT) the chemical space of the model. “unknown” is reported if there are not enough training and test compounds outside the chemical space to calculate an RMSE value.
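
As an illustration of the two statistics in the table above, here is a minimal sketch of how R² and RMSE could be computed for an independent test set. It assumes R² is the squared Pearson correlation between predicted and experimental values, which the footnote suggests but does not state outright. The function names and the four data points are invented for the example and are not StarDrop code or data.

```python
import math

def rmse(predicted, observed):
    """Root mean squared error between predicted and observed values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

def r_squared(predicted, observed):
    """Squared Pearson correlation between predicted and observed values (assumed definition)."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov ** 2 / (var_p * var_o)

# Made-up values for four compounds that were NOT used to train the model.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]

test_rmse = rmse(predicted, observed)      # 0.5
test_r2 = r_squared(predicted, observed)   # 0.8
```

The same functions applied to the training set would typically give better-looking numbers, which is why the results above are reported for held-out compounds only.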

Summary of Statistical Results for Classification QSAR models

| Model | Definition | N | Class | Accuracy | Specificity |
| --- | --- | --- | --- | --- | --- |
| HIA category | Returns a binary prediction for human intestinal absorption, based on a threshold of 30% absorbed | 245 | '-' | 66% | 91% |
| | | | '+' | 99% | 95% |
| BBB category | Returns a binary prediction for blood/brain barrier penetration | 52 | '-' | 91% | 91% |
| | | | '+' | 83% | 83% |
| P-gp category | Returns a binary prediction for P-gp transport | 51 | 'yes' | 79% | 85% |
| | | | 'no' | 82% | 75% |
| PPB category | Returns a binary prediction for human plasma protein binding, based on a threshold of 90% bound | 159 | 'high' | 81% | 74% |
| | | | 'low' | 87% | 91% |
| 2D6 affinity category | Returns a 4-class prediction for 2D6 affinity | 45 | (all) | Root mean square error = 0.87 classes | |

a) N = number of compounds in independent test set.
b) The accuracy for each class is reported as the percentage of compounds correctly classified.
c) The specificity refers to the percentage of correct classifications within the overall set of compounds predicted to be in that class.
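
The two per-class statistics defined in the footnotes can be sketched as follows. This is one reading of those definitions: accuracy is taken as the fraction of compounds truly in a class that are predicted to be in it, and specificity as the fraction of compounds predicted to be in a class that truly belong to it. The function name and the six example labels are hypothetical.

```python
def class_statistics(actual, predicted, label):
    """Return (accuracy, specificity) for one class, following the footnote definitions.

    accuracy:    fraction of compounds truly in `label` that were predicted as `label`
    specificity: fraction of compounds predicted as `label` that truly are `label`
    """
    truly_in_class = sum(1 for a in actual if a == label)
    predicted_in_class = sum(1 for p in predicted if p == label)
    correct = sum(1 for a, p in zip(actual, predicted) if a == label and p == label)
    accuracy = correct / truly_in_class if truly_in_class else float("nan")
    specificity = correct / predicted_in_class if predicted_in_class else float("nan")
    return accuracy, specificity

# Made-up test set: four '+' compounds and two '-' compounds.
actual = ["+", "+", "+", "+", "-", "-"]
predicted = ["+", "+", "+", "+", "+", "-"]

plus_stats = class_statistics(actual, predicted, "+")   # (1.0, 0.8)
minus_stats = class_statistics(actual, predicted, "-")  # (0.5, 1.0)
```

In this toy example every '+' compound is found, but one '-' compound is also labelled '+', so the '+' class has high accuracy and lower specificity, while the '-' class shows the opposite pattern.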


Global vs local models?

Tuesday, 29 September 2009 21:02
Ed Champness

The QSAR models are ‘global’ models of ADME properties, i.e. the compounds used to build each model (the training compounds) cover as wide a range of chemical diversity as possible. In contrast, local models are specific to one region of chemical space. As a drug discovery project progresses, the chemistry under consideration often focuses on a small number of chemical series in which the molecules are structurally similar. Global models may lack the resolution required to distinguish between molecules with subtle differences. Once in vitro data have been generated for a chemical series, the ability of the corresponding models to discriminate within the series should therefore be tested. If a model is found to lack discrimination, it may be possible to develop a local model to improve resolution within this limited range of chemistry. Local models may be integrated within StarDrop (see “Can we use our own models within StarDrop?” below) and the Auto-Modeller module provides the ability to generate models using your own data.


Are your models predictive of our chemistry?

Tuesday, 29 September 2009 21:01
Ed Champness

Each model result is provided with an estimate of the uncertainty in the prediction, in the form of a root mean square error (RMSE) for continuous data or a probability for classification data. The uncertainty is based on how similar the structure is to those molecules used when building the model. If your chemistry differs significantly from the compounds used to build the models, it is less likely that the models will be predictive and this can be seen explicitly from the uncertainties returned (see “How do you estimate uncertainties?” below). As noted above, the models supplied with StarDrop are ‘global’ and should readily distinguish ‘long range’ trends across a wide range of chemistries, but they are less likely to be able to differentiate between similar molecules in the same series.


How do you define the chemical space of a model?

Tuesday, 29 September 2009 21:01
Ed Champness

The 'chemical space' of a model represents the range of model descriptors that are well represented by the molecules in the training set for that model. This is defined using a Hotelling’s T² statistical test.
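
To make the idea concrete, here is a minimal sketch of a Hotelling's T² calculation for a new compound's descriptor vector relative to a training set: T² = (x - mean)ᵀ S⁻¹ (x - mean), where S is the sample covariance of the training descriptors. This illustrates the general statistic, not StarDrop's implementation. The descriptor values, the function names and the cut-off `T2_LIMIT` are all hypothetical. In practice the limit would normally be derived from an F-distribution at a chosen confidence level.

```python
def mean_vector(rows):
    """Column-wise mean of the training descriptor rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance_matrix(rows, mean):
    """Sample covariance matrix (n - 1 denominator) of the training descriptors."""
    n, p = len(rows), len(mean)
    cov = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - m for x, m in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j] / (n - 1)
    return cov

def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gaussian elimination with partial pivoting.

    Assumes the matrix is non-singular (descriptors are not perfectly correlated).
    """
    p = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * p
    for i in reversed(range(p)):
        y[i] = (aug[i][p] - sum(aug[i][j] * y[j] for j in range(i + 1, p))) / aug[i][i]
    return y

def hotelling_t2(descriptors, training_rows):
    """T^2 distance of one descriptor vector from the training set distribution."""
    mean = mean_vector(training_rows)
    cov = covariance_matrix(training_rows, mean)
    d = [x - m for x, m in zip(descriptors, mean)]
    return sum(di * yi for di, yi in zip(d, solve(cov, d)))

# Made-up training set described by two hypothetical descriptors.
training = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
T2_LIMIT = 6.0  # hypothetical cut-off, not a StarDrop value

t2_near = hotelling_t2([3.0, 1.0], training)   # 3.0  -> inside the space
t2_far = hotelling_t2([5.0, 5.0], training)    # 24.0 -> outside the space
in_space = [t2 <= T2_LIMIT for t2 in (t2_near, t2_far)]
```

Because T² scales each descriptor by the spread seen in the training set, a compound counts as "far" only when it is unusual relative to that spread, not simply when a raw descriptor value is large.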


How do you estimate uncertainties?

Tuesday, 29 September 2009 21:01
Ed Champness

Within the chemical space of the predictive ADME QSAR model, there is a statistically significant number of data points in the independent test set, and hence the performance of the model on this set provides a good estimate of the uncertainty. For compounds outside but ‘close’ to the chemical space of the model, extrapolation beyond the training set increases the uncertainties in the predictions. Where possible, this uncertainty is estimated using the performance of the model on compounds in the independent test set that lie outside the chemical space. For compounds that differ significantly from the training set, no valid estimate of the uncertainty can be made. The uncertainty in the predictions for these compounds is reported as maximal, i.e. an infinite uncertainty for continuous models and an equal probability for each class for classification models.
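
The three cases described above can be sketched as a simple lookup. This is an illustration under stated assumptions, not StarDrop's code. It assumes that distance from the training set is measured by a single score with two hypothetical limits (`in_space_limit` and `near_space_limit`), and that a missing out-of-space RMSE is treated as maximal uncertainty. All names and numbers are invented.

```python
import math

def continuous_uncertainty(distance, in_space_limit, near_space_limit,
                           rmse_in, rmse_out=None):
    """Pick the reported uncertainty for a continuous model prediction.

    distance:  how far the compound lies from the training set (e.g. a T^2 value)
    rmse_in:   test set RMSE for compounds inside the chemical space
    rmse_out:  test set RMSE for compounds outside the space, or None if unknown
    """
    if distance <= in_space_limit:
        return rmse_in
    if distance <= near_space_limit and rmse_out is not None:
        return rmse_out
    # Too far from the training set (or no out-of-space estimate): maximal uncertainty.
    return math.inf

def maximal_class_probabilities(classes):
    """Equal probability for every class, used when no valid estimate can be made."""
    return {c: 1.0 / len(classes) for c in classes}

# Made-up limits and RMSE values for illustration only.
inside = continuous_uncertainty(1.0, 6.0, 12.0, rmse_in=0.4, rmse_out=0.8)    # 0.4
nearby = continuous_uncertainty(8.0, 6.0, 12.0, rmse_in=0.4, rmse_out=0.8)    # 0.8
far = continuous_uncertainty(20.0, 6.0, 12.0, rmse_in=0.4, rmse_out=0.8)      # inf
unknown_class = maximal_class_probabilities(["+", "-"])                        # 0.5 each
```

An infinite uncertainty or an even split between classes signals that the prediction carries no usable information for that compound.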


Can we use our own models within StarDrop?

Tuesday, 29 September 2009 21:00
Ed Champness

Yes. We provide example code and a technical description in the StarDrop Scripting and Customisation Guide explaining how to add your models to the StarDrop client or to the StarDrop model server, which will enable all users of StarDrop within your organisation to run your own models. We recommend this is done by a software developer or someone with a strong computational background.


Can we use other third-party models?

Tuesday, 29 September 2009 21:00
Ed Champness

Yes, if they can be run from a command line (rather than only within a desktop application), so that they can be accessed from the code we provide for running your own models (see “Can we use our own models within StarDrop?” above). A straightforward alternative is to import the predictions from the third-party models as a .csv or .sd file.


What information would third-party models need to provide to employ the scoring capability of StarDrop?

Tuesday, 29 September 2009 21:00
Ed Champness

The models would need to provide a measure of uncertainty and you would need an understanding of the output in order to choose appropriate scoring thresholds.


Is it possible to integrate proprietary data into StarDrop models?

Tuesday, 29 September 2009 20:59
Ed Champness

We can integrate proprietary data for additional compounds on a consultancy basis. When we do this, the new data are checked to ensure their quality and their compatibility with the data on which the original model was built. Alternatively, our computational chemists can build a custom model from your own data set. If the new data are derived from a limited range of chemical diversity, the resulting model will give optimal predictions for compounds of similar chemistry and should also provide greater resolution than a global model. However, such a model is likely to have reduced accuracy in other areas of chemistry. The optional Auto-Modeller module provides you with the ability to generate models using your own data.


Can you run ADME QSAR models without a server?

Tuesday, 29 September 2009 20:59
Ed Champness

Yes, the ADME QSAR models can run directly within the StarDrop client application on a desktop/laptop machine without access to a server.





