If data are missing for a particular compound, the compound can still be scored. The algorithm treats a missing value as highly uncertain: it contributes an average score for that property, but with a very high degree of uncertainty. This helps to highlight compounds that lack data on a critical decision-making criterion but otherwise have a good balance of properties. To get the best out of the Probabilistic Scoring algorithm, we advise that fewer than 20% of the data points in each column be missing and that no compound have more than 20% of its data missing.
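
The 20% guidance above can be checked before scoring. The following is a minimal sketch in plain Python, not part of the Probabilistic Scoring algorithm itself. It assumes the data set is held as a list of rows (one per compound) with `None` marking a missing value; the names `missing_fractions`, `flag_sparse` and `MISSING_LIMIT` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Assumption: the 20% guideline from the text, expressed as a fraction.
MISSING_LIMIT = 0.20

Row = Sequence[Optional[float]]


def missing_fractions(rows: Sequence[Row]) -> Tuple[List[float], List[float]]:
    """Return (per_column, per_compound) fractions of missing (None) values."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if n_rows else 0
    if n_rows == 0 or n_cols == 0:
        return [], []
    per_column = [
        sum(row[j] is None for row in rows) / n_rows for j in range(n_cols)
    ]
    per_compound = [sum(v is None for v in row) / n_cols for row in rows]
    return per_column, per_compound


def flag_sparse(
    rows: Sequence[Row], limit: float = MISSING_LIMIT
) -> Tuple[List[int], List[int]]:
    """Return indices of columns and compounds that fall outside the guideline.

    A column is flagged when its missing fraction is not below the limit
    ("fewer than 20% per column"); a compound is flagged when its missing
    fraction exceeds the limit ("no more than 20% per compound").
    """
    per_column, per_compound = missing_fractions(rows)
    sparse_columns = [j for j, f in enumerate(per_column) if f >= limit]
    sparse_compounds = [i for i, f in enumerate(per_compound) if f > limit]
    return sparse_columns, sparse_compounds


# Illustrative data only: five compounds, five properties.
example = [
    [1.0, 2.0, None, 4.0, 5.0],   # 1 of 5 values missing
    [1.0, None, None, 4.0, 5.0],  # 2 of 5 values missing
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.0, 2.0, 3.0, 4.0, 5.0],
]

if __name__ == "__main__":
    columns, compounds = flag_sparse(example)
    print("Columns outside guideline:", columns)
    print("Compounds outside guideline:", compounds)
```

Flagged columns or compounds can still be scored, as described above; the check only indicates where the resulting scores are likely to carry high uncertainty.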




