# Bayesian network integrated testing strategy (ITS-3)

## Summary of defined approach

This method, described by Jaworska et al. [1], is designed to improve the precision and accuracy of predictions that replace the *in vivo* Local Lymph Node Assay (LLNA). It predicts skin sensitisation potency as a probability distribution over four sensitisation classes (non-sensitiser, weak, moderate and strong sensitiser). A compound is classified by its pEC3 value, which is either measured in the LLNA or predicted by the Bayesian network (BN). The probability distribution is then transformed into a Bayes factor, which removes the prediction bias introduced by the class distribution of the training set and quantifies the uncertainty, so that a confidence level can be assigned to each prediction objectively.

Our implementation of ITS-3 is based on 207 chemicals for which physico-chemical, *in vitro* and *in vivo* data are available. We use Chemaxon tools to calculate water solubility at pH 7, logD at pH 7 and LogKow. These descriptors, together with the TIMES *in silico* model prediction and our proprietary protein binding estimate, are combined with the user-provided results of the validated *in vitro* assays to arrive at a prediction.
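The Bayes factor transformation mentioned above can be illustrated with a short sketch. It assumes one common definition, the posterior odds of a class divided by its prior odds; this page does not state the exact formula used in SaferSkin™, and the function name and the numbers below are hypothetical.

```python
def bayes_factors(posterior, prior):
    """Per-class Bayes factor, assumed here to be posterior odds / prior odds.

    posterior: class probabilities returned by the network for one compound.
    prior:     class frequencies in the training set.
    """
    if len(posterior) != len(prior):
        raise ValueError("posterior and prior must have the same length")
    factors = []
    for post, pri in zip(posterior, prior):
        if not 0.0 < pri < 1.0:
            raise ValueError("each prior probability must lie strictly between 0 and 1")
        prior_odds = pri / (1.0 - pri)
        if post >= 1.0:
            # Certain prediction: the posterior odds are unbounded.
            factors.append(float("inf"))
        else:
            factors.append((post / (1.0 - post)) / prior_odds)
    return factors


# Hypothetical values for the four classes (non-, weak, moderate, strong).
training_prior = [0.30, 0.30, 0.25, 0.15]
network_output = [0.05, 0.15, 0.60, 0.20]
factors = bayes_factors(network_output, training_prior)
```

A factor above 1 means the evidence raised the odds of that class relative to the training set distribution, and a factor below 1 means it lowered them.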

The BN approach uses a knowledge-based network of connections among nodes. Each node represents a single model variable, and each connection between two nodes represents how those variables influence each other and, ultimately, the final pEC3 value. A training set of compounds is used to estimate the influences among the model variables. The *in vitro* data and physico-chemical properties of the test set compounds are then assigned to the input nodes, and the probability distribution over pEC3 values is computed from the established connections and influences among the variables.
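As a rough illustration of this train-then-infer workflow, the sketch below uses a deliberately simplified network in which the potency class is the only parent of every input node. This is an assumption made for brevity and is not the ITS-3 topology; the feature names, states and training rows are hypothetical. Inputs that are not supplied are simply left out of the product, which is how this simplified structure marginalises over missing evidence.

```python
from collections import Counter

CLASSES = ["C1", "C2", "C3", "C4"]


def fit(rows, states, alpha=1.0):
    """Estimate P(class) and P(feature = state | class) with Laplace smoothing."""
    class_counts = Counter(row["cls"] for row in rows)
    n = len(rows)
    prior = {c: (class_counts[c] + alpha) / (n + alpha * len(CLASSES))
             for c in CLASSES}
    cpt = {}
    for feature, feature_states in states.items():
        for c in CLASSES:
            # Training rows with a missing value for this feature are skipped.
            observed = [row.get(feature) for row in rows
                        if row["cls"] == c and row.get(feature) is not None]
            counts = Counter(observed)
            total = len(observed) + alpha * len(feature_states)
            cpt[(feature, c)] = {s: (counts[s] + alpha) / total
                                 for s in feature_states}
    return prior, cpt


def predict(prior, cpt, evidence):
    """Return P(class | evidence); unobserved inputs are omitted from the product."""
    scores = {}
    for c in CLASSES:
        p = prior[c]
        for feature, value in evidence.items():
            if value is not None:
                p *= cpt[(feature, c)][value]
        scores[c] = p
    z = sum(scores.values())
    return {c: p / z for c, p in scores.items()}


# Hypothetical, discretised training data.
STATES = {"dpra": ["low", "high"], "ks": ["neg", "pos"]}
ROWS = [
    {"cls": "C1", "dpra": "low", "ks": "neg"},
    {"cls": "C1", "dpra": "low", "ks": "neg"},
    {"cls": "C2", "dpra": "low", "ks": "pos"},
    {"cls": "C2", "dpra": "high", "ks": "neg"},
    {"cls": "C3", "dpra": "high", "ks": "pos"},
    {"cls": "C3", "dpra": "low", "ks": "pos"},
    {"cls": "C4", "dpra": "high", "ks": "pos"},
    {"cls": "C4", "dpra": "high", "ks": None},
]

prior, cpt = fit(ROWS, STATES)
full = predict(prior, cpt, {"dpra": "high", "ks": "pos"})
partial = predict(prior, cpt, {"dpra": "high"})  # "ks" result not available
```

Calling `predict` with only some of the inputs still returns a full distribution over the four classes; it is typically less sharply peaked than the one obtained with all inputs.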

Table 1 shows the prediction accuracy of SaferSkin™ compared with the results published by Jaworska et al. The statistics for the original network are derived from the published confusion matrix.

Class | Jaworska et al. | SaferSkin™ |
---|---|---|
C1 | 92% | 95% |
C2 | 82% | 77% |
C3 | 70% | 68% |
C4 | 72% | 79% |
Overall | 79.6% | 79.6% |
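For readers who want to reproduce this kind of statistic, the sketch below shows how per-class and overall accuracy can be derived from a confusion matrix. It assumes that rows are the observed LLNA classes, columns are the predicted classes, and per-class accuracy is the fraction of each observed class that was predicted correctly; this page does not confirm that convention. The matrix is hypothetical and is not the published data.

```python
def per_class_accuracy(matrix):
    """Fraction of each observed class (row) that lies on the diagonal."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Correctly classified compounds divided by all compounds."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical 4 x 4 confusion matrix for classes C1..C4.
example_matrix = [
    [18, 2, 0, 0],
    [3, 14, 3, 0],
    [0, 4, 12, 4],
    [0, 0, 2, 8],
]
class_accuracy = per_class_accuracy(example_matrix)
total_accuracy = overall_accuracy(example_matrix)
```

Because the classes usually differ in size, the overall accuracy is a weighted figure and need not equal the plain average of the per-class values.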

### Advantages of the Bayesian network approach include:

- It tolerates missing information.
- It conveys a probabilistic hypothesis of skin sensitisation based on accumulated evidence from data.
- It assesses the uncertainty in prediction given the input data.
- It suggests which experiments would maximise information gain and reduce uncertainty, through value of information (VoI) analysis.
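The value of information idea in the last point can be sketched as an expected reduction in entropy of the class distribution, which is one common way to quantify it; this page does not specify the measure used in SaferSkin™. The assay names and likelihood tables below are hypothetical.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def expected_information_gain(prior, likelihood):
    """Expected entropy reduction from observing one assay.

    prior:      current P(class) for the compound.
    likelihood: likelihood[c][o] = P(assay outcome o | class c).
    """
    n_classes = len(prior)
    n_outcomes = len(likelihood[0])
    gain = entropy(prior)
    for o in range(n_outcomes):
        p_outcome = sum(prior[c] * likelihood[c][o] for c in range(n_classes))
        if p_outcome == 0:
            continue
        posterior = [prior[c] * likelihood[c][o] / p_outcome
                     for c in range(n_classes)]
        gain -= p_outcome * entropy(posterior)
    return gain


# Hypothetical current belief and two hypothetical binary assays.
current_belief = [0.25, 0.25, 0.25, 0.25]
assays = {
    "assay_a": [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7], [0.1, 0.9]],
    "assay_b": [[0.55, 0.45], [0.5, 0.5], [0.5, 0.5], [0.45, 0.55]],
}
ranked = sorted(
    assays,
    key=lambda name: expected_information_gain(current_belief, assays[name]),
    reverse=True,
)
```

The first entry of `ranked` is the assay expected to reduce uncertainty the most given the current belief, so it would be the one to run next under this criterion.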

## Experimental values used in SaferSkin™

Input parameters | Provided by user | Calculated by application |
---|---|---|
SMILES | | |
Molecular descriptors | | |
TIMES prediction | | |
Michael acceptor | | |
Water solubility @ pH7 | | |
LogKow | | |
Protein binding | | |
logD @ pH7 | | |
Experimental values | | |
DPRA assay: DPRACys, DPRALys | | |
KeratinoSens™ assay: KEC1.5, KEC3, IC50 | | |
h-CLAT: EC150, EC200 and CV75 | | |

- [1] J. S. Jaworska, A. Natsch, C. Ryan, J. Strickland, T. Ashikaga and M. Miyazawa; *Archives Toxicol.* **2015**