December 3, 2011

Why SMARTCyp works - challenges in reactivity models applied to P450s

During the last couple of years I've been developing the SMARTCyp methodology for prediction of site-of-metabolism in drug metabolism mediated by the cytochrome P450 enzyme family. Here I will describe why the SMARTCyp approach works, and why many other models applied to the same problem often fail when they try to include reactivity.

  1. SMARTCyp is simple
    The reactivity model is applied through simple fragment matching against a library of SMARTS strings. Each SMARTS string represents a number of reference calculations and assigns the average energy of these reference calculations to any atom that matches.
    The advantage is that we avoid having to generate many different 3D structures to make sure the conformation is correct, and we avoid hydrogen bond donors/acceptors messing up the reactivity calculations by causing strange structures (this can happen when the oxygen radical attacks a site close to a donor/acceptor).
  2. SMARTCyp reactivities are very accurate
    For the 200+ fragments we have computed so far, the reactivities are as good as can be done today (DFT with B3LYP, large basis sets, and transition states verified by frequency calculations). As long as the fragment of interest in a drug molecule is relatively similar to one of these, the predicted reactivity will be quite good (an error above 1 kcal/mol is unusual).
    This is because our calculations use a full heme model and compute the transition state for the actual reaction mechanism; no simplified model or other assumptions are made. Other reactivity models rely on intermediates and do not use a full heme model. Such models are relatively accurate for hydrogen abstraction reactions (that is, hydroxylation of aliphatic carbon atoms or dealkylation reactions), but for any other type of reaction they are usually pretty bad.
  3. P450 enzymes are flexible
    While it has been shown that CYP3A4 is highly flexible and that reactivity is the only major determinant of site-of-metabolism in this enzyme, it's my belief that most other CYPs are also more flexible than most people think. The crystal structures of several of the human CYPs do not have active sites large enough to explain all the substrates known to be metabolized by them. Hence, the approximation that reactivity dominates is very good for most CYPs (however, it's not sufficient for CYP2D6 and CYP2C9). Unpublished data show that SMARTCyp can be applied with high accuracy to all isoforms except 2D6 and 2C9. The two exceptions stem from the fact that these are the only two isoforms with charged amino acids in the active site, leading to large contributions from binding. But even these binding contributions can be described by a simple 2D model.
    We recently showed that even for CYP2D6 we can build a good model based on SMARTCyp reactivities using only three descriptors (implemented in SMARTCyp 2.0). While some may claim such a model is too simple, the simplicity is actually an advantage. Simple models are usually robust: the dependence on the data set can be smaller when you have fewer descriptors to fit, and hence the model picks up less noise from the data. When compared to ensemble docking supplemented with reactivities from SMARTCyp 1.5, the new SMARTCyp 2D6 model is as accurate, but at a fraction of the computational cost, and much easier for a medicinal chemist to apply.
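
To make the lookup from point 1 concrete, here is a minimal Python sketch of the idea: a library of patterns, each carrying a set of reference energies, where any matching atom gets the average. This is not the SMARTCyp code. The `Atom` and `Fragment` classes, the three library entries and all energy values are made up for illustration, and plain Python predicates stand in for real SMARTS matching (which would need a cheminformatics toolkit). Ordering the library from most to least specific and ranking sites by lowest energy are also assumptions of the sketch.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Atom:
    """Toy atom record; a real tool would derive this from a parsed molecule."""
    index: int
    element: str
    n_hydrogens: int
    aromatic: bool
    neighbors: Tuple[str, ...]  # element symbols of bonded heavy atoms


@dataclass(frozen=True)
class Fragment:
    """One library entry: a pattern plus the energies of its reference calculations."""
    name: str
    matches: Callable[[Atom], bool]        # hypothetical stand-in for a SMARTS pattern
    reference_energies: Tuple[float, ...]  # made-up numbers, NOT SMARTCyp data

    @property
    def energy(self) -> float:
        # The energy assigned to a matching atom is the average of the references.
        return mean(self.reference_energies)


# Assumption: entries are ordered from most specific to least specific,
# and the first match wins.
LIBRARY: List[Fragment] = [
    Fragment("non-aromatic C-H next to nitrogen",
             lambda a: a.element == "C" and not a.aromatic
             and a.n_hydrogens > 0 and "N" in a.neighbors,
             (40.0, 42.0, 44.0)),
    Fragment("aromatic C-H",
             lambda a: a.element == "C" and a.aromatic and a.n_hydrogens > 0,
             (80.0, 84.0)),
    Fragment("other non-aromatic C-H",
             lambda a: a.element == "C" and not a.aromatic and a.n_hydrogens > 0,
             (70.0, 74.0, 78.0)),
]


def assign_energy(atom: Atom, library: List[Fragment] = LIBRARY) -> Optional[float]:
    """Return the average reference energy of the first matching fragment, if any."""
    for fragment in library:
        if fragment.matches(atom):
            return fragment.energy
    return None  # no fragment in the library covers this atom


def rank_sites(atoms: List[Atom],
               library: List[Fragment] = LIBRARY) -> List[Tuple[int, float]]:
    """Rank matched atoms by assigned energy, lowest (most reactive) first."""
    scored = [(a.index, assign_energy(a, library)) for a in atoms]
    matched = [(i, e) for i, e in scored if e is not None]
    return sorted(matched, key=lambda pair: (pair[1], pair[0]))


# Four toy atoms: an N-methyl carbon, an aromatic CH, a plain methyl, an oxygen.
toy_atoms = [
    Atom(0, "C", 3, False, ("N",)),
    Atom(1, "C", 1, True, ("C", "C")),
    Atom(2, "C", 3, False, ("C",)),
    Atom(3, "O", 0, False, ("C",)),
]
print(rank_sites(toy_atoms))
```

With these toy numbers the carbon next to nitrogen comes out on top and the oxygen is dropped because nothing in the library matches it. Note that no 3D structure or conformer is generated at any point, which is exactly the advantage described under point 1.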