Training pKa and logP prediction
The default pKa and logP prediction methods are built on training sets that cover only a limited number of molecule types, so their accuracy is not always satisfactory. In practice, in most cases only those types of structures that were present in the training set are predicted correctly. We therefore decided to develop a training method for the pKa and logP calculations that allows users to build models relevant to their own structures.
Acidic and basic ionization centers are identified as defined in our default pKa prediction module. The logP prediction model implements 120 predefined atom types. The learning algorithm is a linear regression method based on singular value decomposition (SVD). The training set, a collection of experimental pKa or logP values, must be provided by the user. The collected data should be imported as an SDF or MRV file, which can be compiled, for example, using Instant JChem.
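
To illustrate what an SVD-based linear regression looks like in general, the following minimal Python sketch fits one contribution per atom type to experimental values. It is not the implementation used by the predictor. The function names (fit_atom_contributions, predict), the three-type count matrix, and the target values are hypothetical and synthetic, chosen only for illustration; a real logP model would use the 120 predefined atom types and measured data. NumPy is assumed to be available.

```python
import numpy as np


def fit_atom_contributions(counts, values, rcond=1e-10):
    """Least-squares fit of values ~ counts @ w via the SVD pseudo-inverse.

    counts: (n_molecules, n_atom_types) matrix of atom-type occurrences.
    values: (n_molecules,) experimental values (for example logP).
    """
    counts = np.asarray(counts, dtype=float)
    values = np.asarray(values, dtype=float)
    # Thin SVD: counts = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    # Invert only singular values that are not negligibly small, so that
    # atom types that are absent or collinear do not blow up the fit.
    keep = s > rcond * s.max()
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    return Vt.T @ (s_inv * (U.T @ values))


def predict(counts, weights):
    """Predicted value = sum of atom-type counts times fitted contributions."""
    return np.asarray(counts, dtype=float) @ weights


# Synthetic toy data (not experimental): 4 molecules, 3 hypothetical atom types.
toy_counts = np.array([
    [2, 0, 1],
    [1, 1, 0],
    [0, 2, 1],
    [3, 1, 2],
])
assumed_weights = np.array([0.5, -0.3, 0.2])  # made up for the example
toy_values = toy_counts @ assumed_weights

fitted = fit_atom_contributions(toy_counts, toy_values)
print(np.round(fitted, 3))
```

Because the toy values are generated from the assumed contributions without noise, the fit recovers them up to rounding. With real experimental data the fit is only approximate, and zeroing small singular values keeps the solution stable when some atom types are rare in the training set.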