Predicting Thermodynamic Properties from Molecular Structures by Recursive Neural Networks. Comparison with Classical Group Contributions Methods.

Recursive neural networks address prediction tasks for compounds represented in a structured domain. These approaches combine, in a single learning system, the flexibility and general advantages of a neural network model with the representational power of a structured domain. The result is a completely new approach to QSPR/QSAR analysis, based on the adaptive processing of molecular graphs, that performs even better than traditional approaches. In this paper, a Recursive Cascade Correlation (RecCC) model is applied to the analysis of the Gibbs free energies of solvation in water of 179 monofunctional open-chain organic compounds. An original representation of the molecules as labeled, directed, positional acyclic graphs has been developed by identifying a limited set of constituent atomic groups and representation rules. The descriptive and predictive abilities of the RecCC model have been tested and compared with those of a traditional group contribution method. The inherent ability of the RecCC model to abstract chemical knowledge through the adaptive learning process has been investigated by Principal Component Analysis of the internal representations developed by the neural network. The analysis shows that the model distinguishes the chemical compounds on the basis of a nontrivial combination of their chemical structure and target property.
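The idea of processing a labeled positional acyclic graph recursively can be illustrated with a minimal sketch. It is not the RecCC model itself: Cascade Correlation adds hidden units incrementally during training, whereas this example uses a fixed, untrained state vector with random weights. The group vocabulary `LABELS`, the limits `MAX_CHILDREN` and `STATE_DIM`, the `Node` class, and the rooting of ethanol at its CH3 group are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical vocabulary of atomic groups (illustrative, not the paper's set).
LABELS = ("CH3", "CH2", "OH")
MAX_CHILDREN = 2   # assumed maximum out-degree of a node
STATE_DIM = 3      # assumed size of the internal representation


@dataclass(frozen=True)
class Node:
    """A vertex of a labeled positional tree; child order is significant."""
    label: str
    children: Tuple[Optional["Node"], ...] = ()


_rng = random.Random(0)  # fixed seed: weights are random and untrained


def _rand_matrix(rows: int, cols: int) -> List[List[float]]:
    return [[_rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


W_LABEL = _rand_matrix(STATE_DIM, len(LABELS))       # label -> state
W_CHILD = [_rand_matrix(STATE_DIM, STATE_DIM)        # one matrix per position
           for _ in range(MAX_CHILDREN)]
W_OUT = [_rng.uniform(-0.5, 0.5) for _ in range(STATE_DIM)]


def encode(node: Optional[Node]) -> List[float]:
    """Compute a node's state from its label and its children's states."""
    if node is None:
        return [0.0] * STATE_DIM  # a missing child contributes a null state
    if len(node.children) > MAX_CHILDREN:
        raise ValueError("node has more children than MAX_CHILDREN")
    label_idx = LABELS.index(node.label)
    child_states = [encode(child) for child in node.children]
    state = []
    for i in range(STATE_DIM):
        total = W_LABEL[i][label_idx]
        for k, child_state in enumerate(child_states):
            # Position k selects its own weight matrix.
            total += sum(W_CHILD[k][i][j] * child_state[j]
                         for j in range(STATE_DIM))
        state.append(math.tanh(total))
    return state


def predict(root: Node) -> float:
    """Linear readout of the root state (a stand-in for the target property)."""
    return sum(w * s for w, s in zip(W_OUT, encode(root)))


# Ethanol, CH3-CH2-OH, rooted at CH3 under the assumed rules.
ethanol = Node("CH3", (Node("CH2", (Node("OH"),)),))
print(encode(ethanol), predict(ethanol))
```

The root state returned by `encode` plays the role of the internal representation of the whole molecule; vectors of this kind, collected over a data set, are the sort of object a Principal Component Analysis would be applied to. Because each child position has its own weight matrix, swapping two children generally changes the encoding, which is what makes the graph positional.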