A Comparative Study of Tree Generative Kernels for Gene Function Prediction

In this report we perform a comparative study of kernel functions
defined on generative models with the goal of embedding phylogenetic information
into a discriminative learning approach. We describe three generative
tree kernels: a sufficient statistics kernel, a Fisher kernel, and
a probability product kernel; their key features are adaptivity
to the input domain and the ability to deal with structured data.
In particular, kernel adaptivity is obtained through the estimation
of the parameters of a tree-structured model of evolution from an
input domain of phylogenetic profiles encoding the presence or absence
of specific proteins in a set of fully sequenced genomes. We report
results obtained in the prediction of the functional class of the
proteins of the yeast S. cerevisiae, together with comparisons
with a standard vector-based kernel and with a non-adaptive tree kernel
function. To further analyze the impact of the discriminative learning
phase, and to provide an assessment of the information retained by
the learned generative models, we apply them directly to classification
through log-odds. Finally, the advantage achieved through adaptivity
for two of the new kernels is assessed through a comparison with similar
kernels based on randomly initialized generative models where no learning
is performed, and with kernels whose parameters are set only on the
basis of biological considerations.
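
To make the log-odds classification step concrete, the following is a minimal
sketch, not the implementation used in this study. It assumes a two-state
(absent/present) model on a rooted tree whose leaves are genomes, with one
parameter set per class, and it scores a phylogenetic profile by the log ratio
of its likelihoods under the two models. The toy tree, the parameter layout
(`root`, `trans`), and all function names are hypothetical and chosen only for
illustration.

```python
import math

def upward_message(tree, node, profile, params):
    """Return [P(leaves below node | node=0), P(leaves below node | node=1)].

    tree maps each internal node to its list of children; leaves are absent
    from the mapping. profile maps each leaf to 0 (absent) or 1 (present).
    params["trans"][child][s][t] is P(child=t | parent=s) (assumed layout).
    """
    children = tree.get(node, [])
    if not children:
        # Leaf: the observed state has likelihood 1, the other state 0.
        return [1.0 if profile[node] == s else 0.0 for s in (0, 1)]
    msg = [1.0, 1.0]
    for child in children:
        child_msg = upward_message(tree, child, profile, params)
        trans = params["trans"][child]
        for s in (0, 1):
            # Marginalize over the child's state given the parent's state.
            msg[s] *= sum(trans[s][t] * child_msg[t] for t in (0, 1))
    return msg

def log_likelihood(tree, root, profile, params):
    """Log-probability of a profile under one tree-structured model."""
    msg = upward_message(tree, root, profile, params)
    prior = params["root"]
    return math.log(prior[0] * msg[0] + prior[1] * msg[1])

def log_odds(tree, root, profile, pos_params, neg_params):
    """Positive score favors the positive-class model."""
    return (log_likelihood(tree, root, profile, pos_params)
            - log_likelihood(tree, root, profile, neg_params))

# Toy example: one ancestor with two genomes A and B as leaves.
tree = {"root": ["A", "B"]}
sticky = [[0.9, 0.1], [0.1, 0.9]]   # child state tends to copy the parent
uniform = [[0.5, 0.5], [0.5, 0.5]]  # child state independent of the parent
pos = {"root": [0.5, 0.5], "trans": {"A": sticky, "B": sticky}}
neg = {"root": [0.5, 0.5], "trans": {"A": uniform, "B": uniform}}

co_occurring = {"A": 1, "B": 1}
discordant = {"A": 1, "B": 0}
score_co = log_odds(tree, "root", co_occurring, pos, neg)
score_dis = log_odds(tree, "root", discordant, pos, neg)
```

In this toy setting a profile in which the protein is present in both genomes
receives a positive score, while a discordant profile receives a negative one,
because only the positive-class model couples the two leaves through their
common ancestor.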