A Comparative Study of Bases for Motif Inference
Motif inference is at the heart of several time-demanding
computational tasks, such as in molecular biology, data mining and
identification of structured motifs in sequences, and in
data compression, to name a few. In this scenario, a motif is a
pattern that appears repeated at least a certain number of times
(the quorum), to be of interest. The pattern can be approximated in
that some of its characters can be left unspecified (the don't
cares). Motif inference is not aimed at searching a given
pattern but, rather, at discovering all the possible patterns that
appear as motifs in the given input string. The combinatorial
explosion of these patterns makes their discover an exponential-time
computation. For this, the notion of basis has been recently
introduced to succinctly represent all of them within reasonable
time and space bounds. The goal of the paper is to shed light on the
state of the art for this emerging field and to add further
properties to what is currently known.