A personalized search engine based on web-snippet hierarchical clustering
Search engines provide the view of the Web, and their smart
ranking algorithms are their point of view. To offer the best
view, personalized ranking algorithms are currently flourishing.
They focus on users rather than on their submitted queries, by
taking into account contextual or profile information.
In this paper we propose a personalized (meta-)search engine based
on the web-snippet hierarchical clustering technology (à la
Vivisimo) that is fully adaptive and non-intrusive both for the
user and for the queried search engine(s). It works on top of
16 commodity search engines and fetches 200 (or more) results from
them per user query. Our engine mines on the fly the
fine-grained and varied ``themes'' behind these results and then
organizes them into a hierarchy of folders that offers, at various
levels of detail, an up-to-date picture of these results. Users
can therefore browse the hierarchy, select the themes that best
match the ``intention'' behind their query, and ask our engine to
personalize on-the-fly those query results according to their
choices. In this way, lazy users are not limited to looking at the
first ten results, but immediately acquire several points of view
on a larger pool (about 200) of them!
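
The personalization step described above can be illustrated with a minimal sketch. It is not the engine's actual implementation: the `Folder` class, the `personalize` function, the folder labels, and the result identifiers are all hypothetical, and the sketch assumes that the hierarchy of themes has already been built and that personalization simply keeps the results falling under the selected folders, in their original rank order.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Folder:
    """A node of a (hypothetical) theme hierarchy built over the results."""
    label: str
    result_ids: Set[int] = field(default_factory=set)   # results filed directly here
    children: List["Folder"] = field(default_factory=list)

    def all_result_ids(self) -> Set[int]:
        """Results in this folder and in all of its sub-folders."""
        ids = set(self.result_ids)
        for child in self.children:
            ids |= child.all_result_ids()
        return ids

    def find(self, label: str) -> Optional["Folder"]:
        """Depth-first search for the folder carrying the given label."""
        if self.label == label:
            return self
        for child in self.children:
            hit = child.find(label)
            if hit is not None:
                return hit
        return None


def personalize(ranked_ids: List[int], root: Folder,
                selected_labels: List[str]) -> List[int]:
    """Keep only the results under the selected themes, preserving rank order."""
    keep: Set[int] = set()
    for label in selected_labels:
        folder = root.find(label)
        if folder is not None:          # unknown labels are silently ignored
            keep |= folder.all_result_ids()
    return [rid for rid in ranked_ids if rid in keep]


# Toy hierarchy for the polysemous query "jaguar" (illustrative data only).
root = Folder("jaguar", children=[
    Folder("car", {0, 1, 3}, [Folder("dealers", {5})]),
    Folder("animal", {2, 4}),
    Folder("mac os", {6}),
])
ranked_ids = [0, 1, 2, 3, 4, 5, 6]      # original ranking of the fetched results

print(personalize(ranked_ids, root, ["animal"]))
```

In this toy example the popular ``car'' theme occupies most of the top positions, while selecting the ``animal'' folder immediately surfaces the results ranked third and fifth.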
We claim that there exists a mutual reinforcement
relationship between ranking and web-snippet clustering from
which both may benefit. Our extensive experiments show
that this form of personalization is very effective for informative
queries, polysemous queries, and poor queries consisting of at
most two terms (more than 80% of Web queries are of this
type!). In these cases, in fact, one theme might be so popular on
the Web that it monopolizes the top-ten results of link-based
ranking algorithms.