A personalized search engine based on web-snippet hierarchical clustering

Search engines provide the view of the Web, and their smart ranking algorithms are their point of view. To offer the best view, personalized ranking algorithms are currently flourishing. They focus on users rather than on the submitted queries, by taking into account contextual or profile information. In this paper we propose a personalized (meta-)search engine based on web-snippet hierarchical clustering (à la Vivisimo) that is fully adaptive and non-intrusive, both for the user and for the queried search engine(s). It works on top of 16 commodity search engines and fetches 200 (or more) results from them per user query. Our engine mines on the fly the fine-grained and varied ``themes'' behind these results, and then organizes them into a hierarchy of folders that offers, at various levels of detail, an up-to-date picture of the results. Users can therefore browse the hierarchy, select the themes that best match the ``intention'' behind their query, and ask our engine to personalize the query results on the fly according to their choices. In this way, lazy users are not limited to looking at the first ten results, but immediately acquire several points of view on a larger pool (about 200) of them! We claim that there exists a mutual reinforcement relationship between ranking and web-snippet clustering from which both may benefit. Our extensive experiments show that this form of personalization is very effective for informative queries, polysemous queries, and poor queries consisting of at most two terms (more than 80% of Web queries are of this type!). In these cases, in fact, one theme might be so popular on the Web that it unfortunately monopolizes the top-ten results of link-based ranking algorithms.
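
The personalization step described above (browse the folder hierarchy, select themes, re-present the results) can be illustrated with a minimal sketch. It is not the engine's actual implementation: the `Folder` class, the `personalize` function, and the rule "keep only the results that fall under a selected folder, in their original rank order" are assumptions made for illustration, and the folder labels and result ids are invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Folder:
    """Hypothetical node of the folder hierarchy: a theme label plus result ids."""
    label: str
    result_ids: set[int] = field(default_factory=set)
    children: list[Folder] = field(default_factory=list)

    def all_result_ids(self) -> set[int]:
        # A folder covers its own results and those of all its descendants.
        ids = set(self.result_ids)
        for child in self.children:
            ids |= child.all_result_ids()
        return ids


def personalize(ranked_ids: list[int], selected: list[Folder]) -> list[int]:
    """Keep the results covered by the selected folders, preserving rank order.

    Assumed filtering rule, for illustration only.
    """
    keep: set[int] = set()
    for folder in selected:
        keep |= folder.all_result_ids()
    return [rid for rid in ranked_ids if rid in keep]


# Toy example for a polysemous query such as "jaguar" (invented data).
habitat = Folder("habitat", {6})
animal = Folder("animal", {3, 5}, [habitat])
car = Folder("car", {1, 2, 4})
ranked = [1, 2, 3, 4, 5, 6]  # original ranking of the fetched results

animal_only = personalize(ranked, [animal])
```

In this toy case the "car" theme occupies most of the top positions, and selecting the "animal" folder surfaces the results that would otherwise sit lower in the list.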