High-Performance Data Mining with Skeleton-based Structured Parallel Programming
We show how to apply a Structured Parallel Programming methodology
based on skeletons to Data Mining problems, reporting several results
about three commonly used mining techniques, namely association rules,
decision tree induction and spatial clustering. We analyze the
structural patterns common to these applications, looking at
application performance and software engineering efficiency. Our aim
is to clearly state what features a Structured Parallel Programming
Environment should have to be useful for parallel Data Mining. Within
SkIE, the skeleton-based parallel programming environment that we have
developed, we study the
different patterns of data access of parallel implementations of
Apriori, C4.5 and DBSCAN. We need to address reads of large partitions,
frequent and sparse accesses to small blocks, as well as an irregular mix of
small and large transfers, to allow efficient development of
applications on huge databases. We examine the addition of an
object/component interface to the skeleton-structured model, to
simplify the development of environment-integrated, parallel Data
Mining applications.