Towards a Methodology for Parallel Data Stream Processing: Application to Parallel Stream Join

This paper deals with high-performance Parallel Data Stream Processing methodologies and implementation techniques. Data Stream Processing (DaSP) is understood here in its most general and interesting sense: on-line (often real-time) applications working on multiple, nondeterministic streams of unlimited or unknown length and highly variable arrival rate, whose elements must be processed efficiently “on the fly”. Traditional high-performance solutions are not sufficient to meet the critical requirements of high throughput and low latency with acceptable memory size: typical DaSP applications require novel parallelism models, together with related design and implementation techniques, on emerging highly parallel architectures. The aim of this paper is to make an original contribution to the design and implementation of parallel DaSP applications. The contribution is twofold: (1) the definition of an approach to a new general model for data-parallel DaSP computations, according to a paradigm called Data Stream Parallelism; (2) the application of this approach to the parallel Stream Join problem, showing that the most interesting parallelizations in the literature are particular cases of our approach and that, compared to them, our implementation achieves better throughput and latency on multicore architectures.