K-ary N-trees: High Performance Networks for Massively Parallel Architectures

The past few years have seen a rise in popularity of massively parallel architectures that use fat-trees as their interconnection In this paper we formalize a parametric family of fat-trees, the k-ary n-trees, built with constant arity switches interconnected in a regular topology. A simple adaptive routing algorithm for k-ary n-trees sends each message to one of the nearest common ancestors (NCA) of both source and destination choosing the less loaded physical channels and then reaches the destination following the unique available path. Through simulation on a 4-ary 4-tree with 256 nodes, we analyze some variants of the adaptive algorithm that utilize wormhole routing with 1, 2 and 4 virtual channels. The experimental results show that the uniform, bit reversal and transpose traffic patterns are very sensitive to the flow control strategy. In all these cases, the saturation points are between 35-40\% of the network capacity with 1 virtual channel, 55-60\% with 2 virtual channels and around 75\% with 4 virtual channels and after saturation, the network throughput remains stable. Increasing the number of virtual channels slightly worsens the average network latency, but improves the bounds on the distribution. The complement traffic, a representative of the class of the congestion-free communication patterns, reaches an optimal performance, with a saturation point at 97\% of the capacity for all flow control strategies. In this case virtual channels are of little help and the average network latency experiences an increase proportional to the number of virtual channels, due to the multiplexing of several packets onto the same physical path.