Efficient C4.5

We present an analytic evaluation of the run-time behavior of the C4.5 algorithm which highlights some efficiency improvements. We have implemented a more efficient version of the algorithm, called EC4.5, that improves on C4.5 by adopting the best among three strategies at each node construction. The first strategy uses a binary search of thresholds instead of the linear search of C4.5. The second strategy adopts a counting sort method instead of the quicksort of C4.5. The third strategy uses a main-memory version of the RainForest algorithm for constructing decision trees. Our implementation computes the same decision trees as C4.5 with a performance gain of up to 5 times.