Efficient C4.5
We present an analytic evaluation of the run-time behavior of the
C4.5 algorithm that highlights several possible efficiency improvements. We
have implemented a more efficient version of the algorithm, called
EC4.5, which improves on C4.5 by adopting the best among three
strategies at each node construction. The first strategy uses a
binary search of thresholds instead of the linear search of
C4.5. The second strategy adopts a counting sort method
instead of the quicksort of C4.5. The third strategy uses a
main-memory version of the RainForest algorithm for constructing
decision trees. Our implementation computes the same decision trees
as C4.5 with a speedup of up to five times.
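
The first two strategies can be illustrated with a minimal Python sketch. It is not the EC4.5 implementation. It assumes that attribute values are integer-coded within a known range [lo, hi], and that the threshold search means finding the largest value in a sorted array that does not exceed a candidate cut point. The function names counting_sort, threshold_linear and threshold_binary are hypothetical.

```python
from bisect import bisect_right


def counting_sort(values, lo, hi):
    """Sort integers known to lie in [lo, hi] in O(n + (hi - lo)) time.

    Assumption: values are integer-coded; this avoids the O(n log n)
    comparisons of a general-purpose sort such as quicksort.
    """
    counts = [0] * (hi - lo + 1)
    for v in values:
        counts[v - lo] += 1
    out = []
    for offset, count in enumerate(counts):
        out.extend([lo + offset] * count)
    return out


def threshold_linear(sorted_values, cut):
    """Largest value <= cut, found by scanning: O(n)."""
    best = None
    for v in sorted_values:
        if v > cut:
            break
        best = v
    return best


def threshold_binary(sorted_values, cut):
    """Largest value <= cut, found by binary search: O(log n)."""
    i = bisect_right(sorted_values, cut)
    return sorted_values[i - 1] if i > 0 else None


if __name__ == "__main__":
    data = counting_sort([3, 1, 2, 1, 5], lo=1, hi=5)
    print(data)
    print(threshold_linear(data, 2.5), threshold_binary(data, 2.5))
```

Both threshold functions return the same value for any sorted input, so swapping the linear scan for the binary search changes only the running time, not the resulting split.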