MS defense: Sawhney on Analyzing the Growth of...
MS Thesis Defense
Analyzing the Growth of Hoeffding Trees
Mayank Sawhney
12:00-1:30pm Thursday 1 December 2011, ITE 346
Mining high speed data streams has become a necessity because of the enormous growth in the volume of electronic data. In the past decade, researchers have suggested various models for learning in both stationary and concept drifting data streams. Hoeffding Trees (Domingos & Hulten 2000) are one such model for mining stationary data streams. Several modifications of the nave Hoeffding Tree algorithm have been proposed to study data streams.
Our work analyzes the behavior of Hoeffding Trees when they are trained on infinite and experiments, we show that the Hoeffding bound suffers from an inherent shortcoming. Even after reaching a stage where accuracy asymptotes, Hoeffding Trees continue to grow. We examine this behavior in data streams with both nominal and numeric attributes. We also study enhancements made to the naive Hoeffding Tree algorithm and also evaluate different discretization methods.
In our work, we analyze how the Hoeffding bound relates to the information gain when splits are made and also when we send a random distribution as a data stream. We conclude that this behavior is a result of decisions made for the early growth of Hoeffding Trees and the induced randomness in an online setting. We also argue that the presence of this behavior will impact the use of Hoeffding algorithms in real world online applications.
Committee Members
- Dr. Tim Oates (Chair)
- Dr. Tim Finin
- Dr. Kostas Kalpakis
Posted: November 29, 2011, 8:33 AM