Decision Tree Learning and Impurity

Jony · Feb 8, 2011

There are three common ways to measure the impurity of a node in a decision tree:

Entropy

Gini Index

Classification Error
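The question's own formulas are not shown here. This minimal Python sketch assumes the usual textbook definitions for a node whose class proportions are p_i: entropy = -Σ p_i log2 p_i (base 2 is an assumption; any base only rescales it), Gini = 1 - Σ p_i², and classification error = 1 - max p_i. The helper names are hypothetical.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return the proportion p_i of each class among the labels at a node."""
    counts = Counter(labels)
    n = len(labels)
    return [c / n for c in counts.values()]


def entropy(ps):
    """Entropy: -sum(p_i * log2(p_i)); terms with p_i == 0 are skipped (0 * log 0 := 0)."""
    return -sum(p * math.log2(p) for p in ps if p > 0)


def gini(ps):
    """Gini index: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in ps)


def classification_error(ps):
    """Classification (misclassification) error: 1 - max(p_i)."""
    return 1.0 - max(ps)


# A node holding three "a" samples and one "b" sample: p = [0.75, 0.25]
ps = class_proportions(["a", "a", "a", "b"])
print(round(entropy(ps), 4), gini(ps), classification_error(ps))  # 0.8113 0.375 0.25
```

All three are 0 for a pure node and largest when the classes are evenly mixed.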

What are the differences and appropriate use cases for each method?

Answer

David Weiser · Feb 8, 2011

If the p_i values are very small, multiplying very small numbers together (as the Gini index does) can lead to rounding error. For that reason, it can be better to add logarithms instead (as entropy does). Classification error, following your definition, gives only a coarse estimate, because its value depends on nothing but the single largest p_i.
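To make the "coarse estimate" point concrete, here is a small sketch with hypothetical class counts (40 samples of each class at the parent, two candidate splits A and B). It assumes the usual definitions (entropy in base 2, Gini = 1 - Σ p_i², error = 1 - max p_i) and scores each split by the size-weighted impurity of its children; the function names are illustrative only.

```python
import math


def proportions(counts):
    """Class proportions p_i from raw class counts at a node."""
    n = sum(counts)
    return [c / n for c in counts]


def entropy(counts):
    return -sum(p * math.log2(p) for p in proportions(counts) if p > 0)


def gini(counts):
    return 1.0 - sum(p * p for p in proportions(counts))


def classification_error(counts):
    # Depends only on the single largest proportion
    return 1.0 - max(proportions(counts))


def weighted_impurity(children, impurity):
    """Impurity of a split: each child's impurity weighted by its share of samples."""
    total = sum(sum(c) for c in children)
    return sum(sum(c) / total * impurity(c) for c in children)


# Hypothetical splits of a parent node with counts (40, 40)
split_a = [(30, 10), (10, 30)]
split_b = [(20, 40), (20, 0)]

for name, split in [("A", split_a), ("B", split_b)]:
    print(name,
          round(weighted_impurity(split, classification_error), 4),
          round(weighted_impurity(split, gini), 4),
          round(weighted_impurity(split, entropy), 4))
# A 0.25 0.375 0.8113
# B 0.25 0.3333 0.6887
```

Classification error scores both splits as 0.25, so it cannot tell them apart, while Gini and entropy both prefer split B, which produces a pure child node.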