Gini Impurity is a measure used in decision trees to assess the quality of a split at each node. It quantifies the probability that a randomly chosen element would be misclassified if it were labeled at random according to the distribution of labels in the subset. A value of 0 indicates that all elements belong to a single class (perfect purity). For n classes, the maximum is 1 - 1/n, reached when the classes are uniformly distributed: 0.5 for two classes, approaching (but never reaching) 1 as the number of classes grows.
Mathematically, Gini Impurity can be calculated using the formula:
$$\text{Gini}(D) = 1 - \sum_{i=1}^{n} p_i^2$$
where $p_i$ is the proportion of instances labeled with class $i$ in dataset $D$, and $n$ is the total number of classes. A lower Gini Impurity value means a purer node. When training a decision tree, the algorithm therefore evaluates candidate splits and typically chooses the one that minimizes the weighted average Gini Impurity of the resulting child nodes, with each child weighted by its share of the instances. Purer child nodes generally lead to more accurate classification on the training data.
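As a rough illustration of the calculation described above, the following minimal Python sketch computes Gini Impurity for a list of class labels and the weighted impurity of a two-way split. The function names `gini_impurity` and `weighted_split_impurity` are hypothetical, chosen only for this example, and treating an empty node as pure is an assumed convention rather than part of the definition.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_i ** 2) over the classes present in `labels`."""
    total = len(labels)
    if total == 0:
        # Assumed convention for this sketch: an empty node counts as pure.
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def weighted_split_impurity(left, right):
    """Average the impurity of two child nodes, weighted by their sizes."""
    total = len(left) + len(right)
    if total == 0:
        return 0.0
    return (len(left) / total) * gini_impurity(left) + (
        len(right) / total
    ) * gini_impurity(right)


if __name__ == "__main__":
    parent = ["a", "a", "b", "b"]
    print(gini_impurity(parent))  # 0.5, the two-class maximum

    # A split that separates the classes perfectly has zero impurity.
    print(weighted_split_impurity(["a", "a"], ["b", "b"]))  # 0.0

    # A split that leaves both children mixed is no better than the parent.
    print(weighted_split_impurity(["a", "b"], ["a", "b"]))  # 0.5
```

In this toy example, a tree learner comparing the two candidate splits would prefer the first, since it yields the lower weighted impurity.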