Decision Tree

Definition, implementation and pros and cons of decision tree models.

What is a decision tree?

A decision tree is a simple ‘supervised’ model for classification and regression.

Each internal node performs a Boolean test on an input feature.
Each leaf node specifies a value for the target feature.

How to select features (attributes) in decision tree models?

ID3 (Iterative Dichotomiser 3)

ID3 splits on attributes based on their ‘entropy’. Used for ‘classification’ tasks.

‘Entropy’ is minimized when all values of the target attribute are the same.
‘Entropy’ is maximized when there is an equal chance of all values for the target attribute.
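The two extremes above can be checked with a minimal sketch of Shannon entropy (the function name and label format here are illustrative, not from any particular library):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0 — all values the same
print(entropy(["yes", "yes", "no", "no"]))    # 1.0 — equal chance of each value
```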

Information Gain

The difference between the entropy of the parent node and the weighted entropy of its child nodes.

Information Gain = Entropy(parent) - [Weighted average entropy(children)]
At each step, choose the attribute with the maximal IG as the internal node.
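The formula above can be sketched directly; a perfect split separates the classes completely, so the IG equals the parent entropy (function names here are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """IG = entropy(parent) minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

# A perfect split removes all uncertainty, so IG equals the parent entropy:
print(information_gain([1, 1, 1, 0, 0, 0], [[1, 1, 1], [0, 0, 0]]))  # 1.0
```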

Pros and Cons

Only needs to test enough attributes to classify an example, but may overfit on small datasets
Classifying continuous data may be computationally expensive, since every candidate threshold must be evaluated

CART (Classification and Regression Trees)

Split nodes using ‘Gini Impurity’ for classification and ‘MSE’ for regression.
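Both CART criteria are short to write down; a minimal sketch (illustrative function names, not scikit-learn's API):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: chance of mislabeling a random sample drawn from the node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def mse(values):
    """For regression, CART scores a node by the mean squared error around its mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

print(gini([1, 1, 1, 1]))  # 0.0 — pure node
print(gini([1, 1, 0, 0]))  # 0.5 — maximal impurity for two classes
print(mse([1.0, 3.0]))     # 1.0
```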

C4.5

Used for classification. Split nodes using ‘Gain Ratio’, which normalizes Information Gain to address its bias toward attributes with many values.
Handles continuous attributes by dynamically creating threshold splits.
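The normalization can be sketched as follows: Split Information is the entropy of the partition sizes themselves, so a split that buys its purity with many branches is penalized (function names are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(parent, children):
    """Gain Ratio = Information Gain / Split Information."""
    n = len(parent)
    gain = entropy(parent) - sum(len(c) / n * entropy(c) for c in children)
    split_info = sum(-(len(c) / n) * math.log2(len(c) / n) for c in children)
    return gain / split_info

# The same perfect separation scores lower when achieved with more branches:
print(gain_ratio([1, 1, 0, 0], [[1, 1], [0, 0]]))      # 1.0
print(gain_ratio([1, 1, 0, 0], [[1], [1], [0], [0]]))  # 0.5
```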

Parameters in Decision Trees

Meta-Parameters

These are not set by the user; they are learned from the training data (e.g. which feature and threshold each internal node splits on).

Hyper-Parameters

  1. Depth
  2. Minimum number of samples required to form a leaf node
  3. Criterion to split on (entropy / Gini / MSE)
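A toy builder makes the role of these hyper-parameters concrete: `max_depth` and `min_samples_leaf` cap the tree's growth, while the split criterion (entropy here) scores candidate splits. This is a minimal illustrative sketch, not scikit-learn's implementation:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(X, y, depth=0, max_depth=2, min_samples_leaf=1):
    """Grow a binary tree on numeric features; hyper-parameters cap its growth."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority                      # leaf: predict the majority class
    best = None                              # (gain, feature, threshold, left, right)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [i for i in range(len(X)) if X[i][f] <= t]
            right = [i for i in range(len(X)) if X[i][f] > t]
            if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
                continue                     # split would violate the leaf-size limit
            child = (len(left) * entropy([y[i] for i in left])
                     + len(right) * entropy([y[i] for i in right])) / len(y)
            gain = entropy(y) - child        # split criterion: information gain
            if best is None or gain > best[0]:
                best = (gain, f, t, left, right)
    if best is None:
        return majority
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       depth + 1, max_depth, min_samples_leaf),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       depth + 1, max_depth, min_samples_leaf))

def predict(node, x):
    while isinstance(node, tuple):           # internal nodes are (feature, threshold, lo, hi)
        f, t, lo, hi = node
        node = lo if x[f] <= t else hi
    return node

X = [[1.0], [2.0], [3.0], [4.0]]
y = ["a", "a", "b", "b"]
tree = build_tree(X, y, max_depth=2, min_samples_leaf=1)
print(predict(tree, [1.5]), predict(tree, [3.5]))  # a b
```

Raising `min_samples_leaf` or lowering `max_depth` forces earlier stopping, trading training accuracy for less overfitting.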

Decision Boundary

  1. The decision boundaries of a decision tree are lines parallel to the coordinate axes, since each node tests a single feature against a threshold.
  2. Each split of a node contributes a new line to the decision boundary.
