Could you please explain?
The choice depends on the type of decision tree, and so does the choice of the separation condition.
In the case of a binary variable, there is only one possible separation, whereas a continuous variable with n distinct values has n-1 possible separations.
The separation condition is as follows:
X <= mean(xk, xk+1), where xk and xk+1 are two consecutive values of X in sorted order
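As a minimal sketch of the condition above, the candidate thresholds for a continuous variable can be listed by taking the midpoint of each pair of consecutive distinct values. The function names `candidate_thresholds` and `split` and the sample `ages` data are hypothetical and chosen only for illustration.

```python
def candidate_thresholds(values):
    """Return the midpoints between consecutive distinct sorted values.

    A variable with n distinct values yields n - 1 candidate thresholds.
    """
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def split(values, threshold):
    """Apply the condition X <= threshold and return (left, right)."""
    left = [v for v in values if v <= threshold]
    right = [v for v in values if v > threshold]
    return left, right


# Hypothetical sample: 4 distinct values, so 3 candidate thresholds.
ages = [22, 35, 35, 41, 58]
thresholds = candidate_thresholds(ages)
left, right = split(ages, thresholds[1])
```

Each candidate threshold would then be scored with one of the separation criteria, and the best-scoring one kept.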
After finding the best separation, the operation is repeated to increase discrimination among the nodes.
The density of a node is the ratio of the number of individuals in the node to the entire population.
After finding the best separation, the classes are split into child nodes, and a variable is derived from this step. The best separation is chosen with one of the following criteria:
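The X2 Test – This tests the independence of variables X and Y. X and Y are independent only if, for every cell (i, j) of their contingency table:
Oij = Tij
Oij, the observed count in cell (i, j), is the left-hand side of the equality, and Tij, the count expected under independence (total of row i * total of column j / total count), is the right-hand side. The test statistic for the independence of X and Y is:
X2 = sum over all i and j of (Oij – Tij)^2 / Tij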
The number of degrees of freedom of the test is calculated as:
p = (no. of rows – 1) * (no. of columns – 1)
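The following sketch computes the X2 statistic and its degrees of freedom for a contingency table. It assumes the standard definitions: Oij is the observed count in cell (i, j), and Tij is the expected count under independence, taken as (total of row i * total of column j) / total count. The function name `chi_square` and the sample tables are hypothetical, and every row and column total is assumed to be positive.

```python
def chi_square(observed):
    """Return (statistic, degrees of freedom) for a contingency table.

    `observed` is a list of rows, each row a list of counts Oij.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            # Expected count if X and Y were independent.
            t_ij = row_totals[i] * col_totals[j] / n
            statistic += (o_ij - t_ij) ** 2 / t_ij

    # p = (no. of rows - 1) * (no. of columns - 1)
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof


# Hypothetical 2x2 tables: one with dependence, one perfectly independent.
dependent_stat, dependent_dof = chi_square([[30, 10], [20, 40]])
independent_stat, independent_dof = chi_square([[10, 20], [20, 40]])
```

A larger statistic for the same degrees of freedom indicates a stronger departure from independence, which is what makes it usable as a separation criterion.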
The Gini Index – This index measures the purity of nodes. It is used for all types of dependent variables, and we calculate it as follows:
Gini index = 1 – (f1^2 + f2^2 + ... + fp^2)
In the preceding formula, fi, i = 1, ..., p, are the relative frequencies in the node of the p classes that we need to predict.
The more evenly the classes are distributed in a node, the higher the Gini index. Conversely, as the purity of the node increases, the Gini index decreases, reaching 0 when the node contains a single class.
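To make this behaviour concrete, here is a minimal sketch that assumes the usual definition of the Gini index of a node, 1 minus the sum of the squared relative class frequencies fi. The function name `gini_index` and the sample labels are hypothetical, and the node is assumed to be non-empty.

```python
from collections import Counter


def gini_index(labels):
    """Return 1 - sum(fi^2) over the classes present in the node."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


# Hypothetical nodes, from perfectly pure to evenly mixed.
pure = gini_index(["yes", "yes", "yes", "yes"])
mostly_pure = gini_index(["yes", "yes", "yes", "no"])
even = gini_index(["yes", "yes", "no", "no"])
```

In this sketch the pure node scores lowest and the evenly mixed node scores highest, matching the relationship between purity and the Gini index described above.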