<aside> 💡 The basic intuition behind a decision tree is to map out all possible decision paths in the form of a tree.
</aside>
Entropy
Entropy measures the amount of disorder (impurity) of the samples in a node. For a node containing two classes A and B:

$$ \text{Entropy} = -p(A)\log_2 p(A) - p(B)\log_2 p(B) $$
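A minimal sketch of this calculation in Python (the `entropy` helper and its example labels are illustrative, not from any particular library):

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy (base 2) of a collection of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

# A 50/50 node is maximally disordered; a pure node has no disorder.
print(entropy(["A"] * 5 + ["B"] * 5))  # 1.0
print(entropy(["A"] * 10))             # 0.0
```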
Choose the split with the highest information gain.
Information gain is the increase in the level of certainty (i.e., the reduction in entropy) obtained by splitting a node.

$$ \text{Information Gain} = \text{Entropy}_{\text{before split}} - \sum_{k} \frac{n_k}{n}\,\text{Entropy}_k $$

where the entropy of each child node $k$ is weighted by the fraction $n_k/n$ of the parent's samples it receives.
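Building on the `entropy` helper above, a hedged sketch of the gain computation (the function name and the example split are made up for illustration):

```python
def information_gain(parent_labels, child_groups):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent_labels)
    weighted = sum(len(g) / n * entropy(g) for g in child_groups)
    return entropy(parent_labels) - weighted

# Splitting a 50/50 node into two pure children removes all disorder,
# so the gain equals the parent's full entropy of 1.0 bit.
parent = ["A"] * 4 + ["B"] * 4
print(information_gain(parent, [["A"] * 4, ["B"] * 4]))  # 1.0
```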