Below are a few notes on decision trees. I still have a lot to learn, but this will be my working document.

The aim with any decision tree is to create a workable model that predicts the value of a target variable from a set of input variables.

Uses

Financial Industry

One of the fundamental use cases is option pricing, where a binary tree (each node branches into an upward or a downward move in price) is used to estimate the value of an option in a rising or falling market.

Marketing

Marketers use decision trees to segment customers by type and to predict whether a customer will buy a specific type of product.

Medical

In the medical field, decision tree models have been designed to diagnose blood infections or even predict heart attack outcomes in chest pain patients. Variables in the decision tree include diagnosis, treatment, and patient data.

Gaming

The gaming industry now uses multiple decision trees in movement recognition and facial recognition.

Positives

Decision trees are easy to read. After a model is generated, it's easy to explain to others how the tree reaches its predictions. Decision trees can also handle both numerical and categorical data.

Negatives

One of the main issues with decision trees is that they can create overly complex models that fit the training set too closely (overfitting), depending on the data presented in the training set.

Types of Algorithms

ID3 - The ID3 (Iterative Dichotomiser 3) algorithm was invented by Ross Quinlan to create trees from datasets. It calculates the entropy for every attribute in the dataset, then splits the dataset into subsets on the attribute that gives the minimum entropy (equivalently, the highest information gain).
C4.5
CHAID (Chi-squared Automatic Interaction Detection)
MARS (multivariate adaptive regression splines) algorithm

Creating a Decision Tree

Most decision tree algorithms are built around the same basic procedure:

1. Check the data for the base cases (for example, all samples belong to one class, or there are no attributes left to split on).
2. Iterate through all the attributes (attr) and get the normalized information gain from splitting on each attr.
3. Let best_attr be the attribute with the highest normalized information gain.
4. Create a decision node that splits on best_attr.
5. Recurse on the sublists obtained by splitting on best_attr, and add the resulting nodes as children of the decision node.
That's the basic outline of what happens when you build a decision tree.
Depending on the algorithm type, like the ones previously mentioned, there might be subtle differences in the way things are done.
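
To make the outline more concrete, here is a minimal sketch of it in Python. It is not any particular library's implementation. It assumes categorical attributes only, rows stored as plain dicts, and "normalized information gain" taken to mean the gain ratio used by C4.5 (information gain divided by the entropy of the split itself). The function names (entropy, partition, gain_ratio, build_tree, predict) and the toy weather data are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a list of observed values."""
    total = len(values)
    return sum((n / total) * log2(total / n) for n in Counter(values).values())


def partition(rows, attr):
    """Group rows (dicts) by the value they hold for attr."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row)
    return groups


def gain_ratio(rows, attr, target):
    """Information gain from splitting on attr, normalized by split entropy."""
    total = len(rows)
    remainder = sum(
        len(group) / total * entropy([r[target] for r in group])
        for group in partition(rows, attr).values()
    )
    gain = entropy([r[target] for r in rows]) - remainder
    split_info = entropy([r[attr] for r in rows])
    return gain / split_info if split_info > 0 else 0.0


def build_tree(rows, attrs, target):
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Base cases: every row has the same label, or nothing is left to split on.
    if len(set(labels)) == 1 or not attrs:
        return majority
    # Pick the attribute with the highest normalized information gain.
    best_attr = max(attrs, key=lambda a: gain_ratio(rows, a, target))
    if gain_ratio(rows, best_attr, target) == 0:
        return majority  # no attribute separates the labels any further
    remaining = [a for a in attrs if a != best_attr]
    # Decision node: one child per observed value of best_attr, built recursively.
    return {
        best_attr: {
            value: build_tree(subset, remaining, target)
            for value, subset in partition(rows, best_attr).items()
        }
    }


def predict(tree, row):
    """Walk the nested dicts until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]
    return tree


# Hypothetical toy data: "outlook" fully determines "play", "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
tree = build_tree(rows, ["outlook", "windy"], "play")
print(tree)
print(predict(tree, {"outlook": "rainy", "windy": "no"}))
```

On this data the tree splits once, on outlook, because splitting on windy leaves each subset just as mixed as before. The sketch leaves out things a real algorithm handles, such as numerical attributes, missing values, pruning, and attribute values that never appeared in training (predict would raise a KeyError for those).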

Calculating Entropy

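Entropy is a measure of uncertainty and is measured in bits. For a variable with two possible outcomes it is a number between zero and 1; with n possible outcomes it can be as high as log2(n). These are information-theoretic bits, a unit of information, rather than the bits used to count storage in computing. Basically, you are measuring the unpredictability of a random variable: zero means the outcome is certain, and higher values mean it is harder to predict.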
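
As a rough illustration, the usual Shannon formula is H = -sum(p * log2(p)) over the probability p of each outcome. The sketch below estimates those probabilities from counts in a list of observed values. The function name and the sample lists are made up for the example.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, estimated from a list of observed values."""
    total = len(values)
    # p * log2(1 / p) is the same as -p * log2(p), with p = n / total.
    return sum((n / total) * log2(total / n) for n in Counter(values).values())


fair_coin = ["H", "T"]            # two equally likely outcomes
certain = ["H", "H", "H", "H"]    # only one outcome ever occurs
biased = ["H", "H", "H", "T"]     # one outcome is three times as likely
four_way = ["a", "b", "c", "d"]   # four equally likely outcomes

print(entropy(fair_coin))  # 1.0
print(entropy(certain))    # 0.0
print(entropy(biased))     # about 0.811
print(entropy(four_way))   # 2.0
```

The fair coin is the most unpredictable two-outcome case, so it scores a full bit, and the biased list scores less because one outcome is easier to guess. The four-way case shows that the value can exceed 1 once there are more than two possible outcomes.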