Decision Tree
This task trains, tests, and saves a decision tree model for making predictions.
CONFIGURATION
OPTION | DESCRIPTION |
---|---|
File Name | The name of the file in which the decision tree model is saved. |
Column Selector | Selects the columns (variables) used as inputs when building the decision tree. |
Autosave | When true, the model will be saved upon each execution of the project. If false, it will not. |
Split Rule | Selects the criterion used to decide how nodes are split, for example GINI impurity. |
Max Nodes | Sets the maximum number of nodes in the resulting decision tree. Use a lower value for a simpler model and a higher value for a more complex one. Trees with many nodes risk over-fitting, while trees with too few may not model the problem accurately. |
Classify Column | The column we are trying to predict. |
Destination Column | The name of the column that receives the decision tree's predictions. |
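The Split Rule option picks the criterion used to score candidate splits; the example run further down uses GINI impurity. The following is a minimal sketch of how that measure is commonly computed, not the task's actual implementation. The function names `gini_impurity` and `split_impurity` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    total = len(labels)
    if total == 0:
        return 0.0  # an empty node is treated as pure in this sketch
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of the two children of a candidate split."""
    total = len(left) + len(right)
    if total == 0:
        return 0.0
    return (len(left) / total) * gini_impurity(left) + (
        len(right) / total
    ) * gini_impurity(right)
```

Under this criterion, a split is preferred when its weighted impurity is lower than that of the alternatives: a node holding a single class scores 0, and a node holding two classes in equal proportion scores 0.5.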
INPUT
Any dataset.
OUTPUT
- A decision tree model
- The decision tree performance
- A visual representation of the tree
- A new column added to the data, containing the prediction based on the model
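The performance output is reported as a percentage of correct predictions. As a rough sketch, assuming it is the share of rows where the predicted value matches the Classify Column, it could be computed as follows. The function name `accuracy` is hypothetical and not part of the task.

```python
def accuracy(actual, predicted):
    """Fraction of rows where the prediction equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)
```

Multiplying the result by 100 gives a percentage. Note that a score measured on the training data tends to be more optimistic than one measured on data the model has not seen.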
The following screenshot depicts the various outputs from running a prediction on the iris.csv dataset.
Here is a breakdown of the run:
- As indicated by the filename models/iris.dtree.mdl and the autosave setting of false, we are saving models manually.
- We are predicting the "Species" based upon the "Sepal Length", "Sepal Width", "Petal Length" and "Petal Width" features.
- We limit the number of nodes to 4, which yields a simple model.
- We are using GINI impurity to determine when we split nodes.
- The model achieves a performance rating of 97.3% correct predictions on the training data using only petal length and width in the calculation.
- The visual displays the logic of the model.
  - Flowers with a petal length <= 2.45 are classified as setosa.
  - Flowers with a petal length > 2.45 are classified as:
    - versicolor if their petal width is <= 1.75 and their petal length is <= 4.95
    - virginica if their petal width is <= 1.75 and their petal length is > 4.95
    - virginica if their petal width is > 1.75
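The logic of the tree described above can be written out as plain conditionals. This is a hand-written sketch of the rules as listed, not code generated by the task; the function name `classify_iris` and its parameter names are hypothetical.

```python
def classify_iris(petal_length, petal_width):
    """Apply the four-leaf decision tree from the iris example run."""
    if petal_length <= 2.45:
        return "setosa"
    if petal_width <= 1.75:
        # Narrow petals: petal length separates the two remaining species.
        if petal_length <= 4.95:
            return "versicolor"
        return "virginica"
    # Wide petals are classified as virginica regardless of length.
    return "virginica"
```

For example, `classify_iris(1.4, 0.2)` returns `"setosa"` and `classify_iris(4.5, 1.5)` returns `"versicolor"`. Sepal length and sepal width do not appear, which matches the observation that the model uses only the petal measurements.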
Such models are excellent for discovery and for imparting machine-generated insight to humans.