How to Get Started

This is a quick start guide for trying out deep forest. The full script is available in the Example section below.

Installation

The package is available via PyPI. Note the hyphen (-) between deep and forest in the package name:

$ pip install deep-forest
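The name you install from PyPI (deep-forest, with a hyphen) differs from the module name you import in code (deepforest). A minimal sketch for checking the installation from Python, using only the standard library; the helper name installed_version is hypothetical and not part of deep forest:

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "deep-forest") -> Optional[str]:
    """Return the installed version of a PyPI distribution, or None if absent.

    The distribution name ("deep-forest", with a hyphen) is what pip installs;
    the module imported in code is "deepforest".
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("deep-forest is not installed; run: pip install deep-forest")
    else:
        print(f"deep-forest {version} is installed")
```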

Load Data

Deep forest expects the input data to be a 2D NumPy array of shape (n_samples, n_features), and it performs internal checks and transformations on the input. For example, the code snippet below loads a toy dataset for digit classification and splits it into training and testing sets:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
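load_digits(return_X_y=True) already returns X as a 2D array, with each 8x8 image flattened into 64 features. If your own data arrives in another shape, such as a stack of images or a single feature column, it needs to be reshaped to (n_samples, n_features) first. A minimal NumPy sketch; the helper to_2d is hypothetical and not part of deep forest:

```python
import numpy as np


def to_2d(X):
    """Reshape input into a 2D array of shape (n_samples, n_features).

    - 1D input is treated as a single feature column: (n,) -> (n, 1)
    - Higher-dimensional input (e.g. images of shape (n, h, w)) is
      flattened per sample: (n, h, w) -> (n, h * w)
    """
    X = np.asarray(X)
    if X.ndim == 0:
        raise ValueError("Expected at least one sample, got a scalar.")
    if X.ndim == 1:
        return X.reshape(-1, 1)
    return X.reshape(X.shape[0], -1)


# A stack of 10 grayscale 8x8 images becomes a (10, 64) feature matrix
images = np.zeros((10, 8, 8))
print(to_2d(images).shape)  # (10, 64)
```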

Define the Model

Deep forest provides a unified API for binary and multi-class classification. For the demo classification dataset, the corresponding model is CascadeForestClassifier:

from deepforest import CascadeForestClassifier

model = CascadeForestClassifier()

A key advantage of deep forest is that its model complexity adapts to the dataset. The default hyper-parameter settings are intended to perform reasonably well across a wide range of datasets. Please refer to the API Reference for the meaning of the different input arguments.
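One way to picture adaptive model complexity: a cascade keeps adding layers while a validation score improves, and stops once the score has failed to improve for a few consecutive layers. The sketch below applies that early-stopping rule to a list of per-layer scores. It is a conceptual illustration, not deep forest's internal code, and the argument names max_layers, n_tolerant_rounds and delta are assumptions here; check the API Reference for the actual arguments.

```python
def select_n_layers(layer_scores, max_layers=20, n_tolerant_rounds=2, delta=1e-5):
    """Return how many cascade layers to keep under a simple early-stopping rule.

    layer_scores[i] is a validation score (higher is better) obtained after
    adding layer i + 1. Growth stops when the score has not improved by more
    than `delta` for `n_tolerant_rounds` consecutive layers, or when
    `max_layers` is reached. The best-scoring prefix of layers is kept.
    """
    if not layer_scores:
        raise ValueError("Need at least one layer score.")
    best_score = float("-inf")
    best_n_layers = 0
    rounds_without_improvement = 0
    for i, score in enumerate(layer_scores[:max_layers]):
        if score > best_score + delta:
            # Meaningful improvement: remember this depth and reset the counter
            best_score = score
            best_n_layers = i + 1
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
            if rounds_without_improvement >= n_tolerant_rounds:
                break
    return best_n_layers


# Scores stall after layer 3, so growth stops before ever reaching layer 6
print(select_n_layers([0.90, 0.93, 0.94, 0.94, 0.939, 0.95]))  # 3
```

The design trade-off is that a small tolerance keeps training cheap on easy datasets, at the risk of stopping before a later layer would have improved the score (as with the 0.95 above).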

Train and Evaluate

Deep forest provides Scikit-Learn-like APIs for training and evaluation. Given the training data X_train and labels y_train, training is triggered with the following code snippet:

model.fit(X_train, y_train)
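Because training and prediction follow the scikit-learn convention of fit(X, y) followed by predict(X), evaluation code can be written once against that interface. The sketch below shows the contract with a hypothetical toy MajorityClassifier standing in for a real estimator; assuming CascadeForestClassifier follows the same convention, as the snippets in this guide suggest, it could be passed to the same hypothetical fit_and_predict helper.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical toy estimator: always predicts the most frequent label."""

    def fit(self, X, y):
        # Learn the single most common label from the training targets
        self.majority_label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # One prediction per input sample
        return [self.majority_label_ for _ in X]


def fit_and_predict(model, X_train, y_train, X_test):
    """Train any estimator exposing fit/predict and return test predictions."""
    model.fit(X_train, y_train)
    return model.predict(X_test)


X_train = [[0], [1], [2], [3]]
y_train = [1, 1, 0, 1]
print(fit_and_predict(MajorityClassifier(), X_train, y_train, [[5], [6]]))  # [1, 1]
```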

Once the model is trained, you can call predict() to produce predictions on the testing data X_test and then measure the classification accuracy:

from sklearn.metrics import accuracy_score

y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred) * 100  # classification accuracy
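accuracy_score returns the fraction of test samples whose predicted label matches the true label, so multiplying by 100 gives a percentage. The same computation in plain Python, as a sketch for illustration (the helper name accuracy_percent is hypothetical):

```python
def accuracy_percent(y_true, y_pred):
    """Percentage of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length.")
    if len(y_true) == 0:
        raise ValueError("Cannot compute accuracy on an empty set.")
    # Count matching (true, predicted) pairs
    n_correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * n_correct / len(y_true)


print(accuracy_percent([3, 1, 4, 1], [3, 1, 0, 1]))  # 75.0
```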

Save and Load

Deep forest also provides easy-to-use APIs for model serialization. Here, MODEL_DIR is the directory where the model is saved.

MODEL_DIR = "model"  # directory to save the model, as in the Example below
model.save(MODEL_DIR)

With the model saved, you can call load() on a new CascadeForestClassifier to use it for prediction:

new_model = CascadeForestClassifier()
new_model.load(MODEL_DIR)

Note that new_model is not identical to model, because only the key information used for model inference is saved.
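The idea of saving only the information needed for inference can be illustrated with a toy model: persist the fitted parameters that predict() uses and drop training-only state. This is a hypothetical sketch using the standard library's json module; the ThresholdModel class and the params.json file name are assumptions, and it does not reflect deep forest's actual on-disk format.

```python
import json
import os
import tempfile


class ThresholdModel:
    """Hypothetical toy model: predicts 1 when the feature exceeds a threshold."""

    def fit(self, X, y):
        # Training-only state: kept in memory but not needed for prediction
        self.training_history_ = list(zip(X, y))
        # Inference state: midpoint between the mean feature of each class
        pos = [x for x, label in zip(X, y) if label == 1]
        neg = [x for x, label in zip(X, y) if label == 0]
        self.threshold_ = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def predict(self, X):
        return [1 if x > self.threshold_ else 0 for x in X]

    def save(self, dirname):
        # Persist only what predict() needs
        os.makedirs(dirname, exist_ok=True)
        with open(os.path.join(dirname, "params.json"), "w") as f:
            json.dump({"threshold": self.threshold_}, f)

    def load(self, dirname):
        # Restore inference state into a fresh instance
        with open(os.path.join(dirname, "params.json")) as f:
            self.threshold_ = json.load(f)["threshold"]


model = ThresholdModel().fit([1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1])
with tempfile.TemporaryDirectory() as model_dir:
    model.save(model_dir)
    new_model = ThresholdModel()
    new_model.load(model_dir)

print(new_model.predict([0.5, 7.0]) == model.predict([0.5, 7.0]))  # True
print(hasattr(new_model, "training_history_"))  # False
```

The restored object predicts the same way as the original, yet it is a different object without the training-only attributes.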

Example

Below is the full script that uses deep forest for classification on the demo dataset.

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

from deepforest import CascadeForestClassifier


# Load data
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Define the model
model = CascadeForestClassifier()

# Train and evaluate
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred) * 100
print("\nTesting Accuracy: {:.3f} %".format(acc))

# Save the model
model.save("model")