Experiments

Baseline

For all experiments, we used five popular tree-based ensemble methods as baselines. Details on the baselines are listed in the following table:

| Name | Description |
| --- | --- |
| Random Forest | An efficient implementation of Random Forest in Scikit-Learn |
| HGBDT | Histogram-based GBDT in Scikit-Learn |
| XGBoost EXACT | The vanilla version of XGBoost |
| XGBoost HIST | The histogram-optimized version of XGBoost |
| LightGBM | Light Gradient Boosting Machine |

Environment

For all experiments, we used a single Linux server. Details on the specifications are listed in the table below. All processor cores were used for both training and evaluation.

| OS | CPU | Memory |
| --- | --- | --- |
| Ubuntu 18.04 LTS | Xeon E-2288G | 128GB |

Setting

We kept the number of decision trees the same across all baselines, while the remaining hyper-parameters were set to their default values. Scripts for reproducing all experiment results are available; please refer to this repo.
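To make the setup concrete, here is a minimal sketch of the shared configuration, assuming Python with scikit-learn, XGBoost, LightGBM, and the deep-forest package installed; `N_TREES = 100` is a placeholder, not necessarily the value used in the reported experiments.

```python
# Hedged sketch of the baseline setup: the tree count is shared across all
# methods, everything else stays at its default value.
from sklearn.ensemble import RandomForestClassifier, HistGradientBoostingClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from deepforest import CascadeForestClassifier

N_TREES = 100  # assumed placeholder, kept identical across baselines

baselines = {
    "RF": RandomForestClassifier(n_estimators=N_TREES, n_jobs=-1),
    "HGBDT": HistGradientBoostingClassifier(max_iter=N_TREES),  # max_iter = boosting iterations
    "XGB EXACT": XGBClassifier(n_estimators=N_TREES, tree_method="exact", n_jobs=-1),
    "XGB HIST": XGBClassifier(n_estimators=N_TREES, tree_method="hist", n_jobs=-1),
    "LightGBM": LGBMClassifier(n_estimators=N_TREES, n_jobs=-1),
    "Deep Forest": CascadeForestClassifier(n_trees=N_TREES, n_jobs=-1),  # trees per forest
}
```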

Classification

Dataset

We have collected a number of datasets for both binary and multi-class classification, as listed in the table below. They were selected based on the following criteria:

  • Publicly available and easy to use;

  • Cover different application areas;

  • Reflect high diversity in terms of the number of samples, features, and classes.

As a result, some baselines may fail on datasets with too many samples or features. Such cases are indicated by N/A in all tables below.
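Most of the datasets above are available in LIBSVM format. The snippet below is a minimal loading sketch; the file names are hypothetical placeholders for locally downloaded copies.

```python
from sklearn.datasets import load_svmlight_files

# Loading both splits together keeps the feature dimension consistent;
# the paths are hypothetical placeholders.
X_train, y_train, X_test, y_test = load_svmlight_files(
    ["ijcnn1.train", "ijcnn1.test"]
)
# Densify for estimators that do not accept sparse input.
X_train, X_test = X_train.toarray(), X_test.toarray()
```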

| Name | # Training | # Testing | # Features | # Classes |
| --- | --- | --- | --- | --- |
| ijcnn1 | 49,990 | 91,701 | 22 | 2 |
| pendigits | 7,494 | 3,498 | 16 | 10 |
| letter | 15,000 | 5,000 | 16 | 26 |
| connect-4 | 67,557 | 20,267 | 126 | 3 |
| sector | 6,412 | 3,207 | 55,197 | 105 |
| covtype | 406,708 | 174,304 | 54 | 7 |
| susy | 4,500,000 | 500,000 | 18 | 2 |
| higgs | 10,500,000 | 500,000 | 28 | 2 |
| usps | 7,291 | 2,007 | 256 | 10 |
| mnist | 60,000 | 10,000 | 784 | 10 |
| fashion mnist | 60,000 | 10,000 | 784 | 10 |

Classification Accuracy

The table below shows the testing accuracy of each method, with the best result on each dataset bolded. Each experiment was conducted over 5 independent trials, and the average result is reported.

| Name | RF | HGBDT | XGB EXACT | XGB HIST | LightGBM | Deep Forest |
| --- | --- | --- | --- | --- | --- | --- |
| ijcnn1 | 98.07 | 98.43 | 98.20 | 98.23 | **98.61** | 98.16 |
| pendigits | 96.54 | 96.34 | 96.60 | 96.60 | 96.17 | **97.50** |
| letter | 95.39 | 91.56 | 90.80 | 90.82 | 88.94 | **95.92** |
| connect-4 | 70.18 | 70.88 | 71.57 | 71.57 | 70.31 | **72.05** |
| sector | 85.62 | N/A | 66.01 | 65.61 | 63.24 | **86.74** |
| covtype | 73.73 | 64.22 | 66.15 | 66.70 | 65.00 | **74.27** |
| susy | 80.19 | 80.31 | 80.32 | **80.35** | 80.33 | 80.18 |
| higgs | N/A | 74.95 | 75.85 | 76.00 | 74.97 | **76.46** |
| usps | 93.79 | 94.32 | 93.77 | 93.37 | 93.97 | **94.67** |
| mnist | 97.20 | 98.35 | 98.07 | 98.14 | **98.42** | 98.11 |
| fashion mnist | 87.87 | 87.02 | 90.74 | 90.80 | **90.81** | 89.66 |
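The following is a minimal sketch of the trial-averaging protocol described above; `make_model` is a hypothetical factory that returns a freshly configured estimator for each trial.

```python
import numpy as np
from sklearn.metrics import accuracy_score

def mean_test_accuracy(make_model, X_train, y_train, X_test, y_test, n_trials=5):
    """Average testing accuracy (in percent) over independent trials."""
    scores = []
    for seed in range(n_trials):
        model = make_model(random_state=seed)  # fresh estimator per trial
        model.fit(X_train, y_train)
        scores.append(accuracy_score(y_test, model.predict(X_test)))
    return 100.0 * np.mean(scores)
```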

Runtime

Runtimes in seconds reported in the table below cover both the training stage and the evaluating stage; a sketch of this measurement follows the observations below.

| Name | RF | HGBDT | XGB EXACT | XGB HIST | LightGBM | Deep Forest |
| --- | --- | --- | --- | --- | --- | --- |
| ijcnn1 | 9.60 | 6.84 | 11.24 | 1.90 | 1.99 | 8.37 |
| pendigits | 1.26 | 5.12 | 0.39 | 0.26 | 0.46 | 2.21 |
| letter | 0.76 | 1.30 | 0.34 | 0.17 | 0.19 | 2.84 |
| connect-4 | 5.17 | 7.54 | 13.26 | 3.19 | 1.12 | 10.73 |
| sector | 292.15 | N/A | 632.27 | 593.35 | 18.83 | 521.68 |
| covtype | 84.00 | 2.56 | 58.43 | 11.62 | 3.96 | 164.18 |
| susy | 1429.85 | 59.09 | 1051.54 | 44.85 | 34.40 | 1866.48 |
| higgs | N/A | 523.74 | 7532.70 | 267.64 | 209.65 | 7307.44 |
| usps | 9.28 | 8.73 | 9.43 | 5.78 | 9.81 | 6.08 |
| mnist | 590.81 | 229.91 | 1156.64 | 762.40 | 233.94 | 599.55 |
| fashion mnist | 735.47 | 32.86 | 1403.44 | 2061.80 | 428.37 | 661.05 |

Some observations are listed as follows:

  • Histogram-based GBDT implementations (e.g., HGBDT, XGB HIST, LightGBM) are typically faster, mainly because the decision trees in GBDT tend to be much shallower;

  • As the number of input dimensions increases (e.g., on mnist and fashion mnist), random forest and deep forest can be faster.
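As noted above, each reported runtime is a single wall-clock interval covering fitting plus prediction. A minimal measurement sketch:

```python
import time

def train_eval_runtime(model, X_train, y_train, X_test, y_test):
    """Elapsed seconds covering the training stage and the evaluating stage."""
    tic = time.perf_counter()
    model.fit(X_train, y_train)  # training stage
    model.predict(X_test)        # evaluating stage
    return time.perf_counter() - tic
```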

Regression

Dataset

We have also collected five datasets on univariate regression for a comparison on the regression problem, as listed in the table below.

| Name | # Training | # Testing | # Features |
| --- | --- | --- | --- |
| wine | 1,071 | 528 | 11 |
| abalone | 2,799 | 1,378 | 8 |
| cpusmall | 5,489 | 2,703 | 12 |
| boston | 379 | 127 | 13 |
| diabetes | 303 | 139 | 10 |

Testing Mean Squared Error

The table below shows the testing mean squared error of each method, with the best (lowest) result on each dataset bolded. Each experiment was conducted over 5 independent trials, and the average result is reported.

| Name | RF | HGBDT | XGB EXACT | XGB HIST | LightGBM | Deep Forest |
| --- | --- | --- | --- | --- | --- | --- |
| wine | 0.35 | 0.40 | 0.41 | 0.41 | 0.39 | **0.34** |
| abalone | 4.79 | 5.40 | 5.73 | 5.75 | 5.60 | **4.66** |
| cpusmall | 8.31 | 9.01 | 9.86 | 11.82 | 8.99 | **7.15** |
| boston | **16.61** | 20.68 | 20.61 | 19.65 | 20.27 | 19.87 |
| diabetes | 3796.62 | 4333.66 | 4337.15 | 4303.96 | 4435.95 | **3431.01** |
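The regression protocol mirrors the classification one. As an illustration only, the sketch below runs a single trial with the deep-forest regressor at its default settings; repeat over five seeds and average to match the protocol above.

```python
from deepforest import CascadeForestRegressor
from sklearn.metrics import mean_squared_error

# Single regression trial; constructor arguments are defaults, not the
# exact experiment settings.
model = CascadeForestRegressor(n_jobs=-1)
model.fit(X_train, y_train)
mse = mean_squared_error(y_test, model.predict(X_test))
print(f"testing MSE: {mse:.2f}")
```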

Runtime

Runtimes in seconds reported in the table below cover both the training stage and the evaluating stage.

| Name | RF | HGBDT | XGB EXACT | XGB HIST | LightGBM | Deep Forest |
| --- | --- | --- | --- | --- | --- | --- |
| wine | 0.76 | 2.88 | 0.30 | 0.30 | 0.30 | 1.26 |
| abalone | 0.53 | 1.57 | 0.47 | 0.50 | 0.17 | 1.29 |
| cpusmall | 1.87 | 3.59 | 1.71 | 1.25 | 0.36 | 2.06 |
| boston | 0.70 | 1.75 | 0.19 | 0.22 | 0.20 | 1.45 |
| diabetes | 0.37 | 0.66 | 0.14 | 0.18 | 0.06 | 1.09 |