A Random Forest is an ensemble of Decision Trees, generally trained via the bagging method (or sometimes pasting), typically with max_samples set to the size of the training set. With a few exceptions, a Random Forest has all the hyperparameters of a DecisionTreeClassifier and a BaggingClassifier.
A Random Forest introduces extra randomness when growing trees: instead of searching for the very best feature when splitting a node, it searches for the best feature among a random subset of features. This results in greater tree diversity, trading a slightly higher bias for a lower variance, which generally yields a better model overall.
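In scikit-learn, this feature-subset randomness is controlled by the max_features hyperparameter. A minimal sketch (note that "sqrt" is already the default for RandomForestClassifier; it is written out here only for illustration):

```python
from sklearn.ensemble import RandomForestClassifier

# At each split, every tree searches for the best feature among a random
# subset of sqrt(n_features) features rather than among all features.
rnd_clf = RandomForestClassifier(max_features="sqrt")
```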
Importing and Splitting the data
For this post, we will use make_moons to generate our data. I will also split the data into training and test sets using train_test_split.
# Importing data
from sklearn.datasets import make_moons
X, y = make_moons(n_samples=1000, noise=0.30, random_state=42)
# Splitting the data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
RandomForestClassifier
You can find the documentation for RandomForestClassifier here.
A simple implementation of RandomForestClassifier:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
rnd_clf = RandomForestClassifier(
    n_estimators=500,
    max_leaf_nodes=16,
    n_jobs=-1
)
rnd_clf.fit(X_train, y_train)
y_pred = rnd_clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy of Random Forest: {accuracy*100:.1f}%')
Output:
Accuracy of Random Forest: 92.5%
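To see the lower variance in practice, one can compare the forest against a single, fully grown Decision Tree on the same split. A sketch (the random_state values are illustrative choices; exact scores will vary with them):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = make_moons(n_samples=1000, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# A single unregularized tree: low bias, but high variance.
tree_clf = DecisionTreeClassifier(random_state=42)
tree_clf.fit(X_train, y_train)

# An ensemble of 500 randomized trees averages that variance away.
rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16,
                                 n_jobs=-1, random_state=42)
rnd_clf.fit(X_train, y_train)

print(f'Decision Tree: {accuracy_score(y_test, tree_clf.predict(X_test)):.3f}')
print(f'Random Forest: {accuracy_score(y_test, rnd_clf.predict(X_test)):.3f}')
```

On this noisy dataset, the forest typically edges out the single tree on the test set.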
I have written separate posts on Decision Trees and Bagging Classifiers; please give those a read before this one, as a Random Forest is mostly a combination of the two.
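That combination can be sketched directly: a BaggingClassifier of Decision Trees that each consider a random feature subset at every split behaves roughly like a RandomForestClassifier. The hyperparameter values below mirror the example above and are illustrative, not the only sensible choices:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

X, y = make_moons(n_samples=1000, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Bagging 500 trees that each search a random subset of features
# (max_features="sqrt") at every split -- roughly a Random Forest.
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt", max_leaf_nodes=16),
    n_estimators=500, n_jobs=-1, random_state=42
)
bag_clf.fit(X_train, y_train)
print(bag_clf.score(X_test, y_test))
```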