[Ensemble] Gradient Boosting
1. Gradient Boosting
- AdaBoost raises the weights of misclassified observations ([Ensemble] Intro to Boosting & AdaBoost - Data Science (whatsdata.github.io)).
- In contrast, Gradient Boosting learns the errors left over from the previous stage.
1.1. Idea
- $Y = h_1 (x) + e_1$
- $e_1 = h_2(x) + e_2$
- $e_2 = h_3(x) + e_3$
- $Y = h_1(x) + h_2(x) + h_3(x) + e_3$
  $\vdots$
- $\hat{Y} = w_1 h_1(x) + w_2 h_2(x) + w_3 h_3(x) + \cdots + w_m h_m (x)$
- Assuming trees are used as base learners: tree 1 makes a prediction, tree 2 is fit to the residuals left by tree 1, tree 3 to the residuals of tree 2, and so on, and combining them builds up a strong learner. A minimal sketch of this idea follows.
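The sketch below is only an illustration of the residual-fitting idea, assuming a toy regression target, three shallow DecisionTreeRegressor stages, and equal weights $w_m = 1$; it is not the full algorithm of Section 1.3.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (assumed for illustration only)
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)

prediction = np.zeros_like(y)   # running sum h_1(x) + h_2(x) + ...
residual = y.copy()             # e_0 = Y

for m in range(3):              # three stages, as in the equations above
    h_m = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    prediction += h_m.predict(X)   # Y_hat accumulates the new tree
    residual = y - prediction      # e_m shrinks at every stage
    print(f"stage {m+1}: mean squared residual = {np.mean(residual**2):.4f}")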
1.2. Why Gradient?
- The method is called Gradient Boosting because, when the loss function is the squared error, negative gradient = residual.
- Suppose the loss is \(L(f) = \sum_{i=1}^n L(y_i, f(x_i))\).
- Computing the gradient with respect to each prediction, in the squared-error case \(L(y_i, f(x_i)) = \frac{1}{2}(y_i - f(x_i))^2\), gives \(\frac{\partial L}{\partial f(x_i)} = -(y_i - f(x_i))\).
- That is, the gradient is the negative of the residual, which is why the residual is fitted. If the loss function changes, the residual is no longer used and some other function takes its place (for example, with the absolute-error loss the negative gradient is the sign of the residual).
- Example) Classification $(y_i \in \{0,1\})$: see the sketch below for the pseudo-residual used in the code of Section 2.
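A minimal sketch of why $y_i - p_i$ appears as the residual in the code of Section 2, assuming the log loss and a model $F$ that outputs log-odds (assumptions not spelled out in the post itself):

- With $p_i = \frac{1}{1 + e^{-F(x_i)}}$ and \(L(y_i, F(x_i)) = -\left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]\),
- \(\frac{\partial L}{\partial F(x_i)} = p_i - y_i\), so the negative gradient (the pseudo-residual) is $y_i - p_i$.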
1.3. Algorithm
Following Wikipedia:
1. Initialise model with a constant value
- $F_0 (x) = \underset{\gamma}{\arg\min} \sum_{i=1}^n L(y_i, \gamma)$
2. For m = 1 to M:
- Compute the so-called pseudo-residuals (which are the ordinary residuals in the squared-error regression case): \(r_{im} = -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F = F_{m-1}}, \quad i = 1, \cdots, n\)
- Fit a base learner (or weak learner, e.g. a tree) $h_m (x)$, closed under scaling, to the pseudo-residuals.
- Compute the multiplier $\gamma_m$ by solving the following one-dimensional optimization problem: \(\gamma_m = \underset{\gamma}{\arg\min} \sum_{i=1}^n L(y_i, F_{m-1}(x_i) + \gamma h_m (x_i))\)
- Update the model: \(F_m (x) = F_{m-1} (x) + \gamma_m h_m (x)\)
3. Output $F_M (x)$
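The three steps above can be written compactly. The sketch below is a minimal illustration, assuming the squared-error loss (so the initial constant is the mean of y, the pseudo-residuals are ordinary residuals, and the line search for $\gamma_m$ has a closed form) and DecisionTreeRegressor base learners; the function name gradient_boost_regression is made up for this sketch.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_regression(X, y, M=100, max_depth=3):
    """Steps 1-3 above, specialised to the squared-error loss L = (1/2)(y - F)^2."""
    # 1. Initialise with a constant: for squared error, argmin_gamma of sum L(y_i, gamma) is mean(y)
    init = y.mean()
    F = np.full(len(y), init)
    learners, gammas = [], []

    # 2. For m = 1 to M
    for m in range(M):
        # Pseudo-residuals: the negative gradient is y - F for squared error
        r = y - F
        # Fit a base learner to the pseudo-residuals
        h = DecisionTreeRegressor(max_depth=max_depth).fit(X, r)
        h_pred = h.predict(X)
        # One-dimensional line search: closed form for squared error
        gamma = np.dot(r, h_pred) / np.dot(h_pred, h_pred)
        # Update the model
        F = F + gamma * h_pred
        learners.append(h)
        gammas.append(gamma)

    # 3. Output F_M as a prediction function
    def predict(X_new):
        out = np.full(len(X_new), init)
        for h, g in zip(learners, gammas):
            out += g * h.predict(X_new)
        return out
    return predict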
2. GB code from scratch
- The goal is to build train_X, train_y, test_X, test_y and compute the test error.
- A Decision Tree Classifier is used as the initial model, and Decision Tree Regressors are used in the subsequent training rounds.
- The fitting is repeated 101 times, learning a classification into 1 and 0.
- The learning rate is arbitrarily fixed at 0.1.
import pandas as pd
import numpy as np
import os
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.datasets import load_digits
from sklearn import metrics
digits = load_digits()
train_size = 1500
# The digits target has 10 classes; binarise it to 0/1 so the binary (1 vs 0)
# boosting below applies. The particular split (digit > 4 -> 1, else 0) is an
# assumed choice for illustration.
target = (digits.target > 4).astype(int)
train_X, train_y = digits.data[:train_size], target[:train_size]
test_X, test_y = digits.data[train_size:], target[train_size:]
def GradientBoost(n_estimators, learning_rate, cutoff):
    # The first classifier (C_0)
    first_classifier = DecisionTreeClassifier(max_depth=3, random_state=0)
    # Fit
    first_classifier.fit(train_X, train_y)
    # Predict with probability
    # From the classification example above, we use y_i - p_i as the residual
    train_pred = np.array(first_classifier.predict_proba(train_X)[:, 1])
    test_pred = np.array(first_classifier.predict_proba(test_X)[:, 1])
    # Residual
    resid = train_y - train_pred
    # Regressor trees (for B = 100)
    for b in range(1, n_estimators + 1):
        regressor_tree = DecisionTreeRegressor(max_depth=3, random_state=0)
        # Fit
        regressor_tree.fit(train_X, resid)
        # Predict
        reg_train_pred = regressor_tree.predict(train_X)
        reg_test_pred = regressor_tree.predict(test_X)
        # Update the predictions by a learning-rate-sized gradient step
        train_pred = train_pred + learning_rate * reg_train_pred
        test_pred = test_pred + learning_rate * reg_test_pred
        # Update the residual
        resid = train_y - train_pred
    # Lastly, if the predicted probability exceeds the cutoff, return 1, else 0
    train_pred_result = np.array([1 if prob > cutoff else 0 for prob in train_pred])
    test_pred_result = np.array([1 if prob > cutoff else 0 for prob in test_pred])
    return test_y, train_pred_result, test_pred_result
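To obtain the test error that this section aims for, the function can be called as follows; the argument values (100 regression trees, learning rate 0.1, cutoff 0.5) are assumed here for illustration, not part of the original code.

_, train_pred_result, test_pred_result = GradientBoost(n_estimators=100,
                                                       learning_rate=0.1,
                                                       cutoff=0.5)
# Test error = 1 - accuracy on the held-out digits
test_acc = metrics.accuracy_score(test_y, test_pred_result)
print("Gradient Boosting from scratch")
print("Test accuracy: %.2f, test error: %.2f" % (test_acc, 1 - test_acc))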
3. GB code from sklearn
1. Load data & packages
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_digits
from sklearn import metrics
import numpy as np
digits = load_digits()
train_size = 1500
train_x, train_y = digits.data[:train_size], digits.target[:train_size]
test_x, test_y = digits.data[train_size:], digits.target[train_size:]
np.random.seed(123456)
2. Create ensemble
# Create the ensemble
ensemble_size = 200
learning_rate = 0.1
ensemble = GradientBoostingClassifier(max_depth=3,
                                      n_estimators=ensemble_size,
                                      learning_rate=learning_rate)
ensemble.fit(train_x, train_y)
3. Print result
# Evaluation
gradient_digit_predictions = ensemble.predict(test_x)
gradient_digit_acc = metrics.accuracy_score(test_y, gradient_digit_predictions)
print("Gradient Boosting")
print("Accuracy: %.2f" % gradient_digit_acc)
Gradient Boosting
Accuracy: 0.88