You are fitting the StandardScaler inside your GridSearchCV pipeline to the training data, whereas you are re-fitting your “manual” scaler to the test data. With
```python
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.fit_transform(x_test)
```
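you are overwriting the scaler that was fit to the training data! The second fit_transform call re-estimates the mean and standard deviation from x_test and throws away the statistics learned from x_train.
This is not how a scaler is meant to be used. Fit the scaler to the training data and then use that same scaler to standardize your test data.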
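A minimal sketch of the difference, assuming small synthetic arrays (x_train and x_test below are hypothetical stand-ins generated with NumPy, not your data). The test set is deliberately shifted by +2 so the effect is visible: re-fitting on the test set centres it on its own mean and hides the shift, while reusing the training scaler preserves it.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: the test set is shifted relative to the training set.
rng = np.random.default_rng(0)
x_train = rng.normal(loc=0.0, scale=1.0, size=(100, 3))
x_test = rng.normal(loc=2.0, scale=1.0, size=(20, 3))

# Wrong: the second fit_transform re-estimates mean/std from x_test,
# overwriting the statistics learned from x_train.
wrong = StandardScaler()
wrong.fit_transform(x_train)
x_test_wrong = wrong.fit_transform(x_test)

# Right: learn mean/std on x_train only, then reuse them for x_test.
right = StandardScaler()
right.fit(x_train)
x_test_right = right.transform(x_test)

# The re-fitted scaler gives the test set column means of (numerically) zero;
# the correctly reused scaler keeps the shift, so the means stay well above zero.
print(x_test_wrong.mean(axis=0))
print(x_test_right.mean(axis=0))
```

The fitted statistics are stored in the scaler's mean_ and scale_ attributes, so you can also inspect wrong.mean_ and right.mean_ directly to see which data each scaler learned from.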
Let’s compare your output with what it should look like. First, let’s extract the scaler fitted inside GridSearchCV and standardize the test data with it:
```python
gscv_sclr = mod.best_estimator_.named_steps['scaler']
gscv_test_scld = gscv_sclr.transform(x_test)
```
As you can see, this is not equal to your manually standardized test data:
```python
np.allclose(gscv_test_scld, x_test_scaled)
# Out: False
```
Now let’s fit the “manual” standardizer on the training data only and use it to transform your test data:
```python
scaler_new = StandardScaler()
x_train_scaled = scaler_new.fit_transform(x_train)
x_test_scaled_new = scaler_new.transform(x_test)
# Compare it with the test data standardized by the GridSearchCV scaler:
np.allclose(gscv_test_scld, x_test_scaled_new)
# Out: True
```
They are equal!
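They match because GridSearchCV, with its default refit=True, re-fits the best pipeline on the whole training set passed to fit, so the scaler in best_estimator_ holds exactly the training-set mean and standard deviation. A self-contained sketch to check this, assuming synthetic data from make_regression and a pipeline whose step names ('scaler', 'svr') and grid values are illustrative, not taken from your code:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical data standing in for your x/y.
x, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

# Assumed pipeline layout: the step names and grid values are illustrative.
pipe = Pipeline([('scaler', StandardScaler()), ('svr', SVR(kernel='rbf'))])
param_grid = {'svr__C': [1.0, 10.0], 'svr__epsilon': [0.1, 1.0]}
mod = GridSearchCV(pipe, param_grid, cv=3).fit(x_train, y_train)

# best_estimator_ was re-fitted on all of x_train (refit=True is the default),
# so its scaler's statistics are those of the training data.
gscv_sclr = mod.best_estimator_.named_steps['scaler']
print(np.allclose(gscv_sclr.mean_, x_train.mean(axis=0)))  # True
print(np.allclose(gscv_sclr.scale_, x_train.std(axis=0)))  # True
```

StandardScaler uses the population standard deviation (ddof=0), which is also NumPy's default for std, so the two can be compared directly.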
Now use this correctly standardized test set to make your predictions:
```python
# Refitting the SVR is not strictly needed: your original model was already trained
# on correctly scaled training data. It is only here to keep the models separate.
manual_svr_new = SVR(kernel='rbf', gamma=gamma_space, C=C_space,
                     epsilon=epsilon_space).fit(x_train_scaled, y_train)
y_pred_manual_new = manual_svr_new.predict(x_test_scaled_new)
# mean_absolute_error expects (y_true, y_pred)
error_manual_new = mean_absolute_error(y_test, y_pred_manual_new)
# Compare with the error of the GridSearchCV pipeline:
error_manual_new == error
# Out: True
```
And now you get the same result as your pipeline.
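An end-to-end sketch of the whole comparison, assuming synthetic make_regression data and an illustrative pipeline (step names 'scaler' and 'svr' and the grid values are hypothetical). Instead of separate gamma_space/C_space/epsilon_space variables, it reads the winning values from mod.best_params_ and strips the 'svr__' prefix so they can be passed to SVR directly:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical data standing in for your x/y.
x, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=1)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1)

# Pipeline route (assumed layout): scaling happens inside the grid search.
pipe = Pipeline([('scaler', StandardScaler()), ('svr', SVR(kernel='rbf'))])
param_grid = {'svr__C': [1.0, 10.0, 100.0], 'svr__gamma': ['scale', 0.1]}
mod = GridSearchCV(pipe, param_grid, cv=3).fit(x_train, y_train)
error = mean_absolute_error(y_test, mod.predict(x_test))

# Manual route: fit the scaler on x_train only, then reuse it for x_test.
scaler_new = StandardScaler()
x_train_scaled = scaler_new.fit_transform(x_train)
x_test_scaled_new = scaler_new.transform(x_test)

# Turn e.g. {'svr__C': 10.0} into {'C': 10.0} for the standalone SVR.
best = {name.split('__', 1)[1]: value for name, value in mod.best_params_.items()}
manual_svr_new = SVR(kernel='rbf', **best).fit(x_train_scaled, y_train)
error_manual_new = mean_absolute_error(y_test, manual_svr_new.predict(x_test_scaled_new))

print(np.isclose(error_manual_new, error))  # True
```

np.isclose is used here rather than == as a defensive choice for comparing floats; with identical inputs the two errors should also be exactly equal.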