Are the pipeline steps followed when predicting after GridSearchCV?

You are fitting the StandardScaler within GridSearchCV to the training data, whereas you are re-fitting your “manual” scaler to the test data. With

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.fit_transform(x_test)

you are overwriting the scaler that was fit to the training data!
That is not how a scaler is meant to be used. Fit the scaler on the training data only, then use that fitted scaler to standardize your test data.
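To make the overwriting concrete, here is a minimal sketch on synthetic data. The arrays `x_train_demo` and `x_test_demo` are made up for illustration and are not your data. It shows that calling `fit_transform` a second time replaces the statistics stored in the scaler's `mean_` attribute:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the real data (assumption: purely illustrative).
rng = np.random.default_rng(0)
x_train_demo = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
x_test_demo = rng.normal(loc=6.0, scale=3.0, size=(30, 3))

scaler_demo = StandardScaler()

# First fit: the scaler stores the per-feature mean of the training data.
scaler_demo.fit(x_train_demo)
train_mean = scaler_demo.mean_.copy()

# Second fit on the test data: the stored statistics are replaced.
scaler_demo.fit_transform(x_test_demo)
test_mean = scaler_demo.mean_.copy()

# True when the training statistics have been overwritten.
overwritten = not np.allclose(train_mean, test_mean)
print(overwritten)
```

After the second call the scaler no longer knows anything about the training set, so the test data end up standardized with their own mean and variance.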

Let’s compare your output with what it should look like. First, let’s extract the scaler that was fit within GridSearchCV and standardize the test data with it:

gscv_sclr = mod.best_estimator_.named_steps['scaler']
gscv_test_scld = gscv_sclr.transform(x_test)

As you can see, this is not equal to your manually standardized test data:

np.allclose(gscv_test_scld, x_test_scaled)
# Out: False

Now let’s fit the “manual” standardizer only with the training data and use this standardizer to transform your test data:

scaler_new = StandardScaler()
x_train_scaled = scaler_new.fit_transform(x_train)
x_test_scaled_new = scaler_new.transform(x_test)

# and compare it to the gridsearchcv scaler:
np.allclose(gscv_test_scld, x_test_scaled_new)
# Out: True

The two are equal!
Now use this correctly standardized test set to make your predictions:

# Refitting is not strictly needed; it is done here only to keep the models separate.
manual_svr_new = SVR(kernel='rbf', gamma=gamma_space, C=C_space,
    epsilon=epsilon_space).fit(x_train_scaled, y_train)

y_pred_manual_new = manual_svr_new.predict(x_test_scaled_new)

# scikit-learn metrics expect (y_true, y_pred); MAE is symmetric, so the value is unchanged
error_manual_new = mean_absolute_error(y_test, y_pred_manual_new)

# And test it:
error_manual_new == error
# Out: True

And now you’ve got the result of your pipeline.
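To answer the question in the title directly: yes, calling `predict` on the fitted search object runs every pipeline step on the raw test data, using the statistics learned from the training data. The following is a minimal, self-contained sketch on synthetic data. The names `pipe`, `grid`, the step name `"svr"`, and the small `C` grid are assumptions for illustration and not your actual setup:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data (assumption: stands in for the real data set).
X, y = make_regression(n_samples=120, n_features=4, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Scaler and SVR combined in one pipeline, tuned with a tiny grid.
pipe = Pipeline([("scaler", StandardScaler()), ("svr", SVR(kernel="rbf"))])
grid = GridSearchCV(pipe, {"svr__C": [1.0, 10.0]}, cv=3)
grid.fit(X_tr, y_tr)

# Predict on the RAW test data: the pipeline scales it internally.
pred_pipeline = grid.predict(X_te)

# Reproduce the same steps by hand with the fitted pipeline components.
best = grid.best_estimator_
X_te_scaled = best.named_steps["scaler"].transform(X_te)
pred_manual = best.named_steps["svr"].predict(X_te_scaled)

# True when both routes give the same predictions.
same = np.allclose(pred_pipeline, pred_manual)
print(same)
```

In other words, there is no need to scale the test data yourself when you predict through the pipeline; doing so would apply the scaling twice.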
