fit_transform() vs. transform()

  • Context: Python 
  • Thread starter: EngWiPy
  • Tags: Transform

Discussion Overview

The discussion revolves around the differences between the methods fit_transform() and transform() in the context of the PolynomialFeatures class from scikit-learn. Participants explore the implications of using these methods for feature transformation in machine learning, particularly in relation to training and testing datasets.

Discussion Character

  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • Some participants explain that fit_transform() combines the actions of fitting and transforming the data, while transform() is used to apply the transformation to new data based on the previously fitted model.
  • One participant suggests that using fit_transform() on training data is a shorthand for first fitting and then transforming, which could be seen as unnecessary if the goal is only to transform the training data.
  • A participant questions what exactly is being "fitted" when using fit_transform(), proposing that it may simply be determining the combinations of features needed for transformation rather than fitting in a traditional sense.
  • Another participant agrees that the fit part of fit_transform() seems to establish the necessary feature combinations, while the transform part applies these combinations to the data.

Areas of Agreement / Disagreement

Participants generally agree on the functional roles of fit_transform() and transform(), but there is some disagreement regarding the interpretation of what "fitting" entails in this context. The discussion remains unresolved regarding the deeper implications of the fitting process.

Contextual Notes

Some assumptions about the nature of fitting and transforming in scikit-learn may be implicit, and the discussion does not clarify whether the terminology used is universally applicable across different modules.

EngWiPy
Hi,

I noticed that in some cases we first call fit_transform(), and afterwards we call transform(). Like in the following example:

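Code:
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# A single feature, one sample per row
X_train = np.array([6, 8, 10, 14, 18]).reshape(-1, 1)
X_test = np.array([6, 8, 11, 16]).reshape(-1, 1)

quadratic_featurizer = PolynomialFeatures(degree=2)

# Fit on the training data and transform it in one call
X_train_quadratic = quadratic_featurizer.fit_transform(X_train)
# Reuse the already fitted featurizer on the test data
X_test_quadratic = quadratic_featurizer.transform(X_test)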

Why? What is the difference between the two methods?

Thanks
 
I haven't used PolynomialFeatures before, but fit(), fit_transform(), and transform() are standard methods in scikit-learn.

fit_transform() is essentially the same as calling fit() and then transform(), so it is like a shortcut that runs two commands in one.

So when you do X_train_quadratic = quadratic_featurizer.fit_transform(X_train), what you are doing is fitting quadratic_featurizer on X_train and using it to transform X_train itself. This should be equal to (and is a shorthand for):

quadratic_featurizer.fit(X_train)
X_train_quadratic = quadratic_featurizer.transform(X_train)


On the other hand, when you do X_test_quadratic = quadratic_featurizer.transform(X_test), you are using a previously fitted quadratic_featurizer to transform X_test. This should fail (scikit-learn raises a NotFittedError) unless you have previously called either .fit() or .fit_transform() on quadratic_featurizer.

Hope it makes sense.

I am guessing what you are trying to do is actually:
quadratic_featurizer.fit(X_train)
X_test_quadratic = quadratic_featurizer.transform(X_test)

What you did, i.e.:
X_train_quadratic = quadratic_featurizer.fit_transform(X_train)
X_test_quadratic = quadratic_featurizer.transform(X_test)

will also work, but you are unnecessarily transforming X_train by calling fit_transform() instead of fit(), unless you need the transformed training data later.
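
As a quick check of the points above, here is a minimal sketch using the arrays from the original question. The variable names (one_step, two_step, unfitted, same_result, raised) are made up for illustration. It compares the one-call and two-call versions and shows what happens when transform() is called on a featurizer that was never fitted.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import PolynomialFeatures

X_train = np.array([6, 8, 10, 14, 18]).reshape(-1, 1)
X_test = np.array([6, 8, 11, 16]).reshape(-1, 1)

# One-call version: fit on X_train and transform it in the same call.
one_step = PolynomialFeatures(degree=2)
X_train_a = one_step.fit_transform(X_train)

# Two-call version: fit first, then transform the same data.
two_step = PolynomialFeatures(degree=2)
two_step.fit(X_train)
X_train_b = two_step.transform(X_train)

# Both routes should give the same transformed training matrix.
same_result = np.array_equal(X_train_a, X_train_b)
print(same_result)

# The fitted object can now be reused on new data such as X_test.
X_test_quadratic = two_step.transform(X_test)
print(X_test_quadratic.shape)

# A featurizer that was never fitted refuses to transform anything.
unfitted = PolynomialFeatures(degree=2)
try:
    unfitted.transform(X_test)
    raised = False
except NotFittedError:
    raised = True
print(raised)
```

If only the transformed test data is needed, the two-call version with fit() alone on X_train is enough. If the transformed training data is needed as well, fit_transform() saves one line.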
 
It makes sense. I did X_train_quadratic = quadratic_featurizer.fit_transform(X_train) because later in my code I use X_train_quadratic to train a model using .fit(), and then I test the performance of the model on X_test_quadratic.

I have one question: what does the method .fit_transform() fit to the training data X_train? For example, if
X_train = [1
           2
           3
           4]
then quadratic_featurizer.fit_transform(X_train) will result in
[1 1  1
 1 2  4
 1 3  9
 1 4 16]
whose columns are basically the values of 1, x_1 and x_1^2 in the polynomial equation
[tex]y = \beta_0 + \beta_1x_1 + \beta_2x_1^2[/tex]

In this case, what does .fit_transform() fit to X_train?

Thanks
 
It looks like it is not actually fitting anything in the statistical sense. I think it is called fit_transform() simply because scikit-learn tries to provide a uniform interface, and a lot of other modules in scikit-learn use the same terminology. What fit() and the fit part of fit_transform() seem to do is simply determine the combinations of features that need to be returned for the given input shape. So when you later call transform() many times, it can skip that part and simply return the values.

So, in this case the fit() part figures out that there is a single feature and that x^0, x^1, and x^2 need to be returned, and the transform() part simply computes them for each sample on that basis.
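
To make this concrete, here is a small sketch, not a definitive account of the internals. It reproduces the [1, 2, 3, 4] example from the question, prints what the fitted PolynomialFeatures object stores, and checks that fitting on completely different values of the same shape gives the same transform. StandardScaler is brought in purely as a contrast, since its fit() does learn numbers from the data. The names poly, other, scaler and same_transform are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X_train = np.array([1, 2, 3, 4]).reshape(-1, 1)

poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X_train)
print(X_poly)  # columns are x^0, x^1, x^2 for each sample

# What fit() stored: structural information only.
print(poly.n_output_features_)  # number of output columns
print(poly.powers_)             # exponent of the input feature in each output column

# Fitting on unrelated values with the same number of columns
# leads to exactly the same transformation of X_train.
other = PolynomialFeatures(degree=2).fit(np.array([[100.0], [200.0]]))
same_transform = np.array_equal(other.transform(X_train), X_poly)
print(same_transform)

# Contrast: StandardScaler's fit() learns statistics from the data,
# so here it matters which data the scaler was fitted on.
scaler = StandardScaler()
scaler.fit(X_train)
print(scaler.mean_)
```

For transformers like StandardScaler, fitting on the training data and only calling transform() on the test data keeps test-set statistics out of the preprocessing. For PolynomialFeatures the same calling pattern is used for consistency, even though the values in the training data do not influence the result.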
 
Thanks for your replies. It is clearer now.
 
