Polynomial Regression with Scikit-learn

In summary: the book shows two curves, but the code generated four. The extra curves came from plt.plot(xx_quadratic, yy_quadratic); since xx_quadratic has three columns (bias, x, x²), matplotlib draws one curve per column. Plotting plt.plot(xx, yy_quadratic) instead produces the expected two curves.
  • #1
EngWiPy
Hello,

I followed an example in a book that compares polynomial regression with linear regression. There is a single feature (explanatory variable). The code is as follows:

Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Training data: pizza diameters (feature) and prices (target)
X_train = np.array([6, 8, 10, 14, 18]).reshape(-1, 1)
Y_train = np.array([7, 9, 13, 17.5, 18])

X_test = np.array([6, 8, 11, 16]).reshape(-1, 1)
Y_test = np.array([8, 12, 15, 18])

# Fit a plain linear model
regressor_linear = LinearRegression()
regressor_linear.fit(X_train, Y_train)

# Evaluation grid for plotting the fitted curves
xx = np.linspace(0, 25, 100)
yy = regressor_linear.predict(xx.reshape(-1, 1))

plt.plot(xx, yy)

# Expand the feature into [1, x, x**2] and fit a linear model on it
quadratic_featurizer = PolynomialFeatures(degree=2)
X_train_quadratic = quadratic_featurizer.fit_transform(X_train)
X_test_quadratic = quadratic_featurizer.transform(X_test)

regressor_quadratic = LinearRegression()
regressor_quadratic.fit(X_train_quadratic, Y_train)

xx_quadratic = quadratic_featurizer.transform(xx.reshape(-1, 1))
yy_quadratic = regressor_quadratic.predict(xx_quadratic)
print(xx_quadratic)

plt.plot(xx_quadratic, yy_quadratic)
plt.title("Polynomial Vs Linear Regression")
plt.xlabel("Pizza diameter")
plt.ylabel("Pizza Price")
plt.scatter(X_train, Y_train)
plt.axis([0, 25, 0, 25])
plt.grid(True)
plt.show()

However, the attached figure shows four curves, not just two. Why? The book shows only two.
[Attachment: fig.jpg]
  • #2
S_David said:
However, the attached figure shows four curves, not just two. Why? The book shows only two.
Are you asking why the vertical line segment (gold) and the sloped segment (red) are plotted? I suspect it's due to the line plt.plot(xx, yy). You could test this by commenting that line out and seeing whether those two lines go away.
 
  • #3
Mark44 said:
Are you asking why the vertical line segment (gold) and the sloped segment (red) are plotted? I suspect it's due to the line plt.plot(xx, yy). You could test this by commenting that line out and seeing whether those two lines go away.

Yes, I meant the gold and red ones. They appear, along with the green one, due to the following line:

Code:
plt.plot(xx_quadratic, yy_quadratic)

Commenting out the above line removes all three of those curves. But why do I get two extra curves?
  • #4
OK, I discovered my mistake: xx_quadratic is a (100, 3) array (columns 1, x, x²), and plt.plot draws one curve per column when the x argument is 2-D. I must plot

Code:
plt.plot(xx, yy_quadratic)

and not

Code:
plt.plot(xx_quadratic, yy_quadratic)

The new figure is attached.

Thanks
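
For anyone who hits the same thing, here is a minimal, self-contained sketch of the one-curve-per-column behaviour (the y values below are a made-up stand-in for yy_quadratic):

Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures

xx = np.linspace(0, 25, 100)
quadratic_featurizer = PolynomialFeatures(degree=2)
xx_quadratic = quadratic_featurizer.fit_transform(xx.reshape(-1, 1))
print(xx_quadratic.shape)  # (100, 3): columns are [1, x, x**2]

yy = 0.05 * xx**2  # stand-in for yy_quadratic

# Wrong: with a 2-D x argument, plt.plot draws one line per column,
# so this single call produces three curves.
plt.plot(xx_quadratic, yy)

# Right: plot the predictions against the 1-D grid (one curve).
plt.plot(xx, yy)
plt.show()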

[Attachment: fig.jpg]

What is Polynomial Regression?

Polynomial Regression is a type of regression analysis used to model non-linear relationships between independent and dependent variables. It fits a polynomial function to the data points; for a single feature x and degree 2, the model is y = w0 + w1·x + w2·x², which is still linear in the coefficients, so it can be estimated with ordinary linear regression.

What is Scikit-learn?

Scikit-learn is a free and open-source machine learning library for the Python programming language. It provides a wide range of tools and algorithms for data analysis, machine learning, and predictive modeling.

How does Scikit-learn implement Polynomial Regression?

Scikit-learn has no dedicated Polynomial Regression estimator. Instead, the PolynomialFeatures class transforms the original features into polynomial features, and the LinearRegression class then fits an ordinary linear model to the transformed features, as sketched below.
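
A minimal sketch of that pattern, chaining the two steps with make_pipeline (the toy data here is invented for illustration):

Code:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy data for illustration
X = np.array([6, 8, 10, 14, 18]).reshape(-1, 1)
y = np.array([7, 9, 13, 17.5, 18])

# Chain the feature expansion and the linear fit
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)

# Predict on a grid; the pipeline applies the expansion automatically
xx = np.linspace(0, 25, 100).reshape(-1, 1)
yy = model.predict(xx)
print(yy[:5])

Using a Pipeline keeps the feature expansion and the fit together, which avoids exactly the kind of mismatch between raw and transformed inputs discussed in the thread above.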

What are the advantages of using Polynomial Regression with Scikit-learn?

Some advantages of using Polynomial Regression with Scikit-learn include its ability to model non-linear relationships, the flexibility to choose the degree of the polynomial, and the simplicity of the implementation. Scikit-learn also offers many useful tools for data preprocessing, validation, and evaluation.

How do I choose the degree of the polynomial for my data?

The degree of the polynomial should be chosen based on the complexity of the relationship between the variables. A higher-degree polynomial may fit the training data more closely, but it can also overfit. It is important to use cross-validation to find the degree that generalizes best; see the sketch below.
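
A minimal sketch of such a search using cross_val_score (the toy data and the candidate degrees are assumptions for illustration):

Code:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy data for illustration: a noisy quadratic trend
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 30).reshape(-1, 1)
y = 1.5 * X.ravel() + 0.3 * X.ravel()**2 + rng.normal(0, 2, 30)

# Score each candidate degree with 5-fold cross-validation
for degree in (1, 2, 3, 4, 5):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"degree {degree}: mean CV R^2 = {scores.mean():.3f}")

The degree whose mean cross-validated score is highest is the one that generalizes best on this data; past that point, higher degrees typically start to overfit.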
