Polynomial Regression with Scikit-learn

  • Context: Python
  • Thread starter: EngWiPy
  • Tags: Polynomial Regression

Discussion Overview

The discussion revolves around the implementation of polynomial regression using Scikit-learn, specifically comparing it to linear regression with a single feature. Participants explore the output of their code, particularly focusing on the number of curves displayed in the resulting plot.

Discussion Character

  • Technical explanation, Conceptual clarification, Debate/contested

Main Points Raised

  • One participant shares their code for polynomial regression and notes a discrepancy in the number of curves displayed compared to a book example.
  • Another participant suggests that the extra curves may be due to the line of code that plots the linear regression output.
  • A later reply confirms the presence of the extra curves and indicates that commenting out the linear regression plot line could clarify the issue.
  • The original poster identifies the mistake: the quadratic curve was plotted against the full quadratic feature matrix rather than the original x-values, and corrects the plotting line to achieve the intended output.

Areas of Agreement / Disagreement

The discussion reflects a progression of understanding as participants narrow down the source of the confusion about the plot. The initial suggestion does not fully explain the extra curves, but the later correction resolves the specific plotting issue.

Contextual Notes

Participants rely on specific lines of code and their effects on the output, indicating a dependence on the correct implementation of plotting functions. The discussion does not resolve broader questions about polynomial regression itself.

EngWiPy
Hello,

I followed an example in a book that compares polynomial regression with linear regression. We have one feature or explanatory variable. The code is the following:

Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X_train = np.array([6, 8, 10, 14, 18]).reshape(-1, 1)
Y_train = np.array([7, 9, 13, 17.5, 18])

X_test = np.array([6, 8, 11, 16]).reshape(-1, 1)
Y_test = np.array([8, 12, 15, 18])

regressor_linear = LinearRegression()
regressor_linear.fit(X_train, Y_train)

xx = np.linspace(0, 25, 100)
yy = regressor_linear.predict(xx.reshape(xx.shape[0], 1))

plt.plot(xx, yy)

quadratic_featurizer = PolynomialFeatures(degree = 2)
X_train_quadratic = quadratic_featurizer.fit_transform(X_train)
X_test_quadratic = quadratic_featurizer.transform(X_test)

regressor_quadratic = LinearRegression()
regressor_quadratic.fit(X_train_quadratic, Y_train)

xx_quadratic = quadratic_featurizer.transform(xx.reshape(xx.shape[0], 1))
yy_quadratic = regressor_quadratic.predict(xx_quadratic)
print(xx_quadratic)

plt.plot(xx_quadratic, yy_quadratic)
plt.title("Polynomial Vs Linear Regression")
plt.xlabel("Pizza diameter")
plt.ylabel("Pizza Price")
plt.scatter(X_train, Y_train)
plt.axis([0, 25, 0, 25])
plt.grid(True)
plt.show()

However, the attached figure shows four curves, not just two. Why? In the book it shows only two.
Attachments

  • fig.jpg
S_David said:

However, the figure (attached) shows 4 curves not just two. Why? In the book it shows just two.
Are you asking why the vertical line segment (gold) and the sloped segment (red) are plotted? I suspect it's due to the line plt.plot(xx, yy). You could test this by commenting that line out and seeing whether those two lines go away.
 
Mark44 said:
Are you asking why the vertical line segment (gold) and the sloped segment (red) are plotted? I suspect it's due to the line plt.plot(xx, yy). You could test this by commenting that line out and seeing whether those two lines go away.

Yes, I meant the gold and red ones. They appear along with the green one due to the following line

Code:
plt.plot(xx_quadratic, yy_quadratic)

Commenting out the above line removes the three mentioned curves. But why do I get two extra curves?
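For reference, the extra curves come from how matplotlib handles a 2-D x-array: plt.plot draws one line per column, and PolynomialFeatures(degree=2) expands the single feature into three columns, [1, x, x**2]. A minimal sketch of the shapes involved:

```python
# Why plt.plot(xx_quadratic, yy_quadratic) draws three curves:
# PolynomialFeatures(degree=2) expands one feature into the columns
# [1, x, x**2], and matplotlib plots each column of a 2-D x-array as a
# separate line against the same y-values.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

xx = np.linspace(0, 25, 100)
featurizer = PolynomialFeatures(degree=2)
xx_quadratic = featurizer.fit_transform(xx.reshape(-1, 1))

print(xx_quadratic.shape)  # (100, 3) -> one plotted line per column
```

So the single plt.plot call on the quadratic feature matrix produces three lines, two of which (the bias column and the squared column) are the unexpected extras.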
 
OK, I discovered my mistake. I must plot

Code:
plt.plot(xx, yy_quadratic)

and not

Code:
plt.plot(xx_quadratic, yy_quadratic)

The new figure is attached.

Thanks
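As a side note (an alternative sketch, not the book's approach): the transform step can be hidden inside a scikit-learn Pipeline, so the model takes raw x-values for both fitting and prediction and the column mix-up cannot occur. Using the thread's training data:

```python
# Wrapping the featurizer and regressor in a Pipeline keeps the quadratic
# transform internal: fit and predict both take the raw 1-D x-values, so
# there is no separate feature matrix to plot by mistake.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X_train = np.array([6, 8, 10, 14, 18]).reshape(-1, 1)
Y_train = np.array([7, 9, 13, 17.5, 18])

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_train, Y_train)

xx = np.linspace(0, 25, 100)
yy_quadratic = model.predict(xx.reshape(-1, 1))  # 1-D predictions over the grid
```

plt.plot(xx, yy_quadratic) then draws a single curve on the same axis as the scatter points.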

Attachments

  • fig.jpg
