Python Polynomial Regression with Scikit-learn

Summary
The discussion revolves around a comparison of polynomial regression and linear regression using Python code. The user initially encounters confusion when their plot displays four curves instead of the expected two. The code provided uses `numpy`, `matplotlib`, and `sklearn` to fit both linear and quadratic models to training data representing pizza diameter and price. The issue arises from the incorrect plotting of the quadratic regression results. Initially, the user plots `xx_quadratic` against `yy_quadratic`, leading to additional curves appearing in the graph. After clarification, the user realizes that they should plot `xx` against `yy_quadratic` to correctly display the intended comparison between the linear and polynomial regression models. The resolution results in a corrected figure that aligns with the expectations set by the book example.
EngWiPy
Hello,

I followed an example in a book that compares polynomial regression with linear regression. We have one feature or explanatory variable. The code is the following:

Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X_train = np.array([6, 8, 10, 14, 18]).reshape(-1, 1)
Y_train = np.array([7, 9, 13, 17.5, 18])

X_test = np.array([6, 8, 11, 16]).reshape(-1, 1)
Y_test = np.array([8, 12, 15, 18])

regressor_linear = LinearRegression()
regressor_linear.fit(X_train, Y_train)

xx = np.linspace(0, 25, 100)
yy = regressor_linear.predict(xx.reshape(xx.shape[0], 1))

plt.plot(xx, yy)

quadratic_featurizer = PolynomialFeatures(degree = 2)
X_train_quadratic = quadratic_featurizer.fit_transform(X_train)
X_test_quadratic = quadratic_featurizer.transform(X_test)

regressor_quadratic = LinearRegression()
regressor_quadratic.fit(X_train_quadratic, Y_train)

xx_quadratic = quadratic_featurizer.transform(xx.reshape(xx.shape[0], 1))
yy_quadratic = regressor_quadratic.predict(xx_quadratic)
print(xx_quadratic)

plt.plot(xx_quadratic, yy_quadratic)
plt.title("Polynomial Vs Linear Regression")
plt.xlabel("Pizza diameter")
plt.ylabel("Pizza Price")
plt.scatter(X_train, Y_train)
plt.axis([0, 25, 0, 25])
plt.grid(True)
plt.show()

However, the attached figure shows four curves, not just two. Why? In the book it shows just two.
[Attachment: fig.jpg]
S_David said:
However, the figure (attached) shows 4 curves not just two. Why? In the book it shows just two.
Are you asking why the vertical line segment (gold) and the sloped segment (red) are plotted? I suspect it's due to the line plt.plot(xx, yy). You could test this by commenting that line out and seeing whether those two lines go away.
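Another quick check, without commenting anything out, is to print the shapes of the arrays being handed to `plt.plot` (rebuilding them here from the posted code):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Rebuild the arrays from the original post to inspect their shapes
xx = np.linspace(0, 25, 100)
quadratic_featurizer = PolynomialFeatures(degree=2)
xx_quadratic = quadratic_featurizer.fit_transform(xx.reshape(-1, 1))

print(xx.shape)            # (100,)
print(xx_quadratic.shape)  # (100, 3) -- three columns, not one
```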
 
Mark44 said:
Are you asking why the vertical line segment (gold) and the sloped segment (red) are plotted? I suspect it's due to the line plt.plot(xx, yy). You could test this by commenting that line out and seeing whether those two lines go away.

Yes, I meant the gold and red ones. They appear along with the green one due to the following line

Code:
plt.plot(xx_quadratic, yy_quadratic)

Commenting out the above line removes the three mentioned curves. But why do I get two extra curves?
 
OK, I discovered my mistake. I must plot

Code:
plt.plot(xx, yy_quadratic)

and not

Code:
plt.plot(xx_quadratic, yy_quadratic)

The new figure is attached.

Thanks
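For completeness, the three curves from the original call come from the shape of `xx_quadratic`: `PolynomialFeatures(degree=2)` returns the columns [1, x, x²], and matplotlib draws one line per column when its x argument is 2-D. A minimal sketch (the zero y vector is just a stand-in for `yy_quadratic`):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures

xx = np.linspace(0, 25, 100)
xx_quadratic = PolynomialFeatures(degree=2).fit_transform(xx.reshape(-1, 1))

# The transform returns three columns: the bias term 1, x, and x**2
print(xx_quadratic.shape)  # (100, 3)

# matplotlib draws one line per column when the x argument is 2-D,
# so this single plot call produces three curves:
yy_stand_in = np.zeros(100)
lines = plt.plot(xx_quadratic, yy_stand_in)
print(len(lines))  # 3
```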

[Attachment: fig.jpg]
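As an aside, one way to avoid juggling the transformed arrays at all is scikit-learn's Pipeline. A sketch using the same training data (`make_pipeline` lives in `sklearn.pipeline`):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X_train = np.array([6, 8, 10, 14, 18]).reshape(-1, 1)
Y_train = np.array([7, 9, 13, 17.5, 18])

# The pipeline applies the polynomial transform internally, so both
# fit() and predict() take the raw one-column X; there is no separate
# transformed array to plot by mistake.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_train, Y_train)

xx = np.linspace(0, 25, 100).reshape(-1, 1)
yy_quadratic = model.predict(xx)
print(yy_quadratic.shape)  # (100,)
```

Plotting is then simply `plt.plot(xx, yy_quadratic)`, with no transformed x array in sight.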
