Python: Very basic website scraper, "'str' object is not callable"

  • Thread starter: Greg Bernhardt
AI Thread Summary
The discussion revolves around a beginner's first web-scraping project in Python, which retrieves Google search results for WordPress plugins and then inspects each result page. The script uses `requests` to make HTTP requests and `BeautifulSoup` to parse the HTML. It failed with a `TypeError` ("'str' object is not callable") because it assigned strings to the name `print`, shadowing the built-in function, so the next call to `print` tried to call a string. Suggestions were to avoid using built-in names for variables and to follow naming conventions such as camel case or underscores. Using an IDE like PyCharm was recommended, since it flags this kind of shadowing early. Overall, the conversation highlights a common Python pitfall and the value of following coding best practices.
TL;DR Summary
Making a very basic web scraper, and I get the error "'str' object is not callable" on line 48. I can't figure it out.
Learning Python and this is a first attempt at a project.

[CODE lang="python" highlight="48"]import urllib
import requests
from bs4 import BeautifulSoup, Comment

# desktop user-agent
USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
# mobile user-agent
MOBILE_USER_AGENT = "Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36"

getkeyword = "plugin"
query = "wordpress plugin"
query = query.replace(' ', '+')
URL = f"https://google.com/search?q={query}"

headers = {"user-agent": USER_AGENT}
resp = requests.get(URL, headers=headers)

if resp.status_code == 200:
    soup = BeautifulSoup(resp.content, "html.parser")
    results = []
    links= []
    for g in soup.find_all('div', class_='r'):
        anchors = g.find_all('a')
        if anchors:
            links.append(anchors[0]['href'])
else:
    print("Google may have blocked you, try again in an hour")

i=0
for num in links:
    URL = links
    resp = requests.get(URL, headers=headers)

    if resp.status_code == 200:
        soup = BeautifulSoup(resp.content, "html.parser")

        if soup.find('title'):
            title = soup.find('title').text

        if soup.find('h1'):
            h1 = soup.find('h1').text

        if soup.find('h2'):
            h2 = soup.find('h2').text

        bodytext = soup.find('body').text

        print("#" + str(i+1) + ": " + links)

        if title:
            print("Title: " + title)
        if h1:
            print("H1: " + h1)
        if h2:
            print("H2: " + h2)

        if getkeyword in bodytext:
            print = ("Keyword: Yes")
        else:
            print = ("Keyword: No")

        for comments in soup.findAll(text=lambda text:isinstance(text, Comment)):
            getComments = comments.extract()
            if "Yoast" in getComments:
                print("Yoast: Yes")
                break
        print("\r\r")
    else:
        print("Site is unavailable")
        break

    i += 1

del links[/CODE]
 
A bit of Googling suggests that this can happen if you've defined a variable called str before running your script. Python then finds your variable instead of the built-in str and complains, because you are trying to call a string. Try adding del str to the top of your script and running it. If that works, you should be able to remove the line you just added.
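A minimal sketch of the situation described above, assuming the name `str` was rebound at module level (for example by a stray assignment earlier in the script or REPL session). Module globals are searched before built-ins, so the string hides the type until the global name is deleted:

```python
import builtins

# Rebinding "str" at module level hides the built-in for every later lookup
# in this module, because module globals are searched before builtins.
str = "some text"

try:
    str(42)  # Python finds the string, not the type, and tries to call it
except TypeError as exc:
    error_message = builtins.str(exc)  # reach the real built-in explicitly

# Deleting the module-level name uncovers the built-in again.
del str
restored = str(42)

print(error_message)  # 'str' object is not callable
print(restored)       # 42
```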
 
Ibix said:
Try adding del str to the top of your script and running it.
NameError: name 'str' is not defined
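That NameError is what Python raises when nothing shadows `str`: built-ins live in the `builtins` module, not in the script's own global namespace, so `del str` finds no global name to remove. A small standard-library check; `try_del_str` is a hypothetical helper that runs the statement in a fresh, module-like namespace:

```python
import builtins


def try_del_str():
    """Run `del str` in a fresh namespace that defines no "str" of its own."""
    namespace = {}
    try:
        exec("del str", namespace)
    except NameError as exc:
        return f"NameError: {exc}"
    return "deleted"


print("str" in globals())        # False unless this module defines its own str
print(hasattr(builtins, "str"))  # True: the type lives in the builtins module
print(try_del_str())             # NameError: name 'str' is not defined
```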
 
Line 58 and 60 you redefined print as a string.
 
jedishrfu said:
Line 58 and 60 you redefined print as a string.

Yes, I think that's it. I assume the intent was to have those lines print things, not set the variable "print" equal to things.
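Parentheses around a single string only group it, so `print = ("Keyword: Yes")` binds the name `print` to a str rather than calling the function. A sketch of what lines 58 and 60 were presumably meant to do; `keyword_report` and its parameters are hypothetical names chosen for illustration:

```python
def keyword_report(bodytext: str, keyword: str) -> str:
    # Build the message instead of assigning it to a variable named "print".
    return "Keyword: Yes" if keyword in bodytext else "Keyword: No"


grouped = ("Keyword: Yes")     # grouping parentheses: a plain str, not a call
print(type(grouped).__name__)  # str

# Calling print (no "=") leaves the built-in intact for later iterations.
print(keyword_report("A WordPress plugin directory", "plugin"))  # Keyword: Yes
print(keyword_report("About us", "plugin"))                      # Keyword: No
```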
 
Ah, I missed that. As the previous two comments say, you've stored a string in a variable called print. When you then try to call print the second time around the loop, Python tries to call that string, which doesn't work.
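A stripped-down reproduction of that loop behavior, with the scraping removed and a hypothetical list of page texts standing in for the fetched pages. The snippet is run with `exec` in its own namespace so it behaves like top-level script code without shadowing `print` in the surrounding module. (Inside a function the same mistake fails differently: the assignment makes `print` a local variable, so even the first call raises UnboundLocalError.)

```python
# Hypothetical stand-in for the original loop: each "page" is just a string.
SCRIPT = '''
pages = ["first plugin page", "second plugin page"]
for i, bodytext in enumerate(pages):
    print("#" + str(i + 1))       # fine on the first pass, fails on the second
    if "plugin" in bodytext:
        print = ("Keyword: Yes")  # bug: an assignment, not a call
'''


def run_script(source: str):
    # Run the snippet as module-level code in a private namespace, so the
    # rebinding of "print" cannot leak into this module.
    namespace = {}
    try:
        exec(source, namespace)
    except TypeError as exc:
        return namespace["i"], str(exc)
    return None, ""


failed_at_index, message = run_script(SCRIPT)
print(failed_at_index, message)  # 1 'str' object is not callable
```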
 
I haven’t tested your code in PyCharm, but it might flag this kind of error so you can fix it sooner. I know it can flag code that violates the PEP 8 style rules.

UPDATE: I tested it in the PyCharm IDE, and PyCharm underlined the word print with a warning that it shadows the built-in name "print". The accompanying note mentioned len and list as other commonly shadowed built-in names.

I did run it on Pythonista on iOS and it showed the print to be a keyword but didn’t warn me until I ran it and got your error.

With respect to variable naming, I always strive to use camel case or underscores and never to use a single English word like print. Sometimes I keep it simple with short names like s1 or s2 for intermediate string results.
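PyCharm's inspection is its own implementation; as a rough stand-alone approximation (not how PyCharm does it), the standard `ast` and `builtins` modules are enough to flag assignments that shadow built-in names such as `print`, `len`, or `list`. `find_shadowed_builtins` is a hypothetical helper, and it only looks at plain name targets, so `def`, `class`, and `import` statements that reuse a built-in name are not reported:

```python
import ast
import builtins

BUILTIN_NAMES = set(dir(builtins))


def find_shadowed_builtins(source: str) -> list:
    """Return sorted (line number, name) pairs for assignments to built-in names."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Name nodes in a Store context are assignment targets, including
        # for-loop variables and augmented assignments such as `x += 1`.
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            if node.id in BUILTIN_NAMES:
                hits.append((node.lineno, node.id))
    return sorted(hits)


sample = '''
list = [1, 2, 3]
if "plugin" in "wordpress plugin":
    print = ("Keyword: Yes")
total = len(list)
'''
print(find_shadowed_builtins(sample))  # [(2, 'list'), (4, 'print')]
```

A descriptive snake_case name such as `keyword_status` in place of `print` would pass this check.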
 
jedishrfu said:
I did run it on Pythonista on iOS and it showed the print to be a keyword

"print" isn't a keyword in Python 3. In Python 2, lines 58 and 60 in Greg's script would have been a syntax error.
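The standard `keyword` module shows the difference in Python 3: `print` is an ordinary built-in function, so the compiler accepts an assignment to it, while a real keyword cannot be assigned at all. A quick check, assuming Python 3; `compiles` is a hypothetical helper:

```python
import builtins
import keyword


def compiles(source: str) -> bool:
    """Return True if `source` compiles as module code, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


print(keyword.iskeyword("print"))          # False: print is a built-in function
print(keyword.iskeyword("for"))            # True
print(callable(builtins.print))            # True
print(compiles('print = "Keyword: Yes"'))  # True: legal, but shadows the built-in
print(compiles('for = "Keyword: Yes"'))    # False: keywords cannot be assigned
```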
 
Curiously, even in Greg's listing, the syntax-highlighting plugin GeSHi shows "print" as a keyword, likely because Python 2 and Python 3 are mapped to a single generic Python mode in the plugin.
 
jedishrfu said:
Line 58 and 60 you redefined print as a string.
Woohoo that was it! Always the silly things for me! Thanks!
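For anyone reusing the script, the posted loop has a few other problems besides the print assignment: `URL = links` passes the whole list to `requests.get` instead of the current item `num`; `": " + links` concatenates a list to a str, which raises TypeError; and `title`, `h1`, and `h2` are only assigned when the tag exists, so a page without one either raises NameError or reuses the previous page's value. A sketch of the per-page part with those issues addressed, assuming BeautifulSoup 4.4 or later (for the `string=` argument); `analyze_page` is a hypothetical helper name, and fetching with `requests` stays as in the original:

```python
from bs4 import BeautifulSoup, Comment


def analyze_page(html, keyword):
    """Collect the fields the original loop printed for one page.

    Every field is recomputed per page, so nothing leaks from a previous
    page, and no built-in name (print, str, ...) is reused as a variable.
    """
    soup = BeautifulSoup(html, "html.parser")

    def first_text(tag_name):
        # Text of the first matching tag, or None if the page has no such tag.
        tag = soup.find(tag_name)
        return tag.get_text(strip=True) if tag is not None else None

    # Fall back to the whole document if there is no <body> element.
    body = soup.body if soup.body is not None else soup
    # HTML comments come back as Comment objects, a str subclass.
    comments = soup.find_all(string=lambda s: isinstance(s, Comment))

    return {
        "title": first_text("title"),
        "h1": first_text("h1"),
        "h2": first_text("h2"),
        "has_keyword": keyword in body.get_text(),
        "has_yoast": any("Yoast" in comment for comment in comments),
    }


sample_html = (
    "<html><head><title>Demo</title></head><body>"
    "<!-- This site is optimized with the Yoast SEO plugin -->"
    "<h1>Hello</h1><p>A WordPress plugin page</p></body></html>"
)
info = analyze_page(sample_html, "plugin")
for label, key in (("Title", "title"), ("H1", "h1"), ("H2", "h2")):
    if info[key]:
        print(f"{label}: {info[key]}")
print("Keyword: " + ("Yes" if info["has_keyword"] else "No"))
print("Yoast: " + ("Yes" if info["has_yoast"] else "No"))
```

In the original loop, this would be called as `info = analyze_page(resp.content, getkeyword)` after `resp = requests.get(num, headers=headers)`, with `print("#" + str(i + 1) + ": " + num)` for the header line.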
 