Python Error of Execution While Trying to Scrape a Website

  • Thread starter: Thejaswini
  • Tags: Error
AI Thread Summary
The discussion revolves around a user attempting to scrape a website using Python, specifically with the urllib2 and BeautifulSoup libraries. The provided code aims to retrieve data from a specific URL but encounters an error related to an 'AttributeError' indicating that a 'tuple' object has no attribute 'timeout'. This error arises during the execution of the urllib2.urlopen function. Participants suggest looking into previous threads and documentation for potential solutions, including handling timeouts and ensuring correct usage of the urlopen function. They reference links to relevant discussions and Python documentation to assist in resolving the issue.
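A minimal sketch of where the 'tuple' in that error comes from. It is written for Python 3 (an assumption; the thread's code is Python 2, where `urllib2.urlopen` follows the same logic on this point). The `retrieve()` call in the question returns a `(filename, headers)` pair rather than a URL, and `urlopen` tries to set a `timeout` attribute on any non-string argument it is given. The tuple values and the `try_open` helper below are placeholders for illustration, and no network access is needed.

```python
import urllib.request

# Stand-in for what FancyURLopener.retrieve() / urlretrieve() hands back:
# a (local_filename, headers) pair, not a URL. Values are placeholders.
retrieved = ("C:/temp/page.html", None)


def try_open(target):
    """Call urlopen and return the AttributeError message, if one is raised."""
    try:
        urllib.request.urlopen(target)
    except AttributeError as exc:
        return str(exc)
    return None


# urlopen accepts a URL string or a Request object. Anything else is
# treated as if it were a Request, so assigning its timeout fails.
message = try_open(retrieved)
print(message)
```

Passing the URL string itself (or a `Request` built from it) to `urlopen`, instead of the return value of `retrieve()`, avoids this particular failure.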
Thejaswini
I am trying to scrape a website, and this is the code I have:
Python:
import urllib2
from bs4 import BeautifulSoup
from urllib import URLopener
from urllib import FancyURLopener
import traceback,sys

headers = {
    'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Origin': 'http://www.nrega.nic.in/netnrega/home.aspx',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko)  Chrome/24.0.1312.57 Safari/537.17',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Referer': 'http://mnregaweb4.nic.in/netnrega/all_lvl_details_dashboard_new.aspx',
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
}
      
class MyOpener(FancyURLopener, object):
    version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'

myopener = MyOpener()
#urllib2.urlcleanup()
url = myopener.retrieve('http://mnregaweb4.nic.in/netnrega/all_lvl_details_dashboard_new.aspx')
# first HTTP request without form data
f = urllib2.urlopen(url)
#traceback.print_exc()
def run_user_code(envdir):
    source = raw_input(">>> ")
    try:
        exec source in envdir
    except:
        print "Exception in user code:"
        print '-'*60
        traceback.print_exc(file=sys.stdout)
        print '-'*60

envdir = {}
while 1:
    run_user_code(envdir)
soup = BeautifulSoup(f)
# parse and retrieve two vital form values
viewstate = soup.findAll("input", {"type": "hidden", "name": "__VIEWSTATE"})
eventvalidation = soup.findAll("input", {"type": "hidden", "name": "__EVENTVALIDATION"})

print viewstate[0]['value']
formData = (
     ('__EVENTVALIDATION', eventvalidation),
    ('__VIEWSTATE', viewstate),
    ('__VIEWSTATEENCRYPTED',''),
    ('TextBox1', '106110006'),
    ('Button1', 'Show'),
)

encodedFields = urllib2.urlencode(formData)
# second HTTP request with form data
f = myopener.open(url, encodedFields)
try:
    # actually we'd better use BeautifulSoup once again to
    # retrieve results(instead of writing out the whole HTML file)
    # Besides, since the result is split into multipages,
    # we need send more HTTP requests
    fout = open('census.html','w')
except:
    print('Could not open output file\n')
fout.writelines(f.readlines())
fout.close()
But I am getting the following error:
Code:
Traceback (most recent call last):
  File "C:\Python27\nerga - Copy.py", line 25, in <module>
    f = urllib2.urlopen(url)
  File "C:\Python27\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 420, in open
    req.timeout = timeout
AttributeError: 'tuple' object has no attribute 'timeout'
Can anyone help with this?

Thanks
Thejaswini
Thanks for your guidance, but it's not working.
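For anyone landing here later, a hedged sketch of the two-request flow the posted code seems to be aiming for. Several things here are assumptions rather than anything confirmed in the thread: it targets Python 3 instead of Python 2, it swaps BeautifulSoup for the standard-library `html.parser` so it runs without extra packages, and the names `HiddenFieldParser`, `hidden_fields`, `build_post_request` and `SAMPLE_HTML` are made up for illustration. It also reflects two details of the original code: `findAll` returns a list of tags, so the form needs the tags' `value` strings, and in Python 2 `urlencode` lives in `urllib`, not `urllib2`. The request is only built, never sent.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request

URL = "http://mnregaweb4.nic.in/netnrega/all_lvl_details_dashboard_new.aspx"


class HiddenFieldParser(HTMLParser):
    """Collect name -> value for every <input type="hidden"> tag."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            name = attrs.get("name")
            if name:
                self.fields[name] = attrs.get("value") or ""


def hidden_fields(html):
    """Return the hidden form fields found in an HTML string."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def build_post_request(url, html):
    """Build (but do not send) the second request, with form data."""
    hidden = hidden_fields(html)
    form = [
        # Pass the string values, not tag objects or lists of tags.
        ("__EVENTVALIDATION", hidden.get("__EVENTVALIDATION", "")),
        ("__VIEWSTATE", hidden.get("__VIEWSTATE", "")),
        ("__VIEWSTATEENCRYPTED", ""),
        ("TextBox1", "106110006"),
        ("Button1", "Show"),
    ]
    body = urlencode(form).encode("ascii")
    # url must be a plain string here, never the tuple from retrieve().
    return Request(url, data=body, headers={"User-Agent": "Mozilla/5.0"})


# Placeholder markup standing in for the body of the first response.
SAMPLE_HTML = """
<form>
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
</form>
"""

request = build_post_request(URL, SAMPLE_HTML)
print(request.get_method())
print(request.data.decode("ascii"))
```

To actually fetch the pages, the URL string would go to `urllib.request.urlopen` for the first request and this `Request` object for the second. Whether the site accepts exactly these fields is not something the thread confirms.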