Error of Execution While Trying to Scrape a Website

  • Context: Python
  • Thread starter: Thejaswini
  • Tags: Error
Discussion Overview

The discussion revolves around an error encountered while attempting to scrape a website using Python. Participants examine the code provided by the original poster and discuss potential fixes for an `AttributeError` raised when calling the `urlopen` function from the `urllib2` library.

Discussion Character

  • Technical explanation
  • Debate/contested

Main Points Raised

  • The original poster, Thejaswini, shares a code snippet and describes an `AttributeError` encountered when calling `urllib2.urlopen(url)`.
  • Some participants suggest looking at previous threads for similar issues, indicating that there may be established discussions on web scraping in Python.
  • One participant points to Python documentation that discusses the `timeout` argument in `urlopen`, implying that the error may relate to how arguments are being passed to the function.
  • Another participant references a Stack Overflow discussion about handling timeouts, suggesting that the issue may be related to network conditions or how the code handles exceptions.
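The timeout handling those resources describe can be sketched as follows. This is a minimal sketch in Python 3's `urllib.request` (the thread itself uses Python 2's `urllib2`, whose `urlopen` takes the same `timeout` argument); the `fetch` helper is illustrative, not from the thread:

```python
import socket
from urllib.request import urlopen
from urllib.error import URLError

def fetch(url, timeout=10):
    """Return the response body as bytes, or None if the request times out."""
    try:
        return urlopen(url, timeout=timeout).read()
    except socket.timeout:
        # raised when the socket-level operation exceeds the timeout
        return None
    except URLError as e:
        # URLError can also wrap a lower-level timeout in e.reason
        if isinstance(e.reason, socket.timeout):
            return None
        raise
```

Note that `urlopen` expects a URL string (or a `Request` object) as its first argument, which is precisely what goes wrong in the posted code.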

Areas of Agreement / Disagreement

Participants do not reach a consensus on the solution to the error. There are multiple suggestions and references to external resources, but no definitive resolution is presented.

Contextual Notes

The original code may have limitations in how it handles form data and structures its HTTP requests. The error message points to how `urlopen` is being called: the `url` variable holds the return value of `myopener.retrieve()`, which is a `(filename, headers)` tuple rather than a URL string.
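The behavior behind the traceback can be reproduced without any network access. `URLopener.retrieve()` returns a `(filename, headers)` tuple, and `urllib2.urlopen()` then executes `req.timeout = timeout` on whatever it was given. A sketch in Python 3 syntax (the thread's code is Python 2; the tuple contents below are stand-ins for what `retrieve()` would return):

```python
# retrieve() returns a (filename, headers) tuple, not a URL string --
# this literal tuple stands in for what the thread's `url` variable holds
url = ("all_lvl_details_dashboard_new.aspx", {"Content-Type": "text/html"})

# urllib2's opener does `req.timeout = timeout` on its argument; doing the
# same to a tuple raises the error shown in the traceback
try:
    url.timeout = 10
except AttributeError as e:
    print(e)  # message includes: no attribute 'timeout'
```

Passing the original URL string to `urlopen`, instead of the tuple returned by `retrieve`, avoids the error.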

Who May Find This Useful

Individuals interested in web scraping with Python, particularly those using the `urllib2` library and encountering similar errors.

Thejaswini
I am trying to scrape a website, and this is the code I have:
Python:
import urllib2
from bs4 import BeautifulSoup
from urllib import URLopener
from urllib import FancyURLopener
import traceback,sys

headers = {
    'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Origin': 'http://www.nrega.nic.in/netnrega/home.aspx',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko)  Chrome/24.0.1312.57 Safari/537.17',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Referer': 'http://mnregaweb4.nic.in/netnrega/all_lvl_details_dashboard_new.aspx',
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
}
      
class MyOpener(FancyURLopener, object):
    version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'

myopener = MyOpener()
#urllib2.urlcleanup()
url = myopener.retrieve('http://mnregaweb4.nic.in/netnrega/all_lvl_details_dashboard_new.aspx')
# first HTTP request without form data
f = urllib2.urlopen(url)
#traceback.print_exc()
def run_user_code(envdir):
    source = raw_input(">>> ")
    try:
        exec source in envdir
    except:
        print "Exception in user code:"
        print '-'*60
        traceback.print_exc(file=sys.stdout)
        print '-'*60

envdir = {}
while 1:
    run_user_code(envdir)
soup = BeautifulSoup(f)
# parse and retrieve two vital form values
viewstate = soup.findAll("input", {"type": "hidden", "name": "__VIEWSTATE"})
eventvalidation = soup.findAll("input", {"type": "hidden", "name": "__EVENTVALIDATION"})

print viewstate[0]['value']
formData = (
     ('__EVENTVALIDATION', eventvalidation),
    ('__VIEWSTATE', viewstate),
    ('__VIEWSTATEENCRYPTED',''),
    ('TextBox1', '106110006'),
    ('Button1', 'Show'),
)

encodedFields = urllib2.urlencode(formData)
# second HTTP request with form data
f = myopener.open(url, encodedFields)
try:
    # actually we'd better use BeautifulSoup once again to
    # retrieve results (instead of writing out the whole HTML file)
    # Besides, since the result is split into multiple pages,
    # we need to send more HTTP requests
    fout = open('census.html', 'w')
except:
    print('Could not open output file\n')
fout.writelines(f.readlines())
fout.close()
But I am getting an error like:
Code:
Traceback (most recent call last):
  File "C:\Python27\nerga - Copy.py", line 25, in <module>
    f = urllib2.urlopen(url)
  File "C:\Python27\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 420, in open
    req.timeout = timeout
AttributeError: 'tuple' object has no attribute 'timeout'
Can anyone help with this?

Thanks
Thejaswini
 
Thanks for your guidance, but it's not working.
 
