Error of Execution While Trying to Scrape a Website

  • Python
  • Thread starter Thejaswini
In summary, an error of execution can occur while trying to scrape a website, which is the process of extracting data from a website for analysis or storage. This error may be caused by various factors such as incorrect coding, server issues, or website changes. To avoid this error, it is important to carefully review and test the code, regularly monitor the website for any changes, and use appropriate tools and techniques for scraping.
  • #1
Thejaswini
I am trying to scrape a website, and this is the code I have:
Python:
import urllib2
from bs4 import BeautifulSoup
from urllib import URLopener
from urllib import FancyURLopener
import traceback,sys

headers = {
    'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Origin': 'http://www.nrega.nic.in/netnrega/home.aspx',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko)  Chrome/24.0.1312.57 Safari/537.17',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Referer': 'http://mnregaweb4.nic.in/netnrega/all_lvl_details_dashboard_new.aspx',
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
}
      
class MyOpener(FancyURLopener, object):
    version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'

myopener = MyOpener()
#urllib2.urlcleanup()
url = myopener.retrieve('http://mnregaweb4.nic.in/netnrega/all_lvl_details_dashboard_new.aspx')
# first HTTP request without form data
f = urllib2.urlopen(url)
#traceback.print_exc()
def run_user_code(envdir):
    source = raw_input(">>> ")
    try:
        exec source in envdir
    except:
        print "Exception in user code:"
        print '-'*60
        traceback.print_exc(file=sys.stdout)
        print '-'*60

envdir = {}
while 1:
    run_user_code(envdir)
soup = BeautifulSoup(f)
# parse and retrieve two vital form values
viewstate = soup.findAll("input", {"type": "hidden", "name": "__VIEWSTATE"})
eventvalidation = soup.findAll("input", {"type": "hidden", "name": "__EVENTVALIDATION"})

print viewstate[0]['value']
formData = (
     ('__EVENTVALIDATION', eventvalidation),
    ('__VIEWSTATE', viewstate),
    ('__VIEWSTATEENCRYPTED',''),
    ('TextBox1', '106110006'),
    ('Button1', 'Show'),
)

encodedFields = urllib2.urlencode(formData)
# second HTTP request with form data
f = myopener.open(url, encodedFields)
try:
    # actually we'd better use BeautifulSoup once again to
    # retrieve results(instead of writing out the whole HTML file)
    # Besides, since the result is split into multipages,
    # we need send more HTTP requests
        fout = open('census.html','w')
except:
        print('Could not open output file\n')
fout.writelines(f.readlines())
fout.close()
But I am getting the following error:
Code:
Traceback (most recent call last):
  File "C:\Python27\nerga - Copy.py", line 25, in <module>
    f = urllib2.urlopen(url)
  File "C:\Python27\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 420, in open
    req.timeout = timeout
AttributeError: 'tuple' object has no attribute 'timeout'
Can anyone help with this?

Thanks
Thejaswini
 
  • #3
Thanks for your guidance but it's not working
 

1. What does it mean when there is an "error of execution" while trying to scrape a website?

An error of execution refers to a technical issue that occurs while attempting to scrape data from a website. This could be due to a variety of reasons such as server errors, coding errors, or website changes that affect the scraping process.

2. Can an error of execution be avoided when scraping a website?

While it is not always possible to avoid errors of execution, there are some steps that can be taken to minimize the risk. This includes thoroughly testing the scraping code, monitoring for changes to the website, and using error handling techniques to catch and address any issues that arise.
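
As one illustration of the error handling mentioned above, the sketch below wraps a request in `try`/`except` so that a failed fetch is reported instead of crashing the script. It is written for Python 3 (`urllib.request`), whereas the code in this thread targets Python 2.7. `fetch_page` and the example path are hypothetical names chosen for illustration.

```python
import urllib.error
import urllib.request


def fetch_page(url, timeout=10):
    """Return the page body as bytes, or None if the request fails.

    fetch_page is a hypothetical helper, not part of any library.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as exc:
        # URLError covers connection failures; HTTPError is a subclass of it.
        print("Request failed for %s: %s" % (url, exc.reason))
        return None
    except ValueError as exc:
        # urlopen raises ValueError for strings that are not valid URLs.
        print("Invalid URL %r: %s" % (url, exc))
        return None


# A file:// URL that does not exist exercises the URLError branch offline.
missing = fetch_page("file:///this/path/does/not/exist.html")
```

Returning `None` is only one possible design. A longer-running scraper might prefer to log the failure and retry, or to re-raise after cleanup.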

3. How can I troubleshoot an error of execution while scraping a website?

The first step in troubleshooting an error of execution is to identify the root cause. This can be done by checking for any coding errors, reviewing the scraping process, and monitoring the website for changes. If the issue persists, it may be necessary to consult with other experts or seek help from the website's owner.
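
Applied to the traceback in this thread, the root cause appears to be the value passed to `urlopen`. In Python 2, `URLopener.retrieve()` returns a `(filename, headers)` tuple, not a URL string, so `urllib2.urlopen(url)` receives a tuple and fails when it tries to set `timeout` on it. The sketch below reproduces the same return shape with Python 3's `urllib.request.urlretrieve`. It uses a temporary local file in place of the real site, an assumption made so the example runs offline.

```python
import os
import tempfile
import urllib.request
from pathlib import Path

# Write a small local page so the example needs no network access.
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as tmp:
    tmp.write("<html><body>hello</body></html>")
page_url = Path(tmp.name).as_uri()

# urlretrieve downloads to a file and returns a (filename, headers) tuple,
# so its result is not something urlopen can accept as a URL.
result = urllib.request.urlretrieve(page_url)
is_tuple = isinstance(result, tuple)

# Passing the URL string itself to urlopen is the intended usage.
with urllib.request.urlopen(page_url) as response:
    html = response.read().decode("utf-8")

os.remove(tmp.name)
```

In the original script, passing the URL string directly to `urllib2.urlopen` instead of the result of `retrieve()` would likely avoid this particular `AttributeError`.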

4. Are there any legal implications of encountering an error of execution while scraping a website?

The legality of scraping a website can vary depending on the website's terms of use and the data being scraped. It is important to always obtain permission from the website owner before scraping and to comply with any legal restrictions or limitations.

5. How can I prevent errors of execution from affecting my data scraping process?

One way to prevent errors of execution is to regularly monitor the website for changes and update the scraping code accordingly. Additionally, using a robust error handling system and implementing automated testing and maintenance can help to prevent and address any issues that may arise during the scraping process.
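
One way to make scraping code easier to test automatically is to keep the parsing step in a pure function that can be run against a saved HTML sample. The sketch below, in Python 3 with only the standard library, pulls the hidden `__VIEWSTATE` and `__EVENTVALIDATION` values out of a page and encodes them as form data. The class `HiddenInputParser`, the helper `build_form_data`, and the sample HTML are all hypothetical. Note that it encodes the `value` strings rather than the matched tags, and uses `urllib.parse.urlencode` (in Python 2.7, `urlencode` lives in `urllib`, not `urllib2`).

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlencode


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "input" and attr_map.get("type") == "hidden":
            name = attr_map.get("name")
            if name:
                self.fields[name] = attr_map.get("value") or ""


def build_form_data(html, extra_fields):
    """Return URL-encoded form data as bytes, ready to use as a POST body."""
    parser = HiddenInputParser()
    parser.feed(html)
    form = dict(parser.fields)
    form.update(extra_fields)
    return urlencode(form).encode("ascii")


# Saved sample standing in for the real page (the values are made up).
SAMPLE_HTML = """
<form>
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="TextBox1" />
</form>
"""

body = build_form_data(SAMPLE_HTML, {"TextBox1": "106110006", "Button1": "Show"})
decoded = parse_qs(body.decode("ascii"))
```

Because `build_form_data` never touches the network, a test can assert on `decoded` directly, and a change in the site's markup shows up as a failing assertion against a refreshed sample.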
