Logging into a webpage using Python

  • Thread starter: dacruick
  • Tags: Python
SUMMARY

This discussion focuses on logging into a webpage using Python 2, specifically with the `urllib`, `urllib2`, and `cookielib` libraries. The user attempts to authenticate on the website 'https://www.hobolink.com' by submitting the login form fields, but is redirected back to the login page. The user suspects that either the username and password fields are not being populated or the login button is not being activated. A respondent observes that the site's form uses POST and suggests that `urllib` may be issuing GET requests instead, and also asks whether the user has the site's permission.

PREREQUISITES
  • Understanding of Python programming
  • Familiarity with HTTP methods, specifically GET and POST
  • Knowledge of handling cookies in Python using `cookielib`
  • Basic understanding of web scraping techniques
NEXT STEPS
  • Explore the `requests` library for more efficient HTTP handling in Python
  • Learn about session management in web scraping to maintain authentication
  • Investigate the differences between GET and POST requests in web development
  • Research best practices for ethical web scraping and obtaining permissions
USEFUL FOR

Web developers, data analysts, and anyone interested in automating web interactions or scraping data from authenticated web pages.

dacruick
Hi,

I am trying to access some data online, but I am having trouble getting past the actual authentication process.

Code:
import cookielib
import urllib, urllib2

if __name__ == '__main__':
    urlLogin = 'https://www.hobolink.com'

    uid    = 'userid'
    password = 'xxxxxxx'

    fieldId   = 'username'
    fieldPass = 'password'
    
    ButtonId = 'submit'
    Button = 'Log in'

    cj = cookielib.CookieJar()
    data = urllib.urlencode({fieldId:uid, fieldPass:password, ButtonId:Button})

    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

    urllib2.install_opener(opener)
    usock = opener.open(urlLogin)
    usock = opener.open(urlLogin, data)
    #pageSource = usock.read()
    usock.close()
 
    usock = opener.open('FinalWebpage')
    pageSource = usock.read()
    usock.close()
    print(pageSource)

The HTML for the username field, the password field, and the login button is as follows, respectively.

Code:
<input id="username" name="username" type="text">
Code:
<input id="password" name="password" type="password">
Code:
<li id="submit"><input class="button" name="commit" onclick="alertForExplorerBrowserVersion(7, 3);" type="submit" value="Log in"></li>
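
One way to check which keys a browser would submit for this form is to list the `name` attributes of its `<input>` elements, since form data is keyed by `name` rather than by `id`. The sketch below is only illustrative: it uses Python 3's `html.parser` (the code in this thread is Python 2), and `InputNameCollector` and `LOGIN_FORM_HTML` are hypothetical names that simply wrap the three snippets quoted above.

```python
from html.parser import HTMLParser


class InputNameCollector(HTMLParser):
    """Collect (name, type, value) for every <input> that has a name attribute."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if "name" in attr:
            self.inputs.append((attr["name"], attr.get("type"), attr.get("value")))


# The three snippets quoted in the post, joined into one string.
LOGIN_FORM_HTML = """
<input id="username" name="username" type="text">
<input id="password" name="password" type="password">
<li id="submit"><input class="button" name="commit" onclick="alertForExplorerBrowserVersion(7, 3);" type="submit" value="Log in"></li>
"""

collector = InputNameCollector()
collector.feed(LOGIN_FORM_HTML)
field_names = [name for name, _type, _value in collector.inputs]
print(field_names)  # ['username', 'password', 'commit']
```

Note that `submit` is the `id` of the surrounding `<li>`, while the button itself is named `commit`, so a key called `submit` would not match what the form sends. The real page may contain further inputs (hidden fields, for example) that are not quoted in this thread.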

After I try to access the page I want, it redirects me back to the authentication page. So one of two things is happening. Either I am not entering anything into the username and password fields, or I am not "clicking" the log in button and am instead trying to open a page that I am not authenticated to open.
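
Both possibilities come down to what is in the request body, because a script never fills in fields or clicks a button; it only sends key/value pairs. A minimal sketch of building that body with the `name` attributes from the HTML above is shown here. It is written for Python 3 (`urllib.request` and `http.cookiejar` stand in for `urllib2` and `cookielib`), `build_login_request` is a hypothetical helper, and it assumes the form posts to the URL from the original code and needs no extra hidden fields, neither of which is confirmed by the thread.

```python
import http.cookiejar
import urllib.parse
import urllib.request

LOGIN_URL = "https://www.hobolink.com"  # URL taken from the original post


def build_login_request(url, user, password):
    """Build (but do not send) a request carrying the login form fields."""
    # Keys are the inputs' name attributes: the button is name="commit",
    # and its value is what "clicking" it would submit.
    fields = {"username": user, "password": password, "commit": "Log in"}
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=body)


# An opener with a cookie jar keeps any session cookie for later requests.
cookie_jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookie_jar))

request = build_login_request(LOGIN_URL, "userid", "xxxxxxx")
print(request.data)

# Sending is left commented out: it needs network access, real credentials,
# and the site's permission.
# response = opener.open(request)
```

Printing `request.data` before sending makes it easy to confirm that the username, password, and button values are all present.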
 
Coin
One thing I notice is that "hobolink" (do you have their permission to use their site in this way?) uses POST in their form, whereas urllib will produce GET queries.
 
Coin said:
One thing I notice is that "hobolink" (do you have their permission to use their site in this way..?) uses POST in their form, whereas urllib will produce GET queries.

Hmm, I'm pretty new at this, so I'm not sure exactly what the difference is, but that would be consistent with my results thus far.
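
For reference, the difference is where the parameters travel: a GET request carries them in the URL's query string, while a POST request carries them in the request body. The sketch below shows this with Python 3's `urllib.request` without sending anything; the parameter values are placeholders. As far as the standard library documentation describes, Python 2's `urllib2` behaves the same way, so a call like `opener.open(urlLogin, data)` with a `data` argument should already be sent as POST.

```python
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({"username": "userid"})

# GET: parameters are appended to the URL, and there is no request body.
get_request = urllib.request.Request("https://www.hobolink.com/?" + params)

# POST: parameters are sent as the request body (bytes), and the URL stays clean.
post_request = urllib.request.Request(
    "https://www.hobolink.com/", data=params.encode("ascii")
)

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.data)
```

In other words, whether a body is supplied decides the method here, so the redirect in this thread may have another cause than GET versus POST.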
 
