Python Logging into a webpage using Python

  • Thread starter: dacruick
  • Tags: Python
The discussion centers on a failed attempt to log into a website from a Python script. The user shares a script that uses the `cookielib`, `urllib`, and `urllib2` libraries to submit credentials, but the request for the final page is redirected back to the login page. Key points include the HTML for the username field, password field, and login button, and the possibility that the script is not actually submitting the login form. A respondent points out that the site's form uses POST and suggests that the script may be sending GET requests instead, which would explain the failed authentication. The respondent also asks whether the user has permission to access the site in this manner.
dacruick
Hi,

I am trying to access some data online, but I am having trouble getting past the actual authentication process.

Code:
import cookielib
import urllib, urllib2

if __name__ == '__main__':
    urlLogin = 'https://www.hobolink.com'

    uid    = 'userid'
    password = 'xxxxxxx'

    fieldId   = 'username'
    fieldPass = 'password'
    
    ButtonId = 'submit'
    Button = 'Log in'

    cj = cookielib.CookieJar()
    data = urllib.urlencode({fieldId:uid, fieldPass:password, ButtonId:Button})

    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

    urllib2.install_opener(opener)
    usock = opener.open(urlLogin)
    usock = opener.open(urlLogin, data)
    #pageSource = usock.read()
    usock.close()
 
    usock = opener.open('FinalWebpage')
    pageSource = usock.read()
    usock.close()
    print(pageSource)

The HTML for the username field, the password field, and the login button is as follows, in that order.

Code:
<input id="username" name="username" type="text">
Code:
<input id="password" name="password" type="password">
Code:
<li id="submit"><input class="button" name="commit" onclick="alertForExplorerBrowserVersion(7, 3);" type="submit" value="Log in"></li>
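
To see which name/value pairs a browser would send for these three inputs, the snippets can be run through Python's standard `html.parser`. This is a minimal sketch in Python 3; `InputCollector` and `form_fields` are hypothetical names introduced for illustration, and only the three inputs quoted above are considered (the real page may contain further fields that are not shown in this thread).

```python
from html.parser import HTMLParser

# The three snippets quoted above, joined into one string.
LOGIN_HTML = """
<input id="username" name="username" type="text">
<input id="password" name="password" type="password">
<li id="submit"><input class="button" name="commit" onclick="alertForExplorerBrowserVersion(7, 3);" type="submit" value="Log in"></li>
"""


class InputCollector(HTMLParser):
    """Collect the name/value pair of every <input> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:  # only inputs with a name are submitted with a form
            self.fields[name] = attributes.get("value") or ""


def form_fields(html):
    """Return a dict mapping each input's name to its default value."""
    parser = InputCollector()
    parser.feed(html)
    return parser.fields


print(form_fields(LOGIN_HTML))
```

The resulting keys are `username`, `password`, and `commit`. The button is submitted under its `name`, which is `commit`; `submit` is only the `id` of the enclosing `<li>` and is never sent to the server.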

When I try to open the page I actually want, I am redirected back to the login page. So one of two things is happening: either nothing is being entered into the username and password fields, or I am not "clicking" the log in button and am instead trying to open a page that I am not authenticated to view.
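
Both possibilities can be checked from code. The following is a rough Python 3 sketch of the same flow (`urllib.request` and `http.cookiejar` are the Python 3 counterparts of `urllib2` and `cookielib`). It rests on several assumptions that the thread does not confirm: that the form posts to the same URL as the login page, that no hidden fields (for example a CSRF token) are required, and that a failed login ends at the login URL again. The function names are hypothetical.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

LOGIN_URL = "https://www.hobolink.com"  # taken from the script above


def build_login_data(uid, password):
    """Encode the form fields as a browser would, keyed by each input's name."""
    fields = {
        "username": uid,
        "password": password,
        # The button's name is "commit"; "submit" is only the id of its <li>.
        "commit": "Log in",
    }
    return urllib.parse.urlencode(fields).encode("ascii")


def make_opener():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def bounced_to_login(final_url, login_url=LOGIN_URL):
    """True if the URL we ended up at is the login page again (assumed signal)."""
    return final_url.rstrip("/") == login_url.rstrip("/")


def fetch_after_login(uid, password, target_url):
    """Log in, then fetch target_url with the same cookies (performs network I/O)."""
    opener, _ = make_opener()
    opener.open(LOGIN_URL).close()  # GET first to pick up any session cookie
    # Passing a body as the second argument makes this request a POST.
    opener.open(LOGIN_URL, build_login_data(uid, password)).close()
    with opener.open(target_url) as response:
        if bounced_to_login(response.geturl()):
            raise RuntimeError("still unauthenticated: redirected to the login page")
        return response.read()
```

Calling `fetch_after_login('userid', 'xxxxxxx', target_url)` with the real page address would either return the page source or raise if the final URL is the login page again. Printing `build_login_data(...)` first is a cheap way to confirm exactly what is being sent.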
 
One thing I notice is that "hobolink" (do you have their permission to use their site in this way..?) uses POST in their form, whereas urllib will produce GET queries.
 
Coin said:
One thing I notice is that "hobolink" (do you have their permission to use their site in this way..?) uses POST in their form, whereas urllib will produce GET queries.

Hmm, I'm pretty new at this, so I'm not sure exactly what the difference is, but that would be consistent with my results so far.
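
On the difference: a GET request asks for a resource and carries any parameters in the URL, while a POST request sends them in the request body, which is what a login form with `method="post"` does. In Python's standard library the method is chosen by whether a body is supplied. The short Python 3 sketch below shows this without touching the network; the Python 2 `urllib2` documentation describes the same rule for its `data` parameter, so the second `opener.open(urlLogin, data)` call in the script above should already be a POST. If so, the GET/POST mismatch is probably not the cause, and the field names being sent are worth checking instead.

```python
import urllib.parse
import urllib.request

URL = "https://www.hobolink.com"  # taken from the script above

# No body: the request would be sent as GET.
plain = urllib.request.Request(URL)

# With a body: the same request would be sent as POST.
body = urllib.parse.urlencode({"username": "userid"}).encode("ascii")
with_body = urllib.request.Request(URL, data=body)

print(plain.get_method())
print(with_body.get_method())
```

Nothing is actually sent here; `get_method()` only reports which HTTP method the request object would use.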
 