How to write a program to retrieve Web data

  • Thread starter: Saladsamurai
  • Tags: Data, Program, Web

Discussion Overview

The discussion revolves around developing a program to retrieve data from a website, specifically focusing on using Python for web scraping. Participants share their experiences and suggest tools and libraries that can facilitate this task.

Discussion Character

  • Exploratory, Technical explanation, Conceptual clarification

Main Points Raised

  • One participant expresses a need for a tool to automate data retrieval from a poorly designed website and seeks guidance on using Python.
  • Another participant mentions that most programming languages, including Python, have tools for reading HTML content from web pages.
  • A participant shares their experience using Beautiful Soup and minidom for parsing the DOM of web pages, noting that the code may need adjusting if the webpage structure changes.
  • Another participant agrees on Python's simplicity for this task and mentions having written a parser without additional libraries, though they cannot provide details at the moment.
  • A later reply humorously summarizes the discussion by referring to "Parsing and python" as the two key elements for their upcoming work.

Areas of Agreement / Disagreement

Participants generally agree that Python is a suitable language for web scraping and that various libraries can assist in the process. However, there are no explicit resolutions regarding the best approach or tools to use.

Contextual Notes

Some limitations include potential changes in webpage structure that may require adjustments to the code, and the varying levels of experience among participants with different libraries and methods.

Who May Find This Useful

Individuals interested in web scraping, particularly those with basic programming skills looking to automate data retrieval from websites.

Saladsamurai
Hello! :smile:

So I have some basic programming skills, but I have never done anything that interacts with the web. Here at work, we have a website that we have to go to in order to check the statuses of all of the jobs we have open. The website is awful in that you cannot run a report on all of the jobs at once. I want to develop a tool that goes to the website, loops through all of the jobs, and pulls the necessary data.

I just need a starting point for now. Is this something I can do with Python? Any thoughts are helpful.

Thanks!
 
Basically every programming language has some tool to read HTML content of web pages. You can pick your favorite one.
It looks easy with Python.
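As a rough starting point, here is a minimal sketch of reading a page's HTML using only Python's standard library. The function name `fetch_html` and the 10-second timeout are my own choices for illustration, and a real internal site may also need a login or cookies, which this does not handle.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text.

    `fetch_html` is a hypothetical helper name, not part of any library.
    """
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

You would call it with the address of the page you want, for example `fetch_html("https://example.com/")`, and then hand the returned string to whatever parser you pick.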
 
I've used Beautiful Soup and minidom (both Python libraries) to do this. What you are really doing is parsing the DOM of the webpage you want: you traverse the document tree, find the div that holds the information, and extract it. If the webpage changes structure, you might have to recode your solution, but it's pretty easy programming.
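To make the "find the div and extract it" idea concrete, here is a small sketch with minidom from the standard library. The markup, the `job` class, and the ids are all made up for illustration; your site's structure will differ. Note that minidom only accepts well-formed XML, so messy real-world HTML is usually a better fit for Beautiful Soup.

```python
from xml.dom import minidom

# Made-up, well-formed markup standing in for a job-status page.
PAGE = """<html><body>
<div id="job-1042" class="job">
  <span class="title">Pump housing</span>
  <span class="status">In progress</span>
</div>
<div id="job-1043" class="job">
  <span class="title">Valve seat</span>
  <span class="status">Shipped</span>
</div>
</body></html>"""


def text_of(node):
    """Join the text nodes directly under an element."""
    parts = [c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE]
    return "".join(parts).strip()


def extract_jobs(markup):
    """Return {job id: {span class: span text}} for each div with class 'job'."""
    dom = minidom.parseString(markup)
    jobs = {}
    for div in dom.getElementsByTagName("div"):
        if div.getAttribute("class") != "job":
            continue
        jobs[div.getAttribute("id")] = {
            span.getAttribute("class"): text_of(span)
            for span in div.getElementsByTagName("span")
        }
    return jobs


print(extract_jobs(PAGE))
```

If the site renames the class or moves the spans, `extract_jobs` is the only place that should need changing, which is the "recode your solution" part.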
 
I agree Python is a simple way to do it. I wrote a similar parser using no additional libraries beyond those installed automatically (and with some ancient Python 2.x version). But I don't have access to the code ATM, so I can't give you details.
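For what a standard-library-only parser might look like today, here is a sketch using `html.parser` in Python 3. This is not the code mentioned above, just one possible approach; it assumes the data sits in an HTML table, and the class name `TableRows` and the sample markup are invented for illustration.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_rows(html):
    """Return the table rows found in an HTML string."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


SAMPLE = (
    "<table><tr><th>Job</th><th>Status</th></tr>"
    "<tr><td>1042</td><td>In progress</td></tr></table>"
)
print(parse_rows(SAMPLE))
```

Unlike minidom, `HTMLParser` does not need well-formed markup, so it copes better with sloppy pages, at the cost of tracking state by hand.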
 
Ok thanks guys! Parsing and python... The two P's... I'll tell my boss I'll be PP'ing all next week!
 
