How to write a program to retrieve Web data

In summary, the conversation discusses using Python to develop a tool that can retrieve data from a website. Participants suggest using libraries such as Beautiful Soup or Minidom to parse the DOM of the webpage and extract the necessary information. The simplicity of Python makes it a suitable choice for this task.
  • #1
Saladsamurai
Hello! :)

So I have some basic programming skills, but I have never done anything that interacts with the web. Here at work, we have a website that we have to go to in order to check the statuses of all of the jobs we have open. The website is awful in that you cannot run a report on all of the jobs at once. I want to develop a tool that goes to the website, loops through all of the jobs, and pulls the necessary data.

I just need a starting point for now. Is this something I can do with Python? Any thoughts are helpful.

Thanks!
 
  • #2
Basically every programming language has some tool to read the HTML content of web pages. You can pick your favorite one.
It looks easy with Python.
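For example, Python's standard library can download a page with `urllib.request`. This is a minimal sketch: the timeout and the User-Agent string are placeholder assumptions, and an internal work site that requires a login would need extra handling that is not shown here.

```python
import urllib.request


def fetch_html(url, timeout=10.0):
    """Download a page and return its body decoded as text."""
    # Some servers reject requests without a User-Agent; this value is a placeholder.
    request = urllib.request.Request(url, headers={"User-Agent": "job-status-script/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch_html("https://example.com/")` returns the page's HTML as a string, ready to hand to a parser.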
 
  • #3
I've used Beautiful Soup and minidom (both Python libraries) to do this. What you are really doing is parsing the DOM of the webpage you want: you traverse the document tree to find the div that holds the information, and extract it. If the webpage changes structure, you might have to recode your solution, but it's pretty easy programming.
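A minimal sketch of the DOM approach with `xml.dom.minidom` from the standard library, assuming the value you want sits in a `<div>` with a known class; the sample markup and the class names are hypothetical. Because minidom is a strict XML parser, it only handles well-formed (XHTML-style) pages; Beautiful Soup is the more forgiving choice for typical real-world HTML.

```python
from xml.dom import minidom


def _text_of(node):
    """Concatenate all text nodes below node, depth first."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(_text_of(child) for child in node.childNodes)


def find_div_text(xhtml, css_class):
    """Return the text of every <div> whose class attribute contains css_class.

    Raises xml.parsers.expat.ExpatError if the markup is not well-formed XML.
    """
    document = minidom.parseString(xhtml)
    results = []
    for div in document.getElementsByTagName("div"):
        # getAttribute returns "" when the attribute is missing.
        classes = div.getAttribute("class").split()
        if css_class in classes:
            results.append(_text_of(div).strip())
    return results
```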
 
  • #4
I agree, Python is a simple way to do it. I wrote a similar parser without any additional libraries, only the ones that come with a standard installation (and with some ancient 2.x Python version). But I don't have access to the code ATM, so I can't tell you details.
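In the same no-extra-libraries spirit, here is a sketch using Python 3's built-in `html.parser` (not the 2.x code the post refers to). It assumes the job data is laid out in an HTML table with explicitly closed `<tr>`, `<td>` and `<th>` tags; the sample markup is invented.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row.

    Assumes rows and cells are explicitly closed in the markup.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)


def parse_table_rows(html):
    """Return the table cells in html as a list of rows."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```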
 
  • #5
Ok thanks guys! Parsing and python... The two P's... I'll tell my boss I'll be PP'ing all next week!
 

1. How do I write a program to retrieve Web data?

To write a program to retrieve Web data, you will need a programming language such as Python, Java, or JavaScript. The program typically requests each page over HTTP, parses the returned HTML, and extracts the fields it needs, so a basic understanding of HTML and CSS (CSS selectors are a common way to locate elements on a page) is needed. There are many tutorials and resources available online to guide you through the process.
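For the original question (checking every open job), the overall shape might look like the sketch below: build the URL for each job, download the page, extract the field, and write a report. It assumes each job has its own page at the hypothetical `URL_TEMPLATE`; `fetch` and `extract` are placeholders for a downloader (for example one built on urllib) and a parser (for example Beautiful Soup or minidom).

```python
import csv
import io
import time

# Hypothetical URL pattern; the real site's addresses will differ.
URL_TEMPLATE = "https://jobs.example.com/status?job={job_id}"


def collect_statuses(job_ids, fetch, extract, delay=1.0):
    """Fetch one page per job and return {job_id: extracted value}.

    fetch(url) must return the page HTML; extract(html) must return the value.
    """
    results = {}
    for job_id in job_ids:
        html = fetch(URL_TEMPLATE.format(job_id=job_id))
        results[job_id] = extract(html)
        time.sleep(delay)  # pause between requests to go easy on the server
    return results


def to_csv(results):
    """Render the collected results as a two-column CSV report."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["job_id", "status"])
    for job_id, status in results.items():
        writer.writerow([job_id, status])
    return buffer.getvalue()
```

Passing `fetch` and `extract` in as arguments keeps the loop testable without a network connection and lets you swap parsers if the site's layout changes.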

2. What is Web scraping and how does it relate to retrieving Web data?

Web scraping is the process of extracting data from websites. It involves using a program or script to automatically access and grab information from web pages. Web scraping is often used as a method for retrieving Web data, as it allows for large amounts of data to be collected quickly and efficiently.

3. Can I retrieve data from any website with my program?

It is possible to retrieve data from most websites, but it is important to check the website's terms of service and robots.txt file before doing so. Some websites may have restrictions or explicitly prohibit web scraping. It is always best to obtain permission from the website owner before retrieving data.
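Python's standard library can check a URL against robots.txt rules with `urllib.robotparser`. A minimal sketch; the rules text and URLs are made-up examples, and against a live site you would load the real file with `RobotFileParser.set_url(...)` followed by `read()`. Note that robots.txt only states the site's crawling preferences; it does not replace reading the terms of service.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Check whether url may be fetched under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```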

4. How can I ensure the accuracy and legality of the retrieved Web data?

Accuracy is mostly a matter of validation: check the extracted data for missing values, errors, and inconsistencies (for example, by spot-checking a few records against the live page), and verify its source. Legality is a separate question: make sure that collecting and using the data does not violate the site's terms of service or any copyright or privacy laws.
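For the accuracy side, a sketch of per-record checks on scraped data; empty or unexpected values often mean the page layout changed and the parser needs updating. The field names and allowed statuses are invented for illustration.

```python
# Hypothetical field names and allowed values; adjust to the real report.
REQUIRED_FIELDS = ("job_id", "status")
ALLOWED_STATUSES = {"Open", "On Hold", "Closed"}


def validate_record(record):
    """Return a list of problems found in one scraped record (empty if OK)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(record.get(field, "")).strip():
            problems.append(f"missing {field}")
    job_id = str(record.get("job_id", "")).strip()
    if job_id and not job_id.isdigit():
        problems.append(f"job_id {job_id!r} is not numeric")
    status = record.get("status")
    if status and status not in ALLOWED_STATUSES:
        problems.append(f"unexpected status {status!r}")
    return problems
```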

5. Are there any tools or libraries that can assist with retrieving Web data?

Yes, there are many tools and libraries available that can assist with retrieving Web data. Popular ones include Beautiful Soup (parsing HTML), Scrapy (a framework for larger crawls), and Selenium (driving a real browser, useful when a page is built by JavaScript) for web scraping, and Requests and urllib for making HTTP requests. It is important to research and choose the best tool for your specific needs and programming language.
