Tool to read contents of a web page

  • Thread starter: sganesh88
  • Tags: Web

Discussion Overview

The discussion revolves around finding a tool or method to automate the retrieval of specific data from a web page, particularly data displayed in a table format related to server memory usage. Participants explore various programming languages and tools that could facilitate this task.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant inquires about tools for reading HTML pages to automate data retrieval for calculations.
  • Another suggests using Perl or JavaScript, noting that the desired data may be hidden within HTML formatting tags.
  • A different participant proposes using Python for parsing the HTML, mentioning that simpler tools like grep, sed, or awk could also be effective if the layout is straightforward.
  • One response recommends using "wget" to download the HTML page before applying other suggested methods.
  • A participant shares a link to a Python article that discusses using urllib2 for web data retrieval.
  • Another participant mentions a PHP function that can read HTML line-by-line and suggests using a refresh script for periodic data retrieval.

Areas of Agreement / Disagreement

Participants present multiple approaches and tools without reaching a consensus on a single solution. Various programming languages and methods are proposed, indicating a range of opinions on the best approach.

Contextual Notes

Some suggestions depend on the specific structure of the HTML page, and the effectiveness of the proposed methods may vary based on the complexity of the data layout.

sganesh88
I need a tool which can read some specific data from a website.
Based on the data retrieved, I will make some calculations.
Is there a tool that can read an HTML page and retrieve some values displayed on it?
The data are shown in table format on that website.
The website is a probe that displays memory usage of a server.
We take statistics from this page and make calculations such as averages.
This task is currently done manually every 30 minutes.
We need to automate it.
 
I'm not familiar offhand with any programs that are specifically made to do this. Usually Perl or JavaScript gets put into service for this kind of thing, especially since the desired data tends to be buried in the page's formatting tags. You might search for HTML parsers to see if there are any helper functions you could use.
 
You can parse the HTML page using something like Python. If it's a reasonably straightforward layout, then you can use even simpler tools like grep, sed, or awk.

Just have a look at the HTML page in a text editor, see if the location of the table is easy to find (near its title, for instance), then write a small script to retrieve the required values.

If you're on Windows, then I suppose you could use Visual Basic or even a VBA macro in Office, but it's easier on Linux/Unix.
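A minimal sketch of the Python route above, using only the standard library's html.parser; the table contents and statistic names here are made up for illustration, since the real probe page isn't shown in the thread:

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collects the text of every <td> cell, grouped by table row."""
    def __init__(self):
        super().__init__()
        self.rows = []        # completed rows, each a list of cell strings
        self._row = None      # cells of the row currently being read
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that appears inside a <td> of an open row.
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())

# Stand-in for the probe page's memory table.
html = """
<table>
  <tr><td>used</td><td>512</td></tr>
  <tr><td>free</td><td>1536</td></tr>
</table>
"""
parser = TableExtractor()
parser.feed(html)
stats = {name: int(value) for name, value in parser.rows}
print(stats)  # {'used': 512, 'free': 1536}
```

From the `stats` dictionary you can then compute whatever running averages you need.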
 
You can use "wget" to get the HTML page.
wget http://thepage.com
After that, follow usualname's suggestions.
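The download step can also be done from Python itself, which is what the urllib2 link in the summary is about (urllib2 is Python 2; the Python 3 equivalent is urllib.request). A small sketch, using a data: URL as a stand-in because the real probe address isn't given in the thread:

```python
import urllib.request

def fetch_page(url):
    """Download a page and return its body decoded as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")

# A data: URL stands in for http://thepage.com here so the
# sketch runs without network access; urlopen handles it too.
body = fetch_page("data:,used%20512")
print(body)  # used 512
```

In a real run you would pass the probe's URL and feed the returned text to whichever parser you chose.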
 
There is a PHP function which can read the contents of a page's HTML and put each line into an array. You could read the HTML line-by-line, searching for keywords. It could all be done in PHP, with a refresh script to perform it at each interval.

There are other options as well.
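The "run it every interval" part of the idea above can be sketched in Python as a simple polling loop; fetch and handle are stand-in callables here, and the thread's 30-minute interval would be 1800 seconds in a real deployment:

```python
import time

def poll(fetch, handle, interval_seconds, iterations):
    """Call fetch(), pass its result to handle(), sleep, repeat.

    A real deployment might loop forever instead of a fixed
    iteration count; the count here keeps the demo finite.
    """
    results = []
    for _ in range(iterations):
        results.append(handle(fetch()))
        time.sleep(interval_seconds)
    return results

# Demo with stand-in callables and a zero-second interval.
# A real run might look like:
#   poll(lambda: fetch_the_probe_page(), parse_stats, 30 * 60, ...)
samples = poll(fetch=lambda: "512", handle=int,
               interval_seconds=0, iterations=3)
print(sum(samples) / len(samples))  # 512.0
```

For unattended use, a cron entry running the script every 30 minutes would do the same job without keeping a process alive.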
 