
Tool to read contents of a web page

  1. Jul 7, 2010 #1
    I need a tool that can read specific data from a website.
    I will run some calculations on the retrieved data.
    Is there a tool that can read an HTML page and retrieve some of the values displayed on it?
    The data is shown in table format on that website.
    The website is a probe that displays the memory usage of a server.
    We take statistics from this page and run calculations on them, such as averages.
    This is currently done manually every 30 minutes.
    We need to automate it.
  3. Jul 7, 2010 #2
    I'm not familiar offhand with any programs made specifically for this. Perl or JavaScript usually gets pressed into service for this kind of thing, especially since the desired data is buried in the page's formatting tags. You might search for HTML parsers to see whether there are helper functions you could use.
  4. Jul 8, 2010 #3
    You can parse the HTML page using something like Python. If the layout is reasonably straightforward, you can use even simpler tools like grep, sed or awk.

    Just look at the HTML page in a text editor, see whether the table is easy to find (near its title, for instance), then write a small script to retrieve the required values.

    If you're on Windows then I suppose you could use Visual Basic or even a VBA macro in Office, but it's easier on Linux/Unix.
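
A minimal sketch of the parsing step described in post #3, using only Python's standard library. The sample table, the column index, and the names TableExtractor and column_average are assumptions made for illustration; the real probe page may be laid out differently.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed


def column_average(html, column):
    """Average the numeric values in one column, skipping non-numeric cells."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    values = []
    for row in parser.rows:
        if column < len(row):
            try:
                values.append(float(row[column]))
            except ValueError:
                pass  # header or other non-numeric cell
    if not values:
        raise ValueError("no numeric values found in column %d" % column)
    return sum(values) / len(values)


# Hypothetical sample standing in for the probe page.
SAMPLE = """
<table>
  <tr><th>Process</th><th>Memory (MB)</th></tr>
  <tr><td>db</td><td>512</td></tr>
  <tr><td>web</td><td>256</td></tr>
</table>
"""

print(column_average(SAMPLE, 1))  # prints 384.0
```

A real parser class is a little longer than a grep one-liner, but it keeps working when the page's whitespace or attribute order changes.
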
  5. Jul 8, 2010 #4
    You can use "wget" to get the html page.
    wget http://thepage.com
    After that, follow usualname's suggestions.
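
If the rest of the script is written in Python anyway, the download step can be done without wget. This is a rough sketch with the standard library's urllib.request; the function name fetch_html, the 10-second timeout, and the UTF-8 fallback are assumptions, and http://thepage.com is only the placeholder address from the post above.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download a page and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset declared by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_html("http://thepage.com") would then return the HTML as a string that can be handed to whatever parsing step comes next.
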
  6. Jul 14, 2010 #5
  7. Jul 14, 2010 #6


    There is a PHP function, file(), that reads the contents of a page's HTML and puts each line into an array. You could go through the HTML line by line, searching for keywords. It could all be done in PHP, with a refresh script to run it at each interval.

    There are other options as well.
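
The line-by-line keyword search from post #6 could look roughly like this. It is written in Python rather than PHP to stay consistent with the other sketches in this thread. The keyword "Used memory", the sample lines, and the function names are assumptions; the 1800-second default matches the 30-minute interval from the original question.

```python
import re
import time

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def values_for_keyword(html, keyword):
    """Return the first number following `keyword` on each matching line."""
    found = []
    for line in html.splitlines():
        if keyword in line:
            # Search only the text after the keyword so that digits in
            # preceding markup are not picked up by mistake.
            tail = line.split(keyword, 1)[1]
            match = NUMBER.search(tail)
            if match:
                found.append(float(match.group()))
    return found


def poll(fetch, keyword, interval_s=1800, rounds=1, sleep=time.sleep):
    """Call `fetch()` once per round and record the average of the matches."""
    averages = []
    for i in range(rounds):
        values = values_for_keyword(fetch(), keyword)
        if values:
            averages.append(sum(values) / len(values))
        if i < rounds - 1:
            sleep(interval_s)  # wait before the next reading
    return averages


# Hypothetical lines standing in for the probe page.
SAMPLE = "\n".join([
    "<tr><td>Used memory</td><td>512</td></tr>",
    "<tr><td>Free memory</td><td>256</td></tr>",
    "<tr><td>Used memory</td><td>128</td></tr>",
])

print(values_for_keyword(SAMPLE, "Used memory"))  # prints [512.0, 128.0]
```

Here fetch is any function that returns the page's HTML as a string. A scheduler such as cron could replace the sleep loop if the script should not stay running between readings.
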