Curious about PF robots

  1. Dec 30, 2016 #1


    Staff: Mentor

    The right column of the PF home page says "Robots 221, members 62, guests 1266" under "members online now". How interesting.

    Does a web crawler count as a robot?
    How do you detect robots as opposed to guests?
    What are those robots doing, and what motivates those who send them?
    Does PF send robots to monitor other sites?

    The number of guests is also remarkable. PF members need to be aware of that, especially in controversial or dangerous threads. About 95% of the people viewing PF are silent and not identifiable.
  3. Dec 30, 2016 #2



    Staff: Mentor

    What do you think we mentors are? Humans???
  4. Dec 30, 2016 #3


    2017 Award

    Staff: Mentor

    Not Greg, but here are some answers:

    Web crawlers count as robots, provided they identify themselves as robots (otherwise they are counted as guests). They crawl the forums for search engines and similar tools.
    PF doesn't operate search engines (outside the forums) or anything like that, so it has no need to crawl other websites.

    Most visitors are guests, yes. Most of them come from search engines.
  5. Dec 30, 2016 #4



    Staff: Mentor

    Polite robots identify themselves in the User-Agent field of their requests to web servers. For example, here are some requests for the home page of my web site, from my server log:
    Code (Text):
    - - [27/Dec/2016:17:44:52 -0500] "GET / HTTP/1.1" 200 4442 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
    - - [27/Dec/2016:18:14:05 -0500] "GET / HTTP/1.1" 200 4442 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
    - - [27/Dec/2016:20:27:54 -0500] "GET / HTTP/1.1" 200 4442 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    - - [27/Dec/2016:22:47:29 -0500] "GET / HTTP/1.1" 304 - "-" "Mozilla/5.0 (compatible; SeznamBot/3.2; +http://napoveda.seznam.cz/en/seznambot-intro/)"
    The ones shown above are from search engines. There are also companies that crawl the web and collect statistics that they sell to website owners, e.g. statistics about who links to your site and what your site links to. There is at least one site (archive.org) that crawls the web to maintain a historical archive, where you can look up what a website looked like in the past.
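    For anyone curious how a server could separate "robots" from "guests" using the User-Agent field, here is a minimal Python sketch. It assumes log lines in the layout shown in the server log, where the User-Agent is the last quoted field. The marker list and the function names are made up for illustration; this is not how PF's forum software actually decides, and a robot that does not identify itself would still be counted as a guest.

```python
import re

# Hypothetical substrings treated as signs of a self-identified robot.
# This list is an assumption for illustration, not PF's actual rule.
BOT_MARKERS = ("bot", "crawler", "spider")

# In this log layout the User-Agent is the last double-quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def user_agent(log_line):
    """Return the User-Agent field of a log line, or None if absent."""
    match = USER_AGENT_RE.search(log_line)
    return match.group(1) if match else None


def classify(log_line):
    """Label a request 'robot' if its User-Agent self-identifies, else 'guest'."""
    agent = user_agent(log_line)
    if agent and any(marker in agent.lower() for marker in BOT_MARKERS):
        return "robot"
    return "guest"


sample_lines = [
    '- - [27/Dec/2016:17:44:52 -0500] "GET / HTTP/1.1" 200 4442 "-" '
    '"Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"',
    '- - [27/Dec/2016:20:27:54 -0500] "GET / HTTP/1.1" 200 4442 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    # Made-up browser request, added for contrast; not from the original log.
    '- - [27/Dec/2016:21:00:00 -0500] "GET / HTTP/1.1" 200 4442 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; rv:50.0) Gecko/20100101 Firefox/50.0"',
]

for line in sample_lines:
    print(classify(line), "<-", user_agent(line))
```

    With these samples, the two crawler requests are labeled as robots and the made-up browser request as a guest. Substring matching is crude, so a real site would probably rely on a maintained list of known crawler names instead.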
    Last edited by a moderator: May 8, 2017
  6. Dec 31, 2016 #5


    Staff Emeritus
    Science Advisor

    I'm one-quarter lawn gnome on my mother's side.