Over 3,000 guests and 100 members

  • Thread starter: flyingpig
AI Thread Summary
The discussion highlights the disparity between the high number of guests viewing the forum and the low membership count, with only 100 active members. A significant portion of the guest traffic is attributed to web crawlers or "spiders," which are automated programs that index web content for search engines. These spiders can originate from various sources, including commercial entities and educational projects, and often appear as regular visitors. The conversation also touches on the role of web crawlers in maintaining up-to-date data for search engines and the importance of PageRank in determining the relevance of web pages. Overall, the presence of these crawlers complicates the understanding of genuine user engagement on the forum.
flyingpig
What the heck? There are 3,000 people viewing us and not even a tenth of those people are members. We have 100 members active. What are those 3,000 people waiting for?!
 
A lot of them are spiders. Ever since Google changed the way search engines rank web sites the web has been crawling with little critters. Some of these spiders emanate from commercial engines, some are written by students just to learn how to create a web crawler, some are commercial entities crawling the web for their own nefarious purposes. Some people who have strong interest in some topic will write their own crawlers rather than relying on search engines.
 
D H said:
A lot of them are spiders. Ever since Google changed the way search engines rank web sites the web has been crawling with little critters. Some of these spiders emanate from commercial engines, some are written by students just to learn how to create a web crawler, some are commercial entities crawling the web for their own nefarious purposes. Some people who have strong interest in some topic will write their own crawlers rather than relying on search engines.
These all show up as "guests", which are supposed to be real people. Unless the reporting has changed, the spiders show up as such and will say "yahoo", "google", etc.

It does seem odd that lately I haven't seen any spiders.
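For anyone curious how a visitor list could label spiders this way, here is a minimal sketch, assuming the labelling is based on the User-Agent string each visitor sends. The function name `classify_visitor` and the marker table are hypothetical, not how the forum software actually does it, and a crawler that does not announce itself would still be counted as a guest.

```python
# Hypothetical marker table: substrings that well-known crawlers put in
# their User-Agent header, mapped to the label shown in the visitor list.
KNOWN_SPIDER_MARKERS = {
    "googlebot": "google",
    "slurp": "yahoo",   # Yahoo's crawler identifies itself as "Yahoo! Slurp"
    "bingbot": "bing",
}


def classify_visitor(user_agent: str) -> str:
    """Return a spider label if the User-Agent has a known marker, else 'guest'."""
    ua = user_agent.lower()
    for marker, label in KNOWN_SPIDER_MARKERS.items():
        if marker in ua:
            return label
    # Anything unrecognised (real people, but also unannounced bots) is a guest.
    return "guest"
```

This would explain both observations in the thread: announced spiders get a name, while home-made crawlers that send a browser-like User-Agent inflate the guest count.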
 
PF also has a lot of solved problems in its database. So some people just Google a phrase and look at the solved problem at PF. I don't really like this, but I guess that should account for some visitors.
 
Most are real guests. They pop in from random Google searches.
 
What are the spiders you guys are referring to? I don't want to be in the dark.
 
flyingpig said:
What's the purpose of crawling?

Read the wiki.

Many sites, in particular search engines, use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine that will index the downloaded pages to provide fast searches. Crawlers can also be used for automating maintenance tasks on a Web site, such as checking links or validating HTML code. Also, crawlers can be used to gather specific types of information from Web pages, such as harvesting e-mail addresses (usually for sending spam).
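To make the "checking links" part concrete, here is a minimal sketch of the one step every crawler has in common: pulling the links out of a page it has already downloaded. It uses only Python's standard library and does no network access. The names `LinkCollector` and `extract_links` and the example URLs are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute ones.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs linked from an HTML document, in page order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

A real crawler would put these links into a queue, fetch each one, and repeat, while a link checker would simply request each URL and report the ones that fail.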
 
flyingpig said:
What's the purpose of crawling?
To find web pages. Crawling took on much greater importance with PageRank (google that term), where the "Page" in PageRank stands for Larry Page, not web page. Page's PhD research was about a new kind of search engine that used a technique similar to how librarians and academicians decide which are the most important journal papers. It's a fairly simple concept: Count the number of times a paper is referenced by some other paper.
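The counting idea in the last sentence can be sketched in a few lines. This is only the simple citation count described above, not PageRank itself, and the function name `count_citations` and the paper names are hypothetical.

```python
from collections import Counter


def count_citations(references):
    """Count how many other papers cite each paper.

    `references` maps each paper to the list of papers it cites.
    """
    counts = Counter()
    for paper, cited in references.items():
        for target in set(cited):   # a citing paper counts once per target
            if target != paper:     # ignore self-citations
                counts[target] += 1
    return counts
```

For example, with `{"A": ["C"], "B": ["C", "A"], "C": []}`, paper C is cited by two papers, A by one, and B by none, so C would be judged the most important.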

Now think of the web. Suppose you are the author of a sports blog and you write an article about AC Milan (I'm watching AC Milan v Barcelona right now). In this article you happen to link to AC Milan's home page. Lots of other people will do the same, in various contexts. In fact, if you want to know about AC Milan the best place to go is to AC Milan's home page. Because so many people link to this page in reference to discussions of AC Milan, PageRank will quickly find that this is the place to go for info on AC Milan.
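Here is a rough sketch of that idea on a toy web, assuming the commonly described simplified form of PageRank: each page splits its score evenly among the pages it links to, with a damping factor of 0.85. The function name, the page names, and the handling of pages with no outgoing links are illustrative choices, not Google's actual implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of scores.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline score each round.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy web: three hypothetical blogs all link to the club's home page.
toy_web = {
    "blog1": ["acmilan"],
    "blog2": ["acmilan"],
    "blog3": ["acmilan", "blog1"],
    "acmilan": [],
}
ranks = pagerank(toy_web)
best_page = max(ranks, key=ranks.get)
```

In this toy web `best_page` comes out as the home page, since it collects score from every blog. Unlike a plain link count, a link from a highly ranked page is worth more than a link from an obscure one.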

This is part of the reason why you rarely need to go to page 13 when you do a Google search. Google's goal is to make the page that you think is the best source of information on topic X the very first page they list in their search (the very first page they list after the paid ads that relate to topic X, that is).
 