How to make a search engine in Java?

  1. May 1, 2013 #1
    I'm looking for sites that teach how to develop a simple web-based search engine in Java, one that demonstrates the main features of a search engine (web crawling, indexing, and ranking) and how they interact. :confused:
     
  3. May 1, 2013 #2
    I'd suggest taking the CS101 course over at Udacity, because it teaches exactly what you want to know: how to build a search engine. The course uses Python, but the code is not complex and you could easily adapt it to Java.

    One thing to be aware of: at the end of the course you don't so much have a working search engine as all the components that one requires. They don't give you a complete working program because running a search engine involves a fair amount of web etiquette. A crawler has the power to hit web servers with thousands upon thousands of requests, so before you unleash yours on the world, you want to make sure it behaves courteously, particularly in the testing phase.
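
    As one possible illustration of that etiquette, here is a minimal sketch of a per-host delay in Java. It is only an assumption about what "courteous" could mean in code, not something taken from the course: the class name `PolitenessGate`, the method `reserve`, and the idea of a fixed minimum gap per host are all hypothetical choices made for this example.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: spaces out requests to the same host.
final class PolitenessGate {
    private final long minDelayMillis;
    // host -> earliest time (in ms) at which the next request may start
    private final Map<String, Long> nextAllowed = new HashMap<>();

    PolitenessGate(long minDelayMillis) {
        this.minDelayMillis = minDelayMillis;
    }

    /**
     * Returns how many milliseconds the caller should wait before
     * requesting the given URL, and reserves that time slot for the host.
     * Relative URLs have no host and all share one slot (a null key).
     */
    long reserve(String url, long nowMillis) {
        String host = URI.create(url).getHost();
        long earliest = nextAllowed.getOrDefault(host, nowMillis);
        long start = Math.max(earliest, nowMillis);
        nextAllowed.put(host, start + minDelayMillis);
        return start - nowMillis;
    }
}
```

    A crawler could call `Thread.sleep(gate.reserve(url, System.currentTimeMillis()))` before each request. The time is passed in as a parameter so the logic can be tried out without actually waiting.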

    The actual code for web crawling and indexing involves making a request to some seed page, getting the HTML back, parsing the HTML for links, and then recursively following those links and parsing the new HTML for more links, until you run out of room in your index or run out of links.
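
    The crawl-and-index loop described above can be sketched in Java roughly as follows. This is a sketch under several assumptions rather than the course's code: the names `MiniCrawler`, `extractLinks`, `addToIndex`, and `crawl` are made up for the example, links are found with a naive regex that only handles double-quoted `href` attributes, and the pages come from an in-memory map instead of real HTTP requests so nothing touches the network. It also uses a queue instead of recursion, which visits the same pages but avoids deep call stacks.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MiniCrawler {
    // Naive patterns: real HTML needs a proper parser.
    private static final Pattern LINK = Pattern.compile("href=\"([^\"]+)\"");
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // word -> URLs of the pages that contain that word
    final Map<String, Set<String>> index = new HashMap<>();
    // URL -> links found on that page (useful later for ranking)
    final Map<String, List<String>> graph = new HashMap<>();

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    void addToIndex(String url, String html) {
        // Strip tags, lowercase, and split on non-word characters.
        String text = TAG.matcher(html).replaceAll(" ").toLowerCase(Locale.ROOT);
        for (String word : text.split("\\W+")) {
            if (!word.isEmpty()) {
                index.computeIfAbsent(word, k -> new LinkedHashSet<>()).add(url);
            }
        }
    }

    // fetch maps a URL to its HTML, or null if the page is unavailable.
    void crawl(String seed, int maxPages, Function<String, String> fetch) {
        Deque<String> toVisit = new ArrayDeque<>();
        Set<String> seen = new LinkedHashSet<>();
        toVisit.add(seed);
        seen.add(seed);
        // Stop when the index is "full" or the links run out.
        while (!toVisit.isEmpty() && graph.size() < maxPages) {
            String url = toVisit.poll();
            String html = fetch.apply(url);
            if (html == null) {
                continue;
            }
            addToIndex(url, html);
            List<String> links = extractLinks(html);
            graph.put(url, links);
            for (String link : links) {
                if (seen.add(link)) { // queue each URL only once
                    toVisit.add(link);
                }
            }
        }
    }

    public static void main(String[] args) {
        // A tiny fake "web" so the example runs offline.
        Map<String, String> web = new HashMap<>();
        web.put("a", "<a href=\"b\">cats</a> and <a href=\"c\">dogs</a>");
        web.put("b", "<p>cats only</p> <a href=\"a\">back</a>");
        web.put("c", "<p>dogs only</p>");

        MiniCrawler crawler = new MiniCrawler();
        crawler.crawl("a", 10, web::get);
        System.out.println(crawler.index.get("cats"));
        System.out.println(crawler.graph);
    }
}
```

    With the three fake pages in `main`, a lookup for `cats` finds pages `a` and `b`. To crawl real sites you would swap `web::get` for a function that performs an HTTP request, and that is the point where the etiquette concerns start to matter.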

    Ranking can be done in many ways. Google's PageRank algorithm is fairly well documented around the web. Roughly, the rank of a page is a measure of how many other pages link to it and of how highly ranked those pages are. A link from a high-ranked page raises your page's rank more than a link from a low-ranked page does.
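
    To make the ranking idea concrete, here is a simplified iterative version in Java. Treat it as a sketch under assumptions, not Google's actual implementation: the class `SimpleRank` and its `rank` method are hypothetical names, the damping factor of 0.85 is just the commonly quoted value, and pages with no outgoing links are simply skipped (so some rank leaks away in that case).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SimpleRank {
    /**
     * graph maps each page to the pages it links to.
     * Each round, a page splits its damped rank evenly among its links.
     */
    static Map<String, Double> rank(Map<String, List<String>> graph,
                                    double damping, int iterations) {
        int n = graph.size();
        Map<String, Double> ranks = new HashMap<>();
        for (String page : graph.keySet()) {
            ranks.put(page, 1.0 / n); // start with equal rank everywhere
        }
        for (int i = 0; i < iterations; i++) {
            Map<String, Double> next = new HashMap<>();
            for (String page : graph.keySet()) {
                next.put(page, (1 - damping) / n); // baseline share
            }
            for (Map.Entry<String, List<String>> e : graph.entrySet()) {
                List<String> out = e.getValue();
                if (out.isEmpty()) {
                    continue; // simplification: dangling pages pass on nothing
                }
                double share = damping * ranks.get(e.getKey()) / out.size();
                for (String target : out) {
                    // Links to pages outside the graph are ignored.
                    next.computeIfPresent(target, (k, v) -> v + share);
                }
            }
            ranks = next;
        }
        return ranks;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("a", Collections.singletonList("c"));
        graph.put("c", Collections.singletonList("a"));
        graph.put("b", Collections.singletonList("d"));
        graph.put("d", Collections.singletonList("c"));
        System.out.println(rank(graph, 0.85, 50));
    }
}
```

    In the small graph in `main`, pages `a` and `d` each have exactly one incoming link, but `a` is linked from the well-connected page `c` while `d` is linked from `b`, which nothing links to. After the iterations settle, `a` ends up ranked well above `d`, which is the effect described above.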
     
  4. May 2, 2013 #3
    Thank you :smile:
     