Java: How to make a search engine in Java?

AI Thread Summary
To develop a simple web-based search engine focusing on web crawling, indexing, and ranking in Java, it's recommended to take the CS101 course on Udacity. The course covers the essential components needed to build a search engine; although it uses Python, the concepts can easily be adapted to Java. Note that the course does not hand you a complete working search engine, because web crawling involves etiquette: requests to servers must be handled carefully to avoid overwhelming them. The crawling process itself consists of requesting seed pages, parsing the HTML for links, and recursively following those links. For ranking, understanding Google's algorithm is helpful: it emphasizes how many pages link to a page and the quality of those linking pages in determining its rank.
Todee
I want some sites that can teach me how to develop a simple web-based search engine, one that demonstrates the main features of a search engine (web crawling, indexing, and ranking) and the interaction between them.
Using Java :confused:
 
I'm going to suggest that you take the CS101 course over at Udacity, because it will teach you exactly what you want to know: how to build a search engine. They teach it using Python, but the code is not complex and you could easily adapt it to Java.

One thing to be aware of: at the end of the course you don't so much have a working search engine as you have all the components that are required. The reason they don't give you a working program is that a search engine involves a fair amount of web etiquette. You have the power to hit web servers with thousands upon thousands of requests, and before you unleash yours onto the world, you want to make sure you are acting courteously, particularly in the testing phase.
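As one small example of that etiquette, a crawler can at least check a site's robots.txt before fetching a page. The sketch below is only a rough illustration, not a real robots.txt parser (it ignores user-agent sections and wildcards), and the class and method names are just placeholders.

Code:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Very rough politeness check: skip a URL if the site's robots.txt disallows its path.
public class RobotsCheck {
    public static boolean isAllowed(String baseUrl, String path) {
        try {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest req = HttpRequest.newBuilder(URI.create(baseUrl + "/robots.txt")).GET().build();
            String robots = client.send(req, HttpResponse.BodyHandlers.ofString()).body();
            for (String line : robots.split("\n")) {
                line = line.trim();
                if (line.startsWith("Disallow:")) {
                    String rule = line.substring("Disallow:".length()).trim();
                    if (!rule.isEmpty() && path.startsWith(rule)) return false;
                }
            }
            return true;
        } catch (Exception e) {
            return true; // no robots.txt, or the fetch failed: assume allowed
        }
    }
}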

The actual code for web crawling and indexing involves making a request to some seed page, getting the HTML back, parsing the HTML for links, and then recursively following those links and parsing the new HTML for more links, until you run out of room in your index or you run out of links.
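Here is a minimal sketch of that crawl-and-index loop in Java, using only the standard library (java.net.http, available since Java 11). The class name, the regex-based link extraction, the page limit, and the one-second delay are just illustrative choices; a real crawler would use a proper HTML parser and the kind of robots.txt check shown above.

Code:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.*;
import java.util.regex.*;

// Minimal crawl-and-index sketch.
public class SimpleCrawler {
    private final HttpClient client = HttpClient.newHttpClient();
    private final Set<String> visited = new HashSet<>();
    // Inverted index: word -> pages that contain it.
    private final Map<String, Set<String>> index = new HashMap<>();
    private static final Pattern LINK = Pattern.compile("href=\"(http[^\"]+)\"");

    public void crawl(String seedUrl, int maxPages) throws Exception {
        Deque<String> toVisit = new ArrayDeque<>();
        toVisit.add(seedUrl);
        while (!toVisit.isEmpty() && visited.size() < maxPages) {
            String page = toVisit.poll();
            if (!visited.add(page)) continue;          // skip pages we've already seen
            String html = fetch(page);
            addToIndex(page, html);
            Matcher m = LINK.matcher(html);
            while (m.find()) toVisit.add(m.group(1));  // follow outgoing links
            Thread.sleep(1000);                        // be polite: one request per second
        }
    }

    private String fetch(String url) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    private void addToIndex(String url, String html) {
        // Crude tokenization: strip tags, lowercase, split on non-letters.
        String text = html.replaceAll("<[^>]*>", " ").toLowerCase();
        for (String word : text.split("[^a-z]+")) {
            if (!word.isEmpty())
                index.computeIfAbsent(word, k -> new HashSet<>()).add(url);
        }
    }

    public Set<String> lookup(String word) {
        return index.getOrDefault(word.toLowerCase(), Set.of());
    }
}

The index map here is the simplest possible inverted index: for each word, the set of pages containing it, which is what the search side of the engine would query.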

Ranking can be done in many ways. Google's algorithm is fairly well documented around the web. It basically says that, for any page, the rank is a measure of how many other pages link to that page and how highly ranked those linking pages are themselves. A high-ranked page linking to your page increases your rank by a larger factor than a low-ranked page linking to your page.
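As a rough illustration of that idea, here is a simplified PageRank-style iteration, assuming you recorded each crawled page's outgoing links during the crawl. The damping factor of 0.85 and the fixed iteration count are conventional choices for this kind of sketch, not something from the thread.

Code:
import java.util.*;

// Simplified PageRank: rank flows from a page to the pages it links to.
public class SimpleRank {
    // links: page -> pages it links to (collected while crawling).
    public static Map<String, Double> rank(Map<String, Set<String>> links, int iterations) {
        double d = 0.85;                               // conventional damping factor
        int n = links.size();
        Map<String, Double> ranks = new HashMap<>();
        for (String page : links.keySet()) ranks.put(page, 1.0 / n);   // start uniform

        for (int i = 0; i < iterations; i++) {
            Map<String, Double> next = new HashMap<>();
            for (String page : links.keySet()) next.put(page, (1 - d) / n);
            for (Map.Entry<String, Set<String>> e : links.entrySet()) {
                Set<String> outgoing = e.getValue();
                if (outgoing.isEmpty()) continue;
                // Each page splits its current rank evenly among its outgoing links.
                double share = d * ranks.get(e.getKey()) / outgoing.size();
                for (String target : outgoing) {
                    if (!next.containsKey(target)) continue;  // only rank pages we crawled
                    // A link from a high-ranked page contributes more.
                    next.merge(target, share, Double::sum);
                }
            }
            ranks = next;
        }
        return ranks;
    }
}

A few dozen iterations are normally enough for the scores to settle; the search front end would then sort the pages matching a query by these scores.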
 
thank you :smile:
 