Lucene Indexing: Clearing Confusion

  • Context: MHB
  • Thread starter: shivajikobardan
  • Tags: Confusion
SUMMARY

The discussion clarifies the indexing process in Apache Lucene, specifically addressing confusion around the first and third steps of the indexing architecture. The IndexWriter is a crucial component that facilitates the addition, updating, and deletion of documents within the Lucene index. The architecture diagram presented serves to illustrate the roles of various components in the indexing process, rather than depicting a sequential flow. Understanding these steps is essential for effectively utilizing Lucene for document indexing and search functionalities.

PREREQUISITES
  • Familiarity with Apache Lucene 8.0 indexing concepts
  • Understanding of the role of IndexWriter in document management
  • Basic knowledge of indexing architecture and its components
  • Experience with document parsing and lexing techniques
NEXT STEPS
  • Study the Apache Lucene IndexWriter API documentation
  • Learn about the document parsing process in Lucene
  • Explore the architecture of Lucene indexing in detail
  • Investigate best practices for managing indexes in Lucene
USEFUL FOR

Software developers, data engineers, and anyone building search or document indexing with Apache Lucene will benefit from this discussion.

shivajikobardan
https://lh5.googleusercontent.com/47guV-L3yY2ZevuNEwk1wC9t9rJjQw0bXNHug16ah2EQ2XyLTAzqrBZcDMEwzFSd1mR_jFDTOFyG1GVHT8p1G4tPPRkRtqtOcOGXTb3UrildRHMayRznHaFQD9RdCdjeuEvyM-FkvQ_U3GLBHGVgkFY
These are the steps of indexing in Lucene given in our syllabus:
https://lh4.googleusercontent.com/Qliby1unxHTU0vAwycWZzy563XdxcwUT4UAyA6Xf1ydKQAwSfKwqexdDFFc0CBZb9kSSvRKXEFoyKQ4cYn9K2EgEDRnWTiYFCDlqZ4VCAw9CWvgvcI9cOJo055PCJhyFTBJckNhtLi-eAMwM7q8JoUU

https://lh3.googleusercontent.com/X9geIBuCERbSadCkshBekbjvl4GAqGvHppgCayGcOBvaIkpIX1Jy5jyFSmmp39ANIyg3cq0tYWrYTxl1RNlOUfbHFAcNy5CJLxfWGve6DpjeXXekNwTl3P64zQ1_6dojvdo4-Z8aTFn-EZ51CqOlLfE
I understand the second step clearly, but I don't understand the first and third steps. They aren't explained clearly in this figure, imo. Can you clear up my confusion? Also, the sources I refer to don't describe it like this; they explain it differently. I'm not sure where this was copied from.
What are we doing in the first step versus the third step, as written in that figure's text?
Why is the IndexWriter created first and then not used later? According to the information I've collected, you can also use the IndexWriter to add/remove/update indexes.

I have a strong feeling that all of this information is incorrect, but this is what's written in my teacher's notes, so I'm not 100% sure. And even if it's wrong, they'd expect us to write the same thing in the exam, so I have to learn it.
 
shivajikobardan said:
But I don't understand the first and third step.
Are you talking about the first 3 steps? The "build index" step and "IndexWriter object" step?

The top diagram is simply an architecture diagram, not a flow diagram. It shows the role that Lucene fills in the search/index process.

The 3 steps at the top, with the IndexWriter, represent the lower left portion of that architecture. This is where the documents are incorporated into the Lucene index so that they can later be searched (that's the upper right portion of the diagram).

The IndexWriter is simply the mechanism that routes documents into the index. It passes each document's text through the Analyzer it was configured with, which handles the lexing (tokenizing) that prepares the text for the actual index. And, yes, the IndexWriter can also add, update, and delete documents in the index.
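
To make the "route documents into the index" idea concrete, here is a minimal sketch of a toy in-memory inverted index in plain Java. It is not Lucene's API: the class name ToyIndexWriter and the methods analyze, addDocument, updateDocument, deleteDocument, and search are hypothetical names chosen for illustration. The analyze method stands in for the lexing step, and an update is modeled as a delete followed by an add.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index for teaching purposes; hypothetical names, not Lucene's API. */
class ToyIndexWriter {
    // term -> ids of the documents that contain the term
    private final Map<String, Set<String>> postings = new TreeMap<>();
    // document id -> its terms, kept so a document can be removed cleanly
    private final Map<String, Set<String>> termsByDoc = new HashMap<>();

    /** Stand-in for the lexing step: lowercase, then split on non-alphanumerics. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Tokenizes the text and records the document id under each term. */
    void addDocument(String id, String text) {
        if (termsByDoc.containsKey(id)) {
            throw new IllegalArgumentException("duplicate id, use updateDocument: " + id);
        }
        Set<String> terms = new HashSet<>(analyze(text));
        termsByDoc.put(id, terms);
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    /** Removes the document id from every posting list it appears in. */
    void deleteDocument(String id) {
        Set<String> terms = termsByDoc.remove(id);
        if (terms == null) {
            return; // unknown id: nothing to delete
        }
        for (String term : terms) {
            Set<String> ids = postings.get(term);
            ids.remove(id);
            if (ids.isEmpty()) {
                postings.remove(term);
            }
        }
    }

    /** Modeled as delete-then-add. */
    void updateDocument(String id, String text) {
        deleteDocument(id);
        addDocument(id, text);
    }

    /** Returns a sorted copy of the ids of documents containing the term. */
    Set<String> search(String term) {
        List<String> tokens = analyze(term);
        if (tokens.isEmpty()) {
            return Collections.emptySet();
        }
        return new TreeSet<>(postings.getOrDefault(tokens.get(0), Collections.emptySet()));
    }

    public static void main(String[] args) {
        ToyIndexWriter writer = new ToyIndexWriter();
        writer.addDocument("1", "Lucene builds an index");
        writer.addDocument("2", "The index is searched later");
        System.out.println(writer.search("index")); // prints [1, 2]

        writer.updateDocument("2", "Queries run against it");
        System.out.println(writer.search("index")); // prints [1]

        writer.deleteDocument("1");
        System.out.println(writer.search("index")); // prints []
    }
}
```

The point of the sketch is that one writer object owns every change to the index, which is why it is created first and then reused for every add, update, and delete. The real IndexWriter persists to a Directory and has a much richer API, so treat this only as a mental model.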

Let me know if you have other questions. I know this response is a bit late, so it may no longer be timely.
 
