Difference between creating an index and adding a document to an index

  • Context: Comp Sci
  • Thread starter: shivajikobardan
  • Tags: Index
SUMMARY

The discussion clarifies the distinction between creating an index and adding a document to an index in Apache Lucene. Creating the index sets up the structure that will hold documents. Building a document and adding it to the index are separate, later steps; the reply points out that the step the original poster called the "last" one only creates the document. Collecting and analyzing words and building the inverted index happen behind the scenes, so users are encouraged to focus on learning the Lucene API.

PREREQUISITES
  • Understanding of Apache Lucene 9.2.0 indexing concepts
  • Familiarity with document analysis and tokenization
  • Knowledge of inverted indexing techniques
  • Basic proficiency in using APIs
NEXT STEPS
  • Explore the Apache Lucene API documentation for version 9.2.0
  • Learn about document analysis and tokenization in Lucene
  • Study the process of creating inverted indexes in Lucene
  • Investigate best practices for optimizing indexing performance in Lucene
USEFUL FOR

Developers, data engineers, and anyone interested in implementing search functionality using Apache Lucene will benefit from this discussion.

shivajikobardan
Homework Statement: Lucene indexing
Relevant Equations: none
[Attached image not reproduced]

These are the steps of indexing in Lucene given in our syllabus:
[Two attached images showing the syllabus steps, not reproduced]

The first step says that it is creating an index, whereas the last step says that it's adding a document to the index.
What's the difference between these two? Can I get an example?

Here's what I think should happen:
1) Collect all the words from each document and list them like this:

doc1=>word1,word2,WORD3….wordn
doc2=>word1,WORD2,word3….wordn
And so on.

2) Analyze the words: remove some of them and process the rest, as the analyzer specifies.

Say what remains is:
doc1=>word1,word3,...word(n-1)
doc2=>word2,...word(n-3)

3) Done. Now you can also build an inverted index by converting this mapping.

But Lucene does it a bit differently, and I'm not 100% clear on how.
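
The three steps listed above can be sketched in a few lines of Python. This is only a toy illustration of the idea, not how Lucene implements it: the function names, the stop-word list, and the sample documents are all made up for the example, and a real analyzer applies its own rules.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real analyzer defines its own rules.
STOP_WORDS = {"a", "an", "the", "is", "of"}


def collect_words(text):
    """Step 1: split raw text into words."""
    return re.findall(r"[A-Za-z0-9]+", text)


def analyze(words):
    """Step 2: lowercase each word and drop stop words."""
    return [w.lower() for w in words if w.lower() not in STOP_WORDS]


def invert(analyzed_docs):
    """Step 3: turn doc -> terms into term -> sorted list of doc ids."""
    postings = defaultdict(set)
    for doc_id, terms in analyzed_docs.items():
        for term in terms:
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


# Made-up sample documents.
docs = {
    "doc1": "The quick brown Fox",
    "doc2": "A lazy fox is the best fox",
}
analyzed = {doc_id: analyze(collect_words(text)) for doc_id, text in docs.items()}
inverted = invert(analyzed)
```

Here `analyzed` maps each document to its surviving terms, and `inverted` maps each term back to the documents that contain it, which is the structure a search looks terms up in.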
 
shivajikobardan said:
whereas the last step says that it's adding a document to the index.
No, it doesn't. What you are calling the "last" step simply creates a document; adding it to the index is another step.

shivajikobardan said:
What's the difference between these two? Can I get an example.
Can you get an example of the difference between creating a thing and adding something to that thing? Are you serious?

shivajikobardan said:
Here's what I think should happen:
...
This is all done behind the scenes. You don't have to worry about any of it to use Lucene; you just need to learn how to use the API. A good place to learn that is the API documentation itself: https://lucene.apache.org/core/9_2_0/core/index.html
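
To make the distinction concrete, here is a minimal Python sketch of a toy index. It is not Lucene's API: `TinyIndex`, `add_document`, and `search` are hypothetical names invented for this illustration, and the "analysis" is just lowercasing and splitting on whitespace. In Lucene itself the roughly corresponding pieces are opening an `IndexWriter` and calling its `addDocument` method; see the API documentation for the exact usage.

```python
from collections import defaultdict


class TinyIndex:
    """Toy stand-in for a search index; not Lucene's API."""

    def __init__(self):
        # "Creating the index": an empty structure with no documents yet.
        self.postings = defaultdict(set)
        self.doc_count = 0

    def add_document(self, text):
        # "Adding a document": the analysis and the inverted-index update
        # both happen in here, hidden from the caller.
        doc_id = self.doc_count
        self.doc_count += 1
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        return doc_id

    def search(self, term):
        # Return the ids of the documents that contain the term.
        return sorted(self.postings.get(term.lower(), set()))


index = TinyIndex()                 # create the (still empty) index
empty_size = index.doc_count        # 0: nothing has been added yet
first = index.add_document("Lucene builds an inverted index")
second = index.add_document("An index maps terms to documents")
hits = index.search("index")        # ids of documents containing "index"
```

The caller only creates the index, adds documents, and searches; the word collection and inversion from the question never appear in the calling code.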
 
