I’ve been playing around with Lucene.NET, trying to get a feel for what is required to develop and implement a full business application using it.
As you would imagine, many things are required for you to implement a robust solution for indexing content and searching it afterwards.
Lucene is a great and robust solution for indexing content. It is a fast, high-performance search engine library available in Java and .NET.
You will want to use this library in several scenarios:
- In Windows Azure, to support Full Text Search (a functionality not currently supported by SQL Azure)
- When storing files outside of, or not managed by, your database (as in large document storage solutions that use the file system)
- When Full Text Search is not really what you need
Lucene is more than a Full Text Search solution. It has several analyzers that let you process and search content in different ways (decomposing sentences, deriving words, removing articles, etc.).
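To make the idea of an analyzer more concrete, here is a minimal sketch in plain Java of the kind of processing described above: splitting a sentence into words, removing articles and reducing words to a base form. It does not use Lucene's API. The class name `SimpleAnalyzer`, the stop-word list and the naive plural stripping are all assumptions made for illustration, and Lucene's real analyzers are far more thorough.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical, simplified stand-in for an analyzer chain; this is not Lucene's API.
class SimpleAnalyzer {
    // A tiny, illustrative stop-word list (an assumption, not Lucene's list).
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "and"));

    // Lowercases the text, splits it into words, drops stop words and stems the rest.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (raw.isEmpty() || STOP_WORDS.contains(raw)) {
                continue;
            }
            tokens.add(stem(raw));
        }
        return tokens;
    }

    // Naive stemming that only strips a trailing plural "s"; real stemmers do much more.
    private static String stem(String token) {
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }
}
```

With this sketch, `SimpleAnalyzer.analyze("The indexing of the Documents")` yields the tokens `indexing` and `document`, so a search for "document" would match text that only contains "Documents".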
When deciding to implement indexing using Lucene, you will need to take into account the following:
- How content is to be indexed by Lucene, and when
  - Using a service that runs after a specific interval
  - Immediately when content changes
- When content is to be available for searching (availability of indexed content, as in real-time content search)
  - Immediately when content changes = near real-time searching
  - After a few minutes
- Ease of maintainability and development
Some Technical Concerns
While content is being indexed, the Index Writer holds a write lock on the index. This means that Lucene works best when content is indexed using a single-writer approach.
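One way to get a single-writer design is to funnel every update through one dedicated thread, so that the rest of the application never writes to the index directly. The following is a minimal sketch of that idea in plain Java, not Lucene code. `SingleWriterIndexer` and its methods are hypothetical names, and the in-memory list only stands in for the real index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: all index updates go through one writer thread.
class SingleWriterIndexer {
    // Stand-in for the real index; only the writer thread ever touches it.
    private final List<String> index = new ArrayList<>();
    private final ExecutorService writerThread = Executors.newSingleThreadExecutor();

    // Any thread may call this; the write itself runs on the single writer thread.
    void submit(String document) {
        writerThread.execute(() -> index.add(document));
    }

    // Copies the index on the writer thread, after all previously queued writes.
    List<String> snapshot() {
        Callable<List<String>> copy = () -> new ArrayList<>(index);
        try {
            return writerThread.submit(copy).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("could not read the index", e);
        }
    }

    // Stops the writer thread once the queued work is done.
    void close() {
        writerThread.shutdown();
    }
}
```

Callers only ever enqueue work, so there is never more than one writer competing for the lock. The price is that updates become asynchronous, which ties back to the question of when indexed content becomes searchable.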
When searching, Index Readers take a snapshot of indexes. This has the following implications:
- Setting up an index reader is a costly task. You are not supposed to create one for each query or search. A good practice is to create a reader and reuse it for several searches.
- Because a reused reader keeps working on its snapshot, you won’t see changes made after it was opened, even when the content gets updated. You will need to recycle (reopen) the reader.
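The trade-off between reusing a reader and seeing fresh content can be sketched in plain Java as follows. This is only an illustration of the snapshot behaviour, not Lucene's API: `SnapshotSearcher`, `add`, `reopen` and `search` are hypothetical names, and the in-memory lists stand in for the index and the reader.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of "reuse one reader, reopen it to see changes".
class SnapshotSearcher {
    // Stand-in for the index, mutated by the writer.
    private final List<String> liveIndex = new ArrayList<>();
    // Stand-in for the shared reader: an immutable point-in-time copy.
    private final AtomicReference<List<String>> reader =
            new AtomicReference<>(Collections.<String>emptyList());

    // Writer side: adds a document; the current reader does not see it yet.
    synchronized void add(String document) {
        liveIndex.add(document);
    }

    // Costly step: builds a fresh snapshot and swaps it in for later searches.
    synchronized void reopen() {
        reader.set(Collections.unmodifiableList(new ArrayList<>(liveIndex)));
    }

    // Cheap step: every search reuses whatever snapshot is current.
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (String document : reader.get()) {
            if (document.contains(term)) {
                hits.add(document);
            }
        }
        return hits;
    }
}
```

In this sketch a document passed to `add` only shows up in `search` results after the next call to `reopen`. How often you recycle the reader therefore decides how close to real time your search results are.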
In the second part of this post we will review some alternatives and design considerations.