Using Lucene.NET search engine library in .NET applications

Friday, September 2, 2011

.NET Search

Adding search capabilities to applications is something that users often ask for. Sometimes filters on lists are not enough. Instead of going with the mainstream and writing complex SQL queries, we can be smarter and use specialized indexing and search solutions that perform far better than large custom SQL queries, which I consider an anti-pattern for search. In this post I will introduce the Lucene.NET indexing and search engine and show you how to use it in your applications.

NB! I have just started playing with Lucene.NET, and this is only my quick introduction to getting it working and starting to explore it. The code examples here may not be perfect, but they will help you get started, and I am sure I will be able to write more effective Lucene.NET code later.

What is Lucene.NET?

Lucene.NET is an indexing and search library ported from the well-known Lucene, which is developed for the Java platform. According to the Lucene.NET project page, Lucene.NET has the following goals:

Maintain the existing line-by-line port from Java to C#, fully automating and commoditizing the process such that the project can easily synchronize with the Java Lucene release schedule.
Maintaining the high-performance requirements expected of a first class C# search engine library.
Maximize usability and power when used within the .NET runtime. To that end, it will present a highly idiomatic, carefully tailored API that takes advantage of many of the special features of the .NET runtime.

To add search capabilities to your application you can use Lucene.NET, because it performs far better than those awful, huge, overly clever custom search queries that will sooner or later kill your server.

Adding documents to Lucene.NET index

By its nature Lucene.NET lets you define both loose and structured documents. Documents have properties (fields) that you can define freely. These properties may have values, and Lucene.NET is able to index them. The properties and their values are used when a user searches the Lucene.NET index.

Before we can search for anything, we need at least one document in the Lucene.NET index. Here’s how to add a document to the index.

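    // Namespaces used by the code in this post
    using System;
    using System.IO;
    using Lucene.Net.Analysis;
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.QueryParsers;
    using Lucene.Net.Search;
    using Lucene.Net.Store;
    // Aliases avoid clashes with System.IO.Directory and System.Version
    using Directory = Lucene.Net.Store.Directory;
    using Version = Lucene.Net.Util.Version;

    private static void WriteDocument()
    {
        // The index lives in the LuceneIndex folder; the writer creates it if it is missing
        Directory directory = FSDirectory.Open(new DirectoryInfo("LuceneIndex"));
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
        var writer = new IndexWriter(directory, analyzer,
                                     IndexWriter.MaxFieldLength.UNLIMITED);

        var doc = new Document();
        // id is stored with the document but not indexed
        doc.Add(new Field("id", "1", Field.Store.YES, Field.Index.NO));
        // postBody is stored and analyzed for full-text search
        doc.Add(new Field("postBody", "Lorem ipsum", Field.Store.YES,
                          Field.Index.ANALYZED));
        writer.AddDocument(doc);

        writer.Optimize();
        writer.Commit();
        writer.Close();
        // Release the directory as well, as the search method does
        directory.Close();
    }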

After calling this method we have a new document in the Lucene.NET index. This document has a property id with value 1 and a property postBody with value “Lorem ipsum”. As we can see, the id property is stored but not indexed, because we don’t expect anybody to search documents by ID.
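If you later do need to look up documents by ID, one option is to index the field as a single untokenized term. The following is only a sketch under assumptions: it uses the same Lucene.NET 2.9 API as the method above, the class and method names (IdLookupSketch, FindById) are hypothetical, and it presumes the id field was added with Field.Index.NOT_ANALYZED instead of Field.Index.NO.

```csharp
using System.IO;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Directory = Lucene.Net.Store.Directory;

public static class IdLookupSketch
{
    // Hypothetical helper: returns the stored postBody of the document
    // whose id field equals the given value, or null if there is no match.
    // Assumes id was indexed with Field.Index.NOT_ANALYZED.
    public static string FindById(string id)
    {
        Directory directory = FSDirectory.Open(new DirectoryInfo("LuceneIndex"));
        var searcher = new IndexSearcher(directory, true);

        // TermQuery matches the exact, unanalyzed term
        var query = new TermQuery(new Term("id", id));
        TopDocs topDocs = searcher.Search(query, 1);

        string body = null;
        if (topDocs.scoreDocs.Length > 0)
        {
            Document doc = searcher.Doc(topDocs.scoreDocs[0].doc);
            body = doc.Get("postBody");
        }

        searcher.Close();
        directory.Close();
        return body;
    }
}
```

With Field.Index.NO, as in the method above, this query would find nothing, because the id value is only stored and never put into the index.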

Searching documents

Now we can write a method that searches documents by a given phrase in the document body.

    private static void SearchSomething()
    {
        Directory directory = FSDirectory.Open(new DirectoryInfo("LuceneIndex"));
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);

        // Parse the query text against the postBody field
        var parser = new QueryParser(Version.LUCENE_29, "postBody", analyzer);
        Query query = parser.Parse("lorem*");

        // Open the index read-only and ask for the 10 best matches
        var searcher = new IndexSearcher(directory, true);
        TopDocs topDocs = searcher.Search(query, 10);

        int results = topDocs.scoreDocs.Length;
        Console.WriteLine("Found {0} results", results);

        for (int i = 0; i < results; i++)
        {
            ScoreDoc scoreDoc = topDocs.scoreDocs[i];
            float score = scoreDoc.score;
            int docId = scoreDoc.doc;
            // Load the stored fields of the matching document
            Document doc = searcher.Doc(docId);

            Console.WriteLine("Result num {0}, score {1}", i + 1, score);
            Console.WriteLine("ID: {0}", doc.Get("id"));
            Console.WriteLine("Text found: {0}\r\n", doc.Get("postBody"));
        }

        searcher.Close();
        directory.Close();
    }

In this method we search for the term lorem*. The asterisk is a wildcard: we want all documents whose postBody contains a word starting with lorem. Search results are returned as a TopDocs object that contains the scoreDocs collection. Each ScoreDoc holds information about one document returned by the search. To get the actual document, we ask the searcher for it by its document ID. Note that this is Lucene's internal document number (scoreDoc.doc), not the value of our own id property.
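The same kind of "starts with" search can also be built in code instead of being parsed from a query string. This is a minimal sketch, assuming the Lucene.NET 2.9 API used above; the class and method names (PrefixQuerySketch, BuildPrefixQuery) are hypothetical.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class PrefixQuerySketch
{
    // Hypothetical helper: builds a query that matches documents whose
    // postBody field contains a term starting with the given prefix.
    public static Query BuildPrefixQuery(string prefix)
    {
        // StandardAnalyzer lowercases text at indexing time, and a
        // PrefixQuery is not run through an analyzer, so the prefix
        // is lowercased here to match the indexed terms.
        return new PrefixQuery(new Term("postBody", prefix.ToLowerInvariant()));
    }
}
```

The returned query can be passed to searcher.Search(query, 10) in the same way as the parsed one. Building the query in code avoids having to escape special query parser characters in user input, at the cost of losing the parser's richer syntax.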

More resources

Here you can find more information about Lucene.NET:

Conclusion

Lucene.NET is a good solution for applications that need broad and powerful search capabilities. The library is small and very easy to use. The Lucene.NET API lets you fully manage the search index and run queries against it. Although Lucene.NET is in the Apache Incubator right now, it is a promising project and I think it is worth trying out.

Hi,

Where should I write this code in an ASP.NET MVC application: in a controller or in the global.asax file? Please help me out.

selvakumars - Wednesday, January 30, 2013 10:44:41 AM

You can put the document indexing code in the controller action that saves/creates/deletes data, and the search code in the controller action where the search keywords are submitted.

DigiMortal - Wednesday, January 30, 2013 6:16:26 PM
