I have noticed many people implementing website search with Google APIs or other third-party DLLs. This is a simple site search engine that searches all the images, files, and so on in a website without using any third party such as Lucene or Google. I previously worked with Lucene, where I implemented both desktop and database search, but I felt that depending on a third party doesn't teach you much. So I've implemented a simple search using regular expressions.
Architecture of the search engine:
I'm using five classes here. Here is the class diagram:
- CleanHtml.cs: strips the HTML tags from a file's content.
- Page.cs: stores the data for an individual file on the website.
- PageData.cs: defines shared methods that create a dataset and add records to it.
- Site.cs: its properties store the configuration and data for the entire site.
- Usersearch.cs: contains all of the search methods.
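The article doesn't show CleanHtml.cs itself. To give a rough idea of how HTML cleaning with regular expressions can work, here is a minimal sketch. The class name `HtmlCleaner`, the method `StripTags`, and the choice to drop `<script>` and `<style>` blocks entirely are assumptions, not the project's code.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper in the spirit of CleanHtml.cs; not the project's actual code.
public static class HtmlCleaner
{
    // <script> and <style> blocks are removed together with their contents.
    private static readonly Regex ScriptOrStyle = new Regex(
        @"<(script|style)\b[^>]*>.*?</\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Any remaining tag, e.g. <p>, </div> or <img src="logo.jpg">.
    private static readonly Regex Tag = new Regex(@"<[^>]+>");

    // Runs of whitespace left behind once the tags are gone.
    private static readonly Regex Whitespace = new Regex(@"\s+");

    public static string StripTags(string html)
    {
        string text = ScriptOrStyle.Replace(html, " ");
        text = Tag.Replace(text, " ");        // a space keeps neighbouring words apart
        text = WebUtility.HtmlDecode(text);   // &amp; -> &, &lt; -> <, and so on
        return Whitespace.Replace(text, " ").Trim();
    }
}
```

Regex-based stripping is good enough for extracting searchable text, but it is not a full HTML parser: text that happens to contain a literal `<...>` pair will be removed along with the real tags.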
Design and Implementation:
The interface is inspired by Google: a simple text box where I enter the input, i.e., my search keywords. There are three options, for searching by phrases, sentences, or words.
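The article doesn't spell out exactly how the phrase, sentence, and word options differ. One common interpretation, sketched below with regular expressions, is an exact-phrase match, a match that requires all of the words, and a match on any single word. The `SearchMode` enum and `Matcher.IsMatch` are hypothetical names, and the real project may define its options differently.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical match modes; the project's own option names may differ.
public enum SearchMode { ExactPhrase, AllWords, AnyWord }

public static class Matcher
{
    public static bool IsMatch(string text, string query, SearchMode mode)
    {
        string[] words = query.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
        if (words.Length == 0) return false;

        switch (mode)
        {
            case SearchMode.ExactPhrase:
                // The words must appear in order, separated by any run of whitespace.
                string phrase = string.Join(@"\s+", words.Select(Regex.Escape));
                return Regex.IsMatch(text, @"\b" + phrase + @"\b", RegexOptions.IgnoreCase);
            case SearchMode.AllWords:
                return words.All(w => WordRegex(w).IsMatch(text));
            default: // SearchMode.AnyWord
                return words.Any(w => WordRegex(w).IsMatch(text));
        }
    }

    // Whole-word, case-insensitive match for a single keyword.
    // Regex.Escape stops characters such as '.' or '+' being read as regex syntax.
    private static Regex WordRegex(string word) =>
        new Regex(@"\b" + Regex.Escape(word) + @"\b", RegexOptions.IgnoreCase);
}
```

The `\b` word boundaries assume each keyword starts and ends with a letter or digit, so a term such as `C#` would need a different boundary rule.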
The input is passed to the method below, which creates an instance of UserSearch and hands it the search keywords:
private Searchs.UserSearch SearchSite(string strSearch)
{
    // Create the search object and pass it the user's keywords
    Searchs.UserSearch srchSite = new Searchs.UserSearch();
    srchSite.SearchWords = strSearch;
    // ..........
}
The search then loops through all the folders and files in the project and returns the matching results, which I display in a list along with the URL of each file.
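The folder-by-folder loop can be sketched as a recursive directory walk. This is a minimal sketch under stated assumptions, not the project's UserSearch code: `SiteWalker.Search`, its parameters, and the rule that image files are matched by file name only are hypothetical. It also reads every file straight from disk and does not treat the dynamic file types any differently.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch of the recursive folder walk; not the project's UserSearch code.
public static class SiteWalker
{
    // Assumed image types: these have no readable text, so only the file name is tested.
    private static readonly HashSet<string> ImageTypes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".jpg", ".jpeg", ".gif", ".png" };

    public static List<string> Search(
        string siteRoot,
        string baseUrl,
        ISet<string> fileTypes,      // e.g. the FilesTypesToSearch list
        ISet<string> barredFolders,  // e.g. the BarredFolders list
        ISet<string> barredFiles,    // e.g. the BarredFiles list
        Func<string, bool> isMatch)  // keyword test applied to each file's text
    {
        var results = new List<string>();
        string root = Path.GetFullPath(siteRoot);

        void Walk(DirectoryInfo dir)
        {
            foreach (FileInfo file in dir.GetFiles())
            {
                if (!fileTypes.Contains(file.Extension) || barredFiles.Contains(file.Name))
                    continue;

                string text = ImageTypes.Contains(file.Extension)
                    ? file.Name
                    : File.ReadAllText(file.FullName);

                if (isMatch(text))
                {
                    // Convert the physical path into a URL relative to the site root.
                    string relative = file.FullName.Substring(root.Length)
                        .TrimStart(Path.DirectorySeparatorChar)
                        .Replace(Path.DirectorySeparatorChar, '/');
                    results.Add(baseUrl.TrimEnd('/') + "/" + relative);
                }
            }

            foreach (DirectoryInfo sub in dir.GetDirectories())
            {
                if (!barredFolders.Contains(sub.Name))
                    Walk(sub);   // recurse into every folder that isn't barred
            }
        }

        Walk(new DirectoryInfo(root));
        return results;
    }
}
```

Building the sets with `StringComparer.OrdinalIgnoreCase` means `.HTM` and `.htm` are treated alike. The `isMatch` delegate is where a keyword test, and HTML cleaning for page files, would plug in.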
Web.config:
In web.config I keep the restrictions for the search, such as which file types to search and which folders and files not to search:
<!-- Place the names of the file types you want searched in the following line, separated by commas -->
<add key="FilesTypesToSearch" value=".htm,.html,.asp,.shtml,.aspx,.xml,.jpg"/>
<!-- Place the names of the dynamic file types you want searched in the following line, separated by commas -->
<add key="DynamicFilesTypesToSearch" value=".asp,.shtml,.aspx,.xml,.jpg"/>
<!-- Place the names of the folders you don't want searched in the following line, separated by commas -->
<add key="BarredFolders" value="support files,cgi_bin,_bin,bin,_vti_cnf,_notes,images,scripts"/>
<!-- Place the names of the files you don't want searched in the following line, separated by commas; include the file extension -->
<add key="BarredFiles" value="adminstation.htm,no_allowed.asp,AssemblyInfo.vb,Global.asax,Global.asax.vb,SiteSearch.aspx"/>
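Each of these values is a comma-separated list, so the search code has to split it before use. A small helper for that is sketched here; `SearchConfig.ParseList` is a hypothetical name, and comparing names case-insensitively is an assumption that suits Windows file systems.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for reading the comma-separated settings above.
public static class SearchConfig
{
    // Splits a value such as "bin,_vti_cnf,images" into a set,
    // trimming spaces, skipping empty entries and ignoring case.
    public static HashSet<string> ParseList(string value)
    {
        IEnumerable<string> items = (value ?? string.Empty)
            .Split(',')
            .Select(item => item.Trim())
            .Where(item => item.Length > 0);
        return new HashSet<string>(items, StringComparer.OrdinalIgnoreCase);
    }
}
```

In an ASP.NET application the raw string would typically come from `ConfigurationManager.AppSettings["BarredFolders"]` (System.Configuration), assuming the keys sit in the `<appSettings>` section; a missing key returns null, which the helper turns into an empty set.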
That is the basic functionality of my search. I'm posting the complete project with this article. Any comments are welcome.
Regards,
Surya.