File System search via LINQ to Objects

LINQ provides a standard way for developers to query data in diverse locations, ranging from in-memory objects to XML data to relational data living in a SQL Server database. Let's take a look at a scenario where we use LINQ to Objects to query a directory on the local drive for files that match a given extension, and show the results in a DataGridView control.

Here is the plan of action for the application:

  1. Create a function to return a list of the files in a directory. We will call this method GetFiles; it takes two string parameters, the base search directory and the file extension to match, and returns a strongly typed list of FileInfo objects.
  2. Use LINQ to Objects to query the returned list and sort the results by name.
  3. Bind the results to a grid.

Let's start by taking a look at the GetFiles function:

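public static System.Collections.Generic.List<FileInfo> GetFiles(string sPath, string sFileExtension)
{
    //requires "using System.IO;" for DirectoryInfo, FileInfo and SearchOption
    DirectoryInfo _dirInfo = new DirectoryInfo(sPath);

    //search the base directory and all of its subdirectories for files with the given extension
    return System.Linq.Enumerable.ToList(_dirInfo.GetFiles(string.Format("*{0}", sFileExtension), SearchOption.AllDirectories));
}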

The GetFiles() function uses objects from the System.IO namespace to do the heavy lifting of searching the file system. Results are returned in a strongly typed list of FileInfo Objects.
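
As a rough, self-contained sketch of how the function behaves, the console program below builds a small throwaway directory tree and runs a copy of GetFiles against it. The class name GetFilesDemo, the scratch folder under the temp path, and the file names (a.doc, b.txt, sub\c.doc) are all made up for illustration and are not part of the original application.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class GetFilesDemo
{
    // Same logic as the GetFiles function above, written with using directives
    public static List<FileInfo> GetFiles(string sPath, string sFileExtension)
    {
        DirectoryInfo dirInfo = new DirectoryInfo(sPath);
        return dirInfo.GetFiles("*" + sFileExtension, SearchOption.AllDirectories).ToList();
    }

    public static void Main()
    {
        // Hypothetical scratch directory, created only for this demonstration
        string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(Path.Combine(root, "sub"));
        File.WriteAllText(Path.Combine(root, "a.doc"), "");
        File.WriteAllText(Path.Combine(root, "b.txt"), "");
        File.WriteAllText(Path.Combine(root, "sub", "c.doc"), "");

        try
        {
            // Lists the .doc files from the root and its subdirectory, but not b.txt
            foreach (FileInfo file in GetFiles(root, ".doc"))
            {
                Console.WriteLine(file.FullName);
            }
        }
        finally
        {
            // Clean up the scratch directory and everything in it
            Directory.Delete(root, true);
        }
    }
}
```

Because SearchOption.AllDirectories walks the whole tree, a single folder that the current user is not allowed to read can make the call throw, so a real application would probably want to handle that case.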

Next we will look at the code that calls the GetFiles method defined above, and uses LINQ to sort the returned list by file name. You would typically put this code in the click event handler for a button control on a Windows Form.

//get all matching files contained in the search path (hard-coded here; normally supplied by the user)
System.Collections.Generic.List<FileInfo> _theFiles = GetFiles(@"c:\myDirectory", ".doc");

//we now have a list of files...next we use LINQ to query the file list and sort the results by name
var _files = from file in _theFiles
             orderby file.Name
             select file;

//bind the sorted results to the grid
this.dataGridView1.DataSource = _files.ToList();

And there you have it. Note the use of the orderby clause to sort the results before you bind the results to the grid. 
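
If you also want LINQ itself to do the filtering, a where clause can be added to the same query. The sketch below is one way to do that, assuming a case-insensitive comparison on FileInfo.Extension; the class name QueryDemo, the helper FilterAndSort, and the sample file names are hypothetical. An explicit filter like this may be worth having because, as far as I know, on the .NET Framework a search pattern with a three-character extension such as *.doc can also match longer extensions that begin with it (for example .docx).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class QueryDemo
{
    // Keeps only the files with the given extension (ignoring case), sorted by name
    public static List<FileInfo> FilterAndSort(IEnumerable<FileInfo> files, string extension)
    {
        var query = from file in files
                    where file.Extension.Equals(extension, StringComparison.OrdinalIgnoreCase)
                    orderby file.Name
                    select file;

        return query.ToList();
    }

    public static void Main()
    {
        // FileInfo objects can be created for paths that do not exist on disk;
        // Name and Extension are derived from the path string alone
        var files = new List<FileInfo>
        {
            new FileInfo("report.doc"),
            new FileInfo("notes.txt"),
            new FileInfo("Agenda.DOC"),
            new FileInfo("draft.docx"),
        };

        // Take(10) keeps at most the first ten results of the sorted list
        foreach (FileInfo file in FilterAndSort(files, ".doc").Take(10))
        {
            Console.WriteLine(file.Name);
        }
    }
}
```

In the Windows Forms version, the list returned by FilterAndSort could be assigned to dataGridView1.DataSource in the same way as _files.ToList() above.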

Comments

  • Haven't you filtered only for .doc files? Hmm... Since all the extensions are the same, how does sorting on extensions help? Did I miss something here?

  • Cyril, you are correct: sorting by file extension would be meaningless in this particular scenario. Sorting by name or last-modified date would be a better option. Thanks for your feedback!

  • FYI, calling ToLower() will allocate a new string. Since this is called for every file in the directory, this could get expensive.

    A more efficient way would be to write the query like so:

    var _files = from file in _theFiles
    where file.Extension.Equals(".doc", StringComparison.InvariantCultureIgnoreCase)
    orderby file.Name
    select file;

  • Nice, but I wonder how this scales when there are thousands of files in the folder. I would think it still performs nicely, but have you done any performance testing?

  • Chris, somebody said "performance isn't an issue until it is an issue." Since it isn't an issue, don't worry about performance ;)

  • This is like the hello world of Linq.

    (and if you want performance, an enterprise or desktop search tool is what you're looking for)

  • Your GetFiles function looks strange to me. Why not:
    protected IList<FileInfo> GetFiles(string sPath)
    {
        try
        {
            var _dirInfo = new DirectoryInfo(sPath);
            return _dirInfo.GetFiles("*.*", SearchOption.AllDirectories);
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.Message);
            // return an empty list so that every code path returns a value
            return new FileInfo[0];
        }
    }

    Even this way it is bad, since functions should not translate exceptions into message boxes, especially if it is a LINQ provider function to be used anywhere.

  • To Andemann and Chris: The intent is not to provide an efficient file search tool, but to demonstrate how one could use LINQ to query a list of objects, filter the list, sort the items, and bind the results to a grid.

    To Andrey Shchekin: The MessageBox.Show code in the catch block for the GetFiles() function is a good way to get my point across. In production code, you would perhaps want to re-throw the error, or return a status code to the calling app, or something to that effect. And yes, the code in the GetFiles function could have been written otherwise... the code for that function was written about 2 years ago when type inference wasn't around. But thanks for your suggestion!

  • Never "re-throw" exceptions. Just "throw". You will preserve your stack trace.

  • Hey Andrew!

    @Andrey: why not take it one step further and write like:

    try
    {
    return new DirectoryInfo(path).GetFiles("*.*", SearchOption.AllDirectories);
    }
    catch (Exception exception)
    {
    throw;
    }

  • @Will, in this case you should remove try catch as well. ;)

  • Hey Andrew, thanks for pointing that out (rethrowing exceptions). Old habits die hard I suppose :)

  • While we are on the subject, you're using Linq on one end of this, but not the other. We can reduce GetFiles to:

    public static System.Collections.Generic.List<FileInfo> GetFiles(string sPath)
    {
    DirectoryInfo _dirInfo = new DirectoryInfo(sPath);
    return System.Linq.Enumerable.ToList(_dirInfo.GetFiles("*.*",SearchOption.AllDirectories));
    }

    Note, also, that you explicitly use System.Collections.Generic, thereby assuming that it's not in a "using" (even though Visual Studio automatically inserts it), but DO assume that there's a "using System.IO;" (which must be manually added)

  • To Andrey Shchekin:
    What are the benefits of using IList rather than List?

  • It is really good.

  • Nice example, thanks!

    However, it doesn't seem to scale very well. Trying to get a list of files (over 600) by descending LastWriteTime takes forever or doesn't complete at all:

    Dim di As New DirectoryInfo(path)

    Dim result = (From files In di.GetFiles(wildCard) _
    Order By files.LastWriteTime Descending _
    Select files.Name, files.LastWriteTime).First()


    For Each myFile In result
    Console.WriteLine("{0} {1}", result.Name, result.LastWriteTime.ToString)
    Next

    What I was trying to do is get the file with the latest LastWriteTime. Any LINQ or non-LINQ ideas?

    Thanks.

  • Sharp tools make good work.

  • This is helpful.
    Can you select the top 10 files?
    That would be great.

Comments have been disabled for this content.