Easy high speed reading/writing of structured binary files using C#

Quite a bit has been written about reading structured binary data from or writing it to files (see [1,2,3]). [1], for example, compares three different approaches. Unfortunately none is as straightforward as C/C++ code would be. Here's how you could read the ID3v1 tag from an MP3 file:

struct ID3v1Tag
{
   char tag[3]; // == "TAG" 
   char title[30]; 
   ...
};

ID3v1Tag t;

FILE *f = fopen("mysong.mp3", "rb"); // "rb": open in binary mode so no text translation is applied

fseek(f, -128, SEEK_END);

fread(&t, 1, 128, f);

printf("%.30s\n", t.title);

fclose(f);

Now, if you wanted to accomplish the same with C#... it would not look that easy anymore. The reason: you cannot read data from a file (stream) directly into a struct. A stream always requires a byte array as the target for read operations. Or if you use a BinaryReader the ReadBytes() method returns a byte array. In any case the data read into a byte array needs to be copied into the target struct.
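The copy described above can be made concrete. The following is a minimal sketch, not code taken from any of the cited articles: it reads the record into a byte[] and then lets Marshal.PtrToStructure() copy it into a struct. The TagHeader type (which covers only the first two ID3v1 fields) and the CopyingReader class are hypothetical names made up for illustration.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical, trimmed-down record: only the first two ID3v1 fields.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct TagHeader
{
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
    public byte[] Tag;

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 30)]
    public byte[] Title;
}

public static class CopyingReader
{
    // Step 1: read the raw bytes from the stream into a managed byte[].
    public static TagHeader ReadFrom(Stream s)
    {
        int size = Marshal.SizeOf(typeof(TagHeader));
        byte[] buffer = new byte[size];

        s.Seek(-128, SeekOrigin.End);
        int read = 0;
        while (read < size)
        {
            int n = s.Read(buffer, read, size - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
        return FromBytes(buffer);
    }

    // Step 2: copy the bytes into the struct. This is the extra copy.
    public static TagHeader FromBytes(byte[] data)
    {
        GCHandle pin = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            return (TagHeader)Marshal.PtrToStructure(
                pin.AddrOfPinnedObject(), typeof(TagHeader));
        }
        finally
        {
            pin.Free();
        }
    }
}
```

Calling CopyingReader.ReadFrom() on a FileStream opened for an MP3 file would fill the two fields. The data passes through the intermediate buffer first, which is exactly the overhead discussed here.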

[1] uses Marshal.PtrToStructure() to do this, and [3] offers a much more elegant solution using an unsafe assignment like this:

[StructLayout(LayoutKind.Sequential, Pack=1)]
unsafe struct ID3v1Tag
{
    ...

    public ID3v1Tag(byte[] data)
    {
       fixed (byte* pData = data)
       {
           this = *(ID3v1Tag*)pData;
       }
    }
}


Alternatively you could read data from an input stream in little chunks using a BinaryReader, which means you deserialize the data into each field by hand. This avoids the extra copy of data, but requires much more effort on your side. You pay for the performance with extra lines of code.
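To illustrate the field-by-field approach, here is a minimal sketch. It assumes the standard 128-byte ID3v1 layout (3-byte "TAG" marker, 30-byte title, artist and album, 4-byte year, 30-byte comment, 1-byte genre) and treats the text fields as ASCII. The Id3v1Fields and FieldReader names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical container for the deserialized fields.
public sealed class Id3v1Fields
{
    public string Title, Artist, Album, Year, Comment;
    public byte Genre;
}

public static class FieldReader
{
    // Returns null when the stream does not end with an ID3v1 tag.
    public static Id3v1Fields Read(Stream s)
    {
        s.Seek(-128, SeekOrigin.End);
        BinaryReader r = new BinaryReader(s);

        if (ReadText(r, 3) != "TAG")
            return null;

        // Every field is read by hand, in file order.
        Id3v1Fields f = new Id3v1Fields();
        f.Title = ReadText(r, 30);
        f.Artist = ReadText(r, 30);
        f.Album = ReadText(r, 30);
        f.Year = ReadText(r, 4);
        f.Comment = ReadText(r, 30);
        f.Genre = r.ReadByte();
        return f;
    }

    // Reads a fixed-width text field and strips the padding.
    private static string ReadText(BinaryReader r, int length)
    {
        byte[] raw = r.ReadBytes(length);
        return Encoding.ASCII.GetString(raw).TrimEnd('\0', ' ');
    }
}
```

Even for this small record the reading code has to spell out every field and its width, and it has to be kept in sync with the file format by hand.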

That's what can be said about reading (and writing) binary data using C# (or managed code in general).

However, a recent customer engagement got me thinking about this. The customer needs to port C++ code that interacts heavily with binary files to C#. The approaches found in the literature, though, are too slow for him. The need for an extra data copy really hurts the application's performance. So he kept essential parts of the code in C++ to benefit from the language's ease of use when accessing binary data.

I felt challenged by this problem. And here's my solution: Easy reading/writing of binary structured data using C# 2.0 - without the need for an extra data copy. Look at the following code for reading the ID3v1 tag of an MP3 file:

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public unsafe struct ID3v1Tag
{
 private fixed sbyte tag[3];
 private fixed sbyte title[30];
 ...
}

using (System.IO.BinaryFile fmp3 = new System.IO.BinaryFile("myfile.mp3", System.IO.FileMode.Open))
{
 ID3v1Tag t;
 
 unsafe
 {
  fmp3.Seek(-128, System.IO.SeekOrigin.End);
  fmp3.ReadStruct<ID3v1Tag>(&t);
 }

 if (t.Tag == "TAG")
 {
  Console.WriteLine("title: " + t.Title); ...
 }
}


I'd say it's as easy to read/write as the C++ equivalent above. And it's just generic functions that get called. And no extra copies of data are needed. The ID3v1 tag data is read directly into the ID3v1Tag struct passed to the ReadStruct() method.
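The struct above elides its remaining fields and the Tag and Title properties used in the example. One way such properties could be written over fixed-size buffers is sketched below. This is an assumption, not the code from the download: the field list follows the standard ID3v1 layout, the text is decoded as ASCII, and Id3v1TagSketch and its FromBytes() helper are hypothetical. FromBytes() only stands in for the file read so that the sketch runs without CRTFileIO.dll. The code must be compiled with /unsafe.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public unsafe struct Id3v1TagSketch
{
    // Assumed ID3v1 layout: 3 + 30 + 30 + 30 + 4 + 30 + 1 = 128 bytes.
    private fixed sbyte tag[3];
    private fixed sbyte title[30];
    private fixed sbyte artist[30];
    private fixed sbyte album[30];
    private fixed sbyte year[4];
    private fixed sbyte comment[30];
    private byte genre;

    public string Tag
    {
        get { fixed (sbyte* p = tag) { return Decode(p, 3); } }
    }

    public string Title
    {
        get { fixed (sbyte* p = title) { return Decode(p, 30); } }
    }

    public byte Genre
    {
        get { return genre; }
    }

    // Turns a fixed-width field into a string and strips the padding.
    private static string Decode(sbyte* p, int length)
    {
        return new string(p, 0, length, Encoding.ASCII).TrimEnd('\0', ' ');
    }

    // Stand-in for the file read: overlays the struct on raw bytes.
    public static Id3v1TagSketch FromBytes(byte[] data)
    {
        if (data.Length < sizeof(Id3v1TagSketch))
            throw new ArgumentException("Need at least 128 bytes.");
        fixed (byte* p = data)
        {
            return *(Id3v1TagSketch*)p;
        }
    }
}
```

The strings are only created when a property is accessed. The raw bytes stay in the struct exactly as they were read from the file.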

How is this done?

Well, I removed the premise that underlies the usual literature on this topic: I don´t use System.IO to access the file, but the old CRT fxxx() functions. The above BinaryFile class encapsulates the calls to the following C DLL functions:

[System.Runtime.InteropServices.DllImport("CRTFileIO.dll")]
private static extern int FileOpen(string filename, string mode);

[System.Runtime.InteropServices.DllImport("CRTFileIO.dll")]
private static extern void FileClose(int hStream);

[System.Runtime.InteropServices.DllImport("CRTFileIO.dll")]
private unsafe static extern bool FileReadBuffer(int hStream, void* buffer, short bufferLen);

[System.Runtime.InteropServices.DllImport("CRTFileIO.dll")]
private unsafe static extern bool FileWriteBuffer(int hStream, void* buffer, short bufferLen);

[System.Runtime.InteropServices.DllImport("CRTFileIO.dll")]
private unsafe static extern bool FileSeek(int hStream, int offset, short origin);

[System.Runtime.InteropServices.DllImport("CRTFileIO.dll")]
private unsafe static extern bool FileGetPos(int hStream, out int pos);

[System.Runtime.InteropServices.DllImport("CRTFileIO.dll")]
private unsafe static extern bool FileFlush(int hStream);

I just wrote a small unmanaged DLL wrapper around the basic stdio C functions like fopen(), fread() etc. That's all the magic there is. Look at my C function for reading data from a file:

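extern "C" DLLEXPORT int __stdcall FileReadBuffer(FILE *stream, void *buffer, short bufferLen)
{
 // fread() returns the number of bytes read; success means the whole buffer was filled.
 size_t n = fread(buffer, 1, (size_t)bufferLen, stream);
 return n == (size_t)bufferLen;
}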

This function is called by a method of a wrapper class that makes it easier for application code to work with binary files. BinaryFile hides the CRT file handle and looks much like a FileStream (that's also the reason why I put BinaryFile into the System.IO namespace):

public unsafe bool ReadStruct<StructType>(void *buffer) where StructType : struct
{
  return Read(buffer, (short)System.Runtime.InteropServices.Marshal.SizeOf(typeof(StructType)));
}

public unsafe bool Read(void* buffer, short bufferLen)
{
 ...
 return FileReadBuffer(hFile, buffer, bufferLen);
}

You just pass the Read() method the address of the target struct that is to receive the data from the file, plus the number of bytes to read. That's it. fread() will put the data right into the C# struct. No extra byte[], no explicit deserialization of fields. You just need to be willing to use unsafe code:

unsafe
{
 fmp3.ReadStruct<MyStruct>(&myStructVar);
}
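Because ReadStruct() derives the byte count from Marshal.SizeOf() and casts it to short, the marshaled size of the struct has to match the on-disk record exactly and must not exceed short.MaxValue. A small sanity check along these lines might be useful. It is only a sketch: Id3v1Layout (with the assumed standard ID3v1 field list) and LayoutCheck are hypothetical names, and the code must be compiled with /unsafe.

```csharp
using System;
using System.Runtime.InteropServices;

// Assumed ID3v1 layout, used here only to have a struct to measure.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public unsafe struct Id3v1Layout
{
    public fixed sbyte Tag[3];
    public fixed sbyte Title[30];
    public fixed sbyte Artist[30];
    public fixed sbyte Album[30];
    public fixed sbyte Year[4];
    public fixed sbyte Comment[30];
    public byte Genre;
}

public static class LayoutCheck
{
    // Computes the size the same way ReadStruct() does, but refuses
    // sizes that the cast to short would silently truncate.
    public static short RecordSize<T>() where T : struct
    {
        int size = Marshal.SizeOf(typeof(T));
        if (size > short.MaxValue)
            throw new ArgumentException("Struct too large for a short length.");
        return (short)size;
    }
}
```

Comparing LayoutCheck.RecordSize<Id3v1Layout>() against the expected record length once at startup catches a forgotten or mis-sized field before any file is read.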

I'd say it cannot get much easier or faster than this when reading from binary files.

If you'd like to give this approach a try, you can download sources here.

In order to use the BinaryFile class just add a reference to CRTFileIO.Import.dll to your C# project and make sure the C wrapper CRTFileIO.dll gets copied to the same directory as CRTFileIO.Import.dll.

Enjoy!

Resources

[1] Anthony Baraff: Fast Binary File Reading with C#, http://www.codeproject.com/csharp/fastbinaryfileinput.asp

[2] Robert L. Bogue: Read binary files more efficiently using C#, http://www.builderau.com.au/architect/webservices/0,39024590,20277904,00.htm

[3] Eric Gunnerson: Unsafe and reading from files, http://blogs.msdn.com/ericgu/archive/2004/04/13/112297.aspx

4 Comments

  • Assuming this is C# 2.0, and you need to do this on more than one struct, you can make it easier by using generics:

    public T ReadStruct<T>(string filename) where T : struct
    {
        using (System.IO.BinaryFile fmp3 = new System.IO.BinaryFile(filename, System.IO.FileMode.Open))
        {
            T t = new T();

            unsafe
            {
                fmp3.Seek(-128, System.IO.SeekOrigin.End);
                fmp3.Read(&t, (short)Marshal.SizeOf(typeof(T)));
            }

            return t;
        }
    }



  • @Ayende: Thx for your idea. However, it introduces the very data copy I wanted to avoid: ReadStruct() returns a struct on the stack which probably needs to be copied to the real destination in the caller's method.

    Nonetheless using Generics could make my Read() method a little easier, since the struct length could be determined automatically.

    -Ralf

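  • // Allocate unmanaged memory for the struct before marshaling into it.
    IntPtr ptr = Marshal.AllocHGlobal(Marshal.SizeOf(YourStruct));
    Marshal.StructureToPtr(YourStruct, ptr, false);
    fs = new FileStream(Filename,FileMode.CreateNew,FileAccess.Write);
    byte* bytedata = (byte*)ptr.ToPointer();

    for (int i = 0; i < Marshal.SizeOf(YourStruct); ++i)
    {
    fs.WriteByte(bytedata[i]);
    }

    fs.Close();
    Marshal.FreeHGlobal(ptr);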

  • If you want to save a struct in C# like in C++,
    you must use the MarshalAs attribute.

    [StructLayout(LayoutKind.Sequential, CharSet=CharSet.Ansi)]
    public struct Empleado
    {
    [MarshalAs(UnmanagedType.ByValTStr,SizeConst=32)]
    public string name;
    public UInt32 id;
    }

    With the MarshalAs attribute, you specify that when the string is used in an unmanaged context, it is treated as a fixed-size, null-terminated ANSI string of 32 bytes.

    You can use the string member of the struct like a normal string. When you want to write the struct into a binary file, you must use code like this:

    Empleado emp = new Empleado();
    FileStream fstream = new FileStream("C:\\binario.bin", FileMode.Create, FileAccess.Write);
    BinaryWriter binwriter = new BinaryWriter(fstream);

    emp.name = "Estuardo";
    emp.id = 0x00112233;

    int size = Marshal.SizeOf(emp);

    IntPtr handle = Marshal.AllocHGlobal(size);
    Marshal.StructureToPtr(emp, handle, true);

    byte* ptr = (byte*)handle.ToPointer();

    while(size-- != 0)
    {
    binwriter.Write(*ptr++);
    }

    Marshal.FreeHGlobal(handle);
    binwriter.Close();

    This code is only valid in an unsafe context
    (you must compile your code with the /unsafe option). The Marshal class is declared in the
    System.Runtime.InteropServices namespace.
