Erik Porter's Blog

Life and Development at Microsoft and Other Technology Discussions

News

    Binary Data in SQL Server 2005

    So I finally got around to installing Beta 2 of SQL2K5 (it's about time, been so busy with work).  I open up management studio and go to create a new database so I can start playing.  By the way, I'd recommend you install it and get to playing.  As the betas evolve, SQL Server is starting to get REALLY cool.  There are lots of goodies to make it all easier for you all over the place.  Anywho, so I create a new Database and then create a new Table and I noticed an interesting property for the table.  Text/Image Filegroup

    First off, I am a huge believer in storing images, documents, files of any sort in the DB.  I haven't actually done it much yet because performance is sometimes a problem.  My limited understanding of the performance problem is that storing huge amounts of data like a big binary chunk splits up data pages when it normally would've have been necessary.  So I'm wondering if splitting off the binary data (anything big really) onto a separate file group will help with the performance and storage.  Regardless if it will help or not, at some point I'm just going to start doing it anyway, because in 5 years, everything we're always soooo worried about from a performance standpoint almost always ends up not being an issue anymore.

    Thoughts?  Opinions?  Religious battles?  I'm curious about the subject and what everybody thinks...

    Comments

    Mischa Kroon said:

    Main advantage of storing binaries in an sql server is that you can easier scale too multiple webservers.

    That said I'm more of a store as file person.
    # March 3, 2005 7:29 AM

    Kent Tegels said:

    I believe in storing as file, then have storing either a UNC or URI in the database since you rarely do any DML work on instance stored in the database anyway.
    # March 3, 2005 11:00 AM

    Mike Swaim said:

    The advantage of storing data in the db is that you don't have to worry about file system permissions. The disadvantage is that it's harder to take old documents offline. At the last place I worked, we did the db thing, and files older than 6 months got archived. The place I worked before that, we used a seperate "db" server (actually a FileNet server)for files. In both places, we tended to generate a lot of documents. (As in we had dedicated machines spitting out Word documents every 10-30 seconds.)
    # March 3, 2005 1:06 PM

    Erik Porter said:

    Ok, so what I'm hearing so far is that more people (so far anyway) like storing links to files instead of the files themselves in SQL Server, but I'm not hearing any reasons why. Is it just a religious thing? Give me some reasons! ;)

    Mike, good point on the permissions. That's related to another point that I thought of later which was actually sending the files to the client. Sometimes you want only certain files to go down to certain clients based off of data in other tables anyway, so while you're there you can just join on the file they need and you've got it. You have to create a custom httphandler or something for either scenario, so there's no difference there other than you have to store the physical files in a location not accessible in IIS and give the ASP.NET process write and read access to the folder where they're located or just not worry about it if they're in the DB already.
    # March 3, 2005 1:38 PM

    Mike Swaim said:

    The problem with storing files in a DataBase is that it isn't really designed for it. In our case, most files didn't need to be accessed more than a few days after they were created. For those that we did need to get to, a DBA had to restore them from tape. There are filesystems out there that'll move files that aren't used that much to offline storage, and automagically restore it when you try to access it.
    I believe that Oracle supports file system links where the database just stores the path to the file, and the actual file/blob is stored on a regular filesystem. I'm not sure if SQL Server supports the same functionality, but it's something to look into.
    # March 3, 2005 5:42 PM

    Erik Porter said:

    Yes, Mike, that's exactly why I'm curious about the new property of a table that talked about separating image, text, etc into it's own file group.

    I thought of another thing that storing binaries in the DB helps with...transactions. Say I have 3 records and they all have files associated with them. It's doable, but difficult if two of the files get saved to the file system and the third fails. Whereas if it's just in the table, just rollback and you're done.
    # March 4, 2005 8:52 PM

    Joe Carron said:

    I like the image stuff concept too. I did a file tree backend without any database but my next one is going to be in a database if the performance goes up (I think it will). I have one in the works with C# and MS SQL but have not considered the performance issue yet.

    My DBA friend does not like the idea yet. He rightly claims store the filename in the db and retrieve the file outside the db. It gets messy with 40,000 documents as was the upper limit I did. Luckily, they never get to the upper limit! Typically, company only uses 2,000 images outside the database. That is still a lot of files for things to go wrong.


    # March 15, 2005 2:30 PM

    Josh Nelson said:

    I've used both methods of storing binaries in SQL and I think it really just depends on the requirements/constraints of the system you are building. Really, I could jump on either side of the argument.

    I believe from a performance perspective overall, you are better off storing files in the file system and storing references to those files in the DB. However, I've run into two very good (in my opinion) reasons to store the files directly in the db.

    1) Storing the files in the DB adds another layer of security in a web environment. If your file system is hacked, an intruder would need to break another layer of your application in order to access the files, versus direct access to the files via the file system.

    and (this is my favorite, at the moment!) 2) It is a decent solution when you need to provide file sharing between geographically disparate locations. Image/binary storage of files in a database in conjunction with database replication (so long as your schema is architected correctly) provides awesome benefits. My example is that we have two locations that are only connected 11 hours a day. Users can upload/share files in a web environment at either location, and once communications are established, the databases replicate and users can view the new/updated files that each location made changes to.

    Just my two cents.
    # March 16, 2005 9:01 PM

    Mike Rogers said:

    I have used both methods and I still cannot form an opinion. I like that I can have multiple files with the same filename without any difficulty when I use a db. I also like what Mike said about file sharing.

    My DBA's tell me that I shouldn't store documents because of poor performance. But when Microsoft built a document library handler (Sharepoint), they hold the documents in the database. They obviously think it is a good idea. So when the DBA's tell me that SQL 2005 was not designed with document storage in mind, it sounds a little specious to me (most likely left over paranoia from SQL 7).

    # May 2, 2007 11:15 AM

    Kevin Berridge said:

    I went through this same debate not too long ago.  Ultimately I decided to store my files in the file system and not as blobs in the database.  I even managed to come up with a workable solution for full text searching them from within a stored procedure using linked servers.

    The reasons for this decision were focused on performance.  Using varbinary(max) you can upload the data in one of two ways: All at once through a single SQL call by creating a potentially HUGE byte[], or in chunks by making multiple SQL calls.  I was concerned about performance on both the client and the server in the first case, and transaction support in the second case.  Using client side transactions would make the second case less frightening though.

    Using the file system approach I can stream the bytes in optimally sized buffers based on the filesystem environment we're operating in and without the overhead of a SQL connection and stored procedure call.  I can also now store my files on a different server than my SQL server box and I can full text catalog them without affecting the SQL server's performance in anyway.

    # May 9, 2007 4:41 PM

    Subbu said:

    Can u send me the code in c# to store and retreive the Binary data in SQL

    # June 22, 2007 4:28 AM

    Mike said:

    We've never stored file data in the db.  Mostly due to historical reasons around performance (piping the data through ADO used to have poor performance).   So we too store data on a file server.  It creates problems (syncing, transactions etc), we've been waiting for WinFS for years ;-)

    Regarding security.  We were concerned with this too (we're actually more concerned with the rogue employee behind the firewall having unauthorized access to the share).  The web process runs as a local anonymous user (no network privs).  Then we created a COM+ object that runs as a domain user (providing "proxy" access).  Only this domain user has access to the share.  The domain user cannot logon locally, only as a service, and the password is 100+ random generated characters (at install and thrown away).   The API of the COM+ is such that you don't simply provide a URI, rather you provide a key that is looked up in the db.  Everything happens internal to the API call.  So somebody can't recursively dump the contents of the share (or access other content), you have do know the document ID and request it specifically (and the ids aren't sequential).  This domain proxy user doesn't have access to the folders on the local webserver either - only the remote share.

    We also have one domain user per webserver, and those users are wrapped into a domain group (the group has permissions to the share).  There is also a read-only group, and write-group (plus a backup group for running backups).  This allows us to audit access on a per web server basis.

    # October 5, 2007 10:59 AM

    Jim said:

    We are attempting to store files in the database and all is well except when trying to delete a file from the web app.  It seems that the stored procedure that deletes the file is taking a long time to run and it nevers returns a value back to the web app.  I'm not sure why this is happening but if anyone else has had this issue, please let me know how you fixed it.

    jimkiely@yahoo.com

    # January 7, 2008 10:33 AM

    smartcatxxx@yahoo.com said:

    There is a paper on the performance of saving files to filesystem vs. database. It seems that for files under 256 kb the database does better than the filesystem, and for files over 1 MB the filesystem is the clear winner.

    The link:

    research.microsoft.com/.../view.aspx

    # February 11, 2008 4:57 PM

    eYarmarka said:

    I am working on a system using both technologies. Overall, we are finding that under load file system fares better, but we decided to use SQL server for "secure" data. The files stored in the DB are around 500KB on average, and the data is basically "read only", i.e. is preserved for long time, so no updated or deleted are necessary.

    The point I'd like to stress is that it's difficult to butcher a filesystem implementation (from programming standpoint), on the other hand, developing efficient SQL implementation (with C# in our case) was a "incremental" process.

    Regards

    Yuriy

    # March 30, 2008 2:55 AM

    Shaun O'Reilly said:

    Sharepoint services does not seem to perform poorly when checking documents out of the database. When all documents are stored in db, it is so easily scaled and portable that maintenace becomes a breeze. Imagine having to manage 40 000 cv's when moving to a new server. To many files slows down the indexing server as well. DB file access performance will improve as this will become main stream. It is maintenace paradise for the support guys. Most documents and images that has a quantity in the thousands are less that 256Kb on avearage anymay!

    # April 8, 2008 2:23 AM

    Ruchi Saini said:

    Replicating the data in dbs across different geographical locations does have a run_value limit of 65MB. So the file size stored in the DB is limited by that.

    # May 15, 2008 1:48 PM

    Fuad said:

    Everything sounds cool but i'm trying to store and retrive EXE and DLL files from DataBase and what happens is that my files come up without header information.. any ideas?

    # June 5, 2008 7:15 AM

    Subramania said:

    I am trying to retrieve the data from a table where it is stored as binary. I am inputing an integer value(eg:100.10.10 an IP).It will be converted to Hexa in my Business Level and will be stored as Binary in DB.

    How can retrieve the value as such how i Inserted (like 100.10.10) Is there any possible way?

    Thank you.

    # July 10, 2008 5:41 AM

    HumanCompiler said:

    Subramania,

    You don't need to store IP addresses in a binary field.  Store them in a bigint field instead.  There's a little bit of code you have to write to do the conversion from System.Net.IPAddress to a long (Int64) but it's easy to find.  Do a search on "convert ipaddress to long in c#" or something similar.  A page like this should have the code you need:

    www.codeguru.com/.../c10651

    Good luck!

    # July 10, 2008 2:01 PM

    Ron Barone said:

    MS article on why, with tests to back it up.

    ftp.research.microsoft.com/.../TR-2006-45.pdf

    # August 21, 2008 3:03 PM

    sirplus said:

    ftp.research.microsoft.com/pub/TR/TR-2006-45.pdf

    no longer exists

    guess it was against what was desirable to ms

    cant find the doc anywhere else

    anyone help with a link?

    # September 24, 2008 10:34 AM

    Campi BLOB | hilpers said:

    Pingback from  Campi BLOB | hilpers

    # January 21, 2009 10:10 AM