Creating Zip archives in .NET (without an external library like SharpZipLib)
Overview
SharpZipLib is the best free .NET compression library, but what if you can't use it due to the GPL license? I'll look at a few options, ending with my favorite - System.IO.Packaging.
SharpZipLib is good, but there's that GPL thing
SharpZipLib includes good support for zip. I've written about it a few times, and I think it's great. Unfortunately, it's under a wacky "GPL but pretty much LGPL" license - it's GPL, but includes a clause that exempts you from the "viral" effects of the GPL:
Linking this library statically or dynamically with other modules is making a combined work based on this library. Thus, the terms and conditions of the GNU General Public License cover the whole combination. As a special exception, the copyright holders of this library give you permission to link this library with independent modules to produce an executable, regardless of the license terms of these independent modules, and to copy and distribute the resulting executable under terms of your choice, provided that you also meet, for each linked independent module, the terms and conditions of the license of that module.
Bottom line: In plain English, this means you can use this library in commercial closed-source applications.
I'm pretty sure that the reason for this odd "sort-of-GPL" license is that some of SharpZipLib is based on GPL'd Java code. However, most companies have policies which forbid or greatly restrict their use of GPL code, and for very good reason: the GPL was set up as an alternative to traditional commercial software licensing, and while it's possible to use GPL code in commercial software, it's something that requires legal department involvement. So, my bottom line is that I can't use your code due to your license.
.NET Zip Library
UPDATE: DotNetZip has been released on CodePlex, and the one issue I ran into has been fixed. I'd recommend giving this a try instead of System.IO.Packaging (as I'd originally recommended), because it's a lot easier to use.
The Zip format allows for several different compression methods, but the most common is Deflate. System.IO.Compression includes a DeflateStream class. You'd think that System.IO would include Zip, but... no. The problem is that, while System.IO.Compression.DeflateStream can write compressed data to a stream, it doesn't write the file headers that Zip tools need in order to read the result.
The Microsoft Interop blog posted a .NET Zip Library which adds the correct headers to the output of a System.IO.Compression DeflateStream.
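To make the missing-headers point concrete, here's a minimal sketch that compresses a buffer with DeflateStream and checks whether the output starts with the "PK" signature that opens a Zip local file header. The class and method names (DeflateDemo, Deflate, StartsWithZipSignature) are my own, made up for this illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class DeflateDemo
{
    // Compresses a byte array with DeflateStream and returns the raw output.
    public static byte[] Deflate(byte[] input)
    {
        using (MemoryStream output = new MemoryStream())
        {
            // leaveOpen: true keeps the MemoryStream usable after the DeflateStream is disposed.
            using (DeflateStream deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflate.Write(input, 0, input.Length);
            } // Disposing the DeflateStream flushes the remaining compressed data.
            return output.ToArray();
        }
    }

    // A Zip local file header begins with the four bytes 'P', 'K', 0x03, 0x04.
    public static bool StartsWithZipSignature(byte[] data)
    {
        return data.Length >= 4
            && data[0] == 0x50 && data[1] == 0x4B
            && data[2] == 0x03 && data[3] == 0x04;
    }

    static void Main()
    {
        byte[] text = Encoding.UTF8.GetBytes(new string('a', 1000));
        byte[] compressed = Deflate(text);
        Console.WriteLine("Compressed {0} bytes to {1} bytes", text.Length, compressed.Length);
        Console.WriteLine("Starts with Zip signature: {0}", StartsWithZipSignature(compressed));
    }
}
```

The data gets compressed, but the result is a bare Deflate stream: no local file header, no central directory, nothing a Zip tool would recognize.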
ZipFile zip = new ZipFile("MyNewZip.zip");
zip.AddDirectory("My Pictures", true); // AddDirectory recurses subdirectories
zip.Save();
Note: DotNetZip has been released to CodePlex, and the issue I reported has been fixed.
This works, but with some caveats. First of all, adding a file reproduces its full directory path inside the zip. For instance, if I use the following:
zip.AddFile(@"C:\My Documents\Sample\File.txt");
The resulting Zip will contain File.txt, but it will be within the \My Documents\Sample\ hierarchy. There's no way to control the structure of the zip file when you add individual files, unless you want to modify the zip library (which is under the Ms-PL license). That proved to be a big problem in my case, because the zip structure I'm creating is pretty rigid. So, if you're just zipping an entire folder full of files, this library may work for you, but if you need more control you may need to modify the library. I'm guessing if this were published on CodePlex it would have been fixed a while ago.
Another, larger problem to keep in mind is that stream-based compression is much less efficient than file-based compression. File compression can optimize the compression used based on the content of all included files; stream-based compression compresses data as it comes in, so it can't take advantage of data it hasn't seen yet.
The J# Zip Library
J# has included zip since day one, to keep compatible with the Java libraries. So, if you're willing to bundle the appropriate Java library (specifically, vjslib.dll), you can use the zip classes in java.util.zip. It works, but it seems like a really goofy hack to distribute a 3.6 MB DLL just to support zip.
System.IO.Packaging includes Zip support
In .NET 3.0, you can use the System.IO.Packaging ZipPackage class in WindowsBase.dll. It's just 1.1 MB, and it just seems to fit a lot better than importing Java libraries. It's not very straightforward, but it does work. The "not straightforward" part comes from the fact that this isn't a generic Zip implementation; it's a packaging library for formats like XPS that happen to use Zip.
First, you'll need to find WindowsBase.dll so you can add a reference to it. If it's not in your list of .NET references, you'll probably find it in C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.0\WindowsBase.dll.
It's not as simple as it should be, but it does work. Here's a sample that creates a Zip archive and adds two files:
using System;
using System.IO;
using System.IO.Packaging;

namespace ZipSample
{
    class Program
    {
        private const int BUFFER_SIZE = 4096;

        static void Main(string[] args)
        {
            AddFileToZip("Output.zip", @"C:\Windows\Notepad.exe");
            AddFileToZip("Output.zip", @"C:\Windows\System32\Calc.exe");
        }

        private static void AddFileToZip(string zipFilename, string fileToAdd)
        {
            // Open the package, creating it if it doesn't exist yet
            using (Package zip = Package.Open(zipFilename, FileMode.OpenOrCreate))
            {
                // Store the file at the root of the package, under its own file name
                string destFilename = ".\\" + Path.GetFileName(fileToAdd);
                Uri uri = PackUriHelper.CreatePartUri(new Uri(destFilename, UriKind.Relative));

                // Replace any existing part with the same name
                if (zip.PartExists(uri))
                {
                    zip.DeletePart(uri);
                }

                PackagePart part = zip.CreatePart(uri, "", CompressionOption.Normal);
                using (FileStream fileStream = new FileStream(fileToAdd, FileMode.Open, FileAccess.Read))
                using (Stream dest = part.GetStream())
                {
                    CopyStream(fileStream, dest);
                }
            }
        }

        // Copies everything from inputStream to outputStream in BUFFER_SIZE chunks
        private static void CopyStream(Stream inputStream, Stream outputStream)
        {
            byte[] buffer = new byte[BUFFER_SIZE];
            int bytesRead;
            while ((bytesRead = inputStream.Read(buffer, 0, buffer.Length)) != 0)
            {
                outputStream.Write(buffer, 0, bytesRead);
            }
        }
    }
}
One weird side-effect of using the ZipPackage to create Zips is that Packages contain a content type manifest named "[Content_Types].xml". If you create a ZipPackage, it will automatically include "[Content_Types].xml", and if you try to read from a Zip file which doesn't contain a file called "[Content_Types].xml" in the root, it will fail.
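Reading a package back goes through the same API. Here's a rough sketch that lists the parts in a package; PackageReader and ListPartNames are names I made up for this example, and it assumes the file was written by System.IO.Packaging (so the manifest is there). As far as I can tell, the manifest itself isn't reported as a part, so you only get back the files you added.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging;

static class PackageReader
{
    // Returns the URI of every part in the package, e.g. "/Notepad.exe".
    public static List<string> ListPartNames(string zipFilename)
    {
        List<string> names = new List<string>();

        // FileMode.Open fails if the file is missing; the package also has to
        // contain a valid [Content_Types].xml for Package.Open to accept it.
        using (Package zip = Package.Open(zipFilename, FileMode.Open, FileAccess.Read))
        {
            foreach (PackagePart part in zip.GetParts())
            {
                names.Add(part.Uri.ToString());
            }
        }
        return names;
    }

    static void Main()
    {
        // "Output.zip" is the archive written by the sample above.
        foreach (string name in ListPartNames("Output.zip"))
        {
            Console.WriteLine(name);
        }
    }
}
```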
You'll notice that the compression in my test is not that great. In fact, it's pretty bad - Notepad.exe got bigger. Binary files don't compress nearly as well as text-based files (for example, I tested on a 55KB text file and it compressed to less than 1KB), but the compression in this library doesn't appear to be fully implemented yet. For example, the CompressionOption enum includes CompressionOption.Maximum, but that setting is ignored. Normal is the best you'll get right now.
Another possible reason for low compression ratios in this sample is that I'm adding files separately rather than adding several files at a time. As I mentioned earlier, Zip compression works better when it has access to the entire file or group of files when creating the archive.
You can use the packaging library for your own file format. For example, you can store object state by using XmlWriters to write to a Zip stream.
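Here's a minimal sketch of that idea, not taken from any particular sample. The part name (state.xml), the element names and the StateStore class are all hypothetical, picked just for this example; the Package, PackagePart, XmlWriter and XmlReader calls are the standard ones.

```csharp
using System;
using System.IO;
using System.IO.Packaging;
using System.Xml;

static class StateStore
{
    // Hypothetical part name for this example.
    private static readonly Uri StateUri =
        PackUriHelper.CreatePartUri(new Uri("state.xml", UriKind.Relative));

    // Writes a tiny XML document into a new package at packagePath.
    public static void Save(string packagePath, string name, int score)
    {
        using (Package package = Package.Open(packagePath, FileMode.Create))
        {
            PackagePart part = package.CreatePart(StateUri, "text/xml", CompressionOption.Normal);
            using (Stream stream = part.GetStream())
            using (XmlWriter writer = XmlWriter.Create(stream))
            {
                writer.WriteStartElement("state");
                writer.WriteElementString("name", name);
                writer.WriteElementString("score", XmlConvert.ToString(score));
                writer.WriteEndElement();
            }
        }
    }

    // Reads the <name> element back out of the package.
    public static string LoadName(string packagePath)
    {
        using (Package package = Package.Open(packagePath, FileMode.Open, FileAccess.Read))
        using (Stream stream = package.GetPart(StateUri).GetStream(FileMode.Open, FileAccess.Read))
        using (XmlReader reader = XmlReader.Create(stream))
        {
            reader.ReadToFollowing("name");
            return reader.ReadElementContentAsString();
        }
    }
}
```

Calling StateStore.Save("state.zip", "Player One", 42) produces a Zip file containing state.xml plus the content type manifest, and StateStore.LoadName("state.zip") reads the name back.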
But where's System.IO.Zip?
That's a good question. All the Zip handling in System.IO.Packaging lives in internal classes under MS.Internal.IO.Zip. It would have been a lot more useful to implement a public System.IO.Zip which was used by System.IO.Packaging, so that we could directly create and access Zip files without pretending we were creating XPS packages with manifests and Uris.