Damn, Rob saw right through the PR lies of Phil Haack about the new NuGet gallery and figured it all out. Kidding, kidding. For your enjoyment and context, here is Rob’s post:
Phil made a thoughtful answer to the post where he gives the rationale behind the decision:
What I want to address here is what that means (or not) about Orchard.
“Why wasn’t it fast before?”
Because by default, back when the NuGet gallery was first built, Orchard was grossly under-optimized. We've made a lot of progress, but out of the box it is still under-optimized in a number of ways, which is why we're having this discussion for the 1.4 release:
One thing you cannot accuse the Orchard team of is premature optimization… But seriously, there are options available today: installing the cache module, or the combinator module. Or custom, in-site optimizations.
So could the NuGet team have optimized its gallery instead of mobilizing several engineers over a few months to rewrite it? Sure they could have, but it so happened that they didn’t, for reasons Phil explained better than I could.
As a matter of fact, our own gallery site, which just passed the million-download mark, still uses the same codebase that NuGet was using before. The new NuGet gallery doesn't display total download stats (because the home page is now static HTML), but I believe it's around 3 million, so we're looking at a similar scale here. Just visit http://gallery.orchardproject.net/ and you'll see that it's performing quite well. So what's the deal here?
Well, we did spend the time to apply a few optimizations:
- We profiled the code and fixed the biggest bottlenecks. Those fixes, of course, now benefit all Orchard users.
- We profiled the database queries (using Ayende’s excellent NHProf) and optimized the biggest offenders.
- We activated output caching.
None of those optimizations were ever applied to the NuGet gallery. And well, see for yourself: it's not too bad. We're not quite done; there is a whole lot more we can and will do with the limited resources we have (hint: a lot less than NuGet's).
We’re not the only ones: more and more big sites are going online with Orchard, with good performance.
But Rob also says something interesting and expresses a perception that I’ve seen many times:
“I’ve deployed Big CMS’s before as a solution for clients and every single time we decided to move away. They’re great for getting off the ground – but after a while there’s just too much friction.”
That is true of some CMSs, but not all. In the case of Drupal or Orchard (not what Rob used, I checked), my own experience and that of many others is that because the content model is factored at precisely the right level of abstraction, it's actually sometimes harder to get off the ground, but the friction diminishes over time. Where a more traditional application requires you to pile functionality onto a relatively monolithic data model, modern CMSs let you add functionality in a completely decoupled and unobtrusive way that puts the composition exactly where it needs to happen, somewhere between individual properties and full types.
I'm not very satisfied with traditional database backup solutions. It seems like almost no progress has been made since SQL Server 6.5 (at least for the built-in tools). They are still outputting monolithic binary backup files that only do the job if the only thing you expect from backups is the ability to recover from catastrophic failures. I expect more, but before I explain, we need a disclaimer:
I am no expert in database backups and may very well be missing some crucial points or some aspects of the current state of the art. The solution exposed in this post is the result of my own experiments and has no pretense of exhaustiveness or even reliability. It's something I use on my own simple projects, and no more. Database backups are serious business and your data is valuable. Use this at your own risk.
What I wanted from my database backup solution can be summarized as follows:
- Works on Azure databases (can you believe that as I’m writing this there is no valid built-in backup solution on Azure?)
- Backups are not stored in the same place as the database
- Has more than one state to roll back to
- Can be automated
- Easy to restore
- Can make diffs between backup states
- Can merge backup states
The last two, diffs and merges, can seem a little peculiar and require some explanation. I work on several Orchard projects, where a team of developers collaborates. The team is virtual, with people in the US and in various places in Europe. Because Orchard is a CMS, the database contains more than just data: it also stores the configuration of the site, something that developers modify as part of their routine job. Since it is impractical to work off a central database because of latency, and also because one developer can break everyone's instance with a bad change, we need multiple copies of the database, and a way to reconcile them.
What works in a distributed fashion, resolves conflicts, and stores history? Why, a Distributed Version Control System of course! This is what gave me the idea of storing my database backups in a private repository on BitBucket, using Mercurial as the DVCS.
In order to be able to do diffs and merges, we need a text-based persistence format. I chose the bcp command-line utility to emit that format because my past experience with it has been that it's fast and reliable. The code could in principle use the bulk copy SQL commands, but having never worked with them I went for bcp. If someone wants to modify the code and make a pull request using bulk copy, that's fine by me. Bcp produces flat text files for each table that are very easy to diff and merge.
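To see why flat per-table text files matter, here is a quick illustration (the table name, columns, and rows are made up) of how an ordinary line-oriented diff pinpoints exactly what changed between two dumps of the same table:

```shell
# Two snapshots of a hypothetical "Users" table, dumped as tab-separated
# flat text, the kind of character-mode output bcp produces.
printf '1\talice\n2\tbob\n' > Users.before.txt
printf '1\talice\n2\tbobby\n3\tcarol\n' > Users.after.txt

# diff works line by line, so an edited row and an inserted row show up
# as ordinary hunks that a DVCS can store, diff, and merge.
diff Users.before.txt Users.after.txt || true
```

A binary .bak file gives you none of this: any change to the database produces an opaque new blob, which is exactly why monolithic backups and version control don't mix.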
My solution, HgDBackup, comes with two command-line tools that automate bcp, one for backup and one for restore. HgDBackup.exe enumerates the tables in the database and spits out one text file per table. HgDRestore.exe enumerates the tables, tries to find a text file with the same name, and if found, drops all rows in the table before importing the contents of the file into the table.
The schema of the database is not backed up, so you need to do that as a separate operation every time it changes, or as a separate automated task.
In the distributed development environment I described above, the process for picking up a new version of the database is as follows:
- execute HgDBackup
- commit local changes
- pull and update from the repository
- execute HgDRestore
- run tests
- commit if a merge was necessary
- push to repository
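Put together as a script, the routine above might look like this (the server name, database name, repository location, and the /I flag for the restore input directory are my assumptions; adjust them to your setup):

```
REM Sketch only: server, database, and repository paths are hypothetical.
hgdbackup /S .\SQLExpress /D Orchard /O C:\nwazet.backup
cd C:\nwazet.backup
hg commit -m "local database snapshot"
hg pull
hg update
hgdrestore /S .\SQLExpress /D Orchard /I C:\nwazet.backup
REM run the test suite here
REM if the update required a merge, commit the result:
hg commit -m "merged database states"
hg push
```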
Both commands have built-in help that you can bring up with hgdbackup /? or hgdrestore /?.
The backup command responds to the following flags:
- /Server or /S: the address of the server to back up (e.g. tcp:myazuredb.database.windows.net or .\SQLExpress)
- /Database or /D: the name of the database to back up (e.g. Orchard)
- /User or /U: user name to use on the database
- /Password or /P: password to use on the database
- /Output or /O: output directory where the tables will be dumped (e.g. C:\backup)
- /Template or /T: connection string template
The restore command responds to the same flags, except that Output is replaced with Input.
The default connection string template targets SQLExpress with integrated security (so no password has to be on the command line).
For example, here is what I use to backup one of my Azure databases:
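The exact command isn't reproduced here, but going by the flags above, it would look something like this (the server, database, user name, and password are placeholders; an Azure connection would also need a suitable /Template, which I'm not guessing at):

```
REM Placeholder values throughout; supply your own server and credentials.
hgdbackup /S tcp:myazuredb.database.windows.net /D Orchard /U myuser /P mypassword /O C:\nwazet.backup
```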
And here is what I use to import that backup into my local SQLExpress instance:
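Again the exact command isn't reproduced, but based on the flags (with /I assumed as the Input counterpart of /O), a local restore would look something like this:

```
REM Local SQLExpress with integrated security, so no credentials needed.
hgdrestore /S .\SQLExpress /D Orchard /I C:\nwazet.backup
```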
That nwazet.backup directory is a clone of a private repository on BitBucket. It’s of course important that the repository is private because databases contain secrets.
The code is available on BitBucket:
You may also download the already compiled commands:
And remember, use at your own risk…