Unexpected Data

While evaluating a product that massages my data (very large quantities, think multiple terabytes), I was becoming frustrated that the product couldn't handle my sample data set. Even with the vendor rep on-site we never got the entire data set processed, so we never got to evaluate what the product actually did. I didn't think this should be a big deal; the sample wasn't that large (a couple of megabytes of ASCII text). But it was proprietary, so I couldn't send it off to have their techies pore over it and figure out why their product kept choking on it.
I do remember my consulting days, when applications would suddenly stop working and, after hours of debugging, we would discover some unexpected data sending things out of kilter. In fact, after a while I started looking for "bad" data first when presented with certain scenarios.
It was all really brought home when I evaluated a competitor's product and everything worked right the first time on the same data set. The first product is more mature: more features, more tuning options, etc., etc. But it doesn't work on my data, even with the vendor's assistance. Because they assumed that all incoming data would be cleansed and perfect, and skimped on their product's ability to handle unexpected data, they are going to lose out on licensing revenue (a lot of revenue).
Defensive programming wins out over features.
One of my peers today pointed out a manual process that is a choke point. The manual process is there because some of our programs don't handle unexpected data very well. His now-famous comment: "If we wrote better programs, we wouldn't have to do that." Amen.
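To make "handle unexpected data" concrete, here is a minimal sketch of the idea in Python. The record format and field names are invented for illustration, not our actual data, but the pattern is the point: validate, log, and quarantine bad input instead of crashing on it.

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("loader")

def parse_record(line, line_no):
    """Parse one 'name,age,amount' line; return a dict, or None if the line is bad."""
    fields = line.rstrip("\n").split(",")
    if len(fields) != 3:
        log.warning("line %d: expected 3 fields, got %d; skipping", line_no, len(fields))
        return None
    name, age, amount = fields
    try:
        age = int(age)
        amount = float(amount)
    except ValueError as exc:
        log.warning("line %d: bad number (%s); skipping", line_no, exc)
        return None
    return {"name": name.strip(), "age": age, "amount": amount}

def load(lines):
    """Keep the good records; quarantine the bad ones instead of dying."""
    good, bad = [], []
    for line_no, line in enumerate(lines, start=1):
        record = parse_record(line, line_no)
        if record is None:
            bad.append((line_no, line))  # saved for a human to inspect later
        else:
            good.append(record)
    return good, bad
```

A loader like this survives a malformed line; a loader that just calls `int(age)` with nothing around it dies on the first one.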
How does one go about writing "better programs"? We are embarking on a crusade to bring automated unit testing to all our projects. Part of this will include having the QA engineers teach our developers how to write good tests, covering basics such as boundary testing, so that unexpected data problems tend to go away as a class of problems.
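For instance, boundary testing probes the edges of a function's valid input range: the smallest and largest legal values, and the first illegal values on either side. A quick hypothetical sketch with Python's unittest (the function under test is made up for illustration):

```python
import unittest

def bucket_age(age):
    """Classify an age for billing; ages outside 0-150 are rejected."""
    if not 0 <= age <= 150:
        raise ValueError(f"age out of range: {age}")
    return "minor" if age < 18 else "adult"

class BoundaryTests(unittest.TestCase):
    # Probe just inside, exactly on, and just outside each boundary.
    def test_lower_edge(self):
        self.assertEqual(bucket_age(0), "minor")

    def test_adult_threshold(self):
        self.assertEqual(bucket_age(17), "minor")
        self.assertEqual(bucket_age(18), "adult")

    def test_upper_edge(self):
        self.assertEqual(bucket_age(150), "adult")

    def test_out_of_range(self):
        for bad in (-1, 151):
            with self.assertRaises(ValueError):
                bucket_age(bad)

if __name__ == "__main__":
    unittest.main()
```

The off-by-one and out-of-range cases these tests catch are exactly the kind of unexpected data that sends things out of kilter.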