Another great but under-used way to test is the million-monkey method: get as many people in a room hammering at the test farm as you can. The first client I know of who identified what turned out to be a SharePoint performance bottleneck (row-level locks escalating to table-level locks) tested this way with just 14 people. Happily, this and many other database performance issues are resolved in SharePoint 2010, but this targeted brute-force testing laid the issues bare where they hadn't come up at all in months of automated testing at other companies.
As an example of why you want the test farm to be identical with production, there was a SharePoint app that a colleague built and tested at a large bank, and they had the benefit of an identical test farm. The app was stress-tested over a full 24-hour period, and issues only showed up in the ninth or tenth hour. While that translated to the issue occurring only after weeks of actual production use, it would have ground the system to a halt had they not caught it before release. When SharePoint becomes mission critical, building the right test farm is worth the investment.
But because duplicating production can cost a lot, few companies actually do it. When the test farm doesn't match production, you can only guarantee that tests will provide a baseline of performance to compare with other apps or versions run in the same test environment; they really don't tell you much about how the app will behave in production or how many users it will handle. You can make ballpark estimates (within an order of magnitude) and extrapolate to a degree, but too many factors conspire against a reliable interpretation; there's simply no such thing. The number of servers in different farm roles, the use of virtualized vs. physical servers for testing, differences in the SQL implementation (is it clustered or mirrored, how many spindles are available to split OS/logs/data, are other apps sharing the SAN, etc.), the effects of load-balancing schemes (and admins whose understanding stops at round-robin), the availability and latency of network services, and the actual behavior of users (it's hard to guess what the peaks will be until the app has been in the wild for a while) all work against relying on extrapolated results. Experience helps make better predictions, but it equally tells you that it's hard to guess where an app will fail until it does. And then service packs and upgrades inevitably throw your baselines off unless you retest every application after every upgrade.
So unless you can stress test a farm identical with production, you just try to get in the ballpark and pick other practical goals: are web parts or application pages running logic that kills performance; how does the farm respond to a mix of operations; are web service calls stalling pages; where should you be caching parts or data; where could you benefit from AJAX; how does response time on the local subnet compare to requests from around the world; how are these affected by authentication and the location of the domain controllers; how does the response time of your custom pages compare with out-of-box pages; do memory-usage patterns show leaks; and so on.
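As a minimal sketch of one of these checks, comparing the latency of a custom page against an out-of-box page, a small timing harness is often enough. Everything below is a stand-in rather than SharePoint tooling: the harness times any zero-argument callable under concurrent load, and the sleeps simulate page requests so the example runs on its own.

```python
# Hypothetical latency-comparison harness. In practice, fetch() would issue an
# HTTP GET against a page on the test farm; here it is any callable, so the
# same harness could be pointed at custom pages, OOB pages, or web service calls.
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def sample(fetch, requests=50, concurrency=5):
    """Run `fetch` repeatedly under concurrent load; return (median, worst) seconds."""
    def timed(_):
        start = time.perf_counter()
        fetch()
        return time.perf_counter() - start
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        times = list(pool.map(timed, range(requests)))
    return statistics.median(times), max(times)

if __name__ == "__main__":
    # Stand-ins for page requests: a custom page ~10x slower than out-of-box.
    custom = lambda: time.sleep(0.010)
    oob = lambda: time.sleep(0.001)
    for label, fn in [("custom page", custom), ("out-of-box page", oob)]:
        median, worst = sample(fn, requests=20)
        print(f"{label}: median {median * 1000:.1f} ms, worst {worst * 1000:.1f} ms")
```

Reporting the median alongside the worst case matters: a healthy median with an ugly tail often points at exactly the kind of stall (a blocking web service call, a lock) that only shows up under load.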
Once an app has been in production for a while, you can gather a few numbers to inform the tests: how often do you see peak usage, how many concurrent users does that mean, and how long does it typically last? What's the mix of reader/writer/administrative operations, what pages do users hit (e.g. list views vs. home pages vs. application pages), where are users calling from, and what's the mix of file sizes and types being uploaded or read? All of these help you build tests that more accurately simulate production use.
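To make that concrete, here is a sketch of the kind of log mining that turns raw web server logs into test parameters. The log lines and the page classifier below are fabricated for illustration (real IIS log fields vary with how logging is configured), but the idea carries: bucket requests by minute to find peaks, and count requests by method and page type to get the operation mix.

```python
# Hypothetical sketch: mine a web server log excerpt for two numbers that shape
# a realistic test: the request mix by page type, and the busiest minute.
# The log lines are fabricated samples; real logs would be read from disk.
from collections import Counter

LOG = """\
2010-05-01 09:00:12 GET /sites/app/default.aspx 200
2010-05-01 09:00:14 GET /sites/app/Lists/Tasks/AllItems.aspx 200
2010-05-01 09:00:40 POST /sites/app/_layouts/upload.aspx 200
2010-05-01 09:01:02 GET /sites/app/default.aspx 200
2010-05-01 09:01:05 GET /sites/app/default.aspx 200
"""

def classify(path):
    # Crude page-type buckets, made up for this example.
    if "/Lists/" in path:
        return "list view"
    if "/_layouts/" in path:
        return "application page"
    return "home/content page"

mix = Counter()
per_minute = Counter()
for line in LOG.splitlines():
    date, clock, method, path, status = line.split()
    mix[(method, classify(path))] += 1
    per_minute[clock[:5]] += 1  # bucket by HH:MM

peak_minute, peak_count = per_minute.most_common(1)[0]
print("request mix:", dict(mix))
print(f"peak minute: {peak_minute} with {peak_count} requests")
```

The same pass can be extended to sum bytes transferred per request to get the file-size mix, or to group by client IP to see where users are calling from.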
So test early, test often, and gather as much information as you can in order for your tests to approximate "the truth." Where you can't get enough information, or the hardware does not match production, be realistic about what testing will prove. And even if you can't predict the circumstances where production will fail, you can still use load and stress tests to build a better product.