Another great but under-used way to test is the million-monkey method: get as many people in a room hammering at the test farm as you can. The first client I know of who identified what turned out to be a SharePoint performance bottleneck (row-level locks escalating to table-level locks) tested this way with just 14 people. Happily, this and many other database performance issues are resolved in SharePoint 2010, but this targeted brute-force testing laid the issues bare where they hadn't come up at all in months of automated testing at other companies.
As an example of why you want the test farm to be identical with production, there was a SharePoint app that a colleague built and tested at a large bank, and they had the benefit of an identical test farm. The app was stress-tested over a full 24-hour period, and issues only showed up in the ninth or tenth hour. While that translated to the issue occurring only after weeks of actual production use, it would have ground the system to a halt had they not caught it before release. When SharePoint becomes mission critical, building the right test farm is worth the investment.
But because duplicating production can cost a lot, few companies actually do it. When the test farm doesn't match production, you can only guarantee that tests will provide a baseline of performance to compare with other apps or versions run in the same test environment; they really don't tell you much about how the app will behave in production or how many users it will handle. You can make ballpark estimates (within an order of magnitude) and extrapolate to a degree, but too many factors conspire against a reliable interpretation; there's simply no such thing. The number of servers in different farm roles, the use of virtualized vs. physical servers for testing, differences in the SQL implementation (is it clustered or mirrored, how many spindles are available to split OS/logs/data, are other apps sharing the SAN, etc.), the effects of load-balancing schemes (and admins whose understanding stops at round-robin), the availability and latency of network services, and the actual behavior of users (it's hard to guess what the peaks will be until the app has been in the wild for a while) all work against relying on extrapolated results. Experience helps make better predictions, but it equally tells you that it's hard to guess where an app will fail until it does. And then service packs and upgrades inevitably throw your baselines off unless you retest every application after every upgrade.
So unless you can stress test a farm identical with production, you just try to get in the ballpark and pick other practical goals: are web parts or application pages running logic that kills performance; how does the farm respond to a mix of operations; are web service calls stalling pages; where should you be caching parts or data; where could you benefit from AJAX; how does response time on the local subnet compare to requests from around the world; how are these affected by authentication and the location of the domain controllers; how does the response time of your custom pages compare with out-of-box pages; do memory-usage patterns show leaks; and so on.
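As a minimal sketch of one of these checks, comparing the latency of a custom page against an out-of-box page, a small timing harness is often enough. Everything below is a stand-in rather than SharePoint tooling: the harness times any zero-argument callable under concurrent load, and the sleeps simulate page requests so the example runs on its own.

```python
# Hypothetical latency-comparison harness. In practice, fetch() would issue an
# HTTP GET against a page on the test farm; here it is any callable, so the
# same harness could be pointed at custom pages, OOB pages, or web service calls.
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def sample(fetch, requests=50, concurrency=5):
    """Run `fetch` repeatedly under concurrent load; return (median, worst) seconds."""
    def timed(_):
        start = time.perf_counter()
        fetch()
        return time.perf_counter() - start
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        times = list(pool.map(timed, range(requests)))
    return statistics.median(times), max(times)

if __name__ == "__main__":
    # Stand-ins for page requests: a custom page ~10x slower than out-of-box.
    custom = lambda: time.sleep(0.010)
    oob = lambda: time.sleep(0.001)
    for label, fn in [("custom page", custom), ("out-of-box page", oob)]:
        median, worst = sample(fn, requests=20)
        print(f"{label}: median {median * 1000:.1f} ms, worst {worst * 1000:.1f} ms")
```

Reporting the median alongside the worst case matters: a healthy median with an ugly tail often points at exactly the kind of stall (a blocking web service call, a lock) that only shows up under load.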
Once an app has been in production for a while, you can gather a few numbers to inform the tests: how often do you see peak usage, how many concurrent users does that mean, and how long does it typically last? What's the mix of reader/writer/administrative operations, what pages do users hit (e.g. list views vs. home pages vs. application pages), where are users calling from, and what's the mix of file sizes and types being uploaded or read? All of these help you build tests that more accurately simulate production use.
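To make that concrete, here is a sketch of the kind of log mining that turns raw web server logs into test parameters. The log lines and the page classifier below are fabricated for illustration (real IIS log fields vary with how logging is configured), but the idea carries: bucket requests by minute to find peaks, and count requests by method and page type to get the operation mix.

```python
# Hypothetical sketch: mine a web server log excerpt for two numbers that shape
# a realistic test: the request mix by page type, and the busiest minute.
# The log lines are fabricated samples; real logs would be read from disk.
from collections import Counter

LOG = """\
2010-05-01 09:00:12 GET /sites/app/default.aspx 200
2010-05-01 09:00:14 GET /sites/app/Lists/Tasks/AllItems.aspx 200
2010-05-01 09:00:40 POST /sites/app/_layouts/upload.aspx 200
2010-05-01 09:01:02 GET /sites/app/default.aspx 200
2010-05-01 09:01:05 GET /sites/app/default.aspx 200
"""

def classify(path):
    # Crude page-type buckets, made up for this example.
    if "/Lists/" in path:
        return "list view"
    if "/_layouts/" in path:
        return "application page"
    return "home/content page"

mix = Counter()
per_minute = Counter()
for line in LOG.splitlines():
    date, clock, method, path, status = line.split()
    mix[(method, classify(path))] += 1
    per_minute[clock[:5]] += 1  # bucket by HH:MM

peak_minute, peak_count = per_minute.most_common(1)[0]
print("request mix:", dict(mix))
print(f"peak minute: {peak_minute} with {peak_count} requests")
```

The same pass can be extended to sum bytes transferred per request to get the file-size mix, or to group by client IP to see where users are calling from.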
So test early, test often, and gather as much information as you can in order for your tests to approximate "the truth." Where you can't get enough information, or the hardware does not match production, be realistic about what testing will prove. And even if you can't predict the circumstances where production will fail, you can still use load and stress tests to build a better product.