At some point over the last six weeks and change, I launched POP Forums as a commercial, hosted product. By that I mean it became something I could sell to others. I'm not suggesting that I've done any real sales effort, obvs, because I don't have any external customers yet. But there was a lot to learn from the effort.
The first bit was that, in instrumenting the shit out of it, I discovered some weaknesses early on. Maybe "weakness" is a strong word, because none of the things I found would have let a new, high-scale customer bring it down. I had, over the course of the last year and a half, written all kinds of code to make it scale, but never really put the app into a real scale situation, where there were multiple nodes and an external service (ElasticSearch) providing search, etc. There were some quick lessons learned:
- Having search well outside of the network where the app runs means it did sometimes fail. Retry logic (using Polly) adequately addressed the problem.
- At first I tried using the AWS flavor of ElasticSearch, but I was kind of annoyed at their overall approach to security, which is different from that of straight ES and works like every other permission scheme in AWS. There were way too many cautionary tales about how slowly you could scale the AWS flavor, too, and starting out tiny, that didn't sit well with me.
- Moving my search to an Elastic hosted instance was the right move, and it's dirt cheap for the foreseeable future. Like, under $20 per month cheap. Plus it's "real" ES. The only negative is that it's hosted in Azure US West-2, and my app is in US West. Search results still come back in a decent 150ms-ish hit. It's easy and fast to scale up or down.
- The small utility app I wrote to populate the search index took about three minutes to run for a half-million posts. Again, the cheap index hosting from Elastic took it like a champ.
- The Linux variety of Azure App Services is actually wrapped up in Docker containers, even when you deploy via web deploy. This might explain why the apps deploy and scale out so fast.
- The forum app and the management app are both running on the same app service plan (the same VM), and I'm running on two instances for now. I had one of the management apps go "bad" for some reason twice, so depending on which instance the infrastructure assigned your request to, it could have returned a 502 bad gateway response. Not sure why Azure doesn't automatically sense this as a failure and take the instance out of rotation, but they have a preview feature for this. I'll take it!
- I have to keep the ARR affinity on, which means users are sticky to a particular node, otherwise SignalR, the websockets implementation, doesn't work. This doesn't seem to have any material impact on performance.
- Web deploy to Linux app services isn't "done" when the deployment is done. It appears that there are some switches in the background as it cycles between old and new instances, and it is possible to briefly get a 502. Maybe having 3 nodes would improve this, but it's a little disappointing, however momentary.
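The retry point from the first bullet is worth a quick illustration. This is not the Polly code from the app, just a minimal Python sketch of the same idea: retry a flaky call a few times with exponential backoff before giving up. The names here (`TransientSearchError`, `with_retry`, `make_flaky_search`) and the attempt counts and delays are all made up for the example.

```python
import random
import time


class TransientSearchError(Exception):
    """Hypothetical error standing in for a timeout or dropped connection."""


def with_retry(operation, max_attempts=3, base_delay=0.2, sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientSearchError:
            if attempt == max_attempts:
                raise  # out of attempts, let the caller see the failure
            # Delay doubles each time (0.2s, 0.4s, ...) plus a little jitter
            # so multiple nodes don't retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 10))


def make_flaky_search(failures_before_success):
    """Build a fake search call that fails a set number of times, then succeeds."""
    state = {"calls": 0}

    def search():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientSearchError("search cluster unreachable")
        return ["post 1", "post 2"]

    return search, state
```

Wrapping a search call then looks like `with_retry(search)`. Only the error type treated as transient gets retried; anything else surfaces immediately, which is usually what you want for bad queries or auth failures.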
All of this still leaves me in awe that I can have this kind of redundancy and resiliency for so little money. How did we get along before "the cloud"? I've learned that more than 90% of my database calls are handled by in-memory cache (across multiple nodes, no less). The database DTU (the blended unit of CPU, memory and I/O for the price structure I'm using) averages less than 4%, while the CPU for the web nodes stays well under 10% at the current, starter pricing levels. Page render times stay below 40ms, and that's accounting for serving images out of the database. I'm still confident that I could hit 2,500 requests per second without scaling up (though response times would be better if I did).
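For anyone wondering what "handled by in-memory cache" means in practice, here is a rough sketch of the cache-aside pattern with a hit counter bolted on. It is an illustration in Python, not the forum's actual caching code, and it deliberately ignores the hard part of running on multiple nodes (telling the other nodes to drop a stale entry). `CountingCache` and `load_topic` are hypothetical names.

```python
class CountingCache:
    """Tiny in-process cache-aside wrapper that tracks its own hit ratio."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # hypothetical loader: key -> value
        self._items = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self.hits += 1
            return self._items[key]
        # Miss: go to the database once, then remember the result.
        self.misses += 1
        value = self._load_from_db(key)
        self._items[key] = value
        return value

    def invalidate(self, key):
        """Drop an entry after a write so the next read reloads it."""
        self._items.pop(key, None)

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


db_calls = []


def load_topic(topic_id):
    """Stand-in for a real database query; records each call it receives."""
    db_calls.append(topic_id)
    return {"id": topic_id, "title": f"Topic {topic_id}"}


cache = CountingCache(load_topic)
for _ in range(10):
    cache.get(1)

print(f"{cache.hit_ratio():.0%}")  # prints "90%": one miss, then nine hits
```

Ten reads of the same topic cost one database call. A forum is read-heavy enough that ratios like this are plausible for real traffic, as long as writes invalidate the right keys.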
The "big" test came when I migrated the PointBuzz forums over to the platform, and wow did it shine. It's not a huge load, but 30 to 50 requests per second isn't unusual for that site, and it doesn't cause any deviation from the above numbers. I'm so happy to split the forums from the rest of the site, so they aren't dependent on each other. The search doesn't get a lot of use there, but it's so nice to have real, useful search! The migration also forced me to add a feature to insert Google AdSense ad slots into the page, because their "auto ad" thing wasn't really inserting anything. I added code to insert a banner at the top, and a sidebar ad, and wouldn't you know it, it's generating CPMs of almost $2. Haven't seen money like that in a long time!
At this point, I just need some real customers to prove this is all for real. Honestly, it all works better than I expected.