Sitecore site suddenly goes down (Azure PaaS)
Taxonomy: Sitecore
Problem:
Recently, in one of our projects, a happy and well-performing site suddenly started performing really poorly, to the point that it began throwing 502 errors and the site basically went down!
As an initial investigation step, we looked at resource utilization, but everything seemed perfectly fine: CPU utilization was under 40% and memory utilization was below 50%. The rest of the resources, such as Redis, SQL DTUs, etc., followed a similar pattern.
After hours of investigation, we got the Sitecore support team involved. Their deep dive into a memory dump of the CD server showed that some threads were being blocked while Sitecore was trying to send logs to Application Insights via log4net.
In the dump, thread 127 was blocked by the Application Insights Post action.
Solution:
This was happening because the Application Insights logs were not being batched. As you know, there are quite a lot of logs flowing through, so if the response time of the Application Insights API is degraded for any reason, many threads can end up blocked like this.
To resolve this, you can have the logs batched by setting the autoflush attribute to false on the trace element (under system.diagnostics) in your web.config.
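
As a rough illustration of that change, here is a minimal sketch of what the relevant part of web.config might look like. The listener name and type are assumptions based on a typical Application Insights trace listener registration, so your file may differ; the only change being suggested is the autoflush value.

```xml
<!-- Sketch only: the listener name and type below are assumed, check your own web.config -->
<configuration>
  <system.diagnostics>
    <!-- autoflush="false" stops the trace from being flushed after every single write,
         so log entries can be sent in batches instead of one request per entry -->
    <trace autoflush="false" indentsize="0">
      <listeners>
        <add name="myAppInsightsListener"
             type="Microsoft.ApplicationInsights.TraceListener.ApplicationInsightsTraceListener, Microsoft.ApplicationInsights.TraceListener" />
      </listeners>
    </trace>
  </system.diagnostics>
</configuration>
```

On Azure PaaS it is probably worth applying this through your deployment process (for example a config transform) so that the value is not reset on the next release.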