Disaster recovery in case of Database/Container deletion and data corruption in Azure Cosmos DB
No matter what strategy we use to protect our resources and avoid accidental deletion, we should always be prepared for crisis management and know which steps to follow when a disaster occurs. When it comes to disaster recovery, databases are at the top of the list. Here we are going to discuss Azure Cosmos DB.
Before we go any further, here is the reference page in the Azure documentation:
https://docs.microsoft.com/en-us/azure/cosmos-db/online-backup-and-restore
First of all, what would you do if you accidentally deleted a Container?!
According to the documentation, "Azure Cosmos DB automatically takes a backup of your data for every 4 hours and at any point of time, the latest two backups are stored". That means that at the time of deletion, in the worst case, the two available backups are about 4 hours old and about 8 hours old.
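To make the arithmetic concrete, here is a minimal sketch of which backups would exist at the moment of deletion. It assumes backups land on a fixed 4-hour schedule and that we know when one of them was taken; the function name and the `last_backup_at` anchor are hypothetical, since Cosmos DB does not expose the actual schedule.

```python
from datetime import datetime, timedelta


def available_backups(deleted_at, last_backup_at, interval_hours=4, copies=2):
    """Return the timestamps of the retained backups at deletion time, newest first.

    Assumption: backups are taken exactly every `interval_hours`, starting
    from `last_backup_at`, and only the latest `copies` backups are kept.
    """
    interval = timedelta(hours=interval_hours)
    # Number of whole backup intervals between the known backup and the deletion.
    elapsed_intervals = (deleted_at - last_backup_at) // interval
    newest = last_backup_at + elapsed_intervals * interval
    return [newest - i * interval for i in range(copies)]


# A deletion at 11:59, with a known backup at 00:00 on the same day.
deleted_at = datetime(2024, 1, 1, 11, 59)
for taken_at in available_backups(deleted_at, datetime(2024, 1, 1, 0, 0)):
    print(taken_at, "age:", deleted_at - taken_at)
```

In this example the deletion happens one minute before the next backup is due, so the two retained backups are almost 4 and almost 8 hours old, which is the worst case described above.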
These settings can be changed in the portal, as you see here:
Even after we configure the backup settings in the portal, the backups are not directly available to us. To restore a backup, we need to open a support ticket and ask Azure support to do it. Note that backups are made at the account level.
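The same settings can also be changed from the command line instead of the portal. The sketch below only builds the command and prints it. It assumes the Azure CLI's `az cosmosdb update` command with the `--backup-interval` (minutes) and `--backup-retention` (hours) options; the account and resource group names are placeholders, and the helper function is hypothetical.

```python
import shlex


def build_backup_policy_command(account, resource_group, interval_minutes, retention_hours):
    """Build an Azure CLI command that updates the periodic backup settings.

    Assumption: `az cosmosdb update` accepts --backup-interval in minutes
    and --backup-retention in hours.
    """
    return [
        "az", "cosmosdb", "update",
        "--name", account,
        "--resource-group", resource_group,
        "--backup-interval", str(interval_minutes),
        "--backup-retention", str(retention_hours),
    ]


# Placeholder names; 240 minutes and 8 hours match the defaults quoted above.
command = build_backup_policy_command("my-account", "my-rg", 240, 8)
print(shlex.join(command))
```

Printing the command rather than running it keeps the example safe to try; it can be passed to `subprocess.run` once the values have been reviewed.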
Another fact to be aware of is that "If the container or database is deleted, Azure Cosmos DB retains the existing snapshots of a given container or database for 30 days".
What should we do if we accidentally delete a Database or Container in Azure Cosmos DB?
1. The first thing to do is to go to the portal and increase the retention time, preferably to 7 days! That gives Azure support enough time to react and restore the backup before it is overwritten by new backups! According to the documentation, "It’s best to increase your retention within 8 hours of this event". I do not know exactly what "It’s best" means in this context or what happens once 8 hours have passed, but in any case it is better to do this immediately.
A question that may arise here is: what if we only notice the accidental deletion of the Database/Container some time later?! For instance, what if we delete the container on Friday and notice the mistake on Monday? My first thought is that we should adjust the number of backups and the retention time based on our business needs. If the data is critical and the chance of corruption is high, it is better to create backups more frequently and retain them longer.
2. The second action is to open a support ticket and ask Azure support to restore the database.
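The Friday-to-Monday question can be checked with a rough calculation. This is a simplified model, not the service's actual behavior: it assumes that the newest backup taken before the incident may be up to one backup interval old, that it expires once it is older than the retention time, and it ignores the 30-day snapshot retention quoted earlier. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def pre_incident_backup_survives(incident_at, noticed_at, retention_hours, interval_hours=4):
    """Return True if a backup from before the incident should still exist.

    Worst case, the newest clean backup was taken a full interval before
    the incident, so the safe detection window is retention minus interval.
    """
    detection_delay = noticed_at - incident_at
    safe_window = timedelta(hours=retention_hours - interval_hours)
    return detection_delay <= safe_window


friday = datetime(2024, 3, 1, 17, 0)  # container deleted on Friday afternoon
monday = datetime(2024, 3, 4, 9, 0)   # mistake noticed on Monday morning

print(pre_incident_backup_survives(friday, monday, retention_hours=8))
print(pre_incident_backup_survives(friday, monday, retention_hours=7 * 24))
```

Under these assumptions the 64-hour gap is far beyond what an 8-hour retention can cover, while a 7-day retention leaves plenty of room, which is why the retention time should be chosen with the expected detection delay in mind.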
What should we do if we accidentally delete an Azure Cosmos DB Account?
Microsoft recommends that we NOT re-create the account with the same name! According to the documentation, this is "Because it not only prevents the restored data to use the same name, but also makes discovering the right account to restore from difficult". So keep that in mind. In this case we simply create a support ticket.
What about data corruption? What if a buggy application corrupts the items inside a Container? What if we accidentally delete or update the wrong data in a Container?
1. The first thing to do is the same as when a Container is deleted, as explained earlier: we should increase the retention time. In this scenario, knowing the time of the incident is very important. Since the Container is still up and running, new backups are created from the corrupted Container, so we need to know the time of the incident in order to restore the right backup.
2. Open a support ticket.
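The following sketch illustrates why the incident time matters in the corruption scenario. It assumes we had a list of backup timestamps, which in practice only Azure support can see, and picks the newest backup taken strictly before the incident; the function name and the timestamps are hypothetical.

```python
from datetime import datetime


def latest_clean_backup(backup_times, incident_at):
    """Return the newest backup taken before the incident, or None if there is none."""
    clean = [taken_at for taken_at in backup_times if taken_at < incident_at]
    return max(clean) if clean else None


# Hypothetical backups taken every 4 hours; the corruption started at 09:30.
backups = [
    datetime(2024, 1, 1, 4, 0),
    datetime(2024, 1, 1, 8, 0),
    datetime(2024, 1, 1, 12, 0),  # taken after the incident, so already corrupted
]
print(latest_clean_backup(backups, datetime(2024, 1, 1, 9, 30)))
```

Any backup taken after the incident already contains the corrupted data, so the time we report in the support ticket determines which backup is worth restoring.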
Finally, I highly recommend reading the reference page in the documentation, since it contains some other important points.