Real-world use cases for setting a lifetime on documents in Cosmos DB
When creating a container in Cosmos DB, and also when upserting (inserting or updating) a document, we can set a lifetime, the so-called Time to Live (ttl), on the documents. The lifetime is given in seconds, and a document is deleted from the database once that many seconds have passed since its last modified time. But what would be a real use case for this feature?
Cosmos DB is Microsoft's NoSQL database, and it is distributed by nature. The main things we need to take care of when working with it are choosing the right partitioning strategy and provisioning enough throughput. Sharding, scaling and pretty much everything else is handled seamlessly by Microsoft behind the scenes. All of that makes it an ideal database for persisting IoT telemetry data. You can also read my recent post Lesson learned after 2 years working with Cosmos DB to learn more about my practical experience with this database.
In IoT scenarios we typically receive large volumes of telemetry data points through channels like Azure IoT Hub, Event Hubs or even plain HTTP endpoints. Cosmos DB integrates well with other data-related technologies in Azure and is a highly available, globally distributed database. That makes it a very good place to persist telemetry data first and use it later for further processing. Now let's talk about what is typically done with IoT telemetry data. Three use cases come to my mind.
1. Real-time processing scenarios
Some pieces of data need to be processed as soon as possible and may require an immediate reaction. For instance, the data may describe the temperature of a device or a system. If the temperature exceeds a threshold, we may need to act on it right away.
Another example is real-time reporting, like tracking a moving car, e-scooter or e-bike.
2. Reporting scenarios
IoT telemetry data typically has a tight relationship with time. The data either indicates that an event occurred at a certain time or carries a measurement taken at a certain time (like the electricity consumption of a smart meter). In such scenarios, the data is typically processed, harmonized and finally persisted, ideally in a time-series data store like Time Series Insights or TimescaleDB. Then we can provide very fast reports based on time buckets.
3. Archiving scenarios
The data may just need to be stored somewhere as an archive or source of truth for possible future incidents. The data may not even be needed after a certain time, or once a device or machine breaks or goes out of service. For archiving purposes, data warehouse technologies are probably a good fit.
Considering the three scenarios above, after a certain time we may no longer need to keep the data in Cosmos DB. For instance, last year's GPS data points of an e-bike probably do not need to be available in Cosmos DB today. If they are ever needed, the time-series store or the archive from the scenarios above is probably a better place to look them up. And if the e-bike is already broken and disposed of, there is no reason to keep its data in the database at all.
Setting a proper lifetime makes sure documents get deleted when they are no longer needed. That brings two benefits. First, it reduces the number of documents in the logical partitions in Cosmos DB and consequently improves query performance (again, very important in IoT scenarios). Second, we do not pay for their storage.
The ttl feature is quite handy. The deletion is handled behind the scenes and uses whatever unused throughput remains from what we have provisioned for the container. That means, first, that it does not cost us anything extra and, second, that a document may not be deleted the moment its ttl is over. We may need to wait until enough unused throughput is available.
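To make the timing concrete, here is a minimal sketch of how the expiry moment follows from the last modified time. It assumes the document carries the `_ts` system property (last modified time as epoch seconds); the helper names `expires_at` and `is_expired` are made up for illustration and are not part of any SDK. It only computes when a document becomes eligible for deletion, since the actual deletion happens in the background as described above.

```python
from datetime import datetime, timezone


def expires_at(doc, ttl_seconds):
    """Return the UTC time at which the document becomes eligible for deletion.

    Assumes doc["_ts"] holds the last modified time in epoch seconds.
    """
    return datetime.fromtimestamp(doc["_ts"] + ttl_seconds, tz=timezone.utc)


def is_expired(doc, ttl_seconds, now=None):
    """True once ttl_seconds have passed since the last modification."""
    now = now or datetime.now(timezone.utc)
    return now >= expires_at(doc, ttl_seconds)


# Hypothetical telemetry document, last modified at epoch second 1_700_000_000.
reading = {"id": "gps-1", "deviceId": "ebike-42", "_ts": 1_700_000_000}
one_hour = 60 * 60
deadline = expires_at(reading, one_hour)
```

Because the clock starts at the last modified time, updating a document pushes its expiry further into the future.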
A lifetime can be set when creating a container in Cosmos DB, either in code or from the portal. It can also be updated later. In this case the lifetime applies to all the documents in the container.
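As a rough sketch of the "by code" option, the function below assumes the azure-cosmos Python SDK, where `database` is a `DatabaseProxy` and `partition_key` is a `PartitionKey` built by the caller. The container name `telemetry`, the function name and the 90-day value are placeholders for illustration only.

```python
def create_telemetry_container(database, partition_key, ttl_seconds):
    """Create (or fetch) a container whose documents expire by default.

    Assumption: `database` behaves like azure.cosmos.DatabaseProxy, whose
    create_container_if_not_exists accepts a `default_ttl` keyword.
    """
    return database.create_container_if_not_exists(
        id="telemetry",  # hypothetical container name
        partition_key=partition_key,
        default_ttl=ttl_seconds,  # lifetime in seconds for all documents
    )


NINETY_DAYS = 90 * 24 * 60 * 60  # example value, not a recommendation

# Possible usage, with placeholder account details:
#   from azure.cosmos import CosmosClient, PartitionKey
#   client = CosmosClient(url, credential=key)
#   database = client.get_database_client("iot")
#   create_telemetry_container(database, PartitionKey(path="/deviceId"), NINETY_DAYS)
```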
A lifetime can also be assigned to an individual document when upserting it, by setting its ttl property. In that case it overrides the lifetime set at the container level for that particular document.
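The interplay between the two levels can be sketched as a small function. This reflects the documented rules as I understand them: a per-document ttl only takes effect when Time to Live is enabled on the container, and a value of -1 means "never expire". The function name `effective_ttl` is hypothetical, and `None` is used here to represent both "ttl not set" and "never expires".

```python
def effective_ttl(container_default_ttl, item_ttl=None):
    """Return the lifetime in seconds that applies to a document,
    or None if the document never expires.
    """
    if container_default_ttl is None:
        # Time to Live is off for the container: any ttl on the document is ignored.
        return None
    # A ttl on the document overrides the container default.
    ttl = item_ttl if item_ttl is not None else container_default_ttl
    # -1 means "do not expire", at either level.
    return None if ttl == -1 else ttl


# Hypothetical document that should live for one day regardless of the default.
gps_point = {"id": "gps-1", "deviceId": "ebike-42", "ttl": 24 * 60 * 60}
lifetime = effective_ttl(90 * 24 * 60 * 60, gps_point.get("ttl"))
```

A practical consequence: setting the container default to -1 keeps documents forever by default while still allowing selected documents to expire through their own ttl.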
I'd like to mention one more small point before I wrap up this post. There is one more use I make of this feature. If we need to truncate all the data inside a container in Cosmos DB, we can go to the portal, set the ttl to something like 1 second and then, after all the documents are deleted, disable (or reset) the feature again. That's a handy way of getting rid of all the documents in a container in a few seconds.
Here you can read more about this feature: