HypeTeq

Azure Cosmos DB – Best Practices

Technical Challenges While Integrating Cosmos DB in a .NET Core Middleware API
Here I would like to highlight some technical challenges to consider when choosing Cosmos DB.

1. Cosmos DB Partitioning:

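How to choose a Cosmos DB partition key?
Query performance depends on how much of the key a request specifies:
  • Super fast: unique partition key.
  • Very fast: partition key + row key.
  • Slower: only partition key, no row key.
  • Slowest: no partition key.
Partition key and row key matter! (Row key is the Table API term; in the SQL API the item id plays that role.)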
The partition key is a property that exists on every item and is used to group similar items together. Cosmos DB has logical and physical partitions: all items that share the same partition key value belong to the same logical partition, and one or more logical partitions are stored on a physical partition.
A good partition key is used as a filter in most queries and has a wide range of values. Requests to a single partition cannot exceed the throughput available to that partition; requests beyond that are rate-limited.
In a multi-tenant application, “TenantId” is a common choice of partition key. Choose a partition key with many distinct values to avoid “hot partitions”. Each logical partition has a storage limit of 20 GB; if one key value would hold more data than that, the records need to be divided across multiple partitions.
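
When a single tenant is expected to outgrow the 20 GB limit, one common workaround is a synthetic partition key that adds a bucket suffix to the tenant id. The sketch below is only an illustration of that idea: the PartitionKeys class, the SyntheticKey helper, the "tenantId-bucket" format and the bucket count are all assumptions, not part of any Cosmos DB API.

```csharp
using System;
using System.Text;

public static class PartitionKeys
{
    // Stable FNV-1a hash. string.GetHashCode() is randomized per process
    // in .NET Core, so it must not be used to derive a stored key.
    private static uint Fnv1a(string value)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (byte b in Encoding.UTF8.GetBytes(value))
            {
                hash ^= b;
                hash *= 16777619;
            }
            return hash;
        }
    }

    // Spreads one tenant's documents over `bucketCount` logical partitions,
    // e.g. "tenant-42-0" ... "tenant-42-3" for bucketCount = 4.
    public static string SyntheticKey(string tenantId, string documentId, int bucketCount)
    {
        if (bucketCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(bucketCount));

        uint bucket = Fnv1a(documentId) % (uint)bucketCount;
        return $"{tenantId}-{bucket}";
    }
}
```

The trade-off is that a query for all data of one tenant then has to visit every bucket of that tenant, so the bucket count should stay as small as the data volume allows.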

2. Performance & Cost Optimization:

Azure Cosmos DB is available in two different capacity models.
  1.  Provisioned throughput
  2.  Serverless
Provisioned throughput offers guaranteed speed and availability and requires you to plan and manage throughput capacity. It is best suited for large, high-throughput workloads with high performance requirements. You can manually provision throughput or enable autoscale throughput.
Serverless is consumption-based and charges for the resources used by database operations, with no minimum cost. It is best suited for workloads with low traffic, occasional spikes or bursts, and moderate performance requirements.
A detailed comparison of the two models is available in the Azure Cosmos DB documentation.
Optimized Query Selection / Writing
In Azure Cosmos DB we can query data with SQL, used as a JSON query language. Queries can be run from the Data Explorer in the Azure portal and from the Azure Cosmos DB Emulator, where we can check the provisioned throughput and the RU (request unit) cost of each query. COUNT and GROUP BY queries tend to consume more RUs, so query selection should be based on RU cost. We should pass the partition key in queries so that they are served by a single partition instead of fanning out across all partitions. The SQL API works on JSON values, and the result of a query is always a valid JSON value.
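
To compare query variants by RU cost from code rather than from the portal, the request charge of every result page can be summed up. This is a minimal sketch that assumes the Microsoft.Azure.Cosmos (v3) NuGet package and an existing Container; the QueryCost class and the MeasureRequestChargeAsync method are hypothetical names.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class QueryCost
{
    // Drains a query that is scoped to one partition key value and
    // returns the total number of RUs it consumed.
    public static async Task<double> MeasureRequestChargeAsync(
        Container container, QueryDefinition query, string partitionKeyValue)
    {
        var options = new QueryRequestOptions
        {
            // Keeps the query inside a single logical partition.
            PartitionKey = new PartitionKey(partitionKeyValue)
        };

        double totalCharge = 0;
        using FeedIterator<object> iterator =
            container.GetItemQueryIterator<object>(query, requestOptions: options);

        while (iterator.HasMoreResults)
        {
            FeedResponse<object> page = await iterator.ReadNextAsync();
            totalCharge += page.RequestCharge; // RUs consumed by this page
        }

        return totalCharge;
    }
}
```

Calling it once with new QueryDefinition("SELECT VALUE COUNT(1) FROM c") and once with a filtered SELECT gives two numbers that can be compared directly.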

3. Integration: Database Connectivity SDK

Cosmos DB provides SDKs for database connectivity in several languages. For .NET there are two client classes: CosmosClient and DocumentClient. DocumentClient belongs to the older SDK, so we should always choose the latest version of CosmosClient to store and query data. CosmosClient operations are asynchronous (for example CreateItemAsync and ReadItemAsync) and should be awaited rather than blocked on.
CosmosClient is thread safe. We should maintain a single instance of the client for the lifetime of the application, which enables efficient connection management and better performance.
The SDK also supports retry policies for throttled (HTTP 429) requests. With DocumentClient we can set MaxRetryAttemptsOnThrottledRequests and MaxRetryWaitTimeInSeconds on ConnectionPolicy.RetryOptions; with CosmosClient the equivalent settings are MaxRetryAttemptsOnRateLimitedRequests and MaxRetryWaitTimeOnRateLimitedRequests on CosmosClientOptions.
Cosmos DB connections support two connectivity modes.
  • Gateway mode

Gateway mode is supported on all SDK platforms. It uses the standard HTTPS port and a single DNS endpoint, which makes it the best choice when the application runs behind a strict firewall. In Gateway mode, requests created by the client are routed to a gateway server, which forwards them to the appropriate partitions in the Azure Cosmos DB backend.

Connection protocol: HTTPS, default port 443

  • Direct mode

Direct mode is supported in the .NET and Java SDKs. This mode offers a choice between TCP and HTTPS.

In Direct mode, data plane requests (document reads and writes) are sent directly from the client machine to the partitions in the Azure Cosmos DB backend, without passing through the gateway.

Protocol TCP: port 443 for the gateway endpoint, plus ports in the range 10000 to 20000 for the backend partitions

Protocol HTTPS: default port 443

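For example, with the older DocumentClient SDK:

    using System;
    using Microsoft.Azure.Documents.Client;

    // Endpoint and key are placeholders; load them from configuration in a real application.
    Uri cosmosDbEndpoint = new Uri(cosmosDbConnectionEndpoint);
    string authKey = "cosmosdb connection key";

    // Direct mode over TCP, configured through the connection policy.
    DocumentClient client = new DocumentClient(cosmosDbEndpoint, authKey,
        new ConnectionPolicy
        {
            ConnectionMode = ConnectionMode.Direct,
            ConnectionProtocol = Protocol.Tcp
        });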
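
Since CosmosClient is the recommended client, a comparable setup with the newer SDK might look like the sketch below. It assumes the Microsoft.Azure.Cosmos (v3) NuGet package; the CosmosClientFactory class is a hypothetical helper, and the retry values are illustrative, not a tuning recommendation.

```csharp
using System;
using Microsoft.Azure.Cosmos;

public static class CosmosClientFactory
{
    // Connection mode and retry settings for rate-limited (HTTP 429) requests.
    public static CosmosClientOptions BuildOptions() => new CosmosClientOptions
    {
        ConnectionMode = ConnectionMode.Direct, // TCP to the backend partitions
        MaxRetryAttemptsOnRateLimitedRequests = 9,
        MaxRetryWaitTimeOnRateLimitedRequests = TimeSpan.FromSeconds(30)
    };

    // Call this once and reuse the returned client for the lifetime of the
    // application; CosmosClient is thread safe.
    public static CosmosClient Create(string endpoint, string authKey) =>
        new CosmosClient(endpoint, authKey, BuildOptions());
}
```

In an ASP.NET Core API the client would typically be registered once as a singleton service, so that every request shares the same connections.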

Optimistic concurrency control (OCC) prevents lost updates and deletes. Every item stored in a container has a system-generated “_etag” property, and its value changes every time the item is updated. When a replace or delete request carries an If-Match header with an “_etag” value that no longer matches the item's current “_etag”, the server rejects the request with HTTP 412 (Precondition Failed). This protects against losing changes when two requests modify the same item at the same time: the client that was rejected re-reads the item, reapplies its update, and retries the original request.
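
The “_etag” check can be sketched with the v3 SDK as follows. The Order class, its properties and the TryUpdateStatusAsync method are hypothetical; the sketch also assumes a container partitioned by /tenantId and the Microsoft.Azure.Cosmos NuGet package.

```csharp
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Hypothetical document type; lower-case names match the stored JSON.
public class Order
{
    public string id { get; set; }
    public string tenantId { get; set; }
    public string status { get; set; }
}

public static class OptimisticConcurrency
{
    // Returns true if the update was applied, false if another writer
    // changed the item after it was read.
    public static async Task<bool> TryUpdateStatusAsync(
        Container container, string id, string tenantId, string newStatus)
    {
        var partitionKey = new PartitionKey(tenantId);

        ItemResponse<Order> read = await container.ReadItemAsync<Order>(id, partitionKey);
        Order order = read.Resource;
        order.status = newStatus;

        try
        {
            // The replace succeeds only if the stored _etag still equals read.ETag.
            await container.ReplaceItemAsync(order, id, partitionKey,
                new ItemRequestOptions { IfMatchEtag = read.ETag });
            return true;
        }
        catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.PreconditionFailed)
        {
            return false; // stale _etag: re-read, reapply the change, retry
        }
    }
}
```

A caller would usually wrap this in a small loop with a retry limit, so that a busy item does not cause endless retries.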

Below are a few practical implementation tips and best practices:

  1.  Use Bulk Executor

The bulk executor supports bulk import, update and delete operations. Upsert operations tend to consume more RUs. It is supported in .NET Core APIs.

The bulk executor groups operations into batches and upserts the records into the container. Documents to import are grouped by partition: a loop goes over the partition key ranges, and each partition range id is used to get the list of documents to import into that partition.
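
With the newer CosmosClient SDK, bulk behaviour is enabled with the AllowBulkExecution option instead of the separate bulk executor library, and the SDK does the batching internally. The following sketch shows that variant, not the bulk executor library itself; it assumes the Microsoft.Azure.Cosmos (v3) NuGet package, and the BulkUpsert class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class BulkUpsert
{
    // Bulk mode is a client-level setting; containers obtained from this
    // client group concurrent operations into batches.
    public static CosmosClient CreateBulkClient(string endpoint, string authKey) =>
        new CosmosClient(endpoint, authKey,
            new CosmosClientOptions { AllowBulkExecution = true });

    // Starts all upserts concurrently and returns the total RU charge.
    public static async Task<double> UpsertAllAsync<T>(
        Container container, IEnumerable<T> items, Func<T, string> partitionKeySelector)
    {
        List<Task<ItemResponse<T>>> tasks = items
            .Select(item => container.UpsertItemAsync(
                item, new PartitionKey(partitionKeySelector(item))))
            .ToList();

        ItemResponse<T>[] responses = await Task.WhenAll(tasks);
        return responses.Sum(response => response.RequestCharge);
    }
}
```

The container passed to UpsertAllAsync has to come from the client created with AllowBulkExecution, otherwise each upsert is sent as an individual request.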

  2.  How does the continuation token work?

Cosmos DB provides a continuation token to fetch data from a container page by page. In other words, if we want to apply paging, we need to use the continuation token. The token is used to recreate the state of the index and track the progress of query execution.

The query result is divided into multiple pages. A positive “MaxItemCount” sets the maximum number of items returned per page. Setting “MaxItemCount” to -1 lets the service choose the page size dynamically; it does not guarantee that all data comes back in a single page.

The continuation token belongs to the query we pass: the service returns a combined object of multiple values as the token. The important point is that whatever we receive as the continuation token must be passed back unchanged, together with the same query, to fetch the next page of data from the container.

With the REST API, continuation tokens are managed through the x-ms-continuation header. Continuation tokens cannot be used with queries such as GROUP BY.

A continuation token looks like this:

{"token":"+RID:qqhuAMREDACTEDAAAAAAA==REDACTED:AQwAAAAAAAAAEwAAAAAAAAA=","range":{"min":"","max":"FF"}}

None of these values are fixed: the token can change in length, and new fields such as range can appear. Because the continuation token format is not fixed, we should never parse or build it ourselves; we only pass on whatever we received from the previous request.
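
Paging with an opaque continuation token can be sketched as below. The sketch assumes the Microsoft.Azure.Cosmos (v3) NuGet package and a container partitioned by /tenantId; the Paging class, the ReadPageAsync method and the query text are hypothetical.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class Paging
{
    // Reads one page. Pass null for the first page, then pass back the
    // token returned by the previous call without modifying it.
    public static async Task<(List<T> Items, string ContinuationToken)> ReadPageAsync<T>(
        Container container, string tenantId, int pageSize, string continuationToken = null)
    {
        var query = new QueryDefinition("SELECT * FROM c WHERE c.tenantId = @tenantId")
            .WithParameter("@tenantId", tenantId);

        var options = new QueryRequestOptions
        {
            PartitionKey = new PartitionKey(tenantId),
            MaxItemCount = pageSize // upper bound for items in one page
        };

        using FeedIterator<T> iterator =
            container.GetItemQueryIterator<T>(query, continuationToken, options);

        var items = new List<T>();
        string nextToken = null;

        if (iterator.HasMoreResults)
        {
            FeedResponse<T> page = await iterator.ReadNextAsync();
            items.AddRange(page);
            nextToken = page.ContinuationToken; // null when there are no more pages
        }

        return (items, nextToken);
    }
}
```

An API endpoint can return the token to its caller together with the items and accept it again on the next request, which keeps the server stateless between pages.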

In a nutshell, Azure Cosmos DB offers a variety of advantages for service performance and cost effectiveness. We need to take care of the points above while integrating Cosmos DB.
