Mar 22 2022 | by Mahammed Nasir

Research on the best way to archive large data

A few months back I worked on a SaaS-based IoT project, a large system with 35+ microservices running on a K8s cluster. The application is a hybrid, multi-cloud application built on technologies such as .NET Core, Python, PostgreSQL, MongoDB, gRPC and many supporting open-source applications, with its components hosted across multiple cloud environments.

One of the major components of the application was IoT data collection, storage and report generation, spanning multiple microservices. The system received IoT signals from a large number of devices at different frequencies. Depending on the client's needs, the reporting interval could be configured per device from 5 to 60 seconds, that is, from 12 records per minute down to 1. Every day the system accumulated a massive amount of data in MongoDB. To cut costs, some of the clients/tenants (in SaaS) did not want to retain this IoT data for very long. Hence, we decided to archive IoT data to cold storage.

I started going through the options available on the market. These are the options that initially came to mind:

  1. HDFS 
  2. Bring up another Mongo Cluster for archival.  
  3. Store it on AWS S3 
  4. Store it on Azure Storage  

I ruled out the first two options for some obvious reasons. Although AWS S3 would also have been a viable solution, I started researching Azure Storage, since it is my primary area. We chose Table storage as the best-suited option.

Data size per record: 440 bytes (0.44 KB)

Monthly data usage (rightmost five columns: GB per month for the given number of devices)

Frequency (s)  Records/min  Records/day  KB/day    MB/month/device  1,000      2,000      5,000      10,000     100,000
60             1            1440         633.6     19.008           19.008     38.016     95.04      190.08     1900.8
30             2            2880         1267.2    38.016           38.016     76.032     190.08     380.16     3801.6
20             3            4320         1900.8    57.024           57.024     114.048    285.12     570.24     5702.4
15             4            5760         2534.4    76.032           76.032     152.064    380.16     760.32     7603.2
10             6            8640         3801.6    114.048          114.048    228.096    570.24     1140.48    11404.8
5              12           17280        7603.2    228.096          228.096    456.192    1140.48    2280.96    22809.6

 

Azure Storage pricing appeared remarkably simple, so I started sizing the accumulating data to estimate the monthly cost. The table above gave me clear-cut information: about 22.8 TB (22,809.6 GB) is the maximum expected monthly data size for 100K devices, reached at the 5-second frequency.
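The sizing arithmetic behind the table can be reproduced with a few lines of Python. This is a minimal sketch, not the original spreadsheet: it assumes a 30-day month and decimal units (1 GB = 10^9 bytes), which is what the table's figures appear to use, and the function names are illustrative.

```python
RECORD_BYTES = 440               # record size quoted in the article
DAYS_PER_MONTH = 30              # assumption: the table's figures match a 30-day month
SECONDS_PER_DAY = 24 * 60 * 60


def records_per_day(frequency_s: int) -> int:
    """Records one device emits per day at the given reporting interval."""
    return SECONDS_PER_DAY // frequency_s


def monthly_gb(frequency_s: int, devices: int) -> float:
    """Monthly data volume in GB, using decimal units (1 GB = 10**9 bytes)."""
    total_bytes = (
        records_per_day(frequency_s) * RECORD_BYTES * DAYS_PER_MONTH * devices
    )
    return total_bytes / 10**9


if __name__ == "__main__":
    fleet_sizes = (1_000, 2_000, 5_000, 10_000, 100_000)
    for frequency_s in (60, 30, 20, 15, 10, 5):
        row = [round(monthly_gb(frequency_s, n), 3) for n in fleet_sizes]
        print(frequency_s, records_per_day(frequency_s), row)
```

Changing `RECORD_BYTES` or the fleet sizes gives a quick way to re-run the estimate if the record layout or the number of devices changes.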

The next part was estimating the number of read and write operations per month. Since every device had its own data retention period, the solution was to fetch all expiring records, archive them and delete them from the primary database. No read operations were expected in the near term. With one write per record per device, the maximum estimated number of write operations per month was 51,840,000,000 for 100K devices (17,280 records per day × 30 days × 100,000 devices).
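The fetch, archive and delete cycle described above could look roughly like the following sketch. It is not the project's actual code: plain Python lists stand in for MongoDB and for the cold storage, and the field names `device_id` and `timestamp` as well as the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def select_expired(records, retention_days, now):
    """Return the records that are older than their device's retention window.

    records: list of dicts with hypothetical keys "device_id" and "timestamp".
    retention_days: dict mapping device_id to the days kept in the primary store.
    """
    expired = []
    for record in records:
        cutoff = now - timedelta(days=retention_days[record["device_id"]])
        if record["timestamp"] < cutoff:
            expired.append(record)
    return expired


def archive_expired(primary, retention_days, now, archive):
    """Copy expired records to the archive first, then delete them from primary."""
    expired = select_expired(primary, retention_days, now)
    archive.extend(expired)        # stand-in for the cold-storage write
    for record in expired:
        primary.remove(record)     # stand-in for the delete in the primary database
    return len(expired)


if __name__ == "__main__":
    now = datetime(2022, 3, 22, tzinfo=timezone.utc)
    primary = [
        {"device_id": "dev-1", "timestamp": now - timedelta(days=10)},
        {"device_id": "dev-1", "timestamp": now - timedelta(days=2)},
    ]
    archive = []
    moved = archive_expired(primary, {"dev-1": 7}, now, archive)
    print(moved, len(primary), len(archive))
```

Writing to the archive before deleting from the primary store means a failure between the two steps leaves a duplicate rather than a lost record.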

The Azure estimate showed very high pricing for these sizes. The primary reason for the spike in price was the very high number of writes.

 

 

To reduce the number of writes, we decided to pass 100 records per API call. That worked: the number of API calls dropped drastically, and the estimated price fell sharply with it.
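The batching idea can be sketched as follows. The sketch groups records by device before cutting them into batches of 100, because Azure Table storage batch transactions accept at most 100 entities, all of which must share one partition key. Using the device ID as the partition key is an assumption made for illustration, as are the field and function names.

```python
from collections import defaultdict

BATCH_SIZE = 100  # records per API call, as chosen in the article


def calls_needed(record_count: int, batch_size: int = BATCH_SIZE) -> int:
    """Number of API calls needed to write record_count records in batches."""
    return (record_count + batch_size - 1) // batch_size


def batches_by_device(records, batch_size=BATCH_SIZE):
    """Yield (device_id, batch) pairs with at most batch_size records each.

    Assumes each record is a dict with a hypothetical "device_id" key that
    doubles as the partition key of the archive table.
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[record["device_id"]].append(record)
    for device_id, device_records in grouped.items():
        for start in range(0, len(device_records), batch_size):
            yield device_id, device_records[start:start + batch_size]


if __name__ == "__main__":
    records = [{"device_id": "dev-1", "value": i} for i in range(250)]
    for device_id, batch in batches_by_device(records):
        print(device_id, len(batch))   # each batch would become one API call
    print(calls_needed(17280 * 30))    # one device at 5 s over a 30-day month
```

For a single device reporting every 5 seconds, a 30-day month produces 518,400 records, so batching brings the write calls for that device down from 518,400 to 5,184.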

 
 
[Figure: data flow block diagram]

 

Conclusion: 

There are a lot of options available on the market for data archival. In earlier days, data archival was tedious work, as it involved setting up infrastructure manually and maintaining it. Cloud storage removes the pain of maintaining the archival infrastructure, and it also provides advantages such as redundancy, through which the data can be replicated to different regions.
