What we've learned simulating millions of IoT endpoints
You've probably heard it a thousand times: articles across the media routinely cite billions of connected IoT devices. Scalability has long been one of the biggest challenges of IoT ecosystems. The reality today, however, is entirely different.
Contrary to popular perception, most companies are still in the early stages of IoT deployment. Research conducted by Strategy Analytics found that 35 percent of companies with IoT deployments have fewer than 100 devices connected, and that 70 percent of current IoT deployments in the US cover fewer than 500 devices in total. Claims of supporting millions of nodes are therefore often the preserve of a select few vendors, or validated only internally by the engineering teams of cloud platform providers.
Recently we put the theory of scalability into practice and set out to simulate a million IoT endpoints, and then go even beyond. The idea was to understand and identify the primary bottlenecks that slow down the scalability of an IoT solution. Another motivation was to test the linearly scalable design of our device simulation software (NSim) and see whether we could deliver millions of simulated endpoints in production. We learned a lot during these tests and want to share a few findings that could benefit everyone:
Focus on the right metrics
Design for scale and fail gracefully has been one of the most important principles we followed while building our network simulator. While evaluating various software vendors for different brokers and cloud platforms, we often heard impressive metrics such as "handles 1 million messages/second", while other important metrics were conspicuously absent. Imagine buying a sports car in a showroom: it is important to know not only the car's top speed but also how fast it can reach that speed (0 to 100 in seconds). Similarly, while maximum messages/second is widely advertised, it is often overlooked how many new connections/second the infrastructure can handle, which matters greatly under real-world IoT conditions.
In real life, IoT devices are mostly asleep and not communicating. They wake up at regular intervals, establish a connection to the server, send/receive data, take some action, and sleep again if there is nothing more to do. To conserve data and battery life, devices waking up need a highly available infrastructure that can establish a session quickly and tear it down just as quickly. In the worst case, all of them may want to connect at once (for example, after a major power outage). In practical scenarios, therefore, it is important to measure how many new connections your server can safely handle per second, and how well it recovers from an outage.
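To make the outage scenario concrete, here is a back-of-the-envelope sketch. The ~30K new connections/second figure is an assumed single-node accept rate (in line with the peak rates we discuss later in this post); substitute your own measured numbers.

```python
# Rough model of a post-outage reconnection storm: if every device in the
# fleet tries to reconnect at once, how long until the backlog drains?

def recovery_time_seconds(devices: int, conn_per_second: int) -> float:
    """Time to drain a reconnection backlog at a fixed accept rate."""
    return devices / conn_per_second

# Assumption: a single node sustains ~30K new connections/second.
fleet = 1_000_000
rate = 30_000
print(f"{recovery_time_seconds(fleet, rate):.0f} s to reabsorb {fleet:,} devices")
```

Even at a healthy 30K connections/second, a million-device fleet needs over half a minute just to get everyone back online, before any of them has sent a single byte of application data.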
Interestingly, the rate of new connections/second is primarily limited by your OS and underlying hardware. During our tests with Eclipse Californium, we measured the throughput between two AWS c4.2xlarge instances in the same availability zone. It turns out the packet rate peaks at approximately 38K packets/second for UDP and 33K packets/second for TCP. The overall bandwidth between the two nodes was 1 Gbps, as advertised by the vendor. This sets a fundamental limit on how fast your IoT devices can be served by the system.
Maximum Packet/seconds during iperf3 UDP Test
Maximum Packet/seconds during iperf3 TCP Test
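For reference, the iperf3 invocations behind these measurements look roughly like this (addresses are placeholders; the small `-l` payload is deliberate, to stress packets/second rather than raw bandwidth):

```shell
# On the receiving instance:
iperf3 -s

# UDP test from the sender: 64-byte datagrams, rate capped by -b.
iperf3 -c 10.0.0.2 -u -b 1G -l 64 -t 30

# TCP test with equally small writes:
iperf3 -c 10.0.0.2 -l 64 -t 30
```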
Offload TLS/DTLS - Always
This may not come as a surprise, but SSL termination is essentially a CPU-bound activity and should be offloaded to dedicated servers if you want to preserve broker CPU for useful work. During our tests, the performance of the server under test degraded so badly with DTLS enabled that we had to turn it off for the remaining test cases.
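As an illustration, here is a minimal TLS-termination sketch for HAProxy (hostnames, ports, and the certificate path are placeholders): HAProxy decrypts MQTT-over-TLS on 8883 and forwards plain MQTT to the brokers, keeping the crypto work off the broker CPUs. Note this covers TLS over TCP only; DTLS runs over UDP and needs a different offload strategy.

```
frontend mqtts_in
    mode tcp
    bind *:8883 ssl crt /etc/haproxy/certs/broker.pem
    default_backend mqtt_brokers

backend mqtt_brokers
    mode tcp
    balance leastconn
    server broker1 10.0.0.11:1883 check
    server broker2 10.0.0.12:1883 check
```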
Zero Day Load Balancing
To overcome the inherent limitations of the underlying OS and hardware, you will need horizontal scaling. This is where a load balancer (such as HAProxy or Nginx) and clustering support from your MQTT/CoAP broker come in. Load balancing should be part of your architecture from day 1; it is not something to defer until later. Note that increasing cluster size also increases inter-node dependency, so beyond a point clustering performance starts degrading rapidly.
An ideal solution combines a DNS load balancer with a cluster setup. Using geolocation, clients can be directed to the closest data center, while clustering provides additional scalability within a particular data center.
For the scope of this test, we didn't use clustering as our focus was on measuring the performance of a single server.
System Performance Tuning
Linux system performance should be tuned to handle peak traffic conditions. We made the following changes to the system configuration parameters:
- Overcome the ephemeral port limitation by increasing the number of available ports (net.ipv4.ip_local_port_range)
- Increase the maximum open file handle limit
- Increase the default receive socket buffer size
- Disable IPv6 (we didn't need it, though this may not be a good idea in production)
- Disable IP routing (again, we didn't need it, but not a good idea in production)
- Reduce the TCP FIN-WAIT timeout to 30 seconds (net.ipv4.tcp_fin_timeout)
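Concretely, the changes above map to sysctl and ulimit settings along these lines (the specific values here are illustrative starting points, not the exact ones from our runs):

```shell
# Widen the ephemeral port range for outbound connections.
sysctl -w net.ipv4.ip_local_port_range="1024 65535"

# Raise the open-file-descriptor ceiling (every socket is a file handle).
ulimit -n 1000000

# Larger default receive buffer for bursty UDP traffic.
sysctl -w net.core.rmem_default=26214400

# Disable IPv6 and IP forwarding (fine for our lab; revisit for production).
sysctl -w net.ipv6.conf.all.disable_ipv6=1
sysctl -w net.ipv4.ip_forward=0

# Reclaim sockets stuck in FIN-WAIT faster.
sysctl -w net.ipv4.tcp_fin_timeout=30
```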
Test 1. Let all hell break loose
Our first idea was to bombard a single Eclipse Californium instance with as many packets as we could. We spawned half a million nodes and started sending CoAP PUT requests to a common URL on the Californium server. The interval between requests was one second, and there were 30 iterations. As you would rightly expect, not a single one of the 15 million PUT requests was successful. But that was to be expected, wasn't it?
Surprisingly, the CoAP server itself held up really well: most of the packets were simply discarded or dropped.
Important Server Parameters during 500K clients test
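The arithmetic behind the flood explains why no request could succeed: the offered packet rate was more than ten times the ~38K packets/second ceiling a single node can move (a sketch, using the figures from our runs):

```python
# Offered load in Test 1 versus the measured single-node ceiling.
clients = 500_000          # simulated CoAP endpoints
interval_s = 1             # one PUT per client per second
iterations = 30

offered_pps = clients / interval_s          # packets/second offered
total_requests = clients * iterations       # 15 million PUTs overall
udp_ceiling_pps = 38_000                    # measured earlier with iperf3

print(f"Offered: {offered_pps:,.0f} pps, ceiling: {udp_ceiling_pps:,} pps")
print(f"Overload factor: {offered_pps / udp_ceiling_pps:.0f}x")
```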
Test 2. Let the sanity prevail
Once we were happy with our little experiment, it was time to measure something practical. We now focused on the maximum peak load the server could handle without falling over. We knew from the previous experiment that the peak was roughly 30K messages/second, so we spawned 30K clients, each sending one CoAP PUT request per second. Here is how the test went:
CoAP server performed really well and most of the PUT requests were successful.
Important Server Parameters during 30K clients test
Finding the bottlenecks
Every infrastructure has certain choke points that limit the overall performance of your system. Once our test clients were fine-tuned and the majority of issues were resolved, we faced a completely unexpected problem. Our network simulator generates information about each simulated device's virtual state, along with other statistics. All of this data needs to be stored in MongoDB so that users can view and analyze the results once a test is complete.
It turned out that MongoDB ingestion was far slower than we expected. Initially, we combined the results of all iterations into a single document and quickly hit the 16 MB per-document limit. We then changed the schema to store individual records per iteration, but write performance was still really slow. We tried sharding, in-memory replica sets, and various other optimizations, yet MongoDB writes remained the slowest link in the entire chain.
Finally, Redis came to the rescue. We stored results temporarily in a Redis in-memory cluster and wrote sync programs that SCAN the Redis keys and move the records into MongoDB as fast as possible. Although this approach slowed down result retrieval, it greatly sped up result storage.
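A minimal sketch of the sync idea (names and batch size are illustrative, not our production code): SCAN Redis in cursor-sized batches so we never block it, then bulk-insert each batch into MongoDB.

```python
import json

def sync_results(redis_client, mongo_collection, batch_size=1000):
    """Drain result records from Redis into MongoDB in bulk batches.

    redis_client must provide scan(cursor, count=...) -> (cursor, keys)
    and get(key); mongo_collection must provide insert_many(docs) —
    matching the shapes of redis-py's Redis and pymongo's Collection.
    """
    cursor, moved = 0, 0
    while True:
        cursor, keys = redis_client.scan(cursor, count=batch_size)
        docs = [json.loads(redis_client.get(k)) for k in keys]
        if docs:
            mongo_collection.insert_many(docs)  # one bulk write per batch
            moved += len(docs)
        if cursor == 0:  # SCAN signals completion with cursor 0
            break
    return moved
```

Batching matters on both sides: SCAN keeps Redis responsive under load, and `insert_many` amortizes MongoDB's per-write overhead across the whole batch.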
Need help in evaluating your IoT platform?
At IoTIFY, we have developed a scalable IoT simulation and performance test infrastructure that can validate the performance of your IoT platform at any scale today. Our smart device simulation tool supports multiple protocols and authentication methods, offers fully customizable message contents, and can be deployed on-premises or in the cloud. Get in touch to learn more about how we can help.