Why does IoT need simulation instead of load testing?
There are more connected devices on the planet today than people, and the number continues to grow rapidly. IoT performance testing teams face an unprecedented challenge. How can they prove that their IoT infrastructure reliably handles millions of connected devices? How can they cover the entire spectrum of real-world IoT use cases? How can they automate orchestration, execution, and results analysis so that they can focus on the key metrics of system performance? It turns out that developing a comprehensive testing strategy that pushes an IoT infrastructure to its limits is a major engineering challenge in itself.
Test case writing: The end of an era
The traditional QA approach of predefining test cases and executing them sequentially cannot keep up with the exponential demands of IoT. The usual method of triggering a set of inputs and then validating the system's output becomes suboptimal because of the nature of the system under test. Varying sensor data and device states, combined with possible error scenarios, complex device-to-device interactions, and unreliable network conditions, generate a nearly infinite number of test vectors. A comprehensive IoT performance test needs to shift its focus to the macro level and define overall system test objectives.
Let's consider a hypothetical case study. Company A is an industry-leading manufacturer of smoke detectors and automated fire extinguisher systems. It is now moving into IoT and building a connected system that has both local intelligence (on-premises decision-making about fire detection and extinguishing) and a central command and control system that monitors all of the company's fire safety installations across the globe. The engineers who built the platform claim it supports up to 5 million endpoints. Now it is the QA team's turn to validate that claim.
A simple test case for such a platform would be the following: simulate a connected device reporting a temperature above 100 degrees, and ensure that the system detects a fire alarm locally and reports it to the cloud backend. Ensure that the fire extinguishers are activated within 90 seconds and that the authorities are notified automatically.
Sounds easy, right? A QA engineer could quickly write a simple script in their favorite test tool on a desktop, connect to the cloud platform under test, and execute the test within an hour.
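As a rough illustration of how small such a script can be, here is a minimal sketch in Python. It does not talk to a real platform: `FakeBackend`, `SimulatedDetector`, and the 5-second activation delay are hypothetical stand-ins invented for this example, and only the 100-degree threshold and the 90-second limit come from the test case above.

```python
from dataclasses import dataclass, field
from typing import Optional

FIRE_THRESHOLD = 100.0        # degrees, from the test case above
MAX_ACTIVATION_DELAY = 90.0   # seconds, from the test case above


@dataclass
class FakeBackend:
    """Hypothetical in-memory stand-in for the cloud platform under test."""
    alarms: list = field(default_factory=list)
    authorities_notified: bool = False
    activation_delay: float = 5.0  # assumed delay before extinguishers start

    def report_alarm(self, device_id: str, temperature: float, now: float) -> float:
        """Record the alarm and return the time the extinguishers activate."""
        self.alarms.append((device_id, temperature, now))
        self.authorities_notified = True
        return now + self.activation_delay


@dataclass
class SimulatedDetector:
    """Hypothetical simulated smoke detector with local alarm logic."""
    device_id: str
    backend: FakeBackend
    local_alarm: bool = False
    extinguisher_at: Optional[float] = None

    def measure(self, temperature: float, now: float) -> None:
        """Raise a local alarm and report it when the reading is too high."""
        if temperature > FIRE_THRESHOLD:
            self.local_alarm = True
            self.extinguisher_at = self.backend.report_alarm(
                self.device_id, temperature, now)


def run_simple_test() -> dict:
    """Run the single-device test case and return one flag per check."""
    backend = FakeBackend()
    detector = SimulatedDetector("detector-1", backend)
    reported_at = 0.0
    detector.measure(temperature=120.0, now=reported_at)
    return {
        "local_alarm": detector.local_alarm,
        "reported_to_cloud": len(backend.alarms) == 1,
        "activated_in_time": detector.extinguisher_at is not None
        and detector.extinguisher_at - reported_at <= MAX_ACTIVATION_DELAY,
        "authorities_notified": backend.authorities_notified,
    }


if __name__ == "__main__":
    print(run_simple_test())
```

A real script would replace `FakeBackend` with calls to the platform under test, but the shape of the test stays the same: one device, one input, a handful of checks.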
Now consider the following real-world challenges to your test execution.
- 5 million devices will need to be connected to the cloud platform. Assuming a single workstation can handle 10K devices, your team would need 500 desktop machines to run the full load. Even if you could spawn all those machines in the cloud or a local data center, you would still need a way to manage and execute all the test cases centrally. The bill for licensing 5 million endpoints with your favorite performance test software will probably reach a million dollars.
- Each fire detector wakes up every 30 seconds and performs a temperature measurement. Your test needs to make sure that sensors occasionally report false positives (i.e., spurious readings above 100 degrees) and that the cloud platform starts a replacement procedure if the number of false positives exceeds a certain count within a particular time window.
- Your test needs to ensure that the latency between the report of a fire incident and the activation of the sprinkler system never exceeds the guaranteed maximum of 90 seconds. If it does, your test needs to capture the entire sequence of events that caused it.
- Your test will need to simulate multiple fires in close geographic proximity so that the cloud platform can perform escalated incident reporting.
- At any given point in time, up to 10% of all fire alarm systems (up to 500K) can report a fire. You will need to test what happens if the system exceeds this capacity.
- The test needs to ensure that the system can handle messaging from up to 10,000 devices per second at peak load. There will be multiple generations of devices: the older ones speak CoAP, while the newer generation talks MQTT and HTTP. Your test will need to cover all such combinations.
- Any problem report raised by QA will need to contain the entire history of messages exchanged between the cloud and the simulated device, to help the developer find the cause.
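The figures in the list above can be turned into a quick back-of-the-envelope load plan. The sketch below uses only numbers from the list; the function name `load_plan` is hypothetical. The "drain time" is an assumption of this example: it supposes that each alarming device sends a single message and that "10,000 devices per second" means roughly 10,000 messages per second.

```python
import math

TOTAL_DEVICES = 5_000_000           # endpoints claimed by the platform
DEVICES_PER_WORKSTATION = 10_000    # assumed capacity of one test machine
MAX_ALARM_PERCENT = 10              # share of devices that may report a fire
PEAK_MESSAGES_PER_SECOND = 10_000   # peak messaging rate from the list
LATENCY_BUDGET_SECONDS = 90         # fire report to sprinkler activation


def load_plan(total_devices=TOTAL_DEVICES,
              devices_per_workstation=DEVICES_PER_WORKSTATION,
              alarm_percent=MAX_ALARM_PERCENT,
              peak_rate=PEAK_MESSAGES_PER_SECOND,
              latency_budget=LATENCY_BUDGET_SECONDS):
    """Derive test-bed sizing and worst-case alarm figures."""
    workstations = math.ceil(total_devices / devices_per_workstation)
    # Integer arithmetic avoids floating-point rounding on the percentage.
    max_alarms = total_devices * alarm_percent // 100
    # Time to deliver one alarm message per alarming device at peak rate.
    drain_seconds = max_alarms / peak_rate
    return {
        "workstations": workstations,
        "max_simultaneous_alarms": max_alarms,
        "alarm_drain_seconds": drain_seconds,
        "budget_left_seconds": latency_budget - drain_seconds,
    }


if __name__ == "__main__":
    for name, value in load_plan().items():
        print(f"{name}: {value}")
```

Under these assumptions, a worst-case burst of 500K alarms already takes 50 seconds to deliver at the peak rate, which leaves little of the 90-second budget for everything else. That is exactly the kind of interaction between requirements that isolated test cases tend to miss.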
As more and more system requirements are understood, the list will grow very long. Clearly, testing such a complex system is not an easy task and requires considerable engineering within the QA team.
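To give a flavor of that engineering, here is a minimal sketch of two requirements from the list: checking the 90-second latency limit and keeping the full message history for any device that violates it. The class `EventLog`, the function `find_latency_violations`, and the event names `fire_reported` and `sprinkler_activated` are hypothetical names chosen for this example.

```python
from collections import defaultdict

MAX_LATENCY_SECONDS = 90.0  # limit between fire report and sprinkler activation


class EventLog:
    """Keeps every message exchanged with each simulated device."""

    def __init__(self):
        self._events = defaultdict(list)

    def record(self, device_id, timestamp, kind, detail=""):
        self._events[device_id].append((timestamp, kind, detail))

    def history(self, device_id):
        """Return the device's events ordered by timestamp."""
        return sorted(self._events[device_id])


def find_latency_violations(log, device_ids, limit=MAX_LATENCY_SECONDS):
    """Map each violating device to its complete event history."""
    violations = {}
    for device_id in device_ids:
        events = log.history(device_id)
        fire_time = None
        for timestamp, kind, _detail in events:
            if kind == "fire_reported" and fire_time is None:
                fire_time = timestamp
            elif kind == "sprinkler_activated" and fire_time is not None:
                if timestamp - fire_time > limit:
                    violations[device_id] = events
                fire_time = None
        # A fire that was reported but never answered also counts.
        if fire_time is not None:
            violations[device_id] = events
    return violations


if __name__ == "__main__":
    log = EventLog()
    log.record("detector-1", 0.0, "fire_reported")
    log.record("detector-1", 30.0, "sprinkler_activated")
    log.record("detector-2", 0.0, "fire_reported")
    log.record("detector-2", 95.0, "sprinkler_activated")
    for device, events in find_latency_violations(
            log, ["detector-1", "detector-2"]).items():
        print(device, events)
```

Attaching the returned history to a problem report gives the developer the entire sequence of events rather than a bare pass/fail result.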
The challenges of IoT performance testing
It may not come as a surprise, but IoT is all about exponential scalability. The number of connected devices may start in the hundreds but can easily grow to hundreds of thousands in a short span, and even to a million within a few years. Tools and scripts built to test thousands of devices quickly become outdated and incapable of testing at such a scale. Preparing and running test cases and analyzing the results becomes extremely slow and error-prone. Finally, pinpointing a single malfunctioning endpoint among a million becomes the equivalent of finding a needle in a haystack.
IoT specific Test Objectives
Existing test tools focus on web and API testing and were designed with a human user in mind. They record or capture a user flow and then try to automate that behavior with additional scripts.
The performance test objectives of IoT testing are completely different from those of traditional web testing. In the web domain, we usually monitor page load performance, API response time, the overall user experience, and smooth user flow. Unlike the web, IoT devices involve no human users. Instead, each device is described by a combination of parameters known as its "state", which represents what the IoT device is currently doing.
To take an example, when an IoT device is first installed, it is commissioned or registered with the backend. Thereafter, it mostly sleeps and connects to the backend at regular intervals. The device monitors its various sensors at regular intervals and then makes a decision based on a certain combination of sensor data. Depending on the number of sensors, the device may perform several actions and send several messages to the backend. Commands received from the backend and messages received from its peers also influence the device's behavior. Such complex interaction cannot be modeled by recording a macro or by any similar method. Instead, this behavior needs to be modeled programmatically in your test.
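One common way to model such behavior programmatically is a small state machine per device. The sketch below is only an illustration under assumptions: the states, the message format, the `reset` command, and the class name `SimulatedDevice` are all hypothetical, and a real device model would have more sensors and transitions.

```python
from enum import Enum, auto


class State(Enum):
    UNREGISTERED = auto()
    SLEEPING = auto()
    MEASURING = auto()
    ALARM = auto()


class SimulatedDevice:
    """Hypothetical device model: commission, sleep, measure, alarm."""

    def __init__(self, device_id, read_sensor, threshold=100.0):
        self.device_id = device_id
        self.read_sensor = read_sensor  # callable returning one reading
        self.threshold = threshold
        self.state = State.UNREGISTERED
        self.outbox = []                # messages destined for the backend

    def commission(self):
        """Register with the backend, then go to sleep."""
        self.outbox.append({"type": "register", "device": self.device_id})
        self.state = State.SLEEPING

    def wake(self):
        """Wake up, take one measurement, and decide what to send."""
        if self.state is State.UNREGISTERED:
            raise RuntimeError("device must be commissioned before it can wake")
        if self.state is State.ALARM:
            return  # stay in alarm until the backend sends a reset command
        self.state = State.MEASURING
        value = self.read_sensor()
        if value > self.threshold:
            self.state = State.ALARM
            kind = "alarm"
        else:
            self.state = State.SLEEPING
            kind = "telemetry"
        self.outbox.append(
            {"type": kind, "device": self.device_id, "value": value})

    def on_command(self, command):
        """Apply a command received from the backend."""
        if command == "reset" and self.state is State.ALARM:
            self.state = State.SLEEPING


if __name__ == "__main__":
    readings = iter([21.5, 140.0])
    device = SimulatedDevice("detector-1", lambda: next(readings))
    device.commission()
    device.wake()
    device.wake()
    print(device.state, device.outbox)
```

Because the sensor is just a callable, the same model can be driven by fixed values, random noise, or recorded field data, and thousands of instances can be created in a loop.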
Increased System Complexity
A user's interaction with a web browser is limited to certain actions and flows. Now consider this scenario for IoT: a connected car can move from anywhere to anywhere within the city and can stop at any number of points. It can drive at varying speeds and abruptly accelerate or decelerate. Any of the thousand sensors installed in the car could report a malfunction. The car interacts with several other components, such as traffic lights, gas stations, and service stations. Peer-to-peer communication between several connected cars can create a large number of test scenarios.
IoT has its own set of protocols, which are still evolving and competing with each other. Testing compliance with these protocols, along with backward and forward compatibility, creates another major challenge for QA teams. Today there are hundreds of cloud platform providers, each using its own implementation or a particular version of an open source component. Along with full compliance, measuring the efficiency of a protocol also remains a primary objective for the QA team, e.g., what are the bandwidth savings if devices connect via CoAP instead of MQTT?
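A first-order answer to the bandwidth question can be estimated from the packet formats alone. The sketch below sizes one MQTT 3.1.1 PUBLISH packet and one CoAP request carrying the same payload, then adds the minimum TCP or UDP header. It is a deliberately rough estimate: it ignores IP headers, TCP acknowledgements and connection setup, keep-alives, TLS/DTLS, and any CoAP options other than Uri-Path, and it assumes path segments shorter than 269 bytes. The function names and the example topic are hypothetical.

```python
TCP_HEADER_BYTES = 20  # minimum TCP header, without options
UDP_HEADER_BYTES = 8   # UDP header


def mqtt_publish_size(topic: str, payload: bytes, qos: int = 0) -> int:
    """Size in bytes of one MQTT 3.1.1 PUBLISH packet."""
    variable_header = 2 + len(topic.encode("utf-8"))  # length prefix + topic
    if qos > 0:
        variable_header += 2                          # packet identifier
    remaining = variable_header + len(payload)
    # Remaining Length is encoded with 7 bits per byte.
    length_bytes = 1
    n = remaining
    while n > 127:
        n //= 128
        length_bytes += 1
    return 1 + length_bytes + remaining


def coap_request_size(path: str, payload: bytes, token_length: int = 2) -> int:
    """Approximate size of a CoAP request using only Uri-Path options."""
    size = 4 + token_length                           # fixed header + token
    for segment in path.strip("/").split("/"):
        encoded = segment.encode("utf-8")
        # One option header byte, plus one extended-length byte when the
        # segment is 13 bytes or longer (segments under 269 bytes assumed).
        size += 1 + (1 if len(encoded) >= 13 else 0) + len(encoded)
    if payload:
        size += 1 + len(payload)                      # 0xFF payload marker
    return size


if __name__ == "__main__":
    name, reading = "site/42/temp", b"23.5"
    mqtt_total = mqtt_publish_size(name, reading) + TCP_HEADER_BYTES
    coap_total = coap_request_size(name, reading) + UDP_HEADER_BYTES
    print(f"MQTT over TCP: {mqtt_total} bytes per message")
    print(f"CoAP over UDP: {coap_total} bytes per message")
```

An estimate like this only sets expectations. The real savings depend on acknowledgements, retransmissions, and session handling, which is why they are worth measuring in a simulation rather than assuming.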
Introducing System Simulation for IoT performance testing
Figure: Smart city simulation of Rome with hundreds of connected cars under real-time traffic conditions.
The complexity and scalability challenges of IoT systems demand a ground-up, holistic approach to testing IoT system performance. The performance testing tools of tomorrow need to evolve to match the complexity and scale of IoT. At IoTIFY, we saw a clear need in the market to solve these challenges ahead of time, so we designed our network simulation software from the ground up to meet the needs of the future enterprise. We took the same software infrastructure components that are used in production today to build scalable cloud platforms and built performance simulation software out of them. The result is the IoTIFY smart network simulator, a first-of-its-kind IoT performance testing software designed for cloud platforms. Here are the key facts that make it different.
1. A cloud-first approach, which is horizontally scalable and can be scaled down to a single machine. The result is seamless scalability from cloud and hybrid deployments down to local desktops. The IoTIFY network simulator can match your IoT cloud platform and grow with it.
2. Advanced simulation capabilities such as drive simulation under real traffic conditions, location functionality, and custom payload generation, with several helper libraries.
3. An API-driven standard interface that can be easily integrated with an existing test ecosystem. The test results can also be exported to any database of your choice.
5. Multiprotocol support: the scripting is protocol-agnostic, which means testers can easily create several versions of the same test with different protocols. We support CoAP, MQTT, HTTP, and LwM2M out of the box, and many other protocols will be supported in the future.