Top 5 challenges in IoT platform testing
The Internet of Things has become a crowded and fiercely competitive marketplace. Customers choosing an IoT platform face some tough decisions. Should they build their own platform or buy one? Should they go with large, established players like AWS, or risk going with smaller startups to get more flexibility and customization? Whatever the eventual choice, one thing is clear: the IoT platform needs a robust performance testing and validation framework to reach its full potential. IoT cloud platform testing is quite a new area, and strategies used for web and application testing may not fully apply. In this post, we discuss the top challenges faced by performance and functional testing teams while testing an IoT cloud platform.
1. Thinking from a machine's perspective
One of the fundamental differences in IoT is that it is built for machines instead of humans. In the web or application domain, humans are the key users of the entire system, so the tests validate responsiveness, functional correctness and overall user experience. In IoT, devices take the central role: they are the primary producers of data and the consumers of control actions from the cloud. They speak binary and rather complex protocols, which are hard to understand without proper tools. Devices have limited battery life and power, and data coming from each sensor requires a different kind of handling depending on the context. The interaction of machines with the system is less intense (e.g. 1 message a minute) but continuous (i.e. 24x7). This fundamentally changes your entire approach to testing an IoT cloud platform.
- Monitoring, understanding and analyzing IoT protocols such as MQTT and CoAP.
- Developing scripts/functions that simulate real-world IoT device behavior (e.g. battery decay) for test automation.
- Feeding contextual data to the cloud platform (e.g. correlated values of temperature and humidity) instead of random numbers.
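To make the last two points concrete, here is a minimal sketch of a simulated sensor. The class name, the constants and the humidity formula are all illustrative assumptions, not part of any real platform: the point is only that humidity is derived from temperature (so the two streams stay correlated) and that the battery decays with every message.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class SimulatedSensor:
    """Hypothetical virtual device; all constants below are illustrative assumptions."""
    device_id: str
    battery_pct: float = 100.0
    drain_per_message: float = 0.01  # assumed battery cost of one transmission
    rng: random.Random = field(default_factory=random.Random)

    def next_reading(self, minute_of_day: int) -> dict:
        # Temperature follows a daily sine cycle with a little sensor noise.
        phase = 2 * math.pi * (minute_of_day % 1440) / 1440
        temperature = 20.0 + 8.0 * math.sin(phase) + self.rng.gauss(0, 0.3)
        # Humidity is derived from temperature, so the two streams stay correlated.
        humidity = 60.0 - 2.0 * (temperature - 20.0) + self.rng.gauss(0, 1.0)
        humidity = max(0.0, min(100.0, humidity))
        # Every message drains the battery a little, down to zero.
        self.battery_pct = max(0.0, self.battery_pct - self.drain_per_message)
        return {
            "device_id": self.device_id,
            "temperature_c": round(temperature, 2),
            "humidity_pct": round(humidity, 2),
            "battery_pct": round(self.battery_pct, 2),
        }


# One simulated day at 1 message per minute, seeded for repeatable test runs.
sensor = SimulatedSensor("sensor-001", rng=random.Random(42))
readings = [sensor.next_reading(minute) for minute in range(1440)]
```

Each reading could then be serialized and published over whichever protocol the platform under test expects.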
2. Wider scope of functional testing
Because devices are the primary actors in the system, the scope of functional testing changes dramatically for IoT. From a cloud platform perspective, an individual device generates multiple streams of data. There could be various generations or types of these devices, which multiplies the number of data stream types. Each of these data streams needs to be interpreted in its own way, sometimes in correlation with other streams. For example, a temperature value coming from a sensor needs to be correlated with a humidity sensor value to detect a fire and raise the alarm. Also, the business intelligence built on top of your IoT cloud platform needs additional input vectors for testing. For example, if you are testing a car insurance IoT platform which rewards the user with a discount for eco-driving, you will need to simulate harsh braking and sudden acceleration conditions, along with normal driving.
Another area of added functional testing in IoT is device malfunction. E.g. an IoT device with a broken actuator needs to report its status back to the cloud platform so that a replacement process can be initiated and the user of that device can be informed in advance. When malfunctioning, the device needs to ignore certain control actions which could further complicate the situation. E.g. in the case of a smart door lock, if the battery reaches a critically low threshold, the locking mechanism should be disabled and the user notified about the failure until the battery is replaced. This prevents the situation where the door is locked and the battery is completely dead, leaving the user unable to unlock their home. Similarly, if the device has not reported to the cloud for an unusually long time, the cloud platform should generate a critical event.
- Understanding the device lifecycle and creating test vectors which cover complex scenarios.
- Validating IoT platform behavior and performance under erroneous conditions.
- Generating complex test inputs for testing business applications written on top of the IoT platform.
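The smart lock scenario above can be turned into a small table of test vectors. The sketch below is only an illustration: the thresholds, the event names and the `evaluate` function are hypothetical stand-ins for whatever rules your platform actually implements.

```python
from dataclasses import dataclass

CRITICAL_BATTERY_PCT = 10.0   # assumed critical battery threshold
MAX_SILENCE_SECONDS = 3600    # assumed "unusually long" silence


@dataclass
class LockReport:
    battery_pct: float
    seconds_since_last_seen: int


def evaluate(report: LockReport) -> set:
    """Toy stand-in for the platform rules under test."""
    events = set()
    if report.battery_pct <= CRITICAL_BATTERY_PCT:
        # Low battery: stop accepting lock commands and tell the user.
        events.update({"DISABLE_LOCKING", "NOTIFY_USER"})
    if report.seconds_since_last_seen > MAX_SILENCE_SECONDS:
        # Device has gone quiet for too long.
        events.add("DEVICE_SILENT_CRITICAL")
    return events


# Each vector pairs a device condition with the events we expect.
TEST_VECTORS = [
    (LockReport(80.0, 60), set()),
    (LockReport(10.0, 60), {"DISABLE_LOCKING", "NOTIFY_USER"}),
    (LockReport(80.0, 7200), {"DEVICE_SILENT_CRITICAL"}),
    (LockReport(5.0, 7200), {"DISABLE_LOCKING", "NOTIFY_USER", "DEVICE_SILENT_CRITICAL"}),
]

# Collect every vector whose actual events differ from the expected ones.
failures = [(r, expected, evaluate(r)) for r, expected in TEST_VECTORS if evaluate(r) != expected]
```

In a real test suite, `evaluate` would be replaced by a call into the platform, and the vectors would grow to cover each device type and lifecycle state.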
While web performance testing focuses primarily on handling sudden peak loads (e.g. Black Friday) on top of regular average traffic, IoT devices produce consistent, low-rate traffic on a large scale. There is another significant difference in how the scalability of an IoT platform is tested in the field. The traditional web approach is to test with a smaller set of users and prove that a single server can handle the load of that subset. This method rests on the assumption that system load increases almost linearly as users are added. The assumption holds for the web only because communication between a web browser and the server is one to one, i.e. a client-server relationship.
This approach fails to address the challenges in IoT because, as more devices are added, system load can grow much faster than linearly. The primary difference between web and IoT comes from the nature of the communication protocol itself. Web services are usually implemented using HTTP, which is a client-server protocol. A user browsing a shopping website is not talking to other users who are online (unless they are in a chat, which is again limited to a few users at most). Most of the content served to the user is static, so it can be offloaded to a CDN. Assuming that each user sends 1 live request per second, the total load on the server at any given time is at most the number of users online multiplied by the request rate per user.
Compare this to an IoT scenario where MQTT, a publish-subscribe protocol, is being used. Any device can publish to any topic of interest and can subscribe to any other topic of interest as well. Consider the worst case, in which each device publishes 1 message per second and every device also subscribes to all publications (a wildcard subscription at the root of the MQTT topic tree). If the number of devices is N, the server not only needs to receive N messages per second but also distribute each of those messages to the other N-1 subscribers within that second. So the total number of messages handled by the server would be N + N*(N-1) = N^2 per second.
For 10 devices => 100 Messages per second
For 1000 devices => 1,000,000 Messages per second
For 10,000 devices => 100,000,000 Messages per second
Even for a relatively modest fleet of 10,000 devices, your server infrastructure would need to handle a peak load of 100 million messages per second. In most practical applications, this worst case will not occur. However, the quadratic growth of fan-out traffic will certainly cause additional overhead under peak traffic.
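The arithmetic above is easy to reproduce. The following sketch (function names are our own, chosen for illustration) compares the linear client-server estimate with the worst-case publish-subscribe fan-out, assuming every device publishes at the same rate and subscribes to everything.

```python
def client_server_load(users: int, requests_per_second: int = 1) -> int:
    """Linear model: each user only talks to the server."""
    return users * requests_per_second


def pubsub_worst_case_load(devices: int, publish_rate: int = 1) -> int:
    """Worst case: every message is fanned out to all other devices."""
    inbound = devices * publish_rate      # messages received by the broker
    outbound = inbound * (devices - 1)    # copies delivered to the other subscribers
    return inbound + outbound             # equals devices**2 when publish_rate is 1


for n in (10, 1_000, 10_000):
    print(f"{n:>6} devices: client-server {client_server_load(n):>6}/s, "
          f"pub-sub worst case {pubsub_worst_case_load(n):>11}/s")
```

Real deployments sit somewhere between the two curves, depending on how topics and subscriptions are laid out, which is exactly why the topology needs to be part of the load test.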
- Developing and orchestrating tests for full system scalability and performance.
- Finding performance bottlenecks caused by the different communication topologies of IoT.
Smart city simulation by IOTIFY to load test cloud platform
SYSTEM SIMULATION AS AN ALTERNATIVE?
The number of test vectors required for comprehensive testing of an IoT platform is so large that it demands a fresh look at how testing strategies are planned and executed. At IoTIFY we believe the right way to test an IoT platform is to build a large-scale system simulation of virtual IoT devices which act like their real-world counterparts. We call this approach lightweight digital twinning. These virtual models live and function in the cloud and are highly scalable. The simulation also includes communication channel modeling to simulate real-world network conditions. The test vectors from these virtual models are generated autonomously and cover the entire spectrum of possible input values. The cloud platform can therefore be tested in its entirety with reduced effort.
4. Measuring latency
It is widely known that a delay in website response often causes people to lose interest and abandon their shopping cart. Sadly, the problem only gets worse when it comes to IoT. Increased latency directly affects the battery life of constrained devices, as they have to wait longer for the response. Under normal conditions, devices can cope with slight delays. However, in some cases, a delayed decision can severely reduce the effectiveness of the entire solution. E.g. in an industrial IoT solution, a gas valve must be controlled with negligible delay. Failure to act in time may have catastrophic consequences. Similarly, a critical engine warning should be promptly displayed to the driver to get their attention. Any delay in notification of parameters such as high pressure may result in irrecoverable losses. The overall response time of the IoT platform consists of two parts: the communication latency and the processing latency within the cloud platform. Since communication latency is mostly external and largely determined by the choice of communication medium (e.g. WiFi/LoRaWAN), the processing latency becomes the most important parameter from a testing perspective.
Systemwide response latency over the entire fleet of IoT devices can be measured in multiple ways. For example, the physical devices themselves can record the latency of each transaction and report it in a subsequent transaction. Since system clocks may be out of sync, each device should send its own timestamp in the outgoing message, which the cloud platform echoes back in its response. Network connectivity is often poor, as machines are located in remote corners of the world, producing data 24x7 over a wireless channel to the backend. It is therefore also important to model the communication channel characteristics in your test systems.
- Identifying KPIs related to processing and communication latency and their effect on IoT system performance.
- Modelling the different networking conditions and understanding the impact of deterioration on the system performance.
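The echoed-timestamp idea can be sketched as follows. This is a simplified illustration: the `EchoedResponse` fields are hypothetical, and we additionally assume the platform reports how long it spent processing the request (a duration, so no clock synchronization is needed). The device only ever compares its own clock with its own echoed timestamp.

```python
from dataclasses import dataclass


@dataclass
class EchoedResponse:
    device_sent_ms: int        # device's own timestamp, echoed back unchanged
    cloud_processing_ms: int   # assumed: processing duration reported by the platform


def split_latency(response: EchoedResponse, device_now_ms: int) -> tuple:
    """Return (round_trip_ms, communication_ms) using only the device clock."""
    round_trip = device_now_ms - response.device_sent_ms
    communication = round_trip - response.cloud_processing_ms
    return round_trip, communication


def percentile(values: list, pct: int) -> int:
    """Nearest-rank percentile of a non-empty list, e.g. pct=95 for p95."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceil(pct * n / 100)
    return ordered[rank - 1]


# Example: a message sent at t=1000 ms (device clock), answered at t=1250 ms,
# with the platform reporting 40 ms of processing time.
round_trip_ms, communication_ms = split_latency(EchoedResponse(1000, 40), 1250)
```

Aggregating round trips across the fleet with a percentile such as p95, rather than an average, tends to expose the slow tail that matters for time-critical actions.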
No discussion of IoT platform testing would be complete without discussing security. Securing an IoT cloud platform requires a multipronged strategy built on identifying and managing attack surfaces. Several general principles of cloud security also apply to IoT cloud platforms. However, we will keep our focus on the areas that are specific to IoT cloud testing.
- 1. Securing Device onboarding/certificate enrollment
Usually, devices are onboarded to acquire credentials, which subsequently authenticate them with the cloud platform. The onboarding service could be provided by your IoT cloud platform itself (e.g., AWS certificate enrollment) or could be managed separately with the organization’s Public Key Infrastructure. Testing this infrastructure for scalability and functional correctness is essential and should be automated as part of device lifecycle testing.
- 2. Testing for communication protocol security
Once the devices have proper credentials, they will use them to establish encrypted communication with the IoT cloud platform. Encryption and decryption are primarily CPU-bound activities, so the focus should be on testing the capacity of the cloud platform to terminate secure sessions.
- 3. Validating Privilege levels and ACL
Each authenticated device should have only limited rights to access the cloud platform. E.g., a device using its credentials should only be able to add data to the database, not delete historical data.
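As an illustration, privilege checks lend themselves to simple positive and negative test cases. The sketch below assumes an MQTT-style platform; the ACL layout, the device IDs and the topic names are hypothetical, and the matcher covers only the basic `+` and `#` wildcard rules.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Basic MQTT-style match: '+' is one level, a trailing '#' is any remainder."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True                      # matches everything from here on
        if i >= len(topic_levels):
            return False                     # filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


# Hypothetical per-device ACL: (action, topic filter) pairs.
ACL = {
    "dev-1": [
        ("publish", "devices/dev-1/telemetry"),
        ("subscribe", "devices/dev-1/commands/#"),
    ],
}


def is_allowed(device_id: str, action: str, topic: str) -> bool:
    """Deny by default; allow only actions explicitly granted to this device."""
    return any(action == allowed_action and topic_matches(topic_filter, topic)
               for allowed_action, topic_filter in ACL.get(device_id, []))
```

The negative cases are the interesting ones: a device publishing to another device's topic, or attempting an action such as `delete` that was never granted, should always be refused.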
- 4. Testing against the DDoS attack.
There is always a possibility that your IoT platform will come under a DDoS attack. IoT seems to be the sweet spot for conducting globally distributed cyber attacks. The Mirai botnet infected hundreds of thousands of unsecured IoT devices and caused massive outages at some of the world’s top websites and service providers. Though generating a real-world DDoS attack is out of the question, testing teams should ensure that the IoT platform can perform under the most severe network load and still maintain a guaranteed quality of service. One way to verify this is to load the system with a full-scale negative test (such as sending incorrect object IDs) while validating that the system still responds correctly to valid requests.
- 5. Certificate revocation and re-enrollments
An IoT device could get stolen or could be deactivated due to hardware faults. In this case, the credentials allocated to the device must be revoked, and the cloud platform should reject access from such blacklisted devices. Similarly, certificate or credential re-enrollment should also be tested for the case where the lifetime of the certificate expires.
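A minimal sketch of the acceptance rule being tested might look like this. The `DeviceCredential` type and the in-memory revocation set are assumptions made for illustration; real deployments typically rely on mechanisms such as certificate revocation lists or OCSP rather than a Python set.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DeviceCredential:
    serial: str
    not_after: datetime   # end of the credential's lifetime


def should_accept(credential: DeviceCredential, revoked_serials: set, now: datetime) -> bool:
    """Reject revoked credentials first, then expired ones."""
    if credential.serial in revoked_serials:
        return False
    return now <= credential.not_after


# Fixed timestamps keep the test cases repeatable.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
valid = DeviceCredential("A1", datetime(2025, 1, 1, tzinfo=timezone.utc))
expired = DeviceCredential("B2", datetime(2023, 1, 1, tzinfo=timezone.utc))
stolen = DeviceCredential("C3", datetime(2025, 1, 1, tzinfo=timezone.utc))
revoked = {"C3"}
```

A lifecycle test would then walk a device through enrollment, revocation and re-enrollment, and check that the platform's decision changes at each step.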