The percentile latency of an endpoint is widely accepted as a measure of system performance. However, most teams measure this metric in one of two places. In staging, at light loads, the measurement fails to capture the performance expected at peak production traffic. In production, a problem is found only after users have already felt it.

In this article, we explore some limitations of the industry-standard methods for measuring the percentile latency of a REST endpoint. We then examine an alternative approach that uses IoTIFY to address these limitations.
Percentile latency of a REST endpoint
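The percentile latency of a REST endpoint measures how quickly the endpoint processes a request and returns a response. It is usually expressed as the N’th percentile, where N is a value between 0 and 100. The N’th percentile latency is the time within which N percent of requests complete. For example, a 99th percentile (p99) latency of 200 ms means that 99% of requests received a response within 200 ms and at most 1% took longer. Because latency changes with load, the same endpoint can show very different percentiles under different load conditions.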
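The article does not say which percentile definition IoTIFY uses. One common choice, assumed here, is the nearest-rank method: sort the n latency samples and take the sample at rank k. Other tools interpolate between neighbouring samples, so reported values can differ slightly from tool to tool.

```latex
x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}, \qquad
L_N = x_{(k)}, \qquad
k = \max\!\left(1,\ \left\lceil \frac{N \cdot n}{100} \right\rceil\right)

n = 10\,000,\ N = 99 \;\Rightarrow\; k = \left\lceil \frac{99 \cdot 10\,000}{100} \right\rceil = 9\,900
```

With 10,000 samples, p99 is therefore the 9,900th smallest latency, and at most 100 samples are larger than it.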
Percentile latency is important because it helps identify performance issues and bottlenecks in an application. Unlike the average, which a handful of slow requests barely moves, a high percentile exposes the slow tail that some users actually experience. For example, if the 99th percentile latency of a REST endpoint is high, then roughly 1% or more of requests are that slow or slower. This may indicate that the endpoint is struggling to handle its workload. The result is a poor experience for those users, and it can degrade the application’s overall performance.
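To illustrate why the tail matters, the Python sketch below computes a nearest-rank percentile and compares it with the mean. The nearest-rank definition is an assumption because the article does not specify one. The function name `percentile` and the latency values are made up for illustration. In this sample, 2 of 100 requests are slow.

```python
import math
import statistics
from typing import Sequence


def percentile(samples_ms: Sequence[float], n: float) -> float:
    """Nearest-rank n-th percentile of latency samples (0 <= n <= 100)."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    if not 0 <= n <= 100:
        raise ValueError("n must be between 0 and 100")
    ordered = sorted(samples_ms)
    # Smallest sample with at least n% of all samples at or below it.
    rank = max(1, math.ceil(n * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up data: 98 fast responses and 2 slow ones.
latencies_ms = [20.0] * 98 + [900.0] * 2

print(statistics.mean(latencies_ms))  # 37.6
print(percentile(latencies_ms, 50))   # 20.0
print(percentile(latencies_ms, 99))   # 900.0
```

The mean suggests a comfortable 37.6 ms. The p99 lands on the slow requests instead and shows that some users waited 900 ms.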
The Problem
Most teams today measure the percentile latency in two stages:
1. In staging or testing, using mock load generators with limited scale.
The drawback here is that most of these tools cannot generate the peak load a system can expect in production. They also generally cannot simulate complex scenarios or sequences of tests.
2. Live in production, using an Application Performance Monitoring (APM) tool.
The drawback here is obvious. A problem is caught only after it manifests in production, and by that point it has already degraded the experience of real users.
Measuring Percentile latency effectively in Testing/Staging
IoTIFY is a tool proven to work at million-plus scale. Because it was designed to mimic IoT device behaviour, the platform offers many features that can be used to test a wide range of use cases effectively. With IoTIFY, you can simulate a full peak production load in your staging environment and study how your system behaves.
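The minimal Python sketch below shows what "simulating N parallel clients" involves, using asyncio on a single machine. The names `fake_request` and `run_load` are hypothetical. `fake_request` only sleeps and would have to be replaced with a real HTTP client call. This sketch is not how IoTIFY works internally. It is the kind of limited-scale generator described under The Problem: one process on one machine eventually runs out of CPU, network bandwidth, or open connections, well before production-scale peaks.

```python
import asyncio
import random
import time
from typing import Awaitable, Callable, List


async def fake_request() -> None:
    """Hypothetical stand-in for an HTTP call to the endpoint under test."""
    await asyncio.sleep(random.uniform(0.001, 0.005))


async def run_load(request: Callable[[], Awaitable[None]],
                   clients: int, requests_per_client: int) -> List[float]:
    """Run `clients` concurrent loops and return every latency in milliseconds."""
    latencies_ms: List[float] = []

    async def client() -> None:
        for _ in range(requests_per_client):
            start = time.perf_counter()
            await request()
            latencies_ms.append((time.perf_counter() - start) * 1000.0)

    # All simulated clients share one event loop, one process, and one machine.
    await asyncio.gather(*(client() for _ in range(clients)))
    return latencies_ms


if __name__ == "__main__":
    samples = asyncio.run(run_load(fake_request, clients=50, requests_per_client=4))
    print(f"{len(samples)} samples, slowest {max(samples):.1f} ms")
```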
De-risking your performance and scaling challenges this way means you are prepared for the worst-case scenario before the code moves to production. For this example, we use a Sample Test on IoTIFY that measures the N’th percentile latency of a single endpoint. The same test can easily be extended to a sequence of events.
You can start by creating a free account at IoTIFY.io and getting familiar with the basics of the platform through our Docs and the Meet IoTIFY playlist on YouTube.
Let’s get started by creating the sample templates from the IoTIFY Tests section. Click on ‘Create from Sample’ and choose the ones shown below.

Sample Test to measure Percentile Latency
Once the test has been created, you can set the value of the percentile threshold and the endpoint URL.
This test can run with any number of parallel clients. Requests are sent from the first iteration through the penultimate iteration. Each client pushes its measured latency to a Mailbox (see the Mailbox API documentation), and the first client aggregates these values in the following iteration. The last iteration therefore sends no requests and only aggregates the results of the penultimate one.
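The Python sketch below simulates this data flow sequentially. It is not IoTIFY's Mailbox API. The mailbox is an in-memory `deque`, `run_test` and `send_request` are hypothetical names, `send_request` returns one latency in milliseconds, and the nearest-rank percentile is an assumed definition. In a real parallel run, the first client must aggregate only after every other client has pushed its value. The sketch assumes that synchronization by running the clients in order.

```python
import math
import random
from collections import deque
from typing import Callable, Deque, List


def percentile(samples: List[float], n: float) -> float:
    """Nearest-rank n-th percentile (an assumed definition)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(n * len(ordered) / 100))
    return ordered[rank - 1]


def run_test(clients: int, iterations: int, n: float,
             send_request: Callable[[], float]) -> List[float]:
    """Simulate the client/iteration/mailbox pattern sequentially.

    Returns one n-th percentile value per iteration that sent requests.
    """
    mailbox: Deque[float] = deque()  # in-memory stand-in for the shared Mailbox
    per_iteration: List[float] = []
    for iteration in range(iterations):
        for client_id in range(clients):
            # The first client drains what every client pushed last iteration.
            if client_id == 0 and iteration > 0:
                batch = [mailbox.popleft() for _ in range(len(mailbox))]
                per_iteration.append(percentile(batch, n))
            # Requests are sent from the first to the penultimate iteration.
            if iteration < iterations - 1:
                mailbox.append(send_request())
    return per_iteration


if __name__ == "__main__":
    # 100 clients x (5 - 1) iterations = 400 requests, aggregated into 4 p99 values.
    p99s = run_test(clients=100, iterations=5, n=99,
                    send_request=lambda: random.uniform(10.0, 50.0))
    print([round(v, 1) for v in p99s])
```

Each client pushes raw latencies rather than its own percentile, and this matters. Percentiles cannot be averaged, so the aggregator needs every sample, or a mergeable summary such as a histogram, to compute a correct N’th percentile across all clients.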
Before you run the test, you will need to create a Run setting that specifies the number of clients, the number of iterations, and other run parameters.



Run Settings for Latency Test
Once the test has finished running, go to the Metrics tab to view the results. This example was tested with 10,000 clients against a simple NGINX server running on a cloud VM.






A plot of failed requests
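Read the failed-request plot alongside the percentile. If only successful responses are counted, timeouts and errors drop out of the latency samples, and the percentile can look healthier than what users actually experienced. The minimal sketch below reports both values together. The `Result`/`Summary` types and `summarize` are hypothetical, what counts as a failure is up to your test, and the nearest-rank percentile is again an assumed definition.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    latency_ms: float
    ok: bool  # False for a timeout, connection error, or error status code


@dataclass
class Summary:
    failed: int
    failure_rate: float
    percentile_ms: Optional[float]  # None if no request succeeded


def summarize(results: List[Result], n: float) -> Summary:
    """Report failures next to the nearest-rank n-th percentile of successes."""
    ok_ms = sorted(r.latency_ms for r in results if r.ok)
    failed = len(results) - len(ok_ms)
    p_n: Optional[float] = None
    if ok_ms:
        p_n = ok_ms[max(1, math.ceil(n * len(ok_ms) / 100)) - 1]
    rate = failed / len(results) if results else 0.0
    return Summary(failed=failed, failure_rate=rate, percentile_ms=p_n)


if __name__ == "__main__":
    # Made-up run: 8 fast successes and 2 timeouts.
    run = [Result(10.0 + i, True) for i in range(8)] + [Result(5000.0, False)] * 2
    print(summarize(run, 99))  # Summary(failed=2, failure_rate=0.2, percentile_ms=17.0)
```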
In Conclusion
Measuring system performance under load in a staging environment can greatly reduce the risk of failures in production. IoTIFY gives you the toolset to benchmark and study your system’s performance under load as well as under normal conditions. Using the IoTIFY Metrics and Mailbox APIs, we measured the percentile latency of a sample endpoint.
If you would like to discuss any of the topics mentioned in this post, please Contact Us; we offer a Free Consultation for your use case. You can also try the platform yourself by signing up at IoTIFY.io.