A service provider usually has a service contract with its clients that promises a certain level of availability and usability.
Here we discuss the service level terms SLI, SLO, and SLA, and how to monitor them for user-facing services:
What is an SLI (Service Level Indicator)?
An SLI is a quantitative indicator (QPS, response time, latency, etc.) defined to measure some aspect of the level of service that is provided, for example:
Latency: How long it takes to return a response to a request.
Availability: The fraction of the time that a service is usable.
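As a minimal sketch of how these two indicators could be computed from raw request records: the `Request` type, its fields, and the sample values below are hypothetical, and availability is approximated here as the fraction of requests that succeed rather than the fraction of wall-clock time the service is usable.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float  # time taken to return the response
    status: int        # HTTP status code of the response


def latency_percentile(requests, pct):
    """Nearest-rank percentile of request latency, in milliseconds."""
    ordered = sorted(r.latency_ms for r in requests)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def availability(requests):
    """Fraction of requests that did not fail with a 5xx status (an assumption)."""
    ok = sum(1 for r in requests if r.status < 500)
    return ok / len(requests)


# Hypothetical sample data: three successful requests and one server error.
sample = [Request(12, 200), Request(48, 200), Request(95, 200), Request(230, 503)]
print("p50 latency (ms):", latency_percentile(sample, 50))
print("availability:", availability(sample))
```

Which percentile to report and which status codes count as failures are choices each team makes for its own service.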
What is an SLO (Service Level Objective)?
An SLO is a target value or range of values for an SLI (availability >= 99.9%, response time <= 10ms, latency <= 100ms, etc.), for example:
Latency: <= 100ms
Availability: >= 99.9%
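The relationship between an SLI and its SLO can be sketched as a simple comparison. The `SLO` class and the measured values here are hypothetical; only the two targets (latency of at most 100ms, availability of at least 99.9%) come from the example above.

```python
from dataclasses import dataclass


@dataclass
class SLO:
    name: str
    target: float
    higher_is_better: bool  # True for availability, False for latency

    def is_met(self, measured: float) -> bool:
        """Compare a measured SLI value against the target."""
        if self.higher_is_better:
            return measured >= self.target
        return measured <= self.target


latency_slo = SLO("latency_ms", 100.0, higher_is_better=False)
availability_slo = SLO("availability_pct", 99.9, higher_is_better=True)

# Hypothetical measurements for one reporting period.
print(latency_slo.is_met(85.0))       # latency of 85ms against a 100ms target
print(availability_slo.is_met(99.5))  # availability of 99.5% against a 99.9% target
```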
What is an SLA (Service Level Agreement)?
An SLA is a commitment, or part of a contract, between the service provider and the client. It usually involves SLIs, SLOs, recovery time, etc. If the service fails to meet the level agreed in the SLA, the provider may owe the client compensation, such as a refund or a service credit, as committed in the agreement. For example:
99.0% <= Availability < 99.9%: 10% credit/discount
95.0% <= Availability < 99.0%: 30% credit/discount
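The two credit tiers above could be applied like this. This is only a sketch: the function names are hypothetical, the 30-day period is an assumption, and because the example does not say what happens below 95% availability, the code raises an error there instead of guessing.

```python
def availability_percent(downtime_minutes: float, period_minutes: float) -> float:
    """Availability over a billing period, as a percentage."""
    return 100 * (period_minutes - downtime_minutes) / period_minutes


def sla_credit_percent(availability_pct: float) -> int:
    """Map measured availability to the credit/discount tiers listed above."""
    if availability_pct >= 99.9:
        return 0   # SLA met, no credit owed
    if availability_pct >= 99.0:
        return 10
    if availability_pct >= 95.0:
        return 30
    raise ValueError("availability below 95% is not covered by the example tiers")


# Assumed 30-day billing period with 60 minutes of downtime.
period = 30 * 24 * 60
measured = availability_percent(60, period)
print(round(measured, 3), sla_credit_percent(measured))
```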
What is User-Facing?
User-facing refers to the part of an application that users interact with directly, such as a user dashboard interface or a public API.
USE Method
The USE method is a set of infrastructure monitoring indicators:
Utilization: CPU%
Saturation: Concurrency
Errors: Error Rate (number of errors / time)
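As a rough sketch, the three USE indicators could be derived from one sampling interval of a resource. The `ResourceSample` type, its fields, and the numbers are hypothetical; in practice these values would come from a monitoring agent.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    busy_seconds: float      # time the CPU spent doing work in the interval
    interval_seconds: float  # length of the sampling interval
    in_flight: int           # concurrent tasks observed, used as the saturation signal
    errors: int              # error events counted in the interval


def use_metrics(s: ResourceSample) -> dict:
    """Compute Utilization, Saturation, and Errors for one sample."""
    return {
        "utilization_pct": 100 * s.busy_seconds / s.interval_seconds,
        "saturation": s.in_flight,
        "errors_per_sec": s.errors / s.interval_seconds,
    }


# Hypothetical one-minute sample: 45s busy, 8 concurrent tasks, 3 errors.
m = use_metrics(ResourceSample(45.0, 60.0, 8, 3))
print(m)
```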
RED Method
The RED method monitors client requests:
Rate: Requests/sec
Errors: Request Error Rate
Duration: Request Latency
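The RED indicators can be computed from a window of request records. In this sketch each request is a hypothetical `(duration_ms, status)` tuple, a 5xx status is assumed to count as an error, and the median is used as one possible summary of duration.

```python
import statistics


def red_metrics(requests, window_seconds):
    """Compute Rate, Errors, and Duration for requests seen in one time window."""
    total = len(requests)
    errors = sum(1 for _, status in requests if status >= 500)
    durations = [duration for duration, _ in requests]
    return {
        "rate_per_sec": total / window_seconds,
        "error_rate": errors / total if total else 0.0,
        "duration_median_ms": statistics.median(durations) if durations else None,
    }


# Hypothetical window: four requests observed over two seconds.
window = [(20, 200), (35, 200), (50, 500), (120, 200)]
red = red_metrics(window, window_seconds=2)
print(red)
```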
What is an effective way to monitor user-facing SLIs/SLOs with low impact on the service?
For example, on AWS the following process can be used to monitor user-facing SLIs/SLOs:
First, collect logs (HTTP access logs) in near real time from the AWS ALB that serves the “user-facing APIs”.
Second, transfer the access logs to a data analytics pipeline, and store the results as metrics in a time-series database.
Finally, render metrics as a comprehensive dashboard.
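The middle step of this process, turning access log lines into metrics, might look like the sketch below. It assumes the space-delimited ALB access log layout in which the timestamp is the second field, the three processing times are fields six to eight, and the load balancer status code is the ninth. The function names are hypothetical, the sample lines are shortened for illustration, and writing to a time-series database is left out.

```python
import shlex
from collections import defaultdict


def parse_alb_line(line):
    """Extract minute bucket, status, and latency from one access log line."""
    f = shlex.split(line)  # keeps quoted fields such as the request together
    times = [float(f[5]), float(f[6]), float(f[7])]
    # A value of -1 means the time could not be measured; skip latency then.
    latency = sum(times) if all(t >= 0 for t in times) else None
    return {"minute": f[1][:16], "status": int(f[8]), "latency_s": latency}


def aggregate(lines):
    """Aggregate request count, error rate, and average latency per minute."""
    buckets = defaultdict(lambda: {"requests": 0, "errors": 0, "latencies": []})
    for line in lines:
        rec = parse_alb_line(line)
        b = buckets[rec["minute"]]
        b["requests"] += 1
        b["errors"] += int(rec["status"] >= 500)
        if rec["latency_s"] is not None:
            b["latencies"].append(rec["latency_s"])
    return {
        minute: {
            "requests": b["requests"],
            "error_rate": b["errors"] / b["requests"],
            "avg_latency_s": sum(b["latencies"]) / len(b["latencies"]) if b["latencies"] else None,
        }
        for minute, b in buckets.items()
    }


# Shortened, illustrative log lines (real entries carry more trailing fields).
lines = [
    'http 2018-07-02T22:23:00.186641Z app/my-lb/50dc6c495c0c9188 192.168.131.39:2817 10.0.0.1:80 0.000 0.001 0.000 200 200 34 366 "GET http://www.example.com:80/ HTTP/1.1" "curl/7.46.0" - -',
    'http 2018-07-02T22:23:05.000000Z app/my-lb/50dc6c495c0c9188 192.168.131.39:2818 10.0.0.1:80 0.000 0.250 0.000 502 502 34 366 "GET http://www.example.com:80/ HTTP/1.1" "curl/7.46.0" - -',
]
metrics = aggregate(lines)
print(metrics)
```

Each per-minute record is then a data point that can be written to the time-series database and drawn on the dashboard.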
Usually, we first define the SLIs to collect (such as the error rate or the latency), and then calculate and set the SLO target for each of them (such as a latency threshold).