A service provider typically has a technology-related service contract with the client that makes promises about the availability and usability of the service.

Here we discuss the service level terminology: SLI, SLO, SLA, and user-facing.

What is an SLI (Service Level Indicator)?

An SLI is a quantitative indicator (QPS, response time, latency, etc.) defined to measure some aspect of the level of service provided, for example:

Latency: How long it takes to return a response to a request.

Availability: The fraction of time that a service is usable.
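The two indicators above can be sketched in a few lines of Python. This is only an illustration: the function names, the time-based availability inputs, and the nearest-rank percentile are assumptions, not part of any specific monitoring tool.

```python
import math

def availability(total_seconds, downtime_seconds):
    """Fraction of the time window during which the service was usable."""
    return (total_seconds - downtime_seconds) / total_seconds

def latency_percentile(latencies_ms, percentile):
    """Nearest-rank percentile of the observed response latencies (ms)."""
    ordered = sorted(latencies_ms)
    # Nearest-rank method: the smallest value with at least
    # `percentile` percent of the samples at or below it.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]

if __name__ == "__main__":
    # Hypothetical measurements for a one-day window.
    print(availability(total_seconds=86_400, downtime_seconds=60))
    print(latency_percentile([12, 35, 48, 51, 75, 80, 92, 110], 95))
```

A percentile is often preferred over an average for latency because a few very slow requests can hide behind a healthy-looking mean.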

What is an SLO (Service Level Objective)?

An SLO is a target value or range of values for an SLI (availability >= 99.9%, response time <= 10 ms, latency <= 100 ms, etc.).

Latency: <= 100 ms

Availability: 99.9%
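As a rough sketch, measured SLIs can be compared against these example targets as follows. The dictionary keys, the function names, and the 30-day window are assumptions made for illustration; the availability target is read here as "at least 99.9%".

```python
# Example SLO targets taken from the values above.
SLO = {"latency_ms_max": 100.0, "availability_min": 0.999}

def meets_slo(latency_ms, availability, slo=SLO):
    """Return, per objective, whether the measured SLI meets its target."""
    return {
        "latency": latency_ms <= slo["latency_ms_max"],
        "availability": availability >= slo["availability_min"],
    }

def allowed_downtime_minutes(availability_target, days=30):
    """Downtime that still satisfies the availability target over the window."""
    return (1 - availability_target) * days * 24 * 60

if __name__ == "__main__":
    print(meets_slo(latency_ms=85.0, availability=0.9995))
    print(allowed_downtime_minutes(0.999))  # roughly 43.2 minutes in 30 days
```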

What is an SLA (Service Level Agreement)?

An SLA is a commitment, or part of a contract, between the service provider and the client. It usually involves SLIs, SLOs, recovery time, and so on. If the service fails to meet the SLA, the provider may owe the client compensation, such as a refund or another form of credit, as committed. For example:

Credit/Discount 10%: 99.0% <= Availability < 99.9%

Credit/Discount 30%: 95% <= Availability < 99.0%
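The example tiers above map directly to a small lookup function. This is a minimal sketch: the function name is hypothetical, and because the tiers do not say what happens below 95%, that case returns None instead of a guessed value.

```python
def service_credit_percent(availability_percent):
    """Map measured availability (in percent) to the example credit tiers."""
    if availability_percent >= 99.9:
        return 0      # SLA met, no credit owed
    if availability_percent >= 99.0:
        return 10     # 99.0% <= availability < 99.9%
    if availability_percent >= 95.0:
        return 30     # 95% <= availability < 99.0%
    return None       # below 95%: not covered by the example tiers

if __name__ == "__main__":
    for measured in (99.95, 99.5, 97.0, 90.0):
        print(measured, service_credit_percent(measured))
```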

What is User-Facing?

User-facing refers to the part of an application that users interact with directly, such as a user dashboard interface.

USE Method

The USE method is a set of infrastructure monitoring indicators:

Utilization: CPU%

Saturation: Concurrency

Errors: Error rate (number of errors per unit of time)

RED Method

The RED method is a set of client request monitoring indicators:

Rate: Requests/sec

Errors: Request error rate

Duration: Request latency
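To make the three RED indicators concrete, here is a minimal sketch that derives them from a batch of request records. The RequestRecord type, the choice to count only HTTP 5xx responses as errors, and the use of a simple average for duration are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class RequestRecord:
    status: int         # HTTP status code of the response
    duration_ms: float  # time taken to serve the request

def red_metrics(records, window_seconds):
    """Compute Rate, Errors and Duration for the requests seen in one window."""
    total = len(records)
    if total == 0:
        return {"rate": 0.0, "error_rate": 0.0, "duration_ms_avg": 0.0}
    # Assumption: only server-side failures (5xx) count as errors.
    errors = sum(1 for r in records if r.status >= 500)
    return {
        "rate": total / window_seconds,   # requests per second
        "error_rate": errors / total,     # fraction of failed requests
        "duration_ms_avg": sum(r.duration_ms for r in records) / total,
    }

if __name__ == "__main__":
    sample = [RequestRecord(200, 10), RequestRecord(200, 20),
              RequestRecord(500, 30), RequestRecord(404, 40)]
    print(red_metrics(sample, window_seconds=2))
```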

What is an effective, low-impact way to monitor user-facing SLIs/SLOs?

For example, on AWS, the following process can be used to monitor user-facing SLIs/SLOs:

First, collect logs (HTTP access logs) in near real time from the AWS ALB that serves the "user-facing APIs".

Second, transfer the access logs to a data analytics pipeline, and store the results as metrics in a time-series database.

Finally, render metrics as a comprehensive dashboard.

Usually, we first define the SLI (for example, by collecting the error rate), and then calculate and set the SLO target (for example, for latency).
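The analytics step of the process above might look roughly like the sketch below. It assumes the space-delimited ALB access log layout in which the second field is the timestamp, fields six to eight are the request, target and response processing times in seconds, and the ninth is the load balancer status code; verify these positions against the current AWS documentation before relying on them. Log collection, the time-series database, and the dashboard are left out, and the returned dictionary merely stands in for the points that would be stored.

```python
import shlex
from collections import defaultdict

def parse_alb_line(line):
    """Extract minute, status and latency from one ALB access log entry."""
    fields = shlex.split(line)  # keeps quoted fields such as the request intact
    # Assumed positions: 1 = timestamp, 5-7 = processing times, 8 = status code.
    times = [float(fields[i]) for i in (5, 6, 7)]
    return {
        "minute": fields[1][:16],  # e.g. "2018-07-02T22:23"
        "status": int(fields[8]),
        # A processing time of -1 marks a phase that did not complete; skip it.
        "latency_ms": sum(t for t in times if t >= 0) * 1000,
    }

def aggregate_per_minute(lines):
    """Group log entries by minute and compute simple SLI data points."""
    buckets = defaultdict(list)
    for line in lines:
        record = parse_alb_line(line)
        buckets[record["minute"]].append(record)
    points = {}
    for minute, records in sorted(buckets.items()):
        errors = sum(1 for r in records if r["status"] >= 500)
        points[minute] = {
            "requests": len(records),
            "error_rate": errors / len(records),
            "latency_ms_max": max(r["latency_ms"] for r in records),
        }
    return points
```

Each per-minute point can then be compared with the SLO targets, for example by alerting when the error rate or the maximum latency stays above its target for several consecutive minutes.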