Skip to content

Conversation

@tbuchaillot
Copy link
Contributor

@tbuchaillot tbuchaillot commented Jan 29, 2026

Description

Related Issue

Motivation and Context

How This Has Been Tested

Screenshots (if appropriate)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Refactoring or add test (improvements in base code or adds test coverage to functionality)

Checklist

  • I ensured that the documentation is up to date
  • I explained why this PR updates go.mod in detail with reasoning why it's required
  • I would like a code coverage CI quality gate exception and have explained why

Ticket Details

TT-16439
Status In Code Review
Summary Improve AccessLogs Error observability 5XX

Generated at: 2026-01-30 12:56:08

@github-actions
Copy link
Contributor

github-actions bot commented Jan 29, 2026

API Changes

--- prev.txt	2026-01-30 12:56:53.917821433 +0000
+++ current.txt	2026-01-30 12:56:43.182883227 +0000
@@ -6030,18 +6030,29 @@
 	//
 	// Template Options:
 	//
-	// - `api_key` will include they obfuscated or hashed key.
-	// - `client_ip` will include the ip of the request.
+	// - `api_key` will include the obfuscated or hashed key.
+	// - `circuit_breaker_state` will include the circuit breaker state when applicable.
+	// - `client_ip` will include the IP of the request.
+	// - `error_source` will include the source of an error (e.g., ReverseProxy).
+	// - `error_target` will include the target that caused an error.
 	// - `host` will include the host of the request.
+	// - `latency_gateway` will include the gateway processing latency.
+	// - `latency_total` will include the total latency of the request.
 	// - `method` will include the request method.
+	// - `org_id` will include the organization ID.
 	// - `path` will include the path of the request.
 	// - `protocol` will include the protocol of the request.
 	// - `remote_addr` will include the remote address of the request.
-	// - `upstream_addr` will include the upstream address (scheme, host and path)
+	// - `response_code_details` will include detailed error description for 5XX responses.
+	// - `response_flag` will include the error classification flag (e.g., URT, UCF, TLE).
+	// - `status` will include the response status code.
+	// - `tls_cert_expiry` will include the TLS certificate expiry date when applicable.
+	// - `tls_cert_subject` will include the TLS certificate subject when applicable.
+	// - `trace_id` will include the OpenTelemetry trace ID when tracing is enabled.
+	// - `upstream_addr` will include the upstream address (scheme, host and path).
 	// - `upstream_latency` will include the upstream latency of the request.
-	// - `latency_total` will include the total latency of the request.
+	// - `upstream_status` will include the upstream response status code for 5XX responses.
 	// - `user_agent` will include the user agent of the request.
-	// - `status` will include the response status code.
 	Template []string `json:"template"`
 }
     AccessLogsConfig defines the type of transactions logs printed to stdout.
@@ -8313,6 +8324,10 @@
     GetDefinition will return a deep copy of the API definition valid for the
     request.
 
+func GetErrorClassification(r *http.Request) *errors.ErrorClassification
+    GetErrorClassification retrieves the error classification from the request
+    context. Returns nil if no error classification has been set.
+
 func GetOASDefinition(r *http.Request) *oas.OAS
     GetOASDefinition will return a deep copy of the OAS API definition valid for
     the request.
@@ -8321,6 +8336,10 @@
 func SetDefinition(r *http.Request, s *apidef.APIDefinition)
     SetDefinition sets an API definition object to the request context.
 
+func SetErrorClassification(r *http.Request, ec *errors.ErrorClassification)
+    SetErrorClassification sets the error classification for the request
+    context. This is used to store structured error information for access logs.
+
 func SetOASDefinition(r *http.Request, s *oas.OAS)
     SetOASDefinition sets an OAS API definition object to the request context.
 
@@ -8367,6 +8386,8 @@
 	SelfLooping
 	// RequestStartTime holds the time when the request entered the middleware chain
 	RequestStartTime
+	// ErrorClassification holds structured error information for access logs
+	ErrorClassification
 	// MCPRouting indicates the request came via MCP JSON-RPC routing
 	MCPRouting
 )
@probelabs
Copy link

probelabs bot commented Jan 29, 2026

This pull request introduces a comprehensive error classification system to improve the observability of 5xx and other upstream errors in access logs. It replaces generic error codes with structured, detailed information, enabling operators to quickly diagnose the root cause of failures such as TLS issues, connection timeouts, or circuit breaker events.

Files Changed Analysis

The changes are centered around a new internal/errors package, which introduces a structured error model (ErrorClassification) and a robust classifier for various upstream Go errors. This new functionality is integrated throughout the gateway:

  • internal/errors: A new package containing classification.go (defining the error structure and response flags like TLE, UCF), classifier.go (mapping Go errors to the new structure), and extensive tests.
  • ctx/ctx.go: The request context is now used to propagate error classification details from the point of detection to the logging middleware.
  • gateway/reverse_proxy.go: Connection errors (e.g., connection refused, DNS issues) and circuit breaker trips are now classified and attached to the request context.
  • gateway/handler_success.go: Upstream 5xx responses are classified to distinguish them from gateway-level connection failures.
  • gateway/middleware.go & internal/httputil/accesslog/record.go: The access logging mechanism is updated to read the new error details from the context and include them in the logs.
  • config/config.go: The configuration is updated to document the new template variables available for access logs (e.g., response_flag, response_code_details, error_target).

Architecture & Impact Assessment

  • What this PR accomplishes: It significantly improves the diagnostic value of access logs for upstream errors, moving from opaque error codes to a rich, structured format. This allows for faster troubleshooting, more precise monitoring, and more effective alerting on specific failure modes.

  • Key technical changes introduced:

    1. Structured Error Model: A new ErrorClassification struct provides a standardized vocabulary for upstream failures, inspired by Envoy's response flags.
    2. Context Propagation: Error details are passed via the request context (context.Context), cleanly decoupling error detection from the logging system.
    3. Centralized Classifier Logic: A robust classifier in internal/errors/classifier.go handles a comprehensive set of error types, ensuring consistent and accurate classification.
  • Affected system components: The changes primarily impact the Gateway Reverse Proxy, Request Context Management, and the Access Logging system.

Error Classification Flow

sequenceDiagram
    participant Client
    participant Gateway as Tyk Gateway
    participant Upstream

    Client->>Gateway: Request
    Gateway->>Upstream: Forward Request

    alt Upstream Connection Fails in reverse_proxy.go
        Upstream--xGateway: Error (e.g., syscall.ECONNREFUSED)
        Gateway->>Gateway: ClassifyUpstreamError(err)
        Gateway->>Gateway: ctx.SetErrorClassification(req, classification)
        Gateway-->>Client: 502 Bad Gateway
    else Upstream Responds with 5xx in handler_success.go
        Upstream-->>Gateway: HTTP 503 Service Unavailable
        Gateway->>Gateway: ClassifyUpstreamResponse(503)
        Gateway->>Gateway: ctx.SetErrorClassification(req, classification)
        Gateway-->>Client: 503 Service Unavailable
    end

    Note over Gateway: Request finishes, logging middleware runs
    Gateway->>Gateway: middleware.go: GetErrorClassification(req)
    Gateway->>Gateway: accesslog/record.go: WithErrorClassification(ec)
    Gateway->>Gateway: Write structured access log with error-details
Loading

Scope Discovery & Context Expansion

  • Broader Impact: This feature is a significant operational enhancement. It empowers SREs and operators to build sophisticated dashboards and alerts based on the new structured log fields (response_flag, response_code_details). This can dramatically reduce Mean Time To Resolution (MTTR) for upstream failures by providing actionable insights directly within the access logs.

  • Context Expansion: The implementation correctly handles two distinct failure scenarios:

    1. Connection Failures: In gateway/reverse_proxy.go, when an upstream connection fails (e.g., connection refused, circuit breaker open), the error is classified and set on the request context before the error handler records the access log.
    2. Upstream 5xx Responses: In gateway/handler_success.go, if the upstream connection succeeds but returns a 5xx status code, the response is classified and set on the original request context, which is then used by the access logger at the end of the request lifecycle.

This dual approach ensures that both types of upstream failures are correctly captured and logged with the new structured error details.

Metadata
  • Review Effort: 4 / 5
  • Primary Label: enhancement

Powered by Visor from Probelabs

Last updated: 2026-01-30T12:59:14.807Z | Triggered by: pr_updated | Commit: 32fb79e

💡 TIP: You can chat with Visor using /visor ask <your question>

@probelabs
Copy link

probelabs bot commented Jan 29, 2026

Security Issues (3)

Severity Location Issue
🟡 Warning gateway/reverse_proxy.go:1291
The 'error_target' field, which is logged in access logs, is constructed using the full request path (`outreq.URL.Host+outreq.URL.Path`). If the URL path contains sensitive information such as session tokens, personally identifiable information (PII), or other secrets, this information will be exposed in the logs. A similar issue exists on line 1275 for circuit breaker errors.
💡 SuggestionTo prevent sensitive data leakage, construct the 'error_target' using only the host and scheme, omitting the path. The implementation in `gateway/handler_success.go` already does this correctly by only using `r.URL.Host`. A consistent approach should be used.
🟡 Warning internal/errors/classifier.go:111
The subject of a TLS certificate (`tls_cert_subject`) is extracted and logged for certain TLS errors. Certificate subjects can contain personally identifiable information (PII) like email addresses, names, and organizational details. Aggregating this potentially sensitive data in logs can create privacy risks and may violate compliance regulations like GDPR.
💡 SuggestionLogging of the TLS certificate subject should be disabled by default. Introduce a configuration option to allow operators to explicitly enable it, and clearly document the privacy implications of doing so.
🟡 Warning internal/errors/classifier.go:242-313
The `classifyByErrorString` function performs multiple string-matching operations on the full error message from an upstream service. If an attacker can cause the upstream to return exceptionally long error messages, this could lead to excessive CPU and memory consumption in the gateway during error classification, potentially creating a denial-of-service vulnerability through resource exhaustion.
💡 SuggestionMitigate this risk by truncating the error message to a reasonable maximum length (e.g., 1024 or 2048 characters) before performing string comparisons. This ensures that classification performance is not degraded by malicious or malformed error responses.

Architecture Issues (1)

Severity Location Issue
🟠 Error gateway/reverse_proxy.go:1275-1290
The `error_target` for error classification is constructed using `outreq.URL.Host + outreq.URL.Path`. Including the full path in the target can lead to logging sensitive information present in URL paths (e.g., session tokens, PII). Furthermore, this is inconsistent with `gateway/handler_success.go`, where the target is correctly identified as only the host. The target should consistently represent the upstream service endpoint (host and port), not the specific resource path.
💡 SuggestionModify the calls to `ClassifyCircuitBreakerError` and `ClassifyUpstreamError` to use only `outreq.URL.Host` as the target, aligning with the safer and more consistent implementation found in `gateway/handler_success.go`.

✅ Performance Check Passed

No performance issues found – changes LGTM.

Quality Issues (1)

Severity Location Issue
🟡 Warning gateway/reverse_proxy.go:1275-1294
The `error_target` field in access logs is constructed using the upstream host and the full request path (`outreq.URL.Host + outreq.URL.Path`). Logging the full path can expose sensitive information that might be present in path parameters (e.g., session tokens, user IDs, PII). This is also inconsistent with the implementation for 5xx responses in `handler_success.go`, which correctly uses only the host as the target.
💡 SuggestionTo prevent potential sensitive data exposure and maintain consistency, the `error_target` for connection-level errors should only contain the upstream host. Modify the calls to `ClassifyCircuitBreakerError` (line 1275) and `ClassifyUpstreamError` (line 1294) to pass only `outreq.URL.Host` as the target.

Powered by Visor from Probelabs

Last updated: 2026-01-30T12:59:17.778Z | Triggered by: pr_updated | Commit: 32fb79e

💡 TIP: You can chat with Visor using /visor ask <your question>

@tbuchaillot
Copy link
Contributor Author

tbuchaillot commented Jan 29, 2026

Security Issues (3)

Severity Location Issue
🟠 Error gateway/reverse_proxy.go:1269-1276
Error classification details for connection failures and circuit breaker events are set on the context of a cloned request (logreq), but the access logging middleware reads from the original request's context (req). This flaw prevents critical error details, including security-relevant TLS handshake failures, from being logged, undermining the feature's observability goals.
💡 Suggestion
🟡 Warning gateway/reverse_proxy.go:1275
The error_target field in access logs includes the full request path, which may contain sensitive information such as session tokens, PII, or other secrets passed as path parameters. Logging this data can lead to sensitive information exposure.
💡 Suggestion
🟡 Warning internal/errors/classifier.go:111
The subject of a TLS certificate (tls_cert_subject) is logged on certain TLS errors. Certificate subjects can contain personally identifiable information (PII) such as email addresses, names, and organizational details. While this is public information, aggregating it in logs can create privacy risks.
💡 Suggestion

Architecture Issues (1)

Severity Location Issue
🟠 Error gateway/reverse_proxy.go:1270
Error classification is set on a copied request context (logreq) instead of the original (req) at two locations (lines 1270 and 1287). The access logging middleware reads from the original request's context, causing classifications for connection errors and circuit breaker events to be lost and not appear in the logs.
💡 Suggestion

✅ Performance Check Passed

No performance issues found – changes LGTM.

Quality Issues (1)

Severity Location Issue
🔴 Critical gateway/reverse_proxy.go:1268-1286
The error classification is being set on logreq, which is a shallow copy of the original request. The access logging middleware later reads this information from the original request's context (req), where it will not be found. This will cause connection-level errors (e.g., connection refused, circuit breaker open) to be missing from the access logs, undermining a primary goal of the feature.
💡 Suggestion
🔧 Suggested Fix

ctx.SetErrorClassification(req, errClass)

Powered by Visor from Probelabs

Last updated: 2026-01-29T19:21:51.615Z | Triggered by: pr_updated | Commit: c2279f4

💡 TIP: You can chat with Visor using /visor ask <your question>

/visor ask RecordAccessLog is called from (e *ErrorHandler) HandleError, which uses logreq.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants