Incident History
Service: Redis Cache (prod-redis-01) Issue: Cache evictions causing database overload and slow queries Environment: Production (us-west-2) Error: "used_memory exceeds maxmemory, eviction policy: allkeys-lru" Impact: Database query latency increased 10x, c
a82ac573
medium | In Progress | Created | Duration - | ServiceNow - | Webex Not sent
Service: PostgreSQL Primary (prod-db-01) Issue: Connection pool saturated causing 503 errors on API Environment: Production (us-west-2) Error: "FATAL: remaining connection slots are reserved for non-replication superuser connections" Impact: All API reque
35d15f80
medium | In Progress | Created | Duration - | ServiceNow - | Webex Not sent
Service: NGINX Load Balancer (prod-lb-01) Issue: CPU usage consistently at 95%+, causing slow response times Environment: Production (us-east-1) Error: Worker processes consuming excessive CPU, request queue building up Impact: API response times increase
b973f545
medium | In Progress | Created | Duration - | ServiceNow - | Webex Not sent
Service: NGINX Load Balancer (prod-lb-01) Issue: CPU usage consistently at 95%+, causing slow response times Environment: Production (us-east-1) Error: Worker processes consuming excessive CPU, request queue building up Impact: API response times increase
c45bbe13
medium | In Progress | Created | Duration - | ServiceNow - | Webex Not sent
Service: PostgreSQL Primary (prod-db-01) Issue: Connection pool saturated causing 503 errors on API Environment: Production (us-west-2) Error: "FATAL: remaining connection slots are reserved for non-replication superuser connections" Impact: All API reque
ec182e0f
medium | In Progress | Created | Duration - | ServiceNow - | Webex Not sent
Service: Redis Cache (prod-redis-01) Issue: Cache evictions causing database overload and slow queries Environment: Production (us-west-2) Error: "used_memory exceeds maxmemory, eviction policy: allkeys-lru" Impact: Database query latency increased 10x, c
4a178f61
medium | In Progress | Created | Duration - | ServiceNow - | Webex Not sent
Service: Kong API Gateway (prod-apigw-01) Issue: API Gateway returning 503 Service Unavailable for 15% of requests Environment: Production (multi-region) Error: "upstream connect error or disconnect/reset before headers. reset reason: connection failure"
4bdb4c5e
medium | In Progress | Created | Duration - | ServiceNow - | Webex Not sent
Service: Redis Cache (prod-redis-01) Issue: Cache evictions causing database overload and slow queries Environment: Production (us-west-2) Error: "used_memory exceeds maxmemory, eviction policy: allkeys-lru" Impact: Database query latency increased 10x, c
12d8d7ad
medium | Completed | Created | Duration - | ServiceNow INC0010468 | Webex Sent | Products: Redis
Service: Redis Cache (prod-redis-01) Issue: Cache evictions causing database overload and slow queries Environment: Production (us-west-2) Error: "used_memory exceeds maxmemory, eviction policy: allkeys-lru" Impact: Database query latency increased 10x, c
a6d597e2
medium | In Progress | Created | Duration - | ServiceNow - | Webex Not sent
Service: Kong API Gateway (prod-apigw-01) Issue: API Gateway returning 503 Service Unavailable for 15% of requests Environment: Production (multi-region) Error: "upstream connect error or disconnect/reset before headers. reset reason: connection failure"
71c969b8
medium | In Progress | Created | Duration - | ServiceNow - | Webex Not sent
Service: Kong API Gateway (prod-apigw-01) Issue: API Gateway returning 503 Service Unavailable for 15% of requests Environment: Production (multi-region) Error: "upstream connect error or disconnect/reset before headers. reset reason: connection failure"
4d6dc883
medium | In Progress | Created | Duration - | ServiceNow - | Webex Not sent
Service: Redis Cache (prod-redis-01) Issue: Cache evictions causing database overload and slow queries Environment: Production (us-west-2) Error: "used_memory exceeds maxmemory, eviction policy: allkeys-lru" Impact: Database query latency increased 10x, c
06da2a3b
medium | In Progress | Created | Duration - | ServiceNow - | Webex Not sent
Service: Kong API Gateway (prod-apigw-01) Issue: API Gateway returning 503 Service Unavailable for 15% of requests Environment: Production (multi-region) Error: "upstream connect error or disconnect/reset before headers. reset reason: connection failure"
6d491c7d
medium | In Progress | Created | Duration - | ServiceNow - | Webex Not sent
Service: Redis Cache (prod-redis-01) Issue: Cache evictions causing database overload and slow queries Environment: Production (us-west-2) Error: "used_memory exceeds maxmemory, eviction policy: allkeys-lru" Impact: Database query latency increased 10x, c
5e83e01b
medium | In Progress | Created | Duration - | ServiceNow - | Webex Not sent
Service: Elasticsearch Cluster (prod-es-cluster) Issue: Cluster status yellow, 45 unassigned shards detected Environment: Production (multi-AZ) Error: "unassigned_shards: 45, reason: INDEX_CREATED, allocation_decider: disk_threshold" Impact: Search querie
0ddc16a3
medium | In Progress | Created | Duration - | ServiceNow - | Webex Not sent
Service: Redis Cache (prod-redis-01) Issue: Cache evictions causing database overload and slow queries Environment: Production (us-west-2) Error: "used_memory exceeds maxmemory, eviction policy: allkeys-lru" Impact: Database query latency increased 10x, c
71c8acfb
medium | In Progress | Created | Duration - | ServiceNow - | Webex Not sent
Service: NGINX Load Balancer (prod-lb-01) Issue: CPU usage consistently at 95%+, causing slow response times Environment: Production (us-east-1) Error: Worker processes consuming excessive CPU, request queue building up Impact: API response times increase
2031a37d
medium | In Progress | Created | Duration - | ServiceNow - | Webex Not sent
Fresh test incident
b9da43be
medium | In Progress | Created | Duration - | ServiceNow - | Webex Not sent
Test kubernetes pod crash
712edab9
medium | In Progress | Created | Duration - | ServiceNow - | Webex Not sent
Service: Kong API Gateway (prod-apigw-01) Issue: API Gateway returning 503 Service Unavailable for 15% of requests Environment: Production (multi-region) Error: "upstream connect error or disconnect/reset before headers. reset reason: connection failure"
ca6f59b1
medium | In Progress | Created | Duration - | ServiceNow - | Webex Not sent