Health Checks

Monitor your endpoints and get alerted when they go down.

Overview

Uptime monitoring continuously checks your endpoints from multiple regions and alerts you when they become unreachable or return unexpected responses.

Creating a Monitor

Navigate to Monitoring > Uptime and click Add Monitor.

| Setting | Description | Example |
|---------|-------------|---------|
| Name | Descriptive monitor name | Production API |
| URL | Endpoint to check | https://api.example.com/health |
| Method | HTTP method | GET, POST, HEAD |
| Interval | Check frequency | 1, 5, 10, 15, 30 minutes |
| Timeout | Max wait time | 10, 15, 30 seconds |
| Expected Status | Success status code | 200 |
| Regions | Regions to check from | US East, EU West, Asia Pacific |
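
To make the settings concrete, here is a minimal Python sketch that models a monitor and validates it before submission. The `MonitorSettings` class and its field names are hypothetical, not part of any product API, and the sketch assumes the values in the Example column are the only ones accepted.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: the example values from the settings table are the only
# accepted ones. The names below are illustrative, not a product API.
ALLOWED_METHODS = {"GET", "POST", "HEAD"}
ALLOWED_INTERVALS_MIN = {1, 5, 10, 15, 30}
ALLOWED_TIMEOUTS_SEC = {10, 15, 30}


@dataclass
class MonitorSettings:
    name: str
    url: str
    method: str = "GET"
    interval_minutes: int = 5
    timeout_seconds: int = 10
    expected_status: int = 200
    regions: List[str] = field(default_factory=lambda: ["US East"])

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the settings look valid."""
        problems = []
        if not self.url.startswith(("http://", "https://")):
            problems.append("url must start with http:// or https://")
        if self.method not in ALLOWED_METHODS:
            problems.append(f"unsupported method: {self.method}")
        if self.interval_minutes not in ALLOWED_INTERVALS_MIN:
            problems.append(f"unsupported interval: {self.interval_minutes}")
        if self.timeout_seconds not in ALLOWED_TIMEOUTS_SEC:
            problems.append(f"unsupported timeout: {self.timeout_seconds}")
        if not self.regions:
            problems.append("at least one region is required")
        return problems


monitor = MonitorSettings(name="Production API", url="https://api.example.com/health")
print(monitor.validate())  # prints []
```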

Request Configuration

Optionally configure request headers and body:

Headers:
  Authorization: Bearer your-token
  Content-Type: application/json

Body (POST only):
  {"check": "health"}
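
For local testing, the same request can be reproduced with Python's standard library. This is a sketch only: `build_request` is a hypothetical helper, and the token and URL are the placeholder values used above.

```python
import json
import urllib.request


def build_request(url, method="GET", headers=None, body=None):
    """Build (but do not send) an HTTP request equivalent to a monitor check."""
    data = None
    if body is not None:
        # Mirrors the "Body (POST only)" rule above.
        if method != "POST":
            raise ValueError("a body is only supported for POST checks")
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers or {}, method=method)


req = build_request(
    "https://api.example.com/health",
    method="POST",
    headers={
        "Authorization": "Bearer your-token",
        "Content-Type": "application/json",
    },
    body={"check": "health"},
)
# Sending it would be: urllib.request.urlopen(req, timeout=10)
```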

Response Validation

Beyond the status code, a monitor can validate the response itself:

Contains text -- the response body must contain a specific string
JSON path -- the field at a given JSON path must match an expected value
Response time -- the endpoint must respond within N milliseconds
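
The three rules above can be sketched in Python as follows. The function names and the dotted path syntax (`database.status`) are assumptions made for illustration; the path syntax the product actually accepts may differ.

```python
import json


def json_path_get(document, path):
    """Resolve a dotted path such as 'database.status' against parsed JSON."""
    current = document
    for key in path.split("."):
        if isinstance(current, list):
            current = current[int(key)]  # numeric segments index into arrays
        else:
            current = current[key]
    return current


def validate_response(body, elapsed_ms, contains=None, json_path=None,
                      json_value=None, max_ms=None):
    """Return a list of failed rules; an empty list means the check passed."""
    failures = []
    if contains is not None and contains not in body:
        failures.append(f"body does not contain {contains!r}")
    if json_path is not None:
        try:
            actual = json_path_get(json.loads(body), json_path)
        except (ValueError, KeyError, IndexError, TypeError):
            failures.append(f"JSON path {json_path!r} could not be resolved")
        else:
            if actual != json_value:
                failures.append(f"{json_path} is {actual!r}, expected {json_value!r}")
    if max_ms is not None and elapsed_ms > max_ms:
        failures.append(f"response took {elapsed_ms} ms, limit is {max_ms} ms")
    return failures
```

Returning every failed rule, rather than stopping at the first, keeps the error details useful when several rules break at once.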

Monitor Dashboard

Each monitor shows:

Current status -- Up or Down, with the latest response time
Uptime percentage -- over 24h, 7d, 30d, 90d
Response time chart -- latency over time
Incident history -- list of all downtime events
Average response time -- across all regions
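
As a rough illustration of how these figures relate to individual checks, the sketch below computes an uptime percentage and a cross-region average. It assumes uptime is the share of successful checks in the window and that only successful checks count toward the average; the dashboard may compute either figure differently.

```python
from statistics import mean


def uptime_percentage(checks):
    """checks: list of (is_up, response_ms) tuples that fall inside the window."""
    if not checks:
        return None  # no data for this window
    up = sum(1 for is_up, _ in checks if is_up)
    return 100.0 * up / len(checks)


def average_response_ms(checks_by_region):
    """Average response time over successful checks from every region."""
    samples = [
        ms
        for checks in checks_by_region.values()
        for is_up, ms in checks
        if is_up
    ]
    return mean(samples) if samples else None


window = [(True, 100), (True, 120), (False, 0), (True, 140)]
print(uptime_percentage(window))  # prints 75.0
```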

Incidents

When a monitor fails, an incident is created and moves through these stages:

Detection -- a check fails; the failure is confirmed after 2 consecutive failed checks to avoid flapping
Alert -- notifications are sent to the configured channels
Duration -- the incident remains open until the endpoint recovers
Recovery -- a recovery notification is sent when the endpoint is back up
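
The lifecycle above can be sketched as a small state machine. `IncidentTracker` and `Incident` are hypothetical names, the `notifications` list stands in for real alert channels, and dating the incident from the first failed check (rather than the confirming one) is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

FAILURES_TO_CONFIRM = 2  # from the text: confirmed after 2 consecutive failures


@dataclass
class Incident:
    started_at: datetime
    error: str
    ended_at: Optional[datetime] = None

    @property
    def downtime(self) -> Optional[timedelta]:
        """Total downtime, or None while the incident is still open."""
        if self.ended_at is None:
            return None
        return self.ended_at - self.started_at


class IncidentTracker:
    def __init__(self):
        self.consecutive_failures = 0
        self.first_failure_at = None
        self.open_incident = None
        self.notifications = []  # stand-in for real alert channels

    def record(self, ok, at, error=""):
        """Feed one check result; return the incident if this check closed it."""
        if ok:
            self.consecutive_failures = 0
            self.first_failure_at = None
            if self.open_incident is not None:
                # Recovery: close the incident and notify.
                self.open_incident.ended_at = at
                self.notifications.append(("recovery", at))
                closed, self.open_incident = self.open_incident, None
                return closed
            return None
        self.consecutive_failures += 1
        if self.consecutive_failures == 1:
            self.first_failure_at = at
        if self.open_incident is None and self.consecutive_failures >= FAILURES_TO_CONFIRM:
            # Detection confirmed: open the incident and alert once.
            self.open_incident = Incident(started_at=self.first_failure_at, error=error)
            self.notifications.append(("down", at))
        return None
```

A single failed check followed by a success produces no incident and no notification, which is the flapping protection described above.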

Incident Detail

Each incident records:

Start and end time
Total downtime duration
Error details (timeout, DNS failure, HTTP error, etc.)
Response body (if any)
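
To show how the error categories above might be derived, here is a sketch that maps exceptions raised by Python's `urllib` to a short label. The `classify_error` helper and its label strings are illustrative assumptions, not the product's actual wording.

```python
import io
import socket
import urllib.error


def classify_error(exc):
    """Map an exception from urllib.request.urlopen to a short error label."""
    # HTTPError subclasses URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return f"HTTP error {exc.code}"
    if isinstance(exc, urllib.error.URLError):
        reason = exc.reason
        if isinstance(reason, socket.gaierror):
            return "DNS failure"
        if isinstance(reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return f"connection error: {reason}"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    return f"unexpected error: {exc}"


# Usage with a hand-built 503 response.
sample = urllib.error.HTTPError(
    "https://api.example.com/health", 503, "Service Unavailable", {}, io.BytesIO()
)
print(classify_error(sample))  # prints "HTTP error 503"
```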

Alerts

Uptime monitors automatically raise alerts for the following events:

Endpoint down -- after 2 consecutive failures
Slow response -- response time exceeds the configured threshold
SSL certificate expiring -- within 14 days of expiry
Recovery -- when the endpoint comes back up
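
The certificate rule can be checked independently with Python's `ssl` module. This sketch assumes the certificate's `notAfter` string has already been read, for example from `SSLSocket.getpeercert()`; the `certificate_expiring` helper is a hypothetical name.

```python
import ssl
from datetime import datetime, timedelta, timezone

EXPIRY_WARNING = timedelta(days=14)  # threshold from the alert list above


def certificate_expiring(not_after, now):
    """Return True if the certificate expires within the warning window.

    not_after: the certificate's notAfter string, e.g. 'Jan  5 09:34:43 2030 GMT'.
    now: a timezone-aware datetime.
    """
    expires_at = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return expires_at - now <= EXPIRY_WARNING


now = datetime(2030, 1, 1, tzinfo=timezone.utc)
print(certificate_expiring("Jan 10 00:00:00 2030 GMT", now))  # prints True
```

An already expired certificate also returns True, since the remaining time is negative.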

Status Page

Share uptime status with your users by enabling the public status page. The page shows:

Current status of all monitors
Uptime percentages
Recent incident history
Scheduled maintenance windows

Best Practices

Monitor critical paths -- check your most important endpoints, not just the root path (/)
Use health check endpoints -- create dedicated /health endpoints that verify dependencies such as database connectivity
Check from multiple regions -- detect regional outages
Set appropriate intervals -- critical services at 1 minute, others at 5-15 minutes
Configure meaningful timeouts -- too short causes false alarms, too long delays detection
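
A dedicated health endpoint that verifies database connectivity could look like the sketch below. It is framework-agnostic: `health_check` is a hypothetical handler that returns a status code and JSON body, and it assumes a `sqlite3` connection, so other databases would need their own probe query.

```python
import json
import sqlite3


def health_check(connect):
    """Return (status_code, json_body) for a /health endpoint.

    connect: zero-argument callable returning a sqlite3 connection.
    """
    try:
        conn = connect()
        try:
            # A trivial query proves the database is reachable and responsive.
            conn.execute("SELECT 1").fetchone()
        finally:
            conn.close()
    except Exception as exc:
        # 503 tells the monitor the service is up but a dependency is not.
        return 503, json.dumps({"status": "down", "database": str(exc)})
    return 200, json.dumps({"status": "ok", "database": "up"})


status, body = health_check(lambda: sqlite3.connect(":memory:"))
print(status, body)
```

Returning a non-200 status when the database is unreachable lets the Expected Status setting catch the failure without any response body validation.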