Health Checks
Monitor your endpoints and get alerted when they go down.
Overview
Uptime monitoring continuously checks your endpoints from multiple regions and alerts you when they become unreachable or return unexpected responses.
Creating a Monitor
Navigate to Monitoring > Uptime and click Add Monitor.
Configuration
| Setting | Description | Example or options |
|---------|-------------|--------------------|
| Name | Descriptive name for the monitor | Production API |
| URL | Endpoint to check | https://api.example.com/health |
| Method | HTTP method used for the check | GET, POST, HEAD |
| Interval | How often the check runs | 1, 5, 10, 15, or 30 minutes |
| Timeout | Maximum time to wait for a response | 10, 15, or 30 seconds |
| Expected Status | HTTP status code that counts as success | 200 |
| Regions | Regions the check runs from | US East, EU West, Asia Pacific |
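If you keep monitor definitions in version control before entering them in the UI, a small local check can catch values outside the options listed in the table. This is a minimal sketch under that assumption: the `MonitorConfig` class, its field names, and the `validate` method are hypothetical and are not part of the product's API. The allowed intervals, timeouts, and methods are copied from the table; the service itself may accept other values.

```python
from dataclasses import dataclass, field

# Options copied from the configuration table; the real service may accept others.
ALLOWED_METHODS = {"GET", "POST", "HEAD"}
ALLOWED_INTERVALS_MIN = {1, 5, 10, 15, 30}
ALLOWED_TIMEOUTS_S = {10, 15, 30}


@dataclass
class MonitorConfig:
    """Hypothetical local representation of one uptime monitor's settings."""
    name: str
    url: str
    method: str = "GET"
    interval_min: int = 5
    timeout_s: int = 10
    expected_status: int = 200
    regions: list[str] = field(default_factory=lambda: ["US East"])

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the config looks valid."""
        problems = []
        if not self.url.startswith(("http://", "https://")):
            problems.append("url must start with http:// or https://")
        if self.method not in ALLOWED_METHODS:
            problems.append(f"unsupported method {self.method!r}")
        if self.interval_min not in ALLOWED_INTERVALS_MIN:
            problems.append(f"unsupported interval {self.interval_min} min")
        if self.timeout_s not in ALLOWED_TIMEOUTS_S:
            problems.append(f"unsupported timeout {self.timeout_s} s")
        if not 100 <= self.expected_status <= 599:
            problems.append("expected_status must be an HTTP status code (100-599)")
        if not self.regions:
            problems.append("at least one region is required")
        return problems
```

For example, `MonitorConfig(name="Production API", url="https://api.example.com/health").validate()` returns an empty list, while a config using `method="PUT"` reports one problem.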
Request Configuration
Optionally configure request headers and body:
Headers:
Authorization: Bearer your-token
Content-Type: application/json
Body (POST only):
{"check": "health"}
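To see roughly what a single check sends with the headers and body above, here is a sketch built on Python's standard library. It only illustrates the shape of the request; it is not the service's own probe, and `build_check_request` and `send_check` are hypothetical names. Following the "POST only" note above, the sketch attaches the body only for POST.

```python
import json
import urllib.error
import urllib.request


def build_check_request(url, method="GET", headers=None, body=None):
    """Build (but do not send) the HTTP request one check would make.

    `body` is a dict serialized as JSON and is attached only for POST.
    """
    data = None
    if body is not None and method == "POST":
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers or {}, method=method)


def send_check(request, timeout_s):
    """Send the request and return the HTTP status code.

    urlopen raises HTTPError for 4xx/5xx responses; the status is taken from
    `err.code` so those count as results, not exceptions. Timeouts and DNS
    failures still raise (URLError or TimeoutError).
    """
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


# Build the request described above (nothing is sent here).
req = build_check_request(
    "https://api.example.com/health",
    method="POST",
    headers={"Authorization": "Bearer your-token", "Content-Type": "application/json"},
    body={"check": "health"},
)
```

Calling `send_check(req, timeout_s=10)` would perform the request; comparing its return value with the Expected Status setting mirrors the basic up/down decision.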
Response Validation
Beyond the status code, you can add validation rules:
- Contains text -- the response body must contain a specific string
- JSON path -- the value at a JSON path in the response body must match an expected value
- Response time -- the endpoint must respond within N milliseconds
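The three rules can be reproduced locally, for example to test a health endpoint before creating the monitor. This is a minimal sketch: the function names are hypothetical, and the exact JSON path syntax the product accepts is not specified here, so the sketch assumes a simple dotted path such as `checks.database` (with integer segments for list indexes).

```python
import json


def get_json_path(document, path):
    """Follow a dotted path such as "checks.database" through nested dicts/lists.

    Integer segments index into lists, e.g. "items.0.id". Raises KeyError,
    IndexError, ValueError, or TypeError if the path does not exist.
    """
    current = document
    for segment in path.split("."):
        if isinstance(current, list):
            current = current[int(segment)]
        else:
            current = current[segment]
    return current


def validate_response(body, elapsed_ms, contains=None, json_path=None,
                      expected_value=None, max_response_ms=None):
    """Return a list of failed rule descriptions; empty means the check passed."""
    failures = []
    # Contains text: plain substring match on the raw body.
    if contains is not None and contains not in body:
        failures.append(f"body does not contain {contains!r}")
    # JSON path: parse the body, walk the path, compare with the expected value.
    if json_path is not None:
        try:
            actual = get_json_path(json.loads(body), json_path)
        except (ValueError, KeyError, IndexError, TypeError):
            failures.append(f"JSON path {json_path!r} not found")
        else:
            if actual != expected_value:
                failures.append(f"{json_path} is {actual!r}, expected {expected_value!r}")
    # Response time: compare the measured latency with the limit.
    if max_response_ms is not None and elapsed_ms > max_response_ms:
        failures.append(f"response took {elapsed_ms} ms (limit {max_response_ms} ms)")
    return failures
```

With a body of `{"status": "ok", "checks": {"database": "up"}}`, the rules `contains='"ok"'`, `json_path="checks.database"`, `expected_value="up"`, and `max_response_ms=500` all pass for a 120 ms response.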
Monitor Dashboard
Each monitor shows:
- Current status -- Up or Down with response time
- Uptime percentage -- over 24h, 7d, 30d, 90d
- Response time chart -- latency over time
- Incident history -- list of all downtime events
- Average response time -- across all regions
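As an illustration of how figures like these can be derived from raw check results, the sketch below counts successful checks per time window and averages response times across regions. It is an assumption-laden sketch, not the dashboard's actual formula: the service might instead compute uptime from incident downtime, and whether failed checks count toward the average is also assumed here (they are excluded). `CheckResult` and the helper names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Dashboard windows from the list above.
WINDOWS = {
    "24h": timedelta(hours=24),
    "7d": timedelta(days=7),
    "30d": timedelta(days=30),
    "90d": timedelta(days=90),
}


@dataclass
class CheckResult:
    timestamp: datetime
    region: str
    up: bool
    response_ms: float


def uptime_percentage(results, now, window):
    """Percentage of checks in [now - window, now] that succeeded, or None if no data."""
    in_window = [r for r in results if now - window <= r.timestamp <= now]
    if not in_window:
        return None
    up = sum(1 for r in in_window if r.up)
    return 100.0 * up / len(in_window)


def average_response_ms(results):
    """Mean response time across all regions, ignoring failed checks."""
    times = [r.response_ms for r in results if r.up]
    return sum(times) / len(times) if times else None


now = datetime(2024, 1, 1, 12, tzinfo=timezone.utc)
results = [
    CheckResult(now - timedelta(minutes=1), "US East", True, 100.0),
    CheckResult(now - timedelta(minutes=2), "EU West", True, 200.0),
    CheckResult(now - timedelta(minutes=3), "Asia Pacific", False, 0.0),
    CheckResult(now - timedelta(minutes=4), "US East", True, 300.0),
]
```

Here three of four checks succeeded, so `uptime_percentage(results, now, WINDOWS["24h"])` gives 75.0, and the average response time over the successful checks is 200.0 ms.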
Incidents
When a monitor fails, an incident is created and goes through these stages:
- Detection -- a check fails; the failure is confirmed after 2 consecutive failed checks to avoid alerting on flapping
- Alert -- notifications are sent to the configured channels
- Duration -- the incident remains open until the endpoint recovers
- Recovery -- a recovery notification is sent when the endpoint is back up
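The lifecycle above behaves like a small state machine: count consecutive failures, open an incident at the confirmation threshold, and close it on the next success. The sketch below models that under two assumptions of its own: the incident start time is the first failed check (not the confirming one), and any successful check counts as recovery. `IncidentTracker` and `Incident` are hypothetical names; only the 2-failure threshold comes from this page.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

CONFIRMATION_FAILURES = 2  # from this page: 2 consecutive failures open an incident


@dataclass
class Incident:
    start: datetime
    end: Optional[datetime] = None
    error: str = ""

    @property
    def duration(self):
        """Total downtime, or None while the incident is still open."""
        return None if self.end is None else self.end - self.start


class IncidentTracker:
    """Opens an incident after N consecutive failures; closes it on the next success."""

    def __init__(self, confirmation_failures=CONFIRMATION_FAILURES):
        self.confirmation_failures = confirmation_failures
        self.consecutive_failures = 0
        self.first_failure_at = None
        self.open_incident = None
        self.history = []

    def record(self, timestamp, ok, error=""):
        """Feed one check result; returns "alert", "recovery", or None."""
        if ok:
            self.consecutive_failures = 0
            self.first_failure_at = None
            if self.open_incident is not None:
                self.open_incident.end = timestamp
                self.open_incident = None
                return "recovery"
            return None
        self.consecutive_failures += 1
        if self.consecutive_failures == 1:
            self.first_failure_at = timestamp  # assumed incident start time
        if self.open_incident is None and self.consecutive_failures >= self.confirmation_failures:
            self.open_incident = Incident(start=self.first_failure_at, error=error)
            self.history.append(self.open_incident)
            return "alert"
        return None
```

A single failed check followed by a success produces no incident, which is the flapping protection described above. Two failures at minutes 0 and 1 followed by a success at minute 3 produce one alert, one recovery, and a recorded downtime of 3 minutes.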
Incident Detail
Each incident records:
- Start and end time
- Total downtime duration
- Error details (timeout, DNS failure, HTTP error, etc.)
- Response body (if any)
Alerts
Uptime monitors automatically create alerts for these conditions:
- Endpoint down -- after 2 consecutive failed checks
- Slow response -- response time exceeds the configured threshold
- SSL certificate expiring -- the certificate expires within 14 days
- Recovery -- the endpoint comes back up
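The slow-response and SSL-expiry conditions can be approximated with Python's `ssl` module, which is handy for checking your own certificates ahead of time. This is a sketch, not the service's implementation: `fetch_not_after`, `days_until_expiry`, and `evaluate_alerts` are hypothetical helpers, the slow-response threshold is whatever you configure, and only the 14-day window comes from this page. `ssl.cert_time_to_seconds` parses the `notAfter` string that `getpeercert()` returns.

```python
import socket
import ssl
import time

SSL_EXPIRY_WARNING_DAYS = 14  # window stated in the alert list above


def fetch_not_after(host, port=443, timeout=10):
    """Connect over TLS and return the server certificate's notAfter string."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Days until a notAfter time such as "Jan 15 00:00:00 2030 GMT"."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def evaluate_alerts(response_ms, slow_threshold_ms, cert_not_after, now=None):
    """Return the alert types a single check result would raise."""
    alerts = []
    if response_ms > slow_threshold_ms:
        alerts.append("slow_response")
    if days_until_expiry(cert_not_after, now) <= SSL_EXPIRY_WARNING_DAYS:
        alerts.append("ssl_expiring")
    return alerts
```

For example, `evaluate_alerts(120, 500, fetch_not_after("api.example.com"))` makes a live TLS connection and returns `["ssl_expiring"]` only if that certificate expires within 14 days.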
Status Page
Share uptime status with your users by enabling the public status page. This shows:
- Current status of all monitors
- Uptime percentages
- Recent incident history
- Scheduled maintenance windows
Best Practices
- Monitor critical paths -- check your most important endpoints, not just the root path /
- Use health check endpoints -- create dedicated /health endpoints that verify database connectivity
- Check from multiple regions -- detect regional outages
- Set appropriate intervals -- critical services at 1 minute, others at 5-15 minutes
- Configure meaningful timeouts -- a timeout that is too short causes false alarms; one that is too long delays detection
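A dedicated /health endpoint can be small. The sketch below uses Python's standard library, with `sqlite3` standing in for your real database driver, and returns 200 when a trivial query succeeds and 503 when it does not. `DB_PATH`, `check_database`, `health_response`, and `HealthHandler` are hypothetical names. The `mode=rw` URI makes SQLite fail on a missing file instead of silently creating an empty one.

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler

DB_PATH = "app.db"  # hypothetical; point this at your real database


def check_database(db_path=DB_PATH):
    """Return True if a trivial query succeeds against an existing database file."""
    try:
        conn = sqlite3.connect(f"file:{db_path}?mode=rw", uri=True, timeout=2)
        try:
            conn.execute("SELECT 1").fetchone()
        finally:
            conn.close()
        return True
    except sqlite3.Error:
        return False


def health_response(db_ok):
    """Map dependency status to an HTTP status code and a JSON body."""
    code = 200 if db_ok else 503
    body = {"status": "ok" if db_ok else "degraded",
            "checks": {"database": "up" if db_ok else "down"}}
    return code, json.dumps(body)


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        code, body = health_response(check_database())
        payload = body.encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

To serve it locally, pass the handler to `http.server.HTTPServer(("", 8080), HealthHandler)` and call `serve_forever()`. Pointing a monitor at this endpoint with Expected Status 200 and a JSON path rule on `status` equal to `ok` then covers both the HTTP check and the database check.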