Workflow Automation

Automate responses to alerts, incidents, and errors with configurable triggers, conditions, and actions.

What Is Workflow Automation?#

Workflow automation lets you define rules that automatically take actions when specific events occur in your system. Instead of manually creating incidents, assigning issues, or sending notifications, workflows handle repetitive operational tasks for you.

A workflow has three parts:

Trigger -- the event that starts the workflow (e.g., an alert fires)
Conditions -- optional filters that narrow when the workflow runs (e.g., only for production, only for SEV1)
Actions -- what the workflow does (e.g., create an incident, send a webhook)
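
The three parts can be pictured as a small data structure plus an evaluation step. The sketch below is illustrative only: the `workflow` object shape and the `evaluateWorkflow` function are hypothetical and not part of the product's API, and conditions are modeled as plain predicate functions for brevity.

```javascript
// Hypothetical in-memory model of a workflow: one trigger, optional
// conditions, and one or more actions. Not the product's real schema.
const workflow = {
  trigger: 'alert_fired',
  conditions: [
    (ctx) => ctx.alert.environment === 'production',
    (ctx) => ctx.alert.severity === 'critical',
  ],
  actions: ['create_incident', 'webhook'],
};

// Returns the actions that would run for an event, or an empty list if
// the trigger does not match or any condition fails.
function evaluateWorkflow(wf, event) {
  if (event.type !== wf.trigger) return [];
  const allMatch = wf.conditions.every((cond) => cond(event.context));
  return allMatch ? wf.actions : [];
}

const sampleEvent = {
  type: 'alert_fired',
  context: { alert: { environment: 'production', severity: 'critical' } },
};

// Trigger and both conditions match, so both actions are returned.
console.log(evaluateWorkflow(workflow, sampleEvent));
```

An event with a different type, or one that fails any single condition, yields no actions, which mirrors the "skip" behavior described for conditions.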

Triggers#

Triggers define which events start a workflow. Each workflow has exactly one trigger.

alert_fired#

Runs when any alert rule transitions to the firing state.

Available context:

alert.name -- name of the alert rule
alert.severity -- critical, warning, info
alert.service -- affected service name
alert.environment -- production, staging, etc.
alert.metric -- the metric that breached the threshold
alert.value -- the current metric value
alert.threshold -- the configured threshold

incident_declared#

Runs when an incident is created (manually or automatically).

Available context:

incident.title -- incident title
incident.severity -- SEV1, SEV2, SEV3, SEV4
incident.services -- list of affected services
incident.environment -- environment
incident.commander -- assigned commander (if any)

error_new#

Runs when a new error group is created (first occurrence of a unique error).

Available context:

error.message -- error message
error.type -- error type (TypeError, RangeError, etc.)
error.service -- service where the error originated
error.environment -- environment
error.file -- source file path
error.count -- occurrence count (always 1 for new errors)

error_regression#

Runs when a previously resolved error group reappears in a new release.

Available context:

error.message -- error message
error.service -- service name
error.resolvedInRelease -- the release where it was resolved
error.regressedInRelease -- the release where it reappeared
error.regressionCount -- how many times this error has regressed

deploy_completed#

Runs when a new release is tracked.

Available context:

deploy.version -- release version
deploy.service -- service name
deploy.environment -- environment
deploy.commitCount -- number of commits in the release

slo_budget_warning#

Runs when an SLO error budget drops below a threshold.

Available context:

slo.name -- SLO name
slo.service -- service name
slo.budgetRemaining -- percentage remaining
slo.burnRate -- current burn rate
slo.target -- SLO target percentage
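
For orientation, the sketch below shows how a remaining-budget percentage is conventionally derived from an SLO target. It follows the common definition of an error budget (the allowed failure fraction is 100% minus the target); whether `slo.budgetRemaining` is computed exactly this way is an assumption, and `budgetRemainingPercent` is a hypothetical helper.

```javascript
// Assumption: budget is measured in failed requests over a window.
function budgetRemainingPercent(targetPercent, totalRequests, failedRequests) {
  // Number of failures the SLO tolerates in this window.
  const allowedFailures = totalRequests * (1 - targetPercent / 100);
  if (allowedFailures === 0) return failedRequests === 0 ? 100 : 0;
  const remaining = 1 - failedRequests / allowedFailures;
  return Math.max(0, remaining * 100); // never report a negative budget
}

// A 99.9% target over 1,000,000 requests allows about 1,000 failures.
// With 800 failures, about 20% of the budget is left.
console.log(budgetRemainingPercent(99.9, 1000000, 800));
```

Under this reading, a value of about 20 would satisfy a condition such as `slo.budgetRemaining < 25`.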

custom#

Runs when your code sends a custom event via the API. Use this to trigger workflows from your own tooling.

// Fire a custom workflow trigger
const response = await fetch('/api/ingest/workflow-event', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': 'YOUR_API_KEY',
  },
  body: JSON.stringify({
    event: 'custom',
    name: 'canary_deploy_failed',
    context: {
      service: 'payment-service',
      version: 'v2.3.1',
      failureReason: 'Error rate exceeded 5% threshold',
    },
  }),
});

// fetch does not reject on HTTP error statuses, so check explicitly
if (!response.ok) {
  throw new Error(`Workflow event rejected: HTTP ${response.status}`);
}

Conditions#

Conditions filter when a workflow should run after it's triggered. If all conditions match, the actions execute. If any condition fails, the workflow skips silently.

Severity Filter#

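Match on severity level. Use the values that belong to the trigger: critical, warning, or info for alert_fired; SEV1 through SEV4 for incident_declared: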

condition: severity IN [SEV1, SEV2]

Works with: alert_fired, incident_declared

Service Filter#

Match on one or more service names:

condition: service IN [payment-service, checkout-api]

Works with: all triggers

Environment Filter#

Match on environment:

condition: environment = production

Works with: all triggers

Tag Filter#

Match on custom tags:

condition: tags.team = platform
condition: tags.region = us-east-1

Works with: alert_fired, error_new, error_regression

Metric Value Filter#

Match on the metric value that triggered the alert:

condition: alert.value > 100
condition: alert.value BETWEEN 50 AND 200

Works with: alert_fired

Budget Threshold#

Match on remaining error budget:

condition: slo.budgetRemaining < 25

Works with: slo_budget_warning

Combining Conditions#

Multiple conditions are combined with AND logic. All conditions must match for the workflow to execute.

Trigger:    alert_fired
Conditions:
      severity IN [critical]
  AND service IN [payment-service, checkout-api]
  AND environment = production
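
A minimal sketch of AND evaluation, assuming conditions have already been parsed into `{ field, op, value }` objects. That parsed shape, the `OPS` table, and `matchesAll` are hypothetical; the product's actual parser and operator semantics (for example, whether BETWEEN is inclusive) may differ.

```javascript
// Operators that appear in the condition examples: IN, =, >, <, BETWEEN.
// BETWEEN is treated as inclusive here, which is an assumption.
const OPS = {
  'IN': (actual, list) => list.includes(actual),
  '=': (actual, expected) => actual === expected,
  '>': (actual, limit) => actual > limit,
  '<': (actual, limit) => actual < limit,
  'BETWEEN': (actual, [low, high]) => actual >= low && actual <= high,
};

// Every condition must pass (AND logic); an empty list always matches.
function matchesAll(conditions, context) {
  return conditions.every(({ field, op, value }) => {
    const test = OPS[op];
    if (!test) throw new Error(`Unsupported operator: ${op}`);
    return test(context[field], value);
  });
}

const parsedConditions = [
  { field: 'severity', op: 'IN', value: ['critical'] },
  { field: 'service', op: 'IN', value: ['payment-service', 'checkout-api'] },
  { field: 'environment', op: '=', value: 'production' },
];

const alertContext = {
  severity: 'critical',
  service: 'checkout-api',
  environment: 'production',
};

console.log(matchesAll(parsedConditions, alertContext)); // true
console.log(matchesAll(parsedConditions, { ...alertContext, environment: 'staging' })); // false
```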

Actions#

Actions define what happens when a workflow's trigger fires and all conditions are met. A workflow can have multiple actions that execute in parallel.

webhook#

Send an HTTP POST request to an external URL. Use this to integrate with Slack, PagerDuty, custom tooling, or any system that accepts webhooks.

{
  "action": "webhook",
  "config": {
    "url": "https://hooks.slack.com/services/T00/B00/xxxx",
    "method": "POST",
    "headers": {
      "Content-Type": "application/json"
    },
    "body": {
      "text": "Alert fired: {{alert.name}} on {{alert.service}} ({{alert.severity}})",
      "channel": "#incidents"
    },
    "timeoutMs": 10000,
    "retries": 2
  }
}

Template variables (wrapped in {{ }}) are replaced with values from the trigger context.
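
A minimal sketch of this substitution, assuming dotted paths are looked up in a nested context object. `renderTemplate` is a hypothetical helper, not the product's renderer; how unknown variables or list values are actually rendered is not specified here, so this sketch leaves unknown placeholders untouched.

```javascript
// Replace each {{path.to.value}} with the matching value from the context.
function renderTemplate(template, context) {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (placeholder, path) => {
    // Walk the dotted path one key at a time, stopping at missing values.
    const value = path.split('.').reduce(
      (obj, key) => (obj == null ? undefined : obj[key]),
      context
    );
    return value === undefined ? placeholder : String(value);
  });
}

const triggerContext = {
  alert: { name: 'High error rate', service: 'checkout-api', severity: 'critical' },
};

console.log(renderTemplate(
  'Alert fired: {{alert.name}} on {{alert.service}} ({{alert.severity}})',
  triggerContext
));
// Alert fired: High error rate on checkout-api (critical)
```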

email#

Send an email notification:

{
  "action": "email",
  "config": {
    "to": ["oncall@company.com", "platform-team@company.com"],
    "subject": "SEV1 Incident: {{incident.title}}",
    "body": "A SEV1 incident has been declared.\n\nServices: {{incident.services}}\nCommander: {{incident.commander}}\n\nView: {{incident.url}}"
  }
}

create_incident#

Automatically create an incident:

{
  "action": "create_incident",
  "config": {
    "title": "Auto: {{alert.name}} on {{alert.service}}",
    "severity": "SEV1",
    "services": ["{{alert.service}}"],
    "description": "Automatically created from alert {{alert.name}}.\nMetric: {{alert.metric}} = {{alert.value}} (threshold: {{alert.threshold}})"
  }
}

assign_issue#

Assign an error group or incident to a team member based on ownership rules:

{
  "action": "assign_issue",
  "config": {
    "strategy": "ownership_rules",
    "fallback": "user_abc123"
  }
}

Strategy options:

ownership_rules -- use CODEOWNERS / ownership rules to determine assignee
round_robin -- rotate assignment among team members
specific_user -- always assign to a specific user
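
The three strategies can be sketched as a single selection function. Everything here is illustrative: `pickAssignee`, the `owners` prefix map, the `members` rotation list, the `state` cursor, and the `config.user` field are hypothetical names, and real ownership rules are likely richer than a path-prefix match.

```javascript
// Hypothetical inputs: `owners` maps a path prefix to a user ID,
// `members` is the rotation list, `state.next` is the rotation cursor.
function pickAssignee(config, issue, { owners = {}, members = [], state = { next: 0 } } = {}) {
  switch (config.strategy) {
    case 'ownership_rules': {
      // First prefix that matches the error's source file wins.
      const prefix = Object.keys(owners).find((p) => (issue.file || '').startsWith(p));
      return prefix ? owners[prefix] : config.fallback;
    }
    case 'round_robin': {
      if (members.length === 0) return config.fallback;
      const assignee = members[state.next % members.length];
      state.next += 1; // advance the cursor for the next assignment
      return assignee;
    }
    case 'specific_user':
      return config.user; // assumed field name for the fixed assignee
    default:
      throw new Error(`Unknown strategy: ${config.strategy}`);
  }
}

const ownerMap = { 'src/payments/': 'user_pay01' };
const ownershipConfig = { strategy: 'ownership_rules', fallback: 'user_abc123' };

console.log(pickAssignee(ownershipConfig, { file: 'src/payments/charge.js' }, { owners: ownerMap })); // user_pay01
console.log(pickAssignee(ownershipConfig, { file: 'src/search/index.js' }, { owners: ownerMap })); // user_abc123
```

The second call shows the role of `fallback`: when no ownership rule matches, the issue still gets an assignee.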

run_query#

Execute a saved query and include the results in a notification or incident:

{
  "action": "run_query",
  "config": {
    "queryId": "saved_query_xyz",
    "attachTo": "incident",
    "format": "table"
  }
}

update_status#

Update the status of a linked resource:

{
  "action": "update_status",
  "config": {
    "target": "error_group",
    "status": "acknowledged",
    "comment": "Auto-acknowledged by workflow"
  }
}

Visual Workflow Builder#

Create and edit workflows visually in Dashboard > Settings > Workflows.

Building a Workflow#

1. Click Create Workflow
2. Give it a name and optional description
3. Select a trigger from the dropdown
4. Add conditions by clicking Add Condition (each condition adds a filter row)
5. Add one or more actions by clicking Add Action
6. Toggle the workflow Enabled / Disabled
7. Click Save

Workflow Canvas#

The visual builder shows your workflow as a flowchart:

┌──────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Trigger    │────▶│   Conditions     │────▶│    Actions      │
│ alert_fired  │     │ severity=critical│     │ create_incident │
│              │     │ env=production   │     │ webhook (Slack) │
└──────────────┘     └──────────────────┘     │ email (oncall)  │
                                              └─────────────────┘

Each block is clickable and opens a configuration panel.

Testing Workflows#

Click Test on any workflow to simulate it with sample data. The test shows:

Whether conditions matched
What actions would execute
The rendered templates (with variables replaced)
Any errors in configuration

Testing does not execute real actions -- webhooks are not sent, incidents are not created.

Examples#

Auto-Create Incident on Critical Alert#

Name:        Auto-Incident on Critical Alert
Trigger:     alert_fired
Conditions:
  - severity = critical
  - environment = production
Actions:
  1. create_incident
     - title: "Auto: {{alert.name}} on {{alert.service}}"
     - severity: SEV1
  2. webhook (Slack)
     - channel: #incidents
     - message: "SEV1 incident created for {{alert.service}}"

Auto-Assign Errors by Service#

Name:        Assign Payment Errors
Trigger:     error_new
Conditions:
  - service IN [payment-service, billing-service]
  - environment = production
Actions:
  1. assign_issue
     - strategy: ownership_rules
     - fallback: @payments-team-lead
  2. webhook (Slack)
     - channel: #payments-errors
     - message: "New error in {{error.service}}: {{error.message}}"

Notify on SLO Budget Depletion#

Name:        SLO Budget Warning
Trigger:     slo_budget_warning
Conditions:
  - slo.budgetRemaining < 25
Actions:
  1. email
     - to: platform-team@company.com
     - subject: "SLO Budget Warning: {{slo.name}}"
  2. webhook (PagerDuty)
     - routing_key: "R00..."
     - severity: warning

Auto-Reopen on Regression#

Name:        Regression Alert
Trigger:     error_regression
Conditions:
  - environment = production
Actions:
  1. update_status
     - target: error_group
     - status: reopened
  2. assign_issue
     - strategy: ownership_rules
  3. webhook (Slack)
     - channel: #regressions
     - message: "Regression: {{error.message}} reappeared in {{error.regressedInRelease}}"

Custom: Canary Deploy Failure#

Name:        Canary Rollback Notification
Trigger:     custom (name = canary_deploy_failed)
Conditions:
  - context.service IN [payment-service, checkout-api]
Actions:
  1. create_incident
     - title: "Canary failed: {{context.service}} {{context.version}}"
     - severity: SEV2
  2. webhook (Slack)
     - message: "Canary deploy of {{context.service}} {{context.version}} failed: {{context.failureReason}}"

Limits and Quotas#

| Resource | Limit |
|----------|-------|
| Workflows per site | 20 |
| Conditions per workflow | 10 |
| Actions per workflow | 5 |
| Webhook timeout | 10 seconds |
| Webhook retries | 2 (with exponential backoff) |
| Email recipients per action | 10 |
| Custom event payload size | 10 KB |
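
If you send custom events, it can help to check the payload size on the client before posting. The sketch below assumes that 10 KB means 10,240 bytes and that the limit applies to the whole JSON request body; neither detail is confirmed here, and `checkCustomEventSize` is a hypothetical helper.

```javascript
const MAX_CUSTOM_EVENT_BYTES = 10 * 1024; // assumption: 10 KB = 10,240 bytes

// Measure the UTF-8 size of the serialized payload, not its character count.
function checkCustomEventSize(payload) {
  const bytes = new TextEncoder().encode(JSON.stringify(payload)).length;
  return { bytes, ok: bytes <= MAX_CUSTOM_EVENT_BYTES };
}

const customEvent = {
  event: 'custom',
  name: 'canary_deploy_failed',
  context: {
    service: 'payment-service',
    version: 'v2.3.1',
    failureReason: 'Error rate exceeded 5% threshold',
  },
};

const sizeCheck = checkCustomEventSize(customEvent);
console.log(sizeCheck.ok); // true: this payload is far below the limit
```

Measuring encoded bytes rather than string length matters when the context contains non-ASCII text, where one character can occupy several bytes.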

Rate Limiting#

Workflows are rate-limited to prevent runaway execution:

Each workflow can fire at most once per minute for the same trigger context
If an alert is flapping (firing and resolving rapidly), the workflow runs on the first firing and suppresses subsequent firings for 60 seconds
Custom event triggers are limited to 100 per hour per site
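
The once-per-minute rule can be sketched as a small suppression table keyed by workflow and trigger context. This is an illustration, not the product's implementation: `createRateLimiter`, the key format, and measuring the window from the last run that was allowed are all assumptions.

```javascript
const SUPPRESS_MS = 60 * 1000; // 60-second suppression window

// Hypothetical limiter: remembers the last allowed run per workflow + context.
// `now` is injectable so the behavior can be checked without waiting.
function createRateLimiter(now = () => Date.now()) {
  const lastRun = new Map();
  return function shouldRun(workflowId, contextKey) {
    const key = `${workflowId}:${contextKey}`;
    const previous = lastRun.get(key);
    const current = now();
    if (previous !== undefined && current - previous < SUPPRESS_MS) {
      return false; // same context fired within the window: suppress
    }
    lastRun.set(key, current);
    return true;
  };
}

let clock = 0;
const shouldRun = createRateLimiter(() => clock);

console.log(shouldRun('wf_1', 'cpu-high/checkout-api')); // true (first firing)
clock = 30000;
console.log(shouldRun('wf_1', 'cpu-high/checkout-api')); // false (flapping, suppressed)
clock = 61000;
console.log(shouldRun('wf_1', 'cpu-high/checkout-api')); // true (window elapsed)
```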

Workflow Execution History#

View the execution history for any workflow under Dashboard > Settings > Workflows > [Workflow] > History.

Each execution shows:

Timestamp -- when it ran
Trigger event -- the event that triggered it
Conditions result -- which conditions matched (or didn't)
Actions result -- success/failure for each action, with response details
Duration -- how long the workflow took to execute

Use the history to debug workflows that aren't firing as expected or to audit automated actions.

Disabling and Deleting#

Disable -- stop a workflow from executing without deleting it. This is useful during maintenance windows.
Delete -- permanently remove a workflow. Execution history is retained for 30 days after deletion.
Pause all -- a global kill switch under Dashboard > Settings > Workflows > Pause All Workflows that stops all workflow execution site-wide.