Workflow Automation

Automate responses to alerts, incidents, and errors with configurable triggers, conditions, and actions.

What Is Workflow Automation?#

Workflow automation lets you define rules that automatically take actions when specific events occur in your system. Instead of manually creating incidents, assigning issues, or sending notifications, workflows handle repetitive operational tasks for you.

A workflow has three parts:

  1. Trigger -- the event that starts the workflow (e.g., an alert fires)
  2. Conditions -- optional filters that narrow when the workflow runs (e.g., only for production, only for SEV1)
  3. Actions -- what the workflow does (e.g., create an incident, send a webhook)
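The three parts can be pictured as a plain object plus a small evaluation step. The sketch below is illustrative only: the field names (`trigger`, `conditions`, `actions`), the event shape, and the `planWorkflow` helper are assumptions made for this example, not the product's actual schema or API.

```javascript
// Hypothetical in-memory shape for a workflow; field names are assumptions.
const workflow = {
  name: 'Auto-Incident on Critical Alert',
  trigger: 'alert_fired',
  // Conditions are modelled here as predicates over the trigger context.
  conditions: [
    (ctx) => ctx.alert.severity === 'critical',
    (ctx) => ctx.alert.environment === 'production',
  ],
  actions: ['create_incident', 'webhook'],
};

// Returns the actions that would run for an event, or [] if the workflow is skipped.
function planWorkflow(wf, event) {
  // 1. Trigger: the event type must match the workflow's single trigger.
  if (event.type !== wf.trigger) return [];
  // 2. Conditions: every condition must match (an empty list always matches).
  const allMatch = wf.conditions.every((condition) => condition(event.context));
  // 3. Actions: only returned when the trigger and all conditions matched.
  return allMatch ? wf.actions : [];
}

const event = {
  type: 'alert_fired',
  context: { alert: { severity: 'critical', environment: 'production' } },
};
console.log(planWorkflow(workflow, event)); // → ['create_incident', 'webhook']
```

An event of a different type, or one whose context fails any condition, yields an empty list.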

Triggers#

Triggers define which events start a workflow. Each workflow has exactly one trigger.

alert_fired#

Runs when any alert rule transitions to the firing state.

Available context:

  • alert.name -- name of the alert rule
  • alert.severity -- critical, warning, info
  • alert.service -- affected service name
  • alert.environment -- production, staging, etc.
  • alert.metric -- the metric that breached the threshold
  • alert.value -- the current metric value
  • alert.threshold -- the configured threshold

incident_declared#

Runs when an incident is created (manually or automatically).

Available context:

  • incident.title -- incident title
  • incident.severity -- SEV1, SEV2, SEV3, SEV4
  • incident.services -- list of affected services
  • incident.environment -- environment
  • incident.commander -- assigned commander (if any)

error_new#

Runs when a new error group is created (first occurrence of a unique error).

Available context:

  • error.message -- error message
  • error.type -- error type (TypeError, RangeError, etc.)
  • error.service -- service where the error originated
  • error.environment -- environment
  • error.file -- source file path
  • error.count -- occurrence count (always 1 for new errors)

error_regression#

Runs when a previously resolved error group reappears in a new release.

Available context:

  • error.message -- error message
  • error.service -- service name
  • error.resolvedInRelease -- the release where it was resolved
  • error.regressedInRelease -- the release where it reappeared
  • error.regressionCount -- how many times this error has regressed

deploy_completed#

Runs when a new release is tracked.

Available context:

  • deploy.version -- release version
  • deploy.service -- service name
  • deploy.environment -- environment
  • deploy.commitCount -- number of commits in the release

slo_budget_warning#

Runs when an SLO error budget drops below a threshold.

Available context:

  • slo.name -- SLO name
  • slo.service -- service name
  • slo.budgetRemaining -- percentage remaining
  • slo.burnRate -- current burn rate
  • slo.target -- SLO target percentage

custom#

Runs when your code sends a custom event via the API. Use this to trigger workflows from your own tooling.

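// Fire a custom workflow trigger.
// Replace YOUR_API_KEY with a real key. The relative URL only resolves when
// the request is sent from a page on the same origin as the API.
const response = await fetch('/api/ingest/workflow-event', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': 'YOUR_API_KEY',
  },
  body: JSON.stringify({
    event: 'custom',
    name: 'canary_deploy_failed',
    context: {
      service: 'payment-service',
      version: 'v2.3.1',
      failureReason: 'Error rate exceeded 5% threshold',
    },
  }),
});

// fetch only rejects on network failure, so check the HTTP status explicitly.
if (!response.ok) {
  throw new Error(`Workflow event rejected with HTTP ${response.status}`);
}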
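Because custom event payloads are limited to 10 KB (see Limits and Quotas), it can help to build and check the request body before sending it. The helper below is a minimal sketch: `buildCustomEvent` is a hypothetical name, and it assumes the limit applies to the UTF-8 encoded JSON body with 1 KB = 1024 bytes, which this page does not specify.

```javascript
// Assumption: the 10 KB limit is measured on the UTF-8 encoded JSON body.
const MAX_PAYLOAD_BYTES = 10 * 1024;

// Builds the JSON body for a custom workflow event, or throws if it is invalid.
function buildCustomEvent(name, context) {
  if (typeof name !== 'string' || name.length === 0) {
    throw new TypeError('Custom event name must be a non-empty string');
  }
  const body = JSON.stringify({ event: 'custom', name, context });
  // Count bytes, not characters: non-ASCII text takes more than one byte each.
  const size = new TextEncoder().encode(body).length;
  if (size > MAX_PAYLOAD_BYTES) {
    throw new RangeError(`Payload is ${size} bytes; limit is ${MAX_PAYLOAD_BYTES}`);
  }
  return body;
}

const body = buildCustomEvent('canary_deploy_failed', {
  service: 'payment-service',
  version: 'v2.3.1',
});
```

The returned string can be passed as the `body` option of the fetch call shown above.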

Conditions#

Conditions filter when a workflow should run after it's triggered. If all conditions match, the actions execute. If any condition fails, the workflow skips silently.

Severity Filter#

Match on severity level. Alerts use critical, warning, or info; incidents use SEV1 through SEV4:

condition: severity IN [SEV1, SEV2]

Works with: alert_fired, incident_declared

Service Filter#

Match on one or more service names:

condition: service IN [payment-service, checkout-api]

Works with: all triggers

Environment Filter#

Match on environment:

condition: environment = production

Works with: all triggers

Tag Filter#

Match on custom tags:

condition: tags.team = platform
condition: tags.region = us-east-1

Works with: alert_fired, error_new, error_regression

Metric Value Filter#

Match on the metric value that triggered the alert:

condition: alert.value > 100
condition: alert.value BETWEEN 50 AND 200

Works with: alert_fired

Budget Threshold#

Match on remaining error budget:

condition: slo.budgetRemaining < 25

Works with: slo_budget_warning

Combining Conditions#

Multiple conditions are combined with AND logic. All conditions must match for the workflow to execute.

Trigger:    alert_fired
Conditions:
  severity IN [critical]
  AND service IN [payment-service, checkout-api]
  AND environment = production
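The AND semantics can be sketched as a small evaluator. This is not the product's implementation: the `{ field, op, value }` condition shape, the fully qualified field paths, and the `lookup` and `conditionsMatch` helpers are assumptions for illustration, and BETWEEN is assumed to be inclusive at both ends.

```javascript
// Resolves a dotted path such as "alert.value" against the trigger context.
function lookup(context, path) {
  return path
    .split('.')
    .reduce((obj, key) => (obj == null ? undefined : obj[key]), context);
}

// Hypothetical operator table covering the operators used on this page.
const operators = {
  '=': (actual, expected) => actual === expected,
  IN: (actual, list) => list.includes(actual),
  '>': (actual, limit) => actual > limit,
  '<': (actual, limit) => actual < limit,
  // Assumption: both bounds are inclusive.
  BETWEEN: (actual, [low, high]) => actual >= low && actual <= high,
};

// AND logic: every condition must match; an empty list always matches.
function conditionsMatch(conditions, context) {
  return conditions.every(({ field, op, value }) =>
    operators[op](lookup(context, field), value)
  );
}

const conditions = [
  { field: 'alert.severity', op: 'IN', value: ['critical'] },
  { field: 'alert.service', op: 'IN', value: ['payment-service', 'checkout-api'] },
  { field: 'alert.environment', op: '=', value: 'production' },
];

const context = {
  alert: { severity: 'critical', service: 'checkout-api', environment: 'production' },
};
console.log(conditionsMatch(conditions, context)); // → true
```

Changing any one field in the context, for example setting the environment to staging, makes the whole set fail.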

Actions#

Actions define what happens when a workflow's trigger fires and all conditions are met. A workflow can have multiple actions that execute in parallel.
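Parallel execution can be pictured with `Promise.allSettled`, which starts every action at once and reports each result separately. The sketch below is an assumption about the behavior, not a description of the actual engine: the `runners` table and `runActions` function are hypothetical, and this page does not state whether one failing action affects the others.

```javascript
// Hypothetical action runners keyed by action type; each returns a promise.
const runners = {
  webhook: async (config) => `POST ${config.url}`,
  email: async (config) => {
    if (config.to.length === 0) throw new Error('no recipients');
    return `mail to ${config.to.join(', ')}`;
  },
};

// Starts every action at once and waits for all of them to settle.
// In this sketch, one failing action does not stop the others.
async function runActions(actions) {
  const settled = await Promise.allSettled(
    actions.map(({ action, config }) => runners[action](config))
  );
  return settled.map((result, i) => ({
    action: actions[i].action,
    ok: result.status === 'fulfilled',
    detail: result.status === 'fulfilled' ? result.value : result.reason.message,
  }));
}

runActions([
  { action: 'webhook', config: { url: 'https://example.com/hook' } },
  { action: 'email', config: { to: [] } },
]).then((results) => console.log(results));
```

Here the webhook entry reports success while the email entry reports its failure reason.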

webhook#

Send an HTTP POST request to an external URL. Use this to integrate with Slack, PagerDuty, custom tooling, or any system that accepts webhooks.

{
  "action": "webhook",
  "config": {
    "url": "https://hooks.slack.com/services/T00/B00/xxxx",
    "method": "POST",
    "headers": {
      "Content-Type": "application/json"
    },
    "body": {
      "text": "Alert fired: {{alert.name}} on {{alert.service}} ({{alert.severity}})",
      "channel": "#incidents"
    },
    "timeoutMs": 10000,
    "retries": 2
  }
}

Template variables (wrapped in {{ }}) are replaced with values from the trigger context.
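The substitution can be sketched as follows. `renderTemplate` is a hypothetical helper, and two behaviors here are assumptions this page does not specify: unknown variables are left untouched, and list values are joined with a comma and a space.

```javascript
// Replaces each {{path}} placeholder with the value at that path in the context.
function renderTemplate(template, context) {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (placeholder, path) => {
    const value = path
      .split('.')
      .reduce((obj, key) => (obj == null ? undefined : obj[key]), context);
    // Assumption: unresolved variables stay in the output as written.
    if (value === undefined || value === null) return placeholder;
    // Assumption: lists such as incident.services are joined with ", ".
    return Array.isArray(value) ? value.join(', ') : String(value);
  });
}

const text = renderTemplate(
  'Alert fired: {{alert.name}} on {{alert.service}} ({{alert.severity}})',
  { alert: { name: 'High error rate', service: 'payment-service', severity: 'critical' } }
);
console.log(text); // → Alert fired: High error rate on payment-service (critical)
```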

email#

Send an email notification:

{
  "action": "email",
  "config": {
    "to": ["oncall@company.com", "platform-team@company.com"],
    "subject": "SEV1 Incident: {{incident.title}}",
    "body": "A SEV1 incident has been declared.\n\nServices: {{incident.services}}\nCommander: {{incident.commander}}\n\nView: {{incident.url}}"
  }
}

create_incident#

Automatically create an incident:

{
  "action": "create_incident",
  "config": {
    "title": "Auto: {{alert.name}} on {{alert.service}}",
    "severity": "SEV1",
    "services": ["{{alert.service}}"],
    "description": "Automatically created from alert {{alert.name}}.\nMetric: {{alert.metric}} = {{alert.value}} (threshold: {{alert.threshold}})"
  }
}

assign_issue#

Assign an error group or incident to a team member. This example uses ownership rules with a fallback user:

{
  "action": "assign_issue",
  "config": {
    "strategy": "ownership_rules",
    "fallback": "user_abc123"
  }
}

Strategy options:

  • ownership_rules -- use CODEOWNERS / ownership rules to determine assignee
  • round_robin -- rotate assignment among team members
  • specific_user -- always assign to a specific user
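To make the difference between the strategies concrete, here is a rough sketch of how an assignee might be chosen. Everything beyond the strategy names and the `fallback` field is hypothetical: the `createAssigner` helper, the `owners` lookup keyed by file path, the `team` list, and the `user` config field are illustrative assumptions.

```javascript
// Hypothetical resolver for the three strategies listed above.
function createAssigner(team) {
  let next = 0; // round-robin cursor, kept in memory for this sketch
  return function assign(config, issue, owners) {
    switch (config.strategy) {
      case 'ownership_rules':
        // Use the owner of the issue's source file, else the configured fallback.
        return owners[issue.file] ?? config.fallback;
      case 'round_robin': {
        // Rotate through the team, wrapping around at the end.
        const assignee = team[next % team.length];
        next += 1;
        return assignee;
      }
      case 'specific_user':
        // `user` is an assumed field name for the fixed assignee.
        return config.user;
      default:
        throw new Error(`Unknown strategy: ${config.strategy}`);
    }
  };
}

const assign = createAssigner(['user_a', 'user_b']);
const owners = { 'src/payments/charge.js': 'user_pay1' };

// No ownership rule matches this file, so the fallback is used.
console.log(
  assign({ strategy: 'ownership_rules', fallback: 'user_abc123' }, { file: 'src/other.js' }, owners)
); // → user_abc123
```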

run_query#

Execute a saved query and include the results in a notification or incident:

{
  "action": "run_query",
  "config": {
    "queryId": "saved_query_xyz",
    "attachTo": "incident",
    "format": "table"
  }
}

update_status#

Update the status of a linked resource:

{
  "action": "update_status",
  "config": {
    "target": "error_group",
    "status": "acknowledged",
    "comment": "Auto-acknowledged by workflow"
  }
}

Visual Workflow Builder#

Create and edit workflows visually in Dashboard > Settings > Workflows.

Building a Workflow#

  1. Click Create Workflow
  2. Give it a name and optional description
  3. Select a trigger from the dropdown
  4. Add conditions by clicking Add Condition (each condition adds a filter row)
  5. Add one or more actions by clicking Add Action
  6. Toggle the workflow Enabled / Disabled
  7. Click Save

Workflow Canvas#

The visual builder shows your workflow as a flowchart:

┌──────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Trigger    │────▶│   Conditions     │────▶│    Actions      │
│ alert_fired  │     │ severity=critical│     │ create_incident │
│              │     │ env=production   │     │ webhook (Slack) │
└──────────────┘     └──────────────────┘     │ email (oncall)  │
                                              └─────────────────┘

Each block is clickable and opens a configuration panel.

Testing Workflows#

Click Test on any workflow to simulate it with sample data. The test shows:

  • Whether conditions matched
  • What actions would execute
  • The rendered templates (with variables replaced)
  • Any errors in configuration

Testing does not execute real actions -- webhooks are not sent, incidents are not created.

Examples#

Auto-Create Incident on Critical Alert#

Name:        Auto-Incident on Critical Alert
Trigger:     alert_fired
Conditions:
  - severity = critical
  - environment = production
Actions:
  1. create_incident
     - title: "Auto: {{alert.name}} on {{alert.service}}"
     - severity: SEV1
  2. webhook (Slack)
     - channel: #incidents
     - message: "SEV1 incident created for {{alert.service}}"

Auto-Assign Errors by Service#

Name:        Assign Payment Errors
Trigger:     error_new
Conditions:
  - service IN [payment-service, billing-service]
  - environment = production
Actions:
  1. assign_issue
     - strategy: ownership_rules
     - fallback: @payments-team-lead
  2. webhook (Slack)
     - channel: #payments-errors
     - message: "New error in {{error.service}}: {{error.message}}"

Notify on SLO Budget Depletion#

Name:        SLO Budget Warning
Trigger:     slo_budget_warning
Conditions:
  - slo.budgetRemaining < 25
Actions:
  1. email
     - to: platform-team@company.com
     - subject: "SLO Budget Warning: {{slo.name}}"
  2. webhook (PagerDuty)
     - routing_key: "R00..."
     - severity: warning

Auto-Reopen on Regression#

Name:        Regression Alert
Trigger:     error_regression
Conditions:
  - environment = production
Actions:
  1. update_status
     - target: error_group
     - status: reopened
  2. assign_issue
     - strategy: ownership_rules
  3. webhook (Slack)
     - channel: #regressions
     - message: "Regression: {{error.message}} reappeared in {{error.regressedInRelease}}"

Custom: Canary Deploy Failure#

Name:        Canary Rollback Notification
Trigger:     custom (name = canary_deploy_failed)
Conditions:
  - context.service IN [payment-service, checkout-api]
Actions:
  1. create_incident
     - title: "Canary failed: {{context.service}} {{context.version}}"
     - severity: SEV2
  2. webhook (Slack)
     - message: "Canary deploy of {{context.service}} {{context.version}} failed: {{context.failureReason}}"

Limits and Quotas#

| Resource | Limit |
|----------|-------|
| Workflows per site | 20 |
| Conditions per workflow | 10 |
| Actions per workflow | 5 |
| Webhook timeout | 10 seconds |
| Webhook retries | 2 (with exponential backoff) |
| Email recipients per action | 10 |
| Custom event payload size | 10 KB |
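If you generate workflow definitions from your own tooling, a pre-flight check against these limits can catch problems before saving. The sketch below covers only the per-workflow limits. The workflow shape (`conditions`, plus `actions` as a list of `{ action, config }` entries) and the `checkLimits` helper are assumptions for illustration.

```javascript
// Per-workflow limits copied from the table above.
const LIMITS = { conditions: 10, actions: 5, emailRecipients: 10 };

// Returns a list of human-readable violations; an empty list means no limit is exceeded.
function checkLimits(workflow) {
  const problems = [];
  if (workflow.conditions.length > LIMITS.conditions) {
    problems.push(`Too many conditions: ${workflow.conditions.length} > ${LIMITS.conditions}`);
  }
  if (workflow.actions.length > LIMITS.actions) {
    problems.push(`Too many actions: ${workflow.actions.length} > ${LIMITS.actions}`);
  }
  for (const { action, config } of workflow.actions) {
    if (action === 'email' && config.to.length > LIMITS.emailRecipients) {
      problems.push(`Too many email recipients: ${config.to.length} > ${LIMITS.emailRecipients}`);
    }
  }
  return problems;
}

const problems = checkLimits({
  conditions: new Array(11).fill({}),
  actions: [{ action: 'email', config: { to: ['oncall@company.com'] } }],
});
console.log(problems); // one entry, reporting 11 conditions against a limit of 10
```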

Rate Limiting#

Workflows are rate-limited to prevent runaway execution:

  • Each workflow can fire at most once per minute for the same trigger context
  • If an alert is flapping (firing and resolving rapidly), the workflow runs on the first firing and suppresses subsequent firings for 60 seconds
  • Custom event triggers are limited to 100 per hour per site
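The per-workflow suppression described in the first two bullets can be modelled as a last-run timestamp per workflow and trigger context. This is a sketch under assumptions: how the product derives "the same trigger context" is not documented, so the caller supplies a `contextKey` here, and `createRateLimiter` is a hypothetical helper.

```javascript
const SUPPRESS_MS = 60 * 1000; // at most one run per minute per trigger context

// Tracks the last run time per (workflow, context) pair.
// The clock is injectable so the behavior can be checked without waiting.
function createRateLimiter(now = () => Date.now()) {
  const lastRun = new Map();
  return function shouldRun(workflowId, contextKey) {
    const key = `${workflowId}:${contextKey}`;
    const previous = lastRun.get(key);
    const current = now();
    if (previous !== undefined && current - previous < SUPPRESS_MS) {
      return false; // suppressed: this context ran less than 60 seconds ago
    }
    lastRun.set(key, current);
    return true;
  };
}

let clock = 0;
const shouldRun = createRateLimiter(() => clock);
console.log(shouldRun('wf_1', 'alert:cpu-high')); // → true (first firing)
clock = 30000;
console.log(shouldRun('wf_1', 'alert:cpu-high')); // → false (flapping, suppressed)
clock = 60000;
console.log(shouldRun('wf_1', 'alert:cpu-high')); // → true (window has elapsed)
```

Suppressed firings do not extend the window in this sketch: the 60 seconds are counted from the last firing that actually ran.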

Workflow Execution History#

View the execution history for any workflow under Dashboard > Settings > Workflows > [Workflow] > History.

Each execution shows:

  • Timestamp -- when it ran
  • Trigger event -- the event that triggered it
  • Conditions result -- which conditions matched (or didn't)
  • Actions result -- success/failure for each action, with response details
  • Duration -- how long the workflow took to execute

Use the history to debug workflows that aren't firing as expected or to audit automated actions.

Disabling and Deleting#

  • Disable a workflow to stop it from executing without deleting it. Useful during maintenance windows.
  • Delete a workflow to permanently remove it. Execution history is retained for 30 days after deletion.
  • Pause all workflows to stop all workflow execution site-wide. This global kill switch is under Dashboard > Settings > Workflows > Pause All Workflows.