Welcome to FlowZap, the diagramming app built for speed, clarity, and control.

Incident Response Workflow

devops

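A production incident response workflow with severity-based routing, war room coordination, incident commander assignment, customer communication, root cause analysis, and postmortem scheduling.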

Full FlowZap Code

Monitoring { # Monitoring System
n1: circle label:"Start"
n2: rectangle label:"Alert triggered"
n3: rectangle label:"Page on-call responder"
n4: rectangle label:"Log incident timeline"
n5: circle label:"End"
n1.handle(right) -> n2.handle(left)
n2.handle(right) -> n3.handle(left)
n3.handle(bottom) -> OnCall.n6.handle(top) [label="Paged"]
n4.handle(right) -> n5.handle(left)
}
OnCall { # On-Call Responder
n6: rectangle label:"Acknowledge incident"
n7: rectangle label:"Assess severity and impact"
n8: diamond label:"Severity level?"
n9: rectangle label:"Declare major incident"
n10: rectangle label:"Begin troubleshooting"
n11: rectangle label:"Escalate to team lead"
n6.handle(right) -> n7.handle(left)
n7.handle(right) -> n8.handle(left)
n8.handle(right) -> n9.handle(left) [label="SEV-1/2"]
n8.handle(bottom) -> n10.handle(top) [label="SEV-3/4"]
n9.handle(bottom) -> IncidentCommand.n12.handle(top) [label="Open bridge"]
n10.handle(bottom) -> Resolution.n18.handle(top) [label="Investigate"]
n11.handle(bottom) -> IncidentCommand.n12.handle(top) [label="Needs help"]
}
IncidentCommand { # Incident Command
n12: rectangle label:"Open war room bridge"
n13: rectangle label:"Assign incident commander"
n14: rectangle label:"Coordinate response teams"
n15: rectangle label:"Communicate status updates"
n16: diamond label:"Customer impact?"
n17: rectangle label:"Notify customer success"
n12.handle(right) -> n13.handle(left)
n13.handle(right) -> n14.handle(left)
n14.handle(right) -> n15.handle(left)
n15.handle(right) -> n16.handle(left)
n16.handle(right) -> n17.handle(left) [label="Yes"]
n16.handle(bottom) -> Resolution.n18.handle(top) [label="No"]
n17.handle(bottom) -> Resolution.n18.handle(top) [label="Notified"]
}
Resolution { # Resolution
n18: rectangle label:"Identify root cause"
n19: rectangle label:"Implement fix"
n20: diamond label:"Issue resolved?"
n21: rectangle label:"Verify recovery"
n22: rectangle label:"Close incident"
n23: rectangle label:"Schedule postmortem"
n18.handle(right) -> n19.handle(left)
n19.handle(right) -> n20.handle(left)
n20.handle(right) -> n21.handle(left) [label="Yes"]
n20.handle(bottom) -> n18.handle(bottom) [label="No - Continue"]
n21.handle(right) -> n22.handle(left)
n22.handle(right) -> n23.handle(left)
n23.handle(top) -> Monitoring.n4.handle(bottom) [label="Complete"]
}
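
When adapting a template like the one above, it is easy to leave a node that no edge points to. The following is a minimal Python sketch of a check for that, assuming the node and edge syntax follows the pattern visible in this template (`nN: shape label:"..."` and `a.handle(side) -> b.handle(side)`). The function name and regular expressions are illustrative assumptions, not part of FlowZap.

```python
import re

# Patterns inferred from the template on this page (an assumption, not an official grammar).
NODE_RE = re.compile(r'^\s*(n\d+):\s*\w+\s+label:"([^"]*)"')
EDGE_RE = re.compile(
    r'^\s*(?:\w+\.)?(n\d+)\.handle\(\w+\)\s*->\s*(?:\w+\.)?(n\d+)\.handle\(\w+\)'
)


def nodes_without_incoming_edge(code: str) -> list[str]:
    """Return IDs of nodes defined in `code` that are never the target of an edge."""
    nodes: dict[str, str] = {}
    targets: set[str] = set()
    for line in code.splitlines():
        if m := NODE_RE.match(line):
            nodes[m.group(1)] = m.group(2)
        elif m := EDGE_RE.match(line):
            targets.add(m.group(2))
    return [node_id for node_id in nodes if node_id not in targets]


# Hypothetical three-node excerpt used only to exercise the function.
sample = '''
Monitoring { # Monitoring System
n1: circle label:"Start"
n2: rectangle label:"Alert triggered"
n3: rectangle label:"Page on-call responder"
n1.handle(right) -> n2.handle(left)
n3.handle(bottom) -> OnCall.n6.handle(top) [label="Paged"]
}
'''
print(nodes_without_incoming_edge(sample))  # ['n1', 'n3']
```

A start node such as `n1` is expected in the result. Any other ID is a step the flow can never reach.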

Quick Answer

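The Incident Response Workflow is a workflow template for handling production incidents: it routes alerts by severity, coordinates a war room under an incident commander, covers customer communication, and schedules a postmortem once the incident is resolved.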

Why do you need this workflow?

When production goes down, every minute costs money and customer trust. A well-defined incident response workflow reduces Mean Time to Recovery (MTTR) by ensuring the right people are notified, war rooms are coordinated, and postmortems capture learnings. This workflow implements industry-standard severity-based routing.
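
To make the MTTR figure concrete, here is a small Python sketch that averages the time from detection to resolution over a set of incidents. The function name and the sample timestamps are hypothetical and serve only as illustration.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (resolved_at - detected_at) across all incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum(
        (resolved - detected for detected, resolved in incidents),
        timedelta(),
    )
    return total / len(incidents)


# Hypothetical sample data: (detected_at, resolved_at)
sample = [
    (datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 10, 45)),  # 45 minutes
    (datetime(2024, 2, 9, 22, 30), datetime(2024, 2, 10, 0, 0)),  # 90 minutes
]
print(mean_time_to_recovery(sample))  # 1:07:30
```

Tracking this number per severity level, rather than as one overall average, shows whether the routing is actually speeding up the most serious incidents.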

How it works

  1. Monitoring alerts trigger the workflow with severity classification (SEV1-SEV4).
  2. SEV1/SEV2 incidents immediately page the on-call engineer via PagerDuty or Opsgenie.
  3. An incident commander is assigned and a war room (Slack channel or Zoom) is created.
  4. Customer communication is drafted and sent through status page updates.
  5. Root cause analysis begins in parallel with mitigation efforts.
  6. Once resolved, a postmortem is scheduled and action items are tracked in Jira.
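
The severity-based branching in the steps above can be sketched in Python as follows. This is an illustrative model under stated assumptions, not FlowZap functionality: the `Severity`, `Incident`, and `route_incident` names are hypothetical, the action strings are taken from the node labels in the template, and the retry loop on "Issue resolved?" is left out for brevity.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4


@dataclass
class Incident:
    title: str
    severity: Severity
    customer_impact: bool = False
    timeline: list[str] = field(default_factory=list)


def route_incident(incident: Incident) -> list[str]:
    """Return the ordered actions for an incident and log them on its timeline."""
    actions = ["Acknowledge incident", "Assess severity and impact"]
    if incident.severity <= Severity.SEV2:
        # Major incident path: war room bridge plus incident commander.
        actions += [
            "Declare major incident",
            "Open war room bridge",
            "Assign incident commander",
            "Coordinate response teams",
            "Communicate status updates",
        ]
        if incident.customer_impact:
            actions.append("Notify customer success")
    else:
        # SEV3/SEV4 path: the on-call responder troubleshoots directly.
        actions.append("Begin troubleshooting")
    actions += [
        "Identify root cause",
        "Implement fix",
        "Verify recovery",
        "Close incident",
        "Schedule postmortem",
    ]
    incident.timeline.extend(actions)
    return actions


outage = Incident("Checkout API down", Severity.SEV1, customer_impact=True)
for step in route_incident(outage):
    print(step)
```

Using an ordered enum keeps the major-incident check to a single comparison, so changing the threshold (for example, treating SEV3 as major) is a one-line edit.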

Alternatives

Ad-hoc incident response leads to finger-pointing, missed notifications, and repeated incidents. Enterprise tools like ServiceNow or PagerDuty Incident Response cost $20-50 per user/month. This workflow provides a visual runbook that integrates with your existing alerting stack.

Key Facts

Template Name: Incident Response Workflow
Category: devops
Steps: 6 workflow steps
Format: FlowZap Code (.fz file)

Related templates

Backup and Restore Workflow

devops

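A backup and restore workflow with **scheduled backups**, **off-site replication**, **retention policy enforcement**, **restore testing**, and **RTO/RPO validation**.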

Certificate Renewal Workflow

devops

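An SSL/TLS certificate renewal workflow with expiry monitoring, certificate requests by type (DV/OV/EV), domain validation, deployment to load balancers, and health-check verification with rollback.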
