Welcome to FlowZap, the app for creating diagrams with Speed, Clarity and Control.

Incident Response Workflow

devops

Production incident response workflow with severity-based routing, war room coordination, incident commander assignment, customer communication, root cause analysis, and postmortem scheduling.

Full FlowZap Code

Monitoring { # Monitoring System
n1: circle label:"Start"
n2: rectangle label:"Alert triggered"
n3: rectangle label:"Page on-call responder"
n4: rectangle label:"Log incident timeline"
n5: circle label:"End"
n1.handle(right) -> n2.handle(left)
n2.handle(bottom) -> OnCall.n6.handle(top) [label="SEV-1 Alert"]
n3.handle(bottom) -> OnCall.n6.handle(top) [label="Paged"]
n4.handle(right) -> n5.handle(left)
}
OnCall { # On-Call Responder
n6: rectangle label:"Acknowledge incident"
n7: rectangle label:"Assess severity and impact"
n8: diamond label:"Severity level?"
n9: rectangle label:"Declare major incident"
n10: rectangle label:"Begin troubleshooting"
n11: rectangle label:"Escalate to team lead"
n6.handle(right) -> n7.handle(left)
n7.handle(right) -> n8.handle(left)
n8.handle(right) -> n9.handle(left) [label="SEV-1/2"]
n8.handle(bottom) -> n10.handle(top) [label="SEV-3/4"]
n9.handle(bottom) -> IncidentCommand.n12.handle(top) [label="Open bridge"]
n10.handle(bottom) -> Resolution.n18.handle(top) [label="Investigate"]
n11.handle(bottom) -> IncidentCommand.n12.handle(top) [label="Needs help"]
}
IncidentCommand { # Incident Command
n12: rectangle label:"Open war room bridge"
n13: rectangle label:"Assign incident commander"
n14: rectangle label:"Coordinate response teams"
n15: rectangle label:"Communicate status updates"
n16: diamond label:"Customer impact?"
n17: rectangle label:"Notify customer success"
n12.handle(right) -> n13.handle(left)
n13.handle(right) -> n14.handle(left)
n14.handle(right) -> n15.handle(left)
n15.handle(right) -> n16.handle(left)
n16.handle(right) -> n17.handle(left) [label="Yes"]
n16.handle(bottom) -> Resolution.n18.handle(top) [label="No"]
n17.handle(bottom) -> Resolution.n18.handle(top) [label="Notified"]
}
Resolution { # Resolution
n18: rectangle label:"Identify root cause"
n19: rectangle label:"Implement fix"
n20: diamond label:"Issue resolved?"
n21: rectangle label:"Verify recovery"
n22: rectangle label:"Close incident"
n23: rectangle label:"Schedule postmortem"
n18.handle(right) -> n19.handle(left)
n19.handle(right) -> n20.handle(left)
n20.handle(right) -> n21.handle(left) [label="Yes"]
n20.handle(bottom) -> n18.handle(bottom) [label="No - Continue"]
n21.handle(right) -> n22.handle(left)
n22.handle(right) -> n23.handle(left)
n23.handle(top) -> Monitoring.n4.handle(bottom) [label="Complete"]
}

Quick Answer

Incident Response Workflow is a workflow template for handling production incidents, when every minute of downtime costs money and customer trust.

Why This Workflow?

When production goes down, every minute costs money and customer trust. A well-defined incident response workflow reduces Mean Time to Recovery (MTTR) by ensuring the right people are notified, war rooms are coordinated, and postmortems capture learnings. This workflow implements industry-standard severity-based routing.
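
As a rough illustration of the metric mentioned above, MTTR can be computed as the average time between detection and recovery across incidents. The sketch below assumes each incident is recorded as a pair of timestamps; the function name and the sample data are hypothetical and not part of the template.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average recovery duration over (detected_at, recovered_at) pairs."""
    if not incidents:
        raise ValueError("at least one incident is required")
    # Sum the individual outage durations, starting from a zero timedelta.
    total = sum(
        (recovered - detected for detected, recovered in incidents),
        timedelta(),
    )
    return total / len(incidents)


# Hypothetical sample data: one 30-minute and one 90-minute outage.
sample = [
    (datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 10, 30)),
    (datetime(2024, 2, 9, 22, 0), datetime(2024, 2, 9, 23, 30)),
]
mttr = mean_time_to_recovery(sample)
print(mttr)  # 1:00:00
```

Tracking this number before and after adopting a defined workflow is one way to check whether the process actually shortens recovery.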

How It Works

  1. Monitoring alerts trigger the workflow with severity classification (SEV1-SEV4).
  2. SEV1/SEV2 incidents immediately page the on-call engineer via PagerDuty or Opsgenie.
  3. An incident commander is assigned and a war room (Slack channel or Zoom) is created.
  4. Customer communication is drafted and sent through status page updates.
  5. Root cause analysis begins in parallel with mitigation efforts.
  6. Once resolved, a postmortem is scheduled and action items are tracked in Jira.
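
The severity-based branching in the steps above can be sketched in a few lines of Python. This is a minimal illustration, not FlowZap functionality: the `Severity`, `Incident`, and `route_incident` names are hypothetical, the action strings are copied from the node labels in the diagram, and no real paging or chat API is called.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4


@dataclass
class Incident:
    title: str
    severity: Severity
    customer_impact: bool = False


def route_incident(incident: Incident) -> list[str]:
    """Return the ordered actions for an incident, following the diagram's branches."""
    actions = ["Acknowledge incident", "Assess severity and impact"]
    if incident.severity <= Severity.SEV2:
        # SEV-1/2: declare a major incident and run incident command.
        actions += [
            "Declare major incident",
            "Open war room bridge",
            "Assign incident commander",
            "Coordinate response teams",
            "Communicate status updates",
        ]
        if incident.customer_impact:
            actions.append("Notify customer success")
    else:
        # SEV-3/4: the on-call responder investigates directly.
        actions.append("Begin troubleshooting")
    # Both branches converge on the Resolution lane.
    actions += [
        "Identify root cause",
        "Implement fix",
        "Verify recovery",
        "Close incident",
        "Schedule postmortem",
    ]
    return actions


for step in route_incident(Incident("API outage", Severity.SEV1, customer_impact=True)):
    print(step)
```

Modeling the severity as an ordered enum keeps the routing rule to a single comparison, which mirrors the "Severity level?" decision node. The retry loop on "Issue resolved?" is left out to keep the sketch short.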

Alternatives

Ad-hoc incident response leads to finger-pointing, missed notifications, and repeated incidents. Enterprise tools like ServiceNow or PagerDuty Incident Response cost $20-50 per user/month. This workflow provides a visual runbook that integrates with your existing alerting stack.

Key Facts

Template Name: Incident Response Workflow
Category: devops
Steps: 6 workflow steps
Format: FlowZap Code (.fz file)

Related Templates

Access Review Workflow

devops

Quarterly user access review workflow with manager certification, separation-of-duties validation, remediation tracking, and compliance reporting for audits.

Certificate Renewal Workflow

devops

SSL/TLS certificate renewal workflow with expiration date monitoring, certificate requests by type (DV/OV/EV), domain validation, deployment to load balancers, and health checks with rollback capability.

Chaos Engineering Workflow

devops

Chaos engineering workflow with hypothesis definition, steady-state monitoring, controlled fault injection, blast radius limitation, and resilience validation.
