Here’s a quick explainer on what *seems* to be the cause of the CrowdStrike outage.
UPDATE: Official technical analysis here - https://lnkd.in/g3uCyS6e
Broadly the same as below, except it’s the CrowdStrike agent itself crashing, not Windows killing one of its own OS processes.
Original post below 👇
1. CrowdStrike publishes a content update for their threat feed, which is basically a list of patterns of “bad things”
2. Software agents pull this update and apply controls to block anything that matches those patterns
3. The update contains a pattern that matches a critical Windows process, and the agent blocks it anyway
4. Windows crashes with a Blue Screen of Death (BSOD) and reboots
5. On reboot, CrowdStrike kills the process again and Windows crashes again
6. And it’s now a loop… (see the toy sketch below)
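To make the loop concrete, here’s a toy Python model of steps 1-6. Everything in it is made up for illustration (the process names, the feed contents) - it’s not CrowdStrike’s code, just the shape of the failure:

```python
# Toy model of the crash loop in steps 1-6. All names are illustrative;
# this is not CrowdStrike internals, just the shape of the failure.

THREAT_FEED = {"critical_windows_process"}  # the faulty pattern in the update


def boot() -> str:
    """Simulate one boot: the agent applies the feed as processes start."""
    for process in ("some_app", "critical_windows_process", "another_app"):
        if process in THREAT_FEED:
            return "BSOD"  # blocking a critical process takes Windows down
    return "OK"


reboots = 0
while boot() == "BSOD" and reboots < 3:  # cap the demo; real machines loop forever
    reboots += 1
    print(f"Reboot #{reboots}: same update still on disk -> same crash")
```

The key point: the faulty update persists on disk, so every boot replays exactly the same crash - nothing in the loop ever fixes itself.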
There are various ways of fixing this, but for most systems it will involve physically visiting every affected machine, booting into Safe Mode and manually deleting the offending content file.
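As a sketch of that manual fix, assuming the widely reported workaround of deleting the faulty channel file(s) - the directory and the C-00000291*.sys filename pattern below come from public reports, so verify against official vendor guidance first:

```python
# Minimal sketch of the Safe Mode cleanup step. The path and filename
# pattern ("C-00000291*.sys") are taken from public reports of the
# workaround - check official vendor guidance before running anything.
from pathlib import Path

driver_dir = Path(r"C:\Windows\System32\drivers\CrowdStrike")

for channel_file in driver_dir.glob("C-00000291*.sys"):
    print(f"Deleting {channel_file}")
    channel_file.unlink()  # removing the file breaks the crash loop on next boot
```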
For some cloud systems though, such as AWS, Safe Mode isn’t even an option, so this fix doesn’t work. The virtual servers will need to be shut down, their disks cloned, attached to another server, edited to remove the offending files, and then finally reattached to the original server.
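Here’s a rough boto3 sketch of that rescue procedure (the detach-and-repair variant; snapshotting the disk first is the safer version). All instance/volume IDs and device names are placeholders:

```python
# Rough sketch of the AWS rescue flow described above, using boto3.
# All instance/volume IDs and device names are placeholders; a real
# runbook would snapshot the volume first and add error handling.
import boto3

ec2 = boto3.client("ec2")

BROKEN_INSTANCE = "i-0123456789abcdef0"  # placeholder: the boot-looping server
RESCUE_INSTANCE = "i-0fedcba9876543210"  # placeholder: a healthy helper server
VOLUME_ID = "vol-0123456789abcdef0"      # placeholder: the broken root volume

# 1. Stop the broken instance so its root volume can be detached.
ec2.stop_instances(InstanceIds=[BROKEN_INSTANCE])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[BROKEN_INSTANCE])

# 2. Move the volume over to the rescue instance as a secondary disk.
ec2.detach_volume(VolumeId=VOLUME_ID)
ec2.get_waiter("volume_available").wait(VolumeIds=[VOLUME_ID])
ec2.attach_volume(VolumeId=VOLUME_ID, InstanceId=RESCUE_INSTANCE, Device="/dev/sdf")

# 3. On the rescue instance: mount the disk and delete the offending files
#    (manual step), then move the volume back and start the original server:
# ec2.detach_volume(VolumeId=VOLUME_ID)
# ec2.attach_volume(VolumeId=VOLUME_ID, InstanceId=BROKEN_INSTANCE, Device="/dev/xvda")
# ec2.start_instances(InstanceIds=[BROKEN_INSTANCE])
```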
BUT, if you’re protecting your data with encryption at rest, you first need to manually unlock the disk with its BitLocker recovery key, which is probably - for most companies - stored on one of the servers that is currently boot-looping 🫠
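If you can still get hold of the 48-digit recovery key (from AD/Entra ID, a password manager, or a printed copy), the unlock step on the rescue machine looks roughly like this - the drive letter and key are placeholders, and manage-bde is Windows’ built-in BitLocker tool:

```python
# Sketch of the extra BitLocker step: unlock the encrypted disk with its
# recovery key before the offending files can be deleted. Drive letter and
# key below are placeholders; manage-bde is Windows' built-in BitLocker CLI.
import subprocess

DRIVE = "D:"  # placeholder: the encrypted volume attached to the rescue machine
RECOVERY_KEY = "111111-222222-333333-444444-555555-666666-777777-888888"  # placeholder

subprocess.run(
    ["manage-bde", "-unlock", DRIVE, "-RecoveryPassword", RECOVERY_KEY],
    check=True,  # raises if the key is wrong or the volume isn't BitLocker-protected
)
```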