TL;DR: no, it's not your beloved "Install Linux" moment.
First, how I understand this incident. This part matters, because a misunderstanding here leads to wrong conclusions (and that can happen if I'm not correct in this part).
CrowdStrike has a kind of "early detection" antivirus program that analyzes your system to spot not-yet-identified threats. To do this it needs kernel-level access, which it gets through its driver.
Any crash of a program running at that access level means a BSOD, a kernel panic, or the local equivalent, depending on your favorite OS.
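To make that concrete, here is a deliberately broken, purely illustrative Linux kernel module (a toy of mine, nothing to do with CrowdStrike's actual driver; the `crashdemo` name is made up). Built against your kernel headers and loaded, its NULL dereference happens in kernel mode, so you get a kernel oops (or a full panic if `panic_on_oops` is set) rather than just one dead process, which is how the same bug would end in user space.

```c
/* Illustrative only: a toy kernel module whose init function dereferences
 * a NULL pointer. Because the fault happens in kernel mode, loading it
 * produces a kernel oops (or a full panic if panic_on_oops is set),
 * not just a crashed program. Do NOT load this on a machine you care about. */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Demo: a kernel-mode fault takes the whole system down");

static int *bad_ptr; /* never assigned, stays NULL */

static int __init crashdemo_init(void)
{
	pr_info("crashdemo: about to dereference a NULL pointer in kernel mode\n");
	return *bad_ptr; /* kernel-mode fault -> oops / panic */
}

static void __exit crashdemo_exit(void)
{
	pr_info("crashdemo: unloaded\n");
}

module_init(crashdemo_init);
module_exit(crashdemo_exit);
```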
Can Microsoft block this access level? No, because they are obliged to allow it under their commitments to the EU (and let's be fair: if they blocked kernel-level access, we would be reading far more Microsoft blaming from Linux fans).
So, did CrowdStrike push a driver update that caused the outage? Not exactly. Kernel-level driver updates are constrained and take a long time to ship, so pushing a new driver for every new virus is not an option. Instead, they ship configuration files that we can, simplifying a bit, call virus databases (because technically that's what they are).
And it was a "virus database" update that made the kernel-level driver fail, hence the BSOD.
As far as I understand, Windows has some built-in recovery mechanisms, but they couldn't do their job because the faulty driver loads early in the boot process, which caused endless reboot loops.
So, could something similar happen under Linux? Well, it would be unfair to speculate about what *could* happen, because it [has already happened before](https://www.reddit.com/r/debian/comments/1c8db7l/linuximage61020_killed_all_my_debian_vms/).
Next, we've seen online discussions about tools that could be used to fix the problem. The most interesting solution I've found was an SSH server built into the initramfs for remote maintenance.
Well, if you can put an SSH server into the initramfs, an antivirus could just as easily end up auto-installed into the initramfs and cause problems at the early boot stages.
That doesn't mean it necessarily will end up in the initramfs, or that a kernel panic at that early stage would indeed cause the same trouble. But it does mean that Linux is not protected from the same kind of problem.
The next topic is manual updates. I understand that automatic app updates can cause problems. I understand that app/driver updates should be tested before being applied to critical infrastructure.
But do you really expect people to manually test every virus database update before rolling it out across the company? To test every new virus database, released several times a day? By that logic, let's also manually test every document people have to deal with, to check it doesn't break your work applications, before allowing users to open it. Why not?
So, who do we have to blame?
CrowdStrike? Yes, they fucked up twice: first by letting the driver break on a broken config update, and second by shipping that broken update.
Admins who don't disable automatic virus database updates? Sorry, but no.
Microsoft? Sorry, but this is one case where Microsoft is not the one to blame.
Regulators who allowed critical infrastructure to "put all its eggs into a single basket"? Maybe.
Linux fans spreading misinformation? Yes.
Yes, I'm also a Linux fan. Linux is more convenient for me than Windows. But please don't act like trashy political propagandists who spread misinformation to achieve their goals.
Thank you.