Time is a flat circle and all that.
It used to be that if you had a need for an embedded processor, you'd either write code for a "bare metal" MCU, or grab a compact real-time OS of some sort.
Nowadays, most people just get a powerful SoC running Linux, even if the project is to blink some Christmas lights. But Linux does Linux things, and getting predictable and performant hardware I/O from userspace is a crapshoot. You do 'echo 1 >/sys/class/gpio/gpio60/value' and then, some indeterminate time later, a pin changes state. Forget gigabit data transfer speeds; you might have a hard time playing back audio on a speaker.
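If you want to see the crapshoot for yourself, here is a rough sketch that times repeated writes to a sysfs-style GPIO file. The function names (time_gpio_writes, summarize) are made up for illustration. It only measures how long each write takes to return to userspace, not when the pin actually moves, so treat the numbers as a lower bound and reach for a scope if you want the truth.

```python
import time


def time_gpio_writes(path, value="1", count=1000):
    """Write `value` to `path` `count` times.

    Returns a list of per-write latencies in seconds. This measures only
    the open/write/close round trip as seen from userspace; the pin itself
    may change state later than that.
    """
    latencies = []
    for _ in range(count):
        start = time.perf_counter()
        with open(path, "w") as f:
            f.write(value)
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Return min, median and max of a non-empty list of latencies."""
    ordered = sorted(latencies)
    return {
        "min": ordered[0],
        "median": ordered[len(ordered) // 2],
        "max": ordered[-1],
    }
```

On a board where the pin has already been exported and set as an output, summarize(time_gpio_writes("/sys/class/gpio/gpio60/value")) gives a feel for the spread. The interesting number is usually the gap between median and max.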
So now, some SoCs come with a separate processor on the die - a PRU (Programmable Real-time Unit). It's a semi-standalone device that lets you run simple code in a "bare metal" environment with a bit of RAM and program memory to orchestrate your GPIO.
In this model, Linux running on the main CPU just tells the PRU "please make fart sounds on pin 60", and your code running on the PRU handles the particulars of toggling the pin at the necessary speed.
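To make the division of labor concrete, here is a small host-side sketch of the arithmetic involved. Linux works out how many PRU clock cycles each half of the square wave should last, and the PRU just counts cycles and flips the pin. The 200 MHz clock is an assumption (it matches the PRUs on TI's AM335x parts; yours may differ), and both function names are hypothetical. The second function is a plain Python model of the toggle loop, not real PRU firmware.

```python
# Assumption: 200 MHz PRU core clock, as on TI AM335x; adjust for your SoC.
PRU_CLOCK_HZ = 200_000_000


def half_period_cycles(freq_hz, clock_hz=PRU_CLOCK_HZ):
    """Clock cycles the pin should stay in each state for a square wave."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return round(clock_hz / (2 * freq_hz))


def simulate_square_wave(half_period, total_cycles):
    """Model of the PRU-side loop: flip the pin every `half_period` cycles.

    Returns a list of (cycle, level) pairs, one per edge, starting from
    a low pin at cycle 0.
    """
    level = 0
    edges = []
    for cycle in range(half_period, total_cycles + 1, half_period):
        level ^= 1  # toggle the pin
        edges.append((cycle, level))
    return edges
```

For a 1 kHz tone under that clock assumption, half_period_cycles(1000) comes out to 100000 cycles per half-wave. The point of the PRU is that it can count those cycles without a scheduler getting in the way.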
I suspect it's just a matter of time before we find a way to run Linux on the PRU, and then demand a secondary PRU for that PRU to solve the I/O issues that arise.