One likely large contributor: when a normal service goes down, the fix is often as simple as re-routing to another instance, and there is a basically unlimited number of CPU servers around the world to spin up on demand. GPU servers are much harder to spin up on demand because supply is so constrained.
Another factor is that it's a new field, and "move fast and break things" is still the go-to approach because competition is intense and the financial stakes are incredibly high.
A pessimistic but perhaps true theory is that vibe-coding/slop is simply reducing their reliability.
A counterpoint is that regular services like GitHub seem to go down almost as frequently.