There's also the memory side. A large model has to live entirely in GPU VRAM to run efficiently. You can't just "add more RAM" on the fly the way you can with CPU workloads. Scaling means acquiring, provisioning, and loading entirely new physical machines — which takes minutes to hours, not seconds.
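To put rough numbers on the VRAM point, here's a back-of-the-envelope sketch (my own illustrative figures, not from the comment) of how much memory just the weights need, before you even count the KV cache or activations:

```python
# Rough sketch: VRAM needed just to hold model weights.
# FP16/BF16 uses 2 bytes per parameter; FP32 would be 4; 8-bit quantization, 1.
def weights_vram_gb(num_params_billions: float, bytes_per_param: int = 2) -> float:
    return num_params_billions * 1e9 * bytes_per_param / 1e9

# A hypothetical 70B-parameter model:
print(weights_vram_gb(70))      # 140.0 GB in 16-bit -- more than any single GPU
print(weights_vram_gb(70, 1))   # 70.0 GB even with 8-bit quantization
```

Either way you're sharding across multiple GPUs, which is exactly why capacity can't be added on the fly.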
So you end up with a system that's simultaneously very expensive per-request, very hard to scale horizontally in real time, and very sensitive to traffic spikes. That's a reliability engineer's nightmare even before you factor in the supply constraints the sibling comment mentioned.
Besides that, I thought the comment had something useful to say, whether AI-generated or not.
If you can't tell, then damn, idk man.
Another factor is that it's a new field and "move fast and break things" is still the default approach, since competition is fierce and the monetary stakes are incredibly high.
A pessimistic, but perhaps true, theory is that vibe-coding/slop is itself reducing their reliability.
A counterpoint is that regular services like GitHub seem to go down almost as frequently.
Also worth saying: even when things are "up," you often get different answers to the same question. That's the reliability problem nobody talks about. Fine for a chatbot, not fine if you're building anything that needs to be repeatable and deterministic. I moved more toward the ML route, but I guess it depends on what you are trying to do.