Hacker News

points by angarrido 1 day ago | 0 comments

Most people think it’s just GPU cost. In practice it’s coordination: model latency variance + queueing + retries under load. You don’t scale linearly, you get cascading slowdowns.
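To put rough numbers on that, here is a back-of-the-envelope sketch using the Pollaczek-Khinchine formula for an M/G/1 queue. It is an illustration under assumptions, not a model of any real serving stack: a single server, Poisson arrivals, and at most one retry per timed-out request are all simplifications, and the names `mean_queue_wait` and `effective_arrival_rate` are made up for this example.

```python
def mean_queue_wait(arrival_rate, mean_service, cv):
    """Mean time spent waiting in queue for an M/G/1 queue (Pollaczek-Khinchine).

    arrival_rate: requests per second (Poisson arrivals assumed)
    mean_service: mean model latency in seconds
    cv: coefficient of variation of latency (std dev / mean)
    """
    rho = arrival_rate * mean_service  # utilization
    if rho >= 1.0:
        return float("inf")  # queue grows without bound
    return rho * mean_service * (1.0 + cv ** 2) / (2.0 * (1.0 - rho))


def effective_arrival_rate(arrival_rate, retry_fraction):
    """Offered load when a fraction of requests is retried once (assumption)."""
    return arrival_rate * (1.0 + retry_fraction)


if __name__ == "__main__":
    mean_service = 1.0  # seconds per request, hypothetical
    for rate in (0.5, 0.8, 0.9):
        for cv in (1.0, 2.0):
            no_retry = mean_queue_wait(rate, mean_service, cv)
            with_retry = mean_queue_wait(
                effective_arrival_rate(rate, 0.1), mean_service, cv
            )
            print(f"rate={rate} cv={cv} wait={no_retry:.1f}s "
                  f"wait_with_10pct_retries={with_retry:.1f}s")
```

Under these assumptions the wait grows with 1 / (1 - utilization) and with the square of the latency variation, so a 10% retry rate near saturation multiplies the wait instead of adding 10% to it.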