Local inference is getting solved pretty quickly.
What still seems unsolved is how to use it safely on real private systems (large codebases, internal tools, etc.) where you can’t risk leaking context, even accidentally.
In our experience, that constraint changes the problem much more than the choice of runtime or SDK does.
Curious to hear which constraints the current local inference runtimes and SDKs still don't address.