I know this is getting off-topic, but is anybody working on more direct tool calling?
LLMs are based on neural networks, so one could create an interface where activating certain neurons triggers tool calls, with other neurons encoding the inputs; another set of neurons could be triggered by the tokenized result from the tool call.
Currently, the lack of separation between data and metadata is a security nightmare, which enables prompt injection. And yet all I've seen done about is are workarounds.
You can do this. It's just sticking a different classifier head on top of the model.
Before foundation models it was a standard Deep RL approach. It probably still is within that space (I haven't kept up on the research).
You don't hear about it here because if you do that then every use case needs a custom classifier head which needs to be trained on data for that use case. It negates the "single model you can use for lots of things" benefit of LLMs.