Each text token already represents the activation of certain neurons. There is nothing "more direct." And you cannot fully separate data and metadata if you want them to influence the output. At best you can clearly distinguish them and hope that this is enough for the model to learn to treat them differently.
Are there tokens reserved for tool calls? If yes, I can see the equivalence. If not, not so much.
Yes, typically the tags used for tool calls get their own special tokens, e.g. https://huggingface.co/google/gemma-4-E4B-it/blob/main/token...