It scales fine if done correctly.
Even with the weights fixed, the extra context allows it to move to the correct space.
Much the same as with humans, there are terms that are meaningless without knowing the context.
Would it be possible to make GPT-3 from GPT-2 just by prompting? It doesn't work or scale.