I agree on the commodity point; that's why I went multi-model from the start.
The registry question is the one I'm thinking about most. Right now it's flat. I plan to integrate usage data (success rates, cost, trust scores), so the registry can tell you which skills actually work well. That's where the value is.
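
A rough sketch of what a usage-aware registry entry could look like, purely for illustration. The `SkillEntry` fields, the default trust value, and the `rank` ordering are all hypothetical assumptions, not what's implemented today:

```python
from dataclasses import dataclass


@dataclass
class SkillEntry:
    """Hypothetical registry entry that accumulates usage data per skill."""
    name: str
    runs: int = 0
    successes: int = 0
    total_cost: float = 0.0  # assumed unit: whatever the registry bills in
    trust: float = 0.5       # assumed 0..1 trust score, neutral by default

    def record(self, ok: bool, cost: float) -> None:
        # Fold one invocation's outcome into the running totals.
        self.runs += 1
        self.successes += int(ok)
        self.total_cost += cost

    @property
    def success_rate(self) -> float:
        return self.successes / self.runs if self.runs else 0.0

    @property
    def avg_cost(self) -> float:
        return self.total_cost / self.runs if self.runs else 0.0


def rank(entries):
    # Assumed ordering: best (success rate x trust) first, cheaper first on ties.
    return sorted(entries, key=lambda e: (-e.success_rate * e.trust, e.avg_cost))
```

The weighting here is arbitrary; the point is just that a flat list becomes sortable once each entry carries its own track record.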
Your article looks interesting. I'll read it.