I just read LLM Wiki in more detail; I had only heard about it second-hand before this project. The "no-code" idea was inspired by Karpathy.

As I understand it, in LLM Wiki the human is very much in the loop in deciding what gets written. In ReadMe, human control is mostly at the policy (prompt) level and happens once; the agent then runs fully autonomously afterwards.

Some thoughts after a quick skim of your project:

I have tried an embedding-based knowledge base as well, but it is tricky to make the stored embeddings match a user query. For example, "What happened?" is not at all similar to "Batman defeats Joker." You need to reformulate the query with an LLM first, which is awkward because the query is conditioned on the whole chat history. That's partly why I abandoned embedding-based methods.
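A toy sketch of the mismatch (using bag-of-words overlap as a stand-in for a real dense embedding model; the vector names and the rewritten query are illustrative assumptions, but the failure mode is the same):

```python
from collections import Counter
import math

def embed(text):
    # Toy "embedding": word counts. Real systems use dense vectors
    # from a model, but the query/document mismatch looks similar.
    return Counter(text.lower().replace("?", "").split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

doc = embed("Batman defeats Joker")

raw_query = embed("What happened?")
# A hypothetical LLM rewrite, conditioned on chat history:
rewritten = embed("Batman Joker fight outcome")

print(cosine(raw_query, doc))   # 0.0 -- no overlap with the stored fact
print(cosine(rewritten, doc))   # ~0.58 -- reformulation restores the match
```

The point is that the rewrite step is itself an LLM call whose quality depends on the full conversation context, which is where the approach gets fragile.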

But given that MCPTube already works with Gemini CLI, I could see it working without embeddings at all: Gemini can read video files natively. Worth a try?