I have been struggling with the same issue, but help me understand this:

> The lack of predictable output/outcomes

How does that actually show up in practice for you? Asking because "lack of predictable output" could mean different things depending on the context.

adriand 1 hour ago
> How does that actually show up in practice for you?

It shows up as inconsistency. One of the key things I built in this architecture is the ability for users to define standard operating procedures (SOPs). These are the instructions (i.e., the prompt) that the agent follows to do tasks. I've integrated Sonnet via OpenRouter into the SOP drafting UI so people have help creating these, and the system prompt for that assistant knows about the API endpoints the agents have access to, so people get good advice as they write them.

Anyway, it's not uncommon for someone to write an SOP, test it a couple of times, decide that it works, and then tell people it's good to go. There's probably a 1 in 3 chance that it doesn't actually work when someone else tries it. The reasons for that are almost endless, it seems.
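
A rough back-of-the-envelope sketch of why a couple of passing tests says so little. It assumes each run fails independently with the same probability, which is a simplification, and it reads the 1-in-3 figure as a per-run failure rate. The function names are made up for illustration.

```python
import math


def pass_all_probability(failure_rate: float, trials: int) -> float:
    """Chance that an SOP passes every one of `trials` runs,
    assuming independent runs with a fixed per-run failure rate."""
    return (1.0 - failure_rate) ** trials


def runs_needed(failure_rate: float, max_false_pass: float) -> int:
    """Smallest number of consecutive clean runs such that an SOP with
    this failure rate would slip through with probability below
    `max_false_pass`."""
    return math.ceil(math.log(max_false_pass) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    failure_rate = 1 / 3  # assumption: the "1 in 3" figure, read as per-run
    for trials in (2, 5, 10):
        p = pass_all_probability(failure_rate, trials)
        print(f"{trials} clean runs in a row: {p:.0%} chance despite the flakiness")
    print("runs needed for <5% false pass:", runs_needed(failure_rate, 0.05))
```

Under those assumptions, an SOP that fails a third of the time still passes two tests in a row about 44% of the time (4 in 9), so "tested it a couple of times" is close to a coin flip.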

This is just one aspect. It seems like something new fails every day. Today:

- the agent stopped responding to incoming email. I dug into it. Somehow the Tailscale hostname had changed. I had not changed it. I have no idea why it changed. This is not OpenClaw's fault, but it speaks to my point that there are too many moving parts with these things.

- the agent stopped sending emails when tasks were completed. This runs on a "cron" job. I went through the list of the cron jobs. The "task reporter" cron job was disabled. Why? No idea. I didn't disable it. I'm the sole "operator" of the OpenClaw instance, so if I didn't disable it, then something inside OpenClaw did. Why? I don't know.

What I do know is that someone pings me every day with a complaint that something is not working, which is a new experience for me, and it's embarrassing.