Hacker News

points by kristopolous 6 days ago | 0 comments

It performs substantially better.

Just try it out. It's not the same.

There's a one-minute video showing the difference on the GitHub page.

You can also try it yourself:

    $ uvx agent-cli-helper run-command vim

It sends this:

  <session id="vim" current-program="vim">
  <screen-capture>
     1
  ~
  ~
  ~
  ~
  ~                              VIM - Vi IMproved
  ~
  ~                               version 9.2.218
  ~                           by Bram Moolenaar et al.
  ~                   Modified by team+vim@tracker.debian.org
  ~                 Vim is open source and freely distributable
  ~
  ~                           Sponsor Vim development!
  ~                type  :help sponsor<Enter>    for information
  ~
  ~                type  :q<Enter>               to exit
  ~                type  :help<Enter>  or  <F1>  for on-line help
  ~                type  :help version9<Enter>   for version info
  ~
  ~
  ~
  ~
  ~
               0,0-1         All

  </screen-capture>
  </session>
  <instructions>
  The command has started. To send keystrokes run `agent-cli-helper send-keystrokes` followed by the id and the keystrokes. For instance:

      $ agent-cli-helper send-keystrokes vim "^X"

  Run `agent-cli-helper send-keystrokes --help` to find out the full syntax
  </instructions>
  <important>When you are done, use finish-command to finish the session. For example: agent-cli-helper finish-command vim</important>
  <random-usage-tip>If you need to see the current screen without sending keystrokes, use agent-cli-helper get-screen-capture <session-id></random-usage-tip>
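
If you want to consume that output from a script rather than a model, here is a rough Python sketch of pulling the pieces out. It assumes the output always looks like the sample above (same tag names, same attribute order), which I have only inferred from this one capture. `Capture` and `parse_capture` are names made up for this example, not part of the tool.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Capture:
    session_id: str
    current_program: str
    screen: str


def parse_capture(output: str) -> Optional[Capture]:
    """Pull the session id, program name, and screen text out of the tool output.

    The output is XML-like rather than real XML: the screen can contain raw
    `<` characters such as `<Enter>`, so regexes are used instead of an XML parser.
    Assumption: attribute order matches the sample (id first, then current-program).
    """
    session = re.search(r'<session id="([^"]*)" current-program="([^"]*)">', output)
    screen = re.search(r"<screen-capture>\n?(.*?)</screen-capture>", output, re.DOTALL)
    if session is None or screen is None:
        return None
    return Capture(session.group(1), session.group(2), screen.group(1))
```

A real XML parser would likely choke on the `<Enter>` and `<F1>` text inside the screen, which is why this sticks to pattern matching.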

Some important parts:

* it writes the output to stdout, which makes it easier to loop on

* it formats everything as fake XML so that the agent knows the status and doesn't send weird keystrokes to the wrong command

* it includes reminders and a "random-usage-tip". The tip is not random: it reminds the agent of how the tool behaves. The name is there to make it look unrelated, but it really is related.
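
To make the "easier to loop on" point concrete, here is a minimal Python sketch of a driver, not how the tool itself works internally. It only uses the three subcommands that appear in the output above (`run-command`, `send-keystrokes`, `finish-command`). `run_cli` and `drive` are hypothetical names, the code assumes `agent-cli-helper` is on your PATH, and the session id is passed in by the caller because the sample only shows one case where it matches the program name.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def run_cli(argv: List[str]) -> str:
    """Run one agent-cli-helper command and return what it printed to stdout."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


def drive(session_id: str, program: str, keystrokes: List[str],
          run: Runner = run_cli) -> List[str]:
    """Start a program, send each keystroke string, then finish the session.

    Returns the stdout of every step so the caller can inspect each screen.
    Assumption: session_id is the id the tool assigns to this session.
    """
    outputs = [run(["agent-cli-helper", "run-command", program])]
    try:
        for keys in keystrokes:
            outputs.append(
                run(["agent-cli-helper", "send-keystrokes", session_id, keys])
            )
    finally:
        # Always close the session, as the <important> reminder asks.
        outputs.append(run(["agent-cli-helper", "finish-command", session_id]))
    return outputs
```

Something like `drive("vim", "vim", ["^X"])` would give you one stdout blob per step. Check `send-keystrokes --help` for the real keystroke syntax before relying on anything beyond the `"^X"` form shown above.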

All of these things have been tested and eval'd. The names "screen-capture" and "current-program", for example, were each tested against variations across a variety of task/harness/model triplets.

The tool is optimized for agent usage and was designed that way from the start.