It would be helpful to add in some cases that do not contain any vulnerabilities to assess false-positive rate as well.
This is a good idea.
Will incorporate false-positive rates into the rubric from the next run onwards.
At winfunc, we spent a lot of research time taming these models to eradicate false positives (the rate is high!), so this does feel important enough to be documented. Thanks!
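For anyone wondering what that measurement might look like, here is a minimal sketch, assuming each benchmark case is labelled as clean or vulnerable and the model's output is reduced to "flagged" or "not flagged". The function name, case IDs, and data layout are hypothetical and not the actual rubric.

```python
def false_positive_rate(flagged_by_case: dict[str, bool], clean_cases: set[str]) -> float:
    """FPR = FP / (FP + TN), computed only over cases known to be vulnerability-free."""
    if not clean_cases:
        raise ValueError("need at least one vulnerability-free case")
    # A false positive is a clean case that the model still flagged.
    false_positives = sum(1 for case in clean_cases if flagged_by_case.get(case, False))
    return false_positives / len(clean_cases)


# Hypothetical run: True means the model reported a vulnerability for that case.
results = {"clean_1": False, "clean_2": True, "clean_3": False, "clean_4": False, "vuln_1": True}
clean = {"clean_1", "clean_2", "clean_3", "clean_4"}

fpr = false_positive_rate(results, clean)
print(fpr)  # 0.25
```

Only the clean cases enter the denominator, so adding more vulnerable cases to the benchmark does not change this number.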
Any code that is known for certain to contain no vulnerabilities is going to be pretty trivial to verify.