To score a task, run workflowai task score [task name]. This scores the task using the default model, gpt-4-1106-preview. You will be shown a table with your task's current accuracy percentage, along with the number of task runs that percentage was calculated from.

It is recommended to start with generating and annotating 20 task runs. (See section 3.A for a tip on how to do this quickly!)

Example scenario (cont'd): Your task had one run, and its output was incorrect. After annotating that example and marking it as incorrect, you will get a score of 0%, since the only run was incorrect.

If you instead ran the task 20 times, annotated all runs, and 5 of them were incorrect, the score would be 75%.
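The score is simply the fraction of annotated runs marked correct, expressed as a percentage. A minimal sketch of that arithmetic (the helper function below is illustrative only, not part of the workflowai CLI):

```python
def accuracy_score(total_runs: int, incorrect_runs: int) -> float:
    """Accuracy = correct runs / total annotated runs, as a percentage.

    Illustrative helper mirroring the scoring described above;
    not an actual workflowai function.
    """
    if total_runs == 0:
        raise ValueError("score is undefined with no annotated runs")
    correct = total_runs - incorrect_runs
    return 100.0 * correct / total_runs

print(accuracy_score(1, 1))    # single run, marked incorrect -> 0.0
print(accuracy_score(20, 5))   # 20 runs, 5 incorrect -> 75.0
```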