Annotating Task Runs
Now, optimize your task by annotating its runs:
- To quickly access the task runs for a task from VSCode, type `workflowai task dashboard <task name>` (see the example after these steps).
- Select a run that does not have an example.
- Select Add as Example.
- If the output is correct, check the box and select Save as Example.
- If the output is incorrect, first adjust the output to what you would expect, then check the box next to it and select Save as Example.
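For example, assuming a task named `my-task` (a placeholder; substitute your own task name), the command would be:

```sh
# Open the run dashboard for the task from the VSCode terminal.
# "my-task" is a placeholder task name.
workflowai task dashboard my-task
```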
Once you’ve annotated your runs, you can move on to scoring the task (see below).
Annotating (also referred to as adding examples) is a way of recording whether the AI’s output is accurate.
Example scenario (cont’d): You want to indicate to the AI that an output arrival time of 30 minutes before a flight is not correct. When you locate the task run with the flight event, you add it as an example and adjust the output to 120 minutes (i.e., 2 hours).
5. Scoring a Task
To score a task, type `workflowai task score [task name]`. This will score the task using the default model, `gpt-4-1106-preview`. You will see a table that includes an accuracy percentage for your task in its current form, as well as the number of task runs that percentage was calculated from.
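For example, using the same placeholder task name as above:

```sh
# Score the task with the default model (gpt-4-1106-preview).
# "my-task" is a placeholder task name.
workflowai task score my-task
```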
It is recommended to start with generating and annotating 20 task runs. (See section 3.A for a tip on how to do this quickly!)
Example scenario (cont’d): Your task had one run, and its output was not correct. After annotating that run and marking it as incorrect, you get a score of 0%, since the only annotated run was incorrect.
If you ran the task 20 times, annotated all of the runs, and 5 of them were incorrect, the score would be 75% (15 correct out of 20 annotated runs).
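Based on the examples above, the score appears to be simply the share of annotated runs whose output was correct. A quick sanity check of the arithmetic:

```sh
# 20 annotated runs, 5 incorrect -> 15 correct
echo $(( 15 * 100 / 20 ))   # prints 75
```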