The thresholds
Automation lets you take the human out of the loop to achieve full Straight Through Processing (STP).
Automation has three main settings:
Automate everything: Mark all documents as STP so that a human is never in the loop
Threshold-based: Mark a document as STP if each field in the document has a confidence level higher than the associated confidence threshold
Automate nothing: Mark all documents as Flagged for Review to ensure a human is always in the loop
For threshold-based automation, confidence thresholds are key. These thresholds are set for each type of field. Identifying the right confidence threshold for each field individually can be cumbersome, because what you usually care about is the document-level trade-off between the error rate and the degree of automation.
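The three settings can be illustrated with a minimal sketch. This is not the product's implementation: the names `AutomationMode` and `is_stp` are hypothetical, confidences are assumed to lie between 0 and 1, and the handling of a field with no configured threshold is an assumption.

```python
from enum import Enum


class AutomationMode(Enum):
    # Hypothetical names for the three settings described above.
    AUTOMATE_EVERYTHING = "automate_everything"
    THRESHOLD_BASED = "threshold_based"
    AUTOMATE_NOTHING = "automate_nothing"


def is_stp(mode, confidences, thresholds):
    """Return True if a document would be marked STP, False if flagged for review.

    confidences: field name -> confidence level of the prediction
    thresholds:  field name -> confidence threshold for that field type
    """
    if mode is AutomationMode.AUTOMATE_EVERYTHING:
        return True
    if mode is AutomationMode.AUTOMATE_NOTHING:
        return False
    # Threshold-based: every field must be strictly above its threshold.
    # Assumption: a field without a configured threshold blocks automation.
    return all(
        field in thresholds and confidence > thresholds[field]
        for field, confidence in confidences.items()
    )
```

With thresholds of 0.9 for a `total` field and 0.5 for a `date` field, a document with confidences 0.95 and 0.4 would be flagged for review in threshold-based mode, because a single field below its threshold is enough to keep the human in the loop.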
Trade off error rate against automation
A dedicated chart in the automation pane lets you intuitively trade off the error rate against the degree of automation.
This chart is built from historical data and shows which trade-offs are possible given the performance of the Machine Learning pipeline and models at the time of the initial prediction. If needed, you can update past predictions with the latest ML pipeline and models through the sampling pane.
The chart is interactive: when you click any of the blue dots, the thresholds below are updated to match that dot.
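To make the trade-off concrete, here is a rough sketch of how such points could be derived from historical data. It rests on several assumptions: the function name `trade_off_curve` is hypothetical, a single threshold is applied to all fields for simplicity, and the error rate is taken to be the share of automated documents containing at least one incorrect field. The chart in the product may be computed differently.

```python
def trade_off_curve(history, candidate_thresholds):
    """Compute (threshold, automation_rate, error_rate) points.

    history: list of documents; each document is a list of
             (confidence, was_correct) pairs, one pair per field.
    """
    points = []
    for threshold in candidate_thresholds:
        # Documents where every field clears the threshold would have been STP.
        automated = [
            doc for doc in history
            if all(confidence > threshold for confidence, _ in doc)
        ]
        # An automated document counts as an error if any field was wrong.
        with_errors = [
            doc for doc in automated
            if not all(was_correct for _, was_correct in doc)
        ]
        automation_rate = len(automated) / len(history) if history else 0.0
        error_rate = len(with_errors) / len(automated) if automated else 0.0
        points.append((threshold, automation_rate, error_rate))
    return points
```

Raising the threshold generally lowers the automation rate and, if the confidence levels are informative, lowers the error rate among the documents that are still automated.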
Set individual confidence thresholds
You can also set individual confidence thresholds. This may be useful if a certain field is "nice-to-have" and you don't want it to adversely impact the automation degree even if the confidence level is low. Conversely, you may want to set the bar very high for certain fields if you really can't afford to make a mistake on that particular field.
You can immediately see what effect a given confidence threshold would have had on historical data (in terms of true/false positives/negatives).
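As an illustration, the counts for one field could be tallied as in the sketch below. The mapping is an assumption, not something the product documentation specifies: a "positive" is taken to be a prediction whose confidence is above the threshold (so it would be accepted without review), and the function name `confusion_counts` is hypothetical.

```python
def confusion_counts(predictions, threshold):
    """Tally historical predictions for one field against a threshold.

    predictions: list of (confidence, was_correct) pairs for that field.
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for confidence, was_correct in predictions:
        accepted = confidence > threshold
        if accepted and was_correct:
            counts["tp"] += 1  # accepted and right
        elif accepted:
            counts["fp"] += 1  # accepted but wrong: an error slips through
        elif was_correct:
            counts["fn"] += 1  # sent to review although it was right
        else:
            counts["tn"] += 1  # sent to review and indeed wrong
    return counts
```

Under this reading, false positives are the costly mistakes for a field where you cannot afford an error, while false negatives only cost review time.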
These thresholds can be tweaked for both fields and line items. You can therefore adjust threshold preferences for a specific line when extracting correct values matters more, or less, for that line than for the rest of the table.