Configuring predictor settings

Have you noticed that the majority of your files come from the same supplier? Do you know up front which values to expect for a specific field? Would you like to set a default prediction for one of the fields in the format? Are there any values that you want to block from being predicted?

If you answered yes to any of these questions, you might be interested in our predictor settings feature. Predictor settings let you give hints to the model per field through the parameters fallback, expected_values, whitelist, and blacklist, and through table_extraction_settings for line items.

  • fallback: values that you would like to set as default prediction if the model did not find any other prediction candidate

  • expected_values: values that you would like the model to look into first, as you expect that they might be the correct ones

  • whitelist: values that you allow the model to return

  • blacklist: values that you do not want the model to predict

  • table_extraction_settings: line item specific settings

    • field_settings: field specific settings, indexed by field name

      • expected_pattern: a regex pattern to look for, in case a table cell contains more than what you need to extract. Also helpful if you have uncommon patterns for typical data types, e.g. a custom date format, an unusually formatted amount or a custom unit.

      • override_default_patterns: indicates whether the expected patterns should take precedence over what the model predicted

      • nested_field_patterns: regex patterns, indexed by field name in case fields are nested inside each other

      • default_value: a default string value to extract for this field, in case nothing was found on a row

Payload example:

{
  "fallback": {},
  "expected_values": {},
  "whitelist": {},
  "blacklist": {},
  "table_extraction_settings": {
    "field_settings": {
      "field_name_1": {
        "expected_pattern": "string",
        "override_default_patterns": true,
        "default_value": "string",
        "nested_field_patterns": {
          "nested_field_name_1": "custom_regex_pattern_1",
          "nested_field_name_2": "custom_regex_pattern_1"
        }
      },
      "field_name_2": {
        "expected_pattern": "string",
        "override_default_patterns": true,
        "default_value": "string"
      }
    }
  }
}
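Because a misspelled key is easy to miss in a nested payload, a small local check can help before sending anything. The following is a minimal sketch in Python that only compares a payload against the key names shown in the example above; the helper name find_unknown_keys is hypothetical, and how the service itself reacts to unknown keys is not covered here.

```python
# Key names are copied from the payload example; this is a local typo check
# only and says nothing about what the service itself accepts.
TOP_LEVEL_KEYS = {
    "fallback",
    "expected_values",
    "whitelist",
    "blacklist",
    "table_extraction_settings",
}
FIELD_SETTING_KEYS = {
    "expected_pattern",
    "override_default_patterns",
    "default_value",
    "nested_field_patterns",
}


def find_unknown_keys(payload: dict) -> list:
    """Return dotted paths of keys that do not appear in the payload example."""
    unknown = [key for key in payload if key not in TOP_LEVEL_KEYS]
    table_settings = payload.get("table_extraction_settings", {})
    unknown += [
        f"table_extraction_settings.{key}"
        for key in table_settings
        if key != "field_settings"
    ]
    for field_name, settings in table_settings.get("field_settings", {}).items():
        unknown += [
            f"table_extraction_settings.field_settings.{field_name}.{key}"
            for key in settings
            if key not in FIELD_SETTING_KEYS
        ]
    return unknown


payload = {
    "fallback": {"currency": "EUR"},
    "table_extraction_settings": {
        "field_settings": {
            # Misspelled on purpose: the payload key is override_default_patterns
            "description": {"override_default_settings": True}
        }
    },
}
print(find_unknown_keys(payload))
```

Running this prints the dotted path of the misspelled key, which makes the typo visible before the payload is sent.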

Adding predictor settings for header fields

Let’s say for example you have the following information:

  • You process only invoices, and 70% of your invoices come from Supplier X and Supplier Y combined, whose VAT numbers you know: BE0123456789 and BE0987654321

    • Put them in expected_values so that you can increase recall (more true positives)

  • You know that you mostly receive invoices in euros, and you have noticed that the currency field is often left empty

    • Put EUR in fallback so that you can decrease false negatives

  • You sometimes get your own company VAT number (BE0111222333) as the predicted "sender_VAT", which is incorrect

    • Put it in blacklist so that you can decrease false positives

You would then send the following payload to PATCH /predictor_settings/{scope}, where {scope} is the inbox UUID or project UUID to which these predictor settings should apply:

{
  "fallback": {
    "currency": "EUR"
  },
  "expected_values": {
    "sender_VAT": ["BE0123456789", "BE0987654321"]
  },
  "whitelist": {},
  "blacklist": {
    "sender_VAT": "BE0111222333"
  }
}
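As an illustration, the request for this payload could be built as follows. This is a sketch using only the Python standard library; the base URL, the API key, the Bearer authorization header and the scope value are placeholders and assumptions, not values from this documentation. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with your real base URL, credentials and scope.
BASE_URL = "https://api.example.com"
API_KEY = "your-api-key"
SCOPE = "00000000-0000-0000-0000-000000000000"  # inbox UUID or project UUID

payload = {
    "fallback": {"currency": "EUR"},
    "expected_values": {"sender_VAT": ["BE0123456789", "BE0987654321"]},
    "whitelist": {},
    "blacklist": {"sender_VAT": "BE0111222333"},
}


def build_patch_request(base_url, scope, payload, api_key):
    """Build (but do not send) the PATCH request for the given scope."""
    return urllib.request.Request(
        url=f"{base_url}/predictor_settings/{scope}",
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; use whatever your account actually requires.
            "Authorization": f"Bearer {api_key}",
        },
    )


request = build_patch_request(BASE_URL, SCOPE, payload, API_KEY)
print(request.get_method(), request.full_url)

# To actually send the request:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```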

 

Adding predictor settings for line items

The same logic applies to line items. Let’s use the following examples:

  • You process invoices with tables

  • In the majority of your tables, you notice that the description contains the package size of the shipped item. You know that it is always a volume that looks like this: 20x30x40cm. You can add a matching pattern to nested_field_patterns so that the model knows the volume can be found inside the description.

{
  "table_extraction_settings": {
    "field_settings": {
      "description": {
        "nested_field_patterns": {
          "volume": "\\d+x\\d+x\\d+cm"
        }
      }
    }
  }
}
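It can be worth trying a pattern locally before adding it to the settings. The sketch below parses the payload above and applies the volume pattern to a sample description with Python's re module. The description text is made up for illustration, and it is an assumption that the service's regex engine treats this pattern the same way Python does. Note that the doubled backslashes are JSON escaping: after parsing, the pattern is \d+x\d+x\d+cm.

```python
import json
import re

# The payload exactly as it appears in the JSON body (raw string keeps the escapes).
raw_payload = r"""
{ "table_extraction_settings": { "field_settings": { "description": {
    "nested_field_patterns": { "volume": "\\d+x\\d+x\\d+cm" } } } } }
"""

settings = json.loads(raw_payload)
volume_pattern = settings["table_extraction_settings"]["field_settings"][
    "description"
]["nested_field_patterns"]["volume"]

# Hypothetical table cell content for the description field.
description = "Cardboard box 20x30x40cm, reinforced"

match = re.search(volume_pattern, description)
print(match.group(0) if match else None)  # prints 20x30x40cm
```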

 

 

 

Overwriting:

  • It is possible to leave out a parameter, for example sending “expected_pattern” and “default_value” but not “override_default_patterns” and “nested_field_patterns” in the payload. However, a parameter that is left out is treated as empty, so it overwrites the existing value for that parameter.

    • However, if “fallback” values are already set and you send a payload that contains only table extraction settings, the existing “fallback” values are not erased

  • It is therefore safest to treat PATCH as if it completely replaces everything: if you have existing predictor settings and only want to add some rules, first GET the current settings, then modify what you want to change, and PATCH the complete result
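The "modify" step of this GET, modify, PATCH workflow can be sketched as a recursive merge, so that nothing already configured is dropped from the payload you send back. This is an illustrative Python sketch: merge_settings is a hypothetical helper, and the current dictionary stands in for the body returned by a GET on the same scope, whose exact shape is assumed to match the payload examples.

```python
import copy


def merge_settings(current: dict, changes: dict) -> dict:
    """Return a copy of `current` with `changes` merged in recursively.

    Nested dicts are merged key by key; any other value in `changes`
    replaces the corresponding value in `current`.
    """
    merged = copy.deepcopy(current)
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Stand-in for the settings returned by GET (assumed shape).
current = {
    "fallback": {"currency": "EUR"},
    "table_extraction_settings": {
        "field_settings": {
            "description": {"expected_pattern": "string", "default_value": "string"}
        }
    },
}

# The only rule we want to add.
changes = {
    "table_extraction_settings": {
        "field_settings": {
            "description": {"nested_field_patterns": {"volume": r"\d+x\d+x\d+cm"}}
        }
    }
}

# Send this complete dictionary as the PATCH body.
full_payload = merge_settings(current, changes)
print(sorted(full_payload["table_extraction_settings"]["field_settings"]["description"]))
```

The merged payload still contains “expected_pattern” and “default_value” for the description field, so sending it does not blank them out.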

 

Notes:

  • Scope: If the scope is a project, the predictor settings are applied to all inboxes inside that project

 

Shortcomings: