Ever wanted to add a field that does not exist in our out-of-the-box models, and thought: "We don’t need a smart model for this, I just need to look for certain words"? For this use case we provide a predictor that you control through the predictor settings. First, add a tag field to your format. We chose the name email_coming_from with the tag options no_reply and info.
Next, add your rule as a regex in the predictor settings. On the Swagger page of your server you can find the endpoint /predictor_settings/{scope}. The scope is the inbox or project for which you want this predictor to run. Inside key_value_pairs::rule_config you can specify, per tag field, which regexes you want to match for a given field_name. Note that the field_name and the tag options have to match exactly (case sensitive); otherwise the prediction will be empty. We created these rules to match no-reply@contract.fit and info@contract.fit respectively:
{
  "key_value_pairs": {
    "rule_config": {
      "email_coming_from": {
        "no_reply": {
          "rules": [
            { "confidence": 97, "+rule": ["L:no-reply@contract.fit"] }
          ]
        },
        "info": {
          "rules": [
            { "confidence": 97, "+rule": ["L:info@contract.fit", "L:"] }
          ]
        }
      }
    }
  }
}
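If you maintain several tag fields, it can help to generate this body instead of writing it by hand. The following is a minimal Python sketch that only builds the JSON structure shown above. The helper name build_rule_config and its parameters are hypothetical, not part of the product, and sending the body to /predictor_settings/{scope} is left out because the HTTP method and authentication are not described here.

```python
import json


def build_rule_config(field_name, rules_per_option, confidence=97):
    """Build a predictor settings body for one tag field.

    Hypothetical helper: `rules_per_option` maps each tag option to the
    list of rule strings that should go into its "+rule" list.
    """
    options = {
        option: {"rules": [{"confidence": confidence, "+rule": list(patterns)}]}
        for option, patterns in rules_per_option.items()
    }
    return {"key_value_pairs": {"rule_config": {field_name: options}}}


# The field name and tag options must match the format exactly (case sensitive).
settings = build_rule_config(
    "email_coming_from",
    {
        "no_reply": ["L:no-reply@contract.fit"],
        "info": ["L:info@contract.fit"],
    },
)

print(json.dumps(settings, indent=2))
```

The printed JSON can be pasted into the request body on the Swagger page.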
Per tag option (the level of no_reply and info) you can specify a list of rules, each with its own confidence. The matching rule with the highest confidence is presented in the prediction. If no-reply@contract.fit appears anywhere in the uploaded document, a prediction with confidence 97 for the tag option no_reply is returned.
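To make the "highest confidence wins" behaviour concrete, here is a rough Python simulation of it. This is not the predictor's actual implementation: the function names are hypothetical, the part after the L: prefix is assumed to be an ordinary regex, empty patterns are skipped, and on equal confidence the first match is kept.

```python
import re


def strip_prefix(rule):
    # Assumption: "L:" is a marker in front of the pattern; the rest is
    # treated here as a plain regular expression.
    return rule[2:] if rule.startswith("L:") else rule


def predict_tag(text, options):
    """Return (tag_option, confidence) of the best matching rule, or None."""
    best = None
    for option, config in options.items():
        for rule in config["rules"]:
            patterns = [strip_prefix(p) for p in rule["+rule"]]
            # A rule matches when any of its non-empty patterns is found.
            matched = any(p and re.search(p, text) for p in patterns)
            # Strict ">" keeps the first match when confidences are equal.
            if matched and (best is None or rule["confidence"] > best[1]):
                best = (option, rule["confidence"])
    return best


options = {
    "no_reply": {"rules": [{"confidence": 97, "+rule": ["L:no-reply@contract.fit"]}]},
    "info": {"rules": [{"confidence": 97, "+rule": ["L:info@contract.fit"]}]},
}

print(predict_tag("From: no-reply@contract.fit", options))  # ('no_reply', 97)
print(predict_tag("No known address in here", options))     # None
```

When nothing matches, the sketch returns None, which corresponds to an empty prediction.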
Specifying the search space
The rule we have specified so far searches the whole document, which is far too broad. We only want to search the email field that holds the sender (the from address). For this we can use the where_to_search option, specified on the same level as +rule. There are 4 options to specify where and how to search, all of them optional:
preprocess_text: A bool specifying whether the text needs to be cleaned.
search_in: This field specifies which parts of the text we look in. There are 5 options, which can be combined. When left empty, all of them are searched:
email_from
email_to
email_subject
email_body
attachment
limits: This field specifies which limits we apply to our search space.
granularity: This field specifies what the granularity for a match should be.
In the case of our example the rule would now look like this:
{
  "confidence": 97,
  "+rule": ["L:no-reply@contract.fit"],
  "where_to_search": { "search_in": ["email_from"] }
}
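The effect of search_in can be pictured as picking a subset of the document's parts before any regex is applied. The Python sketch below illustrates that idea only: representing a document as a dict keyed by part name is an assumption made for the example, and select_search_space is a hypothetical name.

```python
ALL_PARTS = ["email_from", "email_to", "email_subject", "email_body", "attachment"]


def select_search_space(document, where_to_search=None):
    """Return only the document parts that a rule should look at."""
    # An empty or missing search_in means: look in all parts.
    search_in = (where_to_search or {}).get("search_in") or ALL_PARTS
    unknown = set(search_in) - set(ALL_PARTS)
    if unknown:
        raise ValueError(f"unknown search_in values: {sorted(unknown)}")
    return {part: document.get(part, "") for part in search_in}


# Hypothetical document representation, one string per part.
document = {
    "email_from": "no-reply@contract.fit",
    "email_to": "someone@example.com",
    "email_subject": "Your invoice",
    "email_body": "Questions? Mail info@contract.fit",
    "attachment": "",
}

restricted = select_search_space(document, {"search_in": ["email_from"]})
print(restricted)
```

With search_in set to email_from, the info@contract.fit address in the body is no longer part of the search space, so only the sender can trigger a match.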
Limits
You can apply different limits
Granularity
And-or-not logic
In some cases, matching plain regexes in specified places doesn’t accommodate your needs anymore.