...
For each tag field (the level of no_reply and info) you can specify a list of rules, each with its own confidence. The matching rule with the highest confidence is presented in the prediction. For example, if no-reply@contract.fit appears anywhere in the uploaded document, a prediction with confidence 97 for the tag_option no_reply is returned.
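The selection of the highest-confidence match can be sketched in a few lines of Python. This is only an illustration under assumptions: the flat `pattern` key, the second rule, and the function name `best_confidence` are hypothetical and are not part of the real rule format.

```python
import re

# Hypothetical, simplified rules: one regex and one confidence per rule.
RULES = [
    {"confidence": 60, "pattern": r"do[- ]not[- ]reply"},
    {"confidence": 97, "pattern": r"no-reply@contract\.fit"},
]

def best_confidence(rules, text):
    """Return the highest confidence among the matching rules, or None."""
    matched = [rule["confidence"] for rule in rules
               if re.search(rule["pattern"], text)]
    return max(matched) if matched else None
```

When both rules match the same document, only the confidence of 97 would be reported in this sketch.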
“+rule”
The simplest rule is specified with the +rule key. Its value is a list of strings, each prefixed with L: or D:. The list is concatenated into one regex.
...
D stands for definition and prefixes a variable.
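As a rough illustration of how such a list could be concatenated into one regex, consider the Python sketch below. It assumes that the text after L: is used as regex text unchanged, that a variable is itself a list of prefixed strings, and that a variable is wrapped in a non-capturing group. The name `build_regex` is hypothetical, and the real implementation may differ on all of these points.

```python
import re

def build_regex(parts, variables):
    """Concatenate L:/D: prefixed strings into one regex string."""
    pieces = []
    for part in parts:
        prefix, _, value = part.partition(":")
        if prefix == "L":
            # Assumption: the text after "L:" is regex text, used as-is.
            pieces.append(value)
        elif prefix == "D":
            # Assumption: a variable is a list of parts, grouped when inlined.
            pieces.append("(?:" + build_regex(variables[value], variables) + ")")
        else:
            raise ValueError("unknown prefix in " + repr(part))
    return "".join(pieces)

# Hypothetical variable name "phone", used only for this example.
variables = {"phone": ["L:\\+32\\d{9}"]}
pattern = build_regex(["L:call ", "D:phone"], variables)
```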
Specifying the search space
The rule specified so far applies to the whole document, which is far too broad. We only want to search inside the email field that holds the sender (the from field). For this we can use the where_to_search option, specified on the same level as +rule. There are 4 options for specifying where and how to search, all of them optional:
...
Code Block | ||
---|---|---|
| ||
{ "confidence": 97, "+rule": ["L:no-reply@contract.fit"], "where_to_search": {"search_in": ["email_from"]} } |
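The effect of search_in can be pictured with a small Python sketch. It assumes the document is available as a mapping from field names to text, which is an assumption made for illustration only; `rule_matches` is a hypothetical helper.

```python
import re

def rule_matches(pattern, document, search_in=None):
    """Search only the listed fields, or every field when none are listed."""
    fields = search_in if search_in else list(document)
    return any(re.search(pattern, document.get(name, "")) for name in fields)

# Hypothetical document: the address appears in the body, not in the sender.
document = {
    "email_from": "alice@example.com",
    "email_body": "Forwarded from no-reply@contract.fit",
}
```

Without search_in the rule fires on the body text. Restricted to email_from, it no longer does.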
Limits
You can apply different limits to the search space of your query.
...
Code Block | ||
---|---|---|
| ||
{ "confidence": 97, "+rule": ["L:no-reply@contract.fit"], "where_to_search": { "search_in": ["email_from"], "limits": { "characters": [[0,10], [-20]] } } } |
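One plausible reading of the characters limit is sketched below in Python. The interpretation is an assumption, not confirmed behaviour: a pair [start, end] is treated like a slice, and a single value [start] runs to the end of the text, so [-20] selects the last 20 characters. The name `apply_character_limits` is hypothetical.

```python
def apply_character_limits(text, limits):
    """Return the text windows selected by a list of character limits."""
    windows = []
    for limit in limits:
        start = limit[0]
        # Assumption: a missing end means "up to the end of the text".
        end = limit[1] if len(limit) > 1 else None
        windows.append(text[start:end])
    return windows

text = "abcdefghijklmnopqrstuvwxyz0123456789"
windows = apply_character_limits(text, [[0, 10], [-20]])
```

Under this reading, the rule would be searched only in the first 10 and the last 20 characters of the field.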
Granularity
The granularity allows you to specify in which blocks of text we want to search.
...
full (default if nothing is specified)
page
sentence
paragraph
line
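To make the granularity values concrete, the Python sketch below splits a text into blocks. The separators are assumptions chosen for illustration (form feed between pages, a blank line between paragraphs, a naive punctuation split for sentences). How the product actually detects these boundaries is not described here, and `split_blocks` is a hypothetical name.

```python
import re

def split_blocks(text, granularity="full"):
    """Split text into the blocks that a rule would be searched in."""
    if granularity == "full":
        return [text]
    if granularity == "page":
        parts = text.split("\f")  # assumption: form feed separates pages
    elif granularity == "paragraph":
        parts = re.split(r"\n\s*\n", text)  # assumption: blank line
    elif granularity == "line":
        parts = text.splitlines()
    elif granularity == "sentence":
        parts = re.split(r"(?<=[.!?])\s+", text)  # naive sentence split
    else:
        raise ValueError("unknown granularity: " + granularity)
    return [part.strip() for part in parts if part.strip()]

text = "Dear team. Please ignore.\nSent automatically.\n\nRegards"
```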
Other operators next to “+rule”
In some cases plain regexes, which already have and, or and not operators built in, can no longer accommodate your rules. For those cases you can combine different regexes with higher-level and, or and not operators, and extend these with low-level operators such as +rule. To specify a not operator, replace the + in front of the operator with a -.
...
Code Block | ||
---|---|---|
| ||
{ "confidence": 97, "+and": [ { "+rule": ["L:no-reply@contract.fit"], "where_to_search": { "search_in": ["email_from"], "limits": { "characters": [[0,10], [-20]] } } }, { "-rule": ["L:\\+32\\d{9}"], "where_to_search": { "search_in": ["email_body"] } } ] } |
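How nested operators and the - prefix could be evaluated is sketched below in Python. This is a simplified model under assumptions: only L: parts are handled, limits are left out, the document is a mapping from field names to text, and the +or operator is supported by analogy with +and. The function `evaluate` is hypothetical.

```python
import re

def evaluate(node, document):
    """Evaluate one rule node against a dict of field name -> text."""
    for key, operand in node.items():
        sign, name = key[:1], key[1:]
        if sign not in ("+", "-") or name not in ("rule", "and", "or"):
            continue  # skip keys such as "confidence" and "where_to_search"
        if name == "rule":
            # Assumption: only "L:" parts, joined into one regex.
            pattern = "".join(part[2:] for part in operand)
            where = node.get("where_to_search", {})
            fields = where.get("search_in") or list(document)
            result = any(re.search(pattern, document.get(field, ""))
                         for field in fields)
        elif name == "and":
            result = all(evaluate(child, document) for child in operand)
        else:
            result = any(evaluate(child, document) for child in operand)
        # A leading "-" negates the outcome of the operator.
        return result if sign == "+" else not result
    raise ValueError("node contains no operator")

RULE = {
    "confidence": 97,
    "+and": [
        {"+rule": ["L:no-reply@contract.fit"],
         "where_to_search": {"search_in": ["email_from"]}},
        {"-rule": ["L:\\+32\\d{9}"],
         "where_to_search": {"search_in": ["email_body"]}},
    ],
}
```

In this model the rule holds when the sender matches and the body contains no Belgian phone number.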
Lemma
This is a list of strings that are contained in the block of text selected by the granularity. The comparison uses the lemmatised version of the text, not the original text.
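The idea of matching on lemmas instead of the original words can be illustrated with a toy Python sketch. The lookup table stands in for a real lemmatiser and is purely hypothetical, as is the function `found_lemmas`. Whether all strings or only some of them must be present is not specified here, so the sketch only reports which ones are found.

```python
import re

# Toy lemma table, for illustration only; a real lemmatiser is far richer.
LEMMAS = {"replies": "reply", "replied": "reply", "replying": "reply",
          "invoices": "invoice"}

def lemmatise(text):
    """Lowercase the text, split it into words, and map each word to a lemma."""
    return [LEMMAS.get(word, word) for word in re.findall(r"[a-z]+", text.lower())]

def found_lemmas(wanted, block):
    """Return the wanted strings that occur among the lemmas of the block."""
    lemmas = set(lemmatise(block))
    return [item for item in wanted if item in lemmas]
```

For example, the string reply is found in a block that only contains the word replied.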
...
Note |
---|
IMPORTANT: Regex is quite a bit more efficient than the and/or operators. Try to use regexes as much as possible. |
Info |
---|
Note that when you combine operators, the where_to_search option is passed down to the nested operators. If a nested operator defines its own where_to_search, that one is used instead. This way you can:
|
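The pass-down behaviour of where_to_search can be sketched as follows in Python. One assumption is made loudly here: a where_to_search found on a lower level replaces the inherited one as a whole instead of being merged with it. The function `resolve_search_spaces` is a hypothetical helper.

```python
def resolve_search_spaces(node, inherited=None):
    """List (operator, effective where_to_search) for every leaf rule."""
    # Assumption: a lower-level where_to_search replaces the inherited one.
    effective = node.get("where_to_search", inherited)
    resolved = []
    for key, operand in node.items():
        if key in ("+rule", "-rule"):
            resolved.append((key, effective))
        elif key in ("+and", "-and", "+or", "-or"):
            for child in operand:
                resolved.extend(resolve_search_spaces(child, effective))
    return resolved

RULE = {
    "confidence": 97,
    "where_to_search": {"search_in": ["email_from"]},
    "+and": [
        {"+rule": ["L:no-reply@contract.fit"]},
        {"-rule": ["L:\\+32\\d{9}"],
         "where_to_search": {"search_in": ["email_body"]}},
    ],
}
resolved = resolve_search_spaces(RULE)
```

Here the first leaf inherits the search space of its parent, while the second leaf uses its own.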
Variables
When writing rules you may find that you have added the same regex to many rules. Variables avoid this duplication. Specify them in a dictionary at the highest level and reference them in your rules with the prefix D:. Reusing the telephone number regex, the full dictionary now looks like this:
Code Block | ||
---|---|---|
| ||
{ "no_reply": { "variables": { "var1": ["L:\\+32\\d{9}"] }, "rules": [ { "confidence": 97, "+and": [ { "+rule": ["L:no-reply@contract.fit"], "where_to_search": { "search_in": ["email_from"], "limits": { "characters": [[0,10], [-20]] } } }, { "-rule": ["D:var1"], "where_to_search": { "search_in": ["email_body"] } } ] } ] } } |
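Before uploading such a dictionary it can help to check that every D: reference points at a defined variable. The Python sketch below is a hypothetical helper written for this page, not a feature of the product. It assumes that variables live under the variables key and rules under the rules key, as in the example above.

```python
def referenced_variables(node):
    """Collect every name used with the D: prefix inside a nested structure."""
    names = set()
    if isinstance(node, dict):
        for value in node.values():
            names |= referenced_variables(value)
    elif isinstance(node, list):
        for item in node:
            names |= referenced_variables(item)
    elif isinstance(node, str) and node.startswith("D:"):
        names.add(node[2:])
    return names

def undefined_variables(tag_config):
    """Return variable names that are referenced but never defined."""
    variables = tag_config.get("variables", {})
    used = referenced_variables(tag_config.get("rules", []))
    used |= referenced_variables(variables)
    return used - set(variables)

CONFIG = {
    "variables": {"var1": ["L:\\+32\\d{9}"]},
    "rules": [{
        "confidence": 97,
        "+and": [
            {"+rule": ["L:no-reply@contract.fit"],
             "where_to_search": {"search_in": ["email_from"]}},
            {"-rule": ["D:var1"],
             "where_to_search": {"search_in": ["email_body"]}},
        ],
    }],
}
```

An empty result means that every reference resolves. A misspelled variable name would show up in the returned set.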
FAQ
Expand | |||||
---|---|---|---|---|---|
| |||||
By adding the option where_to_search::search_in to your rule. An example field would look like this:
|
...