...
Specifying the search space
The rule we have specified so far applies to the whole document. The search space is the area of the document our rules apply to, i.e. where they try to find a matching result. For our first example above (“is the sender no-reply@contract.fit or info@contract.fit?”), the whole document is far too broad: we only want to search inside the email field that specifies the sender.
For this we can use the where_to_search option, specified on the same level as the +rule. It contains 4 options that control where and how to search; all of them are optional:
preprocess_text: This is a boolean specifying whether the text needs to be cleaned first (i.e. noise such as unexpected characters, e.g. multiple dashes, is removed).
search_in (for email): This field specifies which parts of the email we look in. There are 5 options, which can be combined. When left empty, we look in all of them:
email_from
email_to
email_subject
email_body
attachment
limits: This field specifies which limits we apply to our search space. See below.
granularity: This field specifies what the granularity for a match should be. See below.
Search_in
In the case of our example, we only want to look in the email_from, so we add the following:
{
  "confidence": 97,
  "+rule": ["L:no-reply@contract.fit"],
  "where_to_search": {"search_in": ["email_from"]}
}
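To make the effect of search_in concrete, here is a minimal Python sketch of how a restricted search space could be assembled before a rule is applied. It is an illustration only, not the product's implementation: the helpers build_search_space and sender_rule_matches and the dictionary layout of the email are assumptions, and the address is matched as a plain literal rather than through the rule syntax.

```python
import re

# The five email parts listed above.
EMAIL_PARTS = ["email_from", "email_to", "email_subject", "email_body", "attachment"]


def build_search_space(email, search_in=None):
    """Hypothetical helper: join the text of the selected email parts."""
    # An empty or missing search_in means "look in all parts".
    parts = search_in or EMAIL_PARTS
    unknown = [p for p in parts if p not in EMAIL_PARTS]
    if unknown:
        raise ValueError(f"unknown search_in option(s): {unknown}")
    return "\n".join(email.get(p, "") for p in parts)


def sender_rule_matches(email, where_to_search):
    """Look for the literal address inside the restricted search space only."""
    text = build_search_space(email, where_to_search.get("search_in"))
    return re.search(re.escape("no-reply@contract.fit"), text) is not None


# The address appears in the body, but the sender is someone else.
email = {
    "email_from": "billing@example.com",
    "email_body": "Please do not answer to no-reply@contract.fit.",
}

restricted = sender_rule_matches(email, {"search_in": ["email_from"]})
unrestricted = sender_rule_matches(email, {})
print(restricted, unrestricted)
```

Without the restriction, the mention in the body would wrongly count as a sender match, which is the false positive that search_in is meant to prevent.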
...
document_types: list of document types you want to search in; these can be combined.
pages: list of slices to specify which pages you want to search in.
lines: list of slices to specify which lines you want to search in.
characters: list of slices to specify which characters you want to search in.
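The exact slice syntax used in the configuration is not shown here, so the following Python sketch uses Python's own slice objects purely to illustrate the idea of limiting a search to a list of slices. The helper apply_slices and the sample pages are assumptions.

```python
def apply_slices(items, slices):
    """Keep the items selected by at least one slice, in their original order."""
    selected = set()
    for s in slices:
        # slice.indices() resolves negative and open-ended bounds for this length.
        selected.update(range(*s.indices(len(items))))
    return [items[i] for i in sorted(selected)]


pages = ["cover page", "invoice details", "terms", "appendix"]

# Hypothetical limit: only the first and the last page.
limits = {"pages": [slice(0, 1), slice(-1, None)]}

searched_pages = apply_slices(pages, limits["pages"])
print(searched_pages)
```

The same helper would work for lines or characters, since each limit is just a list of slices over a sequence.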
...
full (default if nothing is specified)
page
sentence
paragraph
line
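One way to read granularity is that a match has to fall entirely within a single unit of the chosen size. The Python sketch below illustrates that reading; it is an assumption about the semantics, and the splitting rules (form feed for pages, blank line for paragraphs, a naive punctuation split for sentences) are simplifications chosen for the example.

```python
import re


def split_units(text, granularity="full"):
    """Split text into the units a single match is allowed to span."""
    if granularity == "full":
        return [text]
    if granularity == "page":
        return text.split("\f")  # assumes pages are separated by form feeds
    if granularity == "paragraph":
        return re.split(r"\n\s*\n", text)
    if granularity == "line":
        return text.splitlines()
    if granularity == "sentence":
        return re.split(r"(?<=[.!?])\s+", text)  # naive sentence split
    raise ValueError(f"unknown granularity: {granularity}")


def matches(pattern, text, granularity="full"):
    """True if the pattern matches inside at least one unit."""
    return any(re.search(pattern, unit) for unit in split_units(text, granularity))


text = "Invoice number:\n12345\n\nTotal due: 100 EUR"
pattern = r"Invoice number:\s*\d+"

full_match = matches(pattern, text, "full")
paragraph_match = matches(pattern, text, "paragraph")
line_match = matches(pattern, text, "line")
print(full_match, paragraph_match, line_match)
```

Here the label and the number sit on different lines, so the pattern matches at full and paragraph granularity but not at line granularity.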
...
Logical additions to “+rule”
In some cases, matching plain regexes (which already have and, or and not operators built in) no longer accommodates your rules. For those cases you can combine different regexes with higher-level and, or and not operators, and extend these with low-level operators like the +rule. To specify a not operator, replace the + in front of your operator with a -.
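As a rough illustration of the + and - prefixes, the Python sketch below evaluates a rule that has both a +rule and a -rule. The semantics are assumptions made for the example: at least one +rule pattern must match, no -rule pattern may match, and the patterns are treated as plain regexes. The function evaluate is hypothetical.

```python
import re


def evaluate(rule, text):
    """Assumed semantics: some '+rule' pattern matches and no '-rule' pattern does."""
    positives = rule.get("+rule", [])
    negatives = rule.get("-rule", [])
    has_positive = any(re.search(p, text) for p in positives)
    has_negative = any(re.search(n, text) for n in negatives)
    return has_positive and not has_negative


# Hypothetical rule: an invoice, but not a credit note.
rule = {"+rule": [r"invoice"], "-rule": [r"credit note"]}

plain_invoice = evaluate(rule, "invoice 2024-001")
credit_note = evaluate(rule, "credit note for invoice 2024-001")
print(plain_invoice, credit_note)
```

Expressing the exclusion as a separate -rule keeps each regex short, instead of folding a negative lookahead into the positive pattern.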
...