...

Specifying the search space

The search space is the area of the document our rules apply to, i.e. where they try to find a match. In our first example above, the rules applied to the whole document. This is too broad for this use case (“is the sender no-reply@contract.fit or info@contract.fit?”). In this case we only want to search inside the email field that specifies the sender.

For this we can use the where_to_search option, specified on the same level as the +rule.

It contains 4 options, all of which are optional:

  1. preprocess_text: A boolean specifying whether the text needs to be cleaned (i.e. removal of noise such as unexpected characters, e.g. multiple dashes).

  2. search_in (for email): This field specifies which parts of the email we look in. There are 5 options, which can be combined. When left empty, we look in all of them:

    1. email_from

    2. email_to

    3. email_subject

    4. email_body

    5. attachment

  3. limits: This field specifies which limits we apply to our search space. See below.

  4. granularity: This field specifies what the granularity for a match should be. See below.

Search_in

In the case of our example, we only want to look in email_from, so we add the following:

{
  "confidence": 97,
  "+rule": ["L:no-reply@contract.fit"],
  "where_to_search": {"search_in": ["email_from"]}
}
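As a minimal sketch, the snippet below builds the same rule as a Python dictionary and checks the search_in values against the five options listed on this page before serializing it to JSON. The helper name build_rule is hypothetical and not part of the product. Only the keys confidence, +rule, where_to_search and search_in come from this page.

```python
import json

# The five search_in options listed on this page.
SEARCH_IN_OPTIONS = {
    "email_from",
    "email_to",
    "email_subject",
    "email_body",
    "attachment",
}


def build_rule(patterns, confidence, search_in=None):
    """Hypothetical helper: assemble a rule dict with an optional search space."""
    rule = {"confidence": confidence, "+rule": list(patterns)}
    if search_in:
        unknown = set(search_in) - SEARCH_IN_OPTIONS
        if unknown:
            raise ValueError(f"unknown search_in option(s): {sorted(unknown)}")
        # where_to_search sits on the same level as +rule.
        rule["where_to_search"] = {"search_in": list(search_in)}
    return rule


rule = build_rule(
    ["L:no-reply@contract.fit"],
    confidence=97,
    search_in=["email_from"],
)
print(json.dumps(rule, indent=2))
```

Validating the option names up front catches typos such as email_form before the rule is submitted. How the service itself reacts to an unknown value is not covered here.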

...

  1. document_types: list of document types you want to search in; these can be combined.

  2. pages: list of slices to specify which pages you want to search in.

  3. lines: list of slices to specify which lines you want to search in.

  4. characters: list of slices to specify which characters you want to search in.

...

  1. full (default if nothing is specified)

  2. page

  3. sentence

  4. paragraph

  5. line
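One way to read granularity is as the unit of text within which a match has to occur. The sketch below illustrates that reading in plain Python. It is an assumption about the semantics, not the product's implementation. The functions split_units and matches are hypothetical, pages are assumed to be separated by form feed characters, and the sentence splitter is deliberately naive.

```python
import re


def split_units(text, granularity="full"):
    """Split text into the units a match is evaluated against (illustrative only)."""
    if granularity == "full":
        return [text]
    if granularity == "page":
        # Assumption: pages are separated by form feed characters.
        return text.split("\f")
    if granularity == "paragraph":
        return [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    if granularity == "line":
        return [line for line in text.splitlines() if line.strip()]
    if granularity == "sentence":
        # Naive splitter: break after ., ! or ? followed by whitespace.
        return [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    raise ValueError(f"unknown granularity: {granularity}")


def matches(text, terms, granularity="full"):
    """True if all terms occur together inside at least one unit."""
    units = split_units(text, granularity)
    return any(all(term in unit for term in terms) for unit in units)


text = "Invoice number 123.\nPlease pay before Friday.\n\nKind regards,\nContract.fit"
terms = ["Invoice", "Friday"]

for level in ["full", "page", "paragraph", "sentence", "line"]:
    print(level, matches(text, terms, level))
```

Under this reading, the two terms match at full, page and paragraph granularity, but not at sentence or line granularity, because they never appear in the same sentence or on the same line.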

...

Logical additions to “+rule”

In some cases, matching with regexes alone (which already have and, or and not operators built in) is no longer enough to express your rules. For these cases you can combine different regexes with higher-level and, or and not operators, and extend them with low-level operators like the +rule. You specify a not operator by replacing the + in front of your operator with a -.
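The idea of combining regexes with higher-level and, or and not operators can be illustrated in plain Python. This sketch does not use the rule syntax of this page. The helpers has, all_of, any_of and negate are hypothetical names chosen for the illustration, and the "out of office" pattern is an invented example.

```python
import re


def has(pattern):
    """Return a predicate that is true when the regex is found in the text."""
    compiled = re.compile(pattern)
    return lambda text: compiled.search(text) is not None


def all_of(*predicates):
    """Higher-level 'and': every predicate must hold."""
    return lambda text: all(p(text) for p in predicates)


def any_of(*predicates):
    """Higher-level 'or': at least one predicate must hold."""
    return lambda text: any(p(text) for p in predicates)


def negate(predicate):
    """Higher-level 'not': invert a predicate (the role of the - prefix)."""
    return lambda text: not predicate(text)


# Known sender, but not an automatic out-of-office reply.
is_known_sender = any_of(
    has(r"no-reply@contract\.fit"),
    has(r"info@contract\.fit"),
)
check = all_of(is_known_sender, negate(has(r"(?i)out of office")))

print(check("From: info@contract.fit\nSubject: Invoice"))
print(check("From: info@contract.fit\nSubject: Out of office"))
```

Composing small predicates this way keeps each regex simple and moves the and-or-not logic to a level where it is easier to read than a single large pattern.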

...