...

You can apply different limits to the place you are searching in. For this we use the notion of a Python slice. To select the part of the full object you want, you specify a list of slices. The syntax of a slice is as follows:

Code Block
[start, stop]    # items from start through stop-1
[-start, -stop]  # items from start (counting from end) through stop-1 (counting from end)
[start]          # items from start through end (only allowed for the last slice)
[-start]         # items from start (counting from end) through end (only allowed for the last slice)
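The slice forms above behave like ordinary Python slices. The following is a minimal sketch of that behaviour, not the product's implementation: the function name `apply_limits` is hypothetical, and the check on open-ended slices simply mirrors the "only allowed for the last slice" remark.

```python
def apply_limits(items, limits):
    """Return the parts of `items` selected by a list of slices.

    Each slice is [start, stop] or [start]; negative values count from
    the end, as in a Python slice. Hypothetical helper for illustration.
    """
    parts = []
    for position, limit in enumerate(limits):
        is_last = position == len(limits) - 1
        if len(limit) == 1:
            # Open-ended form: from start through the end of the object.
            if not is_last:
                raise ValueError("an open-ended slice is only allowed as the last slice")
            parts.append(items[limit[0]:])
        else:
            start, stop = limit
            parts.append(items[start:stop])
    return parts


# First 5 and last 3 characters of a sender address.
print(apply_limits("no-reply@contract.fit", [[0, 5], [-3]]))  # ['no-re', 'fit']
```

The same helper works on a list of pages or lines, because Python slicing applies to any sequence.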

These are the options in limits:

  1. document_types: list of document types which can be combined.

  2. pages: list of slices to specify which pages you want to search in.

  3. lines: list of slices to specify which lines you want to search in.

  4. characters: list of slices to specify which characters you want to search in.

If we go back to our example and only want to look in the first 10 and last 20 characters of email_from, our rule would now look like this:

Code Block
languagejson
{
  "confidence": 97,
  "+rule": ["L:no-reply@contract.fit"],
  "where_to_search": {
    "search_in": ["email_from"],
    "limits": [[0,10], [-20]]
  }
}

Granularity

TODO

Other operators next to “+rule”

In some cases, matching plain regexes in specified places no longer accommodates your needs, even though regexes already have and, or and not operators built in. For those cases you can combine different regexes with higher-level and/or/not operators, and nest low-level operators such as +rule inside them. To negate an operator, replace the + in front of it with a -.

This is the full list of operators with a dash between the higher level and low level operators:

  • +and: list of higher or lower level operators

  • -and: not version of and

  • +or

  • -or: not version of or

...

  • +rule

  • -rule: not version of rule

  • +lemma

  • -lemma: not version of lemma

Let’s say we want to match emails coming from no-reply@contract.fit only if the body does not contain a phone number. Our rule would now look like this:

Code Block
languagejson
{
  "confidence": 97,
  "+and": [
    {
      "+rule": ["L:no-reply@contract.fit"],
      "where_to_search": {
        "search_in": ["email_from"],
        "limits": [[0,10], [-20]]
      }
    },
    {
      "-rule": ["L:\\+32\\d{9}"],
      "where_to_search": {
        "search_in": ["email_body"]
      }
    }
  ]
}
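To make the semantics of the combined rule above concrete, here is a minimal Python sketch of how such a tree could be evaluated. It is an illustration under stated assumptions, not the actual engine: the function names are hypothetical, the `L:` prefix is stripped without being interpreted, a `+rule` with several regexes is assumed to match when any one of them matches, and `limits` are ignored.

```python
import re


def strip_prefix(pattern):
    # Assumption: drop a leading "L:" marker without interpreting it.
    return pattern[2:] if pattern.startswith("L:") else pattern


def matches(node, document):
    """Evaluate one operator node against a dict of document fields."""
    for key, value in node.items():
        if key in ("+rule", "-rule"):
            fields = node["where_to_search"]["search_in"]
            text = "\n".join(document.get(field, "") for field in fields)
            found = any(re.search(strip_prefix(p), text) for p in value)
        elif key in ("+and", "-and"):
            found = all(matches(child, document) for child in value)
        elif key in ("+or", "-or"):
            found = any(matches(child, document) for child in value)
        else:
            continue  # not an operator, e.g. "confidence" or "where_to_search"
        # A leading "-" negates the result of the operator.
        return found if key.startswith("+") else not found
    raise ValueError("node has no operator")


rule = {
    "confidence": 97,
    "+and": [
        {"+rule": ["L:no-reply@contract.fit"],
         "where_to_search": {"search_in": ["email_from"]}},
        {"-rule": [r"L:\+32\d{9}"],
         "where_to_search": {"search_in": ["email_body"]}},
    ],
}

mail = {"email_from": "no-reply@contract.fit", "email_body": "See attachment."}
mail_with_phone = {"email_from": "no-reply@contract.fit",
                   "email_body": "Call +32471234567"}

print(matches(rule, mail))             # True
print(matches(rule, mail_with_phone))  # False
```

The second mail is rejected because the `-rule` branch turns the phone number match into a failure, which makes the enclosing `+and` fail.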

Note

IMPORTANT: A single regex is quite a bit more efficient than the and/or operators. Express as much of your rule as possible in the regex itself.

Info

Note that when you nest operators, the where_to_search is passed down to the lower-level operators. If a lower-level operator defines its own where_to_search, that one is used instead.

This way you can:

  • Specify a granularity that applies to different and/or rules

  • Limit the search space for different and/or rules without having to define the where_to_search multiple times
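The pass-down behaviour described above can be sketched in Python as follows. This is only an illustration: `effective_scopes` is a hypothetical helper, and it assumes that a lower-level where_to_search replaces the inherited one entirely rather than being merged with it.

```python
HIGH_LEVEL = ("+and", "-and", "+or", "-or")
LOW_LEVEL = ("+rule", "-rule", "+lemma", "-lemma")


def effective_scopes(node, inherited=None):
    """Yield (operator, where_to_search) for every low-level operator."""
    # A where_to_search on this level wins over the inherited one.
    scope = node.get("where_to_search", inherited)
    for key, value in node.items():
        if key in HIGH_LEVEL:
            for child in value:
                yield from effective_scopes(child, scope)
        elif key in LOW_LEVEL:
            yield key, scope


rule = {
    "confidence": 97,
    "where_to_search": {"search_in": ["email_body"]},
    "+and": [
        {"+rule": ["invoice"]},
        {"-rule": ["reminder"],
         "where_to_search": {"search_in": ["email_subject"]}},
    ],
}

for operator, scope in effective_scopes(rule):
    print(operator, scope["search_in"])
# +rule ['email_body']
# -rule ['email_subject']
```

The first child inherits the top-level search space, while the second keeps its own.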

Lemmatisation

FAQ

How can I only look in the email subject for my regex?

Add the option where_to_search::search_in to your rule. An example field would look like this:

Code Block
languagejson
"rules": [
  {
      "confidence": 97,
      "+rule": ["L:no-reply@contract.fit"],
      "where_to_search":
        {
          "search_in": ["email_subject"]
        }
  }
]

...