...

Per tag field (the level of no_reply and info) you can specify a list of rules, each with its own confidence. The matching rule with the highest confidence is presented in the prediction. For example, if no-reply@contract.fit appears anywhere in the uploaded document, a prediction with confidence 97 for the tag_option no_reply is returned.
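
The selection of the highest-confidence match can be sketched in Python as follows. This is only an illustration, not the actual implementation: the `pattern` key is a simplified, hypothetical stand-in for a rule, and `best_prediction` is a made-up helper name.

```python
import re
from typing import Dict, List, Optional

# Hypothetical, simplified rules: "pattern" stands in for the real rule syntax.
rules = [
    {"confidence": 97, "pattern": r"no-reply@contract\.fit", "tag_option": "no_reply"},
    {"confidence": 60, "pattern": r"no-reply@", "tag_option": "no_reply"},
]

def best_prediction(text: str, rules: List[Dict]) -> Optional[Dict]:
    """Return the matching rule with the highest confidence, or None if no rule matches."""
    matched = [rule for rule in rules if re.search(rule["pattern"], text)]
    if not matched:
        return None
    best = max(matched, key=lambda rule: rule["confidence"])
    return {"tag_option": best["tag_option"], "confidence": best["confidence"]}

prediction = best_prediction("From: no-reply@contract.fit", rules)
```

Both sample rules match the text here, and the one with confidence 97 is the one reported.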

“+rule”

incl L & D clarification

Specifying the search space

...

You can apply different limits to narrow the search space of your query. For this we use the notion of a Python slice.

These are the 4 options in limits:

  1. document_types: list of document types which can be combined.

  2. pages: list of slices to specify which pages you want to search in.

  3. lines: list of slices to specify which lines you want to search in.

  4. characters: list of slices to specify which characters you want to search in.

For options 2, 3 and 4, you specify a list of slices to select the part of the full object you want. Slices follow the conventions of a Python slice: each one needs a start and a stop value, except the last slice, which may omit the stop value. The syntax of a slice is as follows:

Code Block
[start, stop]    # items from start through stop-1
[-start, -stop]  # items from start (counting from end) through stop-1 (counting from end)
[start]          # items from start through end (only allowed for the last slice)
[-start]         # items from start (counting from end) through end (only allowed for the last slice)
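
To make the slice notation concrete, here is a minimal Python sketch that maps each entry onto a built-in `slice`. The helper names `to_slice` and `select` are hypothetical, and how overlapping slices are handled is an assumption (they are simply collected in order here).

```python
from typing import List, Sequence

def to_slice(spec: Sequence[int]) -> slice:
    """Convert [start, stop] or [start] into a Python slice (hypothetical helper)."""
    if len(spec) == 2:
        return slice(spec[0], spec[1])
    if len(spec) == 1:
        return slice(spec[0], None)  # open-ended: from start through the end
    raise ValueError("a slice needs one or two values")

def select(items: List[str], specs: Sequence[Sequence[int]]) -> List[str]:
    """Collect the items covered by each slice, in the order the slices are given."""
    selected: List[str] = []
    for position, spec in enumerate(specs):
        is_last = position == len(specs) - 1
        if len(spec) == 1 and not is_last:
            raise ValueError("an open-ended slice is only allowed as the last slice")
        selected.extend(items[to_slice(spec)])
    return selected

lines = [f"line {i}" for i in range(10)]
first_two = select(lines, [[0, 2]])             # lines 0 and 1
first_and_last = select(lines, [[0, 2], [-3]])  # lines 0, 1 and the last three
near_end = select(lines, [[-4, -2]])            # lines 6 and 7
```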

E.g. imagine that we only want to look in the first 10 and last 20 characters of the email_from. In this case our rule would look as follows:

Code Block
{
  "confidence": 97,
  "+rule": ["L:no-reply@contract.fit"],
  "where_to_search": {
    "search_in": ["email_from"],
    "limits": {
      "characters": [[0,10], [-20]]
    }
  }
}
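
To see which characters such limits select, the following Python sketch applies the `characters` limits of the rule above to a sample value. The sample `email_from` string and the helper `limited_text` are hypothetical, and whether the windows are searched separately or together is not specified here.

```python
from typing import List, Sequence

rule = {
    "confidence": 97,
    "+rule": ["L:no-reply@contract.fit"],
    "where_to_search": {
        "search_in": ["email_from"],
        "limits": {"characters": [[0, 10], [-20]]},
    },
}

def limited_text(text: str, specs: Sequence[Sequence[int]]) -> List[str]:
    """Return the part of the text covered by each slice (hypothetical helper)."""
    windows = []
    for spec in specs:
        start = spec[0]
        stop = spec[1] if len(spec) > 1 else None  # [start] runs to the end
        windows.append(text[start:stop])
    return windows

# Hypothetical sample value for the email_from field.
email_from = "Contract.fit <no-reply@contract.fit>"
windows = limited_text(email_from, rule["where_to_search"]["limits"]["characters"])
```

For this sample value the windows are `Contract.f` and `-reply@contract.fit>`. Neither contains the full address, so limits have to be chosen wide enough for the text you expect to find.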

...

  • +rule: list of strings starting with L: or D:. When evaluating, they are combined into one regex.

  • -rule: the negated version of +rule.

Let’s say we want to match the emails coming from no-reply@contract.fit only if the body does not contain a phone number. Our rule would now look like this:

...

This is a list of strings that are contained in the granularity. Here we don’t look at the original text, but the lemmatised version of the text.

  • +lemma: list of strings which are contained in the lemma (add link here)

  • -lemma: the negated version of +lemma
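
The idea of matching on lemmas rather than on the original text can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: the tiny `TOY_LEMMAS` table stands in for a real lemmatiser, and the sketch assumes that every +lemma string must be present and no -lemma string may be present.

```python
from typing import Sequence

# Toy lemma table (hypothetical); a real lemmatiser would cover the whole language.
TOY_LEMMAS = {"invoices": "invoice", "were": "be", "paid": "pay"}

def lemmatise(text: str) -> str:
    """Replace each lowercased word by its lemma when the toy table knows it."""
    return " ".join(TOY_LEMMAS.get(word, word) for word in text.lower().split())

def lemma_matches(text: str,
                  plus_lemma: Sequence[str] = (),
                  minus_lemma: Sequence[str] = ()) -> bool:
    """Assumed semantics: all +lemma strings present, no -lemma string present."""
    lemmas = lemmatise(text)
    has_all = all(item in lemmas for item in plus_lemma)
    has_forbidden = any(item in lemmas for item in minus_lemma)
    return has_all and not has_forbidden

text = "The invoices were paid"
lemmatised = lemmatise(text)                               # "the invoice be pay"
matches_pay = lemma_matches(text, plus_lemma=["pay"])      # matches the lemma
matches_paid = lemma_matches(text, plus_lemma=["paid"])    # surface form is not a lemma
```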

TODO: add example

Note

IMPORTANT: Regex is quite a bit more efficient than the and/or operators. Try to use regexes as much as possible.

Info

Note that when you nest operators, the where_to_search is passed down to the lower levels. If a lower level defines its own where_to_search, that one is used instead.

This way you can:

  • Specify a granularity that applies to different and/or rules

  • Limit the search space for different and/or rules without having to define the where_to_search multiple times
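
The pass-down behaviour can be sketched in Python as a small recursive walk. The nesting under `and` / `or` keys, the `email_body` field name, and the helper `resolve` are assumptions made for illustration; the sketch also assumes that a lower-level where_to_search replaces the inherited one entirely rather than being merged with it.

```python
from typing import Dict, List, Optional, Tuple

def resolve(node: Dict, inherited: Optional[Dict] = None) -> List[Tuple[List[str], Optional[Dict]]]:
    """Return (rule, effective where_to_search) for every leaf rule in the tree."""
    # A where_to_search on this level wins over the inherited one.
    effective = node.get("where_to_search", inherited)
    children = node.get("and") or node.get("or")
    if children is None:
        return [(node["+rule"], effective)]
    resolved = []
    for child in children:
        resolved.extend(resolve(child, effective))
    return resolved

# Hypothetical nested rule: the outer level sets a default search space.
tree = {
    "where_to_search": {"search_in": ["email_body"]},
    "and": [
        {"+rule": ["L:a"]},  # inherits the outer where_to_search
        {"+rule": ["L:b"], "where_to_search": {"search_in": ["email_from"]}},  # overrides it
    ],
}
resolved = resolve(tree)
```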

...