...

Per tag field (at the level of no_reply and info) you can specify a list of rules, each with its own confidence. Of the rules that match, the one with the highest confidence is presented in the prediction. For example, if no-reply@contract.fit appears anywhere in the uploaded document, a prediction with confidence 97 for the tag_option no_reply is returned.
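
The selection step can be sketched in Python as follows. This is a minimal illustration and not the product's implementation: the rule list structure, the function name best_prediction, and the second rule with confidence 60 are assumptions made for the example. Only the no-reply@contract.fit rule with confidence 97 comes from the text above.

```python
import re

# Hypothetical rule list for the tag_option "no_reply": (regex, confidence).
NO_REPLY_RULES = [
    (r"no-reply@contract\.fit", 97),  # the example rule from the text
    (r"no-?reply", 60),               # invented fallback rule, for illustration only
]


def best_prediction(text, rules):
    """Return the highest confidence among the rules that match, or None."""
    matched = [conf for pattern, conf in rules if re.search(pattern, text)]
    return max(matched) if matched else None
```

When both rules match the same document, only the confidence of the stronger rule is reported; when no rule matches, no prediction is produced.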

“+rule”

The simplest version of a rule is specified with +rule. The value of this key is a list of strings, each prefixed with L: or D:. This list is concatenated into one regex.

L stands for literal and prefixes a normal regex. Note that the regex must be double-escaped, so the regex for a digit becomes \\d instead of \d.

D stands for definition and prefixes a variable.
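
The following Python sketch shows one way such a list could be turned into a single regex. It rests on several assumptions that the text does not confirm: that the parts are joined without a separator, that variables live in a plain name-to-regex dictionary, and that the configuration is stored as JSON (a common reason for double escaping). The names build_regex, DEFINITIONS and digits are hypothetical.

```python
import json
import re

# Hypothetical variable dictionary; the name "digits" is invented for this sketch.
DEFINITIONS = {"digits": r"\d+"}


def build_regex(parts, definitions):
    """Concatenate L:/D: prefixed parts into one regex string."""
    pieces = []
    for part in parts:
        prefix, _, body = part.partition(":")
        if prefix == "L":
            pieces.append(body)               # literal regex, used as written
        elif prefix == "D":
            pieces.append(definitions[body])  # variable, looked up by name
        else:
            raise ValueError(f"unknown prefix in {part!r}")
    return "".join(pieces)


# In JSON the backslash is doubled: "\\s" decodes to the regex \s.
rule = json.loads(r'{"+rule": ["L:invoice\\s", "D:digits"]}')
pattern = build_regex(rule["+rule"], DEFINITIONS)
```

With these inputs the resulting pattern is the regex invoice\s\d+, which matches text such as "invoice 123".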

Specifying the search space

...

  • +lemma: list of strings which are contained in the lemma (add link here)

  • -lemma: the negated version of +lemma

TODO: add example

Note

IMPORTANT: Regexes are considerably more efficient than the and/or operators. Prefer regexes wherever possible.

Info

Note that when operators are nested, the where_to_search is passed down to the lower levels. If a lower level defines its own where_to_search, that one is used instead.

This way you can:

  • Specify a granularity that applies to different and/or rules

  • Limit the search space for different and/or rules without having to define the where_to_search multiple times
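
The pass-down behaviour can be pictured with the Python sketch below. It is an illustration under stated assumptions, not the actual resolution logic: the nested dictionary layout, the operator keys "and" and "or", the function name effective_search_space, and the placeholder values "first_page" and "full_document" are all hypothetical.

```python
def effective_search_space(node, inherited=None):
    """Yield (rule, where_to_search) for every leaf rule in a nested tree."""
    # A where_to_search defined on this level overrides the inherited one.
    current = node.get("where_to_search", inherited)
    children = node.get("and") or node.get("or")
    if children is None:
        yield node.get("+rule"), current
        return
    for child in children:
        yield from effective_search_space(child, current)


# Hypothetical nested rule: the outer level sets a default search space,
# and the inner "and" block overrides it for its own rules.
tree = {
    "where_to_search": "first_page",
    "or": [
        {"+rule": ["L:a"]},
        {
            "where_to_search": "full_document",
            "and": [{"+rule": ["L:b"]}, {"+rule": ["L:c"]}],
        },
    ],
}
resolved = list(effective_search_space(tree))
```

In this example the first rule inherits the outer search space, while the two rules inside the inner block use the one defined on their own level.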

Variables

When writing rules you may find yourself repeating the same regex in many rules. Variables avoid this problem: define them in a dictionary at the highest level and reference them in your rules with the prefix D:. Re-using the telephone number, our full dictionary would now look like this:

...