...
Per tag field (level of no_reply and info) you can specify a list of rules with a confidence per rule. The matching rule with the highest confidence will be presented in the prediction. If we see no-reply@contract.fit somewhere in the uploaded document, a prediction with confidence 97 for the tag_option no_reply will be returned.
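The selection of the highest-confidence matching rule can be sketched in Python as below. This is only an illustration, not the actual implementation: the function name best_prediction, the flat rule layout with a pattern key, and the second rule for info are assumptions made for this example, and the L:/D: prefixes of +rule are deliberately left out.

```python
import re


def best_prediction(rules, text):
    """Return (tag_option, confidence) of the matching rule with the
    highest confidence, or None when no rule matches.

    Hypothetical helper: the real rule format uses "+rule" etc.
    """
    matches = [r for r in rules if re.search(r["pattern"], text)]
    if not matches:
        return None
    best = max(matches, key=lambda r: r["confidence"])
    return best["tag_option"], best["confidence"]


# Simplified, assumed rule layout: one plain regex per rule.
rules = [
    {"tag_option": "no_reply", "confidence": 97,
     "pattern": re.escape("no-reply@contract.fit")},
    {"tag_option": "info", "confidence": 80,
     "pattern": re.escape("info@")},
]

document = "From: no-reply@contract.fit\nQuestions? Mail info@contract.fit"
print(best_prediction(rules, document))
```

Both rules match the sample document here, and the one with confidence 97 wins.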
“+rule”
incl L & D clarification
Specifying the search space
...
You can apply different limits to restrict the search space of your query. For this we use the notion of a Python slice.
These are the 4 options in limits:
document_types: list of document types which can be combined.
pages: list of slices to specify which pages you want to search in.
lines: list of slices to specify which lines you want to search in.
characters: list of slices to specify which characters you want to search in.
To specify the part of the full object you want, you need to specify a list of slices. For options 2, 3 and 4 we use the method of a Python slice: every slice requires a start value, and a stop value unless it is the last slice in the list. The syntax of a slice is as follows:
Code Block |
---|
[start, stop] # items from start through stop-1 |
[-start, -stop] # items from start (counting from end) through stop-1 (counting from end) |
[start] # items from start through end (only allowed for the last slice) |
[-start] # items from start (counting from end) through end (only allowed for the last slice) |
E.g. imagine that in our example we only want to look in the first 10 and last 20 characters of the email_from. Our rule would now look as follows:
Code Block | ||
---|---|---|
| ||
{ "confidence": 97, "+rule": ["L:no-reply@contract.fit"], "where_to_search": { "search_in": ["email_from"], "limits": { "characters": [[0,10], [-20]] } } } |
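To make the slice semantics concrete, here is a minimal Python sketch of how a characters limit such as [[0,10], [-20]] could be applied to a string. The function name apply_limits and the choice to return one text fragment per slice are assumptions for illustration; the actual implementation may combine the fragments differently.

```python
def apply_limits(text, slices):
    """Return the fragments of `text` selected by a list of slices.

    Each slice is [start, stop] or, for the last slice only, [start].
    Negative values count from the end, as in a Python slice.
    Hypothetical helper, not the product's real code.
    """
    parts = []
    for index, item in enumerate(slices):
        if len(item) == 2:
            start, stop = item
            parts.append(text[start:stop])
        elif len(item) == 1:
            if index != len(slices) - 1:
                raise ValueError("an open-ended slice is only allowed as the last slice")
            parts.append(text[item[0]:])
        else:
            raise ValueError("a slice must have one or two values")
    return parts


# Made-up sample value for the email_from field.
email_from = "no-reply@contract.fit <automated notification sender>"
first_10, last_20 = apply_limits(email_from, [[0, 10], [-20]])
print(first_10)
print(last_20)
```

With these limits the rule is only evaluated against the first 10 and the last 20 characters of the field, so a match elsewhere in the text is ignored.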
...
+rule: list of strings starting with L: or D:. When evaluated, they are combined into one regex.
-rule: the negated (NOT) version of +rule
+lemma: list of strings which are contained in the lemma (add link here)
-lemma: the negated (NOT) version of +lemma
Let’s say we want to match the emails coming from no-reply@contract.fit if and only if in the body we don’t see a phone number. Our rule would now look like this:
...
This is a list of strings that are contained in the granularity. Here we don’t look at the original text, but the lemmatised version of the text.
+lemma: list of strings which are contained in the lemma (add link here)
-lemma: the negated (NOT) version of +lemma
TODO: add example
Note |
---|
IMPORTANT: Regex is quite a bit more efficient than the and/or operators. Try to use regexes as much as possible. |
Info |
---|
Note that when you combine different operators, the where_to_search is passed down to the lower levels. If a lower level defines its own where_to_search, that one is used instead. This way you can:
|
...