...
Code Block | ||
---|---|---|
| ||
{ "confidence": 97, "+rule": ["L:no-reply@contract.fit"], "where_to_search": { "search_in": ["email_from"], "limits": [[0,10], [-20]]} } |
Granularity
TODO: The granularity specifies the blocks of text in which to search.
The options are:
full (default if nothing is specified)
page
sentence
paragraph
line
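To make the options above more concrete, here is a minimal Python sketch of how a text could be cut into blocks per granularity. It is not the actual implementation: the function name `split_by_granularity` and all separators (form feed between pages, blank line between paragraphs, a naive sentence splitter) are assumptions chosen for illustration.

```python
import re


def split_by_granularity(text, granularity="full"):
    """Split text into the blocks a rule would be matched against.

    Hypothetical helper: the separators below are assumptions,
    not the separators used by the product.
    """
    if granularity == "full":
        return [text]  # the whole text is one block
    if granularity == "page":
        blocks = text.split("\f")  # assumed: form feed between pages
    elif granularity == "paragraph":
        blocks = re.split(r"\n\s*\n", text)  # assumed: blank line between paragraphs
    elif granularity == "line":
        blocks = text.splitlines()
    elif granularity == "sentence":
        blocks = re.split(r"(?<=[.!?])\s+", text)  # naive sentence splitter
    else:
        raise ValueError(f"unknown granularity: {granularity}")
    # Drop empty blocks and surrounding whitespace
    return [block.strip() for block in blocks if block.strip()]
```

A rule evaluated with granularity `sentence` would then be tested against each returned block separately instead of against the full text.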
Other operators next to “+rule”
...
+and: list of higher- or lower-level operators that must all match
-and: negated version of +and
+or: list of higher- or lower-level operators of which at least one must match
-or: negated version of +or
...
+rule: list of strings starting with L: or D:. When evaluated, they are concatenated into a single regex
-rule: negated version of +rule
+lemma: list of strings that are contained in the lemma (add link here)
-lemma: negated version of +lemma
...
Code Block | ||
---|---|---|
| ||
{ "confidence": 97, "+and": [ { "+rule": ["L:no-reply@contract.fit"], "where_to_search": { "search_in": ["email_from"], "limits": [[0,10], [-20]]} }, { "-rule": ["L:\\+32\\d{9}"], "where_to_search": { "search_in": ["email_body"] } } ] } |
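The way these operators nest can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the real evaluator: the names `build_regex` and `matches` are hypothetical, each node is assumed to hold exactly one operator, only the L: prefix is handled, and `where_to_search` is ignored so that every rule is tested against the same text.

```python
import re

OPERATORS = {"rule", "and", "or"}


def build_regex(parts):
    """Concatenate the "L:" strings of a rule into one pattern (assumed behaviour)."""
    pattern = ""
    for part in parts:
        if not part.startswith("L:"):
            raise ValueError("this sketch only handles the L: prefix")
        pattern += part[2:]
    return pattern


def matches(node, text):
    """Return True if the single operator found in `node` matches `text`."""
    for key, value in node.items():
        sign, name = key[:1], key[1:]
        if sign not in ("+", "-") or name not in OPERATORS:
            continue  # skip keys such as "confidence" or "where_to_search"
        if name == "rule":
            result = re.search(build_regex(value), text) is not None
        elif name == "and":
            result = all(matches(child, text) for child in value)
        else:  # "or"
            result = any(matches(child, text) for child in value)
        # A leading "-" negates the outcome of the operator
        return result if sign == "+" else not result
    raise ValueError("no operator found in node")


rule = {
    "confidence": 97,
    "+and": [
        {"+rule": ["L:no-reply@contract.fit"]},
        {"-rule": ["L:\\+32\\d{9}"]},
    ],
}
print(matches(rule, "from: no-reply@contract.fit"))
print(matches(rule, "no-reply@contract.fit, call +32123456789"))
```

The first call matches because the address is present and no phone number is found; the second fails because the `-rule` branch sees a phone number.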
Lemma
This is a list of strings that are searched for within each block of the chosen granularity. The search does not run on the original text, but on the lemmatised version of the text.
TODO: add example
Note |
---|
IMPORTANT: Regex is quite a bit more efficient than the and/or operators. Try to use regexes as much as possible. |
Info |
---|
Note that when operators are nested, where_to_search is passed down to the lower levels. If a lower level defines its own where_to_search, that one is used instead. This way you can:
|
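The passing-down behaviour described in the note can be pictured with a short Python sketch. The function name `resolve_where_to_search` is hypothetical and the traversal is an assumption about how the inheritance works, not the actual implementation.

```python
def resolve_where_to_search(node, inherited=None):
    """Yield (rule_strings, where_to_search) for every +rule/-rule leaf.

    A node's own "where_to_search" overrides the inherited one;
    otherwise the value from the higher level is passed down.
    """
    current = node.get("where_to_search", inherited)
    for key, value in node.items():
        if key in ("+rule", "-rule"):
            yield value, current
        elif key in ("+and", "-and", "+or", "-or"):
            for child in value:
                yield from resolve_where_to_search(child, current)


rule = {
    "+and": [
        {"+rule": ["L:a"]},  # inherits email_from from the top level
        {"-rule": ["L:b"], "where_to_search": {"search_in": ["email_body"]}},
    ],
    "where_to_search": {"search_in": ["email_from"]},
}
for strings, where in resolve_where_to_search(rule):
    print(strings, where)
```

In this sketch the first leaf falls back to the top-level setting, while the second keeps the setting it defines itself.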
...
Variables
When writing rules, you may find that you have repeated the same regex in many rules. Variables avoid this duplication. Define them at the highest level in a dictionary and reference them in your rules with the prefix D:. Reusing the telephone number regex, our full dictionary now looks like this:
Code Block | ||
---|---|---|
| ||
{
  "no_reply": {
    "variables": {
      "var1": ["L:\\+32\\d{9}"]
    },
    "rules": [
      {
        "confidence": 97,
        "+and": [
          {
            "+rule": ["L:no-reply@contract.fit"],
            "where_to_search": {
              "search_in": ["email_from"],
              "limits": [[0,10], [-20]]
            }
          },
          {
            "-rule": ["D:var1"],
            "where_to_search": {
              "search_in": ["email_body"]
            }
          }
        ]
      }
    ]
  }
} |
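How a D: reference might be resolved into the final regex can be sketched as follows. This is an assumption-based illustration: the function name `expand` is hypothetical, and it assumes that a variable holds a list of L:/D: strings that are concatenated in the same way as the strings of a rule (including nested D: references).

```python
def expand(parts, variables):
    """Turn a list of "L:"/"D:" strings into one regex pattern (assumed behaviour)."""
    pattern = ""
    for part in parts:
        prefix, _, body = part.partition(":")
        if prefix == "L":
            pattern += body  # literal regex fragment
        elif prefix == "D":
            # Look up the variable and expand its own list of strings
            pattern += expand(variables[body], variables)
        else:
            raise ValueError(f"unknown prefix in {part!r}")
    return pattern


variables = {"var1": ["L:\\+32\\d{9}"]}
print(expand(["D:var1"], variables))
```

Under these assumptions, `["D:var1"]` yields the same pattern as writing the telephone regex directly in the rule.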
FAQ
Expand | |||||
---|---|---|---|---|---|
| |||||
By adding the option where_to_search::search_in to your rule. An example field would look like this:
|
Expand | ||
---|---|---|
| ||
In the front-end, go to the relevant page and select the text view (see attached image). This lets you copy the text you are searching in. You can test the regex you created at https://regex101.com/. PS: Don’t forget to set the language to Python on the left-hand side of the screen and to remove the double escaping. |
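The double escaping mentioned above comes from JSON: a backslash in a regex has to be written as two backslashes inside a JSON string. The short Python check below shows the pattern that is left after parsing, which is the form to paste into a regex tester. The sample sentence is made up for illustration, and stripping the L: prefix by slicing is an assumption about how the prefix is handled.

```python
import json
import re

# The rule as written in the settings (JSON, double-escaped backslashes)
raw = r'{"+rule": ["L:\\+32\\d{9}"]}'

rule = json.loads(raw)
pattern = rule["+rule"][0][2:]  # drop the "L:" prefix
print(pattern)  # \+32\d{9}

# The single-escaped pattern is what the regex engine actually sees
match = re.search(pattern, "Call +32123456789 for support")
print(match.group())  # +32123456789
```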
Expand | |||||
---|---|---|---|---|---|
| |||||
By adding this option in the tag field and making a new rule in the predictor settings. Note that the confidence should be slightly lower than the lowest confidence of the other rules since the . will match anything. An example rule would look as follows:
|