1 | Code | Description |
2 | { | Opening bracket for the entire JSON format |
3 | "id": "SAMPLE_ID", | Information on the universally unique identifier (UUID) of the file |
4 | "original_filename": "SAMPLE_ORIGINAL_FILENAME", | Information on the original file name, including the extension |
5 | "inbox": "SAMPLE_INBOX", | Information on the universally unique identifier (UUID) of the inbox |
6 | "page_count": SAMPLE_PAGE_COUNT, | Information on the total number of pages contained in the file |
7 | "prediction": { | Information on the predictions made for the fields of the predicted document format. The prediction is the information that is read and extracted automatically by the machine when the uploaded file is processed. |
8 | "annotations": {
"SAMPLE_FIELDNAME": [
{
"text": "SAMPLE_TEXT",
"confidence": SAMPLE_CONFIDENCE,
"version": "SAMPLE_VERSION",
"value": "SAMPLE_VALUE",
"upper_left": SAMPLE_UPPER_LEFT_ARRAY,
"lower_right": SAMPLE_LOWER_RIGHT_ARRAY,
"flag_for_review": SAMPLE_FFR
}
]
}, |
| Information on the predictions made for any header field. Each field prediction contains the following pieces of information: field_name, text, confidence, version, value, upper_left, lower_right, and flag_for_review, in lines 2 to 12 of the code. The same structure appears for each field prediction. |
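As a minimal sketch, assuming the file's JSON has already been loaded into a Python dict (for example with `json.loads`), the header-field predictions under `prediction.annotations` can be scanned for fields flagged for review. The helper `flagged_header_fields` is hypothetical and not part of any official client, and the sample data is invented for illustration.

```python
import json
from typing import Any


def flagged_header_fields(document: dict[str, Any]) -> list[str]:
    """Return the names of header fields with at least one prediction flagged for review.

    Hypothetical helper: it assumes document["prediction"]["annotations"] maps each
    field name to a list of prediction objects, as in the structure above.
    """
    annotations = document.get("prediction", {}).get("annotations", {})
    flagged = []
    for field_name, predictions in annotations.items():
        # A field can hold several predictions; report it if any one is flagged.
        if any(p.get("flag_for_review") for p in predictions):
            flagged.append(field_name)
    return flagged


raw = """{"prediction": {"annotations": {
    "invoice_number": [{"text": "INV-1", "flag_for_review": false}],
    "total": [{"text": "12,00", "flag_for_review": true}]
}}}"""
print(flagged_header_fields(json.loads(raw)))  # ['total']
```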
9 | "lines": {
"SAMPLE_LI_TABLE": [
{
"SAMPLE_LI_FIELDNAME": {
"text": "SAMPLE_LI_TEXT",
"value": "SAMPLE_LI_VALUE",
"confidence": SAMPLE_LI_CONFIDENCE,
"upper_left": SAMPLE_UPPER_LEFT_ARRAY,
"lower_right": SAMPLE_LOWER_RIGHT_ARRAY,
"flag_for_review": SAMPLE_LI_FFR
}
}
]
}, |
| Information on the predictions made for any line-item field. Each line-item field prediction contains the following pieces of information: table_name (SAMPLE_LI_TABLE), field_name (SAMPLE_LI_FIELDNAME), text, value, confidence, upper_left, lower_right, and flag_for_review. The same structure appears for each line-item field prediction. |
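The nesting of line-item predictions (table name, then a list of lines, then field names) can be flattened into simple rows. This is a sketch under the assumption that each entry in a table's list is one line whose keys are field names, as the structure above suggests; `iter_line_item_predictions` is a hypothetical name and the data is invented.

```python
from typing import Any, Iterator


def iter_line_item_predictions(
    document: dict[str, Any],
) -> Iterator[tuple[str, int, str, Any]]:
    """Yield (table_name, line_index, field_name, value) for each line-item prediction.

    Hypothetical helper: assumes prediction["lines"] maps each table name to a
    list of lines, and each line maps field names to one prediction object.
    """
    tables = document.get("prediction", {}).get("lines", {})
    for table_name, lines in tables.items():
        for line_index, line in enumerate(lines):
            for field_name, prediction in line.items():
                yield table_name, line_index, field_name, prediction.get("value")


doc = {"prediction": {"lines": {"items": [
    {"description": {"value": "Pens"}, "quantity": {"value": "10"}},
    {"description": {"value": "Paper"}, "quantity": {"value": "2"}},
]}}}
for row in iter_line_item_predictions(doc):
    print(row)
```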
10 | "sections": [
{
"page": SAMPLE_PAGE_START,
"document_type": "SAMPLE_DOCUMENT_TYPE",
"format": "SAMPLE_FORMAT",
"confidence": SAMPLE_SECTIONS_CONFIDENCE,
"version": "SAMPLE_SECTIONS_VERSION",
"flag_for_review": SAMPLE_SECTIONS_FFR
}
] |
| Information on the split sections of the file. One file (e.g. a PDF) can contain more than one document (e.g. an invoice), each with one or more pages (page: 0, 1, 2, etc.). A section is a logical group of pages that make up one document. Each section contains the following pieces of information: page (the page on which the section starts), document_type, format, confidence, version, and flag_for_review, in lines 2 to 9 of the code. The same structure appears for each predicted document section. |
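Since each section only records the page on which it starts, the pages it covers can be derived from the next section's start page and the top-level `page_count`. This minimal sketch assumes, beyond what the table states, that sections are listed in page order, pages are 0-based, and each section runs until the next one begins; `section_page_ranges` is a hypothetical name.

```python
from typing import Any


def section_page_ranges(document: dict[str, Any]) -> list[tuple[str, range]]:
    """Map each predicted section to the range of pages it covers.

    Hypothetical helper. Assumes sections are sorted by start page and that
    each section runs until the next one begins (not confirmed by the format).
    """
    sections = document["prediction"]["sections"]
    page_count = document["page_count"]
    ranges = []
    for i, section in enumerate(sections):
        start = section["page"]
        # The last section runs until the end of the file.
        end = sections[i + 1]["page"] if i + 1 < len(sections) else page_count
        ranges.append((section["document_type"], range(start, end)))
    return ranges


doc = {
    "page_count": 5,
    "prediction": {"sections": [
        {"page": 0, "document_type": "invoice"},
        {"page": 3, "document_type": "order"},
    ]},
}
for document_type, pages in section_page_ranges(doc):
    print(document_type, list(pages))
# invoice [0, 1, 2]
# order [3, 4]
```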
11 | }, | Closing bracket for prediction |
12 | "feedback": { | Information on the feedback made for the fields of the predicted document format. The feedback is the information sent from the human reviewer to the machine, and it is used to compute evaluation statistics. |
13 | "annotations": {
"SAMPLE_FIELDNAME": [
{
"text": "SAMPLE_TEXT",
"gold": "SAMPLE_GOLD",
"visual_coord": VISUAL_COORD_ARRAY,
"textual_coord": TEXTUAL_COORD_ARRAY,
"confidence": SAMPLE_CONFIDENCE,
"evaluation": "SAMPLE_EVALUATION",
"flag_for_review": SAMPLE_FFR
}
]
}, |
| Information on the feedback made for any header field. Each field feedback contains the following pieces of information: field_name, text, gold (only for front-end feedback), visual_coord, textual_coord, confidence (only on GET), evaluation (only on GET), and flag_for_review (only on GET), in lines 2 to 12 of the code. The same structure appears for each field feedback. When feedback is sent from the front-end, there is one annotation feedback for every field in the format, even if the feedback sent is empty (TP, TN, FP, FN). When feedback is sent from the back-end, there is annotation feedback only for the fields for which feedback is explicitly posted.
SAMPLE_FIELDNAME = the name of the field
SAMPLE_TEXT = the value that was highlighted/selected on the document by the human reviewer. A string of characters, ex: 30 dec. 2021
SAMPLE_GOLD = (only present if feedback is sent through the front-end) the value that was highlighted/selected on the document by the human reviewer, formatted into the data_type configured in the format. A string of characters; for data_types that are "date", it follows the configured date format, ex: 30/12/2021
VISUAL_COORD_ARRAY and TEXTUAL_COORD_ARRAY =
"visual_coord" = upper_left and lower_right coordinates based on the pixels of the document image
"textual_coord" = upper_left and lower_right coordinates based on the document text
Both arrays contain 6 numbers:
1st number = page number of the upper_left
2nd number = row number of the upper_left
3rd number = column number of the upper_left
4th number = page number of the lower_right
5th number = row number of the lower_right
6th number = column number of the lower_right
ex:
[
0,
25,
82,
0,
29,
95
]
When a prediction is manually typed in, computed, or does not come directly from somewhere on the document, the arrays are "visual_coord": [0, 0, 0, 0, 0, 0] and "textual_coord": [0, 0, -1, 0, 0, -1]
SAMPLE_CONFIDENCE = (only present on GET)
SAMPLE_EVALUATION = (only present on GET) the outcome statistic of the feedback compared to the prediction (TP, TN, FP, FN). A 2-letter accuracy code, ex: TN
SAMPLE_FFR = (only present on GET) an indicator showing whether the prediction of that field is flagged for review (FFR) or went STP (straight-through processing). A boolean value: true means FFR, false means STP, ex: false
|
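The 6-number layout of `visual_coord` and `textual_coord` can be decoded into its upper_left and lower_right points, and the two sentinel arrays listed above can be used to detect values that were not taken from the document. This is a sketch; `Point`, `decode_coord`, and `comes_from_document` are hypothetical names, and treating a value as "not from the document" only when both arrays match their sentinels is a design choice, not documented behavior.

```python
from typing import Any, NamedTuple


class Point(NamedTuple):
    page: int
    row: int
    column: int


# Sentinel arrays given above for values typed in, computed, or not located on the document.
NO_VISUAL_COORD = [0, 0, 0, 0, 0, 0]
NO_TEXTUAL_COORD = [0, 0, -1, 0, 0, -1]


def decode_coord(coord: list[int]) -> tuple[Point, Point]:
    """Split a 6-number coordinate array into (upper_left, lower_right)."""
    if len(coord) != 6:
        raise ValueError(f"expected 6 numbers, got {len(coord)}")
    return Point(*coord[:3]), Point(*coord[3:])


def comes_from_document(feedback: dict[str, Any]) -> bool:
    """False only when both coordinate arrays equal their sentinel values (hypothetical rule)."""
    return not (
        feedback.get("visual_coord") == NO_VISUAL_COORD
        and feedback.get("textual_coord") == NO_TEXTUAL_COORD
    )


upper_left, lower_right = decode_coord([0, 25, 82, 0, 29, 95])
print(upper_left)   # Point(page=0, row=25, column=82)
print(lower_right)  # Point(page=0, row=29, column=95)
```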
14 | "sections": [
{
"page": SAMPLE_PAGE_START,
"document_type": "SAMPLE_DOCUMENT_TYPE",
"format": "SAMPLE_FORMAT",
"confidence": SAMPLE_SECTIONS_CONFIDENCE,
"evaluation": "SAMPLE_SECTIONS_EVALUATION",
"flag_for_review": SAMPLE_SECTIONS_FFR }
], |
| Information on the feedback on the split sections of the file. Each section feedback contains the following pieces of information: page, document_type, format, confidence, evaluation, and flag_for_review, in lines 2 to 8 of the code. The same structure appears for each predicted document section. |
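Because field and section feedback carry an evaluation code (TP, TN, FP, FN), summary statistics can be derived by tallying those codes. The sketch below counts the codes in a feedback (or version) object and applies the standard formulas precision = TP / (TP + FP) and recall = TP / (TP + FN). It illustrates the idea only and is not necessarily how the platform computes its own statistics; the helper names and data are hypothetical.

```python
from collections import Counter
from typing import Any


def tally_evaluations(feedback: dict[str, Any]) -> Counter:
    """Count TP/TN/FP/FN codes over header-field annotations and sections.

    Hypothetical helper; entries without an "evaluation" key (e.g. not a GET
    response) are skipped.
    """
    counts: Counter = Counter()
    for entries in feedback.get("annotations", {}).values():
        counts.update(e["evaluation"] for e in entries if "evaluation" in e)
    counts.update(s["evaluation"] for s in feedback.get("sections", []) if "evaluation" in s)
    return counts


def precision_recall(counts: Counter) -> tuple[float, float]:
    """Standard precision and recall; 0.0 when a denominator is zero."""
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


feedback = {
    "annotations": {
        "invoice_number": [{"evaluation": "TP"}],
        "total": [{"evaluation": "FP"}],
        "due_date": [{"evaluation": "TN"}],
    },
    "sections": [{"evaluation": "TP"}],
}
counts = tally_evaluations(feedback)
print(counts["TP"], counts["FP"], counts["TN"])  # 2 1 1
print(precision_recall(counts))  # precision 2/3, recall 1.0
```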
15 | "lines": [
{
"_cls": "LineEvaluationAtom",
"table_type": "SAMPLE_LI_TABLE",
"line_elements": {
"SAMPLE_LI_FIELDNAME": {
"text": "SAMPLE_LI_TEXT",
"value": "SAMPLE_LI_VALUE",
"visual_coord": VISUAL_COORD_ARRAY,
"textual_coord": TEXTUAL_COORD_ARRAY,
"confidence": SAMPLE_LI_CONFIDENCE,
"evaluation": "FN",
"row_to_line": "SAMPLE_LI_ROW_TO_LINE_NUMBER",
"flag_for_review": SAMPLE_LI_FFR
}
}
}
], |
| Information on the feedback made for any line-item field. Each line-item field feedback contains the following pieces of information: field_name, text, value, visual_coord, textual_coord, confidence, evaluation, row_to_line, and flag_for_review, in lines 6 to 14 of the code. The same structure appears for each line-item field feedback within the same table. For field feedback in a new table, lines 3 to 5 (_cls, table_type, and line_elements) appear first, followed by the individual feedbacks of all line items that belong to that table. |
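Line-item feedback can be regrouped into table rows by combining `table_type` with each element's `row_to_line`. This is a sketch that assumes each entry in `lines` holds a `table_type` and a `line_elements` dict keyed by field name, as in the structure above, and that `row_to_line` identifies the row a value belongs to; `group_line_feedback` is a hypothetical name and the data is invented.

```python
from collections import defaultdict
from typing import Any


def group_line_feedback(feedback: dict[str, Any]) -> dict[tuple[str, Any], dict[str, Any]]:
    """Group line-item feedback values by (table_type, row_to_line).

    Hypothetical helper; it relies on the assumed meaning of row_to_line.
    """
    grouped: dict[tuple[str, Any], dict[str, Any]] = defaultdict(dict)
    for atom in feedback.get("lines", []):
        table = atom["table_type"]
        for field_name, element in atom["line_elements"].items():
            key = (table, element.get("row_to_line"))
            grouped[key][field_name] = element.get("value")
    return dict(grouped)


line_feedback = {"lines": [
    {"_cls": "LineEvaluationAtom", "table_type": "items",
     "line_elements": {"description": {"value": "Pens", "row_to_line": "0"},
                       "quantity": {"value": "10", "row_to_line": "0"}}},
    {"_cls": "LineEvaluationAtom", "table_type": "items",
     "line_elements": {"description": {"value": "Paper", "row_to_line": "1"}}},
]}
print(group_line_feedback(line_feedback))
```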
16 | "name": "SAMPLE_NAME",
"source": "SAMPLE_SOURCE",
"is_evaluated": SAMPLE_IS_EVALUATED,
"timestamp": "SAMPLE_TIMESTAMP",
"flag_for_review": SAMPLE_FFR,
"document_fully_correct": SAMPLE_DFC,
"sections_fully_correct": SAMPLE_SFC_ARRAY |
| SAMPLE_NAME = the name of the feedback version, set every time feedback is sent (by submitting, by reprocessing and renaming the version, or by sending feedback through Swagger and naming that version). A string of characters, ex: submitted, 2ndPassFFR, 2ndPassSTP, FieldLevelSampling, or any other custom name saved when the feedback was sent
SAMPLE_SOURCE = the source of the feedback: "human" means it came from a human reviewer, "machine" means it came from the machine. A string of characters, ex: human or machine
SAMPLE_IS_EVALUATED = whether the feedback is evaluated (a copy of the VERSIONS section)
SAMPLE_TIMESTAMP = the timestamp of the feedback
SAMPLE_FFR = flag for review (a copy of the VERSIONS section): the indicator showing whether this version was flagged for review. A boolean value, ex: false
SAMPLE_DFC = document accuracy based on the feedback (a copy of the VERSIONS section): the indicator showing whether the whole document was predicted fully correctly compared to the current version. true means the current version confirms that the prediction for the whole document was accurate; false means the current version differs from the prediction. A boolean value, ex: false
SAMPLE_SFC_ARRAY = section accuracy based on the feedback (a copy of the VERSIONS section): the indicators showing whether the predictions of the sections are fully correct compared to the current version. An array of boolean values, ex: [true, false, false]
|
17 | }, | Closing bracket for feedback |
18 | "versions": [
{ |
| Information on all the versions made for the fields of the predicted document format. The versions include the initially predicted version and all feedback versions. |
19 | "annotations": {
"SAMPLE_FIELDNAME": [
{
"text": "SAMPLE_TEXT",
"value": "SAMPLE_VALUE",
"textual_coord": SAMPLE_TEXTUAL_COORD_ARRAY,
"confidence": SAMPLE_CONFIDENCE,
"evaluation": "SAMPLE_EVALUATION",
"version": "SAMPLE_VERSION",
"flag_for_review": SAMPLE_FFR
}
]
}, |
| Cf. rows #8 and #13 |
20 | "sections": [
{
"page": 0,
"document_type": "order",
"format": "6102dad4914afc65bdb3c499",
"confidence": 100,
"evaluation": "FN",
"flag_for_review": false
}
], |
| Cf. rows #10 and #14 |
21 | "lines": [
{
"_cls": "LineEvaluationAtom",
"table_type": "SAMPLE_LI_TABLE",
"line_elements": {
"SAMPLE_LI_FIELDNAME": {
"text": "SAMPLE_LI_TEXT",
"value": "SAMPLE_LI_VALUE",
"visual_coord": VISUAL_COORD_ARRAY,
"textual_coord": TEXTUAL_COORD_ARRAY,
"confidence": SAMPLE_LI_CONFIDENCE,
"evaluation": "FN",
"row_to_line": "SAMPLE_LI_ROW_TO_LINE_NUMBER",
"flag_for_review": SAMPLE_LI_FFR
}
}
}
], |
| Cf. rows #9 and #15 |
22 | "name": "SAMPLE_NAME",
"source": "SAMPLE_SOURCE",
"is_evaluated": SAMPLE_IS_EVALUATED,
"timestamp": "SAMPLE_TIMESTAMP",
"flag_for_review": SAMPLE_FFR,
"document_fully_correct": SAMPLE_DFC,
"sections_fully_correct": SAMPLE_SFC_ARRAY |
| Cf. row #16 |
23 | } ], | Closing bracket for versions |
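A common task is selecting one version from `versions`, for example the most recent one that came from a human reviewer. This minimal sketch assumes `timestamp` is an ISO 8601 string that `datetime.fromisoformat` can parse, which the table does not specify; `latest_version` is a hypothetical name and the sample versions are invented.

```python
from datetime import datetime
from typing import Any, Optional


def latest_version(
    document: dict[str, Any], source: Optional[str] = "human"
) -> Optional[dict[str, Any]]:
    """Return the most recent version, optionally restricted to one source.

    Hypothetical helper; assumes all timestamps are ISO 8601 strings in the
    same style (all naive or all timezone-aware).
    """
    candidates = [
        v for v in document.get("versions", [])
        if source is None or v.get("source") == source
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda v: datetime.fromisoformat(v["timestamp"]))


doc = {"versions": [
    {"name": "submitted", "source": "human", "timestamp": "2021-12-30T10:00:00"},
    {"name": "2ndPassFFR", "source": "human", "timestamp": "2021-12-31T09:30:00"},
    {"name": "reprocessed", "source": "machine", "timestamp": "2022-01-02T08:00:00"},
]}
print(latest_version(doc)["name"])  # 2ndPassFFR
```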
24 | "files": [
{
"filename": "SAMPLE_FILES_FILENAME",
"page": SAMPLE_FILES_PAGE,
"page_count": SAMPLE_FILES_PAGE_COUNT,
"filehash": "SAMPLE_FILEHASH",
"leaf": SAMPLE_LEAF,
"embedded_attachment": SAMPLE_EMBEDDED_ATTACHMENT
}
], |
| Information on the attached files. Each file included in the main file contains the following pieces of information: filename, page, page_count, filehash, leaf, and embedded_attachment, in lines 2 to 9 of the code. The same structure appears for each file included in the main file. For example, if the main file is an email, an image embedded in the email body and a PDF file attached to the email are both files contained in the main file. |
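To find which contained file a given page belongs to, each entry's `page` and `page_count` can be compared against that page. This sketch assumes, beyond what the table states, that `page` is the 0-based page of the main file on which the contained file starts; `file_for_page` is a hypothetical name and the data is invented.

```python
from typing import Any, Optional


def file_for_page(document: dict[str, Any], page: int) -> Optional[str]:
    """Return the filename whose page span includes `page`, or None.

    Hypothetical helper; the meaning of "page" as a start page is an assumption.
    """
    for entry in document.get("files", []):
        start = entry["page"]
        if start <= page < start + entry["page_count"]:
            return entry["filename"]
    return None


doc = {"files": [
    {"filename": "body.png", "page": 0, "page_count": 1},
    {"filename": "invoice.pdf", "page": 1, "page_count": 3},
]}
print(file_for_page(doc, 2))  # invoice.pdf
```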
25 | "usage_data": {
"pages": SAMPLE_USAGE_PAGES,
"sections": SAMPLE_USAGE_SECTIONS,
"annotations": SAMPLE_USAGE_ANNOTATIONS,
"lines": SAMPLE_USAGE_LINES,
"line_items": SAMPLE_USAGE_LINE_ITEMS,
"total": SAMPLE_USAGE_TOTAL
}, |
| Information on the usage data of the file. This information appears only once in the whole JSON file. It contains the volume statistics of the file and is created automatically upon upload. However, it is updated and replaced by the volume statistics of any newer reprocessed version of the file. |
26 | "meta_information": {
"reference_field": "SAMPLE_REFERENCE_FIELD"
}, |
| Information on the meta information of the file. This is a wildcard attribute, mainly for internal use, where custom or extra information is kept. It usually contains the reference_field, but it can also contain other fields.
SAMPLE_REFERENCE_FIELD = the reference number of the uploaded file, for internal use. A fixed format containing an M, a short date stamp, and a counter of how many files have been uploaded that day, ex: M20210130-0000005
|
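Because the reference_field follows a fixed layout (an M, a short date stamp, and a per-day counter), it can be split with a regular expression. This sketch assumes the date stamp is YYYYMMDD and the counter is the digits after the hyphen, as the example M20210130-0000005 suggests; `parse_reference` and `REFERENCE_PATTERN` are hypothetical names.

```python
import re
from datetime import date

# Assumed layout, inferred from the example M20210130-0000005.
REFERENCE_PATTERN = re.compile(r"^M(\d{4})(\d{2})(\d{2})-(\d+)$")


def parse_reference(reference: str) -> tuple[date, int]:
    """Split a reference like M20210130-0000005 into (upload date, daily counter)."""
    match = REFERENCE_PATTERN.match(reference)
    if match is None:
        raise ValueError(f"unexpected reference format: {reference!r}")
    year, month, day, counter = (int(group) for group in match.groups())
    return date(year, month, day), counter


print(parse_reference("M20210130-0000005"))  # (datetime.date(2021, 1, 30), 5)
```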
27 | "flag_for_review": SAMPLE_FFR_VALUE, | Information on the flag for review (FFR) attribute of the whole document |
28 | "timings": {
"receive_time": "SAMPLE_RECEIVE_TIME",
"start_time": "SAMPLE_START_TIME",
"done_time": "SAMPLE_DONE_TIME",
"feedback_time": "SAMPLE_FEEDBACK_TIME",
"processing_period": SAMPLE_PROCESSING_PERIOD
}, |
| Information on the timings of the whole document |
29 | "lock": {
"value": SAMPLE_LOCKED_VALUE,
"since": "SAMPLE_LOCKED_SINCE",
"by": "SAMPLE_LOCKED_BY"
}, |
| Information on the lock attribute |
30 | "escalate": {
"value": SAMPLE_ESCALATE_VALUE,
"since": "SAMPLE_ESCALATED_SINCE",
"by": "SAMPLE_ESCALATED_BY"
}, |
| Information on the escalate attribute |
31 | "submitted": {
"value": SAMPLE_SUBMITTED_VALUE,
"since": "SAMPLE_SUBMITTED_SINCE",
"by": "SAMPLE_SUBMITTED_BY"
}, |
| Information on the submitted attribute |
32 | "reject": {
"value": SAMPLE_REJECT_VALUE,
"since": "SAMPLE_REJECTED_SINCE",
"by": "SAMPLE_REJECTED_BY"
}, |
| Information on the reject attribute |
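The lock, escalate, submitted, and reject attributes share the same value / since / by shape, so they can be read uniformly. This sketch assumes `value` is a boolean and `since` and `by` hold a timestamp and a user identifier, which the table does not spell out; `active_markers` and `STATUS_MARKERS` are hypothetical names and the data is invented.

```python
from typing import Any

# Top-level attributes that share the {value, since, by} shape.
STATUS_MARKERS = ("lock", "escalate", "submitted", "reject")


def active_markers(document: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {marker: (since, by)} for each marker whose value is truthy.

    Hypothetical helper; assumes "value" is a boolean.
    """
    active = {}
    for marker in STATUS_MARKERS:
        entry = document.get(marker) or {}
        if entry.get("value"):
            active[marker] = (entry.get("since"), entry.get("by"))
    return active


doc = {
    "lock": {"value": True, "since": "2021-12-30T10:00:00", "by": "reviewer@example.com"},
    "escalate": {"value": False, "since": None, "by": None},
}
print(active_markers(doc))  # {'lock': ('2021-12-30T10:00:00', 'reviewer@example.com')}
```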
33 | "last_version": SAMPLE_LAST_VERSION, | Information on the last_version attribute |
34 | "status_data": {
"reject": SAMPLE_REJECT,
"sampling": SAMPLE_SAMPLING,
"lock": SAMPLE_LOCK,
"feedback": SAMPLE_FEEDBACK,
"escalate": SAMPLE_ESCALATE,
"success": SAMPLE_SUCCESS,
"archived": SAMPLE_ARCHIVED,
"reject_attempts": SAMPLE_REJECT_ATTEMPTS,
"submit_attempts": SAMPLE_SUBMIT_ATTEMPTS,
"ready_attempts": SAMPLE_READY_ATTEMPTS
} |
| Information on the status data of the whole document, taking into account the prediction, feedback, and versions:
SAMPLE_REJECT: boolean value indicating whether the file was rejected by a human reviewer, ex: true
SAMPLE_SAMPLING: boolean value indicating whether the file went through 2ndPass sampling, ex: true
SAMPLE_LOCK: boolean value indicating whether the file is locked by a user and the lock has not yet expired, ex: true
SAMPLE_FEEDBACK: boolean value indicating whether the file has received feedback, ex: true
SAMPLE_ESCALATE: boolean value indicating whether the file was escalated by a human reviewer, ex: true
SAMPLE_SUCCESS: boolean value indicating whether the file was successfully processed, ex: true
SAMPLE_ARCHIVED: boolean value indicating whether the file was archived (i.e. the original file was deleted), ex: true
SAMPLE_REJECT_ATTEMPTS: integer counting how many times a reject message for the file was sent to the client's site, ex: 0
SAMPLE_SUBMIT_ATTEMPTS: integer counting how many times a submit message for the file was sent to the client's site, ex: 0
SAMPLE_READY_ATTEMPTS: integer counting how many times a ready message for the file was sent to the client's site, ex: 0
|
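The types given for status_data (booleans for the flags, integers for the attempt counters) can be enforced when the JSON is loaded. This minimal sketch uses a frozen dataclass; `StatusData` and `from_dict` are hypothetical names, not part of any official client, and a missing key simply raises `KeyError`.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass(frozen=True)
class StatusData:
    """Hypothetical typed view of the status_data object."""
    reject: bool
    sampling: bool
    lock: bool
    feedback: bool
    escalate: bool
    success: bool
    archived: bool
    reject_attempts: int
    submit_attempts: int
    ready_attempts: int

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "StatusData":
        """Build a StatusData, checking each value against its declared type."""
        values = {}
        for f in fields(cls):
            value = data[f.name]
            # bool is a subclass of int, so booleans are rejected for the counters explicitly.
            if (f.type is int and isinstance(value, bool)) or not isinstance(value, f.type):
                raise TypeError(f"{f.name}: expected {f.type.__name__}, got {value!r}")
            values[f.name] = value
        return cls(**values)


payload = {
    "reject": False, "sampling": True, "lock": False, "feedback": True,
    "escalate": False, "success": True, "archived": False,
    "reject_attempts": 0, "submit_attempts": 1, "ready_attempts": 1,
}
status = StatusData.from_dict(payload)
print(status.submit_attempts)  # 1
```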
35 | } | Closing bracket for the entire JSON format |