JSON response detailed explanation

JSON Syntax Rules

  • Data is in name/value pairs

  • Data is separated by commas

  • Curly braces hold objects

  • Square brackets hold arrays
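The four rules above can be seen in a small example. The following is a minimal Python sketch that parses a made-up JSON string with the standard json module; the keys and values are illustrative and are not taken from a real response.

```python
import json

# Illustrative JSON text; the keys and values are made up for this example.
raw = '{"id": "SAMPLE_ID", "page_count": 4, "files": ["a.pdf", "b.pdf"], "timings": {}}'

doc = json.loads(raw)

# Curly braces are parsed into a dict of name/value pairs.
print(type(doc).__name__)           # dict
# Square brackets are parsed into a list.
print(type(doc["files"]).__name__)  # list
# Values are looked up by their name.
print(doc["page_count"])            # 4
```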

 

Skeleton of the JSON format

{
  "id": "SAMPLE_ID",
  "original_filename": "SAMPLE_ORIGINAL_FILENAME",
  "inbox": "SAMPLE_INBOX",
  "page_count": SAMPLE_PAGE_COUNT,
  "prediction": {
    "annotations": {},
    "lines": {},
    "sections": {}
  },
  "feedback": {
    "annotations": {},
    "sections": {},
    "lines": {},
    "name": "SAMPLE_NAME",
    "source": "SAMPLE_SOURCE",
    "is_evaluated": SAMPLE_IS_EVALUATED,
    "timestamp": "SAMPLE_TIMESTAMP",
    "flag_for_review": SAMPLE_FFR,
    "document_fully_correct": SAMPLE_DFC,
    "sections_fully_correct": []
  },
  "versions": {
    "annotations": {},
    "sections": {},
    "lines": {},
    "name": "SAMPLE_NAME",
    "source": "SAMPLE_SOURCE",
    "is_evaluated": SAMPLE_IS_EVALUATED,
    "timestamp": "SAMPLE_TIMESTAMP",
    "flag_for_review": SAMPLE_FFR,
    "document_fully_correct": SAMPLE_DFC,
    "sections_fully_correct": []
  },
  "files": [],
  "usage_data": {},
  "meta_information": {},
  "flag_for_review": SAMPLE_FFR_VALUE,
  "timings": {},
  "lock": {},
  "escalate": {},
  "submitted": {},
  "reject": {},
  "last_version": null,
  "status_data": {}
}
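As a minimal sketch, a client could check that a parsed response contains the top-level keys of the skeleton above. The function name missing_top_level_keys is hypothetical, and whether every key is always present in a real response is an assumption.

```python
# Top-level keys copied from the skeleton above.
EXPECTED_KEYS = {
    "id", "original_filename", "inbox", "page_count", "prediction",
    "feedback", "versions", "files", "usage_data", "meta_information",
    "flag_for_review", "timings", "lock", "escalate", "submitted",
    "reject", "last_version", "status_data",
}

def missing_top_level_keys(response: dict) -> set:
    """Return the skeleton keys that are absent from a parsed response.

    Hypothetical helper: it assumes every skeleton key should be present.
    """
    return EXPECTED_KEYS - set(response)

# A response holding only "id" is reported as missing the other keys.
print(sorted(missing_top_level_keys({"id": "SAMPLE_ID"}))[:3])
```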

 

1

Code

Description

2
{

Opening curly brace of the entire JSON object

3
"id": "SAMPLE_ID",

Information on the universally unique identifier (UUID) of the file

  • SAMPLE_ID = A string of 24 alphanumeric characters, ex: 1234a56b789c0de123456fg7

4
"original_filename": "SAMPLE_ORIGINAL_FILENAME",

Information on the original file name including the extension

  • SAMPLE_ORIGINAL_FILENAME = A string of characters, ex: invoice.pdf

5
"inbox": "SAMPLE_INBOX",

Information on the universally unique identifier (UUID) of the inbox

  • SAMPLE_INBOX = A string of 24 alphanumeric characters, ex: 1234a56b789c0de123456fg7

6
"page_count": SAMPLE_PAGE_COUNT,

Information on the total number of pages contained in the file

  • SAMPLE_PAGE_COUNT = An integer starting at 0, ex: 4

7

Information on the predictions made for the fields of the predicted document format

The prediction is the information that is read and extracted automatically by the machine upon processing the uploaded file.

  • It holds the following sub-groups of information in order

    • “annotations”

    • “lines”

    • “sections”

8

Information on the predictions made for any header field

Per field prediction, we have the following pieces of information: field_name, text, confidence, version, value, upper_left, lower_right, and flag_for_review, included from line 2 until line 12. The same structure will appear for each field prediction made.

  • SAMPLE_FIELDNAME =

    • the technical name of the field

    • A string of characters, ex: invoice_date

  • SAMPLE_TEXT =

    • the value read directly from the document

    • A string of characters, ex: 30 dec. 2021

  • SAMPLE_CONFIDENCE =

    • the percentage of the prediction’s confidence

    • An integer, ex: 95

  • SAMPLE_VERSION =

    • the universally unique identifier (UUID) of the version

    • A string of 36 characters, ex: 12345a12-b123-1234-c1de-fg12h1ijklm1

  • SAMPLE_VALUE =

    • the value that was read directly from the document but formatted into the data_type that is configured in the format

    • A string of characters / For data_types that are “date”, the date format, ex: 30/12/2021

  • SAMPLE_UPPER_LEFT_ARRAY and SAMPLE_LOWER_RIGHT_ARRAY =

    • "upper_left" = the location of the upper-left corner of the prediction section

    • "lower_right" = the location of the lower-right corner of the prediction section

    • Both upper_left and lower_right arrays contain 3 numbers:

      • 1st number = page number (ex: 0 for the first page of the file, 1 for the second page of the file, etc.)

      • 2nd number = row number

      • 3rd number = column number

  • SAMPLE_FFR =

    • An indicator that shows whether the prediction of that field is flagged for review or went STP (straight through processing)

    • A boolean value either true or false: true meaning FFR; false meaning STP, ex: false
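To make the field list above concrete, here is a minimal sketch of one header-field prediction as a Python dict, using the key names and example values from the description. The coordinate values and the helper names location and went_stp are hypothetical, and how entries are keyed inside "annotations" is not shown here.

```python
# Hypothetical header-field prediction built from the keys listed above.
annotation = {
    "field_name": "invoice_date",
    "text": "30 dec. 2021",
    "confidence": 95,
    "version": "12345a12-b123-1234-c1de-fg12h1ijklm1",
    "value": "30/12/2021",
    "upper_left": [0, 12, 40],    # made-up coordinates: page, row, column
    "lower_right": [0, 12, 52],
    "flag_for_review": False,
}

def location(corner):
    """Name the three numbers of an upper_left or lower_right array."""
    page, row, column = corner
    return {"page": page, "row": row, "column": column}

def went_stp(entry):
    """True when the field was not flagged for review (STP)."""
    return not entry["flag_for_review"]

print(location(annotation["upper_left"]))
print(went_stp(annotation))
```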

9

Information on the predictions made for any line-item field

Per line-item field prediction, we have the following pieces of information: table_name, field_name, text, value, confidence, upper_left, lower_right, and flag_for_review, included from line 2 until line 12. The same structure will appear for each line-item field prediction made.

  • SAMPLE_LI_TABLE =

    • the technical name of the table

    • A string of characters, ex: line_items, receipt_lines, or TaxTotal

  • SAMPLE_LI_FIELDNAME =

    • the technical name of the line-item field

    • A string of characters, ex: unit_amount

  • SAMPLE_LI_TEXT =

    • the value read directly from the document

    • A string of characters, ex: 12,40

  • SAMPLE_LI_VALUE =

    • the value that was read directly from the document but formatted into the data_type that is configured in the format

    • A string of characters / For data_types that are “float”, the float format, ex: 12.4

  • SAMPLE_LI_CONFIDENCE =

    • the percentage of the prediction’s confidence

    • An integer, ex: 95

  • SAMPLE_UPPER_LEFT_ARRAY and SAMPLE_LOWER_RIGHT_ARRAY =

    • "upper_left" = the location of the upper-left corner of the prediction section

    • "lower_right" = the location of the lower-right corner of the prediction section

    • Both upper_left and lower_right arrays contain 3 numbers:

      • 1st number = page number (ex: 0 for the first page of the file, 1 for the second page of the file, etc.)

      • 2nd number = row number

      • 3rd number = column number

    • When a prediction is computed or doesn’t directly come from somewhere on the document, it will give:

      • "upper_left": [0, 0, -1]

      • "lower_right": [0, 0, -1]

  • SAMPLE_LI_FFR =

    • An indicator that shows whether the prediction of that line-item field is flagged for review or went STP (straight through processing)

    • A boolean value either true or false: true meaning FFR; false meaning STP, ex: false
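The [0, 0, -1] placeholder described above can be used to separate predictions that were read from the document from those that were computed. The following is a minimal sketch; the helper name is_located, the coordinate values, and the field name line_total are hypothetical.

```python
# Placeholder given when a prediction does not come from the document.
NO_LOCATION = [0, 0, -1]

def is_located(entry):
    """Return False when both corners carry the [0, 0, -1] placeholder."""
    return not (entry["upper_left"] == NO_LOCATION
                and entry["lower_right"] == NO_LOCATION)

# Two made-up line-item predictions, reduced to the keys used here.
line_items = [
    {"field_name": "unit_amount",
     "upper_left": [0, 30, 61], "lower_right": [0, 30, 66]},
    {"field_name": "line_total",  # hypothetical computed field
     "upper_left": [0, 0, -1], "lower_right": [0, 0, -1]},
]

computed = [e["field_name"] for e in line_items if not is_located(e)]
print(computed)  # ['line_total']
```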

10

Information on the split sections of the file

One file (ex: pdf) can contain more than one document (ex: invoice) with multiple pages (page: 0, 1, 2, etc.). One section is a logical group of pages that make up one document.

Per section, we have the following pieces of information: page, document_type, format, confidence, version, and flag_for_review, included from line 2 until line 9. The same structure will appear for each document section predicted.

  • SAMPLE_PAGE_START =

    • the page number of the first page of the whole section

    • An integer starting at 0, ex: 2

  • SAMPLE_DOCUMENT_TYPE =

    • the technical name of the document type

    • A string of characters, ex: credit_note

  • SAMPLE_FORMAT =

    • the universally unique identifier (UUID) of the format

    • A string of 24 alphanumeric characters, ex: 1234abc5678def90ghi1j123

  • SAMPLE_SECTIONS_CONFIDENCE =

    • the percentage of the section-prediction’s confidence

    • A float, ex: 95.0

  • SAMPLE_SECTIONS_VERSION =

    • the universally unique identifier (UUID) of the version

    • A string of 36 characters, ex: 12345a12-b123-1234-c1de-fg12h1ijklm1

  • SAMPLE_SECTIONS_FFR =

    • An indicator that shows whether the prediction of that section is flagged for review or went STP (straight through processing)

    • A boolean value either true or false: true meaning FFR; false meaning STP, ex: false
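Since each section only records its first page, the pages that belong to a section have to be derived. The following is a minimal sketch under the assumption that a section runs until the page before the next section starts and that the last section runs to the final page of the file; the function name section_page_ranges is hypothetical.

```python
def section_page_ranges(start_pages, page_count):
    """Return (first_page, last_page) per section, pages counted from 0.

    Assumes each section ends right before the next one starts and the
    last section ends on the final page of the file.
    """
    starts = sorted(start_pages)
    # The end boundary of each section is the start of the next one;
    # the last section is bounded by the total page count.
    boundaries = starts[1:] + [page_count]
    return [(first, end - 1) for first, end in zip(starts, boundaries)]

# A 5-page file split into two documents starting at pages 0 and 2.
print(section_page_ranges([0, 2], 5))  # [(0, 1), (2, 4)]
```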

11

Closing curly brace for prediction

12

Information on the feedback made for the fields of the predicted document format

The feedback is the information sent from the human reviewer to the machine. This information is used for the evaluation of statistics.

  • It holds the following objects in order

    • “annotations”

    • “sections”

    • “lines”

    • “name”

    • “source”

    • “is_evaluated”

    • “timestamp”

    • “flag_for_review”

    • “document_fully_correct”

    • “sections_fully_correct”

13

Information on the feedback made for any header field

Per field feedback, we have the following pieces of information: field_name, text, gold (for FE feedback), visual_coord, textual_coord, confidence (only for GET), evaluation (only for GET), and flag_for_review (only for GET), included from line 2 until line 12. The same structure will appear for each field feedback made.

When feedback is sent from the front-end, we will have one annotation feedback for all of the fields in the format, even if we send empty feedback (TP, TN, FP, FN). When feedback is sent from the back-end, we will only have annotation feedback for the fields for which we explicitly post feedback.

  • SAMPLE_FIELDNAME =

    • the technical name of the field

    • A string of characters, ex: invoice_date

  • SAMPLE_TEXT =

    • the value that was highlighted/selected on the document by the human reviewer

    • A string of characters, ex: 30 dec. 2021

  • SAMPLE_GOLD = (only present if feedback is sent through FE)

    • the value that was highlighted/selected on the document by the human reviewer but formatted into the data_type that is configured in the format

    • A string of characters / For data_types that are “date”, the date format, ex: 30/12/2021

  • VISUAL_COORD_ARRAY and TEXTUAL_COORD_ARRAY =

    • "visual_coord" = upper_left and lower_right coordinates based on the pixels of the document image

    • "textual_coord" = upper_left and lower_right coordinates based on the document text

    • Both visual_coord and textual_coord arrays contain 6 numbers:

      • 1st number = page number of the upper_left

      • 2nd number = row number of the upper_left

      • 3rd number = column number of the upper_left

      • 4th number = page number of the lower_right

      • 5th number = row number of the lower_right

      • 6th number = column number of the lower_right

    • When a prediction is manually typed in, computed or doesn’t directly come from somewhere on the document, it will give:

      • "visual_coord": [0, 0, 0, 0, 0, 0]

      • "textual_coord": [0, 0, -1, 0, 0, -1]

  • SAMPLE_CONFIDENCE = (only present on GET)

    • the percentage of the feedback’s confidence

    • An integer, ex: 95

  • SAMPLE_EVALUATION = (only present on GET)

    • the outcome statistic of the feedback compared to the prediction (TP, TN, FP, FN)

    • A 2-letter accuracy code, ex: TN

  • SAMPLE_FFR = (only present on GET)

    • An indicator that shows whether the prediction of that field is flagged for review or went STP (straight through processing)

    • A boolean value either true or false: true meaning FFR; false meaning STP, ex: false
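The six-number layout of visual_coord and textual_coord described above can be split back into an upper_left and a lower_right triple. The following is a minimal sketch; the helper names split_coord and is_unlocated and the sample coordinates are hypothetical.

```python
# Placeholder given for textual_coord when the value is typed in or computed.
UNLOCATED_TEXTUAL = [0, 0, -1, 0, 0, -1]

def split_coord(coord):
    """Split a six-number array into (upper_left, lower_right) triples."""
    if len(coord) != 6:
        raise ValueError("expected six numbers: page, row, column twice")
    return list(coord[:3]), list(coord[3:])

def is_unlocated(textual_coord):
    """True when textual_coord carries the placeholder described above."""
    return list(textual_coord) == UNLOCATED_TEXTUAL

upper_left, lower_right = split_coord([0, 12, 40, 0, 12, 52])
print(upper_left, lower_right)  # [0, 12, 40] [0, 12, 52]
print(is_unlocated([0, 0, -1, 0, 0, -1]))  # True
```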

14

Information on the feedback of split sections of the file

Per section feedback, we have the following pieces of information: page, document_type, format, confidence, evaluation, and flag_for_review, included from line 2 until line 8. The same structure will appear for each document section predicted.

  • SAMPLE_PAGE_START =

    • the page number of the first page of the whole section

    • An integer starting at 0, ex: 2

  • SAMPLE_DOCUMENT_TYPE =

    • the technical name of the document type

    • A string of characters, ex: credit_note

  • SAMPLE_FORMAT =

    • the universally unique identifier (UUID) of the format

    • A string of 24 alphanumeric characters, ex: 1234abc5678def90ghi1j123

  • SAMPLE_SECTIONS_CONFIDENCE =

    • the percentage of the section-prediction’s confidence

    • A float, ex: 95.0

  • SAMPLE_SECTIONS_EVALUATION =

    • the outcome statistic of the feedback compared to the prediction (TP, TN, FP, FN)

    • A 2-letter accuracy code, ex: TN

  • SAMPLE_SECTIONS_FFR =

    • An indicator that shows whether the prediction of that section is flagged for review or went STP (straight through processing)

    • A boolean value either true or false: true meaning FFR; false meaning STP, ex: false
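The evaluation codes (TP, TN, FP, FN) can be tallied across feedback entries. The following is a minimal sketch using made-up section feedback entries reduced to the keys used here; the function name count_evaluations is hypothetical.

```python
from collections import Counter

def count_evaluations(entries):
    """Count how often each evaluation code occurs in feedback entries.

    Entries without an "evaluation" key are skipped.
    """
    return Counter(e["evaluation"] for e in entries if "evaluation" in e)

# Made-up section feedback entries.
section_feedback = [
    {"page": 0, "document_type": "invoice", "evaluation": "TP"},
    {"page": 2, "document_type": "credit_note", "evaluation": "FP"},
    {"page": 5, "document_type": "invoice", "evaluation": "TP"},
]

counts = count_evaluations(section_feedback)
print(counts["TP"], counts["FP"], counts["FN"])  # 2 1 0
```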

15

Information on the feedback made for any line-item field

Per line-item field feedback, we have the following pieces of information: field_name, text, value, visual_coord, textual_coord, confidence, evaluation, row_to_line, and flag_for_review, included from line 6 until line 14. The same structure will appear for each line-item field prediction made within the same table. For field feedback in a new table, lines 3 until 5 will appear first, followed by the individual feedback entries of all line-items that belong to that table.

  • SAMPLE_LI_TABLE =

    • the technical name of the table

    • A string of characters, ex: line_items, receipt_lines, or TaxTotal

  • SAMPLE_LI_FIELDNAME =

    • the technical name of the line item field

    • A string of characters, ex: unit_amount

  • SAMPLE_LI_TEXT =

    • the value read directly from the document

    • A string of characters, ex: 12,40

  • SAMPLE_LI_VALUE =

    • the value that was read directly from the document but formatted into the data_type that is configured in the format

    • A string of characters / For data_types that are “float”, the float format, ex: 12.4

  • VISUAL_COORD_ARRAY and TEXTUAL_COORD_ARRAY =

    • "visual_coord" = upper_left and lower_right coordinates based on the pixels of the document image

    • "textual_coord" = upper_left and lower_right coordinates based on the document text

    • Both visual_coord and textual_coord arrays contain 6 numbers:

      • 1st number = page number of the upper_left

      • 2nd number = row number of the upper_left

      • 3rd number = column number of the upper_left

      • 4th number = page number of the lower_right

      • 5th number = row number of the lower_right

      • 6th number = column number of the lower_right

    • When a prediction is manually typed in, computed or doesn’t directly come from somewhere on the document, it will give:

      • "visual_coord": [0, 0, 0, 0, 0, 0]

      • "textual_coord": [0, 0, -1, 0, 0, -1]

  • SAMPLE_LI_CONFIDENCE =

    • the percentage of the feedback’s confidence

    • An integer, ex: 95

  • SAMPLE_LI_EVALUATION =

    • the outcome statistic of the feedback compared to the prediction (TP, TN, FP, FN)

    • A 2-letter accuracy code, ex: TN

  • SAMPLE_LI_ROW_TO_LINE_NUMBER =

    • the row number that the line-item field belongs to

    • An integer starting at 0, ex: 2

  • SAMPLE_LI_FFR =

    • An indicator that shows whether the prediction of that field is flagged for review or went STP (straight through processing)

    • A boolean value either true or false: true meaning FFR; false meaning STP, ex: false
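Because every line-item feedback entry carries its row number in row_to_line, the entries of one table can be regrouped into rows. The following is a minimal sketch with made-up entries reduced to the keys used here; the function name rows_from_line_items and the field name quantity are hypothetical.

```python
from collections import defaultdict

def rows_from_line_items(entries):
    """Group line-item entries of one table into rows by row_to_line.

    Returns one {field_name: value} dict per row, ordered by row number.
    """
    rows = defaultdict(dict)
    for entry in entries:
        rows[entry["row_to_line"]][entry["field_name"]] = entry["value"]
    return [rows[number] for number in sorted(rows)]

# Made-up feedback entries for a single table.
entries = [
    {"field_name": "unit_amount", "value": "12.4", "row_to_line": 0},
    {"field_name": "quantity", "value": "2", "row_to_line": 0},
    {"field_name": "unit_amount", "value": "3.0", "row_to_line": 1},
]

print(rows_from_line_items(entries))
```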

16

Information on the feedback name

  • the name of the feedback version. A name is saved every time you send feedback (by submitting, by reprocessing and renaming the version, or by sending feedback through Swagger and adding a name to that version)

  • SAMPLE_NAME = A string of characters, ex: submitted, 2ndPassFFR, 2ndPassSTP, FieldLevelSampling, or any other custom name saved when the feedback was sent

Information on the source of the feedback

  • the source where the feedback came from: “human” meaning it came from a human reviewer, “machine” meaning it came from the machine

  • SAMPLE_SOURCE = A string of characters ex: human or machine

Information on whether the feedback is evaluated (a copy of the VERSIONS section)

  • A feedback is evaluated whenever it is reviewed

  • SAMPLE_IS_EVALUATED = A boolean value, ex: false

Information on the timestamp of the feedback

  • The timestamp of when the feedback was sent

  • SAMPLE_TIMESTAMP = A timestamp format such as YYYY-MM-DDTHH:MM:SS.MMMMMM ex: 2021-10-12T15:48:09.688000

Information on flag for review (a copy of the VERSIONS section)

  • the indicator showing whether this current version was flagged for review

  • SAMPLE_FFR = A boolean value, ex: false

Information on the document accuracy based on the feedback (a copy of the VERSIONS section)

  • the indicator showing whether the document was predicted fully correctly or not as compared to the current version: “true” meaning the current version confirms that the prediction for the whole document was accurate, “false” meaning that the current version is different from the prediction.

  • SAMPLE_DFC = A boolean value, ex: false

Information on the sections being fully correct (a copy of the VERSIONS section)

  • the indicator showing whether the predictions of the sections are fully correct as compared to the current version

  • SAMPLE_SFC_ARRAY = An array of boolean values, ex: [true, false, false]

17

Closing curly brace for feedback

18

Information on all the versions made for the fields of the predicted document format

All versions include the initially predicted versions and all feedback versions.

  • It holds the following objects in order

    • “annotations”

    • “sections”

    • “lines”

    • “name”

    • “source”

    • “is_evaluated”

    • “timestamp”

    • “flag_for_review”

    • “document_fully_correct”

    • “sections_fully_correct”

19

Cf. rows #8 and #13

20

Cf. rows #10 and #14

21

Cf. rows #9 and #15

22

Cf. row #16

23

Closing curly brace for versions

24

Information on the files attached

Per file included in the main file, we have the following pieces of information: filename, page, page_count, filehash, leaf, and embedded_attachment, included from line 2 until line 9. The same structure will appear for each file included in the main file. A main file can be for example an email, while an embedded image inside the email body and a PDF file attached to the email are files contained in the main file.

  • SAMPLE_FILES_FILENAME =

    • the filename of the file concerned

    • A string of characters, ex: email.txt or attachedinvoice.pdf

  • SAMPLE_FILES_PAGE =

    • the number of the page where the concerned file starts

    • An integer, ex: 0

  • SAMPLE_FILES_PAGE_COUNT =

    • the number of pages included in the concerned file

    • An integer, ex: 5

  • SAMPLE_FILEHASH =

    • the unique ID for that concerned file

    • A string of alphanumeric characters, ex: 48708bd2e2a847270efa1c4688e18437327088920d97bc41dfeec0559892b6ba

  • SAMPLE_LEAF =

    • an indicator of whether the concerned file is a leaf, that is, an attachment to the main file (the root), such as a PDF file attached to the received email

    • A boolean, ex: false

  • SAMPLE_EMBEDDED_ATTACHMENT =

    • an indicator whether the concerned file was an embedded attachment or not. For example, an image inside of the email body is embedded but a PDF attached to that email is not embedded

    • A boolean, ex: false
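From the page and page_count of a file entry, the pages it covers in the main file can be derived. The following is a minimal sketch, assuming pages are numbered consecutively from 0 across the main file; the function name file_pages and the sample values are hypothetical.

```python
def file_pages(file_entry):
    """Return the page numbers covered by one file entry.

    Assumes the file occupies page_count consecutive pages from "page".
    """
    first = file_entry["page"]
    return range(first, first + file_entry["page_count"])

# Made-up entries: an email body followed by an attached PDF.
files = [
    {"filename": "email.txt", "page": 0, "page_count": 1,
     "leaf": False, "embedded_attachment": False},
    {"filename": "attachedinvoice.pdf", "page": 1, "page_count": 5,
     "leaf": True, "embedded_attachment": False},
]

for entry in files:
    print(entry["filename"], list(file_pages(entry)))
```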

25

Information on the usage data of the file

This information only appears once for the whole JSON file. It pertains to the volume statistics of the file and it is created automatically upon upload. This information is however updated and replaced by the volume statistics of any newer reprocessed version of the file.

  • SAMPLE_USAGE_PAGES =

    • the number of pages of the whole file uploaded

    • An integer, ex: 1

  • SAMPLE_USAGE_SECTIONS =

    • the number of predicted sections for that file uploaded (a section in this context is the split document, ex: 2 invoices merged in 1 file)

    • An integer, ex: 1

  • SAMPLE_USAGE_ANNOTATIONS =

    • the number of predicted header fields for that file uploaded

    • An integer, ex: 1

  • SAMPLE_USAGE_LINES =

    • the number of rows predicted for all tables in the whole file

    • An integer, ex: 1

  • SAMPLE_USAGE_LINE_ITEMS =

    • the number of columns predicted for all tables in the whole file

    • An integer, ex: 1

  • SAMPLE_USAGE_TOTAL =

    • the sum of all usage counters

    • An integer, ex: 1
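Since the total is described as the sum of all usage counters, it can be cross-checked against the individual values. The following is a minimal sketch; the key names in usage_data are assumptions (the description above only names the placeholders), as are the sample numbers and the function name usage_sum.

```python
# Key names and values are assumptions made for this example.
usage_data = {
    "pages": 1,
    "sections": 1,
    "annotations": 12,
    "lines": 3,
    "line_items": 9,
    "total": 26,
}

def usage_sum(usage):
    """Add up every usage counter except the stored total."""
    return sum(value for key, value in usage.items() if key != "total")

# The stored total should match the sum of the other counters.
print(usage_sum(usage_data) == usage_data["total"])  # True
```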

26

Information on the meta info of the file

This is a wildcard attribute, mainly for internal use. This is where custom or extra information is kept. Usually, the reference_field is contained in this attribute; however, there can also be other fields.

  • SAMPLE_REFERENCE_FIELD =

    • the reference number of the uploaded file for internal use

    • A fixed format containing an M, a short date stamp, and a counter of how many files have been uploaded that day, ex: M20210130-0000005
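The fixed format of the reference field can be taken apart with a regular expression. The following is a minimal sketch, assuming the date stamp is written as YYYYMMDD (which matches the example above) and that the counter is a run of digits; the function name parse_reference is hypothetical.

```python
import re
from datetime import date, datetime

# "M", an 8-digit date stamp (assumed YYYYMMDD), a dash, then the counter.
REFERENCE_PATTERN = re.compile(r"^M(\d{8})-(\d+)$")

def parse_reference(reference):
    """Split a reference such as M20210130-0000005 into (date, counter)."""
    match = REFERENCE_PATTERN.match(reference)
    if match is None:
        raise ValueError(f"unexpected reference format: {reference!r}")
    day = datetime.strptime(match.group(1), "%Y%m%d").date()
    return day, int(match.group(2))

print(parse_reference("M20210130-0000005"))
```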

27

Information on the FFR attribute

  • SAMPLE_FFR_VALUE =

    • Indicates whether the file was flagged for review or not: “true” meaning it was FFR; “false” meaning it was not FFR

    • A boolean value, ex: true

28

Information on the timings of the whole document

  • SAMPLE_RECEIVE_TIME =

    • The timestamp of when the file was received by the machine

    • A timestamp format such as YYYY-MM-DDTHH:MM:SS.MMMMMM ex: 2021-10-12T15:48:09.688000

  • SAMPLE_START_TIME =

    • The timestamp of when the machine started processing the file

    • A timestamp format such as YYYY-MM-DDTHH:MM:SS.MMMMMM ex: 2021-10-12T15:48:09.688000

  • SAMPLE_DONE_TIME =

    • The timestamp of when the machine finished processing the file

    • A timestamp format such as YYYY-MM-DDTHH:MM:SS.MMMMMM ex: 2021-10-12T15:48:09.688000

  • SAMPLE_FEEDBACK_TIME =

    • The timestamp of when the feedback was received by the machine

    • A timestamp format such as YYYY-MM-DDTHH:MM:SS.MMMMMM ex: 2021-10-12T15:48:09.688000

  • SAMPLE_PROCESSING_PERIOD =

    • The amount of time between the start_time and the done_time

    • A float indicating the period in seconds, ex: 5.4800
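The processing period can be recomputed from the two timestamps it is derived from. The following is a minimal sketch using the standard datetime module; the function name processing_period and the sample timestamps are hypothetical.

```python
from datetime import datetime

def processing_period(start_time, done_time):
    """Return the seconds between start_time and done_time.

    Both arguments are timestamp strings in the format shown above,
    ex: 2021-10-12T15:48:09.688000
    """
    start = datetime.fromisoformat(start_time)
    done = datetime.fromisoformat(done_time)
    return (done - start).total_seconds()

# Made-up timestamps that are 5.48 seconds apart.
print(processing_period("2021-10-12T15:48:09.688000",
                        "2021-10-12T15:48:15.168000"))
```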

29

Information on the lock attribute

  • SAMPLE_LOCKED_VALUE =

    • Indicates whether the file is locked by a user and the lock has not yet expired: “true” meaning it was locked; “false” meaning it was not locked

    • A boolean value, ex: true

  • SAMPLE_LOCKED_SINCE = (only present if SAMPLE_LOCKED_VALUE is true)

    • The timestamp of when the file has been locked last

    • A timestamp format such as YYYY-MM-DDTHH:MM:SS.MMMMMM, ex: 2021-10-12T15:48:09.688000

  • SAMPLE_LOCKED_BY = (only present if SAMPLE_LOCKED_VALUE is true)

    • the universally unique identifier (UUID) of the user who locked the file

    • A string of 24 alphanumeric characters, ex: 1a2b345c6d78e9f01gh2i3j4

30

Information on the escalate attribute

  • SAMPLE_ESCALATE_VALUE =

    • Indicates whether the file was escalated by a human reviewer or not: “true” meaning it was escalated; “false” meaning it was not escalated

    • A boolean value, ex: true

  • SAMPLE_ESCALATED_SINCE = (only present if SAMPLE_ESCALATE_VALUE is true)

    • The timestamp of when the file has been escalated last

    • A timestamp format such as YYYY-MM-DDTHH:MM:SS.MMMMMM, ex: 2021-10-12T15:48:09.688000

  • SAMPLE_ESCALATED_BY = (only present if SAMPLE_ESCALATE_VALUE is true)

    • the universally unique identifier (UUID) of the user who escalated the file

    • A string of 24 alphanumeric characters, ex: 1a2b345c6d78e9f01gh2i3j4

31

Information on the submit attribute

  • SAMPLE_SUBMITTED_VALUE =

    • Indicates whether the file was submitted or not: “true” meaning it was submitted; “false” meaning it was not submitted

    • A boolean value, ex: true

  • SAMPLE_SUBMITTED_SINCE = (only present if SAMPLE_SUBMITTED_VALUE is true)

    • The timestamp of when the file has been submitted last

    • A timestamp format such as YYYY-MM-DDTHH:MM:SS.MMMMMM, ex: 2021-10-12T15:48:09.688000

  • SAMPLE_SUBMITTED_BY = (only present if SAMPLE_SUBMITTED_VALUE is true)

    • the universally unique identifier (UUID) of the user who submitted the file

    • A string of 24 alphanumeric characters, ex: 1a2b345c6d78e9f01gh2i3j4

32

Information on the reject attribute

  • SAMPLE_REJECT_VALUE =

    • Indicates whether the file was rejected by a human reviewer or not: “true” meaning it was rejected; “false” meaning it was not rejected

    • A boolean value, ex: true

  • SAMPLE_REJECTED_SINCE = (only present if SAMPLE_REJECT_VALUE is true)

    • The timestamp of when the file has been rejected last

    • A timestamp format such as YYYY-MM-DDTHH:MM:SS.MMMMMM, ex: 2021-10-12T15:48:09.688000

  • SAMPLE_REJECTED_BY = (only present if SAMPLE_REJECT_VALUE is true)

    • the universally unique identifier (UUID) of the user who rejected the file

    • A string of 24 alphanumeric characters, ex: 1a2b345c6d78e9f01gh2i3j4

33

Information on the last_version attribute

  • SAMPLE_LAST_VERSION = A string of characters, ex: random_feedback_version_name

34

Information on the status data of the whole document considering predicted, feedback, and versions:

  • SAMPLE_REJECT: boolean value to indicate whether the file was rejected by a human reviewer or not ex: true

  • SAMPLE_SAMPLING: boolean value to indicate whether the file went through a 2ndPass sampling or not ex: true

  • SAMPLE_LOCK: boolean value to indicate whether the file is locked by a user and the lock has not yet expired ex: true

  • SAMPLE_FEEDBACK: boolean value to indicate whether the file has received feedback or not ex: true

  • SAMPLE_ESCALATE: boolean value to indicate whether the file was escalated by a human reviewer or not ex: true

  • SAMPLE_SUCCESS: boolean value to indicate whether that file was successfully processed, ex: true

  • SAMPLE_ARCHIVED: boolean value to indicate whether that file was archived (i.e. the original file was deleted), ex: true

  • SAMPLE_REJECT_ATTEMPTS: An integer to count how many times a reject message for the file was sent to the client’s site, ex: 0

  • SAMPLE_SUBMIT_ATTEMPTS: An integer to count how many times a submit message for the file was sent to the client’s site, ex: 0

  • SAMPLE_READY_ATTEMPTS: An integer to count how many times a ready message for the file was sent to the client’s site, ex: 0

35

}

Closing curly brace of the entire JSON object