A format defines what you want to extract from a given type of document. It defines the structure of the output and is directly reflected in the Data view of the Data Entry Companion.

A format is a structured collection of fields of interest and consists of:

  • Text fields: Pieces of information that appear literally in the document

  • Tag fields: Information that does not appear literally in the document, but that can be derived. Tags come from a finite set of options.

  • Rows: Text and Tag fields that logically belong together

A format consists of at least one table: the format table.

There can be as many additional tables as needed for information that follows a row/table logic.

 


Format Table

The format table contains text and tag fields. These fields have a number of properties.

Property | Description | Default
Mandatory | Indicates whether the field is mandatory. Mandatory fields are marked in red when not found and block submission until they are filled. | false
Field | The number of options is virtually unlimited. You can specify a data type other than string to reduce the number of options for a text field. For example, you can specify that the text should be an amount. | string
Annotation, tag, separator, computed | Indicates whether the field of interest is an annotation, a tag, a separator, or computed. |
Data type | Indicates whether this is a text field or a tag field. | text field
Scope* | Indicates whether the scope is a page, a section or a document. | section
Visible | Indicates whether the field of interest is visible. | true
Multiple* | Indicates whether the field of interest can appear multiple times. | false
Count in evaluation | A flag that decides whether the field is evaluated. Setting this property to false excludes the field from evaluations. This is useful for commentary and other optional fields. | true
Conditional* | Some fields of interest are only relevant depending on the value of other fields of interest. For example, you may only be interested in the VAT number of an invoice if it has been confirmed that the document is a valid invoice (valid_invoice field == true). | No condition (always show)
Display name | The label shown in the front end to the user of the Data Entry Companion and in the stats pane. | No default, must be specified
Technical name | The label used when communicating with servers. | Same as display name
Description | A short text to characterise the field of interest. |
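As a rough illustration, the properties and defaults listed above could be modelled as follows. This is only a sketch: the class `FieldSpec` and its attribute names are hypothetical and do not reflect the actual Contract.fit schema or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldSpec:
    """Hypothetical container for the field properties described above."""
    display_name: str                      # no default, must be specified
    technical_name: Optional[str] = None   # falls back to display_name
    description: str = ""
    mandatory: bool = False
    data_type: str = "string"              # e.g. "amount" to narrow a text field
    field_type: str = "text"               # assumed values: "text" or "tag"
    scope: str = "section"                 # "page", "section" or "document"
    visible: bool = True
    multiple: bool = False
    count_in_evaluation: bool = True
    conditional: Optional[dict] = None     # None means "no condition (always show)"

    def __post_init__(self) -> None:
        # The technical name defaults to the display name.
        if self.technical_name is None:
            self.technical_name = self.display_name


vat = FieldSpec(display_name="vat_number", mandatory=True)
print(vat.technical_name, vat.scope, vat.visible)
```

Only the display name has to be supplied; every other property falls back to the default from the table.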

Scope and the importance of page/subpage splitting

It is important to distinguish the different types of scope: file, document, page and section.

  • A file is a container in which data is stored. This can be a .pdf, .jpg, .eml, .zip and so on. The illustration above shows one file.

  • A document is a representation of information that can be understood by a human. Here, we talk about invoices, receipts, ID cards, and so on. Each file contains at least one document but can contain more than one. In the illustration above, this one file contains 3 documents: a contract, an invoice, and an ID card document.

  • A page is one side of a document. In the review pane, you can view one page at a time. Page splitting enables the classification of different formats within one file. Each file contains at least one page, and the same goes for documents. The illustration below has 4 pages (1 page of a contract, 2 pages of invoices, and 1 page of an identity card).

  • A section is a part of a bigger item: a file can be split into different documents (sections), and a page can be split into different subpages (sub-sections). Subpage splitting enables the detection of different sections within one page. Sections are usually encountered when smaller receipts or ID cards are processed. For example, the front and back of an ID card are often saved on a single page, or several receipts are grouped together on a single page. The illustration below shows that the fourth page is divided into 2 sections.

For fields in tables, only one scope can be chosen per table.

Multiple and the importance of page/subpage splitting

This property is especially useful if splitting (page or subpage) is enabled. If the multiple box is ticked, the tool is allowed to search for the field more than once.

  • Scenario 1 - page splitting enabled: Suppose your client sends you a PDF file that contains not one invoice but two or more invoices merged together. Ticking the multiple box tells the Contract.fit solution to expect several invoice_date values rather than one. If the multiple box is not ticked, the solution only takes into account the first invoice_date it finds and assigns it to the whole file or document.

  • Scenario 2 - subpage splitting enabled: The finance department often receives documents with multiple receipts on one page. Here too, it is logical to tick “multiple” for fields such as “gross_amount” or “invoice_date”, as these fields are unlikely to be the same for all receipts.
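The effect of the multiple box in both scenarios can be sketched as below. The function `collect_values` is a hypothetical helper written for illustration only, and the dates are made-up sample data; this is not how the Contract.fit solution is implemented.

```python
from typing import List, Optional, Union


def collect_values(found: List[str], multiple: bool) -> Union[List[str], Optional[str]]:
    """Return every value when `multiple` is set, otherwise only the first one found."""
    if multiple:
        return list(found)
    # Without "multiple", the first hit is used for the whole file or document.
    return found[0] if found else None


# Made-up invoice dates from a PDF that contains two merged invoices.
invoice_dates = ["2021-03-01", "2021-03-15"]
print(collect_values(invoice_dates, multiple=True))
print(collect_values(invoice_dates, multiple=False))
```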

Conditional

It is possible to add logical AND/OR conditions for specific fields. Choose any field and click on “show” in the conditional column. In the pop-up screen you can add rules that determine when the selected field is shown. You can also add groups to combine AND and OR logic. In the illustration below, the condition works as follows: the field of interest is only relevant if

  1. The gross amount is not null, AND

  2. Either the amount payable or the net amount is not null.
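The grouped AND/OR logic of this example can be sketched as a small recursive evaluator. The rule layout (`all`, `any`, `not_null`) and the technical names `amount_payable` and `net_amount` are assumptions made for illustration; they are not the format's actual storage format.

```python
from typing import Any, Dict


def evaluate(rule: Dict[str, Any], values: Dict[str, Any]) -> bool:
    """Evaluate a nested AND/OR rule against extracted field values."""
    if "all" in rule:  # AND group: every sub-rule must hold
        return all(evaluate(sub, values) for sub in rule["all"])
    if "any" in rule:  # OR group: at least one sub-rule must hold
        return any(evaluate(sub, values) for sub in rule["any"])
    # Leaf rule: the named field is not null.
    return values.get(rule["not_null"]) is not None


# Gross amount is not null AND (amount payable OR net amount is not null).
condition = {
    "all": [
        {"not_null": "gross_amount"},
        {"any": [{"not_null": "amount_payable"}, {"not_null": "net_amount"}]},
    ]
}

print(evaluate(condition, {"gross_amount": "120.00", "net_amount": "100.00"}))  # True
print(evaluate(condition, {"gross_amount": "120.00"}))  # False
```

Nesting a group inside another group is what lets AND and OR be mixed in a single condition.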

Predefined data types

Note that some fields already have predefined predictors. Fields named “currency”, “language”, and “country” get standardised predictions from our pre-filled predictors, such as “USD”, “EUR”, “GBP”, etc. for currency. This also means that a currency sign ($) annotated in the Data Entry Companion for a currency field is formatted to “USD”.
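The kind of normalisation described for currency fields might look like the sketch below. Only the mapping of “$” to “USD” is stated on this page; the “€” and “£” entries and the function `normalise_currency` are assumptions added for illustration.

```python
# "$" -> "USD" comes from the text above; the other two symbols are assumed.
CURRENCY_SYMBOLS = {"$": "USD", "€": "EUR", "£": "GBP"}
CURRENCY_CODES = {"USD", "EUR", "GBP"}


def normalise_currency(raw: str) -> str:
    """Map an annotated currency sign or code to a standardised code."""
    text = raw.strip()
    if text in CURRENCY_SYMBOLS:
        return CURRENCY_SYMBOLS[text]
    if text.upper() in CURRENCY_CODES:
        return text.upper()
    return text  # unknown input is returned unchanged


print(normalise_currency("$"))
print(normalise_currency("eur"))
```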


Other tables

You may want to maintain the row logic for tables in the extraction of information. To this end, we allow you to specify tables, which contain one or more row types.

For example: you may have a table of line items with four columns (description, unit price, quantity, line total). You would then have two types of rows for this table: line items and a total line. The total line would be of a different type as it will not have a unit price. It will have a fixed description ("total"), a quantity and an overall total.

Rows are essentially a sorted list of text and tag fields. In addition, they will have a row_type, which defines to which table they belong.
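The line-item example above can be pictured with a minimal data structure. The class `Row`, the helper `rows_of_type` and all values are hypothetical and purely illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Row:
    """A row: a row_type plus its text and tag fields."""
    row_type: str
    fields: Dict[str, str] = field(default_factory=dict)


# One table with two row types: line items and a total line (made-up values).
line_items_table: List[Row] = [
    Row("line_item", {"description": "Widget", "unit_price": "2.50",
                      "quantity": "4", "line_total": "10.00"}),
    Row("line_item", {"description": "Gadget", "unit_price": "7.00",
                      "quantity": "1", "line_total": "7.00"}),
    # The total line has no unit price and a fixed description.
    Row("total", {"description": "total", "quantity": "5", "line_total": "17.00"}),
]


def rows_of_type(rows: List[Row], row_type: str) -> List[Row]:
    """Select the rows of one row type, keeping their original order."""
    return [row for row in rows if row.row_type == row_type]


print(len(rows_of_type(line_items_table, "line_item")))
```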


Edit Formats

  • Add new format: Click on the green button in the bottom right corner. Type your new format name in the pop-up box and click confirm. This automatically adds a new entry to the formats table.

  • Remove format: Tick the box beside the format that you want to remove and click on the red button in the bottom right corner.

  • Add new field: Click on [edit] of the desired format, add a new field by typing the field name in the Label placeholder at the bottom of the table, and adjust the properties for that field. Then click on the green publish button.

  • Add conditional to the field: Click on [edit] of the desired format, go to the line of the field, column conditional, and add your conditions. See the Conditional section above for details.

  • Remove a field: Click on [edit] of the desired format, then click on the cross on the right-hand side of the table for the field that you would like to remove. Then click on the green publish button.

Edit Formats through the Studio Controls in the Data Entry Companion

A format defines the fields you are interested in for a given document type. It is reflected in the data table that you can see in the Data Entry Companion. You can also edit the format directly from within the Data Entry Companion.

To do this, you need to enable the Studio Controls from within the Data Entry Companion. You will recognise the Studio Controls by the three dots icon.

Visibility of fields

This section reviews how the ‘visible flag’, ‘visible if condition’ and ‘mandatory flag’ properties of fields behave, as specified in the format.

  • ‘Visible flag’ and ‘visible if condition’

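Visible | Visible if (conditional) | Result | Explanation
TRUE | n/a (not set) | Visible | If no condition is set for visible_if, then the visible flag is used
FALSE | n/a (not set) | Not visible | If no condition is set for visible_if, then the visible flag is used
TRUE | TRUE (evaluates to) | Visible | When configured, “visible_if” has a higher priority than “visible”
FALSE | TRUE (evaluates to) | Visible | When configured, “visible_if” has a higher priority than “visible”
TRUE | FALSE (evaluates to) | Not visible | When configured, “visible_if” has a higher priority than “visible”
FALSE | FALSE (evaluates to) | Not visible | When configured, “visible_if” has a higher priority than “visible”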
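The six combinations above reduce to one rule, sketched here. The function `is_visible` is a hypothetical helper in which `None` stands for a visible_if condition that is not set.

```python
from typing import Optional


def is_visible(visible: bool, visible_if: Optional[bool] = None) -> bool:
    """Resolve visibility: a configured visible_if overrides the visible flag."""
    if visible_if is None:
        # No condition set: fall back to the plain visible flag.
        return visible
    # Condition set: its evaluated result wins, whatever the visible flag says.
    return visible_if


print(is_visible(False, visible_if=True))   # True
print(is_visible(True, visible_if=False))   # False
```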

  • ‘Mandatory flag’

    • Mandatory field: The field cannot be null (not specified, meaning it was not predicted), but it can be an empty string. A mandatory field that is null is treated as an action.

    • Non-mandatory field: The field can be null. A non-mandatory field that is null is not treated as an action.

    • Mandatory computed field: A computed field that evaluates to False is treated as an action.

    • Non-mandatory computed field: The computed field is not treated as an action.
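The four mandatory-flag cases can be summarised in a short sketch. The function `needs_action` and its parameters are hypothetical, and `None` is assumed to represent a null (not predicted) value.

```python
from typing import Any


def needs_action(value: Any, mandatory: bool, computed: bool = False) -> bool:
    """Decide whether a field is treated as an action, following the rules above."""
    if not mandatory:
        # Non-mandatory fields never raise an action, computed or not.
        return False
    if computed:
        # A mandatory computed field raises an action when it evaluates to False.
        return value is False
    # A mandatory field raises an action only when it is null; "" is allowed.
    return value is None


print(needs_action(None, mandatory=True))   # True
print(needs_action("", mandatory=True))     # False
```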