More about the solution

How does it work?

Our solution helps you automate the processing of incoming communication. We refer to each incoming item as a document. A lot of value is trapped in these documents, and we help you unlock it by translating each document into fields of interest. For example, in an invoice you may want to retrieve the amount, the VAT rate, and so on. Different types of documents call for different fields. We call the definition of the fields you are interested in for a given document type the format, because it defines the format of the output that we create.

Document

Documents come in all shapes and sizes. Some are paper-based, others are digital. Some are written by people, others are generated by an application. At contract.fit, we want to provide you with a solution for all these different kinds of documents. We help you translate all sorts of unstructured or semi-structured documents into structured information.

Here are a few of the file types that we already support out of the box:

  • Images: .gif, .jpeg, .bmp, .tiff

  • Office documents: .ppt(x), .doc(x), .xls(x), .rtf, .txt

  • Emails: .msg, .eml

  • PDF documents: scanned, or natively digital .pdf
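
As a rough illustration of the list above, the sketch below groups the listed extensions by category and looks up the category for a given file name. The names SUPPORTED_EXTENSIONS and document_category are hypothetical and are not part of the contract.fit API; the shorthand ".ppt(x)" is assumed to mean both ".ppt" and ".pptx", and likewise for the other Office formats.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the list above. Category names are illustrative only.
SUPPORTED_EXTENSIONS = {
    "image": {".gif", ".jpeg", ".bmp", ".tiff"},
    "office": {".ppt", ".pptx", ".doc", ".docx", ".xls", ".xlsx", ".rtf", ".txt"},
    "email": {".msg", ".eml"},
    "pdf": {".pdf"},
}


def document_category(filename: str) -> Optional[str]:
    """Return the category for a file name, or None if its extension is not listed."""
    suffix = Path(filename).suffix.lower()  # compare case-insensitively
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return category
    return None
```

A check like this could run before upload to reject files whose extension is not in the list; it looks only at the file name, not at the file contents.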

Field

Essentially, a field is a snippet of information that you want to extract or derive from the document. We support two main types of fields:

  • Text fields: this is information that appears literally in the document, e.g., an identifier, a name

  • Tag fields: this is information that does not appear literally in the document, but that can be derived from the document, e.g., the language the document is written in, the department to which the document should be sent

Text fields and tag fields also differ in the values they can take: a text field can hold any value, whereas a tag field must take its value from a finite set of options.

For tabular information, we also support rows:

  • A row consists of a number of text and tag fields that logically belong together, e.g., line items in an invoice
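
To make the distinction concrete, here is a minimal sketch of how text fields, tag fields, and rows could be modelled. The class names, attribute names, and the language options are assumptions made for illustration; they do not describe the actual contract.fit data model.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TextField:
    """Information that appears literally in the document; any string is allowed."""
    name: str
    value: str


@dataclass
class TagField:
    """Information derived from the document; the value must be one of a finite set of options."""
    name: str
    options: List[str]
    value: str

    def __post_init__(self) -> None:
        # Enforce the finite set of options that distinguishes a tag field from a text field.
        if self.value not in self.options:
            raise ValueError(f"{self.value!r} is not a valid option for tag field {self.name!r}")


@dataclass
class Row:
    """Text and tag fields that logically belong together, e.g. one invoice line item."""
    fields: List[Union[TextField, TagField]] = field(default_factory=list)


# Hypothetical values, for illustration only.
language = TagField(name="language", options=["en", "fr", "nl"], value="en")
line_item = Row(fields=[TextField("description", "Consulting"), TextField("quantity", "2")])
```

Creating a TagField with a value outside its options raises a ValueError, while a TextField accepts any string.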

Format

A format describes which fields you are interested in for a certain type of document. It determines which fields our machine learning pipeline tries to predict, and which fields you can indicate manually on a given document.

For example, for an invoice, you may be interested in the following elements:

  • Text fields: Total amount, Beneficiary, Account number beneficiary, VAT number beneficiary

  • Tag fields: Language, Validity of invoice

  • Rows: Line items with the following fields:

    • Description

    • Unit price

    • Quantity

    • Line total
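
The invoice example above could be written down as a small definition, sketched here as a plain Python dictionary. The key names, the structure, and the tag options are all assumptions chosen for illustration; the actual contract.fit definition may look different. The helper empty_output is likewise hypothetical and only shows how such a definition determines the shape of the output.

```python
import json

# Hypothetical definition mirroring the invoice example above.
INVOICE_DEFINITION = {
    "text_fields": [
        "total_amount",
        "beneficiary",
        "beneficiary_account_number",
        "beneficiary_vat_number",
    ],
    "tag_fields": {
        "language": ["en", "fr", "nl"],            # assumed options
        "invoice_validity": ["valid", "invalid"],  # assumed options
    },
    "rows": {
        "line_items": ["description", "unit_price", "quantity", "line_total"],
    },
}


def empty_output(definition: dict) -> dict:
    """Build an output skeleton with one slot per field named in the definition."""
    output = {name: None for name in definition["text_fields"]}
    output.update({name: None for name in definition["tag_fields"]})
    output.update({name: [] for name in definition["rows"]})  # rows start as empty lists
    return output


print(json.dumps(empty_output(INVOICE_DEFINITION), indent=2))
```

Each text and tag field gets a single slot, while each row type gets a list, because one invoice can contain any number of line items.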