INDEX | FORMAT NAME | DESCRIPTION |
1 | invoice | recognises invoices |
2 | receipt | recognises receipts |
3 | | recognises e-mails |
4 | bank_card | recognises bank cards |
5 | ID | recognises IDs |
6 | salary_slip | recognises salary slips; trained mainly on Belgian ones |
7 | secci-format | classification does not work well yet; only extraction of supported fields works |
8 | delivery_note | classification does not work well yet; only extraction of supported fields works |
9 | credit_note | classification does not work well yet; only extraction of supported fields works |
10 | purchase_order | classification does not work well yet; only extraction of supported fields works |
11 | quote | work in progress |
12 | tax_return | work in progress |
As of now, the document types that can be classified properly are invoices, receipts, e-mails, bank cards, IDs and salary slips. Classification of SECCI documents, delivery notes, credit notes, purchase orders, quotes and tax returns does not work properly yet.
Document [would lead to a split of the file]
  Photo
  Identity document
    International passport
    National ID
  Communication [has sender/receiver]
    Transaction
      Invoice
        Hospital invoice
      Receipt
        Pharmacy receipt (BVAC)
      Purchase order
      Quote
      Delivery note
      Credit card statement
    Email
  Proof of income
    Salary slip
    Tax return
    Proof of pension
  European Accident Form
  SECCI document
The above would be the document types from a customer perspective. Internally, I would use the is_a field to indicate that one classifier's vote does not contradict another's.
E.g., suppose Athar builds a visual card classifier and a “cutter” that separates the cards when there are several on one page. We will then know that something is a card, but not what type of card. The customer will probably never be interested in document_type card, but if we know this and we also see the MRZ of an identity document, then we know it is a national identity card and not an international passport.
Similarly, if something is an email and another classifier votes for “invoice”, we should be able to understand that this is not a contradiction. An invoice can be an email, but a European Accident Form cannot be an invoice.
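To make the idea concrete, here is a minimal sketch of how votes could be reconciled through is_a. Everything in it is an assumption for illustration: the `IS_A` table, the type names and the functions `ancestors`, `candidates` and `resolve` are hypothetical, and is_a is read loosely as "can be" (so invoice lists email as a possible parent). Each vote is expanded to the set of types that are consistent with it, and the sets are intersected; an empty result means the votes really contradict each other.

```python
from typing import Dict, Iterable, Set

# Hypothetical is_a table: each type lists the broader categories it can fall under.
# "card" is an internal-only type that a customer would never see.
IS_A: Dict[str, Set[str]] = {
    "document": set(),
    "card": {"document"},
    "identity_document": {"document"},
    "national_id": {"identity_document", "card"},
    "international_passport": {"identity_document"},
    "email": {"document"},
    "invoice": {"document", "email"},  # assumption: an invoice can be an email
    "european_accident_form": {"document"},
}


def ancestors(doc_type: str) -> Set[str]:
    """Return every type reachable from doc_type by following is_a links."""
    seen: Set[str] = set()
    stack = list(IS_A[doc_type])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A[parent])
    return seen


def candidates(vote: str) -> Set[str]:
    """Return all types consistent with a vote: the type itself and its descendants."""
    return {t for t in IS_A if t == vote or vote in ancestors(t)}


def resolve(votes: Iterable[str]) -> Set[str]:
    """Intersect the candidates of all votes; an empty set signals a contradiction."""
    result = set(IS_A)
    for vote in votes:
        result &= candidates(vote)
    return result


# A card vote plus an identity-document vote (e.g. from an MRZ) narrows to one type.
print(resolve(["card", "identity_document"]))
# An email vote and an invoice vote are compatible.
print(resolve(["email", "invoice"]))
# A European Accident Form cannot be an invoice, so nothing is left.
print(resolve(["european_accident_form", "invoice"]))
```

With this reading, a coarse vote such as card never has to be exposed as a document_type; it only serves to narrow down the more specific votes.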