Financial Documents Information Extraction

Information extraction from financial documents whose content is in tabular form

Information was extracted from several types of financial documents: machine-readable PDFs and scanned PDFs, containing both bordered and borderless tables. Different AI models and rule-based approaches were applied depending on the document type. The documents were also in languages other than English (Japanese, Chinese, and Russian).
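The rule-based methods used in this project are not described here. As an illustration only, the following is a minimal pure-Python sketch of two common heuristics. It assumes the page has already been binarized into a 0/1 grid (1 = ink) and cropped to the table region. For bordered tables, rows and columns that are mostly ink are treated as ruling lines, and consecutive rulings bound the cells. For borderless tables, columns are split on runs of blank pixel columns. The function names and thresholds (`min_frac`, `min_gap`) are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # binarized table crop: 1 = ink (dark), 0 = background


def find_rulings(profile: List[int], min_len: int) -> List[int]:
    """Return indices whose ink count reaches min_len. Adjacent hits (a thick
    line) are merged into one ruling at the middle of the run."""
    rulings: List[int] = []
    run: List[int] = []
    for i, count in enumerate(profile):
        if count >= min_len:
            run.append(i)
        elif run:
            rulings.append(run[len(run) // 2])
            run = []
    if run:
        rulings.append(run[len(run) // 2])
    return rulings


def detect_cells(img: Image, min_frac: float = 0.8) -> List[Tuple[int, int, int, int]]:
    """Bordered tables: a pixel row (column) is a ruling line when at least
    min_frac of it is ink. Returns cells as (top, left, bottom, right), the
    coordinates of the four ruling lines that enclose each cell."""
    height, width = len(img), len(img[0])
    row_profile = [sum(row) for row in img]
    col_profile = [sum(img[y][x] for y in range(height)) for x in range(width)]
    h_lines = find_rulings(row_profile, int(min_frac * width))
    v_lines = find_rulings(col_profile, int(min_frac * height))
    cells = []
    for top, bottom in zip(h_lines, h_lines[1:]):
        for left, right in zip(v_lines, v_lines[1:]):
            cells.append((top, left, bottom, right))
    return cells


def split_columns(img: Image, min_gap: int = 2) -> List[Tuple[int, int]]:
    """Borderless tables: return (start, end) x-ranges (end exclusive) of inked
    regions separated by at least min_gap completely blank pixel columns."""
    width = len(img[0])
    has_ink = [any(row[x] for row in img) for x in range(width)]
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None  # x where the current column span began
    last_ink = -1  # x of the most recent pixel column containing ink
    for x, ink in enumerate(has_ink):
        if not ink:
            continue
        # A wide enough blank gap closes the current span.
        if start is not None and x - last_ink - 1 >= min_gap:
            spans.append((start, last_ink + 1))
            start = None
        if start is None:
            start = x
        last_ink = x
    if start is not None:
        spans.append((start, last_ink + 1))
    return spans
```

On real scans, binarization and line emphasis would more likely be handled with OpenCV (for example `cv2.adaptiveThreshold` followed by `cv2.morphologyEx` with long, thin kernels), and each detected cell would then be cropped and passed to OCR. Plain projection profiles assume an unskewed, cropped table, which is one reason a learned detector may be preferable for noisier inputs.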

Technologies used: TensorFlow, PyTorch, OpenCV, OCR engines and Docker