It is widely cited that about 80% of valuable business information is contained in free text.
Extracting information from such text is therefore a key element of business intelligence, and one whose importance has grown steadily over the past decade. Furthermore, economists have demonstrated that various aspects of financial discourse can effectively forecast business trends and movements, even more effectively than the analysis of numerical data alone.
Financial documents are structurally highly complex. They combine free-text sections, footnotes, cross-references, lists of topics, tables that span multiple pages, and of course images and charts. Mining such information requires the development of multiple integrated document content processing capabilities that go far beyond traditional keyword-based search engines or standard text mining tools.
Our European research in financial information extraction covers a wide range of multidisciplinary aspects involving linguistics, finance, computer science and statistical learning. We apply document structure discovery to drive information extraction, and integrate it with FactSpotter, our state-of-the-art robust hybrid parsing tool, which we have been developing since the late 1990s. Our key research efforts aim at locating and disambiguating valuable facts, as well as their semantic connections, to obtain deep and actionable business insights. This allows us to address needs ranging from the simple retrieval of references from invoices in a Business Process Outsourcing scenario to more complex tasks such as detecting anomalies in financial reports, allegations of fraud, organizational changes, or risks discussed in sections on the evolution of business strategy, and even identifying writers' sentiments or their attempts to obfuscate the truth.