Challenges
- Varied document formats and internal data structures (e.g. PDF, Excel, JSON) across service providers
- Frequent changes to those data structures
- Ensuring sufficient quality and quantity of data for model training
- Anonymizing data for model training
 
Solutions
- A custom tool for reviewing and approving extracted data and pushing it to destination systems
- A data lake and data warehouse for collecting and managing the data extracted from documents
- Clustering of documents by vendor and document type
- A user interface for the team to label data for ML model training
- A tool for document classification and named entity recognition
 
Results
- Ability to process documents from industry-leading service providers
- Manual effort associated with data extraction decreased by 60%
- Automated document processing