Leveraging Machine Learning for Intelligent Document Processing

Anne Marsden
January 7, 2021
02:54 pm

By: Troy Allen | VP Cloud Services

UPDATED February 9, 2023: Businesses thrive on information, but because of the complexity and variety of data within an organization, finding usable information can be challenging and time-consuming. As organizations are inundated with documents, forms, data streams, and more, it is increasingly difficult to extract meaningful information efficiently, route that information to the systems that need it, and present it in a way that drives better business decisions. Machine Learning (ML) and Artificial Intelligence (AI) tools are helping solve those challenges. These tools have rapidly become more sophisticated, allowing organizations to pull critical information out of their content, rich media files, and data and enabling better Intelligent Document Processing, or IDP* (interpreting unstructured documents as recognizable sets of information). IDP focuses primarily on the kinds of information companies commonly mine: text, video, and graphics. AWS has recognized the importance of Document Understanding and developed services that provide the visibility and analysis companies need. Amazon Textract and Amazon Rekognition are machine learning services designed to extract meaning from documents, rich media files, and data.

Getting More Out of Text

While Optical Character Recognition (OCR) has been around for many years, most organizations overlook its strengths and its ability to improve data processing. AWS Textract provides OCR as a cloud-based service, but it offers much more than one might expect because it applies machine learning models to business documents. For data to be useful, it must first be collected, and Textract goes beyond simple OCR by identifying key-value pairs, extracting table data, and recognizing checkboxes and radio buttons. Amazon Textract makes it easy to export the extracted data into a database or into off-the-shelf or custom applications. Traditional OCR solutions require additional tools to provide this level of data recognition and extraction.
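As an illustration of the export step, here is a minimal sketch that writes extracted key-value pairs to a database. It assumes the pairs have already been parsed out of a Textract response into a Python dictionary; SQLite stands in for whatever database an organization actually uses, and the `fields` table, its columns, the document ID, and the sample values are all hypothetical.

```python
import sqlite3

def export_fields(conn, document_id, fields):
    """Write extracted key-value pairs to a hypothetical 'fields' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fields ("
        "document_id TEXT, field_key TEXT, field_value TEXT)"
    )
    # One row per extracted pair, tagged with the source document.
    conn.executemany(
        "INSERT INTO fields (document_id, field_key, field_value) VALUES (?, ?, ?)",
        [(document_id, key, value) for key, value in fields.items()],
    )
    conn.commit()

# Hypothetical pairs, standing in for values parsed from a Textract response.
conn = sqlite3.connect(":memory:")
export_fields(conn, "invoice-001", {"Invoice Number": "INV-42", "Total": "$1,250.00"})
rows = conn.execute(
    "SELECT field_key, field_value FROM fields WHERE document_id = ? ORDER BY field_key",
    ("invoice-001",),
).fetchall()
print(rows)
```

Once the pairs are in a table, downstream applications can query them with ordinary SQL instead of parsing Textract output themselves.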

More than OCR

Textract goes beyond OCR by capturing not only the content but also where that content sits on the page. In addition to standard character recognition, it is designed to understand formatting and how content is laid out within a page. It does this by detecting key information and text areas and recording a bounding box around each one, which supports content, table, and form extraction.
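Textract reports each bounding box as ratios of the page's width and height (`Left`, `Top`, `Width`, `Height`), so locating content on an actual image takes a small conversion. The sketch below assumes the page image size in pixels is known; the sample box and the 1700 by 2200 pixel page are made-up values for illustration.

```python
def to_pixels(bounding_box, page_width, page_height):
    """Convert a Textract-style normalized BoundingBox to pixel coordinates.

    Left, Top, Width and Height are ratios of the page dimensions, so
    multiplying by the image size yields (left, top, right, bottom) in pixels.
    """
    left = round(bounding_box["Left"] * page_width)
    top = round(bounding_box["Top"] * page_height)
    right = round((bounding_box["Left"] + bounding_box["Width"]) * page_width)
    bottom = round((bounding_box["Top"] + bounding_box["Height"]) * page_height)
    return left, top, right, bottom

# Made-up box covering a quarter of the page width near the top.
box = {"Width": 0.25, "Height": 0.05, "Left": 0.1, "Top": 0.2}
print(to_pixels(box, 1700, 2200))  # (170, 440, 595, 550)
```

Keeping coordinates normalized in storage and converting only when drawing or cropping lets the same results be reused for images rendered at different resolutions.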

Textract returns multiple blocks of information for each page of the document it analyzes:

– The lines and words of detected text
– The relationships between the lines and words of detected text
– The page that the detected text appears on
– The location of the lines and words of text on the document page
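Those pieces of information are linked together in the response: `LINE` blocks point to their `WORD` blocks through `CHILD` relationships, and each block can carry the `Page` it was found on. Here is a minimal sketch that walks those links to group lines by page. The input is a hand-written, trimmed-down response (real blocks also include `Geometry` and `Confidence`), and defaulting a missing `Page` to 1 is an assumption for single-page results.

```python
from collections import defaultdict

def lines_by_page(blocks):
    """Group LINE text by page, along with the WORD children of each line."""
    by_id = {b["Id"]: b for b in blocks}
    pages = defaultdict(list)
    for block in blocks:
        if block["BlockType"] != "LINE":
            continue
        # Follow CHILD relationships from the line to its words.
        word_ids = [
            i
            for rel in block.get("Relationships", [])
            if rel["Type"] == "CHILD"
            for i in rel["Ids"]
        ]
        words = [by_id[i]["Text"] for i in word_ids if by_id[i]["BlockType"] == "WORD"]
        pages[block.get("Page", 1)].append((block["Text"], words))
    return dict(pages)

# Hand-written sample shaped like Textract Blocks (Geometry omitted).
blocks = [
    {"Id": "l1", "BlockType": "LINE", "Page": 1, "Text": "Invoice Total",
     "Relationships": [{"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "w1", "BlockType": "WORD", "Page": 1, "Text": "Invoice"},
    {"Id": "w2", "BlockType": "WORD", "Page": 1, "Text": "Total"},
    {"Id": "l2", "BlockType": "LINE", "Page": 2, "Text": "Page two",
     "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
    {"Id": "w3", "BlockType": "WORD", "Page": 2, "Text": "Page"},
    {"Id": "w4", "BlockType": "WORD", "Page": 2, "Text": "two"},
]
print(lines_by_page(blocks))
```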

Table Data Exposed

AWS Textract is well equipped to locate table data within documents. It recognizes the table structure and identifies each cell by its row and column, so cell values can be paired with their headers as key-value pairs.
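In the response, a `TABLE` block lists its `CELL` blocks as children, and each cell carries a `RowIndex` and `ColumnIndex`. A minimal sketch of turning that into header-keyed records follows; it assumes the first row holds the column headers and ignores merged cells, and the parts table in the sample is invented for illustration.

```python
def child_ids(block):
    """Ids listed in a block's CHILD relationships."""
    return [i for rel in block.get("Relationships", []) if rel["Type"] == "CHILD" for i in rel["Ids"]]

def table_records(blocks, table_id):
    """Rebuild one table from CELL blocks and pair each value with its column header."""
    by_id = {b["Id"]: b for b in blocks}
    cells = {}
    for cell_id in child_ids(by_id[table_id]):
        cell = by_id[cell_id]
        if cell["BlockType"] != "CELL":
            continue
        words = [by_id[w]["Text"] for w in child_ids(cell) if by_id[w]["BlockType"] == "WORD"]
        cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)
    if not cells:
        return []
    n_rows = max(r for r, _ in cells)
    n_cols = max(c for _, c in cells)
    # Row and column indexes are 1-based; empty positions become "".
    grid = [[cells.get((r, c), "") for c in range(1, n_cols + 1)] for r in range(1, n_rows + 1)]
    # Assumption: the first row holds the column headers.
    header, body = grid[0], grid[1:]
    return [dict(zip(header, row)) for row in body]

# Invented two-by-two table shaped like Textract TABLE/CELL/WORD blocks.
blocks = [
    {"Id": "t1", "BlockType": "TABLE", "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1, "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2, "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1, "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
    {"Id": "c4", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2, "Relationships": [{"Type": "CHILD", "Ids": ["w5"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Part"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Qty"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Hex"},
    {"Id": "w4", "BlockType": "WORD", "Text": "bolt"},
    {"Id": "w5", "BlockType": "WORD", "Text": "12"},
]
print(table_records(blocks, "t1"))  # [{'Part': 'Hex bolt', 'Qty': '12'}]
```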

In addition to detecting text, Textract can recognize selection elements such as checkboxes and radio buttons. An unselected element, such as an empty box (☐) or circle (○), is reported with a status of NOT_SELECTED, whereas checked boxes and filled circles are reported as SELECTED, and either status can be tied to a key-value pair as well. This can be extremely helpful in finding values in both tables and forms.
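Each checkbox or radio button appears as a `SELECTION_ELEMENT` block whose `SelectionStatus` is either `SELECTED` or `NOT_SELECTED`. The short sketch below turns those statuses into booleans; the sample blocks are hand-written stand-ins for a real response.

```python
def selection_states(blocks):
    """Map each SELECTION_ELEMENT block Id to True if checked, False otherwise."""
    return {
        block["Id"]: block["SelectionStatus"] == "SELECTED"
        for block in blocks
        if block["BlockType"] == "SELECTION_ELEMENT"
    }

# Hand-written sample: one checked box, one empty box, and an unrelated word.
blocks = [
    {"Id": "s1", "BlockType": "SELECTION_ELEMENT", "SelectionStatus": "SELECTED"},
    {"Id": "s2", "BlockType": "SELECTION_ELEMENT", "SelectionStatus": "NOT_SELECTED"},
    {"Id": "w1", "BlockType": "WORD", "Text": "Yes"},
]
states = selection_states(blocks)
checked = [element_id for element_id, is_checked in states.items() if is_checked]
print(states, checked)  # {'s1': True, 's2': False} ['s1']
```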

The Power of Key-Value Pairs

Businesses have interacted with their clients and vendors through forms for decades. Textract can read form data and clearly define key-value pairs from it. Many organizations struggle with the fact that forms change over time, and legacy OCR tools, which are typically set up for one particular form layout, can be difficult to retrain to find data when that layout changes. Textract removes that limitation by reading the actual text rather than relying on a fixed location on the form, and by analyzing documents and forms for relationships between pieces of detected text.
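Because keys and values are linked by relationships rather than by page coordinates, a program can read a form without knowing its layout. In a forms analysis response, each pair appears as two `KEY_VALUE_SET` blocks, one whose `EntityTypes` contains `KEY` and one containing `VALUE`, connected by a `VALUE` relationship, with the actual text in `WORD` children. The sketch below walks those links over a small hand-written response; the field names and values are made up, and a production parser would likely also keep confidence scores.

```python
def extract_form_fields(blocks):
    """Pair KEY and VALUE blocks from a Textract-style forms response."""
    by_id = {b["Id"]: b for b in blocks}

    def related(block, rel_type):
        return [i for rel in block.get("Relationships", []) if rel["Type"] == rel_type for i in rel["Ids"]]

    def text_of(block):
        # Keys and values are made of WORD children; checkbox values are
        # SELECTION_ELEMENT children, reported here by their status.
        parts = []
        for child_id in related(block, "CHILD"):
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                parts.append(child["SelectionStatus"])
        return " ".join(parts)

    fields = {}
    for block in blocks:
        if block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block.get("EntityTypes", []):
            values = [text_of(by_id[v]) for v in related(block, "VALUE")]
            # Later duplicates of the same key text overwrite earlier ones.
            fields[text_of(block)] = " ".join(values)
    return fields

# Hand-written sample: a text field and a checkbox field.
blocks = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]}, {"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
    {"Id": "k2", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v2"]}, {"Type": "CHILD", "Ids": ["w4"]}]},
    {"Id": "v2", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["s1"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Name:"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Jane"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Doe"},
    {"Id": "w4", "BlockType": "WORD", "Text": "Member"},
    {"Id": "s1", "BlockType": "SELECTION_ELEMENT", "SelectionStatus": "SELECTED"},
]
print(extract_form_fields(blocks))  # {'Name:': 'Jane Doe', 'Member': 'SELECTED'}
```

If the form layout changes, the same code still works as long as the key text ("Name:", "Member") stays recognizable.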

Getting More from Images

AWS Rekognition makes it easy to analyze image and video files using proven, highly scalable deep learning technology that requires no machine learning expertise to use. Amazon Rekognition can identify objects, people, text, scenes, and activities in images and videos, as well as detect inappropriate content. It also provides highly accurate facial analysis and facial search capabilities that can be used to detect, analyze, and compare faces for a wide variety of use cases, including user verification, people counting, and public safety.
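Rekognition's label detection returns a list of `Labels`, each with a `Name`, a `Confidence` from 0 to 100, and, for countable objects, `Instances` with their own confidence and bounding box. The sketch below filters labels by a threshold and counts instances of one label, which is the basic idea behind people counting. The 80 percent threshold and the sample response are assumptions for illustration; the real `DetectLabels` call also accepts a `MinConfidence` parameter to filter on the service side.

```python
def confident_labels(response, min_confidence=80.0):
    """Names of labels at or above the confidence threshold, sorted alphabetically."""
    return sorted(
        label["Name"]
        for label in response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    )

def count_instances(response, name, min_confidence=80.0):
    """Count detected instances (e.g. people) of one label above the threshold."""
    for label in response.get("Labels", []):
        if label["Name"] == name:
            return sum(
                1 for instance in label.get("Instances", [])
                if instance["Confidence"] >= min_confidence
            )
    return 0

# Hand-written sample shaped like a DetectLabels response (bounding boxes omitted).
response = {
    "Labels": [
        {"Name": "Person", "Confidence": 99.2,
         "Instances": [{"Confidence": 99.0}, {"Confidence": 97.5}, {"Confidence": 62.0}]},
        {"Name": "Helmet", "Confidence": 91.3, "Instances": []},
        {"Name": "Ladder", "Confidence": 55.0, "Instances": []},
    ]
}
print(confident_labels(response))           # ['Helmet', 'Person']
print(count_instances(response, "Person"))  # 2
```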

Using AI to See More

With AWS Rekognition Custom Labels, objects and scenes in images can be identified to meet specific business requirements and drive actions. Models can be trained to classify specific machine parts, identify whether employees are wearing Personal Protective Equipment (PPE) in surveillance footage, capture model numbers in images, and detect persons of interest, to name a few use cases. AWS Rekognition Custom Labels allows organizations to quickly identify objects and images that have value to their specific business and processes.
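Custom Labels detections come back as a `CustomLabels` list, each entry with the `Name` of a label the organization trained and a `Confidence`. Tying those detections to business actions can be as simple as a lookup table. In the sketch below, the label names, the action names, the 70 percent threshold, and the sample response are all hypothetical; they stand in for whatever labels a model was actually trained on and whatever workflow an organization runs.

```python
# Hypothetical trained label names mapped to hypothetical downstream actions.
LABEL_ACTIONS = {
    "no_hard_hat": "notify_safety_team",
    "part_model_plate": "send_to_text_extraction",
}

def route_detections(response, min_confidence=70.0):
    """Return the distinct actions triggered by confident custom-label detections."""
    actions = []
    for label in response.get("CustomLabels", []):
        action = LABEL_ACTIONS.get(label["Name"])
        # Skip unmapped labels, low-confidence hits, and repeated actions.
        if action and label["Confidence"] >= min_confidence and action not in actions:
            actions.append(action)
    return actions

# Hypothetical sample shaped like a DetectCustomLabels response (Geometry omitted).
response = {
    "CustomLabels": [
        {"Name": "no_hard_hat", "Confidence": 88.0},
        {"Name": "no_hard_hat", "Confidence": 81.5},
        {"Name": "part_model_plate", "Confidence": 40.0},
        {"Name": "forklift", "Confidence": 95.0},
    ]
}
print(route_detections(response))  # ['notify_safety_team']
```

Keeping the mapping as data rather than code makes it easy to add a new action when a model is retrained with additional labels.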

Uncovering Hidden Data

As with AWS Textract, Rekognition provides a way for companies to identify key information that can be stored, processed, and shared with other applications, enabling Intelligent Document Processing across files and data. The context of information is critical to assigning it value and defining how it can best be used.

AWS Rekognition helps companies realize value in their images and videos across many different use cases in the enterprise:

- Discover inappropriate content – filter images and videos for objects and scenes containing inappropriate content such as nudity, weapons, graphic violence, and even inappropriate text in the videos or images.
- Identify key objects – Rekognition can be utilized to filter social media video and image files to identify products, brands, people, and even landmarks.
- Help improve workplace safety – AWS Rekognition can inspect surveillance video and identify issues such as people not wearing Personal Protective Equipment (PPE) and obstructions in the workplace.
- Support identity verification – Rekognition can perform facial and person recognition by detecting people, identifying facial features, and even comparing those features against existing photographs to identify people in images and video files.
- Capture text information – AWS Rekognition can also detect and recognize text in video and image files. This helps an organization gather information that gives context to a video or image, such as the model number of a part from a photograph of a manufacturer's plate, or the names of streets on street signs in a video to help determine where the filmed event took place.
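To make the text-capture case concrete, Rekognition's text detection returns `TextDetections`, each with the `DetectedText`, a `Type` of `LINE` or `WORD`, and a `Confidence`. The sketch below pulls a model number out of the line detections. The `MODEL: ...` plate format, the regular expression, the threshold, and the sample response are hypothetical; real plates vary, and the pattern would need to match an organization's actual parts.

```python
import re

# Hypothetical pattern for a plate reading like "MODEL: XJ-2200".
MODEL_PATTERN = re.compile(r"MODEL\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE)

def find_model_numbers(response, min_confidence=80.0):
    """Return model numbers found in LINE detections of a DetectText-style response."""
    found = []
    for detection in response.get("TextDetections", []):
        # Rekognition reports both LINE and WORD detections; use lines only
        # so each piece of text is examined once.
        if detection["Type"] != "LINE" or detection["Confidence"] < min_confidence:
            continue
        match = MODEL_PATTERN.search(detection["DetectedText"])
        if match:
            found.append(match.group(1))
    return found

# Hypothetical sample shaped like a DetectText response (Geometry omitted).
response = {
    "TextDetections": [
        {"DetectedText": "ACME PUMP CO", "Type": "LINE", "Id": 0, "Confidence": 98.7},
        {"DetectedText": "MODEL: XJ-2200", "Type": "LINE", "Id": 1, "Confidence": 97.1},
        {"DetectedText": "MODEL:", "Type": "WORD", "Id": 2, "ParentId": 1, "Confidence": 97.3},
        {"DetectedText": "XJ-2200", "Type": "WORD", "Id": 3, "ParentId": 1, "Confidence": 96.9},
    ]
}
print(find_model_numbers(response))  # ['XJ-2200']
```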

You Are Not Alone

Business solutions can be complex, but making them work for your requirements doesn't have to be. Clearly defining your goals and objectives is half of the battle; the other half is knowing which tools will help you achieve those goals. Are your off-the-shelf solutions and applications collecting all the information you have? Do you need a business solution to manage all of your documents and data, but don't know where to start? Are you looking to move off an outdated legacy application that no longer supports your business direction? You are not alone.

Thousands of companies are facing the same questions and are finding the best answers by engaging with experts from Amazon and from solution service providers. TekStream, along with Amazon, is excited to speak with you about your IDP needs and how the right tools and solutions can have a positive impact on how you conduct business. TekStream is offering a free Digital Transformation assessment in which we will work with you to identify your document processing needs and provide process and technology recommendations to help you transform your business with ease. Reach out to us at info@TekStream.com for more details or call 1-844-TEK-STRM.

*Intelligent Document Processing may also be referred to as Document Understanding.