Course info
Mar 31, 2020
1h 20m

Businesses are moving to an instantaneous and digital world, but we will still need physical documents for quite some time. In this course, Extracting Text and Data with Amazon Textract, you will learn to use OCR technology to extract text and key-value pairs of data from scanned documents. First, you will explore how to detect printed text and numbers in a scan or rendering of a document. Next, you will discover how to detect key-value pairs in document images automatically, so that the extracted data retains the inherent context of the document without any manual intervention. Finally, you will learn how to preserve the composition of data stored in tables during extraction. When finished with this course, you will have the skills and knowledge to use Amazon Textract to create smart search indexes, build automated approval workflows, and better maintain compliance with document archival rules by flagging data that may require manual input, as well as to export data contained within those documents to other systems.

About the author

Eduardo is a technology enthusiast, software architect and customer success advocate. He's designed enterprise .NET solutions that extract, validate and automate critical business processes such as Accounts Payable and Mailroom solutions. He's a well-known specialist in the Enterprise Content Management market segment, specifically focusing on data capture & extraction and document process automation.

More courses by Eduardo Freitas
Section Introduction Transcripts

Course Overview
[Autogenerated] Hi, everyone. Welcome to my course, Extracting Text and Data with Amazon Textract. I'm a software developer and a data capture and business automation specialist. Amazon Textract is a service that automatically extracts text and data from scanned documents. Amazon Textract goes beyond simple optical character recognition to also identify the contents of fields in forms and information stored in tables. Many companies today extract data from documents and forms through manual data entry, which is slow and expensive, or through simple optical character recognition that requires manual customization or configuration. Rules and workflows for each document and form often need to be hard-coded and updated with each change to the form or when dealing with multiple forms. If the form deviates from the rules, the output is often scrambled and unusable. Textract overcomes these challenges by using machine learning, without the need for any manual effort or custom code. Some of the major topics we will cover include: core concepts and getting started with Textract; text detection and analysis; lines, forms, tables, and selection elements; key-value and table extraction; working with multi-page documents; extracting and saving data from PDFs; and finally, watching these technologies and principles get applied by creating our own scripts with some very cool demos. By the end of this course, you'll know the fundamentals of extracting text and data with Amazon Textract and be able to write code that uses it. Before beginning the course, you should have some good knowledge of Python, as well as being able to find your way around the latest version of Visual Studio Code. I hope you will join me on this journey to learn the ins and outs of extracting text and data with Amazon Textract, at Pluralsight.
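To give a feel for the kind of Python script the overview describes, here is a minimal sketch of reading lines and key-value pairs out of a Textract response. It is not taken from the course. The helper names (`analyze_file`, `extract_lines`, `extract_key_values`) and the tiny `SAMPLE_RESPONSE` are made up for illustration, and the sketch assumes the response follows Textract's block layout (`LINE`, `WORD`, and `KEY_VALUE_SET` blocks linked by `CHILD` and `VALUE` relationships). Only `analyze_file` talks to AWS, and it assumes `boto3` is installed and credentials are configured.

```python
from typing import Dict, List


def analyze_file(path: str) -> dict:
    """Hypothetical helper: send one image to Textract and return the raw response."""
    import boto3  # imported lazily so the parsing helpers below work without AWS access

    client = boto3.client("textract")
    with open(path, "rb") as handle:
        return client.analyze_document(
            Document={"Bytes": handle.read()},
            FeatureTypes=["FORMS", "TABLES"],
        )


def extract_lines(response: dict) -> List[str]:
    """Return the text of every LINE block, in the order Textract returned them."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block["BlockType"] == "LINE"
    ]


def _related_ids(block: dict, relationship_type: str) -> List[str]:
    """Collect the ids a block points to through one relationship type."""
    ids: List[str] = []
    for relationship in block.get("Relationships") or []:
        if relationship["Type"] == relationship_type:
            ids.extend(relationship["Ids"])
    return ids


def _text_of(block: dict, blocks_by_id: Dict[str, dict]) -> str:
    """Join the WORD children of a block into a single string."""
    words = []
    for child_id in _related_ids(block, "CHILD"):
        child = blocks_by_id[child_id]
        if child["BlockType"] == "WORD":
            words.append(child["Text"])
    return " ".join(words)


def extract_key_values(response: dict) -> Dict[str, str]:
    """Pair each KEY block with the text of the VALUE block(s) it links to."""
    blocks_by_id = {block["Id"]: block for block in response.get("Blocks", [])}
    pairs: Dict[str, str] = {}
    for block in blocks_by_id.values():
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        key_text = _text_of(block, blocks_by_id)
        value_texts = [
            _text_of(blocks_by_id[value_id], blocks_by_id)
            for value_id in _related_ids(block, "VALUE")
        ]
        pairs[key_text] = " ".join(value_texts)
    return pairs


# Hand-built stand-in for a Textract response, so the helpers can be tried offline.
SAMPLE_RESPONSE = {
    "Blocks": [
        {"Id": "l1", "BlockType": "LINE", "Text": "Name: Jane Doe",
         "Relationships": [{"Type": "CHILD", "Ids": ["w1", "w2", "w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Name:"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Jane"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Doe"},
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
    ]
}

if __name__ == "__main__":
    print(extract_lines(SAMPLE_RESPONSE))
    print(extract_key_values(SAMPLE_RESPONSE))
```

Keeping the parsing helpers separate from the AWS call means they can be exercised against a saved or hand-written response before any document is sent to the service. Against a real file, `extract_key_values(analyze_file("form.png"))` would be the equivalent call, where `form.png` is a placeholder path.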