Amazon SageMaker Ground Truth helps you build highly accurate training datasets for machine learning quickly. SageMaker Ground Truth offers easy access to public and private human labelers and provides them with built-in workflows and interfaces for common labeling tasks. Additionally, SageMaker Ground Truth can lower your labeling costs by up to 70% using automatic labeling, which works by training Ground Truth from data labeled by humans so that the service learns to label data independently.

Successful machine learning models are built on large volumes of high-quality training data. But the process of creating that training data is often expensive, complicated, and time-consuming. The majority of models created today require a human to manually label data in a way that allows the model to learn how to make correct decisions. For example, building a computer vision system that is reliable enough to identify objects – such as traffic lights, stop signs, and pedestrians – requires thousands of hours of video recordings that consist of hundreds of millions of video frames. In each of these frames, all of the important elements, such as the road, other cars, and signage, must be labeled by a human before any work can begin on the model you want to develop.

Amazon SageMaker Ground Truth significantly reduces the time, effort, and cost required to create training datasets. These savings are achieved by using machine learning to label data automatically. The labeling model gets progressively better over time by continuously learning from labels created by human labelers.

Where the labeling model has high confidence in its results based on what it has learned so far, it will automatically apply labels to the raw data. Where the labeling model has lower confidence in its results, it will pass the data to humans to do the labeling. The human-generated labels are provided back to the labeling model for it to learn from and improve. Over time, SageMaker Ground Truth can label more and more data automatically and substantially speed up the creation of training datasets.

Automated Data Labeling

Amazon SageMaker Ground Truth provides automated data labeling using machine learning. SageMaker Ground Truth will first select a random sample of data and send it to humans to be labeled. The results are then used to train a labeling model that attempts to label a new sample of raw data automatically. The labels are committed when the model can label the data with a confidence score that meets or exceeds a threshold you set. Where the confidence score falls below your threshold, the data is sent to human labelers. Some of the data labeled by humans is used to generate a new training dataset for the labeling model, and the model is automatically retrained to improve its accuracy. This process repeats with each sample of raw data to be labeled. The labeling model becomes more capable of automatically labeling raw data with each iteration, and less data is routed to humans. 
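The loop described above can be illustrated with a small, self-contained simulation. This is only a sketch of the general idea, not the service's actual implementation: `ToyLabelingModel`, `run_labeling_job`, `true_label`, the batch size, and the 0.9 threshold are all hypothetical names and values chosen for illustration. The sketch also retrains on every human label after each batch, whereas the service uses only some of them.

```python
import random


class ToyLabelingModel:
    """Hypothetical 1-D classifier standing in for the labeling model."""

    def __init__(self):
        # Largest x seen with label 0 and smallest x seen with label 1.
        self.lo, self.hi = 0.0, 1.0

    def train(self, examples):
        """Fit on (x, label) pairs that were labeled by humans."""
        self.lo = max((x for x, y in examples if y == 0), default=0.0)
        self.hi = min((x for x, y in examples if y == 1), default=1.0)

    def predict(self, x):
        """Return (label, confidence); confidence grows away from the boundary."""
        boundary = (self.lo + self.hi) / 2
        gap = self.hi - self.lo
        label = 1 if x >= boundary else 0
        if gap <= 0:
            return label, 1.0
        return label, min(1.0, 0.5 + abs(x - boundary) / gap)


def run_labeling_job(items, ask_human, threshold=0.9, batch_size=50, seed=0):
    """Label every item, routing low-confidence items to ask_human."""
    pool = list(items)
    random.Random(seed).shuffle(pool)  # samples are drawn in random order

    model = ToyLabelingModel()
    human_examples = []  # training data for the labeling model
    results = []         # (item, label, source) records

    # The first random sample always goes to human labelers.
    first, pool = pool[:batch_size], pool[batch_size:]
    for x in first:
        y = ask_human(x)
        human_examples.append((x, y))
        results.append((x, y, "human"))
    model.train(human_examples)

    # Each later sample is auto-labeled where confidence meets the threshold.
    while pool:
        batch, pool = pool[:batch_size], pool[batch_size:]
        for x in batch:
            label, confidence = model.predict(x)
            if confidence >= threshold:
                results.append((x, label, "auto"))
            else:
                y = ask_human(x)
                human_examples.append((x, y))
                results.append((x, y, "human"))
        model.train(human_examples)  # retrain on the enlarged human-labeled set

    return results


def true_label(x):
    """Stand-in for a human labeler in this toy setup."""
    return 1 if x >= 0.5 else 0


items = [i / 1000 for i in range(1000)]
results = run_labeling_job(items, ask_human=true_label)
auto = sum(1 for _, _, source in results if source == "auto")
print(f"{auto} of {len(results)} items were labeled automatically")
```

Raising `threshold` in this sketch sends more items to the simulated human labeler, and lowering it commits more automatic labels at a greater risk of error. The same trade-off applies when choosing a confidence threshold for a real labeling job.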

Flexibility in how you work with labeling professionals

Amazon SageMaker Ground Truth supports multiple choices for human labeling directly in the SageMaker Ground Truth Console. You can use your private team of labelers for in-house labeling jobs, especially for handling data that needs to stay within your organization.

If you want to scale up to a large number of labelers and your data does not contain confidential or personally identifiable information, you have access to an on-demand 24×7 workforce of over 500,000 independent contractors worldwide, powered by Amazon Mechanical Turk. Mechanical Turk is a crowdsourcing marketplace that connects your labeling jobs with a distributed workforce who can perform these tasks virtually.

Alternatively, you can use a third-party vendor that specializes in data labeling. These vendors have been screened by Amazon to provide high-quality labels and follow security processes. Labeling services from these vendors are provided through AWS Marketplace, where each listing includes relevant details, such as pricing and customer reviews, to help you select the best vendor for your needs.