How Image Annotation Teaches Machines to See

Everything you need to know about image data labelling and image annotation  

Computer vision technology has massive potential. From detecting cancer cells to allowing consumers to make payments on their phones through facial recognition, there is almost nothing computer vision models can’t do, if trained on the right data.  

However, precise data labelling is crucial to training accurate computer vision models. Imagine showing a child a banana and saying it is an orange. The next time the child sees a banana, they are likely to classify it as an orange. Machine learning models learn in the same way, which is why accuracy in data labelling is key to success.

What is computer vision and image annotation?

Computer vision, an artificial intelligence technology, uses deep learning to process images and video in a way that lets machines see the world around them and react appropriately. These computer vision models can make sense of the visual data because of one crucial data annotation process: image annotation.  

Image annotation is the process of identifying individual elements (objects, faces, etc.) in images by attaching labels to them. Data scientists and other AI professionals then use the annotated data to train AI models to accurately identify and track different elements within an image and predict the behavior of those elements.  

With image annotation for machine learning, computer vision models can see the world around them and, ideally, react much as a human would (think of a self-driving car making on-the-fly decisions based on unpredictable external stimuli). In this post, we’ll take a closer look at:

  • types of image annotation, 
  • how image annotation is performed,  
  • use cases.

Types of Image Annotation 

There are several different techniques for annotating images for deep learning. They include:  

Bounding Boxes:  
In this type of image annotation, bounding boxes in the shape of a rectangle are drawn tightly around the edges of each object to be identified. This helps detect and recognize different classes of objects.  
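
As a rough illustration of what "drawn tightly" means, the sketch below derives the smallest rectangle that covers an object from a pixel mask. The `BoundingBox` record, the `tight_box` helper and the tiny example mask are all hypothetical and invented for this example; real annotation tools store boxes in their own formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned rectangle in pixel coordinates (corners are inclusive)."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def tight_box(label, mask):
    """Return the smallest box covering every truthy pixel in `mask`.

    `mask` is a list of rows; mask[y][x] is truthy where the object is.
    """
    xs = [x for row in mask for x, on in enumerate(row) if on]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        raise ValueError("mask contains no object pixels")
    return BoundingBox(label, min(xs), min(ys), max(xs), max(ys))


# Hypothetical 5x4 image with a small object near the middle.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
box = tight_box("banana", mask)
print(box)
```

A human annotator does the same thing by eye: the rectangle should touch the object's outermost pixels on all four sides without leaving extra margin.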

2D and 3D Cuboid Annotations:  
Cuboid annotations add depth to the bounding box idea. By outlining an object as a box with length, width and depth rather than as a flat rectangle, they give a more detailed picture of the object's dimensions and allow for more precise annotation.

Image Classification: 
Using predefined categories, image classification separates images into these categories to form a set.  
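
A minimal sketch of this idea follows, assuming each image already has a single label chosen by an annotator. The category names, filenames and the `group_by_category` helper are made up for illustration.

```python
from collections import defaultdict

# Hypothetical predefined categories for the project.
CATEGORIES = {"banana", "orange", "apple"}


def group_by_category(labels):
    """Group image filenames by their assigned category.

    `labels` maps filename -> category. Labels outside the predefined
    set are rejected, since classification only allows known categories.
    """
    groups = defaultdict(list)
    for filename, category in labels.items():
        if category not in CATEGORIES:
            raise ValueError(f"{filename}: unknown category {category!r}")
        groups[category].append(filename)
    return dict(groups)


# Hypothetical annotator output.
labels = {
    "img_001.jpg": "banana",
    "img_002.jpg": "orange",
    "img_003.jpg": "banana",
}
groups = group_by_category(labels)
print(groups)
```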

Polygon Lines:  
Polygon annotations are a precise way to annotate objects by only including the pixels that belong to them.  
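
To show how a polygon outline translates into "only the pixels that belong" to an object, the sketch below fills a polygon using the even-odd ray-casting rule, testing the center of each pixel. This is one common way to do it, not necessarily what any particular annotation tool uses, and the triangle and function names are hypothetical.

```python
def point_in_polygon(x, y, vertices):
    """Even-odd rule: count polygon edges crossed by a ray going right from (x, y)."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_to_mask(vertices, width, height):
    """Mark a pixel True when its center falls inside the polygon."""
    return [
        [point_in_polygon(x + 0.5, y + 0.5, vertices) for x in range(width)]
        for y in range(height)
    ]


# Hypothetical triangle drawn by an annotator on a 6x3 image.
triangle = [(0, 0), (6, 0), (0, 3)]
mask = polygon_to_mask(triangle, width=6, height=3)
for row in mask:
    print("".join("#" if on else "." for on in row))
```

Compared with a bounding box around the same triangle, the polygon leaves out the background pixels in the corner the object does not occupy.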

Semantic Annotations: 
These provide accurate annotations at a pixel level.  

Semantic Segmentations:  
A precise type of pixel-wise segmentation in which every pixel in the image is assigned to a class.
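
In practice, this kind of annotation is often stored as a grid with the same dimensions as the image, where each cell holds a class ID. The sketch below assumes that layout; the class names, IDs and the `pixels_per_class` helper are hypothetical.

```python
from collections import Counter

# Hypothetical class IDs for a street scene.
CLASS_NAMES = {0: "background", 1: "road", 2: "car"}

# A tiny 4x3 "image": every pixel carries exactly one class ID.
label_map = [
    [0, 0, 0, 0],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
]


def pixels_per_class(label_map, class_names):
    """Count how many pixels were assigned to each class."""
    counts = Counter(pixel for row in label_map for pixel in row)
    unknown = set(counts) - set(class_names)
    if unknown:
        raise ValueError(f"unknown class IDs in label map: {sorted(unknown)}")
    return {name: counts.get(class_id, 0) for class_id, name in class_names.items()}


summary = pixels_per_class(label_map, CLASS_NAMES)
print(summary)
```

Because no pixel is left unlabeled, the counts always add up to the total number of pixels in the image.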

Pose Estimation:
Used in a series of images, this is a technique that predicts and tracks the location of a person or object. This is done by looking at a combination of the pose and the orientation of a given person/object.  

Object Detection, Tracking and Identification:
Object annotation allows machines to detect objects on a production line and determine whether each one is positioned correctly. This is useful for quality control in food packaging, for example, or for confirming that safety protocols are being followed, such as the use of safety equipment.

How to Perform Image Annotation 

Using a crowd, image annotation can be done on large volumes of images quickly and accurately. The first step in image annotation is to identify the use case and decide which annotation technique will be the most effective.

Once the annotation technique has been chosen, contributors are shown images and asked to identify the relevant elements in each one. As with any type of annotation, quality generally improves as more annotators work on a dataset, because their judgments can be checked against one another.
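
One simple way to combine several annotators' answers for the same image is a majority vote. The sketch below is an assumption for illustration only, not a description of how any particular crowd platform aggregates labels; the `majority_label` function and its `min_agreement` threshold are hypothetical.

```python
from collections import Counter


def majority_label(votes, min_agreement=0.5):
    """Return the label chosen by more than `min_agreement` of annotators.

    Returns None when there are no votes or no label clears the threshold,
    which signals that the image should be sent for another round of review.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) > min_agreement else None


# Three hypothetical annotators label the same image.
agreed = majority_label(["banana", "banana", "orange"])
# Two annotators disagree, so no label wins outright.
disputed = majority_label(["banana", "orange"])
print(agreed, disputed)
```

With more annotators per image, a single mistaken label is less likely to decide the outcome.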

Real-World Applications of Image Annotations 

Giving machines the ability to see through computer vision has many exciting applications in the real world. Here are a few examples of image annotation in use:  

  • Autonomous vehicles: To make sense of the world around it, the technology used for self-driving cars needs some context about what it is looking at. Autonomous vehicles need to be able to identify traffic lights (and the colors within), pedestrians, road signs, driving lanes and numerous other objects on the road. 
  • Facial recognition: Landmark annotation, which uses key points labeled at specific locations, is the most useful type of image annotation for facial recognition technology. Used in security settings, social media, photo applications and many other contexts, facial recognition models are among the more controversial models within AI. However, the potential usefulness of facial recognition may outweigh the concerns.
  • Manufacturing: Particularly in large-scale production, computer vision can save hours of time and cut costs significantly when used for predictive maintenance, package inspection and defect identification. Semantic segmentation is well suited to manufacturing; for example, it can be used to identify tire defects on a manufacturing line.
  • Agriculture: Within agriculture, computer vision is being used for crop maintenance, to identify environmental conditions and check on the condition of specific crop yields, to name a few uses. In short, image annotation is a valuable tool within computer vision technology used in agriculture to help identify very specific objects in large-scale images.   

Image Annotation at DefinedCrowd  

At DefinedCrowd, we have worked with many companies to provide high-quality, crowd-sourced training data for computer vision models. Here are just two examples:

Global Electronics Maker using Facial Recognition Technology 

In this first case study, our client needed to detect individual people in family portraits and understand their relationship to the others in the image, for example recognizing a “man” and also his position as “father”.

Using 1,000 verified images, annotators within our crowd identified family members in each picture, providing details on age, relationship, and countries of origin, creating a highly customized dataset in just 6 weeks. Our client was able to use this dataset to train a facial recognition model to be more accurate and useful for their application.

Automation in Utilities Inspection 

In this case study, EDP, an electric utilities company in Portugal, aimed to use computer vision models to improve its asset performance management processes and identify damage more effectively. Using 12,500 images and multiple annotations from our crowd, the model learned to identify utility poles.

An additional 900 annotated images were used to train the model to identify damage to the poles. As a result, EDP no longer needed to hire helicopters and crews to fly over the network taking pictures of poles. Instead, drones using models trained on our data did it for them, saving EDP time and money. The models were also able to predict which poles would need maintenance in the future, allowing EDP to solve problems before they occurred. It was a huge improvement to their maintenance capabilities.

The Gift of Sight

In the same way that natural language processing is helping machines understand human speech in a more natural way, computer vision is helping machines process the world around them through sight. Image annotation is fundamental to this process, resulting in more accurate behaviors. Computer vision is an extremely exciting field and it will undoubtedly change our lives for the better.    

For more information about DefinedCrowd’s computer vision services, have a look here.