What is a vision processing unit (VPU)?
by Stephen M. Walker II, Co-Founder / CEO
What is a vision processing unit (VPU)?
A Vision Processing Unit (VPU) is a specialized type of microprocessor designed specifically for accelerating computer vision tasks such as image and video processing, object detection, feature extraction, and machine learning inference. VPUs are designed to handle real-time, high-volume data streams efficiently and with low power consumption.
What are the main functions of a VPU?
A VPU's main functions are accelerating machine vision workloads: image and video preprocessing, feature extraction, object detection, and neural network inference. It is a specific type of AI accelerator, distinct from a video processing unit, which is specialized for video encoding and decoding. VPUs are optimized for these vision tasks and are particularly useful for on-device machine learning.
VPUs are designed to handle demanding computer vision and edge AI workloads efficiently. They achieve a balance of power efficiency and compute performance by coupling highly parallel programmable compute with workload-specific hardware acceleration. This technology enables intelligent cameras, edge servers, and AI appliances with deep neural networks and computer vision-based applications in areas such as security and safety, and industrial automation.
VPUs are often used in edge computing scenarios with little or no interaction with the cloud, providing benefits such as lower latency and better privacy. They also operate independently, so vision workloads can be offloaded to the VPU, freeing the rest of your computer for other tasks. VPUs are affordable and versatile as well, capable of accelerating a variety of devices, from Raspberry Pis to cameras to computers.
Examples of VPUs include the Movidius Myriad X and Myriad 2 from Intel Corporation, which are used in various applications including Google Project Tango, Google Clips, and DJI drones.
In comparison to GPUs, VPUs are generally smaller, more affordable, and more power-efficient, making them suitable for use in compact and portable devices. However, for tasks requiring high computational power and where space and budget are not constraints, GPUs may still be the preferred choice.
How does a VPU work?
A Vision Processing Unit (VPU) works by offloading machine vision tasks, such as image processing and neural network inference, that are common in artificial intelligence (AI) and machine learning applications.
VPUs operate by coupling highly parallel programmable compute with workload-specific hardware acceleration. This combination allows VPUs to achieve a balance of power efficiency and compute performance, making them ideal for demanding computer vision and edge AI workloads.
In terms of specific operations, VPU pipelines often pair the OpenVINO toolkit for decoding, encoding, and inference with libraries like OpenCV for preprocessing, which typically means resizing and fitting the image to the input requirements of the network.
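As a concrete illustration, that preprocessing step usually means resizing the frame, normalizing pixel values, and reordering the data into the layout the network expects. The sketch below uses plain NumPy so it is self-contained (a real pipeline would typically call `cv2.resize`); the 224×224 input size and NCHW layout are assumptions for illustration, not the requirements of any particular model.

```python
import numpy as np

def preprocess(frame: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize an HWC uint8 frame to size x size (nearest neighbor),
    scale pixels to [0, 1], and reorder to the NCHW layout most
    vision networks expect."""
    h, w, _ = frame.shape
    # Nearest-neighbor resize: pick a source row/column for each target pixel.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = frame[rows][:, cols]
    # Normalize to [0, 1], move channels first (HWC -> CHW), add a batch axis.
    chw = np.transpose(resized.astype(np.float32) / 255.0, (2, 0, 1))
    return chw[np.newaxis, ...]  # shape (1, 3, size, size)

frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
blob = preprocess(frame)
print(blob.shape)  # (1, 3, 224, 224)
```

The same shape-and-layout conversion applies whatever resizing library is used; only the interpolation quality differs.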
VPUs are optimized for performance per watt and have a greater emphasis on on-chip dataflow between many parallel execution units, similar to a manycore DSP. They may focus on low precision fixed point arithmetic for image processing, which is distinct from GPUs that contain specialized hardware for rasterization and texture mapping for 3D graphics.
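The low-precision fixed-point arithmetic mentioned above can be sketched as symmetric 8-bit quantization: real-valued tensors are mapped to small integers plus a scale factor, which is how such hardware trades a little accuracy for much cheaper multiply-accumulates. This scheme is a generic illustration, not the quantizer of any particular chip.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization to signed 8-bit integers."""
    scale = np.max(np.abs(x)) / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate real values from integers and the scale."""
    return q.astype(np.float32) * scale

weights = np.array([0.5, -1.27, 0.031, 1.0], dtype=np.float32)
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Round-trip error is bounded by half a quantization step (scale / 2).
print(np.max(np.abs(weights - restored)))
```

The integer tensor and one float scale replace a full float32 tensor, cutting memory traffic by roughly 4x, which matters as much as raw compute on a power-constrained chip.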
VPUs can also interface directly with cameras, bypassing any off-chip buffers, which is beneficial for tasks that require real-time processing and low latency.
In terms of software, to make a model work with a VPU, tools like the OpenVINO toolkit are used to convert the models to an intermediate representation and then interface with the chip and camera.
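That software workflow with the OpenVINO runtime looks roughly like the function below: read the converted intermediate representation, compile it for the VPU device, and run inference on a preprocessed frame. This is a hedged sketch of the modern OpenVINO Python API, not a drop-in recipe; it needs the `openvino` package, a real converted `model.xml`, and attached Movidius hardware (the `MYRIAD` device name) to actually run.

```python
def run_on_vpu(model_xml: str, blob):
    """Sketch of OpenVINO inference on a Movidius VPU.

    `model_xml` is the intermediate representation produced by the
    OpenVINO model conversion tools; `blob` is a preprocessed NCHW
    array. Assumes the `openvino` package and VPU hardware are present.
    """
    from openvino.runtime import Core  # deferred: openvino may not be installed

    core = Core()
    model = core.read_model(model_xml)              # load the converted IR
    compiled = core.compile_model(model, "MYRIAD")  # target the VPU device
    return compiled([blob])                         # run one inference
```

Swapping `"MYRIAD"` for `"CPU"` or `"GPU"` retargets the same model, which is the portability argument for converting to an intermediate representation in the first place.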
What are some popular VPU architectures?
There are several popular Vision Processing Unit (VPU) architectures, each with its unique features and capabilities. Here are some of the most notable ones:
- Movidius Myriad X — This VPU from Intel is designed for on-device artificial intelligence (AI) applications. It features a Neural Compute Engine — a dedicated hardware accelerator for deep neural network inferences. The chip also includes imaging and vision accelerators for tasks such as image signal processing and encoding/decoding of video streams.
- Movidius Myriad 2 — Also from Intel, this VPU is optimized for high-performance, low-power vision processing. It's used in a variety of devices, from drones to smart cameras, and is capable of performing complex neural network computations at high speed while maintaining low power consumption.
- Google Edge TPU — This is a small ASIC designed by Google that provides high-performance ML inferencing for edge devices. The Edge TPU is capable of executing state-of-the-art mobile vision models such as MobileNet v2 at 100+ fps, in a power-efficient manner.
- Huawei Ascend 310 — This VPU is designed for AI inferencing in edge computing scenarios. It supports CNNs, RNNs, and other deep neural networks, is capable of processing 16 channels of HD video in real time, and is used in applications such as smart cities and autonomous vehicles.
Each of these architectures has its strengths and is suited to different types of applications, depending on factors such as the required processing power, energy efficiency, and specific task requirements.