Basic Concepts of Computer Vision
In my previous post, we took a holistic look at computer vision. In this post, we will discuss some of the basic concepts related to images and image processing, such as types of images, pixels, channels, depth, etc.
You must have come across times when your friends ask you to take out your mobile and take a photo of them. All you do is press a button, and an image shows up on the screen. Corresponding to this situation, in the simplest terms, we can define an image as a 2D representation of the 3D world at any instant. But do images always need to be two dimensional? Consider a statue or a hologram, where you actually get to perceive a 3D view of the item being displayed. Thus, in a broader sense, images can be 2D as well as 3D. But in our case, and henceforth in maxEmbedded, we will deal with 2D digital images only.
Now that you are familiar with the concept of images, let us get introduced to another term: the digital image. I wouldn’t be surprised if you ask me, why a digital image? Are analog images also possible? Well, ask your parents and they will tell you stories from the generation before the advent of digicams and handycams. Cameras used to come with a film roll (I remember buying film rolls of Kodak and Fujifilm) which, when exposed to light, would form an image on it. The exposure time varied with the shutter speed of the camera. Special care was taken not to expose the film to ambient light, or else it would get damaged; hence the cameras of that time were light-proof, and the photos were developed in a “dark room”. A negative of the photo (we will deal with negatives later) was also kept as a small backup of the original image. These are what I would call analog images, which are obsolete these days.
These days, images captured by cameras are stored digitally on a chip (flash memory). They are represented by a multidimensional array of numbers (stored in binary). Digital images are of two types: raster and vector.
A raster image is one that is stored in matrix form. The pixels of the image are arranged in rows and columns. This is the most commonly found image type, and it can be created very easily using cameras, software, etc. Most image processing and photo editing software works primarily with raster images, and hence we will work with them.
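To make the matrix idea concrete, here is a minimal sketch in plain Python. The pixel values and the helper name `pixel_at` are made up for illustration; real libraries store the same kind of grid in more efficient array types.

```python
# A tiny raster image: 3 rows and 4 columns, one intensity value per pixel.
# The values are invented for illustration (0 = black, 255 = white).
image = [
    [0, 64, 128, 255],
    [32, 96, 160, 224],
    [255, 192, 128, 64],
]

height = len(image)     # number of rows
width = len(image[0])   # number of columns


def pixel_at(img, row, col):
    """Return the intensity stored at the given row and column."""
    return img[row][col]


print(width, height)          # 4 3
print(pixel_at(image, 1, 2))  # 160
```

Note that the matrix is indexed as row first, then column, which is the opposite order of the usual width×height way of quoting image size.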
Images which are described using the concepts of mathematical geometry are called vector images. Instead of storing individual pixels, a vector image stores shapes such as points, lines and curves, each defined by quantities like position, direction and length. Such images can be scaled to any size without losing quality, but they are not what a camera produces, and they are processed quite differently from raster images. Hence, we won’t be dealing with vector images anymore.
Image Processing and Computer Vision
An image, as it is, is useless to us unless we extract some useful information from it. Basically, we take an image, process it by applying some algorithms and procedures to it, and finally get an output, which can be another image or some characteristics of the image.
Confused? Okay, so let’s take a photo, say this one.
This is a photo which I took from my digicam as soon as I received my new seal since I wanted to test it somewhere! As soon as I took the pic, I transferred it to my laptop, only to be disappointed by its poor image quality. After looking at the pic, anybody would say that it’s faded, the colors are washed out, etc. So what would you do? Open up any photo editing software like Photoshop or Picasa, and then increase the brightness, enhance the contrast, fill some light into it, highlight the shadows a little bit, smoothen the pic a little to remove some discontinuities, etc., and then get something like this:
This is what I would say is Image Processing. You take an image, tweak some of its properties, and you get another enhanced image. Now let’s take another image, such as this one —
Okay, so this is me while organizing one of the club events last semester! Now, once again I do some processing on it, and get something like this —
Needless to say, this time the processing has detected my face in the photo! This is what I’d call Computer Vision. You take a pic and extract some features from it like faces, objects, colors, gestures, etc. Thus, in contrast to Image Processing, where we mostly apply techniques directly to the pixels to enhance the overall picture, Computer Vision involves working with higher-level concepts and algorithms related to artificial intelligence, which takes intense programming so that it blends with the user’s activities and requirements. The scope of Computer Vision is much larger, as can be seen in the following:
Here is a summarized table —
In either case, we will be working with images, and hence basic Image Processing techniques are required for Computer Vision. We will discuss these concepts gradually over the course of time. For now, let’s deal with some of the simplest concepts related to images.
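As a rough illustration of the "tweak some of its properties" kind of Image Processing described above, here is a minimal sketch of a brightness and contrast adjustment. It assumes an 8-bit grayscale image stored as a list of rows; the function name `adjust` and the sample values are hypothetical, and photo editors use more sophisticated methods than this simple linear formula.

```python
def adjust(image, contrast=1.0, brightness=0):
    """Apply new = contrast * old + brightness to every pixel.

    Results are rounded and clamped to the 8-bit range 0..255.
    """
    return [
        [max(0, min(255, round(contrast * p + brightness))) for p in row]
        for row in image
    ]


# A "washed out" patch: all values are crowded into a narrow range.
washed_out = [[100, 120], [140, 160]]

# Stretch the values apart (more contrast) and shift them down a little.
enhanced = adjust(washed_out, contrast=1.5, brightness=-60)
print(enhanced)  # [[90, 120], [150, 180]]
```

The original values span 60 levels (100 to 160), while the adjusted ones span 90 levels (90 to 180), which is what a contrast increase looks like numerically.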
Pixels and Resolution
Pixels are the tiny little dots that form an image. They are the smallest visual elements that can be seen, and each one occupies a definite position (a row and a column) in a raster image. When an image is stored, the image file contains the following information for each pixel:
- Pixel Location
- Pixel Intensity
Resolution basically refers to the total number of pixels in an image. It is usually written in m×n (pronounced m-cross-n) format, where m is the width of the image and n is the height, both in pixels. To recall what the width and height of an image are, scroll up to the illustration below the description of raster images. For instance, an image with a width of 100 px (px = pixels) and a height of 100 px has a resolution of 100-cross-100 or 100×100. Sometimes resolution is also stated as the product of width and height, so a 100×100 image can also be called a 10,000-pixel image.
Now let’s take the following case:
This shows a single image in seven different resolutions. The image is self explanatory. Greater resolution → Greater detail → Greater processing power required.
Aspect Ratio is the ratio of Width:Height of the image. For instance, a 256×256 image has an aspect ratio of 1:1. You must have come across this in several contexts, for example while watching a movie or a TV show. There are basically three aspect ratio standards:
- Academy Standard – 4:3 (or) 1.33:1 – Traditional NTSC Television transmission
- US Digital Standard – 16:9 (or) 1.78:1 – HDTV transmission
- Anamorphic Scope Standard – 21:9 (or) 2.35:1 – Cinemascope production
And this is the reason why you get black bars at the top and bottom (or at the sides) whenever the aspect ratio of a movie does not match that of your TV or monitor. The following compares different aspect ratios with different TV sizes.
You can check it out yourself! Open any video in your favorite video player and change the aspect ratio. In VLC Media Player, while the video is playing, press the “A” key on your keyboard. You will find the aspect ratio changing. Keep on pressing it and it will change the aspect ratio in the following cycle:
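The black bars can be worked out with a little arithmetic. The sketch below assumes the movie is scaled to fill the full width of the screen and is wider than the screen, so the bars appear at the top and bottom; the function name `letterbox_bar_height` is made up for this example.

```python
def letterbox_bar_height(screen_w, screen_h, movie_ratio):
    """Height in pixels of each black bar (top and bottom).

    Assumes the movie is scaled to the full screen width. If the movie is
    not wider than the screen, no top/bottom bars are needed and 0 is returned.
    """
    picture_h = round(screen_w / movie_ratio)
    return max(0, (screen_h - picture_h) // 2)


# A 2.35:1 movie shown on a 16:9 Full HD screen (1920x1080).
print(letterbox_bar_height(1920, 1080, 2.35))  # 131
```

So a 2.35:1 picture occupies only about 817 of the 1080 rows, and the remaining rows are split into two bars of roughly 131 pixels each.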
It is recommended to play the video in the Default mode, unless you need it for some other purpose. Personally, I feel that changing the default aspect ratio to some other removes the feel of the movie. ;)
A 3.1 MP Image
An image that is 2048 pixels in width and 1536 pixels in height has a total of 2048×1536 = 3,145,728 pixels or 3.1 megapixels. Its aspect ratio is 2048:1536 = 4:3 = 1.33:1
One could refer to it as 2048-by-1536 or a 3.1-megapixel image.
A 1080p Full HD Movie
We all must have heard of 1080p Full HD movies. Note that the term 1080p describes video rather than still images (the p stands for progressive scan). It means that each of its frames (a frame is basically a snapshot or a still image from a video) has a resolution of 1920×1080 = 2,073,600 pixels, or about 2.1 megapixels, with an aspect ratio of 1920:1080 = 16:9 = 1.78:1.
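The two worked examples above can be reproduced with a few lines of Python. This is only a sketch: the function name `describe` is made up, and it takes one megapixel to be one million pixels, which matches the 3.1 MP figure used above.

```python
from math import gcd


def describe(width, height):
    """Return the pixel count, megapixels and reduced aspect ratio of an image."""
    total = width * height
    divisor = gcd(width, height)  # greatest common divisor reduces the ratio
    return {
        "pixels": total,
        "megapixels": round(total / 1_000_000, 1),
        "aspect": f"{width // divisor}:{height // divisor}",
    }


print(describe(2048, 1536))
# {'pixels': 3145728, 'megapixels': 3.1, 'aspect': '4:3'}
print(describe(1920, 1080))
# {'pixels': 2073600, 'megapixels': 2.1, 'aspect': '16:9'}
```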
Images can be represented in three ways —
- Black and White Images (aka Binary Images)
- Grayscale Images
- Color Images
Black and White Images
Before I set to define these images, I would like to show you this pic —
This photo of Lena (Lenna) Söderberg is the most widely used test image for computer vision applications. I have made a conscious decision to use this image for examples on this webpage. We should be respectful of the woman in the photo and know how this image came to be. Read about the photo at The Lenna Story.
Anyways, let’s come back to our topic. What would you call the Lena image shown above: black and white or not? I am sure that most of you will call it that. If so, then what would you call the following picture: black and white or not?
Don’t worry if you are trolled! Even I was trolled when someone told me about this for the first time! Calling the former image black and white is a misconception, and quite a widespread one! The latter picture is the true black and white image, whereas the former is called a grayscale image. This takes us to their definitions.
A black and white image is an image which, unsurprisingly, consists of only two colors: black and white. Black is usually represented as zero (0) and white as one (1), thus making this image a Binary Image.
In a Grayscale Image, images are represented by several shades ranging in between black and white. Black is usually represented as 0 and white is represented as 1. But unlike binary images, intermediate values in between 0 and 1 are also possible here like 0.17, 0.45, 0.98, etc, thus resulting in different shades of gray. This can be represented in the following scale —
Now the question arises, how many different shades of gray exist in between black and white? The answer to this question lies in the term “Depth” of an image, which we will discuss later in this post.
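To see the difference between a grayscale and a binary image in numbers, here is a small sketch that turns grayscale values in the range 0 to 1 into binary values. The technique is simple thresholding; the cut-off of 0.5 and the function name `to_binary` are assumptions chosen for illustration, not something prescribed above.

```python
def to_binary(gray, threshold=0.5):
    """Map every grayscale value to 0 (black) or 1 (white).

    Values at or above the threshold become white, the rest become black.
    """
    return [[1 if p >= threshold else 0 for p in row] for row in gray]


# Grayscale pixels may take any value between 0 (black) and 1 (white).
gray = [
    [0.0, 0.17, 0.45],
    [0.5, 0.98, 1.0],
]

print(to_binary(gray))  # [[0, 0, 0], [1, 1, 1]]
```

All the intermediate shades are lost in the conversion, which is exactly why a binary image looks so much harsher than its grayscale counterpart.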
Color Images are images formed by the combination of the three primary colors: Red, Green and Blue. Each of these colors has its own plane of pixel intensities in the form of a separate channel. The channels correspond to different color spaces as well. If you are confused, do not fret; we will discuss channels and color spaces in detail in our next post.
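As a small preview of what a "plane of pixel intensities" means, the sketch below splits a tiny color image into its red, green and blue planes. It assumes each pixel is stored as an (R, G, B) tuple with values from 0 to 255; the function name `split_channels` and the sample pixels are made up for illustration.

```python
def split_channels(img):
    """Split an image of (R, G, B) tuples into three separate planes."""
    red = [[p[0] for p in row] for row in img]
    green = [[p[1] for p in row] for row in img]
    blue = [[p[2] for p in row] for row in img]
    return red, green, blue


# A 2x2 color image: red, green, blue and white pixels.
pixels = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

red, green, blue = split_channels(pixels)
print(red)    # [[255, 0], [0, 255]]
print(green)  # [[0, 255], [0, 255]]
print(blue)   # [[0, 0], [255, 255]]
```

Each plane on its own is just a grayscale image with the same width and height as the color image.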
Depth represents the number of shades of a particular color used in the formation of an image. It applies to grayscale as well as color images. For instance, an 8-bit image has 2^8 = 256 shades between black and white, whereas a 16-bit image has 2^16 = 65,536 shades. Obviously, the greater the depth of the image, the greater the number of unique colors/shades used to represent it.
- 1-bit : 2^1 = 2 shades (black & white / binary)
- 8-bit : 2^8 = 256 shades
- 24-bit : 2^24 = 16,777,216 shades (true color)
- 64-bit : 2^64 = 18,446,744,073,709,551,616 shades
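The list above follows from a single formula: an image with a depth of n bits has 2^n shades. Here is a minimal sketch of that relationship, plus one way a grayscale value between 0 and 1 could be mapped to an integer level. The function names `shades` and `quantize` are hypothetical, and rounding to the nearest level is an assumption made for illustration.

```python
def shades(bits):
    """Number of distinct shades available at the given bit depth."""
    return 2 ** bits


def quantize(value, bits):
    """Map a normalized intensity in [0, 1] to an integer level.

    Levels run from 0 (black) to 2**bits - 1 (white).
    """
    return round(value * (shades(bits) - 1))


for bits in (1, 8, 24, 64):
    print(bits, shades(bits))

print(quantize(0.45, 8))  # 115 (out of 0..255)
print(quantize(0.45, 1))  # 0 (a 1-bit image only has black and white)
```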
Depth also represents the space required to store each pixel in memory. For example, an 8-bit 2 MP image at 1080p resolution, i.e. 1920×1080, requires 1920×1080×8 = 16,588,800 bits = 2,073,600 bytes = 2,025 kB, or about 1.98 MB. In other words, each pixel of the image stores 8 bits of information. But this isn’t always the exact size of an image file. The file size also depends on the content of the image and the compression technique used; for instance, a photo saved as a lossy JPG usually occupies much less space than the same photo saved as a lossless PNG. Depth also comes at a cost: the greater the depth of an image, the more space each pixel occupies, which means we need more processing power and time to process it. We will discuss more about the bit representation of images in the next post, along with channels and color spaces.
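The size calculation above can be written as a short function. This sketch computes the raw, uncompressed size only and uses 1 kB = 1024 bytes, as in the figures above; the function name `uncompressed_size` is made up for this example.

```python
def uncompressed_size(width, height, bits_per_pixel):
    """Return the raw image size as (bits, bytes, kilobytes, megabytes).

    Uses 1 kB = 1024 bytes and 1 MB = 1024 kB. Compression is ignored,
    so real image files are usually smaller than this.
    """
    bits = width * height * bits_per_pixel
    size_bytes = (bits + 7) // 8  # round up to a whole number of bytes
    return bits, size_bytes, size_bytes / 1024, size_bytes / 1024 ** 2


bits, size_bytes, kilobytes, megabytes = uncompressed_size(1920, 1080, 8)
print(bits)                 # 16588800
print(size_bytes)           # 2073600
print(kilobytes)            # 2025.0
print(round(megabytes, 2))  # 1.98
```

Calling the same function with 24 bits per pixel gives three times the size, which shows how quickly greater depth increases the memory needed.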
- An image is a 2D representation of the 3D world.
- Digital images can be stored in physical memory (like chips) and are easier to process.
- Raster images are represented as a 2D matrix of pixels.
- Image Processing basically results in enhancement of images, whereas Computer Vision aims at extraction of features from the images.
- A pixel is the smallest visual element of an image.
- Resolution refers to the total number of pixels in an image.
- Aspect Ratio is the ratio of Width:Height of an image.
- Binary images consist of only two colors – black (0) and white (1).
- Grayscale images consist of different shades of gray in between black (0) and white (1).
- Color images are formed by the combination of different color planes (RGB, HSV, YCrCb, etc.), which will be discussed in detail in the next post.
- Depth of an image represents the number of shades available between black (0) and white (1) in an image.
The following image shows all the concepts in a nutshell.
So folks, that’s enough for one post! We will continue shortly with the next post, where we will discuss channels and color spaces. So, to stay updated, subscribe to maxEmbedded and/or grab the RSS Feeds! :-)
And don’t forget to share your views, doubts and queries below. Thank you.
Admin at maxEmbedded
VIT University, Vellore, India