In recent years, hyperspectral images have taken on a major role as a source of spectral signatures. Their continuous spectrum and multi-band structure make them powerful discriminators. Before diving into the details of HSI (hyperspectral images) and segmentation methods, let’s look at the other types of images.
Binary images are 2D arrays consisting only of 0’s and 1’s. A zero corresponds to a black pixel, and a one to a white pixel.
Just like binary images, greyscale images are 2D arrays, but their values range from 0 to 255 (for 8-bit pixels). As in binary images, 0 represents a black pixel. As the value goes up the pixel becomes lighter, until it becomes white at 255.
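To make the two single-band types concrete, here is a minimal NumPy sketch that builds a tiny 8-bit greyscale image and thresholds it into a binary one. The pixel values and the cut-off of 128 are arbitrary choices for illustration.

```python
import numpy as np

# A tiny 8-bit greyscale image: 0 is black, 255 is white.
grey = np.array([[0, 64, 128],
                 [192, 255, 32]], dtype=np.uint8)

# Thresholding produces a binary image made only of 0's and 1's.
# The cut-off (128) is an assumption made for this example.
binary = (grey >= 128).astype(np.uint8)

print(grey.shape)   # both images are plain 2D arrays
print(binary)
```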
Finally, we get to add some color to our images. The previous two image types consist of a single band of pixels, since they are 2D arrays. RGB images take the concept of greyscale imaging a step further. An RGB image can be considered as 3 greyscale images stacked on top of each other. Each 2D array holds the contribution of one primary color: red, green or blue. The combination of these 3 bands determines the color of each pixel.
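The "three stacked greyscale images" idea can be sketched in a few lines of NumPy. The 2x2 size and the channel values are made up for the example.

```python
import numpy as np

height, width = 2, 2

# Three single-band (greyscale-like) arrays, one per primary color.
red = np.full((height, width), 255, dtype=np.uint8)
green = np.full((height, width), 255, dtype=np.uint8)
blue = np.zeros((height, width), dtype=np.uint8)

# Stacking them along a new last axis gives a (height, width, 3) RGB image.
rgb = np.stack([red, green, blue], axis=-1)

# Full red plus full green with no blue is displayed as yellow.
print(rgb.shape, rgb[0, 0])
```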
A multispectral image is a 3D array consisting of multiple bands, typically somewhere between 5 and 30. It is like an RGB image, except that it has many more bands.
At last, we have reached the main topic of this post. Hyperspectral images are also n-band images, like multispectral images. However, they may have tens, or even hundreds, of bands. Each pixel contains dense, nearly continuous spectral information, which allows HSI to be used as spectral signatures. Multispectral images, on the other hand, sample the spectrum in only a few discrete bands. That’s why multispectral images are weaker discriminators.
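As a rough illustration of the difference, the sketch below builds a random 60-band cube, reads one pixel's spectrum, and then averages adjacent bands to mimic a coarser, multispectral-like sampling. The cube size, the random data and the grouping into 6 broad bands are assumptions for the example, not properties of any real sensor.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hyperspectral cube laid out as (height, width, bands).
cube = rng.random((4, 5, 60))

# The spectral signature of one pixel is its vector of 60 band values.
spectrum = cube[2, 3, :]

# Averaging every 10 adjacent bands leaves 6 broad bands per pixel,
# which loses the fine detail of the original spectrum.
coarse = cube.reshape(4, 5, 6, 10).mean(axis=-1)

print(spectrum.shape, coarse.shape)
```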
Another fact about hyperspectral images is that they can cover wavelengths beyond the visible range. Visible light lies roughly between 380 and 740 nanometers in the electromagnetic spectrum. Any wavelength outside this range is invisible to the human eye.
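If the center wavelength of each band is known, separating visible from invisible bands is a simple mask. The sketch below assumes a hypothetical sensor with 60 bands spaced evenly from 400 to 1000 nm; a real sensor's band centers would come from its metadata.

```python
import numpy as np

# Hypothetical band centers in nanometers (an assumption for this example).
wavelengths = np.linspace(400, 1000, 60)

# Visible light spans roughly 380 to 740 nm.
visible = (wavelengths >= 380) & (wavelengths <= 740)

visible_bands = np.flatnonzero(visible)     # indices a human eye could see
invisible_bands = np.flatnonzero(~visible)  # here, the near-infrared bands

print(len(visible_bands), len(invisible_bands))
```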
Now we are all set to discuss how to apply segmentation to HSI. Kanezaki’s paper is a good inspiration for applying the concept of “unsupervised segmentation” to hyperspectral images. In the paper, Kanezaki shows her method of “unsupervised segmentation” for RGB (three-band) images. Based on her work, let’s see how we can extend this method to a 60-band image.
An image segmentation process generally involves a ground truth that acts as a label. In this scenario, however, we will not use ground truth or any other kind of label, which makes the problem a fully “unsupervised segmentation”.
Since we are dealing with images, the best option for extracting features is a Convolutional Neural Network (CNN). But wait, how can we train a CNN if we don’t have any labels to calculate the loss and backpropagate? This is where Kanezaki’s work comes in really handy.
Let’s begin by dividing the whole image into smaller parts. These small parts are called “superpixels”, and each superpixel is stored as the set of indices of the pixels it contains. We can select K superpixels to begin with. The purpose of this operation is to identify similar textures and group the corresponding pixels together.
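The data structure is easy to show with a deliberately simplified stand-in: the sketch below cuts the image into a regular grid of tiles and stores each tile as a list of flat pixel indices. A regular grid ignores texture entirely, so in practice a proper superpixel algorithm such as SLIC would be used instead. The function name `grid_superpixels` and the tile size are hypothetical.

```python
import numpy as np

def grid_superpixels(height, width, block):
    """Label every pixel with the id of the block x block tile it falls in."""
    rows = np.arange(height)[:, None] // block
    cols = np.arange(width)[None, :] // block
    tiles_per_row = -(-width // block)  # ceiling division
    return rows * tiles_per_row + cols

labels = grid_superpixels(6, 8, block=2)

# Each superpixel is the set of flat indices of the pixels it contains.
superpixels = [np.flatnonzero(labels.ravel() == k) for k in np.unique(labels)]

K = len(superpixels)
print(K, superpixels[0])
```

Every pixel index appears in exactly one superpixel, which is what the later refinement step relies on.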
Then we are good to go! Let’s pass the image through a CNN and then a fully connected layer. The resulting image (yn for each pixel) is a convolved version of the input. We can now inspect the resulting matrix to identify similar pixels.
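The shape of the output is worth seeing once. The sketch below is not a CNN: it uses random numbers as a stand-in for the CNN features and only shows the final step, where the same linear (fully connected) layer is applied to every pixel to produce one vector yn per pixel. The sizes, the random weights and the choice of 60 output components are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
height, width, channels, out_dim = 4, 5, 60, 60

# Stand-in for the feature map a CNN would produce (assumption: random values).
features = rng.standard_normal((height, width, channels))

# One linear layer shared by all pixels (weights are random, untrained).
weights = rng.standard_normal((channels, out_dim)) * 0.1
bias = np.zeros(out_dim)

# Flatten the pixels so that row n of y is the output vector yn of pixel n.
y = features.reshape(-1, channels) @ weights + bias

print(y.shape)  # (number of pixels, out_dim)
```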
For each pixel, the largest component of yn (out of 60 bands) is found and its index is stored as cn, which gives the most intense band at that pixel. The process is then repeated for the K superpixels. This time, for each superpixel, the most frequent band index is selected and assigned to all pixels in the superpixel as c'n. This lets us identify the strongest band in each superpixel.
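The two steps described above, a per-pixel argmax followed by a majority vote inside each superpixel, can be sketched as follows. The function name `refine_labels` and the toy numbers are made up, and 3 output components are used instead of 60 to keep the example readable.

```python
import numpy as np

def refine_labels(y, superpixels):
    """Return (cn, c'n): per-pixel argmax and its superpixel majority vote."""
    c = y.argmax(axis=1)                  # cn: index of the largest component
    c_refined = c.copy()
    for idx in superpixels:
        counts = np.bincount(c[idx])      # how often each index occurs
        c_refined[idx] = counts.argmax()  # ties go to the smallest index
    return c, c_refined

# Six pixels, three output components each.
y = np.array([[0.1, 0.9, 0.0],
              [0.2, 0.7, 0.1],
              [0.8, 0.1, 0.1],
              [0.0, 0.2, 0.8],
              [0.1, 0.1, 0.8],
              [0.3, 0.3, 0.4]])

# Two superpixels, each given as the indices of its pixels.
superpixels = [np.array([0, 1, 2]), np.array([3, 4, 5])]

c, c_refined = refine_labels(y, superpixels)
print(c, c_refined)
```

In the first superpixel, pixel 2 disagrees with its two neighbors and is overruled by the vote.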
Finally, the last and most important step is backpropagation. Since we don’t have any labels, we use c'n as our label. The output of the model was defined as yn, so if we calculate Loss(yn, c'n), we can backpropagate this value. The whole operation (forward propagation + band selection + backpropagation) is repeated until the segmentation looks plausible, as in the figure above.
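The text does not say which loss is used. Assuming softmax cross-entropy, a common choice when the target is a class index, Loss(yn, c'n) and its gradient with respect to yn can be sketched in NumPy as below. The function name `softmax_cross_entropy` is hypothetical, and a deep learning framework would normally compute the gradient automatically.

```python
import numpy as np

def softmax_cross_entropy(y, targets):
    """Mean softmax cross-entropy over pixels and its gradient w.r.t. y."""
    shifted = y - y.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    n = y.shape[0]
    loss = -log_probs[np.arange(n), targets].mean()
    grad = np.exp(log_probs)                    # softmax probabilities
    grad[np.arange(n), targets] -= 1.0          # subtract the one-hot target
    return loss, grad / n

# Two pixels with all-zero outputs: every component is equally likely.
y = np.zeros((2, 3))
targets = np.array([0, 2])  # plays the role of c'n

loss, grad = softmax_cross_entropy(y, targets)
print(loss, grad)
```

The gradient is what gets propagated back through the fully connected layer and the CNN in each iteration.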