Julia REPL: GPU Accelerated Medical Image Segmentation Framework

GPU accelerated medical image segmentation framework

The presented programming framework is a set of tools for medical image segmentation. The packages are implemented with GPU acceleration. For convenient rapid prototyping, the whole programming process, together with visualisation and annotation, can be done from the Julia REPL. It is also worth pointing out that, to the best of the author's knowledge, multiple metrics implemented in the presented software have the shortest execution time among all popular open source solutions.

In preparation for the workshop, I will ask participants to download beforehand the dataset we will work on; the dataset can be found under link [2]. Additionally, you can load the required packages into the environment you will work in [3]. In order to fully participate you need to have an Nvidia GPU available.

Medical image segmentation is a rapidly developing field of computer vision. This area of research requires knowledge of radiologic imaging, mathematics and computer science. In order to assist researchers, multiple software packages have been developed. However, because of the rapidly changing scientific environment, those tools may no longer be effective for some users.
Such a situation exists for Julia language users, who require support for an interactive programming style that is not common among traditional software tools. Another characteristic of modern programming for 3-dimensional medical imaging data is GPU acceleration, which can give an outstanding improvement in algorithm performance when working with 3D medical imaging. Hence, in this work the author presents a set of new Julia language software tools designed to fulfil these emerging needs. These tools include a GPU accelerated medical image viewer with annotation capabilities and a very convenient programming interface; a CUDA accelerated medical segmentation metrics tool that supplies state of the art implementations of the algorithms required to quantify the similarity between an algorithm's output and the gold standard; and, lastly, a set of utility tools connecting the two packages above with the HDF5 file system and with preprocessing using MONAI and PythonCall.

The main unique feature of the presented framework is its ease of interoperability with other Julia packages which, in the opinion of the author, may in the rapidly developing ecosystem of scientific computing spark the application of algorithms from fields not commonly used in medical image segmentation, such as differentiable programming and topology.

I am planning to conduct the workshop assuming only basic knowledge of Julia programming and no medical knowledge at all. Most of the time will be devoted to walking through an end-to-end example of medical image segmentation, as in the tutorial available under link [1] below, with the code executed live during the workshop. In order to run some parts of the workshop, participants will need a CUDA environment. Because of the complex nature of the problem, some theoretical introductions will also be needed.

Plan for the workshop:

1. Introduction to medical imaging data formats
2. Presentation of loading data and simple preprocessing using MONAI and PythonCall (a rough sketch of the Python side is shown after this list)
3. Tutorial presenting how to use the MedEye3d viewer and annotator
4. Implementing the first phase of the example algorithm on the CPU, showing some Julia features supporting work on multidimensional arrays
5. Presenting the further part of the example algorithm using GPU acceleration with CUDA.jl and ParallelStencil, with a short introduction to GPU programming
6. Presenting how to save and retrieve data using HDF5.jl
7. Showing how to apply medical segmentation metrics from MedEval3D, with some introduction on how to choose the metric properly depending on the problem
8. Discussing how one can improve the performance of the algorithm and what some planned future directions are
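For item 2, the Python side of the preprocessing (the part that would be driven from Julia through PythonCall) could look roughly like the sketch below; the file names, transforms and parameter values are illustrative assumptions, not the exact pipeline of the workshop:

# A rough MONAI preprocessing sketch (illustrative values only).
from monai.transforms import (
    Compose, LoadImaged, EnsureChannelFirstd,
    Orientationd, Spacingd, ScaleIntensityRanged,
)

preprocess = Compose([
    LoadImaged(keys=["image", "label"]),                   # read NIfTI volumes
    EnsureChannelFirstd(keys=["image", "label"]),          # add a channel dimension
    Orientationd(keys=["image", "label"], axcodes="RAS"),  # common orientation
    Spacingd(keys=["image", "label"], pixdim=(1.5, 1.5, 2.0),
             mode=("bilinear", "nearest")),                # resample to a common spacing
    ScaleIntensityRanged(keys=["image"], a_min=-200, a_max=200,
                         b_min=0.0, b_max=1.0, clip=True), # window and rescale intensities
])

sample = preprocess({"image": "ct.nii.gz", "label": "ct_seg.nii.gz"})
print(sample["image"].shape)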


[1] https://github.com/jakubMitura14/MedPipe3DTutorial
[2] Participants can download data before task 9 from https://drive.google.com/drive/folders/1HqEgzS8BV2c7xYNrZdEAnrHk7osJJ--2
[3] ]add Flux Hyperopt Plots UNet MedEye3d Distributions Clustering IrrationalConstants ParallelStencil CUDA HDF5 MedEval3D MedPipe3D Colors

#julia #computervision

Julia REPL: GPU Accelerated Medical Image Segmentation Framework

How to Train Object Detector with Minimum DataSets

Train Object Detection With Small Datasets

Object detection, the task of localising and classifying objects in a scene, is one of the most popular tasks in computer vision, but it has a main drawback: a large annotated dataset is necessary to train the model. Annotating a dataset is expensive, and the freely available datasets are often not enough, as they do not contain all the classes we are interested in. Thus, the goal of the tutorial is to introduce the main techniques for training a good object detector using the minimum amount of annotated data.

#computervision #opencv #machinelearning 

How to Train Object Detector with Minimum DataSets

JavaCV: Java interface to OpenCV, FFmpeg, and More

Introduction

JavaCV uses wrappers from the JavaCPP Presets of commonly used libraries by researchers in the field of computer vision (OpenCV, FFmpeg, libdc1394, FlyCapture, Spinnaker, OpenKinect, librealsense, CL PS3 Eye Driver, videoInput, ARToolKitPlus, flandmark, Leptonica, and Tesseract) and provides utility classes to make their functionality easier to use on the Java platform, including Android.

JavaCV also comes with hardware accelerated full-screen image display (CanvasFrame and GLCanvasFrame), easy-to-use methods to execute code in parallel on multiple cores (Parallel), user-friendly geometric and color calibration of cameras and projectors (GeometricCalibrator, ProCamGeometricCalibrator, ProCamColorCalibrator), detection and matching of feature points (ObjectFinder), a set of classes that implement direct image alignment of projector-camera systems (mainly GNImageAligner, ProjectiveTransformer, ProjectiveColorTransformer, ProCamTransformer, and ReflectanceInitializer), a blob analysis package (Blobs), as well as miscellaneous functionality in the JavaCV class. Some of these classes also have an OpenCL and OpenGL counterpart, their names ending with CL or starting with GL, i.e.: JavaCVCL, GLCanvasFrame, etc.

To learn how to use the API, since the documentation is currently lacking, please refer to the Sample Usage section below as well as the sample programs, including two for Android (FacePreview.java and RecordActivity.java), also found in the samples directory. You may also find it useful to refer to the source code of ProCamCalib and ProCamTracker as well as examples ported from OpenCV2 Cookbook and the associated wiki pages.

Please keep me informed of any updates or fixes you make to the code so that I may integrate them into the next release. Thank you! And feel free to ask questions on the mailing list or the discussion forum if you encounter any problems with the software! I am sure it is far from perfect...

Downloads

Archives containing JAR files are available as releases. The binary archive contains builds for Android, iOS, Linux, Mac OS X, and Windows. The JAR files for specific child modules or platforms can also be obtained individually from the Maven Central Repository.

To install the JAR files manually, follow the instructions in the Manual Installation section below.

We can also have everything downloaded and installed automatically with:

  • Maven (inside the pom.xml file)
  <dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>javacv-platform</artifactId>
    <version>1.5.7</version>
  </dependency>
  • Gradle (inside the build.gradle file)
  dependencies {
    implementation group: 'org.bytedeco', name: 'javacv-platform', version: '1.5.7'
  }
  • Leiningen (inside the project.clj file)
  :dependencies [
    [org.bytedeco/javacv-platform "1.5.7"]
  ]
  • sbt (inside the build.sbt file)
  libraryDependencies += "org.bytedeco" % "javacv-platform" % "1.5.7"

This downloads binaries for all platforms, but to get binaries for only one platform we can set the javacpp.platform system property (via the -D command line option) to something like android-arm, linux-x86_64, macosx-x86_64, windows-x86_64, etc. Please refer to the README.md file of the JavaCPP Presets for details. Another option available to Gradle users is Gradle JavaCPP, and similarly for Scala users there is SBT-JavaCV.
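For example, to restrict the download to 64-bit Linux binaries when building with Maven, something like the following should work (the platform value here is only an illustration):

 $ mvn package -Djavacpp.platform=linux-x86_64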

Required Software

To use JavaCV, you will first need to download and install the following software:

Further, although not always required, some functionality of JavaCV also relies on:

Finally, please make sure everything has the same bitness: 32-bit and 64-bit modules do not mix under any circumstances.

Manual Installation

Simply put all the desired JAR files (opencv*.jar, ffmpeg*.jar, etc.), in addition to javacpp.jar and javacv.jar, somewhere in your class path. Here are some more specific instructions for common cases:

NetBeans (Java SE 7 or newer):

  1. In the Projects window, right-click the Libraries node of your project, and select "Add JAR/Folder...".
  2. Locate the JAR files, select them, and click OK.

Eclipse (Java SE 7 or newer):

  1. Navigate to Project > Properties > Java Build Path > Libraries and click "Add External JARs...".
  2. Locate the JAR files, select them, and click OK.

Visual Studio Code (Java SE 7 or newer):

  1. Navigate to Java Projects > Referenced Libraries, and click +.
  2. Locate the JAR files, select them, and click OK.

IntelliJ IDEA (Android 7.0 or newer):

  1. Follow the instructions on this page: http://developer.android.com/training/basics/firstapp/
  2. Copy all the JAR files into the app/libs subdirectory.
  3. Navigate to File > Project Structure > app > Dependencies, click +, and select "2 File dependency".
  4. Select all the JAR files from the libs subdirectory.

After that, the wrapper classes for OpenCV and FFmpeg, for example, can automatically access all of their C/C++ APIs.

Sample Usage

The class definitions are basically ports to Java of the original header files in C/C++, and I deliberately decided to keep as much of the original syntax as possible. For example, here is a method that tries to load an image file, smooth it, and save it back to disk:

import org.bytedeco.opencv.opencv_core.*;
import org.bytedeco.opencv.opencv_imgproc.*;
import static org.bytedeco.opencv.global.opencv_core.*;
import static org.bytedeco.opencv.global.opencv_imgproc.*;
import static org.bytedeco.opencv.global.opencv_imgcodecs.*;

public class Smoother {
    public static void smooth(String filename) {
        Mat image = imread(filename);
        if (image != null && !image.empty()) {
            GaussianBlur(image, image, new Size(3, 3), 0);
            imwrite(filename, image);
        }
    }
}

JavaCV also comes with helper classes and methods on top of OpenCV and FFmpeg to facilitate their integration to the Java platform. Here is a small demo program demonstrating the most frequently useful parts:

import java.io.File;
import java.net.URL;
import org.bytedeco.javacv.*;
import org.bytedeco.javacpp.*;
import org.bytedeco.javacpp.indexer.*;
import org.bytedeco.opencv.opencv_core.*;
import org.bytedeco.opencv.opencv_imgproc.*;
import org.bytedeco.opencv.opencv_calib3d.*;
import org.bytedeco.opencv.opencv_objdetect.*;
import static org.bytedeco.opencv.global.opencv_core.*;
import static org.bytedeco.opencv.global.opencv_imgproc.*;
import static org.bytedeco.opencv.global.opencv_calib3d.*;
import static org.bytedeco.opencv.global.opencv_objdetect.*;

public class Demo {
    public static void main(String[] args) throws Exception {
        String classifierName = null;
        if (args.length > 0) {
            classifierName = args[0];
        } else {
            URL url = new URL("https://raw.github.com/opencv/opencv/master/data/haarcascades/haarcascade_frontalface_alt.xml");
            File file = Loader.cacheResource(url);
            classifierName = file.getAbsolutePath();
        }

        // We can "cast" Pointer objects by instantiating a new object of the desired class.
        CascadeClassifier classifier = new CascadeClassifier(classifierName);
        if (classifier == null || classifier.empty()) {
            System.err.println("Error loading classifier file \"" + classifierName + "\".");
            System.exit(1);
        }

        // The available FrameGrabber classes include OpenCVFrameGrabber (opencv_videoio),
        // DC1394FrameGrabber, FlyCapture2FrameGrabber, OpenKinectFrameGrabber, OpenKinect2FrameGrabber,
        // RealSenseFrameGrabber, RealSense2FrameGrabber, PS3EyeFrameGrabber, VideoInputFrameGrabber, and FFmpegFrameGrabber.
        FrameGrabber grabber = FrameGrabber.createDefault(0);
        grabber.start();

        // CanvasFrame, FrameGrabber, and FrameRecorder use Frame objects to communicate image data.
        // We need a FrameConverter to interface with other APIs (Android, Java 2D, JavaFX, Tesseract, OpenCV, etc).
        OpenCVFrameConverter.ToMat converter = new OpenCVFrameConverter.ToMat();

        // FAQ about IplImage and Mat objects from OpenCV:
        // - For custom raw processing of data, createBuffer() returns an NIO direct
        //   buffer wrapped around the memory pointed by imageData, and under Android we can
        //   also use that Buffer with Bitmap.copyPixelsFromBuffer() and copyPixelsToBuffer().
        // - To get a BufferedImage from an IplImage, or vice versa, we can chain calls to
        //   Java2DFrameConverter and OpenCVFrameConverter, one after the other.
        // - Java2DFrameConverter also has static copy() methods that we can use to transfer
        //   data more directly between BufferedImage and IplImage or Mat via Frame objects.
        Mat grabbedImage = converter.convert(grabber.grab());
        int height = grabbedImage.rows();
        int width = grabbedImage.cols();

        // Objects allocated with `new`, clone(), or a create*() factory method are automatically released
        // by the garbage collector, but may still be explicitly released by calling deallocate().
        // You shall NOT call cvReleaseImage(), cvReleaseMemStorage(), etc. on objects allocated this way.
        Mat grayImage = new Mat(height, width, CV_8UC1);
        Mat rotatedImage = grabbedImage.clone();

        // The OpenCVFrameRecorder class simply uses the VideoWriter of opencv_videoio,
        // but FFmpegFrameRecorder also exists as a more versatile alternative.
        FrameRecorder recorder = FrameRecorder.createDefault("output.avi", width, height);
        recorder.start();

        // CanvasFrame is a JFrame containing a Canvas component, which is hardware accelerated.
        // It can also switch into full-screen mode when called with a screenNumber.
        // We should also specify the relative monitor/camera response for proper gamma correction.
        CanvasFrame frame = new CanvasFrame("Some Title", CanvasFrame.getDefaultGamma()/grabber.getGamma());

        // Let's create some random 3D rotation...
        Mat randomR    = new Mat(3, 3, CV_64FC1),
            randomAxis = new Mat(3, 1, CV_64FC1);
        // We can easily and efficiently access the elements of matrices and images
        // through an Indexer object with the set of get() and put() methods.
        DoubleIndexer Ridx = randomR.createIndexer(),
                   axisIdx = randomAxis.createIndexer();
        axisIdx.put(0, (Math.random() - 0.5) / 4,
                       (Math.random() - 0.5) / 4,
                       (Math.random() - 0.5) / 4);
        Rodrigues(randomAxis, randomR);
        double f = (width + height) / 2.0;  Ridx.put(0, 2, Ridx.get(0, 2) * f);
                                            Ridx.put(1, 2, Ridx.get(1, 2) * f);
        Ridx.put(2, 0, Ridx.get(2, 0) / f); Ridx.put(2, 1, Ridx.get(2, 1) / f);
        System.out.println(Ridx);

        // We can allocate native arrays using constructors taking an integer as argument.
        Point hatPoints = new Point(3);

        while (frame.isVisible() && (grabbedImage = converter.convert(grabber.grab())) != null) {
            // Let's try to detect some faces! but we need a grayscale image...
            cvtColor(grabbedImage, grayImage, CV_BGR2GRAY);
            RectVector faces = new RectVector();
            classifier.detectMultiScale(grayImage, faces);
            long total = faces.size();
            for (long i = 0; i < total; i++) {
                Rect r = faces.get(i);
                int x = r.x(), y = r.y(), w = r.width(), h = r.height();
                rectangle(grabbedImage, new Point(x, y), new Point(x + w, y + h), Scalar.RED, 1, CV_AA, 0);

                // To access or pass as argument the elements of a native array, call position() before.
                hatPoints.position(0).x(x - w / 10     ).y(y - h / 10);
                hatPoints.position(1).x(x + w * 11 / 10).y(y - h / 10);
                hatPoints.position(2).x(x + w / 2      ).y(y - h / 2 );
                fillConvexPoly(grabbedImage, hatPoints.position(0), 3, Scalar.GREEN, CV_AA, 0);
            }

            // Let's find some contours! but first some thresholding...
            threshold(grayImage, grayImage, 64, 255, CV_THRESH_BINARY);

            // To check if an output argument is null we may call either isNull() or equals(null).
            MatVector contours = new MatVector();
            findContours(grayImage, contours, CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE);
            long n = contours.size();
            for (long i = 0; i < n; i++) {
                Mat contour = contours.get(i);
                Mat points = new Mat();
                approxPolyDP(contour, points, arcLength(contour, true) * 0.02, true);
                drawContours(grabbedImage, new MatVector(points), -1, Scalar.BLUE);
            }

            warpPerspective(grabbedImage, rotatedImage, randomR, rotatedImage.size());

            Frame rotatedFrame = converter.convert(rotatedImage);
            frame.showImage(rotatedFrame);
            recorder.record(rotatedFrame);
        }
        frame.dispose();
        recorder.stop();
        grabber.stop();
    }
}

Furthermore, after creating a pom.xml file with the following content:

<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.bytedeco.javacv</groupId>
    <artifactId>demo</artifactId>
    <version>1.5.7</version>
    <properties>
        <maven.compiler.source>1.7</maven.compiler.source>
        <maven.compiler.target>1.7</maven.compiler.target>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.bytedeco</groupId>
            <artifactId>javacv-platform</artifactId>
            <version>1.5.7</version>
        </dependency>

        <!-- Additional dependencies required to use CUDA and cuDNN -->
        <dependency>
            <groupId>org.bytedeco</groupId>
            <artifactId>opencv-platform-gpu</artifactId>
            <version>4.5.5-1.5.7</version>
        </dependency>

        <!-- Optional GPL builds with (almost) everything enabled -->
        <dependency>
            <groupId>org.bytedeco</groupId>
            <artifactId>ffmpeg-platform-gpl</artifactId>
            <version>5.0-1.5.7</version>
        </dependency>
    </dependencies>
    <build>
        <sourceDirectory>.</sourceDirectory>
    </build>
</project>

And by placing the source code above in Demo.java, or similarly for other classes found in the samples, we can use the following command to have everything first installed automatically and then executed by Maven:

 $ mvn compile exec:java -Dexec.mainClass=Demo

Note: In case of errors, please make sure that the artifactId in the pom.xml file reads javacv-platform, not javacv only, for example. The artifact javacv-platform adds all the necessary binary dependencies.

Build Instructions

If the binary files available above are not enough for your needs, you might need to rebuild them from the source code. To this end, the project files were created for:

Once installed, simply call the usual mvn install command for JavaCPP, its Presets, and JavaCV. By default, no other dependencies than a C++ compiler for JavaCPP are required. Please refer to the comments inside the pom.xml files for further details.

Instead of building the native libraries manually, we can run mvn install for JavaCV only and rely on the snapshot artifacts from the CI builds:

Download Details:
Author: bytedeco
Source Code: https://github.com/bytedeco/javacv
License: View license

#computervision  #java #opencv 

JavaCV: Java interface to OpenCV, FFmpeg, and More

How Computer Vision Can Be Used for The MetaVerse

With Facebook’s rebrand to Meta and the multibillion-dollar valuations of metaverse-centric crypto projects like MANA, SAND, and GALA, it’s safe to say that the idea of the metaverse has seen a resurgence in interest. As a result, plenty of people have come up with half-decent definitions, but most don’t appreciate the digital world that is already looming over us. This video explores just how “meta” our high-tech world already is, and why machine learning, specifically computer vision, will be key as the metaverse expands. In this video, we will discuss how computer vision can be used for the metaverse.

#metaverse #computervision #crypto #blockchain 

How Computer Vision Can Be Used for The MetaVerse

Introduction to Epipolar Geometry and Stereo Vision With OpenCV & C++

Have you ever wondered why you can experience that wonderful 3D effect when you watch a movie with those special 3D glasses? Or why it is difficult to catch a cricket ball with one eye closed? It all relates to stereoscopic vision, our ability to perceive depth using both eyes. This post uses OpenCV and stereo vision to give this power of perceiving depth to a computer. The code is provided in Python and C++.

GIF showing object detection along with distance

The cool part about the above GIF is that, besides detecting different objects, the computer is also able to tell how far away they are, which means it can perceive depth! For this video, the stereo camera setup of OAK-D (OpenCV AI Kit with Depth) was used to help the computer perceive depth.

What is a stereo camera setup? How do we use it to provide a sense of depth to a computer? Does it have anything to do with stereoscopic vision?

This post will try to answer these questions by understanding fundamental concepts related to epipolar geometry and stereo vision.

Most of the post’s theoretical explanations are inspired by the book: Multiple View Geometry in Computer Vision by Richard Hartley and Andrew Zisserman. It is a very famous and standard textbook for understanding various fundamental concepts of computer vision.

This post is the first part of the Introduction to Spatial AI series. It provides a detailed introduction to various fundamental concepts and creates a strong foundation for the subsequent parts of the series.

Great! So let’s get started and help our computer to perceive depth!

Do we need more than one image to calculate depth?

When we capture (project) a 3D object in an image, we are projecting it from a 3D space to a 2D (planar) projective space. This is called the Planar Projection. The problem is that we lose the depth information due to this planar projection. 

So how do we recover the depth? Can we calculate back the depth of a scene using a single image? Let’s try a simple example.


Figure 1 – Locating a 3D point (X), at an unknown depth, with a single known 3D point (C1) and direction vector (L1).

In figure 1, C1 and X are points in 3D space, and the unit vector L1 gives the direction of the ray from C1 through X. Now, can we find X if we know the values of point C1 and direction vector L1? Mathematically it simply means to solve for X in the equation

X = C1 + k(L1)

Now, as the value of k is not known, we cannot find a unique value of X.


Figure 2 – Locating a 3D point (X), at an unknown depth, with two known 3D points (C1 and C2) and direction vectors (L1 and L2) – Triangulation.

In figure 2, we have an additional point C2, and L2 is the direction vector of the ray from C2 through X. Now can we find a unique value for X if C2 and L2 are also known to us?

Yes! Because the rays originating from C1 and C2 clearly intersect at a unique point, point X itself. This is called triangulation. We say we triangulated point X.


Figure 3 – Extending the triangulation concept to explain how a 3D point (X) captured in two images can be calculated if the camera positions (C1 and C2) and pixel coordinates (x1 and x2) are known.

Figure 3 shows how triangulation can be used to calculate the depth of a point (X) when captured(projected) in two different views(images). In this figure, C1 and C2 are known 3D positions of the left and right cameras, respectively. x1 is the image of the 3D point X captured by the left camera, and x2 is the image of X captured by the right camera. x1 and x2 are called corresponding points because they are the projection of the same 3D point. We use x1 and C1 to find L1 and x2 and C2 to find L2. Hence we can use triangulation to find X just like we did for figure 2.

From the above example, we learned that to triangulate a 3D point using two images capturing it from different views, the key requirements are:

  1. Position of the cameras – C1 and C2.
  2. Point correspondence – x1 and x2.

Great! It is now clear that we need more than one image to find depth.
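As a small sketch of this triangulation step (the camera matrices and the 3D point below are made-up numbers, not taken from this post), OpenCV's cv2.triangulatePoints recovers X from its two projections:

import cv2
import numpy as np

# Two made-up 3x4 projection matrices: identical cameras, the second one
# shifted along x (a simple stereo-like configuration).
f = 800.0
K = np.array([[f, 0.0, 320.0],
              [0.0, f, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])

# A 3D point X (homogeneous) and its projections x1, x2 in the two images.
X = np.array([0.2, -0.1, 2.0, 1.0])
x1 = P1 @ X
x1 = x1[:2] / x1[2]
x2 = P2 @ X
x2 = x2[:2] / x2[2]

# Triangulate back: the result comes out in homogeneous coordinates.
X_h = cv2.triangulatePoints(P1, P2, x1.reshape(2, 1), x2.reshape(2, 1))
print((X_h[:3] / X_h[3]).ravel())   # ~ [ 0.2 -0.1  2. ]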

Hey! but this was just a single 3D point that we tried to calculate. How do we calculate a 3D structure of a real-world scene by capturing it from two different views? The obvious answer is by repeating the above process for all the 3D points captured in both the views. Let’s have a closer look at the practical challenges in doing this. Time for the reality check!

Practical and theoretical understanding of the two-view geometry


Figure 4 – Real world scene captured from two different view points.

Figure 4 shows two images capturing a real-world scene from different viewpoints. To calculate the 3D structure, we try to find the two key requirements mentioned before:

  1. The position of cameras in the real world coordinate system (C1 and C2). We simplify this problem by calculating the 3D points by assuming one of the camera positions (C1 or C2) as the origin. We find it by calibrating the two view system using a known calibration pattern. This process is called stereo calibration.

  2. The point correspondence (x1 and x2) for each 3D point (X) in the scene to be calculated. We will discuss various improvements for calculating point correspondence and finally understand how epipolar geometry can help us to simplify the problem.

Note that the stereo camera calibration is useful only when the images are captured by a pair of cameras rigidly fixed with respect to each other. If a single camera captures the images from two different angles, then we can find depth only to a scale. The absolute depth is unknown unless we have some special geometric information about the captured scene that can be used to find the actual scale.


Figure 5 – Hand picked feature matches.

Figure 5 shows different matched points that were manually marked. It is easy for us to identify the corresponding points, but how do we make a computer do that?

One method which people regularly use in the computer vision community is called feature matching. Following figure 6 shows matched features between the left and right images using ORB feature descriptors. This is one method to find point correspondence (matches). 


Figure 6 – Results of feature detection algorithm.

However, we observe that the ratio of the number of pixels with known point correspondence to the total number of pixels is minimal. This means we will have a very sparsely reconstructed 3D scene. For dense reconstruction, we need to obtain point correspondence for the maximum number of pixels possible.  


Figure 7 – Multiple matched points using template matching.

A simplified way to find the point correspondences is to find pixels with similar neighboring pixel information. In figure 7, we observe that using this method of matching pixels with similar neighboring information results in a single-pixel from one image having multiple matches in the other image. We find it challenging to write an algorithm to determine the true match.

Is there a way to reduce our search space? Some theorem which we can use to eliminate all the extra false matches that lead to inaccurate correspondence? We make use of epipolar geometry here.

All this explanation and build-up was to introduce the concept of epipolar geometry. Now we will understand the importance of epipolar geometry in reducing search space for point correspondence.

Epipolar geometry and its use in point correspondence


Figure 8 – Image explaining epipolar geometry.

In figure 8, we assume a similar setup to figure 3. A 3D point X is captured at x1 and x2 by cameras at C1 and C2, respectively. As x1 is the projection of X, If we try to extend a ray R1 from C1 that passes through x1, it should also pass through X. This ray R1 is captured as line L2, and X is captured as x2 in the image i2. As X lies on R1, x2 should lie on L2. This way, the possible location of x2 is constrained to a single line, and hence we can say that the search space for a pixel in image i2, corresponding to pixel x1, is reduced to a single line L2. We use epipolar geometry to find L2. 

Time to define some technical terms now! Along with X, we can also project the camera centers in the respective opposite images. e2 is the projection of camera center C1 in image i2, and e1 is the projection of camera center C2 in image i1. The technical term for e1 and e2 is epipole. Hence, in a two-view geometry setup, an epipole is the image of the camera center of one view in the other view.

The line joining the two camera centers is called a baseline. Hence epipole can also be defined as the intersection of baseline with the image plane. 

Figure 8 shows that using R1 and baseline, we can define a plane P. This plane also contains X, C1, x1, x2, and C2. We call this plane the epipolar plane. Furthermore, the line obtained from the intersection of the epipolar plane and the image plane is called the epipolar line. Hence in our example, L2 is an epipolar line. For different values of X, we will have different epipolar planes and hence different epipolar lines. However, all the epipolar planes intersect at baseline, and all the epipolar lines intersect at epipole. All this together forms the epipolar geometry.

Revisiting figure 8 with all the technical terms we have learned till now.

We have the epipolar plane P created using baseline B and ray R1. e1 and e2 are epipoles, and L2 is the epipolar line. Based on the epipolar geometry of the given figure, the search space for the pixel in image i2 corresponding to pixel x1 is constrained to a single 2D line, the epipolar line L2. This is called the epipolar constraint.

Is there a way to represent the entire epipolar geometry by a single matrix? Furthermore, can we calculate this matrix using just the two captured images? The good news is that there is such a matrix, and it is called the Fundamental matrix. 

In the next two sections, we first understand what we mean by projective geometry and homogeneous representation and then try to derive the Fundamental matrix expression. Finally, we calculate the epipolar lines and represent the epipolar constraint by using the fundamental matrix.

Understanding projective geometry and homogeneous representation

How do we represent a line in a 2D plane? Equation of a line in a 2D plane is ax + by + c = 0. With different values of a, b, and c, we get different lines in a 2D plane. Hence a vector (a,b,c) can be used to represent a line.

Suppose we have line ln1 defined as 2x + 3y + 7 = 0 and line ln2 as 4x + 6y + 14 = 0. Based on our above discussion, ln1 can be represented by the vector (2,3,7) and ln2 by the vector (4,6,14). We can easily say that ln1 and ln2 essentially represent the same line and that the vector (4,6,14) is just a scaled version of the vector (2,3,7), scaled by a factor of 2.

Hence any two vectors (a,b,c) and k(a,b,c), where k is a non-zero scaling constant, represent the same line. Such equivalent vectors, which are related by just a scaling constant, form a class of homogeneous vectors. The vector (a,b,c) is the homogeneous representation of its respective equivalent vector class. 

The set of all equivalent classes, represented by (a,b,c), for all possible real values of a, b, and c other than a=b=c=0, forms the projective space. We use this homogeneous representation, i.e. homogeneous coordinates, to define elements like points, lines, planes, etc., in projective space, and we use the rules of projective geometry to perform any transformations on these elements in the projective space.

Fundamental matrix derivation

In figure 3, assume that we know the camera projection matrices for both cameras, say P1 for the camera at C1 and P2 for the camera at C2.

What is a projection matrix? The camera’s projection matrix defines the relation between the 3D world coordinates and their corresponding pixel coordinates when captured by the camera. To know more about the camera projection matrix, read this post on camera calibration.  

Just like P1 projects 3D world coordinates to image coordinates, we define P1inv, the pseudo inverse of P1, such that we can define the ray R1 from C1 passing through x1 and X as:

X(k) = P1inv * x1 + k * C1

k is a scaling parameter as we do not know the actual distance of X from C1. We need to find the epipolar line Ln2 to reduce the search space for a pixel in i2 corresponding to pixel x1 in i1 as we know that Ln2 is the image of ray R1 captured in i2. Hence to calculate Ln2, we first find two points on ray R1, project them in image i2 using P2 and use the projected images of the two points to find Ln2.

The first point that we can consider on R1 is C1, as the ray starts from this point. The second point can be calculated by keeping k=0. Hence we get the points as C1 and (P1inv)(x1).

Using the projection matrix P2 we get the image coordinates of these points in the image i2 as P2*C1 and P2*P1inv*x1 respectively. We also observe that P2*C1 is basically the epipole e2 in image i2.

A line can be defined in projective geometry using two points p1 and p2 by simply finding their cross product p1 x p2. Hence

Ln2 = (P2 * C1) x (P2 * P1inv * x1)

Since e2 = P2 * C1, this becomes

Ln2 = e2 x (P2 * P1inv * x1) = F * x1,

where F = e2 x (P2 * P1inv) is the Fundamental matrix.

In projective geometry, if a point x lies on a line L, we can write it in the form of the equation

x^T * L = 0

Hence, as x2 lies on the epipolar line Ln2, we get

x2^T * Ln2 = 0

By replacing the value of Ln2 from the above equation, we get the equation:

x2^T * F * x1 = 0

This is a necessary condition for the two points x1 and x2 to be corresponding points, and it is also a form of epipolar constraint. Thus F represents the overall epipolar geometry of the two-view system.

What else is so special about this equation? It can be used to find the epipolar lines!

Using Fundamental matrix to find epipolar lines

As x1 and x2 are corresponding points in this equation, if we can find correspondences for some points using feature matching methods like ORB or SIFT, we can use them to solve the above equation for F.

The findFundamentalMat() method of OpenCV provides implementations of various algorithms, like 7-Point Algorithm, 8-Point Algorithm, RANSAC algorithm, and LMedS Algorithm, to calculate Fundamental matrix using matched feature points. 

Once F is known, we can find the epipolar line Ln2  using the formula 

Ln2 = F * x1

If we know Ln2, we can restrict our search for pixel x2 corresponding to pixel x1 using the epipolar constraint.
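As an illustrative Python sketch of this workflow (the image paths and the ORB matcher below are placeholders; any feature matcher that yields point correspondences would do), F and the epipolar lines in the second image can be obtained as follows:

import cv2
import numpy as np

imgL = cv2.imread("im0.png", 0)   # left image (placeholder path)
imgR = cv2.imread("im1.png", 0)   # right image (placeholder path)

# Detect and match ORB features between the two views.
orb = cv2.ORB_create(2000)
kpL, desL = orb.detectAndCompute(imgL, None)
kpR, desR = orb.detectAndCompute(imgR, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(desL, desR)

pts1 = np.float32([kpL[m.queryIdx].pt for m in matches])
pts2 = np.float32([kpR[m.trainIdx].pt for m in matches])

# Estimate the Fundamental matrix with RANSAC to reject wrong matches.
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)

# Epipolar lines in the second image for points of the first image,
# returned as (a, b, c) with a*x + b*y + c = 0, i.e. Ln2 = F * x1.
lines2 = cv2.computeCorrespondEpilines(pts1.reshape(-1, 1, 2), 1, F)
lines2 = lines2.reshape(-1, 3)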

A special case of two-view vision – parallel imaging planes

We have been trying to solve the correspondence problem. We started by using feature matching, but we observed that it leads to a sparse 3D structure, as the point correspondence for a tiny fraction of the total pixels is known. Then we saw how we could use a template-based search for pixel correspondence. We learned how epipolar geometry could be used to reduce the search space for point correspondence to a single line – the epipolar line.

Can we simplify this process of finding dense point correspondences even further? 

Figure 9. Feature matching and epipolar lines for non-parallel imaging planes. The upper pair of images shows the feature matching results; the lower pair shows points in one image (left) and the corresponding points lying on their respective epipolar lines in the second image (right).

Figure 10. A special case of two-view geometry with parallel imaging planes. The upper pair of images shows the feature matching results; the lower pair shows points in one image (left) and the corresponding points lying on their respective epipolar lines in the second image (right). Source – 2005 Stereo Dataset

Figure 9 and Figure 10 show the feature matching results and epipolar line constraint for two different pairs of images. What is the most significant difference between the two figures in terms of feature matching and the epipolar lines?

Yes! You got it right! In Figure 10, the matched feature points have equal vertical coordinates: all the corresponding points lie on the same row, and all the epipolar lines are parallel, with the same vertical coordinate as the respective point in the left image. Well, what is so great about that?

Exactly! Unlike the case of figure 9, there is no need to calculate each epipolar line explicitly. If the pixel in the left image is at (x1,y1), the equation of the respective epipolar line in the second image is y=y1.

We search for each pixel in the left image for its corresponding pixel in the same row of the right image. This is a special case of two-view geometry where the imaging planes are parallel. Hence, the epipoles (image of one camera captured by the other camera) form at infinity. Based on our understanding of epipolar geometry, epipolar lines meet at epipoles. Hence in this case, as the epipoles are at infinity, our epipolar lines are parallel.
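A toy sketch of this row-wise search is shown below (the image paths, window size and search range are arbitrary assumptions); the StereoSGBM code later in the post does the same job far more robustly:

import cv2
import numpy as np

imgL = cv2.imread("im0.png", 0).astype(np.float32)   # placeholder paths
imgR = cv2.imread("im1.png", 0).astype(np.float32)

half, max_disp = 4, 64    # half window size and maximum disparity to try
y, x = 200, 300           # a pixel in the left image; search along row y in the right image

patchL = imgL[y - half:y + half + 1, x - half:x + half + 1]

best_d, best_cost = 0, np.inf
for d in range(max_disp):              # candidate pixel in the right image: (x - d, y)
    xr = x - d
    if xr - half < 0:
        break
    patchR = imgR[y - half:y + half + 1, xr - half:xr + half + 1]
    cost = np.abs(patchL - patchR).sum()   # sum of absolute differences (SAD)
    if cost < best_cost:
        best_cost, best_d = cost, d

print("disparity at (%d, %d): %d pixels" % (x, y, best_d))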

Awesome! This significantly simplifies the problem of dense point correspondence. However, we still have to perform triangulation for each point. Can we simplify this problem as well? Well, once again, the special case of parallel imaging planes has good news for us! It helps us to apply stereo disparity. It is similar to stereopsis or stereoscopic vision, the method that helps humans perceive depth. Let’s understand this in detail. 

Understanding stereo disparity

The following gif is generated using images from the Middlebury Stereo Datasets 2005. It demonstrates the pure translation motion of the camera, making the imaging planes parallel. Can you tell which objects are closer to the camera?


Gif showing left and right images where the imaging planes are parallel.

We can clearly say that the toy cow at the bottom is closer to the camera than the toys in the topmost row. How did we do this? We basically see the shift of the object between the two images: the larger the shift, the closer the object. This shift is what we call disparity.

How do we use it to avoid point triangulation for calculating depth? We calculate the disparity (shift of the pixel in the two images) for each pixel and apply a proportional mapping to find the depth for a given disparity value. This is further justified in figure 12.

Figure 12. Image from OpenCV documentation explaining the relation between disparity (x – x’) and depth Z.

Disparity = x – x’ = Bf/Z

where B is the baseline (the distance between the cameras) and f is the focal length.
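As a quick numeric illustration (the baseline and focal length below are made-up placeholders; real values come from stereo calibration), the depth follows directly from the disparity:

import numpy as np

baseline_m = 0.06        # B: distance between the two cameras, in meters (assumed)
focal_length_px = 800.0  # f: focal length in pixels (assumed)

# Disparities (x - x') in pixels, e.g. read from a disparity map.
disparity = np.array([8.0, 16.0, 32.0, 64.0])

# Z = B * f / disparity (valid only where disparity > 0)
depth_m = np.where(disparity > 0, baseline_m * focal_length_px / disparity, np.inf)
print(depth_m)   # [6.   3.   1.5  0.75] meters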

We will use the StereoSGBM method of OpenCV to write code for calculating the disparity map for a given pair of images. The StereoSGBM method is based on [3].

C++

#include <opencv2/opencv.hpp>

// Reading the left and right images.
cv::Mat imgL, imgR;
imgL = cv::imread("../im0.png"); // path to the left image is "../im0.png"
imgR = cv::imread("../im1.png"); // path to the right image is "../im1.png"

// Setting parameters for the StereoSGBM algorithm
int minDisparity = 0;
int numDisparities = 64;
int blockSize = 8;
int disp12MaxDiff = 1;
int uniquenessRatio = 10;
int speckleWindowSize = 10;
int speckleRange = 8;

// Creating an object of the StereoSGBM algorithm; the remaining parameters
// are set through setters so that each value lands on the intended parameter.
cv::Ptr<cv::StereoSGBM> stereo = cv::StereoSGBM::create(minDisparity, numDisparities, blockSize);
stereo->setDisp12MaxDiff(disp12MaxDiff);
stereo->setUniquenessRatio(uniquenessRatio);
stereo->setSpeckleWindowSize(speckleWindowSize);
stereo->setSpeckleRange(speckleRange);

// Calculating disparity using the StereoSGBM algorithm
cv::Mat disp;
stereo->compute(imgL, imgR, disp);

// Normalizing the disparity map for better visualisation
cv::normalize(disp, disp, 0, 255, cv::NORM_MINMAX, CV_8UC1);

// Displaying the disparity map
cv::imshow("disparity", disp);
cv::waitKey(0);

Python

import cv2
import numpy as np

# Reading the left and right images.
imgL = cv2.imread("../im0.png", 0)
imgR = cv2.imread("../im1.png", 0)

# Setting parameters for the StereoSGBM algorithm
minDisparity = 0
numDisparities = 64
blockSize = 8
disp12MaxDiff = 1
uniquenessRatio = 10
speckleWindowSize = 10
speckleRange = 8

# Creating an object of the StereoSGBM algorithm
stereo = cv2.StereoSGBM_create(minDisparity = minDisparity,
        numDisparities = numDisparities,
        blockSize = blockSize,
        disp12MaxDiff = disp12MaxDiff,
        uniquenessRatio = uniquenessRatio,
        speckleWindowSize = speckleWindowSize,
        speckleRange = speckleRange
    )

# Calculating disparity using the StereoSGBM algorithm
disp = stereo.compute(imgL, imgR).astype(np.float32)

# Normalizing the disparity map for better visualisation
disp = cv2.normalize(disp, None, alpha=0, beta=255,
                     norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_8U)

# Displaying the disparity map
cv2.imshow("disparity", disp)
cv2.waitKey(0)

Figure 13. Left and right images of a real-world scene and the corresponding output disparity map. Source – 2014 High Resolution Stereo Datasets [2]

Try playing with the different parameters to observe how they affect the final output disparity map calculation. A detailed explanation of the StereoSGBM will be presented in the subsequent Introduction to Spatial AI series. In the next post, we will learn to create our own stereo camera setup and record live disparity map videos, and we will also learn how to convert a disparity map into a depth map. An interesting application of stereo cameras will also be explained, but that is a surprise for now! 

Link: https://learnopencv.com/introduction-to-epipolar-geometry-and-stereo-vision/

#opencv  #python  #computervision 

Introduction to Epipolar Geometry and Stereo Vision With OpenCV & C++

Eigenface using OpenCV (C++/Python)

In this post, we will learn about Eigenface — an application of Principal Component Analysis (PCA) for human faces.  We will also share C++ and Python code written using OpenCV to explain the concept.

The video below shows a demo of EigenFaces. The code for the application shown in the video is shared in this post.

What is PCA?

In our previous post, we learned about a dimensionality reduction technique called PCA. If you have not read the post, please do so. It is a pre-requisite for understanding this post.

Principal Component Analysis

Figure 1: The principal components of 2D data (red dots) are shown using the blue and green lines.

To quickly recap, we learned that the first principal component is the direction of maximum variance in the data. The second principal component is the direction of maximum variance in the space perpendicular (orthogonal) to the first principal component, and so on and so forth. The first and second principal components of the red dots (2D data) are shown using the blue and green lines.

We also learned that the first principal component is the eigenvector of the covariance matrix corresponding to the maximum eigenvalue. The second principal component is the eigenvector corresponding to the second largest eigenvalue. 
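As a small numeric sketch of this recap (the 2D data below is randomly generated, purely for illustration), the principal components are simply the eigenvectors of the covariance matrix, ordered by eigenvalue:

import numpy as np

rng = np.random.default_rng(0)
pts = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])  # correlated 2D points

cov = np.cov(pts, rowvar=False)           # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues returned in ascending order

first_pc = eigvecs[:, -1]    # direction of maximum variance
second_pc = eigvecs[:, -2]   # orthogonal direction of the next-largest variance
print(first_pc, second_pc)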

What are EigenFaces ?

Eigenfaces are images that can be added to a mean (average) face to create new facial images. We can write this mathematically as,

 

  \[F = F_m + \sum^n_{i=1} \alpha_i F_i\]

 

where,

F is a new face,
F_m is the mean or average face,
F_i is an EigenFace, and
\alpha_i are scalar multipliers we can choose to create new faces; they can be positive or negative.

Figure 2: On the left is the mean image. On the right is a new face produced by adding 10 EigenFaces with different weights (shown in the center).

Eigenfaces are calculated by estimating the principal components of the dataset of facial images. They are used for applications like Face Recognition and Facial Landmark Detection.

An Image as a Vector

In the previous post, all examples shown were 2D or 3D data points. We learned that if we had a collection of these points, we can find the principal components. But how do we represent an image as a point in a higher dimensional space? Let’s look at an example.

A 100 x 100 color image is nothing but an array of 100 x 100 x 3 (one for each R, G, B color channel) numbers. Usually, we like to think of a 100 x 100 x 3 array as a 3D array, but you can also think of it as a long 1D array consisting of 30,000 elements.

You can think of this array of 30k elements as a point in a 30k-dimensional space just as you can imagine an array of 3 numbers (x, y, z) as a point in a 3D space!

How do you visualize a 30k-dimensional space? You can’t. Most of the time you can build your argument as if there were only three dimensions, and usually (but not always) it holds true for higher dimensional spaces as well.
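A tiny sketch of that flattening (random pixel values, just to show the reshaping):

import numpy as np

image = np.random.randint(0, 256, size=(100, 100, 3), dtype=np.uint8)  # a 100 x 100 color image

vector = image.reshape(-1)              # 1D array of 100 * 100 * 3 = 30,000 numbers
print(vector.shape)                     # (30000,)

restored = vector.reshape(100, 100, 3)  # and back to the original image shape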

How to calculate EigenFaces?

To calculate EigenFaces, we need to use the following steps.

  1. Obtain a facial image dataset: We need a collection of facial images containing different kinds of faces. In this post, we used about 200 images from CelebA.
  2. Align and resize images: Next we need to align and resize images so that the centers of the eyes are aligned in all images. This can be done by first finding facial landmarks. In this post, we used the aligned images supplied in CelebA. At this point, all the images in the dataset should be the same size.
  3. Create a data matrix: Create a data matrix containing every image as a row vector. If all the images in the dataset are of size 100 x 100 and there are 1000 images, we will have a data matrix of size 1000 x 30k.
  4. Calculate Mean Vector [Optional]: Before performing PCA on the data, we need to subtract the mean vector. In our case, the mean vector will be a row vector of length 30k calculated by averaging all the rows of the data matrix. The reason this step is optional when using OpenCV’s PCA class is that OpenCV conveniently calculates the mean for us if the vector is not supplied. This may not be the case in other linear algebra packages.
  5. Calculate Principal Components: The principal components of this data matrix are calculated by finding the Eigenvectors of the covariance matrix. Fortunately, the PCA class in OpenCV handles this calculation for us. We just need to supply the data matrix, and out comes a matrix containing the Eigenvectors.
  6. Reshape Eigenvectors to obtain EigenFaces: The Eigenvectors so obtained will have a length of 30k if our dataset contained images of size 100 x 100 x 3. We can reshape these Eigenvectors into 100 x 100 x 3 images to obtain EigenFaces.

Principal Component Analysis (PCA) using OpenCV

The PCA class in OpenCV allows us to compute the principal components of a data matrix. Read the documentation for different usages. Here we are discussing the most common way to use the PCA class.

C++

PCA (Mat &data, Mat &mean, int flags, int maxComponents=0)

// Example usage
PCA pca(data, Mat(), PCA::DATA_AS_ROW, 10);
Mat mean = pca.mean;
Mat eigenVectors = pca.eigenvectors;


Python

mean, eigenvectors = cv2.PCACompute(data, mean=mean, maxComponents=maxComponents)

# Example usage
mean, eigenVectors = cv2.PCACompute(data, mean=None, maxComponents=10)

where,

data: The data matrix containing every data point as either a row or a column vector. If our data consists of 1000 images, and each image is a 30k long row vector, the data matrix will be of size 1000 x 30k.

mean: The average of the data. If every data point in the data matrix is a 30k long row vector, the mean will also be a vector of the same size. This parameter is optional and is calculated internally if it is not supplied.

flags: It can take the values DATA_AS_ROW or DATA_AS_COL, indicating whether a point in the data matrix is arranged along the row or along the column. In the code we have shared, we have arranged the data as row vectors. Note: In the Python version, you do not have the option of specifying this flag; the data needs to have one image in one row.

maxComponents: The maximum number of principal components is usually the smaller of two values: 1) the dimensionality of the original data (in our case 30k), and 2) the number of data points (e.g. 1000 in the above example). However, we can explicitly fix the maximum number of components we want to calculate by setting this argument. For example, we may be interested in only the first 50 principal components. Calculating fewer principal components is cheaper than calculating the theoretical maximum.

EigenFace : C++ and Python Code

In this section, we will examine the relevant parts of the code. The credit for the code goes to Subham Rajgaria. He wrote this code as part of his internship at our company Big Vision LLC.

Let’s go over the main function in both C++ and Python. Look for the explanation and expansion of the functions used after this code block.

#define NUM_EIGEN_FACES 10
#define MAX_SLIDER_VALUE 255

int main(int argc, char **argv)
{
  // Directory containing images
  string dirName = "images/";

  // Read images in the directory
  vector<Mat> images;
  readImages(dirName, images);

  // Size of images. All images should be the same size.
  Size sz = images[0].size(); 

  // Create data matrix for PCA.
  Mat data = createDataMatrix(images);

  // Calculate PCA of the data matrix
  cout << "Calculating PCA ...";
  PCA pca(data, Mat(), PCA::DATA_AS_ROW, NUM_EIGEN_FACES);
  cout << " DONE"<< endl;

  // Extract mean vector and reshape it to obtain average face
  averageFace = pca.mean.reshape(3,sz.height);

  // Find eigen vectors.
  Mat eigenVectors = pca.eigenvectors;

  // Reshape Eigenvectors to obtain EigenFaces
  for(int i = 0; i < NUM_EIGEN_FACES; i++)
  {
      Mat eigenFace = eigenVectors.row(i).reshape(3,sz.height);
      eigenFaces.push_back(eigenFace);
  }

  // Show mean face image at 2x the original size
  Mat output;
  resize(averageFace, output, Size(), 2, 2);

  namedWindow("Result", CV_WINDOW_AUTOSIZE);
  imshow("Result", output);

  // Create trackbars
  namedWindow("Trackbars", CV_WINDOW_AUTOSIZE);
  for(int i = 0; i < NUM_EIGEN_FACES; i++)
  {
    sliderValues[i] = MAX_SLIDER_VALUE/2;
    createTrackbar( "Weight" + to_string(i), "Trackbars", &sliderValues[i], MAX_SLIDER_VALUE, createNewFace);
  }

  // You can reset the sliders by clicking on the mean image.
  setMouseCallback("Result", resetSliderValues);

  cout << "Usage:" << endl
  << "\tChange the weights using the sliders" << endl
  << "\tClick on the result window to reset sliders" << endl
  << "\tHit ESC to terminate program." << endl;

  waitKey(0);
  destroyAllWindows();
}

Python

if __name__ == '__main__':

	# Number of EigenFaces
	NUM_EIGEN_FACES = 10

	# Maximum weight
	MAX_SLIDER_VALUE = 255

	# Directory containing images
	dirName = "images"

	# Read images
	images = readImages(dirName)

	# Size of images
	sz = images[0].shape

	# Create data matrix for PCA.
	data = createDataMatrix(images)

	# Compute the eigenvectors from the stack of images created
	print("Calculating PCA ", end="...")
	mean, eigenVectors = cv2.PCACompute(data, mean=None, maxComponents=NUM_EIGEN_FACES)
	print ("DONE")

	averageFace = mean.reshape(sz)

	eigenFaces = []; 

	for eigenVector in eigenVectors:
		eigenFace = eigenVector.reshape(sz)
		eigenFaces.append(eigenFace)

	# Create window for displaying Mean Face
	cv2.namedWindow("Result", cv2.WINDOW_AUTOSIZE)

	# Display result at 2x size
	output = cv2.resize(averageFace, (0,0), fx=2, fy=2)
	cv2.imshow("Result", output)

	# Create Window for trackbars
	cv2.namedWindow("Trackbars", cv2.WINDOW_AUTOSIZE)

	sliderValues = []

	# Create Trackbars
	for i in range(0, NUM_EIGEN_FACES):
		sliderValues.append(MAX_SLIDER_VALUE // 2)
		cv2.createTrackbar("Weight" + str(i), "Trackbars", MAX_SLIDER_VALUE // 2, MAX_SLIDER_VALUE, createNewFace)

	# You can reset the sliders by clicking on the mean image.
	cv2.setMouseCallback("Result", resetSliderValues);

	print('''Usage:
	Change the weights using the sliders
	Click on the result window to reset sliders
	Hit ESC to terminate program.''')

	cv2.waitKey(0)
	cv2.destroyAllWindows()

The above code does the following.

  1. Set the number of Eigenfaces (NUM_EIGEN_FACES) to 10 and the max value of the sliders (MAX_SLIDER_VALUE) to 255. These numbers are not set in stone. Change these numbers to see how the application changes.
  2. Read Images: Next we read all the images in the specified directory using the function readImages. The directory contains images that are aligned: the centers of the left and right eyes are at the same location in all images. We add these images to a list (or vector). We also add a mirrored version of each image to the list. Because the mirror image of a valid facial image is itself a valid facial image, this doubles the size of our dataset and makes it symmetric at the same time.
  3. Assemble Data Matrix: Next, we use the function createDataMatrix to assemble the images into a data matrix. Each row of the data matrix is one image. Let’s look into the createDataMatrix function.

C++

// Create data matrix from a vector of images
static  Mat createDataMatrix(const vector<Mat> &images)
{
  cout << "Creating data matrix from images ...";
  
  // Allocate space for all images in one data matrix.
  // The size of the data matrix is
  //
  // ( w  * h  * 3, numImages )
  //
  // where,
  //
  // w = width of an image in the dataset.
  // h = height of an image in the dataset.
  // 3 is for the 3 color channels.

  Mat data(static_cast<int>(images.size()), images[0].rows * images[0].cols * 3, CV_32F);

  // Turn an image into one row vector in the data matrix
  for(unsigned int i = 0; i < images.size(); i++)
  {
    // Extract image as one long vector of size w x h x 3
    Mat image = images[i].reshape(1,1);

    // Copy the long vector into one row of the destination data matrix
    image.copyTo(data.row(i));
  }

  cout << " DONE" << endl;
  return data;
}

Python

def createDataMatrix(images):
	print("Creating data matrix",end=" ... ")
	''' 
	Allocate space for all images in one data matrix.
	The size of the data matrix is
	( w  * h  * 3, numImages )
	where,
	w = width of an image in the dataset.
	h = height of an image in the dataset.
	3 is for the 3 color channels.
	'''
  
	numImages = len(images)
	sz = images[0].shape
	data = np.zeros((numImages, sz[0] * sz[1] * sz[2]), dtype=np.float32)
	for i in range(0, numImages):
		image = images[i].flatten()
		data[i,:] = image
	
	print("DONE")
	return data

4. Calculate PCA : Next we calculate the PCA using the PCA class in C++ (see lines 19-23 in the main function above) and the PCACompute function in Python (see line 23 in the main function above). As an output of PCA, we obtain the mean vector and the 10 Eigenvectors.
5. Reshape vectors to obtain Average Face and EigenFaces: The mean vector and every Eigenvector is a vector of length w * h * 3, where w is the width, h is the height, and 3 is the number of color channels of any image in the dataset. In other words, they are vectors of 30k elements. We reshape them to the original size of the image to obtain the average face and the EigenFaces. See lines 24-35 in the C++ code and lines 26-32 in the Python code.
6. Create new face based on slider values. A new face can be created by adding weighted EigenFaces to the average face using the function createNewFace. In OpenCV, slider values cannot be negative. So we calculate the weights by subtracting MAX_SLIDER_VALUE/2 from the current slider value so we can get both positive and negative values.

C++

void createNewFace(int ,void *)
{
  // Start with the mean image
  Mat output = averageFace.clone();

  // Add the eigen faces with the weights
  for(int i = 0; i < NUM_EIGEN_FACES; i++)
  {
    // OpenCV does not allow slider values to be negative.
    // So we use weight = sliderValue - MAX_SLIDER_VALUE / 2
    double weight = sliderValues[i] - MAX_SLIDER_VALUE/2;
    output = output + eigenFaces[i] * weight;
  }

  resize(output, output, Size(), 2, 2);

  imshow("Result", output);

}

Python

def createNewFace(*args):
	# Start with the mean image
	output = averageFace
	
	# Add the eigen faces with the weights
	for i in range(0, NUM_EIGEN_FACES):
		'''
		OpenCV does not allow slider values to be negative. 
		So we use weight = sliderValue - MAX_SLIDER_VALUE / 2
		''' 
		sliderValues[i] = cv2.getTrackbarPos("Weight" + str(i), "Trackbars");
		weight = sliderValues[i] - MAX_SLIDER_VALUE/2
		output = np.add(output, eigenFaces[i] * weight)

	# Display Result at 2x size
	output = cv2.resize(output, (0,0), fx=2, fy=2)
	cv2.imshow("Result", output)
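
For reference, steps 4 and 5 boil down to a few lines of Python; a minimal sketch, assuming data, images and NUM_EIGEN_FACES are defined as in the code above:

# Step 4: PCA on the data matrix. PCACompute returns the mean vector
# and the first NUM_EIGEN_FACES eigenvectors.
mean, eigenVectors = cv2.PCACompute(data, mean=None, maxComponents=NUM_EIGEN_FACES)

# Step 5: reshape the mean vector and the eigenvectors back to image
# shape to obtain the average face and the EigenFaces.
sz = images[0].shape
averageFace = mean.reshape(sz)
eigenFaces = [eigenVector.reshape(sz) for eigenVector in eigenVectors]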

Link: https://learnopencv.com/eigenface-using-opencv-c-python/

#opencv  #python  #computervision 

Eigenface using OpenCV (C++/Python)

Depth Estimation using Stereo Camera Using OpenCV & C++

Create a custom low-cost stereo camera and capture depth maps with it using OpenCV.

Directory Structure

All the code files and folders follow the following structure.

├── cpp
│   ├── disparity2depth_calib.cpp
│   ├── disparity_params_gui.cpp
│   ├── obstacle_avoidance.cpp
│   └── CMakeLists.txt
├── data
│   ├── depth_estimation_params.xml
│   ├── depth_estimation_params_cpp.xml
│   ├── depth_estmation_params_py.xml
│   ├── depth_params.xml
│   └── stereo_rectify_maps.xml
├── python
│   ├── disparity2depth_calib.py
│   ├── disparity_params_gui.py
│   ├── obstacle_avoidance.py
│   └── requirements.txt
└── README.md

Instructions

C++

To run the code in C++, please go into the cpp folder, then compile disparity_params_gui.cpp, obstacle_avoidance.cpp and disparity2depth_calib.cpp using the following:

mkdir build
cd build
cmake ..
cmake --build . --config Release

Use the following commands to execute the compiled files:

./build/disparity_params_gui
./build/disparity2depth_calib
./build/obstacle_avoidance

Python

To run the code in Python, please go into the python folder and use the following commands to run disparity_params_gui.py, disparity2depth_calib.py and obstacle_avoidance.py respectively:

python3 disparity_params_gui.py
python3 disparity2depth_calib.py
python3 obstacle_avoidance.py
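
As a rough illustration of what these scripts compute, a disparity map for a rectified stereo pair can be obtained with OpenCV along the following lines (a minimal sketch; file names, block-matching parameters and calibration values are placeholders, not the ones used in the repository):

import cv2
import numpy as np

# Load a rectified left/right image pair (placeholder file names).
imgL = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
imgR = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block matching; numDisparities must be a multiple of 16.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(imgL, imgR).astype(np.float32) / 16.0

# Depth is inversely proportional to disparity:
# depth = baseline * focal_length / disparity (placeholder calibration values).
baseline, focal_length = 0.06, 700.0
depth = np.where(disparity > 0, baseline * focal_length / disparity, 0)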

Link: https://github.com/spmallick/learnopencv/tree/master/Depth-Perception-Using-Stereo-Camera

#opencv  #python  #computervision #cpluplus 

Depth Estimation using Stereo Camera Using OpenCV & C++

Deep Learning with OpenCV's DNN Module

Directory Structure

All the code files and folders follow the following structure.

├── cpp
│   ├── classify
│   │   ├── classify.cpp
│   │   └── CMakeLists.txt
│   └── detection
│       ├── detect_img
│       │   ├── CMakeLists.txt
│       │   └── detect_img.cpp
│       └── detect_vid
│           ├── CMakeLists.txt
│           └── detect_vid.cpp
├── input
│   ├── classification_classes_ILSVRC2012.txt
│   ├── DenseNet_121.caffemodel
│   ├── DenseNet_121.prototxt
│   ├── frozen_inference_graph.pb
│   ├── image_1.jpg
│   ├── image_2.jpg
│   ├── object_detection_classes_coco.txt
│   ├── ssd_mobilenet_v2_coco_2018_03_29.pbtxt.txt
│   └── video_1.mp4
├── outputs
│   ├── image_result.jpg
│   ├── result_image.jpg
│   └── video_result.mp4
├── python
│   ├── classification
│   │   ├── classify.py
│   │   └── README.md
│   ├── detection
│   │   ├── detect_img.py
│   │   └── detect_vid.py
│   └── requirements.txt
└── README.md

Instructions

Python

To run the code in Python, please go into the python folder and execute the Python scripts in each of the respective sub-folders.
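
At its core, the classification script is a handful of OpenCV DNN calls; a minimal sketch, assuming the model files listed in the input folder above (paths and preprocessing values are illustrative and may differ from the repository's script):

import cv2
import numpy as np

# Load the ImageNet class names and the DenseNet-121 Caffe model.
with open("input/classification_classes_ILSVRC2012.txt") as f:
    class_names = [line.strip() for line in f]
net = cv2.dnn.readNetFromCaffe("input/DenseNet_121.prototxt",
                               "input/DenseNet_121.caffemodel")

# Preprocess the image into a 4D blob and run a forward pass.
image = cv2.imread("input/image_1.jpg")
blob = cv2.dnn.blobFromImage(image, scalefactor=0.017, size=(224, 224),
                             mean=(104, 117, 123))
net.setInput(blob)
outputs = net.forward()

# Print the most confident class.
print(class_names[int(np.argmax(outputs))])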

C++

To run the code in C++, please go into the cpp folder, then into each of the respective sub-folders (classify, detection/detect_img and detection/detect_vid), and repeat the steps below for each target:

mkdir build
cd build
cmake ..
cmake --build . --config Release
cd ..
./build/classify
mkdir build
cd build
cmake ..
cmake --build . --config Release
cd ..
./build/detect_img
mkdir build
cd build
cmake ..
cmake --build . --config Release
cd ..
./build/detect_vid

Outputs

Image Classification

Object Detection

Link: https://github.com/spmallick/learnopencv/tree/master/Deep-Learning-with-OpenCV-DNN-Module

#opencv #deeplearning  #python  #computervision 

Deep Learning with OpenCV's DNN Module

Deep Convolutional GAN in TensorFlow and PyTorch

Package Dependencies

This repository trains the Deep Convolutional GAN in both PyTorch and TensorFlow on the Anime Faces dataset. It is tested with:

  • Cuda-11.1
  • Cudnn-8.0

The PyTorch and TensorFlow scripts require numpy, tensorflow and torch. To install the versions of these packages needed by the scripts, use pip (make sure pip itself is up to date: python3 -m pip install -U pip):

pip3 install -r requirements.txt 

Directory Structure

├── PyTorch
│   ├── DCGAN_Anime_Pytorch.ipynb
│   └── dcgan_anime_pytorch.py
└── TensorFlow
    ├── DCGAN_Anime_Tensorflow.ipynb
    └── dcgan_anime_tesnorflow.py

Instructions

PyTorch

To train the Deep Convolutional GAN with PyTorch, please go into the PyTorch folder and execute the Jupyter Notebook.

TensorFlow

To train the Deep Convolutional GAN with TensorFlow, please go into the TensorFlow folder and execute the Jupyter Notebook.
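
For orientation, the generator of a DCGAN typically looks like the following PyTorch sketch (a minimal illustration producing 64x64 RGB images; the layer sizes are assumptions, not necessarily the architecture used in the notebooks):

import torch.nn as nn

# Minimal DCGAN generator: a latent vector is upsampled to a 64x64 RGB image
# with transposed convolutions, batch normalisation and ReLU, ending in tanh.
class Generator(nn.Module):
    def __init__(self, latent_dim=100, feature_maps=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, feature_maps * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(feature_maps * 8),
            nn.ReLU(True),
            nn.ConvTranspose2d(feature_maps * 8, feature_maps * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feature_maps * 4),
            nn.ReLU(True),
            nn.ConvTranspose2d(feature_maps * 4, feature_maps * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feature_maps * 2),
            nn.ReLU(True),
            nn.ConvTranspose2d(feature_maps * 2, feature_maps, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feature_maps),
            nn.ReLU(True),
            nn.ConvTranspose2d(feature_maps, 3, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, z):
        # z has shape (batch, latent_dim, 1, 1).
        return self.net(z)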

Link: https://github.com/spmallick/learnopencv/tree/master/Deep-Convolutional-GAN

#opencv  #python  #computervision #tensorflow #pytorch 

Deep Convolutional GAN in TensorFlow and PyTorch

Image Classification with OpenCV and Java

Getting Started

Our code is tested using Python 3.7.5, but it should also work with any other python3.x. If you'd like to check your version run:

python3 -V

Virtual Environment

Let's create a new virtual environment. You'll need to install virtualenv package if you don't have it:

pip install virtualenv

Now we can create a new virtual environment and call it env:

python3 -m venv ~/env

The last thing we have to do is to activate it:

source  ~/env/bin/activate

To install the required python dependencies run:

pip3 install -r requirements.txt

OpenCV

In this blog post we are using OpenCV 4.3.0, which is not available via pip, together with OpenCV for Java. That is why we first need to build the OpenCV library from source. To do so:

  1. Check the list of libraries below and install any missing dependencies:
sudo apt-get update
sudo apt-get install build-essential cmake unzip pkg-config
sudo apt-get install libjpeg-dev libpng-dev libtiff-dev
sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev
sudo apt-get install libxvidcore-dev libx264-dev
sudo apt-get install libgtk-3-dev
sudo apt-get install libatlas-base-dev gfortran
sudo apt-get install python3-dev

For the OpenCV Java installation we used the default Java Runtime Environment and Java Development Kit:

sudo apt-get install default-jre
sudo apt-get install default-jdk
sudo apt-get install ant

2.   Download OpenCV 4.3.0 and the contrib modules from the official repositories:

cd ~
wget -O opencv.zip https://github.com/opencv/opencv/archive/4.3.0.zip
wget -O opencv_contrib.zip https://github.com/opencv/opencv_contrib/archive/4.3.0.zip

3.   Unzip the downloaded archives:

unzip opencv.zip
unzip opencv_contrib.zip

4.   Rename the directories to match CMake paths:

mv opencv-4.3.0 opencv
mv opencv_contrib-4.3.0 opencv_contrib

5.   Compile OpenCV. Create and enter a build directory:

cd ~/opencv
mkdir build && cd build

6.   Run CMake to configure the OpenCV build. Don't forget to set the right path for PYTHON_EXECUTABLE:

cmake -D CMAKE_BUILD_TYPE=RELEASE \
 -D CMAKE_INSTALL_PREFIX=/usr/local \
 -D INSTALL_PYTHON_EXAMPLES=OFF \
 -D INSTALL_C_EXAMPLES=OFF \
 -D OPENCV_ENABLE_NONFREE=ON \
 -D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib/modules \
 -D PYTHON_EXECUTABLE=~/env/bin/python3 \
 -D ANT_EXECUTABLE=/usr/bin/ant \
 -D BUILD_SHARED_LIBRARY=OFF \
 -D BUILD_TESTS=OFF \
 -D BUILD_PERF_TESTS=OFF \
 -D BUILD_EXAMPLES=ON ..

If you want to configure the build with some specific Java version, please, add the following fields, verifying the paths:

 -D JAVA_AWT_INCLUDE_PATH=/usr/lib/jvm/java-1.x.x-openjdk-amd64/include \
 -D JAVA_AWT_LIBRARY=/usr/lib/jvm/java-1.x.x-openjdk-amd64/lib/libawt.so \
 -D JAVA_INCLUDE_PATH=/usr/lib/jvm/java-1.x.x-openjdk-amd64/include \
 -D JAVA_INCLUDE_PATH2=/usr/lib/jvm/java-1.x.x-openjdk-amd64/include/linux \
 -D JAVA_JVM_LIBRARY=/usr/lib/jvm/java-1.x.x-openjdk-amd64/include/jni.h \

7.   Check the output and make sure that everything is set correctly. After that we're ready to build it with:

make -j8

Make sure you didn't get any errors. In case of successful completion you will find the following files in the build directory:

  • bin/opencv-430.jar
  • lib/libopencv_java430.so
  • lib/python3/cv2.cpython-37m-x86_64-linux-gnu.so

Then run the following command:

sudo ldconfig

which creates the necessary links and cache to our freshly built shared library.

Put lib/python3/cv2.cpython-37m-x86_64-linux-gnu.so into the virtual environment's installed packages:

cp lib/python3/cv2.cpython-37m-x86_64-linux-gnu.so ~/env/lib/python3.7/site-packages/cv2.so

The last step is to put ~/opencv/build/lib/libopencv_java430.so into the /usr/lib directory:

sudo cp lib/libopencv_java430.so /usr/lib

For Windows and macOS OpenCV Java build, please, follow the steps described in Introduction to Java Development or Installing OpenCV for Java.

Executing Model Conversion and Test Script

The Mobilenetv2ToOnnx.py script proposed for the experiments supports the --input_image key to customize the model conversion pipeline. It defines the full input image path, including its name; the default is "coffee.jpg".

To run the MobileNetV2 conversion, please choose one of the scenarios described below:

  • for the custom input image and running evaluation of the converted model:
python3 Mobilenetv2ToOnnx.py --input_image "images/red-ceramic-mug.jpg"
  • for the default input image:
python3 Mobilenetv2ToOnnx.py
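
The conversion step itself typically amounts to exporting a pretrained PyTorch MobileNetV2 to ONNX; a minimal sketch of that idea (the repository's script may use different options and file names):

import torch
import torchvision.models as models

# Export a pretrained MobileNetV2 to ONNX with a fixed 1x3x224x224 input.
model = models.mobilenet_v2(pretrained=True)
model.eval()
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "mobilenet_v2.onnx",
                  input_names=["input"], output_names=["output"])

# The exported model can then be loaded with OpenCV's DNN module, e.g.:
# net = cv2.dnn.readNetFromONNX("mobilenet_v2.onnx")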

Executing DNN OpenCV Java

To compile DnnOpenCV.java, run the following command, setting the classpath key whose value is the full path to opencv-430.jar:

javac -cp ":/home/$USER/opencv/build/bin/opencv-430.jar" DnnOpenCV.java

To run the code, please, execute the following line:

java -cp ":/home/$USER/opencv/build/bin/opencv-430.jar" DnnOpenCV

Link: https://github.com/spmallick/learnopencv/tree/master/DNN-OpenCV-Classification-with-Java

#opencv #java  #python  #computervision 

Image Classification with OpenCV and Java

Image Classification with OpenCV for Android

Getting Started

Our code is tested using Python 3.7.5, but it should also work with any other python3.x. If you'd like to check your version run:

python3 -V

Virtual Environment

Let's create a new virtual environment. You'll need to install virtualenv package if you don't have it:

pip install virtualenv

Now we can create a new virtual environment and call it env:

python3 -m venv ~/env

The last thing we have to do is to activate it:

source  ~/env/bin/activate

To install the required python dependencies run:

pip3 install -r requirements.txt

OpenCV

In this blog post we are using OpenCV 4.3.0, which is not available via pip. The first step is building the OpenCV library from source. To do so:

  1. Check the list of libraries below and install any missing dependencies:
sudo apt-get update
sudo apt-get install build-essential cmake unzip pkg-config
sudo apt-get install libjpeg-dev libpng-dev libtiff-dev
sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev
sudo apt-get install libxvidcore-dev libx264-dev
sudo apt-get install libgtk-3-dev
sudo apt-get install libatlas-base-dev gfortran
sudo apt-get install python3-dev

For the OpenCV Java installation we used the default Java Runtime Environment and Java Development Kit:

sudo apt-get install default-jre
sudo apt-get install default-jdk
sudo apt-get install ant

2.   Download OpenCV 4.3.0 and the contrib modules from the official repositories:

cd ~
wget -O opencv.zip https://github.com/opencv/opencv/archive/4.3.0.zip
wget -O opencv_contrib.zip https://github.com/opencv/opencv_contrib/archive/4.3.0.zip

3.   Unzip the downloaded archives:

unzip opencv.zip
unzip opencv_contrib.zip

4.   Rename the directories to match CMake paths:

mv opencv-4.3.0 opencv
mv opencv_contrib-4.3.0 opencv_contrib

5.   Compile OpenCV. Create and enter a build directory:

cd ~/opencv
mkdir build && cd build

6.   Run CMake to configure the OpenCV build. Don't forget to set the right path for PYTHON_EXECUTABLE:

cmake -D CMAKE_BUILD_TYPE=RELEASE \
 -D CMAKE_INSTALL_PREFIX=/usr/local \
 -D INSTALL_PYTHON_EXAMPLES=OFF \
 -D INSTALL_C_EXAMPLES=OFF \
 -D OPENCV_ENABLE_NONFREE=ON \
 -D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib/modules \
 -D PYTHON_EXECUTABLE=~/env/bin/python3 \
 -D ANT_EXECUTABLE=/usr/bin/ant \
 -D BUILD_SHARED_LIBRARY=OFF \
 -D BUILD_TESTS=OFF \
 -D BUILD_PERF_TESTS=OFF \
 -D BUILD_EXAMPLES=ON ..

If you want to configure the build with some specific Java version, please, add the following fields, verifying the paths:

 -D JAVA_AWT_INCLUDE_PATH=/usr/lib/jvm/java-1.x.x-openjdk-amd64/include \
 -D JAVA_AWT_LIBRARY=/usr/lib/jvm/java-1.x.x-openjdk-amd64/lib/libawt.so \
 -D JAVA_INCLUDE_PATH=/usr/lib/jvm/java-1.x.x-openjdk-amd64/include \
 -D JAVA_INCLUDE_PATH2=/usr/lib/jvm/java-1.x.x-openjdk-amd64/include/linux \
 -D JAVA_JVM_LIBRARY=/usr/lib/jvm/java-1.x.x-openjdk-amd64/include/jni.h \

7.   Check the output and make sure that everything is set correctly. After that we're ready to build it with:

make -j8

Make sure you didn't get any errors. In case of successful completion you will find the following file in the build directory: lib/python3/cv2.cpython-37m-x86_64-linux-gnu.so.

Then run the following command:

sudo ldconfig

which creates the necessary links and cache to our freshly built shared library.

The last step is to move lib/python3/cv2.cpython-37m-x86_64-linux-gnu.so into the virtual environment's installed packages:

cp lib/python3/cv2.cpython-37m-x86_64-linux-gnu.so ~/env/lib/python3.7/site-packages/cv2.so

OpenCV Android

For Android application development we will need OpenCV for Android:

wget https://github.com/opencv/opencv/releases/download/4.3.0/opencv-4.3.0-android-sdk.zip -O opencv-4.3.0-android-sdk.zip
unzip opencv-4.3.0-android-sdk.zip
rm opencv-4.3.0-android-sdk.zip

Executing Model Conversion and Test Script

The MobileNetV2ToOnnx.py script proposed for the experiments supports the --input_image key to customize the model conversion pipeline. It defines the full input image path, including its name; the default is "test_img_cup.jpg".

To run the MobileNetV2 conversion, please choose one of the scenarios described below:

  • for the custom input image and running evaluation of the converted model:
python3 MobileNetV2Conversion.py --input_image <image_name>
  • for the default input image:
python3 MobileNetV2Conversion.py

Link: https://github.com/spmallick/learnopencv/tree/master/DNN-OpenCV-Classification-Android

#opencv  #python  #computervision #android 

Image Classification with OpenCV for Android

Contour Detection With Computer vision using OpenCV

Directory Structure

All the code files and folders follow the following structure.

├── CPP
│   ├── channel_experiments
│   │   ├── channel_experiments.cpp
│   │   └── CMakeLists.txt
│   ├── contour_approximations
│   │   ├── CMakeLists.txt
│   │   └── contour_approx.cpp
│   └── contour_extraction
│       ├── CMakeLists.txt
│       └── contour_extraction.cpp
├── input
│   ├── custom_colors.jpg
│   ├── image_1.jpg
│   └── image_2.jpg
├── python
│   ├── channel_experiments
│   │   └── channel_experiments.py
│   ├── contour_approximations
│   │   └── contour_approx.py
│   └── contour_extraction
│       └── contour_extraction.py
└── README.md

Instructions

Python

To run the code in Python, please go into the python folder and execute the Python scripts in each of the respective sub-folders.
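
The contour extraction itself comes down to two OpenCV calls, findContours and drawContours; a minimal sketch (the file name and threshold value are placeholders, not the repository's exact values):

import cv2

# Read an image, convert to grayscale and apply a binary threshold.
image = cv2.imread("input/image_1.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY)

# Extract contours with the full hierarchy and simple approximation.
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

# Draw all detected contours on a copy of the original image.
result = image.copy()
cv2.drawContours(result, contours, -1, (0, 255, 0), 2)
cv2.imshow("Contours", result)
cv2.waitKey(0)
cv2.destroyAllWindows()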

C++

To run the code in C++, please go into the CPP folder, then into each of the respective sub-folders (channel_experiments, contour_approximations and contour_extraction), and repeat the steps below for each target:

mkdir build
cd build
cmake ..
cmake --build . --config Release
cd ..
./build/channel_experiments
mkdir build
cd build
cmake ..
cmake --build . --config Release
cd ..
./build/contour_approximations
mkdir build
cd build
cmake ..
cmake --build . --config Release
cd ..
./build/contour_extraction

Link: https://github.com/spmallick/learnopencv/tree/master/Contour-Detection-using-OpenCV

#opencv  #python  #computervision 

Contour Detection With Computer vision using OpenCV

Conditional GAN in TensorFlow and PyTorch

Package Dependencies

This repository trains the Conditional GAN in both PyTorch and TensorFlow on the Fashion MNIST and Rock-Paper-Scissors datasets. It is tested with:

  • Cuda-11.1
  • Cudnn-8.0

The PyTorch and TensorFlow scripts require numpy, tensorflow and torch. To install the versions of these packages needed by the scripts, use pip (make sure pip itself is up to date: python3 -m pip install -U pip):

pip3 install -r requirements.txt 

Directory Structure

├── PyTorch
│   ├── CGAN-PyTorch.ipynb
│   └── cgan_pytorch.py
└── TensorFlow
    ├── CGAN-FashionMnist-TensorFlow.ipynb
    ├── cgan_fashionmnist_tensorflow.py
    ├── CGAN-RockPaperScissor-TensorFlow.ipynb
    └── cgan_rockpaperscissor_tensorflow.py

Instructions

PyTorch

To train the Conditional GAN with PyTorch, please go into the PyTorch folder and execute the Jupyter Notebook.

TensorFlow

To train the Conditional GAN with TensorFlow, please go into the TensorFlow folder and execute the Jupyter Notebook.

Link: https://github.com/spmallick/learnopencv/tree/master/Conditional-GAN-PyTorch-TensorFlow

#opencv  #python  #computervision 

Conditional GAN in TensorFlow and PyTorch

T-Rex Game Bot using Feature Matching in OpenCV & Python

A bot to automate the Chrome Dino game using feature matching, PyAutoGUI and MSS.

Install required packages

pip install -r requirements.txt

Execution Guideline

  • Position chrome window to the right half of the screen.
  • Open terminal/powershell on the left half as shown below.
  • Navigate to the working directory and run the script.

T-Rex Bot Demo

The obstacle detection area should look something like the image shown below. If not, adjust the box height percentage and check whether display auto scaling is ON. Further instructions are provided within the code.


 

Detection area
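
The detection step is essentially feature matching between an obstacle template and a screen grab of the detection area; a minimal sketch of that idea (template name, screen region and thresholds are illustrative, not the script's actual values):

import cv2
import numpy as np
import pyautogui
from mss import mss

# Grab the obstacle-detection area of the screen (illustrative coordinates).
region = {"top": 300, "left": 700, "width": 600, "height": 150}
with mss() as sct:
    frame = np.array(sct.grab(region))  # BGRA screenshot
gray = cv2.cvtColor(frame, cv2.COLOR_BGRA2GRAY)

# Match ORB features between an obstacle template and the captured frame.
template = cv2.imread("cactus_template.png", cv2.IMREAD_GRAYSCALE)
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(template, None)
kp2, des2 = orb.detectAndCompute(gray, None)

if des1 is not None and des2 is not None:
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    good = [m for m in matcher.match(des1, des2) if m.distance < 40]
    # If enough good matches are found, an obstacle is ahead: jump.
    if len(good) > 5:
        pyautogui.press("space")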

Link: https://github.com/spmallick/learnopencv/tree/master/Chrome-Dino-Bot-using-OpenCV-feature-matching

#opencv  #python  #computervision 

T-Rex Game Bot using Feature Matching in OpenCV & Python

Character Classification (of Synthetic Dataset) using Keras & Opencv

Character Classification (of Synthetic Dataset) using Keras (modified LeNet)

Step 1:

Download backgrounds and keep the light and dark backgrounds separate. We'll be using them for creating the synthetic dataset. We have uploaded sample backgrounds in light_backgrounds and dark_backgrounds for reference.

Step 2:

Download fonts from here. A font type will be selected at random from these fonts while creating the synthetic dataset.

Step 3:

Create synthetic data using ImageMagick. We have given an intuition behind creating synthetic data in our blog. This can be done with the following command:

python3 generate-images.py

The script first generates two directories light_background_crops and dark_background_crops containing 32x32 backgrounds crops. It then adds text and other artifacts like blur/noise/distortion to the backgrounds. To regenerate all data, delete light_background_crops and dark_background_crops. To generate training images, open the script and set OUTPUT_DIR = 'train/' and NUM_IMAGES_PER_CLASS = 800. Similarly, to generate test images, set OUTPUT_DIR = 'test/' and NUM_IMAGES_PER_CLASS = 200.

Step 4:

Train the model on the generated dataset. A modified LeNet architecture, built with Keras, is used to train the model. This can be done with the following command (a rough sketch of such a network is shown after the command):

python3 train_model.py
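
For orientation, a modified LeNet for 32x32 character crops can be written in Keras roughly as follows (a minimal sketch; the number of classes, colour channels and layer sizes are assumptions, not necessarily the values used in train_model.py):

from tensorflow.keras import layers, models

NUM_CLASSES = 36  # assumption: 10 digits + 26 letters

# LeNet-style network: two conv/pool stages followed by dense layers.
model = models.Sequential([
    layers.Conv2D(20, (5, 5), activation="relu", input_shape=(32, 32, 1)),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(50, (5, 5), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(500, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])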

Step 5:

In order to predict the digit or character in an image, execute the following command. Give the test image path as the argument.

python3 make_predictions.py <image_path>

Link: https://github.com/spmallick/learnopencv/tree/master/CharClassification#character-classification-of-synthetic-dataset-using-keras-modified-lenet

#opencv  #python  #computervision #keras 

Character Classification (of Synthetic Dataset) using Keras & Opencv