Video processing is one of the most challenging areas in the world of artificial intelligence. Footage from countless surveillance cameras is processed in real time by control centers, and many companies extract valuable information from these videos. I am very interested in this topic, and I follow news and feeds in this area to stay informed about new developments. I also try to write some code as a hobby in my spare time.
For the last couple of weeks, I have been trying to create a hybrid video segmentation algorithm that uses only optical flow and the DBSCAN algorithm to find moving objects in video from a stationary camera. The algorithm first computes dense optical flow between video frames, then builds cumulative optical flow vectors from these, and finally feeds these vectors to DBSCAN. As a result, DBSCAN returns clusters of optical flow vectors. In its basic form, it looks like this:
A Basic Introduction to Optical Flow
Whenever there is an observer, that observer watches objects and sees their motion in the three-dimensional world. When we take a photo, the resulting image lies on a two-dimensional plane. So when we have a sequence of images, we can recover some motion flow from it. The projected images do not contain 3D information; they only hold a 2D projection. Because of this, the real motion may not look the same in the 2D projection. These 2D projected motions are called optical flow. The term can also be described as the distribution of the apparent velocities of objects in an image.
Optical flow and motion flow involve several important quantities: displacement, direction, speed, velocity (direction and speed together), acceleration, and time.
By processing a sequence of images over time, we can find displacements, directions, and speeds relative to the observer’s perspective.
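To make these quantities concrete, here is a minimal sketch that derives displacement, direction, and speed from a single flow vector. The function name `describe_flow` and the `fps` parameter are my own assumptions for illustration; they are not taken from the project.

```python
import math

def describe_flow(dx, dy, frames_elapsed=1, fps=30.0):
    """Describe one flow vector (dx, dy), given in pixels.

    Returns (displacement in pixels, direction in degrees, speed in pixels/second).
    `frames_elapsed` and `fps` are illustrative assumptions.
    """
    # Displacement is the length of the vector.
    displacement = math.hypot(dx, dy)
    # Direction in [0, 360). In image coordinates y grows downward,
    # so increasing angles turn clockwise on screen.
    direction = math.degrees(math.atan2(dy, dx)) % 360.0
    # Speed is displacement per unit of time.
    speed = displacement * fps / frames_elapsed
    return displacement, direction, speed

displacement, direction, speed = describe_flow(3.0, 4.0, frames_elapsed=1, fps=30.0)
print(displacement, round(direction, 2), speed)
```

All of these values are relative to the camera: the same object moving toward the camera and moving sideways can produce very different vectors.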
Optical flow algorithms fall into two main groups: sparse and dense. In sparse computation, you first pick some points to track and then follow these points across frames to calculate optical flow vectors. This saves you from tracking points that have no interesting features (what counts as interesting depends on the segmentation process, but typical examples are corner points and high- or low-intensity points), and as a result processing is faster than with dense computation. Dense computation, on the other hand, takes every pixel of the image sequence into account, so it finds a velocity vector for each pixel. This is good, because you track all of the motion. Some well-known algorithms are Horn-Schunck, Lucas-Kanade, and Farneback.
Pyramids are also an important approach in both sparse and dense algorithms. An image pyramid is built for each frame, displacements are computed at each level, and the result of each level is passed on to refine the next one; combining the levels gives a better approximation of the optical flow. In my implementation I use the dense Farneback algorithm, and the pyramid is controlled via the “levels” parameter.
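As a rough illustration of what a pyramid is, the sketch below builds one by repeatedly halving an image. It is a simplification and an assumption on my part: real implementations blur before subsampling, while this sketch just averages each 2 x 2 block. The function names are hypothetical.

```python
def downsample(image):
    """Halve width and height by averaging each 2x2 block.

    This is a stand-in for the blur-and-subsample step of a real pyramid.
    """
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]

def build_pyramid(image, levels):
    """Return [full resolution, half, quarter, ...] with `levels` entries."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

# An 8x8 test image whose pixel values simply count upward.
frame = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
pyramid = build_pyramid(frame, levels=3)
sizes = [(len(level), len(level[0])) for level in pyramid]
print(sizes)
```

The point of the coarse levels is that a large motion becomes small there: a displacement of 8 pixels at full resolution is only 2 pixels two levels up, which is much easier to estimate and can then be refined on the finer levels.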
While talking about optical flow, I cannot resist adding this quote:
Even for a fixed image, there may be more than one “best” segmentation because the criteria defining the quality of a segmentation are application dependent. (Pierre Soille)
The algorithms I used also come with their own parameters and assumptions. For instance, optical flow assumes that:
- Optical flows are temporally consistent
- The intensity of an object does not change between consecutive frames
The first assumption says that a flow (a moving object) moves gradually to another position over time. So a scene change or a background change between frames makes the flow vectors completely wrong. The second assumption says that when an object moves, its intensity pattern moves with it as a whole. Because of this, an object’s flow vectors can be estimated more accurately and stay consistent with one another. As a result, we can conclude that the object is moving from one place to another in the direction of its flow vectors. But this assumption has its own problems, such as changes in daylight and uneven sunlight on different parts of an object. This is also called illumination difference.
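These two assumptions are usually written as the brightness constancy equation and its linearized form. The notation below is the standard textbook one, not something taken from my implementation: I is the image intensity, (u, v) is the flow vector at a pixel, and the linearization only holds when the displacement between frames is small.

```latex
\begin{align}
I(x, y, t) &= I(x + \Delta x,\; y + \Delta y,\; t + \Delta t) \\
\frac{\partial I}{\partial x}\, u + \frac{\partial I}{\partial y}\, v + \frac{\partial I}{\partial t} &= 0
\end{align}
```

The first line is the intensity assumption; the step to the second line relies on small motion between frames, which is where temporal consistency comes in. An illumination change breaks the first line directly, which is why it hurts the flow estimate.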
Another problem with processing video based on optical flow is that the video itself may not be steady. In that case even a very small shake creates a lot of noisy optical flow. Camera stabilization is another problem in the computer vision world, and one of its solutions is optical flow tracking. And if the video does not have enough resolution (which is very relative), you may suffer from a lack of quality.
For better clustering with DBSCAN, I wanted to feed some optical flow information into the algorithm, because I wanted the two algorithms to talk to each other in some way and form a hybrid algorithm. Just taking the results of one and passing them to the other is not a good solution; parameters must be passed along to the next one as well.
In the computer vision world, problems do not end; they only evolve into harder ones. This is what I see.
In light of these problems, I decided to create some global parameters. This is not ideal either, and I must admit these are hacks to make the implementation work. Looking at the context, we need to keep parameters as local as possible. Global parameters affect the whole; even if you tune them, they will still be problematic for some other part of the data.
These parameters are:
- global vector threshold
- orientation angle
- window size
The global vector threshold is used to remove optical flow vectors after computing dense optical flow with the Farneback algorithm. If the camera is not stationary, as I said, you need a threshold value to clean up the noise in the flows.
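A minimal sketch of such a filter is shown below. It assumes the threshold is applied to the vector length and that shorter vectors are the ones dropped; the `(x, y, dx, dy)` tuple layout and the name `filter_by_magnitude` are hypothetical and chosen only for this example.

```python
import math

def filter_by_magnitude(vectors, threshold):
    """Keep only flow vectors whose length is at least `threshold` pixels.

    Each vector is (x, y, dx, dy): a position and its flow.
    Assumption: small vectors are treated as noise from camera shake.
    """
    return [
        (x, y, dx, dy)
        for (x, y, dx, dy) in vectors
        if math.hypot(dx, dy) >= threshold
    ]

flows = [
    (10, 10, 0.2, -0.1),  # tiny jitter
    (20, 10, 3.0, 4.0),   # real motion, length 5
    (30, 10, 0.0, 0.9),   # just below the threshold
    (40, 10, -2.0, 0.0),  # real motion, length 2
]
kept = filter_by_magnitude(flows, threshold=1.0)
print(kept)
```

Because the threshold is global, a value that removes shake in one part of the frame can also remove a slow but real object elsewhere, which is the weakness mentioned above.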
By using the orientation angle, we can inject one more parameter into the DBSCAN metric function. This way, I checked not only the epsilon parameter but also the orientation angle difference between clustered points. For more information about the DBSCAN algorithm, you can check my other post here.
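One way to picture this is a DBSCAN whose neighbor test has two conditions: the points must be within epsilon of each other, and their flow directions must differ by no more than the orientation angle. The sketch below is a plain textbook DBSCAN with that extra condition; it is not the code from the project, and the names `are_neighbors`, `max_angle`, and the `(x, y, dx, dy)` layout are assumptions.

```python
import math

def angle_difference(a, b):
    """Smallest absolute difference between two angles in degrees (0..180)."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)

def are_neighbors(p, q, eps, max_angle):
    """p and q are (x, y, dx, dy): close in space AND similar flow direction."""
    if math.hypot(p[0] - q[0], p[1] - q[1]) > eps:
        return False
    angle_p = math.degrees(math.atan2(p[3], p[2]))
    angle_q = math.degrees(math.atan2(q[3], q[2]))
    return angle_difference(angle_p, angle_q) <= max_angle

def dbscan(points, eps, min_pts, max_angle):
    """Plain DBSCAN; returns one label per point, -1 means noise."""
    noise = -1
    labels = [None] * len(points)  # None means "not visited yet"

    def region(i):
        return [j for j in range(len(points))
                if are_neighbors(points[i], points[j], eps, max_angle)]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels

# Two rows of points right next to each other, moving in opposite directions.
points = [
    (0, 0, 1, 0), (1, 0, 1, 0), (2, 0, 1, 0),     # moving right
    (0, 1, -1, 0), (1, 1, -1, 0), (2, 1, -1, 0),  # moving left
]
labels = dbscan(points, eps=1.5, min_pts=2, max_angle=30.0)
print(labels)
```

With the angle check the two rows end up in separate clusters even though they touch; with `max_angle=180.0` the check is effectively disabled and they merge into one.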
With the window size, we can lay small windows over the dense velocity vectors that come from the Farneback algorithm, then take the windows’ corner points as references for the velocity vectors in each window. For example, if we use a 10 x 10 pixel window and our velocity image is 300 x 300 pixels, then we have (300/10) x (300/10) = 30 x 30 = 900 velocity vectors. This sounds a bit wrong, taking dense flows and making them sparse again, but the dense flows are already smoothed with a Gaussian filter (algorithm paper, page 6, experimental results), so the distribution of the vectors after windowing stays close to that of the dense flows.
Besides these parameters, there are of course parameters that belong to the optical flow and DBSCAN algorithms themselves. I already covered the details of DBSCAN in my other post. The Farneback parameters are documented well enough in the OpenCV documentation. Even though I talk about them here, you need to know something about the algorithm and its implementation. Here are the OpenCV documentation and the code implementation.
Lastly, we need a frame interval value to create larger optical flows and to process long image sequences faster. I generally used a frame interval of 3 or 5; it definitely depends on the video.
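The windowing step can be sketched as follows. I assume here that the reference point is the top-left corner of each window and that the flow field is stored as `flow[y][x] = (dx, dy)`; both are illustrative choices, and `sample_window_corners` is a hypothetical name.

```python
def sample_window_corners(flow, window_size):
    """Take one flow vector per window, read at the window's top-left corner.

    `flow[y][x]` is (dx, dy). Returns a list of (x, y, dx, dy).
    """
    height, width = len(flow), len(flow[0])
    return [
        (x, y, flow[y][x][0], flow[y][x][1])
        for y in range(0, height, window_size)
        for x in range(0, width, window_size)
    ]

# A 300 x 300 flow field where every pixel moves one pixel to the right.
flow = [[(1.0, 0.0) for _ in range(300)] for _ in range(300)]
sampled = sample_window_corners(flow, window_size=10)
print(len(sampled))
```

For a 300 x 300 field and a window size of 10 this yields the 900 vectors from the example above, which is a far smaller input for DBSCAN than 90,000 pixels.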
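One possible reading of the frame interval, sketched below, is to sum the per-frame flow fields over each group of `frame_interval` consecutive frames to get one cumulative field per group. This is an assumption made for illustration; the function names are hypothetical, and a simple per-pixel sum ignores the fact that objects move away from their starting pixel.

```python
def accumulate_flows(flows):
    """Element-wise sum of flow fields of equal size; flow[y][x] is (dx, dy)."""
    height, width = len(flows[0]), len(flows[0][0])
    total = [[(0.0, 0.0)] * width for _ in range(height)]
    for flow in flows:
        total = [
            [
                (total[y][x][0] + flow[y][x][0], total[y][x][1] + flow[y][x][1])
                for x in range(width)
            ]
            for y in range(height)
        ]
    return total

def cumulative_flows(per_frame_flows, frame_interval):
    """Group consecutive flow fields and sum each full group."""
    last_start = len(per_frame_flows) - frame_interval
    return [
        accumulate_flows(per_frame_flows[i:i + frame_interval])
        for i in range(0, last_start + 1, frame_interval)
    ]

# Six tiny 2x2 flow fields, each moving (1.0, 0.5) pixels per frame.
per_frame = [[[(1.0, 0.5)] * 2 for _ in range(2)] for _ in range(6)]
cumulative = cumulative_flows(per_frame, frame_interval=3)
print(len(cumulative), cumulative[0][0][0])
```

Larger cumulative vectors stand out more clearly from noise, and clustering runs once per group instead of once per frame, which is where the speedup comes from.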
Conclusion and Source Code
In many computer vision fields you may not find any ground truth. Ground truth is essential for artificial intelligence systems, because a system trains on truth data and then produces information based on it. But in video segmentation and computer vision, sometimes we cannot create truth data, because it always depends on user requirements and specifications. I have tried to explain some basic concepts of optical flow and my hybrid algorithm. Without a doubt, this is an experimental project, and it may not suit every type of video (different weather conditions, very fast objects, night time, etc.). To run the program, you need the .NET Framework and EmguCV, a .NET wrapper for OpenCV. Instead of sharing only the algorithm and implementation, I created a form interface so you can play with the parameters and see their effects on the video dynamically. Its buttons may throw exceptions, though; I did not check every feature of the form interface. I also shared the project on GitHub. You can see the details there and share your thoughts with me.
GitHub Project link: https://github.com/yusufuzun/of-dbscan-object-detection