|Year : 2012 | Volume
| Issue : 2 | Page : 171-175
Enhanced Motion Estimation using Kalman Filter
Manish Bajaj, Brejesh Lall
Department of Electrical Engineering, Indian Institute of Technology, Delhi, India
|Date of Web Publication||16-May-2012|
Department of Electrical Engineering, Indian Institute of Technology, Delhi
| Abstract|| |
This document proposes a new algorithm for Motion Estimation using the Kalman Filter. The algorithm predicts the motion vectors for the next frame by using the motion vectors of the current frame. The new algorithm is compared with the already existing search algorithms such as Exhaustive search, Three-Step Search, and Four-Step Search and show that the prediction using the Kalman Filter leads to reduction in the computational complexity.
Keywords: Block-matching algorithm, Full search or exhaustive search, Four-step search, Kalman filter, Three-step search, Video coding, Video compression
|How to cite this article:|
Bajaj M, Lall B. Enhanced Motion Estimation using Kalman Filter. IETE J Res 2012;58:171-5
| 1. Introduction|| |
The limited channel bandwidth and requirements of real-time video playback make video coding indispensable for modern day applications. Real world videos typically have high correlation in consecutive frames. So, for compressing a video, we can utilize this property and remove the redundant data. At the same time, motion is integral to any video. This gives us the motivation for Motion Estimation, i.e., to identify the temporally correlated data in the consecutive frames and eliminate it. The basic point is to encode a frame relative to previous frames so that frame can be easily reconstructed using the encoded data. The common and effective method to reduce the temporal redundancy is block-matching motion estimation (BMME), which has been widely adopted in various video coding standards such as H.261, H.263, MPEG-1, MPEG-2, and MPEG-4 and in motion-compensated video coding techniques. There has been a lot of research in this field. The aim of the research is to find an algorithm to maximize compression and minimize the computation complexity. Full Search or Exhaustive Search (ES) searches for the best match for a macroblock over the search window to give global optimal solution, i.e., minimum error to the motion estimation. But this algorithm demands high computational load. For dealing this problem, many fast block matching algorithms have been developed such as three-step search (3SS), four-step search (4SS), diamond search, etc. These algorithms reduce the computations drastically for finding the optimum motion vector ,, .
In this paper, we propose a simple, efficient, computationally less intensive BMME algorithm by using Kalman filter to predict the motion vectors corresponding to the current frame. The proposed algorithm is called as Kalman Filter Search (KFS).
The paper is organized as follows: first of all, the brief introduction of Kalman Filter is given. In section 3, the proposed algorithm using the Kalman Filter is explained. Section 4 shows the results of KFS. Finally, the conclusions are explained in Section 5.
| 2. Kalman Filter|| |
The Kalman filter .. is a mathematical method named after Rudolf E. Kalman. The method aims at producing values closer to the true values of measurements and their associated calculated values by using the noisy observed values. The filter predicts the values by estimating the uncertainty of the predicted value and computing weighted average of the predicted value and measured value.
The Kalman Filter moves toward certainty by weighted averaging of the system's state with a new measurement. The purpose of the weights is that values with smaller uncertainty are given more weightage. These weights are calculated from the covariance, a measure of the estimated uncertainty of the prediction of the system's state. The result is obviously having less uncertainty. This process is repeated every time step, with the new estimate and its covariance are used in predicting the following iteration. The Kalman filter consists of two phases, predict and update phase. The predict phase predicts the state vector by using the characteristics of system and the previous behavior of the system. Update phase updates the state vector and covariance of the state vector estimate. The basic equations of the Kalman Filter are as follows:
The "state transition matrix" A k is premultiplied with the state vector x0ˆk-1 in each time step. There is optionally (if nonzero) an input vector u k which affects the state linearly, and this linear effect on the state is represented by premultiplying by the "input matrix" B k, w k is the gaussian process noise. The observation vector zˆk is a linear function of the state vector, and this linear relationship is represented by premultiplication by "observation matrix" H k. There is also gaussian measurement noise v k. Where, w k ~ N(0,Q) Meaning w k is Gaussian noise with covariance.
v k ~ N(0,Q) Meaning v k is Gaussian noise with covariance R. Other equations of the two phases will be shown in the algorithm itself.
| 3. Kalman Filter Search Algorithm|| |
The proposed algorithm is also based on BMME. KFS algorithm uses the Kalman filter to predict the motion vectors of the current frame using the motion vectors of the previous frame. The Kalman filter needs the initial input matrices of the system to correctly predict the system's next state. So, the KFS algorithm is initiated using the motion vectors of first three frames. The prediction starts from the fourth frame. In this frame, we are not using any other search algorithm such as 3SS; we do not have the exact motion vector for this frame. For improving the prediction, we need to introduce a P frame at certain intervals to get the actual measured motion vectors. This introduction of the P frame increases the number of computations but it also increases the accuracy as we now get the motion vectors by actually finding the best match rather than prediction. This results in an increase in Kalman filter gain. While starting with the video or with the first frame, the Block Size is kept four and the search area is eight blocks around it as in [Figure 1].
The first frame of the video is intra-coded. From the second frame and third frame, we use the 3SS to find the best match in the previous frame. The motion vectors for each macroblock block in the frame are stored in a matrix. Now, we have the motion vectors and by using them, we can easily calculate the affine transform for each block. The affine transform which is a 3 × 3 matrix is also stored for each macro block. So, the Kalman filter equations are as shown in (1) and (2).
Here, xˆk is state vector estimate. In the input struct, this is the "a priori" state estimate (prior to the addition of the information from the new observation). In the output struct, this is the "a posteriori" state estimate (after the new measurement information is included). Here, this xˆk represents the parameters to affine transform, as with each frame this is going to be predicted and updated. We have not taken position instead of affine transform as 'xˆk' because in consecutive frames, the behavior of motion of a particular block will change less as compared with position. The way it is going to move, added with the previous position, will give us the new position.
Prediction for state vector and covariance:
Z/k= calculated affine transform after using the 3SS to get the best measurement
uk = input control is zero
A k = State transition matrix is defaulted to identity
B k = input matrix is zero
P k = covariance of the state vector estimate. In the input struct, this is "a priori," and in the output it is "a posteriori." It is initialized to identity matrix
Q = Process noise covariance is some factor (value around ~ 0.2) of identity matrix
R = Measurement noise covariance is some factor (value around ~ 0.2) of identity matrix
H k = Observation Matrix is Identity matrix.
Here, we are using Kalman filter by updating it at certain intervals using the new observational values, by introducing a P frame, and refining its prediction. It predicts affine transforms, i.e., the next motion trajectory for the block.
Now, with the second and third frames, the minimum difference block or the closest match block is found in the search area. The previously stored affine transform is updated for that block using the Kalman filter's update equation. In further frames, the minimum difference block in the search area is not needed to be computed as those can be predicted by Kalman filter. For a particular frame, we start from the upper left corner (block size is four). The last updated affine transform corresponds to the motion of the block in the next frame, so we take the inverse of the affine transform for each block to get the coordinates of the new position of the same block. Obviously, the coordinates of the predicted macroblock will not make a perfect rectangle, so the new coordinates are to be quantized to the nearest macroblock depending upon the extent of the algorithm whether it is till pixel, half-pixel, or quarter-pixel. This is done for all the macroblocks to get a new predicted frame, hence reducing the number of computations.
The KFS algorithm is summarized as follows:
- The first frame is intra-coded, and 3SS is used in second frame to compute the motion vectors of the macroblocks. The affine transform of the macroblocks is also found out and stored in matrix. Each element of the matrix corresponds to 3×3 matrix
- These affine transforms are initial values of the xˆk. The Kalman filter is initialized with Process covariance as some factor (less value around ~ 0.2) of the identity matrix (3×3). Measurement noise is also kept the same (can be different). Their values can be altered to see the effect on results
- The 3SS is used to get motion vectors for next frame also and then find their affine transforms. The Kalman Filter update equations are used to update its xˆk and P k
- Now, the Kalman Filter's prediction equations are used to get the affine transforms for the next frame and then using them to get the motion vectors of the same. For the update equations, as we will take measurements after 2-3 frames (using 3SS), we can choose zˆk as xˆk/k-1 . To increase the accuracy of Kalman Filter by introducing the zˆk, go to Step 3; otherwise, recursively repeat this step.
As we know there is a tradeoff between the number of computations and the quality of compression, so here we want to reduce the number of computations significantly by compromising little bit on quality. To check the quality of reconstruction for inter frame coding, Peak Signal to Noise ratio (PSNR) is found out between the original frame and the inter-coded frame, so better the SNR better the coding, and lesser the residue.
The results also have the comparison between the PSNR and the computations of the other algorithms such as ES, 3SS, and 4SS.
| 4. Results|| |
The results are shown using the tables comparing PSNR and computations needed, with respect to each frame for each one of the algorithm used. [Figure 2],[Figure 3],[Figure 4] and [Figure 5] show the comparison between the number of computations and PSNR for each algorithm. Here, PSNR corresponds to quality of compression, i.e., greater the PSNR, better the compression.
The standard video sequence is used: "Football" (This sequence is a part of sports telecast. The sequence has a high contrast level, strong motion, and rich colors.)
| 5. Conclusions|| |
As seen in the results, ES is the computationally most expensive search. But, it gives the best possible PSNR one can have in the comparison of the two frames.
[Table 1] compares the results of various algorithms and proves how the Kalman Filter Search reduces the computations while not reducing PSNR much. 3SS and 4SS are better than ES as the number of computations is about one-tenth, but PSNR is only two to five percent less than that of ES. Kalman filter Computations are very less than ES and one-fourth of 3SS and one-fifth of the 4SS and PSNR is reduced to approximately 87% of 3SS and 4SS and 73.8% of the ES. It can be seen that in 3.9 computations, we have reached to 15.7 PSNR. The future work can be to devise an algorithm which can be used with KFS algorithm to increase PSNR by 3 and computations not more than 12, as then it will be very useful. There is a significant decrease in the number of computations but relatively a small drop in PSNR.
| References|| |
|1.||EIain, and G Richardson, "H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia", Hoboken, New Jersey:John Wiley & Sons, Ltd,2003. |
|2.||RC Gonzalez, and RE Woods, "Digital Image Processing", 2 nd ed., Upper Saddle River, New Jersey: Prentice Hall;2002. |
|3.||K Jack, "Video Demystified", 5th ed., Netherlands: Elsevier Inc;2007. |
|4.||W Greg, and B Gary, "An Introduction to the Kalman Filter", Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 2006. p. 27599-3175. |
|5.||B Gary, and K Adrian, "Learning Open CV," 1005 Gravenstein Highway North, Sebastopol, Sebastopol, California: O'Reilly Media, Inc;2008. |
|6.||The WIKIPEDIA website. [Online]. Available from: http://www.wikipedia.org/[Last cited on 2011 Sep 28]. |
| Authors|| |
Manish Bajaj is Senior Undergraduate Student in Electrical Engineering Department, Indian Institute of Technology Delhi. He will be receiving his B. Tech in August, 2012. His research interests include Image and Video Processing, and Wireless communication.
Brejesh Lall received B.E. degree in electronics and communication engineering from Delhi College of Engineering, Delhi, India in 1991, M.E. in electronics and communication engineering also from Delhi College of Engineering, Delhi, India in 1992 and PhD degree from Indian Institute of Technology Delhi, India in 1999. He joined Hughes Software Systems, Gurgaon, India in 1997 and worked there in the Signal Processing group till 2005. In 2005 he joined Indian Institute of Technology Delhi as an Assistant Professor in the Electrical Engineering Department and is currently working there as an Associate Professor. His research interests include Multirate Signal Processing, Image Processing and Wireless communication.
[Figure 1], [Figure 2], [Figure 3], [Figure 4], [Figure 5]