IETE Journal of Research
ARTICLE
Year : 2011  |  Volume : 57  |  Issue : 5  |  Page : 461-466  

Performance Evaluation of Shot Boundary Detection Metrics in the Presence of Object and Camera Motion


1 Department of Electrical Engineering, Research Scholar, SPANN Lab, IIT Bombay Indian Institute of Technology, Mumbai, Maharashtra, India
2 Department of Electrical Engineering, IIT, Hyderabad, Andhra Pradesh, India

Date of Web Publication: 24-Nov-2011

Correspondence Address:
Krishna K Warhade
Department of Electrical Engineering, Research Scholar, SPANN Lab, IIT Bombay Indian Institute of Technology, Mumbai, Maharashtra
India

DOI: 10.4103/0377-2063.90172


   Abstract 

Partitioning a video into shots is an important step in video indexing. In this paper, we evaluate the performance of traditional metrics that are generally used to detect shot boundaries. The likelihood ratio and color histogram metrics were evaluated in the Red, Green, Blue (RGB) and Hue, Saturation, Value (HSV) color spaces on three different action and thriller movies. These movies contain a large number of frames with object and camera motion. The pixel difference and Chi-square metrics were tested in the luma and chrominance (YUV) color space on five different movies. The results were evaluated in terms of Recall, Precision, and F1 measure for all these movies. The results are degraded by the disturbance that motion introduces between consecutive frames: the false positives and missed detections for all the tested metrics are due to fast camera and object motion. We propose an algorithm for shot boundary detection in the presence of motion based on the dual tree complex wavelet transform. A performance comparison of the proposed algorithm with the traditional metrics validates its effectiveness in terms of improved Recall, Precision, and F1 score.

Keywords: Dual tree complex wavelet transform, Shot boundary detection, Traditional metrics, Recall, Precision


How to cite this article:
Warhade KK, Merchant SN, Desai U B. Performance Evaluation of Shot Boundary Detection Metrics in the Presence of Object and Camera Motion. IETE J Res 2011;57:461-6

How to cite this URL:
Warhade KK, Merchant SN, Desai U B. Performance Evaluation of Shot Boundary Detection Metrics in the Presence of Object and Camera Motion. IETE J Res [serial online] 2011 [cited 2013 Jun 19];57:461-6. Available from: http://www.jr.ietejournals.org/text.asp?2011/57/5/461/90172


   1. Introduction and Related Work


Recent advances in video compression standards, broadcast networks, high-speed network connections, and cable modems have made a large volume of video available online. With the introduction of high-performance, low-cost digital capturing and recording devices, the production of digital video has become accessible to the masses. Movie and TV broadcasting is also moving into the digital era. The area of content-based video retrieval aims to automate the indexing, retrieval, and management of this video data. For efficient video storage and management, video segmentation must be performed prior to all other processes. Video segmentation is a technique that divides video into physical units, generally called shots. A shot is a video segment that consists of one continuous action. Shot boundaries can be categorized into two types: abrupt transitions and gradual transitions (GT). GTs can be further classified into dissolve, wipe, fade in, and fade out.

The existing literature comparing video shot boundary detection methods is discussed below. Zhang et al. [1] used pair-wise comparison, likelihood ratio, and histogram comparison as metrics for shot boundary detection, and observed that object motion and camera motion are the major sources of false positives. Boreczky and Rowe [2] presented a comparison of several shot boundary detection and classification techniques and their variations, including histograms, edge tracking, discrete cosine transform, motion vector, and block matching methods. Lienhart [3] used color histogram differences, standard deviation of pixel intensities, and edge-based contrast as metrics to find shot boundaries and tested them on a diverse set of video sequences. Hanjalic [4] identified and analyzed in detail the major issues related to shot boundary detection. Gargi et al. [5] evaluated and characterized the performance of a number of shot detection methods using color histograms, Moving Picture Experts Group (MPEG) compression parameter information, and image block-motion matching. Ford et al. [6] reported results on various histogram test statistics, statistic-based metrics, pixel differences, MPEG metrics, and an edge-based metric. Yuan et al. [7] presented a comprehensive review of the existing approaches and identified the major challenges to shot boundary detection. They found that eliminating the disturbances caused by large object and camera movement is the major challenge for current shot boundary detection techniques. Sethi and Patel [8] also evaluated statistical tests for scene change detection.

Although object motion and camera motion have been reported as the major sources of false positives, the test video data used for comparing the different metrics did not contain a sufficient number of frames with fast camera and object motion. We have therefore evaluated the performance of the major shot boundary detection metrics specifically in the presence of camera and object motion. To compare the metrics, we used video clips in which fast camera and object motion was observed in addition to shot boundaries.

The paper is structured as follows. Section 2 discusses the major metrics used for the comparison of experimental results. Section 3 describes the test video sequences and the evaluation criterion. Section 4 presents the evaluation results of the traditional shot boundary detection metrics in the Red, Green, Blue (RGB), Hue, Saturation, Value (HSV), and luma and chrominance (YUV) color spaces. Section 5 discusses the proposed algorithm and compares its performance with the traditional metrics. Finally, Section 6 concludes the paper and discusses future work.


   2. Major Metrics Used for Shot Boundary Detection


The mathematical symbols used to describe these metrics are as follows. Let $f_i$ and $f_{i+1}$ be consecutive frames, $\mu_i$ and $\mu_{i+1}$ the mean intensity values of these frames, and $\sigma_i$ and $\sigma_{i+1}$ the standard deviations of their intensity values. $N$ is the total number of frames in a video clip, with $1 \le i \le N-1$, and $P \times Q$ is the size of the image, with $1 \le x \le P$ and $1 \le y \le Q$.

2.1 Pixel Differences

The simplest way to detect whether two images are significantly different is to count the number of pixels that have changed. A shot boundary is declared if more than a given percentage of the total number of pixels has changed. The pixel differences (denoted as PD) metric is defined as



This technique is very sensitive to camera and object motion. Zhang et al. [1] have suggested that the effect of motion can be reduced by applying a 3×3 averaging filter before the pixel-wise comparison.
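As a rough illustration of the counting rule described above, the following Python sketch marks a pixel as changed when its absolute intensity difference exceeds a per-pixel threshold, and then reports the fraction of changed pixels. The function names `pixel_difference` and `is_boundary`, the per-pixel threshold `t`, and the boundary fraction are assumptions made for illustration; they are not values or names taken from this work.

```python
def pixel_difference(frame_a, frame_b, t=20):
    """Fraction of pixels whose absolute intensity change exceeds t.

    frame_a, frame_b: equally sized 2D lists of gray levels (P rows x Q columns).
    t: per-pixel change threshold (an assumed value, not taken from the paper).
    """
    rows = len(frame_a)
    cols = len(frame_a[0])
    changed = 0
    for x in range(rows):
        for y in range(cols):
            if abs(frame_a[x][y] - frame_b[x][y]) > t:
                changed += 1
    return changed / (rows * cols)


def is_boundary(frame_a, frame_b, t=20, fraction=0.5):
    """Declare a boundary when more than `fraction` of the pixels changed."""
    return pixel_difference(frame_a, frame_b, t) > fraction


# Toy 2x2 frames: three of the four pixels change by more than t.
f1 = [[10, 10], [200, 200]]
f2 = [[10, 90], [40, 30]]
print(pixel_difference(f1, f2))  # 0.75
print(is_boundary(f1, f2))       # True
```

Because every pixel is compared at a fixed position, even a small camera pan changes many pixels at once, which is consistent with the sensitivity to motion noted above.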

2.2 Likelihood Ratio

Jain et al. [9] computed a likelihood ratio test based on the assumption of uniform second order statistics. It is a standard hypothesis test in which a ratio of probabilities is used as the test statistic. This is a statistical method which expands on the idea of pixel differences by breaking the images into regions and comparing statistical measures of the pixels in those regions. It is defined as



Where, LHR is a likelihood ratio between two consecutive regions; where



This method is reasonably tolerant of noise, but is relatively slow due to the complexity of the statistical formulas. A limitation of the likelihood ratio is that if the two images being compared have the same mean and variance but completely different probability density functions, no change will be detected.
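The limitation noted above can be seen in a small Python sketch. It assumes the form of the likelihood ratio commonly quoted in the shot detection literature, $\left[\frac{\sigma_i^2+\sigma_{i+1}^2}{2}+\left(\frac{\mu_i-\mu_{i+1}}{2}\right)^2\right]^2 / (\sigma_i^2\,\sigma_{i+1}^2)$; this form, the function name `likelihood_ratio`, and the small constant `eps` that guards against flat regions are assumptions, not details confirmed by the text.

```python
from statistics import fmean, pvariance


def likelihood_ratio(region_a, region_b, eps=1e-12):
    """Likelihood ratio between two regions given as flat lists of gray levels.

    Uses the commonly quoted second-order-statistics form (an assumption here).
    eps avoids division by zero for perfectly flat regions (also an assumption).
    """
    mu_a, mu_b = fmean(region_a), fmean(region_b)
    var_a = pvariance(region_a, mu_a)
    var_b = pvariance(region_b, mu_b)
    numerator = ((var_a + var_b) / 2 + ((mu_a - mu_b) / 2) ** 2) ** 2
    return numerator / (var_a * var_b + eps)


# Same mean and variance, different spatial arrangement: the ratio stays near 1.
same_stats = likelihood_ratio([0, 0, 10, 10], [0, 10, 0, 10])
# Same variance but a large shift in mean: the ratio grows sharply.
shifted = likelihood_ratio([0, 0, 10, 10], [100, 100, 110, 110])
print(same_stats, shifted)
```

A ratio close to 1 indicates statistically similar regions, so a boundary would be declared only when the ratio exceeds a chosen threshold.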

2.3 Histogram Difference

Histograms are the most common method used to detect shot boundaries. Histogram difference is defined by



where $H_i[j]$ and $H_{i+1}[j]$ denote the histogram values for the $i$th and $(i+1)$th frames, respectively, and $j$ is one of the $G$ possible gray levels. The histogram comparison algorithm is less sensitive to object motion than pixel differences. Two images may have similar histograms but completely different content, but such cases are rare in practice.
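A minimal Python sketch of this metric is given below. It assumes the common bin-wise absolute difference summed over all $G$ gray levels, with no normalization; the helper names `gray_histogram` and `histogram_difference` are hypothetical.

```python
def gray_histogram(frame, levels=256):
    """Count how many pixels of a 2D gray-level frame fall on each level."""
    hist = [0] * levels
    for row in frame:
        for value in row:
            hist[value] += 1
    return hist


def histogram_difference(frame_a, frame_b, levels=256):
    """Sum of absolute bin-wise differences (assumed, unnormalized form)."""
    h_a = gray_histogram(frame_a, levels)
    h_b = gray_histogram(frame_b, levels)
    return sum(abs(a - b) for a, b in zip(h_a, h_b))


# A bright object moves inside the frame: every pixel count is unchanged.
f1 = [[0, 255], [0, 0]]
f2 = [[0, 0], [255, 0]]
# The content changes: three bright pixels instead of one.
f3 = [[255, 255], [255, 0]]
print(histogram_difference(f1, f2))  # 0
print(histogram_difference(f1, f3))  # 4
```

The first comparison shows why the histogram is less sensitive to object motion than pixel differences: moving an object changes pixel positions but not the gray-level counts.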

2.4 Chi-square Test

Nagasaka and Tanaka [10] experimented with histogram and pixel difference metrics, and concluded that histogram metrics are the most effective. They obtained the best results by breaking the images into 16 regions and applying a Chi-square test to the color histograms of those regions. The Chi-square test (denoted as CS) is defined as



where $H_i[j]$ denotes the histogram value for the $i$th frame and $j$ is one of the $G$ possible gray levels.
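Several variants of the Chi-square histogram test appear in the literature. The sketch below assumes one widely used variant, in which the squared bin difference is divided by the bin value of the second histogram and empty bins are skipped. Both the variant and the function name `chi_square` are assumptions, and the 16-region partitioning mentioned above is not reproduced.

```python
def chi_square(hist_a, hist_b):
    """Chi-square distance between two histograms of equal length.

    Assumed variant: sum over bins of (a - b)^2 / b, skipping bins where b is 0.
    """
    total = 0.0
    for a, b in zip(hist_a, hist_b):
        if b != 0:
            total += (a - b) ** 2 / b
    return total


# Four-bin toy histograms of two frames with four pixels each.
h1 = [4, 0, 0, 0]
h2 = [2, 2, 0, 0]
print(chi_square(h1, h2))  # 4.0
print(chi_square(h2, h2))  # 0.0
```

Squaring the bin differences emphasizes large changes between the two histograms more strongly than the plain absolute difference does.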

2.5 Color Histogram

The color histogram comparison is calculated by comparing the histograms of each color channel of two adjacent frames and is defined as



where $H^r_i[j]$, $H^g_i[j]$, and $H^b_i[j]$ denote the histogram values for the $i$th frame in the R, G, and B color channels, respectively.
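The per-channel comparison can be sketched in Python as follows, assuming that the absolute bin-wise differences of the R, G, and B histograms are simply added together. This summation rule and the helper names `channel_histograms` and `color_histogram_difference` are assumptions made for illustration.

```python
def channel_histograms(frame, levels=256):
    """Return [H_r, H_g, H_b] for a frame given as rows of (r, g, b) tuples."""
    hists = [[0] * levels for _ in range(3)]
    for row in frame:
        for pixel in row:
            for channel in range(3):
                hists[channel][pixel[channel]] += 1
    return hists


def color_histogram_difference(frame_a, frame_b, levels=256):
    """Sum of absolute bin-wise differences over the three color channels."""
    total = 0
    pairs = zip(channel_histograms(frame_a, levels),
                channel_histograms(frame_b, levels))
    for h_a, h_b in pairs:
        total += sum(abs(a - b) for a, b in zip(h_a, h_b))
    return total


# Two-pixel toy frames: an all-red frame followed by an all-blue frame.
red = [[(255, 0, 0), (255, 0, 0)]]
blue = [[(0, 0, 255), (0, 0, 255)]]
print(color_histogram_difference(red, blue))  # 8
print(color_histogram_difference(red, red))   # 0
```

In this toy case the green channel contributes nothing, while the red and blue channels each contribute a difference of 4.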


   3. Test Video Sequence and Evaluation Criterion Used for Metric Evaluation


3.1 Test Video Sequence

The metrics were evaluated on clips from the movies X-Men (XM), Home Alone (HA), Mission Impossible 3 (MI), Jumper (JMP), Wednesday (WED), Pale Rider (PR), Bee Movie (BEE), and The Good, the Bad and the Ugly (GBU). The movies were inspected manually, frame by frame, to find the actual shot boundaries. They were chosen as test data because a large number of their frames contain object motion and camera motion. We mostly considered video clips in which fast camera and object motion is observed in addition to shot boundaries. The number of frames considered for the test video sequence from each movie is shown in [Table 1].
Table 1: Number of frames considered for analysis from each test video



3.2 Evaluation Criterion

Traditionally, Recall and Precision are the two metrics used to evaluate shot detection algorithms. Recall is defined as

$$\mathrm{Recall} = \frac{C}{C + M} = \frac{C}{D},$$

whereas Precision is defined as

$$\mathrm{Precision} = \frac{C}{C + FP},$$

where $D$ is the total number of shot boundaries in the test video sequence, $C$ is the number of shot boundaries correctly detected by the algorithm, $M$ is the number of shot boundaries missed by the algorithm, and $FP$ is the number of false positives detected by the algorithm. To rank the performance of different algorithms, the F1 measure [7], i.e., the harmonic mean of Recall and Precision, has also been used. It is defined as

$$F1 = \frac{2 \cdot \mathrm{Recall} \cdot \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}.$$
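The three measures follow directly from the counts $C$, $M$, and $FP$ defined above, as the short Python sketch below shows. The function name `evaluate`, the zero-count guards, and the example counts are hypothetical and are not results from this work.

```python
def evaluate(correct, missed, false_positives):
    """Return (recall, precision, f1) from detection counts.

    correct: boundaries correctly detected (C)
    missed: boundaries missed (M), so that D = C + M
    false_positives: detections that are not true boundaries (FP)
    """
    actual = correct + missed
    detected = correct + false_positives
    recall = correct / actual if actual else 0.0
    precision = correct / detected if detected else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0
    f1 = 2 * recall * precision / (recall + precision)
    return recall, precision, f1


# Hypothetical counts: 8 correct detections, 2 misses, 2 false positives.
recall, precision, f1 = evaluate(8, 2, 2)
print(round(recall, 3), round(precision, 3), round(f1, 3))  # 0.8 0.8 0.8
```

Since F1 is a harmonic mean, it stays low whenever either Recall or Precision is low, which makes it a convenient single figure for ranking metrics.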




   4. Evaluation Results of the Traditional Shot Boundary Detection Metrics


The performance of metrics such as pixel difference, histogram difference, likelihood ratio, color histogram, and Chi-square test in the RGB, HSV, and YUV color spaces has been compared on the same data sequence. The performance comparison between likelihood ratio and color histogram in RGB color space is shown in [Table 2]. The color histogram gave better results than the likelihood ratio in terms of F1 measure. The false positives and missed detections in both algorithms were due to fast camera and object motion.
Table 2: Performance comparison between likelihood ratio and color histogram in RGB color space



The performance comparison between likelihood ratio and histogram difference in HSV color space (using only the Hue component) is shown in [Table 3].
Table 3: Performance comparison between likelihood ratio and histogram difference in HSV color space (using only Hue component)



The likelihood ratio performs slightly better in HSV color space than in RGB color space, whereas the color histogram in RGB color space performs better than the gray scale histogram in HSV color space. Both metrics performed poorly in both color spaces for the movie "Mission Impossible" because of its large number of frames with fast camera and object motion.

We also tested the performance of Chi-square test and pixel difference method in YUV color space (using only Y component) for various movies and the results are shown in [Table 4]. The results of Chi-square were found to be better than pixel difference method in terms of F1 measure.
Table 4: Performance comparison between Chi-square and pixel differences in YUV color space (using only Y component)



Overall, none of these metrics performed well, because of the disturbances caused by fast camera and object motion. Most of the false positives and missed detections were due to large differences between consecutive frames caused by fast camera motion.


   5. Proposed Algorithm


We propose an algorithm for shot boundary detection that uses the dual tree complex wavelet transform (DT-CWT) and the spatial domain structural similarity algorithm. The discrete wavelet transform has poor directional selectivity and lacks shift invariance. The DT-CWT, developed by Kingsbury [11],[12], allows perfect reconstruction in addition to providing shift invariance and directional selectivity. We explored the DT-CWT to find shot boundaries in video. A detailed explanation of the 2D dual-tree complex wavelet transform is given in [13]. The structural features of consecutive frames in the presence of motion are obtained using the DT-CWT. The human visual system is highly adapted to extract structural information from the visual scene. Structural information comprises those attributes that represent the structure of the objects in the scene. We measure the similarity or dissimilarity between consecutive frames using the spatial domain structural similarity algorithm [14],[15]. The procedure to find shot boundaries is described below.

We have considered 119 frames from the movie X-Men to demonstrate the results, as shown in [Figure 1]. Here, the actual shot boundaries are at frame numbers 13, 25, 51, 73, 93, and 105. In this clip, fast camera and object motion is observed in almost all the frames, along with a sufficient number of shot boundaries. This clip is used to show the robustness of the proposed algorithm in the presence of motion.
Figure 1: Video clip from the movie X-Men.



Each frame was converted from RGB to HSV color space, and only the Hue (H) component was used to obtain a gray scale frame for further analysis. Every frame was decomposed into 12 oriented band-pass sub-bands using the 2D DT-CWT at the first level of decomposition. These 12 sub-bands, six real and six imaginary, carry information strongly oriented along the ±15°, ±45°, and ±75° directions. The magnitude of the corresponding real and imaginary coefficients of each sub-band was then computed, and the resulting six magnitude sub-bands were combined to form the structure feature of the image. Natural image signals are highly structured, and their pixels exhibit strong dependencies, especially when they are spatially proximate. These dependencies carry important information about the structure of the objects in the visual scene. The spatial domain structural similarity (SSIM) algorithm was proposed by Wang et al. [14]. The structural information in an image represents the structure of the objects in the scene, which is independent of average luminance and contrast.
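Two of the steps above, extracting the Hue plane and forming magnitude sub-bands from real and imaginary coefficients, can be sketched in Python as follows. The DT-CWT decomposition itself is not implemented here: the coefficient arrays are placeholder values standing in for a real transform output. The helper names and the choice to combine the six magnitude sub-bands by flattening them into one vector are assumptions, since the text does not state how the sub-bands are combined.

```python
import colorsys
import math


def hue_plane(rgb_frame):
    """Return the Hue component (0..1) of each pixel of an 8-bit RGB frame."""
    return [
        [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[0] for (r, g, b) in row]
        for row in rgb_frame
    ]


def magnitude_subbands(real_bands, imag_bands):
    """Combine each real/imaginary sub-band pair into one magnitude sub-band."""
    return [
        [[math.hypot(re, im) for re, im in zip(re_row, im_row)]
         for re_row, im_row in zip(re_band, im_band)]
        for re_band, im_band in zip(real_bands, imag_bands)
    ]


def structure_feature(magnitude_bands):
    """Flatten the magnitude sub-bands into one vector (assumed combination rule)."""
    return [value for band in magnitude_bands for row in band for value in row]


# Hue of a pure red pixel and a pure green pixel.
hues = hue_plane([[(255, 0, 0), (0, 255, 0)]])

# Placeholder coefficients standing in for a real DT-CWT output:
# six orientations, each a 1x2 sub-band with real part 3 and imaginary part 4.
real_bands = [[[3.0, 3.0]] for _ in range(6)]
imag_bands = [[[4.0, 4.0]] for _ in range(6)]
feature = structure_feature(magnitude_subbands(real_bands, imag_bands))
print(len(feature), feature[0])  # 12 5.0
```

Taking the magnitude discards the phase of each complex coefficient, which is the part that varies most under small shifts; this is one reason magnitude sub-bands are attractive when frames differ mainly by motion.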

Hence, we propose and explore SSIM as a shot boundary detection metric. The SSIM index between consecutive frames is obtained by



for $1 \le i \le N-1$, where $\mu_i$ and $\mu_{i+1}$ are the means of the structure features of the current frame and the next frame, respectively, $\sigma_i$ and $\sigma_{i+1}$ are the corresponding standard deviations, and $C_1$ and $C_2$ are small constants that avoid instability. The value of $\sigma_{i,i+1}$ is obtained by
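For reference, the symbols above match the standard single-window form of the SSIM index from Wang et al. [14], $\frac{(2\mu_i\mu_{i+1}+C_1)(2\sigma_{i,i+1}+C_2)}{(\mu_i^2+\mu_{i+1}^2+C_1)(\sigma_i^2+\sigma_{i+1}^2+C_2)}$, with $\sigma_{i,i+1}$ the sample covariance of the two structure features. The Python sketch below assumes this form, computed globally over the whole feature vector. The function name `ssim_index`, the default values of `c1` and `c2`, and the unbiased normalization of the variance and covariance are assumptions.

```python
from statistics import fmean


def ssim_index(feat_a, feat_b, c1=1e-4, c2=9e-4):
    """Global SSIM between two equally long structure-feature vectors.

    c1 and c2 are small stabilizing constants (values assumed for illustration).
    """
    n = len(feat_a)
    mu_a, mu_b = fmean(feat_a), fmean(feat_b)
    # Unbiased sample variance and covariance (normalization by n - 1).
    var_a = sum((a - mu_a) * (a - mu_a) for a in feat_a) / (n - 1)
    var_b = sum((b - mu_b) * (b - mu_b) for b in feat_b) / (n - 1)
    cov = sum((a - mu_a) * (b - mu_b) for a, b in zip(feat_a, feat_b)) / (n - 1)
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


# Identical features give an index of 1; a reversed pattern gives a low index.
a = [1.0, 2.0, 3.0, 4.0]
b = [4.0, 3.0, 2.0, 1.0]
print(ssim_index(a, a), ssim_index(a, b))
```

In this toy case the two vectors share the same mean and variance, so only the covariance term separates them, which illustrates why the index responds to structure rather than to average level or contrast.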



[Figure 2] (a) shows the SSIM index obtained after applying the DT-CWT and the SSIM algorithm to the frames from the movie X-Men shown in [Figure 1]. The maximum SSIM index value of 1 is achieved when the frames are identical, while lower values indicate dissimilarity. Post-processing and local and adaptive thresholds are then applied to the SSIM index to declare the correct shot boundaries. A detailed explanation of the post-processing method and the thresholds used can be found in [16]. The results obtained after applying post-processing and thresholds (denoted as PPSSIM) to the SSIM index, shown in [Figure 2] (b), indicate that all the shot boundaries are correctly detected, even though fast camera and object motion is present in the consecutive frames. The proposed algorithm has been tested on two movies, X-Men (XM) and Home Alone (HA). The performance comparison of the proposed algorithm with likelihood ratio and histogram difference in HSV color space (using only the Hue component) is shown in [Table 5]. The proposed algorithm performs better than the likelihood ratio and histogram difference metrics in terms of Recall, Precision, and F1 measure.
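The post-processing and thresholds actually used are those of [16] and are not reproduced here. Purely as a stand-in, the Python sketch below shows one simple kind of local adaptive threshold: a frame pair is flagged when its SSIM value falls more than `k` standard deviations below the mean of its neighbours in a sliding window. The function name `detect_boundaries`, the window size, the factor `k`, and the sample values are all hypothetical.

```python
from statistics import fmean, pstdev


def detect_boundaries(ssim, window=5, k=2.0):
    """Indices i where ssim[i] dips well below its local neighbourhood.

    ssim[i] is the similarity between frame i and frame i + 1 (0-based).
    The threshold is mean - k * std of the neighbours (a stand-in rule).
    """
    boundaries = []
    for i, value in enumerate(ssim):
        lo = max(0, i - window)
        hi = min(len(ssim), i + window + 1)
        neighbours = ssim[lo:i] + ssim[i + 1:hi]
        if len(neighbours) < 2:
            continue
        threshold = fmean(neighbours) - k * pstdev(neighbours)
        if value < threshold:
            boundaries.append(i)
    return boundaries


# Hypothetical SSIM sequence with one sharp dip at index 3.
ssim = [0.9, 0.92, 0.91, 0.2, 0.9, 0.93, 0.91, 0.9]
print(detect_boundaries(ssim))  # [3]
```

A local threshold of this kind adapts to clips where motion lowers the SSIM index everywhere, whereas a single global threshold would either miss boundaries or flag the motion itself.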
Table 5: Performance comparison between proposed algorithm, likelihood ratio, and histogram difference in HSV color space (using only Hue component)

Figure 2: (a) Shot boundaries obtained using SSIM algorithm; (b) Results obtained after applying post-processing and threshold on SSIM index.




   6. Conclusion and Future Work


Disturbances caused by fast object and camera motion are often mistaken for shot boundaries, and their elimination is the major challenge for shot boundary detection algorithms. We evaluated the performance of the major traditional shot boundary detection algorithms in the presence of motion for various color spaces. The experimental results show that the color histogram metric performed better than the likelihood ratio in RGB color space, whereas the likelihood ratio performed better than the histogram difference in HSV color space. In YUV color space, the Chi-square method performed better than the pixel difference metric. The performance of all the metrics is poor because of the disturbances caused by fast camera and object motion.

Hence, an algorithm has been proposed for shot boundary detection in the presence of motion using the DT-CWT followed by the spatial domain structural similarity algorithm and thresholds. To test the robustness of the proposed algorithm, we used video clips in which a large number of frames with fast camera and object motion is observed in addition to shot boundaries. The performance of the proposed algorithm has been tested and compared with likelihood ratio and histogram difference in HSV color space. The proposed algorithm performed better than these traditional metrics in terms of improved Recall, Precision, and F1 measure.

The possibility of using the DT-CWT to eliminate the disturbances due to illumination and fast camera motion in YUV color space can be explored further in the future. Another important area of further research is differentiating between GT and motion, as persistent slow motion may produce temporal patterns in the continuity signal curve similar to those of GT.

 
   References

1. H J Zhang, A Kankanhalli, and S Smoliar, "Automatic partitioning of full-motion video", Multimedia Systems, Vol. 1, No. 1, pp. 10-28, Jun. 1993.

2. J S Boreczky, and L A Rowe, "Comparison of video shot boundary detection techniques", Proc. SPIE Storage Retrieval Image Video Databases, Vol. 2664, No. 4, pp. 170-9, Jan. 1996.

3. R Lienhart, "Comparison of automatic shot boundary detection algorithms", Proc. SPIE Image and Video Process., Vol. 3656, No. 7, pp. 25-30, Jan. 1999.

4. A Hanjalic, "Shot boundary detection: Unraveled and resolved", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, No. 2, pp. 90-105, Feb. 2002.

5. U Gargi, R Kasturi, and S Strayer, "Performance characterization of video-shot-change detection methods", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 10, No. 1, pp. 1-13, Feb. 2000.

6. R Ford, C Robson, D Temple, and M Gerlach, "Metrics for shot boundary detection in digital video sequences", Multimedia Systems, Vol. 8, pp. 37-46, 2000.

7. J Yuan, H Wang, L Xiao, W Zheng, J Li, F Lin, et al., "A Formal Study of Shot Boundary Detection", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 17, No. 2, pp. 168-86, Feb. 2007.

8. I K Sethi, and N Patel, "A statistical approach to scene change detection", SPIE Proc. on Storage and Retrieval for Image and Video Databases III, Vol. 2420, pp. 329-38, Feb. 1995.

9. R Jain, R Kasturi, and B Schunck, "Machine Vision", New York: McGraw-Hill; pp. 406-15, 1995.

10. A Nagasaka, and Y Tanaka, "Automatic video indexing and full video search for object appearance", Visual Database Systems II, E Knuth, and L Wegner Editors., Elsevier Science Publishers, pp. 113-27, 1992.

11. N G Kingsbury, "The dual tree complex wavelet transform: A new technique for shift invariance and directional filters", In Proc. 8th IEEE DSP workshop, Utah, Aug. pp. 9-12, 1998.

12. N G Kingsbury, "Image processing with complex wavelet", Phil. Trans. Royal Society London A, Vol. 357, pp. 2543-60, Sept. 1999.

13. I W Selesnick, R G Baraniuk, and N G Kingsbury, "The dual tree complex wavelet transform: A coherent framework for multiscale signal and image processing", IEEE Signal Processing Magazine, pp. 123-51, Nov. 2005.

14. Z Wang, A C Bovik, H R Sheikh, and E P Simoncelli, "Image quality assessment: from error visibility to structural similarity", IEEE Transactions on Image Processing, Vol. 13, No. 4, pp. 600-12, Apr. 2004.

15. Z Wang, and E P Simoncelli, "Translation insensitive image similarity in complex wavelet domain", Proc. IEEE Inter. Conf. Acoustic, Speech and Signal Processing, Vol. 2, pp. 573-6, Mar. 2005.

16. K K Warhade, S N Merchant, and U B Desai, "Shot boundary detection in the presence of fire flicker and explosion using stationary wavelet transform", Signal Image and Video Processing Journal, Available from: http://www.springerlink.com/content/15t147p5l2462707/ [Last cited on 07 Aug 2010].
    

 
   Authors


Krishna K. Warhade received the Bachelor of Engineering in Electronics in 1995 and the Master of Engineering in Instrumentation in 1999, both from Shri Guru Gobind Singhaji Institute of Engineering and Technology, Nanded, and the Ph.D. in November 2010 from the Department of Electrical Engineering, Indian Institute of Technology Bombay, India. He has 16 years of experience in teaching and research. He is currently working as a Professor in the Department of Electronics Engineering, Lokmanya Tilak College of Engineering, Navi Mumbai, India. His research interests are in the areas of signal processing, image processing, video segmentation, video retrieval, and wavelets.

Shabbier N. Merchant is a Professor in the Department of Electrical Engineering, Indian Institute of Technology, Bombay. He received his B. Tech, M. Tech, and PhD degrees, all from the Department of Electrical Engineering, Indian Institute of Technology, Bombay, India. He is a Fellow of IETE (The Institution of Electronic & Telecommunication Engineers). He is a recipient of the 10th IETE Prof. S.V.C. Aiya Memorial Award for his contribution in the field of detection and tracking. He is also a recipient of the 9th IETE S.V.C. Aiya Memorial Award for Excellence in Telecom Education. He has more than 25 years of experience in teaching and research. His noteworthy contributions have been in solving state-of-the-art signal and image processing problems faced by Indian defense.

U. B. Desai received his PhD degree in Electrical Engineering from the Johns Hopkins University, Baltimore, USA, in 1979. From 1979 to 1984, he was an Assistant Professor in the Electrical Engineering Department at Washington State University, Pullman, WA, USA, and an Associate Professor at the same place from 1984 to 1987. From 1987 onwards, he was a Professor in the Electrical Engineering Department at the Indian Institute of Technology, Bombay. He is a Fellow of the Indian National Science Academy (INSA) and the Indian National Academy of Engineering (INAE). From July 2002 to June 2004, he was the Director of the HP-IITM R and D Lab at IIT-Madras. Since 2009, Prof. Desai has been the first Director of the Indian Institute of Technology, Hyderabad, India.

