Video Input with OpenCV and similarity measurement {#tutorial_video_input_psnr_ssim}
==================================================

Goal
----

Today it is common to have a digital video recording system at your disposal. Therefore, you will
eventually find yourself processing video streams rather than a batch of images.
These may be of two kinds: a real-time image feed (in the case of a webcam) or prerecorded files
stored on a hard disk drive. Luckily, OpenCV treats these two in the same manner, with the same C++
class. So here's what you'll learn in this tutorial:

- How to open and read video streams
- Two ways for checking image similarity: PSNR and SSIM

The source code
---------------

As a test case to show these off using OpenCV, I've created a small program that reads in two
video files and performs a similarity check between them. This is something you could use to check
just how well a new video compression algorithm works. Let there be a reference (original) video
like [this small Megamind clip](https://github.com/Itseez/opencv/tree/master/samples/cpp/tutorial_code/HighGUI/video-input-psnr-ssim/video/Megamind.avi) and [a compressed
version of it](https://github.com/Itseez/opencv/tree/master/samples/cpp/tutorial_code/HighGUI/video-input-psnr-ssim/video/Megamind_bugy.avi).
You may also find the source code and these video files in the
`samples/cpp/tutorial_code/HighGUI/video-input-psnr-ssim/` folder of the OpenCV source library.

@include cpp/tutorial_code/HighGUI/video-input-psnr-ssim/video-input-psnr-ssim.cpp

How to read a video stream (online-camera or offline-file)?
-----------------------------------------------------------

Essentially, all the functionality required for video manipulation is integrated in the @ref cv::VideoCapture
C++ class. This in turn builds on the FFmpeg open source library.
This is a basic
dependency of OpenCV, so you shouldn't need to worry about this. A video is composed of a succession
of images, which the literature refers to as frames. In the case of a video file there is a *frame
rate* specifying how much time passes between two frames. Video cameras usually have a limit on
how many frames they can digitize per second, but this property is less important, because at
any time the camera sees the current snapshot of the world.

The first task you need to do is to assign a source to a @ref cv::VideoCapture object. You can do
this either via the @ref cv::VideoCapture::VideoCapture constructor or its @ref cv::VideoCapture::open function. If the argument is an
integer, then you will bind the object to a camera, a device. The number passed here is the ID of the
device, assigned by the operating system. If you have a single camera attached to your system, its ID
will probably be zero, with further ones increasing from there. If the parameter passed is a
string, it refers to a video file, and the string gives the location and name of the file.
For example, for the source code above a valid command line is:
@code{.bash}
video/Megamind.avi video/Megamind_bugy.avi 35 10
@endcode
We do a similarity check. This requires a reference and a test case video file. The first two
arguments refer to these. Here we use a relative path. This means that the application will look
into its current working directory, open the video folder, and try to find
*Megamind.avi* and *Megamind_bugy.avi* inside it.
@code{.cpp}
const string sourceReference = argv[1], sourceCompareWith = argv[2];

VideoCapture captRefrnc(sourceReference);
// or
VideoCapture captUndTst;
captUndTst.open(sourceCompareWith);
@endcode
To check whether binding the object to a video source succeeded, use the @ref cv::VideoCapture::isOpened
function:
@code{.cpp}
if ( !captRefrnc.isOpened())
{
    cout << "Could not open reference " << sourceReference << endl;
    return -1;
}
@endcode
Closing the video is automatic when the object's destructor is called. However, if you want to close
it before this, you need to call its @ref cv::VideoCapture::release function. The frames of the video are just
simple images. Therefore, we just need to extract them from the @ref cv::VideoCapture object and put
them inside a *Mat* one. Video streams are sequential. You may get the frames one after another
with the @ref cv::VideoCapture::read function or the overloaded \>\> operator:
@code{.cpp}
Mat frameReference, frameUnderTest;
captRefrnc >> frameReference;       // stream-style extraction
captUndTst.read(frameUnderTest);    // equivalent member function
@endcode
The read operations above leave the *Mat* objects empty if no frame could be acquired (either
because the video stream was closed or you reached the end of the video file). We can check this with a
simple if:
@code{.cpp}
if( frameReference.empty() || frameUnderTest.empty())
{
    // exit the program
}
@endcode
A read is made of a frame grab followed by a decoding of that frame. You may call these two steps
explicitly by using the @ref cv::VideoCapture::grab and then the @ref cv::VideoCapture::retrieve functions.

Videos have a lot of information attached to them besides the content of the frames. These are
usually numbers; however, in some cases they may be short character sequences (4 bytes or less).
For this reason,
there is a general function named @ref cv::VideoCapture::get that returns these properties as double
values. Use bitwise operations to decode characters from the returned double (after converting it
to an integer), and a cast where the valid values are only integers. Its single argument is the ID of the
queried property. For example, here we get the size of the frames in the reference and test case
video files, plus the number of frames inside the reference.
@code{.cpp}
Size refS = Size((int) captRefrnc.get(CAP_PROP_FRAME_WIDTH),
                 (int) captRefrnc.get(CAP_PROP_FRAME_HEIGHT)),
     uTSi = Size((int) captUndTst.get(CAP_PROP_FRAME_WIDTH),
                 (int) captUndTst.get(CAP_PROP_FRAME_HEIGHT));

cout << "Reference frame resolution: Width=" << refS.width << "  Height=" << refS.height
     << " of nr#: " << captRefrnc.get(CAP_PROP_FRAME_COUNT) << endl;
@endcode
When you are working with videos you may often want to control these values yourself. To do this
there is a @ref cv::VideoCapture::set function. Its first argument remains the name of the property you want to
change, and there is a second of double type containing the value to be set. It will return true if
it succeeds and false otherwise. A good example of this is seeking in a video file to a given time
or frame:
@code{.cpp}
captRefrnc.set(CAP_PROP_POS_MSEC, 1200);   // go to 1.2 seconds into the video (value is in milliseconds)
captRefrnc.set(CAP_PROP_POS_FRAMES, 10);   // go to the 10th frame of the video
// now a read operation would read the frame at the set position
@endcode
For the properties you can read and change, look into the documentation of the @ref cv::VideoCapture::get and
@ref cv::VideoCapture::set functions.

Image similarity - PSNR and SSIM
--------------------------------

We want to check just how imperceptible the changes made by our video conversion are, therefore we need a
system to check the similarity or differences frame by frame. The most common algorithm used for
this is the PSNR (aka **Peak signal-to-noise ratio**).
The simplest definition of this starts out
from the *mean squared error*. Let there be two images, I1 and I2, with a two-dimensional size i and
j, composed of c channels.

\f[MSE = \frac{1}{c*i*j} \sum{(I_1-I_2)^2}\f]

Then the PSNR is expressed as:

\f[PSNR = 10 \cdot \log_{10} \left( \frac{MAX_I^2}{MSE} \right)\f]

Here \f$MAX_I\f$ is the maximum valid value for a pixel. For a simple image with one byte
per pixel per channel this is 255. When two images are the same, the MSE is zero, resulting in
an invalid divide-by-zero operation in the PSNR formula. In this case the PSNR is undefined, so
we need to handle this case separately. The transition to a logarithmic scale is made because the
pixel values have a very wide dynamic range. All this translated to OpenCV and a C++ function looks
like:
@code{.cpp}
double getPSNR(const Mat& I1, const Mat& I2)
{
    Mat s1;
    absdiff(I1, I2, s1);       // |I1 - I2|
    s1.convertTo(s1, CV_32F);  // cannot make a square on 8 bits
    s1 = s1.mul(s1);           // |I1 - I2|^2

    Scalar s = sum(s1);        // sum elements per channel

    double sse = s.val[0] + s.val[1] + s.val[2]; // sum channels

    if( sse <= 1e-10) // for small values return zero
        return 0;
    else
    {
        double mse  = sse / (double)(I1.channels() * I1.total());
        double psnr = 10.0 * log10((255 * 255) / mse);
        return psnr;
    }
}
@endcode
Typically, result values are anywhere between 30 and 50 for video compression, where higher is
better. If the images differ significantly, you'll get much lower ones, such as 15. This
similarity check is easy and fast to calculate; however, in practice it may turn out somewhat
inconsistent with human eye perception. The **structural similarity** algorithm aims to correct
this.

Describing the method in detail goes well beyond the purpose of this tutorial. For that I invite you to read
the article introducing it.
Nevertheless, you can get a good idea of it by looking at the OpenCV
implementation below.

@sa
    SSIM is described in more depth in the article: Z. Wang, A. C. Bovik, H. R. Sheikh and E. P.
    Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE
    Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.

@code{.cpp}
Scalar getMSSIM( const Mat& i1, const Mat& i2)
{
    const double C1 = 6.5025, C2 = 58.5225;
    /***************************** INITS **********************************/
    int d = CV_32F;

    Mat I1, I2;
    i1.convertTo(I1, d);           // cannot calculate on one byte large values
    i2.convertTo(I2, d);

    Mat I2_2   = I2.mul(I2);       // I2^2
    Mat I1_2   = I1.mul(I1);       // I1^2
    Mat I1_I2  = I1.mul(I2);       // I1 * I2

    /*********************** PRELIMINARY COMPUTING ******************************/

    Mat mu1, mu2;                  // local means (Gaussian-weighted)
    GaussianBlur(I1, mu1, Size(11, 11), 1.5);
    GaussianBlur(I2, mu2, Size(11, 11), 1.5);

    Mat mu1_2   = mu1.mul(mu1);
    Mat mu2_2   = mu2.mul(mu2);
    Mat mu1_mu2 = mu1.mul(mu2);

    Mat sigma1_2, sigma2_2, sigma12;

    GaussianBlur(I1_2, sigma1_2, Size(11, 11), 1.5);
    sigma1_2 -= mu1_2;

    GaussianBlur(I2_2, sigma2_2, Size(11, 11), 1.5);
    sigma2_2 -= mu2_2;

    GaussianBlur(I1_I2, sigma12, Size(11, 11), 1.5);
    sigma12 -= mu1_mu2;

    ///////////////////////////////// FORMULA ////////////////////////////////
    Mat t1, t2, t3;

    t1 = 2 * mu1_mu2 + C1;
    t2 = 2 * sigma12 + C2;
    t3 = t1.mul(t2);               // t3 = ((2*mu1_mu2 + C1).*(2*sigma12 + C2))

    t1 = mu1_2 + mu2_2 + C1;
    t2 = sigma1_2 + sigma2_2 + C2;
    t1 = t1.mul(t2);               // t1 = ((mu1_2 + mu2_2 + C1).*(sigma1_2 + sigma2_2 + C2))

    Mat ssim_map;
    divide(t3, t1, ssim_map);      // ssim_map = t3./t1;

    Scalar mssim = mean( ssim_map ); // mssim = average of ssim map
    return mssim;
}
@endcode
This will return a similarity index
for each channel of the image. This value is between zero and
one, where one corresponds to a perfect fit. Unfortunately, the many Gaussian blurs are quite
costly, so while the PSNR may work in a real-time-like environment (24 frames per second), computing
the SSIM at the same rate takes significantly more processing time.

Therefore, the source code presented at the start of the tutorial will perform the PSNR measurement
for each frame, and the SSIM only for the frames where the PSNR falls below an input value. For
visualization purposes we show both images in an OpenCV window and print the PSNR and MSSIM values to
the console. Expect to see something like:



You may observe a runtime instance of this on [YouTube here](https://www.youtube.com/watch?v=iOcNljutOgg).

\htmlonly
<div align="center">
<iframe title="Video Input with OpenCV (Plus PSNR and MSSIM)" width="560" height="349" src="http://www.youtube.com/embed/iOcNljutOgg?rel=0&loop=1" frameborder="0" allowfullscreen align="middle"></iframe>
</div>
\endhtmlonly
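
The video-reading section notes that some properties returned by @ref cv::VideoCapture::get are short
character sequences packed into a double, and that bitwise operations recover them. The following is
a minimal sketch of that decoding, written against the standard library only so it can be tried
without a video file. The helper names `packFourcc` and `fourccToString` are hypothetical, and the
byte order (first character in the lowest byte) is an assumption that matches how four-character
codec codes such as the one behind `CAP_PROP_FOURCC` are commonly laid out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: pack four characters into one integer, with the first
// character in the lowest byte (assumed layout of a four-character code).
std::uint32_t packFourcc(char c1, char c2, char c3, char c4)
{
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(c1))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c2)) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c3)) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c4)) << 24);
}

// Hypothetical helper: decode a property value that was delivered as a double
// back into its four characters.
std::string fourccToString(double propertyValue)
{
    // The value must fit into 32 unsigned bits, otherwise the cast below is undefined.
    assert(propertyValue >= 0.0 && propertyValue <= 4294967295.0);
    const std::uint32_t code = static_cast<std::uint32_t>(propertyValue);

    std::string text(4, ' ');
    for (int i = 0; i < 4; ++i)
        text[i] = static_cast<char>((code >> (8 * i)) & 0xFFu); // byte i -> character i
    return text;
}
```

With a capture object from this tutorial, a call might look like
`fourccToString(captRefrnc.get(CAP_PROP_FOURCC))`. Whether a given backend reports this property is
not guaranteed, so a result made only of zero bytes should be read as "unknown".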