
Vivekraj V. K.

I am an Assistant Professor at the Indian Institute of Information Technology Dharwad. My PhD topic, “Autonomous Video Skimming: Taxonomy, Frameworks, and Quantitative Evaluation”, was pursued under the supervision of Prof. R. Balasubramanian (IIT Roorkee) and Prof. Debashis Sen (IIT Kharagpur). User-video summarization (skimming) generates a shorter video as a summary of a given user video so that the comprehension time is significantly reduced. User videos are unstructured videos that capture interesting events in day-to-day activities. The research culminated in two major contributions: component-ranking-based importance computation, and a new method of assessing video summaries that takes into account the differences among the various ground-truth summaries available for evaluation.

My research interests include video analysis and video processing applications. I received my B.E. (CSE) and M.Tech. (Computer Engineering) degrees from Visvesvaraya Technological University, Belgaum (Karnataka), in 2007 and 2012, respectively. Please visit my LinkedIn Profile for further details.

See also my DBLP page, Google Scholar Profile, and Scopus Author Profile.

My publications are listed below, each with its abstract.


  1. Vivekraj V. K., Debashis Sen, and Balasubramanian Raman. 2020. Vector Ordering and Regression Learning Based Ranking for Dynamic Summarization of User Videos. IET Image Processing 14, no. 15 (2020): 3941-3956. [Q2 Journal, 2020 impact factor 2.004] Dynamic video summarization (video skimming) is the process of generating a shorter video (video skim) as a summary of a given video, which helps in its easier and quicker comprehension. In this paper, an efficient dynamic summarization approach for user videos is proposed using vector ordering for ranking video units (frames/shots). User videos are casually shot, unscripted videos, where skimming involves selecting the interesting part(s) while ignoring the many uninteresting ones. The concept of R-ordering of vectors is employed to find a representative frame, which is used to perform relative ranking of the video frames. It is theoretically shown that significance is given to each element of a frame's feature vector while computing the importance scores that lead to the frame ranks used for skimming. Further, allocation of different weights to the features involved is achieved using linear and Gaussian process regressions. Through extensive experiments on several standard datasets with human-labeled ground truth, the proposed approach is demonstrated to be efficient and to perform better than the relevant state-of-the-art.

  2. Vivekraj V. K., Debashis Sen, and Balasubramanian Raman. 2019. Video Skimming: Taxonomy and Comprehensive Survey. ACM Computing Surveys 52, 5, Article 106 (September 2019), 38 pages. [Q1 Journal, 2019 impact factor 7.99] Video skimming, also known as dynamic video summarization, generates a temporally abridged version of a given video. Skimming can be achieved by identifying significant components either in uni-modal or multi-modal features extracted from the video. Being dynamic in nature, video skimming, through temporal connectivity, allows better understanding of the video from its summary. Given this obvious advantage, video skimming has recently drawn the focus of many researchers, who benefit from the easy availability of the required computing resources. In this article, we provide a comprehensive survey on video skimming, focusing on the substantial amount of literature from the past decade. We present a taxonomy of video skimming approaches and discuss their evolution, highlighting key advances. We also provide a study on the components required for evaluating the performance of video skimming.

  3. Vivekraj V. K., Debashis Sen, and R. Balasubramanian. “Vector ordering based multimodal video skimming for user videos.” In IEEE Region 10 Conference, TENCON 2017, IEEE, 2017, pp. 775-780. Video skimming is the generation of a shorter video as a summary for any given video, containing a subset of its segments that are sufficient to convey its purpose. User videos, which are often almost structureless, do not have any predefined script or events to help in summarization. Use of multiple modalities with a proper fusion strategy would be beneficial for skimming such videos. In this paper, first, r(educed)-ordering based importance ranking of video segments is performed on the audio and visual channels independently. A round-robin-based fusion scheme is then proposed for combining importance ranks generated from multiple modalities, and is applied to the importance ranks from the audio and visual channels. The fused rank is used to generate the video summary. Experimental results show that the proposed fusion scheme outperforms relevant low-level fusion and single-modality cases, when r-ordering-based and other schemes are used for importance determination in each modality.

  4. Vivekraj V. K., R. Balasubramanian, and Debashis Sen, “Vector r-ordering based selection of segments for video skimming,” in 23rd International Conference on Pattern Recognition (ICPR). IEEE, 2016, pp. 871–876. Video skimming is the process of generating a shorter yet fully comprehensible version of a given video as its dynamic summary. A generic skimming system involves dividing the video into segments and selecting segments based on their suitability. The suitability is often obtained by considering various features of the video and combining their individual contributions. Suggesting that this combination causes loss of information, we propose a collective representation of the individual contributions in the form of a vector and use vector reduced (R)-ordering to judge the suitability. R-ordering based tree-structured organization and similarity levels of the video segments are employed to determine the suitability. Comparing with user-generated summaries, we show that a video summary generated by a general skimming approach using R-ordering is more effective in covering the important parts of a given video than one generated using a feature combination.