You’ve probably encountered presentation-style videos that combine slides, figures, tables, and oral explanations. These videos have become a widely used means of disseminating information, especially since the COVID-19 pandemic, when containment measures were in place. Although videos are an engaging way to access content, they have significant drawbacks: finding specific information requires watching the entire video, and their large file sizes consume considerable storage space.
Researchers led by Professor Hyuk-Yoon Kwon of Seoul National University of Science and Technology in South Korea sought to address these issues with PV2DOC, a software tool that converts presentation videos into summary documents. Unlike other video-summarization methods, which require a transcript alongside the video and become ineffective when only the video is available, PV2DOC overcomes this limitation by combining the visual and audio data within the video itself to produce a document.
The article was made available online on October 11, 2024, and was published in volume 28 of the journal SoftwareX on December 1, 2024.
“For users who need to watch and study numerous videos, such as lectures or conference presentations, PV2DOC generates summary reports that can be read within two minutes. Additionally, PV2DOC manages figures and tables separately, connecting them to the summarized content so that users can refer to them when needed,” explains Professor Kwon.
For image processing, PV2DOC extracts frames from the video at one-second intervals. Using a measure called the structural similarity index (SSIM), it compares each frame with the previous one to identify unique images. Objects in each image, such as figures, tables, graphs, and equations, are then detected with the object detection models Mask R-CNN and YOLOv5. During this process, a single figure may be detected as several fragments because of whitespace or subfigures. To solve this problem, PV2DOC uses a figure-fusion technique that identifies overlapping regions and merges them into a single figure. The system then applies optical character recognition (OCR) with the Google Tesseract engine to extract text from the images, and the extracted text is organized into a structured format of headings and paragraphs.
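To make these steps concrete, here is a minimal Python sketch of the frame deduplication and figure fusion just described, using OpenCV, scikit-image, and pytesseract as plausible stand-ins. The one-per-second sampling follows the description above, but the 0.9 threshold, the function names, and the (x1, y1, x2, y2) box format are illustrative assumptions, not details taken from PV2DOC’s source code.

```python
# Minimal sketch of the frame-deduplication and figure-fusion steps.
import cv2                      # video decoding and color conversion
import pytesseract              # Python wrapper for Google Tesseract OCR
from skimage.metrics import structural_similarity as ssim

def extract_unique_frames(video_path, ssim_threshold=0.9):
    # Sample one frame per second; keep a frame only when its SSIM against
    # the last kept frame drops below the (assumed) threshold.
    cap = cv2.VideoCapture(video_path)
    fps = max(int(cap.get(cv2.CAP_PROP_FPS)), 1)
    unique, prev_gray, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % fps == 0:  # roughly one-second intervals
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if prev_gray is None or ssim(prev_gray, gray) < ssim_threshold:
                unique.append(frame)        # slide content changed: keep it
                prev_gray = gray
        idx += 1
    cap.release()
    return unique

def fuse_overlapping_boxes(boxes):
    # Figure fusion: merge any pair of overlapping (x1, y1, x2, y2) boxes
    # until none overlap, so fragments of one figure form a single box.
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                x1, y1, x2, y2 = boxes[i]
                a1, b1, a2, b2 = boxes[j]
                if x1 < a2 and a1 < x2 and y1 < b2 and b1 < y2:
                    boxes[i] = (min(x1, a1), min(y1, b1),
                                max(x2, a2), max(y2, b2))
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes

def ocr_frame(frame):
    # Extract on-slide text with the Google Tesseract engine.
    return pytesseract.image_to_string(frame)
```

In PV2DOC itself, the boxes come from the Mask R-CNN and YOLOv5 detectors; the fusion function above only illustrates the overlap-merging idea.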
Simultaneously, PV2DOC extracts the audio from the video and uses the Whisper model, an open-source speech-to-text (STT) tool, to convert it into written text. The transcript is then condensed with the TextRank algorithm, producing a summary of the main points. Finally, the extracted images and summarized text are combined into a Markdown document, which can be converted into a PDF file. The final document presents the content of the video (text, figures, and formulas) in a clear, organized manner that follows the structure of the original video.
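The audio branch can be sketched in a similar spirit. The snippet below assumes the openai-whisper package for transcription, the summa package as one off-the-shelf TextRank implementation, and pypandoc (which requires pandoc and a LaTeX engine on the system) for the Markdown-to-PDF conversion; these are plausible stand-ins, not necessarily the exact libraries PV2DOC uses, and the file names are illustrative.

```python
# A sketch of the audio-to-summary branch under the assumptions above.
import whisper                           # openai-whisper: open-source STT
from summa.summarizer import summarize   # one TextRank implementation
import pypandoc

def summarize_speech(video_path, ratio=0.2):
    # Whisper reads the file (audio or video, via ffmpeg) and transcribes it.
    model = whisper.load_model("base")
    transcript = model.transcribe(video_path)["text"]
    # TextRank keeps roughly the top `ratio` of sentences by importance.
    return summarize(transcript, ratio=ratio)

def build_document(summary, figure_paths, out_pdf="summary.pdf"):
    # Assemble a Markdown document, then convert it to PDF.
    parts = ["# Presentation summary", "", summary, ""]
    for i, path in enumerate(figure_paths, start=1):
        parts.append(f"![Figure {i}]({path})")  # link figures to the summary
    pypandoc.convert_text("\n".join(parts), "pdf", format="md",
                          outputfile=out_pdf)
```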
By converting unstructured video data into structured, searchable documents, PV2DOC improves the accessibility of video content and reduces the storage needed to share and archive it. “This software simplifies data storage and facilitates data analysis for presentation videos by transforming unstructured data into a structured format, offering significant potential from the perspectives of information accessibility and data management. It provides a foundation for making more effective use of presentation videos,” said Professor Kwon.
The researchers plan to further streamline video content into accessible formats. Their next goal is to train a large language model (LLM), similar to ChatGPT, to offer a question-and-answer service in which users ask questions about the content of a video and the model generates precise, contextually relevant answers.
***
Reference
DOI: 10.1016/j.softx.2024.101922
About Seoul National University of Science and Technology (SEOULTECH)
Seoul National University of Science and Technology, commonly known as “SEOULTECH”, is a national university located in Nowon-gu, Seoul, South Korea. Founded in April 1910, around the time of the establishment of the Republic of Korea, SEOULTECH has grown into a large, comprehensive university with a campus of 504,922 m².
It comprises 10 undergraduate schools, 35 departments, and 6 graduate schools, and enrolls approximately 14,595 students.
Website: https://en.seoultech.ac.kr/
About Associate Professor Hyuk-Yoon Kwon
Professor Kwon is currently an Associate Professor in the ITM Division, Department of Industrial Engineering/Graduate School of Data Science, Seoul National University of Science and Technology, Seoul, South Korea, where he leads the Big Data-driven AI Laboratory (https://bigdata.seoultech.ac.kr). Before that, he worked at the Ministry of National Defense as a researcher from 2014 to 2018 and at KAIST as a postdoctoral researcher from 2013 to 2014. He received his Ph.D. in computer science from KAIST in 2013. He was a visiting scholar at the Georgia Institute of Technology from 2024 to 2025 and a research intern at Microsoft Research Asia from 2011 to 2012. His research interests include data-driven AI/ML, Big Data Management, Distributed and Cloud Computing, Federated and Distributed Learning, Databases, Data-Centric Cybersecurity, and Fair Data Retrieval and Analysis. He has presented and published at leading conferences and journals in the areas of databases, big data, artificial intelligence, and data mining, including ACM SIGMOD, NeurIPS, AAAI, IEEE ICDM, IEEE BigData, IEEE TKDE, and IEEE TII (https://scholar.google.co.kr/citations?user=INJzI3IAAAAJ).
Research method
Data analysis/statistics
Research subject
Not applicable
Article title
PV2DOC: Converting the presentation video into the summarized document
Article publication date
December 1, 2024
Conflict of interest declaration
Hyuk-Yoon Kwon reports that financial support was provided by Seoul National University of Science and Technology. The authors declare that they have no other known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.
Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of press releases published on EurekAlert! by contributing institutions or for the use of any information via the EurekAlert system.