This week was mainly focused on preparing for the final project presentation. After coming back from the pottery experience, our theme had a lot of catching up to do. Before the recess week we were still at the stage of confirming our idea, and this week we had to get both the demo and the final report done. We therefore had intense discussions on the details of our algorithm, on how best to adapt it for Alibaba's use, and on how best to fulfil the requirements of the consumer. Xu Liang and I were mainly in charge of finalizing our ideation and of coming up with a demonstration that shows what our product will look like and gives our viewers a better idea of the entire process.
First, we confirmed with each other again what our objective is for this project. We intend to provide Alibaba with automated software that performs video abstraction on their database of live streams, so that the abstracted video can be placed in the product description for consumers to view. We want our abstracted videos to improve sales on Taobao by providing consumers with suitable information. After much discussion, we decided to use 3 separate algorithms for this complicated task. The 1st is the pre-processing, the 2nd is the main abstraction algorithm and the 3rd is an optimization algorithm. After looking through many advertising live streams on Taobao, as well as the product description pages, we noticed that in many live streams one streamer advertises many different products over an average period of 5 hours. A product description page can also contain multiple products, but all of them are categorized under the same genre. Thus the pre-processing stage is necessary to split each live stream into the different categories/products that it is advertising. This makes the work of the 2nd algorithm easier and reduces the chance of the abstracted video containing content about other products.
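To make the pre-processing idea more concrete, here is a minimal Python sketch of the splitting step. It assumes that some earlier step has already labelled sampled timestamps of the live stream with the product being advertised; how those labels are obtained is not decided here, and the function name `split_by_product`, the sample data and the product names are all hypothetical.

```python
from itertools import groupby


def split_by_product(labels):
    """Group consecutive (timestamp_seconds, product_id) samples into segments.

    Returns a list of (product_id, start_seconds, end_seconds) tuples,
    one per uninterrupted run of the same product.
    """
    segments = []
    for product, run in groupby(labels, key=lambda item: item[1]):
        run = list(run)
        # First and last timestamps of the run mark the segment boundaries.
        segments.append((product, run[0][0], run[-1][0]))
    return segments


# Hypothetical labels: one sample per minute of a live stream.
samples = [
    (0, "lipstick"),
    (60, "lipstick"),
    (120, "toner"),
    (180, "toner"),
    (240, "lipstick"),
]
segments = split_by_product(samples)
```

Each segment could then be passed to the 2nd algorithm on its own, so that an abstracted video for one product does not pick up frames from another.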
Following the pre-processing stage is the main abstraction algorithm. This part took us the longest, as we had to consider what information is important enough to consumers that it will promote sales of the product. We also had to consider whether a computer can recognize this important information and, if so, what methods we would employ to perform the detection. We split this algorithm into 4 main detection criteria. We intend for the algorithm to detect important product features, important audio information that the streamer provides for viewers, the aspects that make watching live streams entertaining, and external information such as moments when many people purchase the product at the same time. These 4 detections could be performed by training CNN models, training audio-to-transcript NLP models, etc. After performing the 4 detections separately, we intend to combine them into a single rating for each frame of the video, and then use a threshold value to select all the important frames.
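The combination step described above could be sketched roughly as follows. This is only an illustration under our own assumptions: each detector is assumed to output one score between 0 and 1 per frame, the scores are combined with a weighted average, and the detector names, weights and threshold are hypothetical placeholder values.

```python
def combine_scores(detector_scores, weights):
    """Weighted average of per-frame scores from several detectors.

    detector_scores: dict mapping detector name -> list of scores, one per frame.
    weights: dict mapping detector name -> non-negative weight.
    """
    names = list(weights)
    total_weight = sum(weights.values())
    n_frames = len(detector_scores[names[0]])
    return [
        sum(weights[name] * detector_scores[name][i] for name in names) / total_weight
        for i in range(n_frames)
    ]


def select_frames(combined, threshold):
    """Return the indices of frames whose combined rating reaches the threshold."""
    return [i for i, score in enumerate(combined) if score >= threshold]


# Hypothetical scores for three frames from the 4 detectors.
scores = {
    "product_features": [1.0, 0.0, 1.0],
    "audio_information": [1.0, 0.0, 1.0],
    "entertainment": [0.0, 0.0, 1.0],
    "purchase_activity": [0.0, 0.0, 1.0],
}
# Equal weights are a placeholder; suitable values are not known yet.
weights = {name: 1.0 for name in scores}

combined = combine_scores(scores, weights)
important = select_frames(combined, threshold=0.5)
```

With equal weights every detector counts the same, which is unlikely to be the best choice in practice.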
Last is the optimization stage, where a separate algorithm tries to optimize the 2nd algorithm. As described earlier, we intend to combine the scores from 4 different detectors. However, we do not yet have suitable weights for combining them in a way that gives the best output from the 2nd algorithm. Thus our 3rd algorithm aims to optimize the weights using a neural network. The basic idea is to manually evaluate multiple randomly abstracted videos produced by our 2nd algorithm, and then train a neural network that improves after each evaluation by tweaking the weights as well as other parameters such as the length of the video, the length of each video skim, transitions, volume, etc. We therefore envision our final product as a combination of these 3 algorithms, forming a self-learning AI that improves over time.
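As a rough illustration of learning the weights from manual evaluations, here is a minimal Python sketch. Note that it is not a full neural network: it swaps in the simplest possible stand-in, a single linear layer trained with stochastic gradient descent on squared error, and it tunes only the 4 detector weights, not the other parameters such as video length or transitions. The function name `fit_weights`, the learning rate and the example data are hypothetical.

```python
def fit_weights(features, ratings, lr=0.1, epochs=500):
    """Fit one weight per detector so the weighted sum predicts the manual rating.

    features: list of per-video lists, each holding the average score of
              every detector for that abstracted video.
    ratings:  list of manual ratings, one per video.
    """
    n = len(features[0])
    weights = [1.0 / n] * n  # start from equal weights
    for _ in range(epochs):
        for x, y in zip(features, ratings):
            prediction = sum(w * xi for w, xi in zip(weights, x))
            error = prediction - y
            # Gradient step for squared error (constant factor folded into lr).
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return weights


# Hypothetical data: average detector scores of three abstracted videos
# (product features, audio, entertainment, purchases) and their manual ratings.
video_features = [
    [0.9, 0.2, 0.1, 0.4],
    [0.3, 0.8, 0.5, 0.1],
    [0.5, 0.5, 0.9, 0.7],
]
manual_ratings = [0.7, 0.5, 0.8]

learned = fit_weights(video_features, manual_ratings)
```

The learned weights could then replace the placeholder weights when the detector scores are combined, and the fit could be repeated each time new manual evaluations come in.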
After setting this clear direction for our project, Xu Liang and I worked out how we wanted to present our demo. The demo will not be very fancy, but we wanted to include more visuals so that the audience can better understand the long process described above. Thus, we used pygame as our main platform to build the demo, since it allows a certain level of interaction as well as some animation.
All in all, the week was quite fruitful: the main idea is finally settled, the other group members completed a 1st draft of our final report, and Xu Liang and I finished the 1st version of the demo.