DJ-MVP 2022

A music video (MV) is a videotaped performance of a recorded popular song, usually accompanied by dancing and visual images. In this paper, we outline the design of DJ-MVP, a generative music video system that automatically generates an audio-video mash-up for a given target audio track. The system first segments the target song based on beat detection. Next, it uses audio similarity analysis to select a matching video segment for each audio segment. The speed of each video segment is then adjusted so that its duration matches the length of its audio segment, and the segments are concatenated to form the final music video. An evaluation of our system has shown that users are receptive to this novel presentation of music videos and are interested in future developments.
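The speed adjustment comes down to one ratio per segment. The sketch below is only an illustration of that arithmetic, not the system's actual code; the function names `speed_factor` and `fit_clips` are hypothetical, and durations are assumed to be in seconds.

```python
def speed_factor(video_duration: float, audio_duration: float) -> float:
    """Playback-rate multiplier that makes a video clip last audio_duration.

    A factor above 1 speeds the clip up; a factor below 1 slows it down.
    """
    if video_duration <= 0 or audio_duration <= 0:
        raise ValueError("durations must be positive")
    return video_duration / audio_duration


def fit_clips(video_durations, audio_durations):
    """Return one speed factor per (video clip, audio segment) pair."""
    if len(video_durations) != len(audio_durations):
        raise ValueError("need exactly one video clip per audio segment")
    return [speed_factor(v, a) for v, a in zip(video_durations, audio_durations)]
```

For example, a 3-second clip matched to a 1.5-second audio segment would be played at twice its normal speed, while a 1-second clip matched to a 2-second audio segment would be played at half speed.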

Figure 1 shows the design of DJ-MVP. To build the corpus, music videos are segmented based on beat detection. At run time, the system first extracts the beat timestamps from the target audio and shifts them by a user-defined amount. The beat timestamps are then grouped into segments of specific lengths (one, two, or four beats) to vary the segment length. Audio features are extracted from these segments and normalized, and the data is aggregated. Principal Component Analysis is then performed to retain the most relevant components of the data. For each segment of the target audio, the system uses this information to find the video segment in the corpus whose features are most similar. The system also allows users to blacklist songs based on repetitiveness and to add visual effects to the video segments. Finally, the system concatenates all video segments and combines them with the target audio.
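A rough sketch of the grouping, normalization, and matching steps described above follows. It rests on several assumptions: beat timestamps and per-segment feature vectors are taken as already computed, the PCA step is left out to keep the example dependency-free, Euclidean distance stands in for whatever similarity measure the system really uses, and the per-song cap is one possible reading of the repetitiveness blacklist. All function and parameter names are hypothetical.

```python
import math


def group_beats(beats, beats_per_segment, shift=0.0):
    """Turn beat timestamps (seconds) into (start, end) segments of N beats."""
    shifted = [t + shift for t in beats]
    return [
        (shifted[i], shifted[i + beats_per_segment])
        for i in range(0, len(shifted) - beats_per_segment, beats_per_segment)
    ]


def fit_scaler(rows):
    """Per-column mean and standard deviation of a list of feature rows."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return means, stds


def scale(row, means, stds):
    """Z-score one feature row with previously fitted statistics."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def match_segments(target_rows, corpus_rows, corpus_songs, max_repeats):
    """For each target row, pick the nearest corpus row whose song is under the cap."""
    used = {}   # song name -> number of segments already taken from it
    picks = []  # chosen corpus index for each target segment
    for target in target_rows:
        best = None
        for idx, (row, song) in enumerate(zip(corpus_rows, corpus_songs)):
            if used.get(song, 0) >= max_repeats:
                continue  # this song has reached its cap
            dist = math.dist(target, row)
            if best is None or dist < best[0]:
                best = (dist, idx)
        if best is None:
            raise ValueError("no corpus segment left under max_repeats")
        picks.append(best[1])
        song = corpus_songs[best[1]]
        used[song] = used.get(song, 0) + 1
    return picks
```

In this sketch the scaler would be fitted on the corpus features and then applied to both corpus and target rows, so that distances are computed in the same normalized space.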

Papers & Posters

Fan, J., Li, W., Bizzocchi, Jim, Bizzocchi, Justine, Pasquier, P.: DJ-MVP: An Automatic Music Video Producer. In: ACE ’16: Proceedings of the Advances in Computer Entertainment Technology Conference, November 2016.

  • Audio/video alignment: Number of milliseconds to shift the video, relative to audio
  • Max repeated song: Maximum number of video segments from the same song
  • 10-second preview: Produces a 10-second video using the first 10 seconds of audio
  • Black and white: Adds a black and white effect to the video
  • Fade: Adds a fade-in/fade-out to black effect to the video
  • Mirror: Flips the video horizontally or vertically
  • Datamosh: Creates a glitch effect in the video
  • Paintify: Makes the video look like a painting, removing detail from the video frames
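
Three of the simpler effects above can be illustrated on a single frame. This is a sketch under stated assumptions rather than the system's implementation: a frame is modeled as a list of rows of (R, G, B) tuples with values from 0 to 255, the grayscale conversion uses the common ITU-R BT.601 luma weights, and the function names are hypothetical.

```python
def black_and_white(frame):
    """Replace each RGB pixel with its gray level (BT.601 luma weights)."""
    out = []
    for row in frame:
        new_row = []
        for r, g, b in row:
            gray = round(0.299 * r + 0.587 * g + 0.114 * b)
            new_row.append((gray, gray, gray))
        out.append(new_row)
    return out


def mirror(frame, horizontal=True):
    """Flip a frame left-to-right (horizontal) or top-to-bottom (vertical)."""
    if horizontal:
        return [list(reversed(row)) for row in frame]
    return [list(row) for row in reversed(frame)]


def fade(frame, level):
    """Scale a frame toward black: level 0.0 is black, 1.0 is unchanged."""
    level = min(1.0, max(0.0, level))
    return [[tuple(round(c * level) for c in px) for px in row] for row in frame]
```

A fade-in could then be produced by applying `fade` to successive frames with `level` rising from 0.0 to 1.0, and a fade-out by letting it fall back to 0.0.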

Supported Audio Formats

  • mp3
  • wav