Corpora

ADC2004
type: Audio
size: 20 excerpts
metadata: predominant pitch

AMG1608
AMG1608 is a dataset for music emotion analysis. It contains frame-level acoustic features extracted from 1608 30-second music clips and corresponding valence-arousal (VA) annotations provided by 665 subjects.
type: Audio
size: 1,608
metadata: valence & arousal

APL
type: Audio
size: 620 segments
metadata: piano practice

artist20
type: Audio
size: 1413 songs
metadata: 20 artists

Audio Content Analysis Datasets
Companion datasets to the book Audio Content Analysis by Alexander Lerch.
type: Audio

bach10
type: Audio / Symbolic
size: 10 chorales
metadata: multitrack & aligned MIDI

ballroom
type: Audio
size: 698 excerpts (30s)
metadata: 8 genres & tempo & (down-)beats

beatboxset1
type: Audio
size: 14 clips
metadata: perc. annotation

C224a
type: Audio
size: 224 artists
metadata: 14 genres

C3ka
type: Audio
size: 3000 artists
metadata: 18 genres

C49ka-C111ka
type: Audio
size: 48800/110588 artists
metadata: genres

CAL10k
type: Audio
size: 10870 songs
metadata: tags

CAL500
type: Audio
size: 502 songs
metadata: tags

Center for Computer Assisted Research in the Humanities
Musedata, Themefinder, Humdrum and Kern resources.
type: Symbolic
metadata: tags

CCMixter
type: Audio
size: 50 mixes
metadata: vocal & background track

Chopin22
type: Audio
size: 44 recordings
metadata: audio & aligned MIDI

CMMSD
type: Audio
size: 36 excerpts
metadata: note/rest/transition & onsets & vibrato

Coidach
type: Audio
size: 26420 songs
metadata: 55 genres

Compmusic Corpora
Data collections of cultural music from various sources that evolve and grow.
type: Audio

corpusCOFLA
type: Audio
size: 1800 flamenco recordings
metadata: editorial & predominant melody

covers80
type: Audio
size: 80 song pairs
metadata: cover songs

CREL Singing Voice Database
Dataset for research of physical characteristics of different singing expressions
type: Audio
metadata: segmented with temporal markers for each expression

DAMP
type: Audio
size: 34000 monophonic recordings
metadata: karaoke performances

DEAM
The biggest publicly available music affect dataset, with 1802 songs. It contains the mean and standard deviation of the valence and arousal values for each excerpt, together with audio files, features, and annotations.
type: Audio
size: 1,802
metadata: valence & arousal

DEAPDataset
type: Audio
size: 120 music video excerpts
metadata: valence & arousal & dominance & physiological data

DREANSS
type: Audio
size: 18 excerpts
metadata: onset times & perc. instruments

DrumPt
type: Audio
size: app. 2000 annotations
metadata: 4 playing techniques

emoMusic
type: Audio
size: 744 excerpts (45s)
metadata: arousal & valence

Emotify
The Emotify dataset has no arousal/valence values, but it provides the audio and is annotated with the Geneva Emotional Music Scale (GEMS). The discrete emotion tags include amazement, solemnity, tenderness, nostalgia, calmness, power, joyful activation, tension, and sadness.

type: Audio
size: 400 excerpts
metadata: induced emotion

ENST-Drums
type: Audio
size: 318 segments
metadata: onset times & perc. instruments & playing technique

Extendedballroom
type: Audio
size: 4000 excerpts (30s)
metadata: 9 genres & tempo &

ffuhrmann
type: Audio
size: 6951 excerpts/220 songs
metadata: 11 predom. instr.

FlaBase
type: Audio
size: 1102 artists & 74 palos & 2860 albums & 13311 tracks
metadata: editorial & biographical & musicological information on flamenco

FMA-medium
type: Audio
size: 14511 excerpts (30s)
metadata: 20 genres

FMA-small
type: Audio
size: 4000 excerpts (30s)
metadata: 10 genres

Fugue
Reference data for computational music analysis. Now contains a dataset of ground truth structures for fugues.
type: Symbolic
size: 36 pieces
metadata: fugue analysis

GiantStepsKey
Datasets for automatic evaluation of tempo estimation and key detection algorithms.
type: Audio
size: 604 files
metadata: key

GiantStepsTempo
Datasets for automatic evaluation of tempo estimation and key detection algorithms.
type: Audio
size: 664 files
metadata: tempo

GNMID14
type: Audio
size: 110M music ID matches
metadata: timestamp & country

Good-sounds.org
type: Audio
size: 8750 notes
metadata: 12 instruments, pitch, sound quality

GPT
type: Audio
size: 6580 clips
metadata: 7 guitar playing techniques

GTZAN
type: Audio
size: 1000 excerpts (30s)
metadata: 10 genres & tempo & key1 & key2 & beat/downbeat & metrical levels

Hainsworth
type: Audio
size: 245 excerpts (60s)
metadata: tempo

HJDB
type: Audio
size: 236 excerpts
metadata: downbeat

holzapfel:onset
type: Audio
size: 78 excerpts
metadata: onset times

homburg
type: Audio
size: 1889 excerpts (10s)
metadata: 9 genres

IADS
type: Audio
size: 111 sound snippets
metadata: valence & arousal & dominance

IDMT-MT
type: Audio
size: 12 songs
metadata: multitrack & style

IDMT-SMT-Audio-Effects
type: Audio
size: 55044 recordings
metadata: effects on bass and guitar notes

IDMT-SMT-Bass
type: Audio
size: 4300 excerpts
metadata: bass performance styles

IDMT-SMT-Bass-SINGLE-TRACK
type: Audio
size: 17 bass lines (?)
metadata: style annotated bass lines

IDMT-SMT-Drums
type: Audio
size: 518 files
metadata: onset times & perc. instruments

IDMT-SMT-Guitar
type: Audio
size: 4700+400 note events
metadata: 9 guitar playing techniques

iKala Dataset
Comprised of 252 30-second excerpts sampled from 206 iKala songs
type: Audio
size: 252
metadata: Pitch contour, timestamped lyrics

INRIA:EuroVision
type: Audio
size: 124 songs
metadata: structure

INRIA:Quaero
type: Audio
size: 159 songs
metadata: structure

IRMAS
type: Audio
size: 2874 excerpts
metadata: 11 instruments

ISMIR2004Genre
type: Audio
size: 729 excerpts (30s)
metadata: 6 genres

ISMIR2004Tempo
type: Audio
size: 465 excerpts (20s)
metadata: tempo

Isophonics
Datasets, Ontologies, and other goodies.
type: Audio

J-DISC
J-DISC is a resource for searching and exploring jazz recordings created by the Center for Jazz Studies at Columbia University.
type: Audio

Jamendo
type: Audio
size: 61+16+16 songs
metadata: voice activity

JGDB
type: Audio
size: random generated excerpts
metadata: multitrack & MIDI

Jordan:Classical
type: Audio
size: 15 pieces
metadata: structure

Jordan:Jazz
type: Audio
size: 15 pieces
metadata: structure

LabROSA:APT
type: Audio
size: 29 piano excerpts
metadata: MIDI

LabROSA:MIDI
type: Audio / Symbolic (midi)
size: 4 songs

Lakh MIDI DataSet
The Lakh MIDI dataset is a collection of 176,581 unique MIDI files, 45,129 of which have been matched and aligned to entries in the Million Song Dataset.
type: Symbolic (midi)
size: 176,581

last.fm
type: Audio
size: 992 users
metadata: listening habits

Latin
type: Audio
size: 3160 songs
metadata: 10 genres

magnatagatune
type: Audio
size: 25863 excerpts (30s)
metadata: similarity

MAPS Database
A piano database for multipitch estimation and automatic transcription of music.
type: Audio
size: 238 pieces
metadata: Ground truth pitch information

MARD
type: Audio
size: 66566 songs
metadata: album reviews

MARG Note-level Singing Dataset
Dataset produced by the Music & Audio Research Group for work in automatic music transcription
type: Audio
metadata: ground truth pitch information (monophonic)

MARG-AMT
type: Audio
size: 30 melodies
metadata: MIDI pitch & onset/offset times

McGill Billboard
type: Audio
size: 740 songs
metadata: chords

McGill Billboard Annotations
Annotations and audio features for the first 1000 randomly selected entries from Billboard chart slots presented at ISMIR 2011, and the additional 300 entries used to evaluate audio chord estimation for MIREX 2012.
type: Audio
metadata: high-level structure, timestamped chord labels, instrument information

MedleyDB
type: Audio
size: 122 songs
metadata: multitrack & genre & melody f0 & instrument activation

Meertens Tunes Collections
The MTC consists of a number of melodic data sets (Dutch songs), both vocal and instrumental. MTC is available open access for research purposes and is especially valuable for MIR research.

MidiDB
MIDI transcriptions of many popular songs, including EDM.
type: Symbolic (midi)

Million Musical Tweets Dataset
The “Million Musical Tweets Dataset” (MMTD) contains listening histories inferred from microblogs. Each listening event identified via twitter-id and user-id is annotated with temporal (date, time, weekday, timezone), spatial (longitude, latitude, continent, country, county, state, city), and contextual (information on the country) information. In addition, pointers to artist and track are provided as a matter of course.
type: Audio
size: 1,000,000

Million Song Dataset
A collection of audio features and metadata for a million contemporary popular music tracks.
type: Audio
size: 1,000,000

MIR Datasets
A list of datasets maintained at the Music Information Retrieval Wiki.

MIR Lab
Corpora prepared by MIR Lab.

MIR-1K Dataset
A dataset of one thousand clips for singing voice separation, from MIR Lab.
type: Audio
size: 1,000
metadata: pitch contour, lyrics, indices and types for unvoiced frames.

mirex05Train
type: Audio
size: 13 excerpts
metadata: predominant pitch

mirex06Train
type: Audio
size: 20 excerpts (30s)
metadata: tempo & beats

MMTD
type: Audio
size: 1086808 tweets
metadata: listening behavior

Modal
type: Audio
size: 71 snippets
metadata: onset times

Mood Swing Dataset
Contains valence/arousal (V/A) values for 240 songs.
type: Audio
size: 240

MOODetector:Bi-Modal
type: Audio
size: 133 excerpts
metadata: lyrics & valence & arousal

MOODetector:Multi-Modal
type: Audio
size: 903 excerpts (30s)
metadata: lyrics & MIDI & mood

MSD
type: Audio
size: 1000000 songs
metadata: meta data & proprietary features

MTG-QBH
type: Audio
size: 118 queries/481 songs
metadata: title & artist

MuseData
An electronic library of Classical Music scores
type: Symbolic (Midi, MuseData, Humdrum)
size: 881

Music Mood Rating Dataverse
It contains average ratings of discrete emotion tags, including valence, arousal, atmosphere, happy, dark, sad, angry, sensual, sentimental.
type: Audio
size: 600
metadata: Annotations

Music Recommendation Dataset (KGRec-music)
Two different datasets are provided, each with users, items, implicit feedback interactions between users and items, item tags, and item text descriptions: one for music recommendation (KGRec-music) and the other for sound recommendation (KGRec-sound).
type: Audio

Music Technology Group Datasets
Various datasets compiled as part of research projects carried out at the MTG.

MusiClef 2012
The MusiClef 2012 – Multimodal Music Data Set provides editorial metadata, various audio features, user tags, web pages, and expert labels on a set of 1355 popular songs. It was used in the MusiClef 2012 Evaluation Campaign.
type: Audio
size: 1355 songs
metadata: tags

MusicMicro
type: Audio
size: 136866 users
metadata: music listening patterns

MusicMicro Dataset
The “MusicMicro 11.11-09.12” data set contains listening histories inferred from microblogs. Each listening event identified via twitter-id and user-id is annotated with temporal (month and weekday) and spatial (longitude, latitude, country, and city) information. In addition, pointers to artist and track are provided as a matter of course.

MusicNet
type: Audio
size: 330 recordings
metadata: pitch and onsets

musiXmatch Database
Official lyrics collection of the Million Song Dataset.
size: 1,000,000

NSynth
type: Audio
size: 305,979 excerpts
metadata: 305,979 musical notes, each with a unique pitch, timbre, and envelope

ODB
type: Audio
size: 19 excerpts
metadata: onset times

Onset_Leveau
type: Audio
size: 21 excerpts
metadata: onset times

Petrucci Music Library
The datasets backing the Music Ngram Viewer.
type: Symbolic
metadata: N-gram per year

Phonation Modes Dataset
A collection of datasets for detection of phonation modes: breathy, neutral, flow and pressed.
type: Audio
size: 900
metadata: Phonation mode ground truth

PlaylistDataset
type: Audio
size: 75262 songs/2840553 transitions
metadata: playlists

QBT-Extended
type: Symbolic
size: 3365 queries/51 songs
metadata: taps

QMUL:Beatles
type: Audio
size: 181 songs
metadata: structure & key & chords & beats

QMUL:King
type: Audio
size: 14 songs
metadata: structure & key & chords

QMUL:MichaelJackson
type: Audio
size: 38 songs
metadata: structure

QMUL:MultiTrack
type: Audio
size: 104 songs
metadata: structure & multitrack

QMUL:Queen
type: Audio
size: 51/31 songs
metadata: structure/key & chords

QMUL:RSS
type: Audio
size: 60 songs
metadata: structure

QMUL:Zweieck
type: Audio
size: 18 songs
metadata: structure & key & chords & beats

QUASI
type: Audio
size: 11 songs
metadata: multitrack

RECOLA Database
Multimodal recordings of spontaneous collaborative and affective interactions in French.
type: Audio
metadata: Segmentation of spoken utterances, probability of speech, acoustic low-level descriptors

Repovizz
A framework for remote storage, visual browsing, annotation, and exchange of multi-modal data.

RockCorpus
type: Audio
size: 200 songs
metadata: chords & melody & bars

RWC
type: Audio
size: 115 songs/50 classical/100 songs
metadata: lyrics & 10 genre & 50 instruments & chords & structure & aligned MIDI

RWC Music Database
The RWC (Real World Computing) Music Database is a copyright-cleared music database (DB) available to researchers as a common foundation for research.
type: Audio
size: 315
metadata: ground truth midi

Saarland Music Data
Saarland Music Data (SMD) supplies free music recordings of Western classical music (SMD Western Music) as well as MIDI-audio pairs (SMD MIDI-Audio Piano Music), which were generated using hybrid acoustic/digital pianos (Disklavier).
type: Symbolic / Audio

SALAMI
type: Audio
size: 779 songs
metadata: structure

Sargon
type: Audio
size: 4 songs
metadata: structure

SASD
type: Audio
size: 268+2336 artists
metadata: artist biographies & similarity

Schenker
A dataset of MusicXML excerpts and corresponding Schenkerian analyses in a computer-readable format.
type: Symbolic (musicXML)
size: 41 pieces
metadata: MusicXML & Schenker analysis

Seyerlehner:1517-Artists
type: Audio
size: 3180 songs
metadata: 19 genres

Seyerlehner:Annotated
type: Audio
size: 190 songs
metadata: 19 genres

Seyerlehner:Pop
type: Audio
size: 1105 songs
metadata: tempo

Seyerlehner:Unique
type: Audio
size: 3115 excerpts (30s)
metadata: 14 genres

SISEC
type: Audio
size: 5 excerpts
metadata: multitrack & mix

SMC:MIREX
type: Audio
size: 217 excerpts
metadata: tempo & beat positions

SMD
type: Audio
size: 50 recordings
metadata: audio & aligned MIDI

Soundtrack
The excerpts were selected according to dimensional and discrete emotion models (see the paper for details) and evaluated in a pilot study and a larger-scale study. They are short (approx. 15 seconds) and taken from film soundtracks.
type: Audio
size: 360
metadata: valence & energy & tension & mood

SPAM
type: Audio
size: 50 songs
metadata: structure

Su-AMT
type: Audio
size: 10 excerpts
metadata: onset times & pitch

Suomen Kansan eSävelmät
Digital Archive of Finnish Folk Tunes.
type: Audio
size: 9,000
metadata: notation, key, meter, place of collection, lyrics

SymbTr
A Turkish Makam Music Symbolic Data Collection.
type: Audio
size: 2,000
metadata: Phrase boundaries, segment boundaries.

The Bellmann Corpus
Released in 2013, it consists of musical scores for over 650 pieces (or complete sections of multi-movement works) for piano or harpsichord.
type: Symbolic (midi)
size: 650

The Meertens Tune Collections
type: Audio
size: 3000-7000 melodies
metadata: phrases & key & meter

Tonal Harmony Excerpts
MIDI files from the workbook and instructor’s manual for Tonal Harmony by Stefan Kostka and Dorothy Payne.
type: Symbolic (midi)
size: 46
metadata: Ground truth chord labels.

TONAS
type: Audio
size: 72 single-voiced excerpts
metadata: pitch

TPD
type: Audio
size: 23385 songs
metadata: popularity rating

TRIOS
type: Audio
size: 5 excerpts
metadata: multitrack & aligned MIDI

Tunebot
type: Audio
size: 10000 queries/? songs
metadata: title & artist

UMA-Piano
type: Audio
size: 275040 recordings
metadata: piano chords

uspop2002
type: Audio
size: 8752 songs
metadata: tags & genre & chords

Weimar Jazz Database (WJAZZD)
A component of the Jazzomat project, WJAZZD is a database of jazz solo transcriptions available to the public to further enhance and improve jazz and MIR research.
type: Symbolic

ACM_MIRUM
type: Audio
size: 1410 excerpts (60s)
metadata: tempo