List of 25 Large Audio Datasets I use for my audio research

At Wonder Technologies, we have spent a lot of time building deep learning systems that understand the world through audio. From deep-learning-based voice extraction to teaching computers how to read our emotions, we needed a wide set of data to deliver APIs that work even in the craziest sound environments. Here is a list of datasets that I found pretty useful for our research and that I've personally used to make my audio models perform much better in real-world environments.

Music Datasets

Free Music Archive

FMA is a dataset for music analysis. The dataset consists of full-length and HQ audio, pre-computed features, and track-level and user-level metadata. It is an open dataset created for evaluating several tasks in Music Information Retrieval (MIR).

Million Song Dataset

The Million Song Dataset is a freely available collection of audio features and metadata for a million contemporary popular music tracks. The core of the dataset is the feature analysis and metadata for one million songs. The dataset does not include any audio, only the derived features. Sample audio can be fetched from services like 7digital, using the code provided by Columbia University. The size of this dataset is about 280 GB.

Speech Datasets

Free Spoken Digit Dataset

This one was created to solve the task of identifying spoken digits in audio samples. It's an open dataset, so the hope is that it will keep growing as people contribute more samples. Currently, it has the following characteristics:

1) 3 speakers
2) 1,500 recordings (50 of each digit per speaker)
3) English pronunciations

This is a really small set, about 10 MB in size.

LibriSpeech

This dataset is a large-scale corpus of around 1000 hours of English speech. The data has been sourced from audiobooks from the LibriVox project and is 60 GB in size.

VoxCeleb

VoxCeleb is a large-scale speaker identification dataset. It contains around 100,000 utterances by 1,251 celebrities, extracted from YouTube videos. The data is mostly gender balanced (males make up 55%). The celebrities span a diverse range of accents, professions, and ages. There is no overlap between the development and test sets. It's an intriguing use case for isolating and identifying which superstar a voice belongs to. This set is 150 MB in size and has about 2000 hours of speech.

Spoken Wikipedia Corpora

This is a corpus of aligned spoken Wikipedia articles from the English, German, and Dutch Wikipedia. Hundreds of hours of aligned audio and annotations can be mapped back to the original HTML. The entire set is about 38 GB in size and is available both with and without audio.

Flickr Audio Caption Corpus

40,000 spoken captions of 8,000 natural images, 4.2 GB in size.
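The Free Spoken Digit Dataset figures quoted in this list (3 speakers, 50 recordings of each digit per speaker, 1,500 recordings in total) are easy to sanity-check once the files are on disk. Here is a minimal sketch, assuming the recordings are named `{digit}_{speaker}_{index}.wav`; that naming pattern, the speaker names, and the helper functions are illustrative assumptions, not something this list confirms.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable, Tuple


def parse_recording_name(filename: str) -> Tuple[int, str, int]:
    """Split an assumed '{digit}_{speaker}_{index}.wav' name into its parts."""
    stem = Path(filename).stem
    # Split the digit off the front and the index off the back, so a
    # speaker name containing underscores would still parse correctly.
    digit, rest = stem.split("_", 1)
    speaker, index = rest.rsplit("_", 1)
    return int(digit), speaker, int(index)


def count_recordings(filenames: Iterable[str]) -> Counter:
    """Count recordings per (speaker, digit) pair."""
    counts: Counter = Counter()
    for name in filenames:
        digit, speaker, _ = parse_recording_name(name)
        counts[(speaker, digit)] += 1
    return counts


# Hypothetical speakers and a synthetic file list standing in for a real download.
SPEAKERS = ["alice", "bob", "carol"]
filenames = [
    f"{digit}_{speaker}_{index}.wav"
    for speaker in SPEAKERS
    for digit in range(10)
    for index in range(50)
]

counts = count_recordings(filenames)
total = sum(counts.values())

# 3 speakers * 10 digits * 50 recordings each = 1,500 recordings
print(total)
```

With a real download, replacing the synthetic `filenames` list with the names of the `.wav` files in the recordings folder shows whether the dataset has grown beyond the numbers quoted here.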