A dataset for the task of Natural Language Inference in the clinical domain.
A dataset of standardized images of fashion items from 10 classes.
Text from approximately 1,000 English compliance sentences obtained from IBM’s publicly available contracts, annotated with a layer of “universal” semantic role labels.
Local climatological data originally collected at JFK airport.
Videos of the chaotic motion of a double pendulum, in addition to the positions of the datums of the pendulum.
This dataset consists of raw and processed execution logs generated from two versions of Nutch, an open source web crawler application.
Text from approximately 1,000 English sentences obtained from IBM’s public annual financial reports, annotated with a layer of “universal” semantic role labels.
The released dataset contains term-relatedness values for 9,856 pairs of terms.
A set of concepts from Wikipedia rated for their degree of abstractness.
60 argumentative speeches, recorded by expert debaters, discussing various controversial topics, in audio and text formats.
A dataset on how the sentiment of a phrase arises from the interaction between its constituents.
Sentences from Wikipedia together with their topic.
400 argumentative speeches, recorded by expert debaters, discussing 200 controversial topics.
The goal of mention detection is to map entities or concepts mentioned in text to the correct concept in a knowledge base. The dataset contains 3,000 sentences annotated with mentions.
5,000 frequently occurring idioms with sentiment annotation.
The emphasized words dataset was created to train and evaluate a system that receives a written argumentative speech and predicts which words should be emphasized by the Text-to-Speech component.
200 argumentative speeches, recorded by expert debaters, discussing 50 controversial topics.
Wikipedia categories annotated with their stance towards Wikipedia concepts representing controversial topics.
Pairs of concepts from Wikipedia scored for their level of relatedness.
A sentence-clustering benchmark based on the partition of Wikipedia articles into sections.
Dialog Act Classification for Online Discussions
100 threads randomly sampled from a dataset of Ubuntu Forums discussion threads used in previous research. Ubuntu Forums is the official forum of the Ubuntu Linux distribution.
A dataset of images, each paired with a question and its associated answer.
More datasets coming soon.