Premise-based Multi-modal Reasoning (PMR) is a task in which a model must reason jointly over textual clues (from the premise) and visual clues (from the images).
The PMR dataset contains 15,360 manually annotated samples, plus an equal number of adversarial samples. You can browse them on the Explore page.
We provide prediction examples in the standard format to help you standardize your results; submit your predictions, together with your model, by email to ccl2022_pmr@163.com.
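As a rough illustration of writing out model predictions for submission, the sketch below serializes one answer choice per test sample to JSON. The field names (`annot_id`, `answer`) are assumptions for illustration only; follow the provided prediction examples for the exact required schema.

```python
import json

# Hypothetical prediction file: one record per test sample, mapping a
# sample identifier to the index of the chosen answer option.
# NOTE: "annot_id" and "answer" are illustrative names, not the official
# schema -- consult the released prediction examples for the real format.
predictions = [
    {"annot_id": "test-0", "answer": 2},
    {"annot_id": "test-1", "answer": 0},
]

with open("predictions.json", "w") as f:
    json.dump(predictions, f, indent=2)
```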
This research is supported by the National Key Research and Development Program of China (2020AAA0106700) and NSFC project U19A2065. We thank all the workers who participated in our data collection for their contributions.
Also thanks to SQuAD and VCR for allowing us to use their code to create this website!
We rank models on the leaderboard by accuracy on the mix-test set.
| Rank | Model | Mix-Accuracy |
|---|---|---|
| 1 | ERNIE-VIL-Large<br>Baidu Inc. (Yu et al., 2020) | 77.6 |
| 2 | VLBERT-Large<br>University of Science and Technology of China, MSRA (Su et al., 2019) | 76.4 |
| 3 | UNITER-Large<br>Microsoft Dynamics 365 AI Research (Chen et al., 2020) | 73.2 |
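The Mix-Accuracy column above is plain classification accuracy. A minimal sketch, assuming predictions and gold labels are aligned lists of answer indices:

```python
def accuracy(predictions, gold):
    """Fraction of samples whose predicted answer index matches the gold label."""
    assert len(predictions) == len(gold), "prediction/gold length mismatch"
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Example: 3 of 4 answers correct.
print(accuracy([2, 0, 1, 3], [2, 0, 1, 0]))  # 0.75
```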