This dataset was gathered at 6 hospitals in northern Italy during the first outbreak of SARS-CoV-2 (March-June 2020), with the coordination of Centro Diagnostico Italiano (CDI): https://aiforcovid.radiomica.it/. All included subjects had a confirmed diagnosis of COVID-19. Disease outcome was updated at a later stage and is reported here as either severe (the patient required mechanical ventilation or died) or mild (all other outcomes).
During triage, a set of clinical tests was performed, generating a number of clinical parameters, 16 of which were deemed relevant for outcome prediction and included in the dataset. The following table reports the name and a brief description of each collected item. Not all items are available for all subjects.
Name | Description
Age | Patient’s age (years)
Sex | Patient’s sex (0 – male, 1 – female)
Body Temperature (°C) | Patient temperature at admission (in °C)
Cough | Cough
Dyspnea | Patient had intense tightening in the chest, air hunger, difficulty breathing, breathlessness or feeling of suffocation
WBC | White blood cells count (10^9/L)
CRP | C-reactive protein concentration (mg/dL)
Fibrinogen | Fibrinogen concentration in blood (mg/dL)
LDH | Lactate dehydrogenase concentration in blood (U/L)
D-dimer | D-dimer amount in blood
O2 | Oxygen percentage in blood
PaO2 | Partial pressure of oxygen in arterial blood (mmHg)
SaO2 | Arterial oxygen saturation (%)
pH | Blood pH
Cardiovascular Disease | Patient had cardiovascular disease
Respiratory Failure | Patient had respiratory failure
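Because not all items are available for all subjects, any model using the clinical parameters has to decide how to treat missing values. The following is a minimal sketch of one common option, median imputation, and not a prescribed preprocessing step. The record layout, the use of an empty string to mark a missing item, and the helper names `parse_value` and `impute_median` are assumptions made for illustration; the actual files may encode missing values differently.

```python
from statistics import median
from typing import Optional

# Hypothetical raw records: keys mirror some of the item names listed above,
# and an empty string stands in for an item that was not collected
# (an assumption about the file encoding).
records = [
    {"Age": "71", "Sex": "0", "CRP": "12.4", "LDH": ""},
    {"Age": "58", "Sex": "1", "CRP": "", "LDH": "310"},
    {"Age": "64", "Sex": "0", "CRP": "3.1", "LDH": "455"},
]


def parse_value(raw: str) -> Optional[float]:
    """Convert a raw field to float, or None when the item is missing."""
    raw = raw.strip()
    return float(raw) if raw else None


def impute_median(rows, fields):
    """Parse every field and replace missing values with the column median."""
    parsed = [{f: parse_value(r.get(f, "")) for f in fields} for r in rows]
    for f in fields:
        observed = [p[f] for p in parsed if p[f] is not None]
        # If an item is missing for every subject, it stays None.
        fill = median(observed) if observed else None
        for p in parsed:
            if p[f] is None:
                p[f] = fill
    return parsed


fields = ["Age", "Sex", "CRP", "LDH"]
clean = impute_median(records, fields)
```

In practice, the medians would be computed on the training set only and then reused to fill the test set, so that no information from the test data enters the preprocessing.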
For each patient, a single chest X-ray is provided, also acquired on the first day of hospital admission. X-rays were often acquired in emergency conditions, so both image quality and subject position are highly variable. Furthermore, some images were acquired digitally, while others are digitized film images.
For the purpose of this challenge, data from the 6 different hospitals constitute the training set and will be provided as collected (raw), for a total of 1103 subjects. The test set is composed of 486 additional entries, all collected at the same institution. The test set has only recently been curated and, unlike the training set, has not been made publicly available before. Patient outcome is provided in all instances for the training set and never for the test set.
The objective of this hackathon is to classify subjects according to disease outcome.
Proposed solutions will be considered if they meet the following two requirements:
Furthermore, the winners in each category (see below) will be required to make all the relevant code publicly available. There are no constraints on the licensing scheme, but the solution should be reproducible.
The score used for ranking is the accuracy value of the proposed classification on the test set. Please note that the proportion of classes may vary between training and test sets.
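As a rough illustration of the ranking score, accuracy is the fraction of subjects whose predicted outcome matches the true one. The sketch below computes it for a trivial predictor that always answers "mild" on two made-up label lists with different class proportions. The encoding (1 for severe, 0 for mild), the function name `accuracy` and the toy labels are assumptions for illustration only and do not reflect the real data.

```python
def accuracy(y_true, y_pred):
    """Fraction of subjects whose predicted outcome matches the true one."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equally long")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels: 1 = severe, 0 = mild (hypothetical encoding).
balanced_labels = [1, 1, 0, 0, 1, 0, 1, 0]  # half severe, half mild
skewed_labels = [1, 0, 0, 0, 0, 0, 0, 0]    # mostly mild

# A constant predictor that always answers "mild".
always_mild = [0] * 8

acc_balanced = accuracy(balanced_labels, always_mild)
acc_skewed = accuracy(skewed_labels, always_mild)
print(acc_balanced, acc_skewed)
```

The same constant predictor scores very differently on the two lists, which is why a difference in class proportions between training and test sets matters when a model is tuned for accuracy on the training data.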
A second prize will be awarded for the approach with the highest level of explainability. This decision will be taken by a panel composed of clinicians and computer vision scientists.
Only registered (and authenticated) users can download the dataset.