For this 2nd edition of the Challenge4Cancer, Epidemium will provide a comprehensive dataset of demographic, economic and environmental indicators, collected from the public data repositories of international institutions.
The data are coupled with the cancer epidemiological data used in the previous Challenge4Cancer, with the aim of exploring and revealing correlations between these factors and worldwide cancer incidence and mortality. So far, the data have been collected from the following three institutions: a/ the World Bank, b/ the Food and Agriculture Organization of the United Nations (FAO) and c/ the International Labour Organization (ILO). The dataset aggregates about 7000 indicators, spanning 40 years and 200 countries. The individual indicators are classified by source and theme.
To produce a ready-to-use dataset, we ran a cleaning and processing pipeline on the source files using the Dataiku Data Science Studio software. The documentation is available next to the data tables.
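As a rough illustration of how indicators might be coupled with the epidemiological data, the sketch below joins two small tables on country and year and computes a Pearson correlation. The row layout `(country, year, value)`, the toy values, and the helper names `join_on_country_year` and `pearson` are all assumptions made for this example. They do not reflect the actual schema of the provided tables, which is described in the accompanying documentation.

```python
from math import sqrt

# Hypothetical toy rows in the form (country, year, value).
# The real tables may use different columns and identifiers.
indicator_rows = [
    ("FRA", 2000, 10.0),
    ("FRA", 2001, 12.0),
    ("DEU", 2000, 14.0),
    ("DEU", 2001, 16.0),
    ("ITA", 2000, 11.0),  # no matching incidence row, dropped by the join
]
incidence_rows = [
    ("FRA", 2000, 100.0),
    ("FRA", 2001, 110.0),
    ("DEU", 2000, 120.0),
    ("DEU", 2001, 130.0),
]


def join_on_country_year(left, right):
    """Inner join on (country, year); returns (left_value, right_value) pairs."""
    right_by_key = {(country, year): value for country, year, value in right}
    return [
        (value, right_by_key[(country, year)])
        for country, year, value in left
        if (country, year) in right_by_key
    ]


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


pairs = join_on_country_year(indicator_rows, incidence_rows)
r = pearson(pairs)
print(f"{len(pairs)} matched country-year pairs, r = {r:.3f}")
```

The toy values are deliberately linear, so the coefficient comes out close to 1. With the real data, each of the roughly 7000 indicators would need its own join, and missing country-year pairs would have to be handled explicitly.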