Question:

Where can I find datasets?

Answer:

Here are a few data sources to get you started.

Curated datasets:

  • DataCamp can provide access to the datasets curated by ThinkNum. Ask your Curriculum Lead for details.
  • For small, reasonably clean datasets, Wikipedia tables are excellent.
  • You can generate a list of CRAN packages that contain datasets using the finddatasetpkgs package:
# Install the helper package from GitHub (requires the remotes package)
remotes::install_github("datacamp/finddatasetpkgs")

# List the CRAN packages that contain datasets
library(finddatasetpkgs)
get_dataset_pkgs()
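
Once a package from that list is installed, you will usually want to see which datasets it actually ships. The following is a minimal sketch using only base R. The helper name list_pkg_datasets is made up for illustration, and the built-in "datasets" package stands in for whichever package you installed.

```r
# Hypothetical helper: list the datasets bundled with an installed package.
# data(package = ...) is base R; its $results element is a character matrix
# with the columns "Package", "LibPath", "Item", and "Title".
list_pkg_datasets <- function(pkg) {
  info <- data(package = pkg)$results
  data.frame(
    package = info[, "Package"],
    item    = info[, "Item"],
    title   = info[, "Title"],
    stringsAsFactors = FALSE
  )
}

# "datasets" ships with R, so this runs without installing anything.
base_datasets <- list_pkg_datasets("datasets")
head(base_datasets)

# Load one dataset by name into the current environment and inspect it.
data("mtcars", package = "datasets")
str(mtcars)
```

Swap "datasets" for any installed package name to browse its contents the same way.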

Data sharing platforms:

  • CKAN is a data sharing platform. Some popular instances include datahub.io, catalog.data.gov, and the European Data Portal.
  • Dataverse is a good source of datasets from academic papers. (Click the map on the home page to see specific Dataverse installations.)
  • data.world hosts many (typically small) datasets directly.
  • Our World in Data publishes articles on the state of the world, each backed by datasets, with links to the data used.
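
CKAN instances generally expose a web API, so you can search them from R as well as from the browser. The sketch below only builds the search URL and assumes the instance serves the standard CKAN Action API under /api/3/action/. The helper name ckan_search_url and the example query are made up for illustration.

```r
# Hypothetical helper: build a search URL for a CKAN instance.
# Assumes the standard CKAN Action API endpoint "package_search".
ckan_search_url <- function(base_url, query, rows = 10) {
  paste0(
    sub("/+$", "", base_url),                      # drop any trailing slash
    "/api/3/action/package_search",
    "?q=", utils::URLencode(query, reserved = TRUE),
    "&rows=", as.integer(rows)                     # cap the number of results
  )
}

url <- ckan_search_url("https://catalog.data.gov", "air quality", rows = 5)
print(url)
```

If the jsonlite package is installed, passing the URL to jsonlite::fromJSON() should return the parsed response. On instances that follow the standard API, the matching datasets are under result$results.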

Governmental and NGO datasets:
