Datasets to avoid

Find out which datasets you should NOT use in your course.

Written by George Boorman
Updated over a week ago


What datasets can I use for my course?


Choose datasets that are available for commercial use and suitable for your content. Do not use datasets about sensitive topics or datasets containing mature language. Avoid any dataset that would clearly date the course, such as one about a currently trending topic or one that will quickly become out of date.

The following datasets are overused, and learner feedback suggests avoiding them. Please choose different ones.

  • Iris

  • mtcars

  • Gapminder

  • Wine dataset

  • Wisconsin breast cancer

  • Titanic

  • Boston housing

  • diamonds

  • Olympics

  • Avocado prices

  • World Happiness Survey

  • Bike-sharing


The following dataset topics are overused and should be avoided:

  • Diabetes

  • Kidney disease

  • Car / automobiles

  • Housing

  • Credit cards / loans

Additionally, avoid datasets from the UCI Machine Learning Repository.
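
If you want a quick way to screen a candidate dataset against the lists above, something like the following could help. This is only a rough sketch: the function name `check_dataset`, the normalization rules, and the whole-word matching are illustrative assumptions, not an official tool, and a match is a prompt to reconsider rather than a final verdict.

```python
import re

# Names and topics copied from the lists above, lowercased for matching.
# Hyphens are replaced with spaces so "Bike-sharing" and "bike sharing" compare equal.
AVOID_DATASETS = {
    "iris", "mtcars", "gapminder", "wine", "wisconsin breast cancer",
    "titanic", "boston housing", "diamonds", "olympics", "avocado prices",
    "world happiness survey", "bike sharing",
}
AVOID_TOPICS = {
    "diabetes", "kidney disease", "car", "automobile",
    "housing", "credit card", "loan",
}


def check_dataset(name: str, description: str = "") -> list[str]:
    """Return a list of reasons the dataset may conflict with the guidance (hypothetical helper)."""
    reasons = []
    # Normalize: lowercase, treat hyphens as spaces, drop a trailing " dataset".
    normalized = name.strip().lower().replace("-", " ")
    normalized = normalized.removesuffix(" dataset").strip()
    if normalized in AVOID_DATASETS:
        reasons.append(f"'{name}' is on the overused datasets list")

    # Match topics as whole words (with an optional plural "s"),
    # so "car" does not match "card" or "mtcars".
    text = f"{normalized} {description.lower()}"
    for topic in sorted(AVOID_TOPICS):
        if re.search(rf"\b{re.escape(topic)}s?\b", text):
            reasons.append(f"topic '{topic}' is on the overused topics list")
    return reasons


if __name__ == "__main__":
    for reason in check_dataset("Loan defaults", "Credit card loans by region"):
        print(reason)
```

An empty list means no match was found in these lists; it does not confirm that the dataset is available for commercial use or otherwise suitable, so those checks still need to be done by hand.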
