Profanity and sensitive content in data

Can your dataset include profanity, political, religious, or potentially offensive content?

Written by Richie Cotton

Question:

Can I use data with profanity or sensitive content?

Answer:

Many learners take DataCamp content at work, so in general, your content (including the datasets) should be safe for work. If a learner is offended and stops taking your course or project, they stop learning, and you lose revshare because they didn't complete your content.


Social media datasets pose a problem: Twitter data, in particular, is notorious for containing profanity and offensive content. Analyzing social media is an important skill for data scientists, so you are absolutely allowed to use these datasets, as long as you follow these principles:

Notify the learners

If there is profanity or offensive content in the dataset, add a note to the first exercise or task where learners see this data. For example:

> Be aware that this is real data from Twitter, and there is some use of profanity.

Don't push an agenda

If the dataset includes political or religious content, you must not push your own agenda or viewpoint. Remain as neutral as possible.

Try to avoid printing the sensitive content

A learner actively digging through the dataset to find sensitive content is different from an exercise that requires them to print it on screen. If possible, avoid printing the sensitive content, or cherry-pick safe parts of the dataset to print (see the sketch below).
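One way to do this is to filter out flagged rows before displaying anything. The sketch below assumes your data is in a pandas DataFrame with a hypothetical "text" column and that you maintain your own list of flagged terms; it is an illustration, not a DataCamp-provided helper.

```python
import re
import pandas as pd

# Hypothetical sample data; in a real course this would be the Twitter dataset.
tweets = pd.DataFrame({
    "text": [
        "Loving the new pandas release!",
        "This commute is $#@% awful",
        "Data viz tips thread below",
    ]
})

# Placeholder list of terms you want to screen out of printed output.
flagged_words = ["$#@%"]

# Build a regex that matches any flagged term, escaping special characters.
pattern = "|".join(re.escape(word) for word in flagged_words)

# Keep only rows whose text contains no flagged term, and print those.
safe_tweets = tweets[~tweets["text"].str.contains(pattern, case=False)]
print(safe_tweets.head())
```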

Choose safer topics

If the learning objective allows it, just choose a dataset that isn't likely to offend learners.
