Selecting packages

Consider which technologies, packages, or functions you'd like to use in your course, and whether that is too many (or too few).

Written by Richie Cotton
Updated over a week ago

You will need to decide which technologies, packages, or functions you want to use in your course. This will help you define the scope of your course. Include things like R/Python packages, SQL modules, or Google Sheets add-ons. If there are any important functions, methods, or commands that you want to teach, you can mention them here.

Are there any restrictions on technology?

  1. All software must be installable under Ubuntu Linux.

  2. There must be no fee for DataCamp to use the technology.

  3. The license for the software must allow DataCamp to host it.

  4. The technology should be currently maintained.

Can I use R packages from GitHub/Bitbucket/my mate's laptop?

You can use R packages from GitHub or Bitbucket during course development, but by the time the course is released, they must have been published on CRAN or Bioconductor.

Where can I get Python packages from?

Python packages must support Python 3 and must be available via PyPI.
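As a rough way to check both requirements, the sketch below reads a package's metadata from PyPI's JSON endpoint and looks for a Python 3 trove classifier. The function names `fetch_pypi_metadata` and `declares_python3` are illustrative, and the classifier check is only a heuristic: a package that supports Python 3 but declares no classifiers will be reported as unsupported, so treat a negative result as a prompt to check by hand.

```python
import json
from urllib.request import urlopen


def fetch_pypi_metadata(package: str) -> dict:
    """Download a package's metadata from PyPI's JSON endpoint.

    Raises urllib.error.HTTPError (404) if the package is not on PyPI.
    """
    url = f"https://pypi.org/pypi/{package}/json"
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def declares_python3(metadata: dict) -> bool:
    """Return True if the metadata lists a Python 3 trove classifier.

    Heuristic only: packages without classifiers return False.
    """
    classifiers = metadata.get("info", {}).get("classifiers") or []
    prefix = "Programming Language :: Python :: 3"
    return any(c.startswith(prefix) for c in classifiers)
```

For a candidate package, `declares_python3(fetch_pypi_metadata("pandas"))` performs both checks in one call: the fetch fails if the package is not on PyPI, and the classifier test covers Python 3 support.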

How many technologies can I use?

If you have too many technologies or packages (especially those with competing syntax), you risk confusing students. Aim for "the smallest number of technologies that lets you teach everything that you want to teach well."

For R and Python courses, more than 4 packages is a warning sign of over-complication.

Don't include technologies that you don't need

In particular, for R courses, you will never need to load the whole tidyverse package; load only the individual component packages that you need (for example, dplyr or ggplot2).

To reiterate: aim for "the smallest number of technologies that lets you teach everything that you want to teach well."

Don't try to teach competing packages

A common desire is to compare the syntax or features of two different packages. Course rating data suggests that the majority of students really dislike this. In general, students want to learn one good way to do things, not to have a discussion about competing choices.

Explain your choices

If there are several possible packages or technologies that you could use, list them all, and write down your reasoning for picking one. This reasoning is often useful to discuss in a video exercise.

It's in your interest and DataCamp's interest to teach technologies that people actually use. If you have to choose between two packages, then it's usually best to pick the more popular one.

Check that the technology is current and maintained

For R packages, check their homepage on CRAN or Bioconductor to see if they have recently been updated. Likewise, for Python packages, you can check the most recent release date on PyPI. Make a note of any packages that haven't been updated in over a year.

Visit https://cranlogs.r-pkg.org/badges/last-month/{pkgname}, replacing {pkgname} with the name of an R package, to see the number of downloads in the last month for that package.

RDocumentation also provides estimates of package download numbers.
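When comparing several candidate R packages, it can save time to generate the cranlogs badge links in one go. The helper below is a small sketch that fills in the URL pattern given above; the function names are illustrative, and it only builds the links without downloading anything.

```python
from urllib.parse import quote

# URL pattern for the cranlogs "downloads last month" badge.
BADGE_TEMPLATE = "https://cranlogs.r-pkg.org/badges/last-month/{pkgname}"


def badge_url(pkgname: str) -> str:
    """Return the cranlogs last-month download badge URL for one R package."""
    return BADGE_TEMPLATE.format(pkgname=quote(pkgname, safe=""))


def badge_urls(pkgnames: list[str]) -> dict[str, str]:
    """Map each candidate package name to its badge URL."""
    return {name: badge_url(name) for name in pkgnames}
```

Calling `badge_urls(["dplyr", "data.table"])` gives one link per package to open in a browser and compare side by side.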

Examples

From a course on keras. This example has only a few Python packages and goes into depth on the functions that will be used.

  • keras, pandas, sklearn

  • Keras functions: Dense; Concatenate, Subtract, Multiply (operate on 2 layers)

  • Embedding, Flatten, keras.preprocessing.text.text_to_word_sequence, keras.preprocessing.sequence.pad_sequences

  • GRU, Bidirectional

  • MAYBE Dropout and BatchNormalization

From a course on single-cell RNA-Seq workflows. This example considers which R packages are needed for each of the topics covered in the course. The detail is excellent; however, too many packages were chosen.

  • Introduction to single-cell RNA-Seq: SingleCellExperiment, scRNAseq

  • Visualization and normalization: SingleCellExperiment, scater, magrittr, scran, scone, zinbwave

  • Dimensionality reduction and cell clustering: SingleCellExperiment, scater, clusterExperiment, Seurat, scone, dplyr, ggplot2

  • Differential expression analysis: SingleCellExperiment, scater, edgeR, MAST, zinbwave, NMF, ggplot2
