What is a capstone exercise?

A Capstone Exercise is just the final exercise in a chapter. Capstones aren’t meant to be “projects”, and they aren’t meant to be larger or more in-depth than a typical exercise. They just happen to come at the end of a chapter. Because each lesson in a chapter builds on previous lessons, we expect the final exercise to touch on concepts from the entire chapter, but we don’t necessarily expect it to assess every learning objective taught in the chapter.

Why build capstones?

We ask you to build Capstone Exercises during the Course Specs process. We do this because:

We want you to get used to using our Teach editor. Building these exercises should help you learn our content guidelines and become familiar with the platform. This is a great time for asking questions and making mistakes!
Writing Capstones is like building a roadmap for your course. If you know where each chapter is supposed to end, it’s much easier to figure out what needs to be in the rest of the chapter.
Capstones can help spotlight potential problems with course structure. As you build a Capstone Exercise, you’ll notice required prerequisites that might be missing, or concepts that are difficult to teach via code. Sometimes, this can lead to big changes in your course outline.

Common Capstone Problems

Covering too much material

The most common challenge in designing a capstone exercise is choosing how much material to include. Most instructors’ first instinct is to build a toy version of a typical data analysis process: load and visualize a dataset, apply some sort of analysis or machine learning, visualize the result. This is a great starting point, but it usually ends up with too much material for a single exercise. Here are some tips for trimming down an exercise:

You can probably do the initial dataset familiarization in your slides or in a previous exercise. In fact, you’ll likely work with the same dataset for an entire chapter, so it would be jarring to “start from scratch” in your final exercise.
Focus on the learning objectives you want to assess. If you want to teach a new methodology (such as “Learner will be able to perform k-means clustering” or “Learner will be able to fit a linear regression model” or “Learner will be able to use a CASE statement to clean data”), you probably want to focus on the “middle” step. If you want the learner to think about the results of an analysis (such as “Learner will be able to compare the results of a logistic classifier and a random forest classifier” or “Learner will be able to select an appropriate SQL table schema” or “Learner will be able to define an outlier”), you might want to focus on the visualization/exploration of results.
Consider what tasks are most representative of the chapter. Sometimes, you’ll need to cheat a little bit when making a capstone exercise. Although a wrap-up/visualization of results might be the most appropriate final exercise for a chapter, it doesn’t help DataCamp know where your chapter is headed. In this case, it’s best to build the second-to-last exercise, where you actually perform the critical analysis or computation.
Move some code into your pre-exercise code. Need to load a module? Do it in pre-exercise code. Need to pre-process a dataset? Do that in pre-exercise code. The only code that should be in your sample/solution code is what is necessary for a learner to engage with the target learning objective. Be sure to leave in any code that is necessary for understanding, but if you plan on covering a topic in depth during a previous lesson, you might be able to skip it in the capstone exercise.
At the same time, don’t overdo it on pre-exercise code. The goal is for students to be able to reproduce the skills they learn in the real world. This becomes difficult or even impossible if you’re creating functions on the back end, since students can’t see how you created them or how to create them outside of your course.
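
As a rough illustration of this split, the sketch below separates what might go in pre-exercise code from what stays in the solution code. The dataset, the variable names (`RAW_CSV`, `temps`), and the two-standard-deviation outlier rule are all invented for this example; a real exercise would load an uploaded asset and target your own learning objective.

```python
# --- Pre-exercise code (setup the learner does not need to write) ---
import csv
import io
import statistics

# Hypothetical raw data; a real course would load an uploaded asset instead
RAW_CSV = """city,temp_c
A,4.0
B,5.0
C,6.0
D,5.5
E,4.5
F,5.0
G,30.0
"""

# Pre-processing that is not part of the learning objective
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
temps = [float(row["temp_c"]) for row in rows]

# --- Solution code (only what the learning objective needs) ---
# Calculate the mean and standard deviation of temps
mean_temp = statistics.mean(temps)
sd_temp = statistics.stdev(temps)

# Keep the values more than 2 standard deviations from the mean
outliers = [t for t in temps if abs(t - mean_temp) > 2 * sd_temp]
print(outliers)
```

Note that the setup only imports modules and builds `temps`. It defines no hidden helper functions, so everything the learner calls in the solution code is something they could call outside the course.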

Incorrect use of sequential exercises

A common response to having too much material is to use a sequential exercise. Don’t do that! An entire sequential exercise (all instructions, and all sample code) should not contain any more material than a standard coding exercise. Sequential exercises are just a way of breaking up an exercise when the intermediate results are necessary for following the final instructions.

Each new line of code should build on the previous code, going from one step to the next, so nothing should be rearranged between steps.

The total amount of code (from all steps of a sequential exercise) cannot exceed 15 lines, including the comments.
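
If you want a quick sanity check against this limit, something like the following could be run on a draft. `count_code_lines` is a hypothetical helper, not part of the Teach editor. It assumes that each step is given as the new lines it adds and that blank lines do not count toward the limit; the guideline above does not say either way.

```python
MAX_TOTAL_LINES = 15  # limit stated above, applied to all steps combined


def count_code_lines(steps):
    """Count non-blank lines (comments included) across all steps."""
    return sum(
        1
        for step in steps
        for line in step.splitlines()
        if line.strip()
    )


# Hypothetical three-step sequential exercise; each string holds the
# lines that one step adds to the previous code
steps = [
    "# Calculate the mean of temps\nmean_temp = statistics.mean(temps)",
    "# Calculate the standard deviation of temps\nsd_temp = statistics.stdev(temps)",
    "# Print both values\nprint(mean_temp, sd_temp)",
]

total = count_code_lines(steps)
print(f"{total} lines, within limit: {total <= MAX_TOTAL_LINES}")
```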

Iterative exercises

Iterative exercises are the only ones where bullets shouldn't be added to the instructions. Numbered bullets are added automatically when these exercises are rendered, so adding your own bullets looks really odd. Remember that an exercise can have at most 4 instructions in total, and an iterative exercise can have only 1 instruction per step. If you have more than that, you’ll need to cut some.

Not checking for errors

It’s easy to write code that you think is correct, but that still contains errors. The easiest way to check for errors is to preview every exercise. Although you can’t preview slides, you should always be able to preview and run all exercises. You can do this by clicking the “Preview” button in the lower right corner of the Teach editor.

In addition, you should always check the build status for warnings. You can access the "Builds" page using the menu on the left-hand side.

You can find out what the specific warnings are by clicking on the specific build in the "Builds" page and scrolling through the build log.

Some warnings are stylistic (like when you have too many or too few characters in the Context box). Others are code errors that can result from anything from a typo to a missing requirement. Try your best to clear these up before reaching out to your CM. If the cause isn’t clear, ask your CM for help as soon as possible by opening a GitHub issue in your repository with a screenshot or screencast!

Learning objective mismatch

Before writing a capstone exercise, you should identify one or two learning objectives that you want to assess/practice. This will ensure that your capstone is meaningful and representative of the chapter you are teaching. It will also help you decide what code is superfluous and can be moved to the pre-exercise code. We recommend you use your final lesson learning objective as your guide here, but remember to be as narrow as possible with your learning objective for the exercise while keeping the lesson learning objective in your mind.

Missing motivation

All exercises need some motivation. This usually comes from an interesting dataset and a problem to solve. You can build learner engagement by stating a problem in the Context section of your exercise. It’s likely that the dataset you’re working with has been explained in a previous exercise, but you should still mention which dataset you’re using and why you’re about to give these instructions.

Chapter 1 Capstone Success Message

This is probably the most important success message of your course. It comes at the end of the first chapter, where unregistered students hit a paywall, so you want it to be as encouraging and engaging as possible to entice students to keep going.

Chapter 4 Capstone Success Message

This will be the final exercise of the entire course. Given that our platform is all about learning by doing and students will have learned a lot of code at this point, we strongly suggest finishing with an exercise that is not multiple choice.

Common Style Problems

Context vs. Instructions

Context

Lists any modules, variables, or data that were loaded in the exercise code
Tells a story to motivate the exercise (i.e., what real world problem are we trying to solve)
(optionally) Gives a quick refresher of the material learned in the slides

Instructions

Tells the learner what to do using specific language
Is always a bulleted list
Every instruction should be paraphrased as a comment in the sample code

Aliasing filenames (in Python)

In Python, we often want learners to load data from a file. The file should be uploaded to the Assets section. Once it is uploaded, it will be assigned a long, ugly URL. We probably don’t want learners to use that ugly URL. We can easily alias our file to a shorter filename using the following code:

from urllib.request import urlretrieve
url = 'long/ugly/filename.csv'
urlretrieve(url, 'short_filename.csv')
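
A minimal sketch of how the alias is then used. Because a real asset URL needs network access, this example stands in a temporary local file exposed as a `file://` URL; the file names and the data are made up.

```python
import csv
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

# Stand-in for an uploaded asset: a temporary local file with a long name,
# exposed as a file:// URL (a real asset would have an https:// URL)
workdir = Path(tempfile.mkdtemp())
asset = workdir / "a1b2c3d4e5_long_ugly_name.csv"
asset.write_text("team,goals\nA,3\nB,1\n")
url = asset.as_uri()

# Pre-exercise code: alias the asset to a short filename
short_path = str(workdir / "teams.csv")
urlretrieve(url, short_path)

# Exercise code: the learner only ever works with the short name
with open(short_path, newline="") as f:
    rows = list(csv.DictReader(f))
print(rows)
```

Putting the `urlretrieve` call in pre-exercise code keeps the long URL out of the sample and solution code entirely.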

How much scaffolding?

The easiest way to create sample code is to start by building the solution code, and then subtracting. In general, you want to:

Turn any important functions (and their arguments) into blanks
Leave variable names filled in
Leave code that is not related to the learning objective being assessed filled in (or move it to the pre-exercise code, if possible)

Python Sample Code

# Fit the data into a k-means algorithm
centroids,_ = ____

# Assign cluster labels
df['cluster_labels'], _ = ____

# Display cluster centers 
print(df.groupby('cluster_labels').mean())

# Create a scatter plot through seaborn and assign a different color to each cluster
sns.scatterplot(x=____, y=____, hue=____, data=df)

plt.show()

Python Solution Code

# Fit the data into a k-means algorithm
centroids,_ = kmeans(df, 3)

# Assign cluster labels
df['cluster_labels'], _ = vq(df, centroids)

# Display cluster centers 
print(df.groupby('cluster_labels').mean())

# Create a scatter plot through seaborn and assign a different color to each cluster
sns.scatterplot(x='scaled_def', y='scaled_phy', hue='cluster_labels', data=df)

plt.show()
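
One way to confirm that sample code really is the solution code with pieces removed is to treat each blank as a wildcard and match the sample against the solution. The helper below, `matches_scaffold`, is a hypothetical sketch and not a platform feature. It assumes blanks are runs of three or more underscores and that each blank is filled with text on a single line.

```python
import re


def matches_scaffold(sample: str, solution: str) -> bool:
    """Return True if `solution` equals `sample` with every blank filled in."""
    # Split the sample on its blanks, escape the literal pieces, and let
    # each blank match any non-empty text that stays on one line.
    literal_parts = re.split(r"_{3,}", sample)
    pattern = ".+?".join(re.escape(part) for part in literal_parts)
    return re.fullmatch(pattern, solution) is not None


sample = "sns.scatterplot(x=____, y=____, hue=____, data=df)"
in_sync = "sns.scatterplot(x='scaled_def', y='scaled_phy', hue='cluster_labels', data=df)"
drifted = "sns.scatterplot(x='scaled_def', y='scaled_phy', hue='cluster_labels', data=df_sample)"

print(matches_scaffold(sample, in_sync))  # True: only the blanks differ
print(matches_scaffold(sample, drifted))  # False: the data argument changed
```

A check like this catches cases where the solution was edited after the sample was written, so the filled-in parts no longer agree.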

R Sample Code

# Build the augmented dataframe
algeria_fitted <- ___

# Compare the predicted values with the actual values of life expectancy
algeria_fitted %>% 
  ggplot(aes(x = ___)) +
  geom_point(aes(y = ___)) + 
  geom_line(aes(y = ___), color = "red")

R Solution Code

# Build the augmented dataframe
algeria_fitted <- augment(algeria_model)

# Compare the predicted values with the actual values of life expectancy
algeria_fitted %>% 
  ggplot(aes(x = year)) +
  geom_point(aes(y = life_expectancy)) + 
  geom_line(aes(y = .fitted), color = "red")

SQL Sample Code

SELECT 
    -- Select the team long/short names
    ____,
    ____
FROM ____
WHERE 
    -- Exclude teams NOT IN the subquery
    team_api_id ____ ____
    (____ DISTINCT hometeam_id  ____ match);

SQL Solution Code

SELECT 
    -- Select the team long/short names
    team_long_name,
    team_short_name
FROM team
WHERE 
    -- Exclude teams NOT IN the subquery
    team_api_id NOT IN
    (SELECT DISTINCT hometeam_id FROM match);
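
Before relying on a solution query, it can help to run it against a few rows where you already know the answer. The sketch below does this for the query above using Python's built-in `sqlite3` module. The table layout is inferred from the query and the team rows are invented, so the course's actual database may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team (
        team_api_id INTEGER,
        team_long_name TEXT,
        team_short_name TEXT
    );
    CREATE TABLE "match" (hometeam_id INTEGER);

    -- Hypothetical rows, invented for this check
    INSERT INTO team VALUES
        (1, 'Alpha United', 'ALP'),
        (2, 'Beta City', 'BET'),
        (3, 'Gamma Rovers', 'GAM');
    -- Teams 1 and 3 have played at home; team 2 never has
    INSERT INTO "match" VALUES (1), (1), (3);
""")

solution = """
SELECT
    team_long_name,
    team_short_name
FROM team
WHERE
    team_api_id NOT IN
    (SELECT DISTINCT hometeam_id FROM match);
"""

rows = conn.execute(solution).fetchall()
print(rows)
conn.close()
```

With these rows, only the team that never appears as a home team should come back, which makes it easy to see whether the `NOT IN` filter behaves as the instructions describe.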