This article lays out some guidelines on writing a good SCT that allows for different ways of solving the problem but gives targeted and actionable feedback in case of a mistake. Before diving in, I suggest you read this article for a basic introduction to SCTs, and this one for a deeper dive into how SCTs work behind the scenes.

Inspecting results > inspecting output > inspecting actual code

The SCT packages (testwhat, pythonwhat, sqlwhat, shellwhat, and sheetwhat) have access to three pieces of information: the student process, the output the student generated, and the code they wrote, plus their solution counterparts. When you check the student process for the correct variables, or check whether an expression evaluates correctly in that process, you verify _whether_ students got to the right result, not _how_ they got there. This makes for more robust SCTs. Code-based checks, on the other hand, are more restrictive: they expect the student to type something specific and leave little room for alternative solutions. More specifically:

  • Verify the contents of an object rather than the code to define that object.
  • Verify the output the student generated rather than the code that produced that printout.
  • Verify the result of calling a function rather than the arguments used to call that function.
  • Etc. (see the sketch below for a concrete example).
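
To make this concrete, here is a minimal sketch in pythonwhat syntax, assuming a hypothetical exercise whose solution is `result = np.mean(temperatures)` and which prints that result; the variable names and expected output are made up for illustration.

```python
# Result-based check (preferred): verify the object in the student process,
# no matter how the student computed it.
Ex().check_object('result').has_equal_value()

# Output-based check: verify what was printed, not the exact print() call.
Ex().has_output(r'42\.5')

# Code-based check (most restrictive): only worth it if typing this exact
# code is the learning goal.
Ex().has_code(r'np\.mean\(', pattern=True)
```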

Use check_correct() whenever it makes sense

The seemingly opposite requirements of robustness to different solutions versus targeted feedback can be satisfied by using check_correct(). This function takes two sets of tests: 'checking' tests, and 'diagnosing' tests. Checking tests verify the end result, while diagnosing tests dive deeper to look at the mistakes a student made. If the checking tests pass, the typically more restrictive diagnosing tests are not executed. If the checking tests fail, the diagnosing tests are executed, which will give more detailed feedback. This allows the student flexibility in coding up the end result, but also gives specific feedback when a mistake is made. You typically want your checking tests to be process-based checks, while your diagnosing tests can be more code-based.
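
As an illustration, here is a sketch of what this could look like in pythonwhat, again assuming the hypothetical solution `result = np.mean(temperatures)`.

```python
Ex().check_correct(
    # Checking test: is the end result in the student process correct?
    check_object('result').has_equal_value(),
    # Diagnosing test: only runs if the check above fails; it zooms in on
    # the np.mean() call to explain what went wrong.
    check_function('numpy.mean').check_args('a').has_equal_value()
)
```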

Be frugal

Every check you add is another hurdle that can prevent a student's submission from passing. For every SCT function you add, ask yourself whether the check is really needed and whether it tests what the exercise asks of the student. Does it really matter if they do that printout, or is it a nice-to-have? Is it absolutely necessary that they specify the exact same plot title, or is specifying a title argument enough? Do you really have to check the existence of an intermediate variable if the end result is okay?
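
For example, for a hypothetical exercise that asks for a plot with a title, a frugal SCT could check only that a label argument was passed to `plt.title()`, rather than requiring its exact value:

```python
# Too strict: the student must match the solution's title word for word.
Ex().check_function('matplotlib.pyplot.title').check_args('label').has_equal_value()

# Usually enough: the student specified *a* title.
Ex().check_function('matplotlib.pyplot.title').check_args('label')
```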

Depend on automatic feedback messages as much as you can

The automatically generated messages know what the solution is, know exactly what the student did, and can therefore give tailored feedback. We are constantly working to improve these feedback mechanisms. If you do not specify custom messages, your course's SCTs automatically leverage the generated ones. Custom feedback messages make sense when you want to give a very specific hint related to the course content or point to a common mistake that many students make.
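
In pythonwhat, for instance, `has_equal_value()` accepts an `incorrect_msg` argument; the hint text below is a made-up example of the kind of course-specific pointer that justifies overriding the automatic message.

```python
# Default: rely on the automatically generated feedback message.
Ex().check_object('result').has_equal_value()

# Custom message: only when you have a genuinely specific hint to give.
Ex().check_object('result').has_equal_value(
    incorrect_msg="Have another look at the `temperatures` array: it contains "
                  "missing values, so consider `np.nanmean()` instead of `np.mean()`."
)
```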

Think like the student: what will be the most common mistakes they make?

You can shape your SCT differently, or reach for different SCT functions, depending on the mistakes you expect students to make.
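
For instance, if you expect students to forget the `axis` argument in a hypothetical column-means exercise, you could target exactly that argument in the diagnosing part of `check_correct()` (names and message below are illustrative):

```python
# Hypothetical solution: col_means = np.mean(measurements, axis=0).
# A common slip is forgetting axis=0.
Ex().check_correct(
    check_object('col_means').has_equal_value(),
    check_function('numpy.mean').check_args('axis').has_equal_value(
        incorrect_msg="Did you set `axis=0` so you get one mean per column?"
    )
)
```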

Follow common style to make your SCTs readable and self-documenting

When written well and formatted properly, SCTs are succinct and easy to read; they become self-documenting testing code. In R, use the %>% syntax to chain together your SCT calls. In Python, chain SCT functions with . and use multi() to make your intentions clear. Extra comments can make sense when chopping things up for a large exercise or when you want to explain a workaround for a corner case, but every comment you write can also become outdated, which can make things confusing.
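
In pythonwhat, for example, spreading a `multi()` call and its chained sub-SCTs over several lines keeps the intent visible at a glance (the names below are hypothetical):

```python
# One sub-SCT per line: the SCT reads like a description of what is checked.
Ex().multi(
    check_object('scores').has_equal_value(),
    check_function('numpy.mean')
        .check_args('a')
        .has_equal_value()
)
```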
