Interpreting Code Diffs

Learn how to interpret learner submissions.

Written by Amy Peterson
Updated over a week ago

The Content Dashboard shows you the "diffs", or differences, between the code that the learner submitted and the code in the expected solution. Diffs are most useful once the exercise-level quality metrics have pointed you to an exercise with a high "ask hint" or "ask solution" metric.

Accessing code diffs

In the Incorrect Attempts tool, each row represents an incorrect submission made by one or more learners. Click "View" to see the diff.

The "Incorrect Attempts" tool on the "View Courses" pane of the Content Dashboard. Each row has a link named "View" to access the code diff tool.

You can also see diffs in the Issues and "Feedback (SCT/Hint)" tools. If you access diffs from those tools, you may see examples where the learner's submission was marked as correct.

Example 1: basic usage

Here's an example diff from an exercise in Intermediate Python.

An example code diff from Intermediate Python. There are two lines different in the learner submission compared to the solution.
  • White lines are the same in both the learner submission and the solution.

  • Red lines are in the solution but not the learner submission.

  • Green lines are in the learner submission but not the solution.

In line 8, the learner didn't put spaces around the multiplication operator (*). This difference would not have caused the submission to be marked as incorrect, and can be ignored.

In line 11 of the solution (line 10 of the learner submission), the learner passed the wrong value to the s argument of plt.scatter(). This would have caused an incorrect submission.

To follow up on this, the instructor should check the corresponding instruction and hint to make sure that it is clear which variable should be used at this point in the exercise.
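The color convention above maps onto the "-" and "+" markers of an ordinary text diff. The following is a minimal sketch using Python's standard difflib module, not the Content Dashboard's own implementation. The two code snippets are hypothetical stand-ins for the solution and the learner submission, and are compared as text only, never executed.

```python
import difflib

# Hypothetical solution and learner submission (illustrative only).
solution = """\
np_pop = np_pop * 2
plt.scatter(gdp_cap, life_exp, s = np_pop)
plt.show()
"""

submission = """\
np_pop = np_pop*2
plt.scatter(gdp_cap, life_exp, s = pop)
plt.show()
"""


def diff_lines(solution_code, submission_code):
    """Return a unified diff of the two snippets as a list of lines."""
    return list(
        difflib.unified_diff(
            solution_code.splitlines(),
            submission_code.splitlines(),
            fromfile="solution",
            tofile="submission",
            lineterm="",
        )
    )


lines = diff_lines(solution, submission)
for line in lines:
    # "-" lines are only in the solution (red in the dashboard),
    # "+" lines are only in the submission (green),
    # lines starting with a space are the same in both (white).
    print(line)
```

Both changed lines are flagged in the same way, even though only the second one changes what the code does. Deciding which flagged lines matter is still a job for the reviewer.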

Important takeaways

  • Not every difference causes the learner's answer to be counted as incorrect.

  • Check that the instructions and hints are clear.

Example 2: case sensitivity

Here's an example from Introduction to SQL.

An example code diff from Introduction to SQL. The whole of the learner submission appears to be different to the solution due to differences in case for SQL keywords.

Diffs are case sensitive. This is necessary for case-sensitive languages like Python and R, but SQL is mostly case insensitive, so be wary of things that look like big differences but do not affect the correctness of the code.

In this case, the case of the keywords SELECT, FROM and WHERE would not affect the correctness of the submission.

The only problem with the learner's submission is the missing clause about release_year on line 3. To follow up, check the instructions and hints to make sure they clearly specify how to filter the rows of the table.
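When a SQL diff is dominated by keyword casing, it can help to normalize the case yourself before comparing. The sketch below is one way to do that in Python, under stated assumptions: the two queries are hypothetical, the keyword list is deliberately partial, and the regular expression is naive (it would also uppercase a keyword that appears inside a string literal).

```python
import difflib
import re

# Partial keyword list, chosen for this illustration only.
SQL_KEYWORDS = ("select", "from", "where", "and", "or")
KEYWORD_PATTERN = re.compile(
    r"\b(" + "|".join(SQL_KEYWORDS) + r")\b", flags=re.IGNORECASE
)


def normalize_keywords(sql):
    """Uppercase the known SQL keywords so casing no longer shows up in a diff."""
    return KEYWORD_PATTERN.sub(lambda match: match.group(0).upper(), sql)


def changed_lines(solution_sql, submission_sql):
    """Return only the lines that differ ('- ' solution only, '+ ' submission only)."""
    diff = difflib.ndiff(solution_sql.splitlines(), submission_sql.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical queries: the submission uses lowercase keywords and
# omits the filter on release_year.
solution = "SELECT title\nFROM films\nWHERE release_year > 2000"
submission = "select title\nfrom films"

raw = changed_lines(solution, submission)
normalized = changed_lines(
    normalize_keywords(solution), normalize_keywords(submission)
)

print(len(raw), "changed lines before normalizing")
print(normalized)
```

After normalizing, the only remaining difference is the missing WHERE clause, which is the part worth following up on.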

Important takeaways

  • Differences in case can appear to create large differences from the solution that are actually unimportant.

Example 3: language-specific issues and interconnected exercises

Here's an example from Introduction to PySpark.

An example code diff from Introduction to PySpark. Two lines are different in the learner submission compared to the solution.

On line 2, notice that the learner used single quotes rather than double quotes. In Python, either type of quote can be used to write a string literal, so this change should not affect the correctness of the solution. In other languages it could be a genuine problem: in many SQL dialects, for example, single quotes delimit strings while double quotes delimit identifiers.

On line 5, the learner passed the wrong column of the flights data frame to the .sum() method, which is the cause of the problem. Again, the follow-up task is to check the instructions and hints to make sure that the exercise is clear.
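For Python exercises, one way to separate a cosmetic difference like the one on line 2 from a substantive one like the one on line 5 is to compare the parsed syntax trees of the two lines. This is a minimal sketch using the standard ast module. The PySpark-style lines and column names are hypothetical, and the code is only parsed, never run. It says nothing about how the SCTs themselves judge a submission.

```python
import ast


def same_python_syntax(code_a, code_b):
    """Return True if both snippets parse to the same syntax tree.

    Quote style and spacing are not part of the tree, so they are ignored.
    """
    return ast.dump(ast.parse(code_a)) == ast.dump(ast.parse(code_b))


# Hypothetical lines, for illustration only.
quotes_only = same_python_syntax(
    'by_origin = flights.groupBy("origin")',
    "by_origin = flights.groupBy('origin')",
)
spacing_only = same_python_syntax("total = price * qty", "total = price*qty")
different_column = same_python_syntax(
    'by_origin.sum("air_time")',
    'by_origin.sum("distance")',
)

print(quotes_only, spacing_only, different_column)
```

The first two comparisons report identical trees, while the third reports a difference because the string contents are not the same.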

It may also be useful to go back to earlier exercises where the dataset is described to make sure that learners understand what is contained in each column, or even to rename columns in the dataset to improve clarity.

Important takeaways

  • Pay attention to small changes like punctuation.

  • Whether a highlighted change matters depends on the programming language of the exercise.

  • Diffs may reveal problems with earlier exercises or with datasets.

Example 4: incorrect rejections

Here's another example from Introduction to SQL.

An example code diff from Introduction to SQL. The learner submission has differences in spacing, in casing, and in the calculation of an average.

This example was marked as incorrect, but appears to give the correct answer.

When correct submissions are marked as incorrect, there is often a problem with the submission correctness tests (SCTs). The Content team can rectify the problem, but before you contact them, you need to do some digging to determine the cause of the problem.

To diagnose the issue, open the exercise and try running variations on the code, either in Teach Preview or in Campus.

Your first guess might be that the problem is with the difference in spacing. Try running a version of the code with the same spacing as the solution, but the rest of the code the same as the learner submission. (This will pass.)

Your second guess might be that the problem is with the difference in case for the SQL keywords. Try running a version of the code with the same casing as the solution, but the rest of the code the same as the learner submission. (This will pass too.)

Your third guess might be that it involves the way the average is calculated. Try swapping the order of the division and the call to AVG(). This time, the result differs very slightly from the solution's result because of floating-point rounding, and that tiny difference causes the rejection.
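The rounding effect behind the third guess can be reproduced outside SQL. The following is a minimal Python sketch, assuming a hypothetical column of two values and a division by 10; the actual query from the exercise is not reproduced here. It shows that averaging first and dividing first are mathematically equal but can produce floats that are not exactly equal, and that a tolerance-based comparison treats them as the same.

```python
import math


def mean(values):
    """Plain left-to-right average, similar in spirit to SQL's AVG()."""
    total = 0.0
    for value in values:
        total += value
    return total / len(values)


# Hypothetical column values, for illustration only.
ratings = [1.0, 2.0]

# Like AVG(rating) / 10: average first, then divide.
avg_then_divide = mean(ratings) / 10

# Like AVG(rating / 10): divide every row first, then average.
divide_then_avg = mean([rating / 10 for rating in ratings])

print(avg_then_divide == divide_then_avg)              # exact comparison
print(math.isclose(avg_then_divide, divide_then_avg))  # comparison with a tolerance
```

If an exact comparison is what rejects the learner's query, that is useful detail to include when you report the problem with the SCTs.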

Important takeaways

  • If a learner submission appears to be incorrectly rejected, try to diagnose the cause of the rejection, then contact @datacamp-contentquality over GitHub to request changes to the SCTs.
