All Collections
Projects
Creating Your Project
DataLab project Instructions: Best Practice
DataLab project Instructions: Best Practice
George Boorman avatar
Written by George Boorman
Updated yesterday

The Instructions outline the challenge the learner needs to complete and provide details about the format that the answer needs to be provided in, i.e., the name of the variable(s), the number (and names) of columns, data type(s), and any quality restrictions such as missing values. When writing this section, think about the minimum information that learners will require in order to satisfy the SCTs (see this article for more information) and successfully complete the project.

It should focus on information to understand what is expected of them along with the answer format. It does not include information about how to complete the project (this should only be included in the Guides and Resources.

The vision is that learners who have completed the prerequisite course(s) or developed equivalent knowledge and skills elsewhere will be capable of completing the project based solely on reading the Context and Instructions, rather than relying on other sections such as Guides or Resources. The approach of their solution may not align with the Solution Code developed, but as long as the Submission Correctness Tests (SCTs) are met (put another way, the quality criteria for the output(s) are satisfied) then the attempt would be considered successful and the learner deemed to have demonstrated the Knowledge, Skills, and Abilities (KSAs) for the project.

Good ideas

Write 1-4 sentences explicitly detailing what learners must achieve. This should not detail how they complete the project.

Be specific about output/variable requirements. If learners must produce a string variable called peak_performance then let them know this information, otherwise they might not pass the project's Submission Correctness Tests (SCTs). Likewise, if they must create a pandas DataFrame with specific columns then this information needs to be included.

Use one Instruction, in bullet point format, for each key output that learners will need to produce.

Limit to five Instructions. The best practice is one to three Instructions. If you need more than five, then the project is likely too complex or too large, or you are being too prescriptive in the Instructions.

Utilize markdown syntax. For example, if you are referring to a variable, use backticks to render the variable's name as code so it is clear to learners that this is part of the syntax they will need to use.

Common problems and their solutions

Being too vague. While instructions should allow learners to interpret what is needed and action in their own way, it is important to be specific about what the outcome should be in terms of outputs and performance requirements. For example, rather than saying “Build a model to predict Y and evaluate the model’s performance” you should say “Build a model, stored as classification_model, to predict the target Y with an f1 score of at least 0.75.”.

Copy and paste. While it might be appropriate to include certain syntax, such as referring to a column name, you should avoid providing so much detail that learners can copy the instructions to execute the code required.

Example Python Project Instructions

The following are Instructions from Analyzing Crime in Los Angeles:

The LAPD has asked you to help them by finding answers to the following questions:

  • Which hour has the highest frequency of crimes? Store as an integer variable called peak_crime_hour.

  • Which area has the largest frequency of night crimes (crimes committed between 10pm and 4am)? Save as a string variable called peak_night_crime_location.

  • Identify the number of crimes committed against victims by age group (<18, 18-25, 26-34, 35-44, 45-54, 55-64, 65+). Save as a pandas Series called victim_ages.

Example SQL project Instructions

The following are Instructions from Exploring London's Travel Network:

Write three SQL queries to answer the following questions:

  • What are the most popular transport types, measured by the total number of journeys? The output should contain two columns, 1) JOURNEY_TYPE and 2) TOTAL_JOURNEYS_MILLIONS, and be sorted by the second column in descending order. Save the query as most_popular_transport_types.

  • Which five months and years were the most popular for the Emirates Airline? Return an output containing MONTH, YEAR, and JOURNEYS_MILLIONS, with the latter rounded to two decimal places and aliased as ROUNDED_JOURNEYS_MILLIONS. Exclude null values and save the result as emirates_airline_popularity.

  • Find the five years with the lowest volume of Underground & DLR journeys, saving as least_popular_years_tube. The results should contain the columns YEAR, JOURNEY_TYPE, and TOTAL_JOURNEYS_MILLIONS.

Did this answer your question?