How to author Solution Code for DataLab projects
How to author Solution Code for DataLab projects
Written by George Boorman
Updated over a week ago

The Solution Code provides the end-to-end code used to solve the problem. It should conform to coding best practices and use methods/functions/techniques that learners will be familiar with if they have completed the prerequisite course(s). It should also include code comments throughout so that learners can see why this specific approach has been taken.

# R/Python

For R/Python projects the Solution Code will be a single `.R` or `.py` file that learners can copy and paste into the Workbook. Here is an example Solution Code for the Analyzing Crime in Los Angeles Python project:

`# Import required librariesimport pandas as pdimport numpy as npimport matplotlib.pyplot as pltimport seaborn as sns# Read in and preview the datasetcrimes = pd.read_csv("crimes.csv", parse_dates=["Date Rptd", "DATE OCC"], dtype={"TIME OCC": str})crimes.head()## Which hour has the highest frequency of crimes? Store as an integer variable called peak_crime_hour# Extract the first two digits from "TIME OCC", representing the hour,# and convert to integer data typecrimes["HOUR OCC"] = crimes["TIME OCC"].str[:2].astype(int)# Preview the DataFrame to confirm the new column is correctcrimes.head()# Produce a countplot to find the largest frequency of crimes by hoursns.countplot(data=crimes, x="HOUR OCC")plt.show()# Midday has the largest volume of crimepeak_crime_hour = 12## Which area has the largest frequency of night crimes (crimes committed between 10pm and 4am)? ## Save as a string variable called peak_night_crime_location# Filter for the night-time hoursnight_time = crimes[crimes["HOUR OCC"].isin([10,11,12,1,2,3,4])]# Group by "AREA NAME" and count occurrences, filtering for the largest value and saving the "AREA NAME"peak_night_crime_location = night_time.groupby("AREA NAME",                                                as_index=False)["HOUR OCC"].count().sort_values("HOUR OCC", ascending=False).iloc[0]["AREA NAME"]                                                                                     # Print the peak night crime locationprint(peak_night_crime_location)## Identify the number of crimes committed against victims by age group (<18, 18-25, 26-34, 35-44, 45-54, 55-64, 65+) ## Save as a pandas Series called victim_ages# Create bins and labels for victim age rangesage_bins = [0, 17, 25, 34, 44, 54, 64, np.inf]age_labels = ["<18", "18-25", "26-34", "35-44", "45-54", "55-64", "65+"]# Add a new column using pd.cut() to bin values into discrete intervalscrimes["Age Bracket"] = pd.cut(crimes["Vict Age"],                               bins=age_bins,                               labels=age_labels)# Find the category with the largest frequencyvictim_ages = crimes["Age Bracket"].value_counts()print(victim_ages)`

# SQL

For PostgreSQL, Snowflake, Redshift, or BigQuery projects the Solution Code will be a single `.SQL` file. As SQL projects generally contain more than one SQL code cell, each query within the Solution Code must be separated by a semi-colon `;`. Here is example of Solution Code for the Exploring London's Travel Network Snowflake project:

`-- most_popular_transport_types-- Copy code into first SQL cellSELECT JOURNEY_TYPE, 	SUM(JOURNEYS_MILLIONS) as TOTAL_JOURNEYS_MILLIONSFROM TFL.JOURNEYSGROUP BY JOURNEY_TYPEORDER BY TOTAL_JOURNEYS_MILLIONS DESC;-- emirates_airline_popularity-- Copy code into second SQL cellSELECT MONTH, 	YEAR, 	ROUND(JOURNEYS_MILLIONS,2) AS ROUNDED_JOURNEYS_MILLIONSFROM TFL.JOURNEYSWHERE JOURNEY_TYPE = 'Emirates Airline' AND ROUNDED_JOURNEYS_MILLIONS IS NOT NULLORDER BY ROUNDED_JOURNEYS_MILLIONS DESCLIMIT 5;-- least_popular_years_tube-- Copy code into third SQL cellSELECT YEAR,	JOURNEY_TYPE,	SUM(JOURNEYS_MILLIONS) as TOTAL_JOURNEYS_MILLIONSFROM TFL.JOURNEYSWHERE JOURNEY_TYPE LIKE '%Underground%'GROUP BY YEAR, JOURNEY_TYPEORDER BY TOTAL_JOURNEYS_MILLIONSLIMIT 5;`