D Class of 2020
This was the first year for the new MSc in Psychological Research Methods with Data Science. PSY6422 is a key module for this course, and is designed to set students up for their independent data science project, completed over the summer.
Before they work on their summer projects, MSc students complete a mini-project on this module, and these mini-projects are showcased below.
Most students came into this module with little or no previous experience of coding. The aim is to share powerful, flexible, computational methods which allow the production of reproducible analyses and useful data visualisations.
The module is taught via workshops, and the end point is to carry out and publish a data analysis and visualisation on a data set which interests you.
The class of 2020 was impacted by the coronavirus pandemic, which led to some classes being cancelled and the rest being moved online mid-module.
Here are just a few examples of the data visualisations produced by the class for their data mini-projects. Each project was different, and each student published their data, analysis code and visualisation as an online notebook (so please click the link for more details of each project).
Hala took open data from Sheffield City Council and showed daily cycles in air pollution levels.
Note the spikes in NO2 levels during the morning and evening rush hours.
Nabiha looked at Students’ Performance in Exams
Note that these boxplots with jittered points allow you to see the group average but also the underlying data.
Rachel looked at COVID-19 Deaths in England and produced a number of animated and interactive plots which really show off the data visualisation potential of R/ggplot, including this shocking plot of regional rates of death against deprivation, which shows that the death rate in the most deprived areas was twice the rate in the least deprived areas. Hover over each bar for details.
Yidan’s project mixed Python and R and involved some highly technical work installing and querying an SQL database server. The topic, Skill acquisition of solo players: a case study, revealed how practice mapped to skill level in players of an online strategy game:
This is the famous learning curve, showing that players improved with practice (but at a decelerating rate).
My unofficial prize for aesthetics in data visualisation went to Katie’s project, avocado prices, sales & distribution across the US between 2015-2018, which used a consistent, avocado-themed colour scheme throughout:
Adam’s project is a good illustration of the power of data visualisation: it can make salient something we might know but had not fully reckoned with. His project was on Changes in average house prices and incomes in England 1997-2019:
His graph of the change in the ratio of average house price to average income shows just how unaffordable housing has become.
Several projects looked at topics which interested the students and for which public data was available. For example, Victoria looked at the World Happiness Report 2019 data, (another) Katie looked at Caffeine Intake and Mental Health, and Eleanor looked at Professional CS:GO Matches and in particular at the performance of a professional team based in Sheffield.
Other projects used open data from published research, replicating and extending the analyses the original authors reported (Bessie, Esra, Sam, Leah).
These are just the projects which were completed and published publicly by students on the module this year (and who gave me permission to link to their work). I am immensely proud of all the work students did on this module, which was severely disrupted by the pandemic. However far you get with data analysis and visualisation there is always more you can learn, which can be daunting, but it can also be exciting because there are new things to try, new data to look at and new things to find out.
Tom Stafford, July 2020
D.1 Feedback
Selected feedback from the class of 2020