Module Overview

PSY6422 Data Management and Visualisation is part of the MSc in Psychological Research Methods with Data Science taught at The University of Sheffield by Tom Stafford. See here for more on the different courses at the University of Sheffield which offer data science.

Spring 2024: Updates for this semester are coming soon. If you find any inconsistencies please let me know. Check out the showcases from previous years: class of 2021 showcase, class of 2022 showcase, class of 2023 showcase.

0.1 Motivation

Psychological science increasingly relies on complex computational and statistical methods to make sense of rich behavioural data. This course aims to teach the skills needed to create robust and reproducible analyses with such methods and data.

As well as teaching sophisticated data visualisation, we aim to train you in reproducible workflows: you should be able to reliably re-create every step of an analysis using scripts that automate the whole path from raw data to the final visualisation.

Your work should also be legible (to Future You, or other researchers), not just reproducible (by you or other researchers), and scalable (it should work as well on 400,000 data points as on 40).

You will need help to do this, so you will use Open Source solutions: analysis software with a worldwide community of users, and the infrastructure that supports sharing advice and solutions.

In practice, this means you are going to start by using R (you could use Python, but this module is based on R).

0.2 Course Aims

The curriculum is updated each year, but the left sidebar gives a general idea of the order in which topics are covered. By the end of this course you will have:

  • Been trained in data project management – including fundamentals of data storage, synchronisation and sharing – and the importance of reproducible workflows
  • Used the statistical programming language R, and RStudio, for data management, analysis and visualisation
  • Been introduced to fundamental programming concepts
  • Prepared data project documentation using RMarkdown
  • Had an introduction to version control using git
  • Published data projects to the web via GitHub Pages

There is also the opportunity to cover advanced topics, either in class or as part of your project. These could include:

  • Interactive visualisation with Shiny apps
  • SQL
  • Web scraping
  • Animated / roll-over visualisation

You may particularly enjoy the Reading list.

0.2.1 The Module mini-project

The bulk of the course assessment is to conduct and publish your own analysis project. This gives you experience of combining all the skills taught on the course within a single project. The project takes a data visualisation from start to finish: from raw data, through data cleaning and documentation, to sharing your code and the resulting visualisation on the web.

The intention with the assessment is to ensure that every student leaves the course with something they are proud to put in their portfolio of work, something which shows what they can do and which helps with future job or course applications.

See here for more on the nature of your module project, and see here for examples from previous years: class of 2020, class of 2021, class of 2022.

0.3 Resources for current students

Google Drive:

Includes slides and other resources, as well as these specific documents:

  • FAQ document, which I add to as questions come in

Most information is on these pages (hosted on GitHub, no login required).