Module Overview

PSY6422 Data Management and Visualisation is part of the MSc in Psychological Research Methods with Data Science taught at The University of Sheffield by Tom Stafford. See here for more on data science @ Sheffield

These are placeholder pages. In 2020 most of this material was delivered offline. I am adding notes online as I can, so these pages in particular may evolve quickly.

0.1 Motivation

Psychological science increasingly relies on complex computational and statistical methods to make sense of rich behavioural data. This course aims to teach the skills that support robust and reproducible analyses with such methods and data.

As well as supporting sophisticated data visualisation, we aim to train you in reproducible workflows. This means you can reliably re-create every step of an analysis using scripts that automate the whole path from raw data to the final visualisation.

As well as being reproducible (by you or other researchers) your work should be legible (to Future You, or other researchers) and scalable (it should work as well on 400,000 data points as on 40).

You will need help to do this, so you will use open-source solutions: analysis tools that have a worldwide community of users, along with the infrastructure that supports sharing advice and solutions.

In practice, this means you are going to start by using R (you could use Python, but this module is based on R).

0.2 Course Aims

By the end of this course you will have:

  • Been trained in data project management (including the fundamentals of data storage, synchronisation and sharing) and in the importance of reproducible workflows
  • Used the statistical programming language R, and RStudio, for data management, analysis and visualisation
  • Been introduced to fundamental programming concepts
  • Prepared data project documentation using R Markdown
  • Had an introduction to version control using git
  • Published data projects to the web via GitHub Pages

0.2.1 Slides

Slides from this class are on the Google Drive: slides format, PDF format

0.2.2 The Final project

The bulk of the course assessment is to conduct and publish your own analysis project. This gives you experience of combining all the skills taught on the course within a single project. The project takes a data visualisation from start to finish: from raw data, through data cleaning and documentation, to sharing your code and the resulting visualisation on the web.

See here for more on the nature of your final project

0.3 Resources for current students

Google Drive (UoS login required to access):

Includes slides and workbooks, as well as these specific documents

And of course these pages (hosted on GitHub, no login required)

You may particularly enjoy the Reading list

0.4 Course Outline

In 2020 we are covering a compressed curriculum. You can see the class topics in the left sidebar.

0.4.1 Stretch goals:

Unfortunately we won’t have time this year for a number of advanced topics I would like to cover. Hopefully we will reach them next year:

  • Jupyter notebooks
  • The terminal / ssh
  • Interactive visualisation with Shiny apps
  • SQL

0.5 More