PSY6422 Data Management and Visualisation
PSY6422 Data Management and Visualisation is part of the MSc in Psychological Research Methods with Data Science, taught at The University of Sheffield by Tom Stafford. See here for more on the different courses at the University of Sheffield which offer data science.
Spring 2022: These pages are evolving as I teach the course (in person rather than online this year!). If you find any inconsistencies please let me know. Check out the class of 2021 showcase from last academic year.
Psychological science is increasingly reliant on complex computational and statistical methods to make sense of rich behavioural data. This course aims to teach the skills which support creating robust and reproducible analyses with such methods and data.
As well as supporting sophisticated data visualisation, we aim to train you in reproducible workflows - meaning that you can reliably re-create an analysis using scripts that automate every step between raw data and the final visualisation.
As well as being reproducible (by you or other researchers) your work should be legible (to Future You, or other researchers) and scalable (it should work as well on 400,000 data points as on 40).
You will need help to do this. Therefore you will use Open Source solutions - these are analysis products which have a worldwide community of people using them, and the infrastructure which supports sharing advice and solutions.
In practice, this means you are going to start by using R (you could use Python, but this module is based on R).
The curriculum is updated each year, but you can get a general idea of the order in which the topics are covered from the left sidebar. By the end of this course you will have:
- Been trained in data project management – including fundamentals of data storage, synchronisation and sharing – and the importance of reproducible workflows
- Used the statistical programming language R, and RStudio, for data management, analysis and visualisation
- Been introduced to fundamental programming concepts
- Prepared data project documentation using RMarkdown
- Had an introduction to version control using git
- Published data projects to the web via github pages
There is also the opportunity to cover advanced topics, either in class or as part of your project. These could include:
- Interactive visualisation with Shiny apps
- Animated / roll-over visualisation
You may particularly enjoy the Reading list
The bulk of the course assessment is to conduct and publish your own analysis project. By doing this you will gain experience of combining all the skills taught on the course within a single project. This will take a data visualisation from start to finish - from raw data, through data cleaning and documentation, to sharing your code and the resulting visualisation on the web.
The intention with the assessment is to ensure that every student leaves the course with something they are proud to put in their portfolio of work, something which shows what they can do and which helps with future job or course applications.
Includes slides and other resources, as well as these specific documents
FAQ document which I am adding to as questions come in
Most information is on these pages (hosted on github, no login required)