The first ten topics of this course represent the core that I want every student to take away from the course. With these topics you should be equipped to focus on your personal data project, which makes up the majority of the grade for the course.
The remaining time in the timetable is available for supervision on this data project, and to cover more advanced topics. For these topics, rather than provide a lecture, we will work in small groups to follow exercises and tutorials which exist outside of the course. The motivation for this is two-fold. First, because all your future work will be teamwork, I want students to graduate from this course with experience working together on technical projects. Whether you are more or less confident in the technical requirements, you will learn a lot from trying to share what you know, or think you know, with a group. Second, a key skill for ongoing development in data science is the ability to teach yourself. The world is rich in useful advice, tutorials and examples. Only by discovering how you can use these will you maximise your potential after you have finished this module.
In 2021 we voted on the advanced topics to cover in class:
- regexone.com: Learn regular expressions with simple, interactive exercises.
- Performance tuning: Code performance in R: Parallelization
- Making Your First R Package
- Package management
- Reproducible workflows
- Targets (was Drake)
- e.g. docker
- Strand, J. F. (2021, March 31). Error Tight: Exercises for Lab Groups to Prevent Research Mistakes. https://doi.org/10.31234/osf.io/rsn5y
You should use a password manager. Really. LastPass is recommended.
You should know how to use a VPN.
- Advice for Sheffield students here: Working remotely - information for students
- You can test by visiting this page. If you are on the University network (either on campus or via a VPN) it will offer you the chance to download the article PDF. Otherwise it will try to charge you $35 for the privilege.