This course teaches general principles of coding and computation, and specific skills for data management and visualisation in R. Lots of people in the data science world, particularly in areas which align with computer science / machine learning, use Python.
I decided to teach this course in R because the community around a language is as important as the language itself. You can do anything in any language, and most things as easily in Python as in R, but psychologists, biologists and scholars from across the social sciences tend to use R.
In my very prejudiced opinion, there are three reasons to prefer Python to R. These are:
- Conda, for package, dependency and environment management
- Pandas, the data wrangling toolkit
- Certain toolkits,
I’d probably also do anything involving talking to hardware, an API, web-scraping or bulk text processing in Python, but this may just be prejudice. Python is a good interstitial language.
To showcase Python, we’ll try to reproduce an analysis by Oli Hawkins, which shows off a support-vector machine classification of penguin data.
Before the class, please:
- Install Python. Note that Python underwent a major update from version 2 to version 3, so make sure you install version 3 or later.
  - You may already have installed Python and conda when you installed R, if you did this via the Anaconda distribution.
- Clone the penguin-model repo.
- Create a conda environment, following the instructions in the repo README.md:
conda env create -f environment.yml
We will use the Spyder IDE, which can be installed with conda:
conda install spyder
I will showcase some features of Python, focusing on things which are the same as in R or frustratingly different from it:
- for loops
- zero indexing
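As a taste of both points, here is a minimal sketch of a for loop and zero indexing. The list of species names is made up for illustration, and the comments flag where R behaves differently.

```python
# A small list to iterate over (illustrative values only)
species = ["Adelie", "Chinstrap", "Gentoo"]

# Zero indexing: the first element is at position 0, not 1 as in R
first = species[0]

# Negative indices count from the end; in R, x[-1] would instead drop the first element
last = species[-1]

# Slices include the start index and exclude the stop index
first_two = species[0:2]

# A for loop: the body is marked by indentation, not by curly braces
lengths = []
for name in species:
    lengths.append(len(name))

# enumerate() gives the zero-based position alongside each item
labelled = [f"{i}: {name}" for i, name in enumerate(species)]

# range(n) runs from 0 to n - 1, unlike 1:n in R
indices = list(range(len(species)))

print(first, last, first_two, lengths, labelled, indices)
```

The habit most likely to trip up R users is that `species[1]` is the second element, and that `range(3)` never reaches 3.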
If you can’t manage to install Python, Google Colaboratory is useful: it runs Python notebooks in the browser.
We will do some basic data wrangling and plotting with pandas and seaborn:
conda install seaborn
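To give a flavour of pandas wrangling, here is a small sketch with rough dplyr/tidyr equivalents noted in the comments. The data frame is invented for illustration (it is not the real penguin data), and the column names are assumptions rather than the ones used in the repo.

```python
import pandas as pd

# Made-up measurements for illustration; not the real penguin data
penguins = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"],
    "bill_length_mm": [39.1, None, 46.1, 50.0, 46.5],
    "body_mass_g": [3750, 3800, 4500, 5700, 3500],
})

# Drop rows with missing values (roughly tidyr::drop_na() in R)
complete = penguins.dropna()

# Filter rows with a boolean mask (roughly dplyr::filter())
heavy = complete[complete["body_mass_g"] > 4000]

# Group and summarise (roughly group_by() followed by summarise())
mean_mass = complete.groupby("species")["body_mass_g"].mean()

print(mean_mass)
```

With seaborn installed, a data frame like `complete` can be passed straight to a plotting function, for example `seaborn.scatterplot(data=complete, x="bill_length_mm", y="body_mass_g", hue="species")`.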
We will run the models and explore the data and code, focusing on hacking a complex, existing project.
Oli strongly recommends Aurélien Géron’s 2019 book Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow.
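For orientation before opening the repo, this is a minimal sketch of what fitting a support-vector machine classifier looks like in scikit-learn. It is not Oli Hawkins’s code: the data are synthetic, and the two features, the linear kernel and the 75/25 split are all assumptions chosen to keep the example short.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class data (assumption: two well-separated clusters,
# loosely shaped like a length in mm and a mass in g)
rng = np.random.default_rng(0)
n = 100
class_a = rng.normal(loc=[40.0, 3700.0], scale=[2.0, 300.0], size=(n, 2))
class_b = rng.normal(loc=[48.0, 5100.0], scale=[2.0, 300.0], size=(n, 2))
X = np.vstack([class_a, class_b])
y = np.array([0] * n + [1] * n)

# Hold out a quarter of the rows to check the fitted model
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# SVMs are sensitive to feature scale, so standardise before fitting
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_train, y_train)

# Proportion of held-out rows classified correctly
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

The pattern of building a pipeline, calling `fit` on training data and `score` on held-out data is common across scikit-learn models, so it is a useful thing to look for when reading an unfamiliar project.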
If you want to learn Python, I recommend:
- Resource list: Python for Non-Programmers
- Pages: RealPython, Learn Python Programming, By Example
- Book: Python Data Science Handbook by Jake VanderPlas
- Book: Python for Data Analysis by Wes McKinney
- Pages: chrisalbon.com
- Pages: TowardsDataScience, for example A Quick Introduction to the “Pandas” Python Library
- Book: Python for Experimental Psychologists by Edwin Dalmaijer
- Pages: W3Schools
- Tutorial: WTF Python (“Exploring and understanding Python through surprising snippets”)
- Book: Scientific Visualization – Python & Matplotlib, Nicolas P. Rougier
- Tutorial: Matplotlib tutorial by Nicolas P. Rougier