- Python

This course teaches general principles of coding and computation, and specific skills for data management and visualisation in R. Lots of people in the data science world, particularly in areas which align with computer science / machine learning, use Python.

I decided to teach this course in R because the community around a language is as important as the language itself. You can do anything in any language, and most things are as easy in Python as in R, but psychologists, biologists and scholars from across the social sciences tend to use R.

10.19 Motivation

In my very prejudiced opinion, there are three reasons to prefer Python to R. These are:

  • Conda, for package, dependency and environment management
  • Pandas, the data wrangling toolkit
  • Certain toolkits
    • e.g. machine learning tools. Computer scientists tend to use Python, so the latest in ML, deep learning etc. is more accessible if you use Python (e.g. PyTorch, scikit-learn)
    • e.g. running experiments, using PsychoPy

I’d probably also do anything involving talking to hardware, an API, web-scraping or bulk text processing in Python, but this may just be prejudice. Python is a good interstitial language.

10.20 A worked example

To showcase Python, we’ll try to reproduce an analysis by Oli Hawkins, which shows off a support-vector machine classification of penguin data.

repo: github.com/olihawkins/penguin-models

10.21 Preparation

Before the class, please

  • install Python. Note that Python underwent a major update from version 2 to version 3, so make sure you install Python 3 or later.

  • install miniconda

    • You may already have Python and conda if you installed R via the Anaconda distribution.
  • clone the penguin-models repo

  • create a conda environment, following the instructions in the repo README.md

    conda env create -f environment.yml

10.22 In class

We will use the Spyder IDE:

conda install spyder

I will showcase some features of Python, focusing on things which are the same as, or frustratingly different from, R:

  • for loops
  • zero indexing
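As a rough sketch of the kind of thing I mean (the list of species names is just illustrative, and the variable names are my own), here is how loops and indexing look in Python. The main traps for R users are that counting starts at 0, that a slice excludes its stop position, and that a negative index counts from the end rather than dropping an element.

```python
# A small list of species names (illustrative values only)
species = ["Adelie", "Chinstrap", "Gentoo"]

# Python indexes from zero, so the first element is species[0], not species[1]
first = species[0]

# A negative index counts from the end (in R, x[-1] would drop the first element)
last = species[-1]

# Slices include the start but exclude the stop: this takes elements 0 and 1
first_two = species[0:2]

# A for loop iterates over the items directly; indentation marks the loop body
lengths = []
for name in species:
    lengths.append(len(name))

# enumerate() gives the zero-based position alongside each item
labelled = []
for i, name in enumerate(species):
    labelled.append(f"{i}: {name}")

# range(3) yields 0, 1, 2, whereas R's 1:3 yields 1, 2, 3
positions = list(range(len(species)))

print(first, last, first_two, lengths, labelled, positions)
```

Note that there are no braces or `end` keywords: the indented block is the loop body, so consistent indentation matters in a way it does not in R.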

If you can’t manage to install Python, Google Colaboratory is a useful alternative.

We will do some basic data wrangling and plotting with pandas and seaborn:

conda install seaborn
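To give a flavour of pandas before the class, here is a minimal sketch using a tiny made-up table. The values and column names are invented for illustration and are not taken from the penguin data or from Oli's repo. It shows rough pandas counterparts of three familiar tidyverse steps: dropping missing values, adding a column, and a grouped summary.

```python
import pandas as pd

# Toy data, invented for illustration (not the real penguin measurements)
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
    "body_mass_g": [3700.0, None, 5000.0, 5200.0],
})

# Drop rows with a missing measurement (roughly tidyr::drop_na)
complete = df.dropna(subset=["body_mass_g"])

# Add a derived column (roughly dplyr::mutate)
complete = complete.assign(body_mass_kg=complete["body_mass_g"] / 1000)

# Group and summarise (roughly dplyr::group_by followed by summarise)
summary = (
    complete.groupby("species", as_index=False)
    .agg(mean_mass_kg=("body_mass_kg", "mean"), n=("body_mass_kg", "count"))
)

print(summary)
```

The chained method calls inside parentheses play a similar role to the pipe in R: each step returns a new data frame that the next step works on.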

We will run the models and explore the data and code, focusing on hacking an existing, complex project.

10.23 After class

Oli strongly recommends Aurélien Géron’s 2019 book, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow.

If you want to learn Python, I recommend:

Other resources: