- Database queries: SQL

10.15 Context

SQL, you can pronouce it S-Q-L or “sequel,” is a database system. It is old, in internet terms, but not going anywhere, and core to many data science jobs.

10.16 Key information

There are different flavours, such as MySQL which is used by WordPress websites, or PostgreSQL / postgres.

The common core of SQL is that that it is a ‘relational’ database management systems. It is used for storing information and saying how it is connected to other bits of information. Compare this to the spreadsheet/CSV, which stores information in a list.

Note that SQL is designed for databases with potentially billions of rows of data. It demands working habits where you can’t “look” at the data all at once, as you would with a spreadsheet.

SQL has it’s own terminology but to start off we will think about two things

  • Tables: these are rows and columns of data.
  • Queries: these are commands we use to select, pull out and/or combine tables

Tables have ‘fields’ (the variable names which define what information is stored, you may be used to thinking of these as the columns of your data)

Tables also have ‘records’ (the stored information, which you be used to thinking of as the rows of your data)

10.17 Exercises

In class, I will illustrate the key concepts of SQL with an example from my own research

Then we will look at this interactive turorial:

Specifically, we will do

10.18 Checklist

By the end of the class, you should

  • Know what SQL is and why it is important for data science
  • Understand key concepts: tables, queries, fields, rows
  • Recognise and solve problems involving SQL commands such as SELECT, FROM, WHERE, wildcard (%), JOIN, ON,

10.19 Resources