- Database queries: SQL

10.14 Context

SQL, you can pronouce it S-Q-L or “sequel”, is a database system. It is old, in internet terms, but not going anywhere, and core to many data science jobs.

10.15 Key information

There are different flavours, such as MySQL which is used by WordPress websites, or PostgreSQL / postgres.

The common core of SQL is that that it is a ‘relational’ database management systems. It is used for storing information and saying how it is connected to other bits of information. Compare this to the spreadsheet/CSV, which stores information in a list.

Note that SQL is designed for databases with potentially billions of rows of data. It demands working habits where you can’t “look” at the data all at once, as you would with a spreadsheet.

SQL has it’s own terminology but to start off we will think about two things

Tables: these are rows and columns of data.
Queries: these are commands we use to select, pull out and/or combine tables

Tables have ‘fields’ (the variable names which define what information is stored, you may be used to thinking of these as the columns of your data)

Tables also have ‘records’ (the stored information, which you be used to thinking of as the rows of your data)

10.16 Exercises

In class, I will illustrate the key concepts of SQL with an example from my own research

Stafford, T. (2018). Female chess players outperform expectations when playing men. Psychological Science, 29(3), 429-436. (Preprint).

Then we will look at this interactive turorial:

sqlzoo.net

Specifically, we will do

SELECT basics https://sqlzoo.net/wiki/SELECT_basics
JOIN https://sqlzoo.net/wiki/The_JOIN_operation

10.17 Checklist

By the end of the class, you should

Know what SQL is and why it is important for data science
Understand key concepts: tables, queries, fields, rows
Recognise and solve problems involving SQL commands such as SELECT, FROM, WHERE, wildcard (%), JOIN, ON,

10.18 Resources

Select Star SQL - interactive tutorial
w3schools.com JOIN
Interactive tutorial sqlzoo.net/wiki/SELECT_basics
SQL Database access dplyr
SQLite is great for R and Shiny. The dbmisc package may help a bit.
example SQL data from Kaggle: SQLLite database on Mental Health in the Tech Industry
askedith.ai - natural language interface to SQL