SQL Notebook

Core

Theory

You want to store data in a structured manner? Relational database is the right answer. Modern society would not be there without databases. It allows to store insane amounts of data in a structured manner and allows easy data query and manipulation with SQL. SQL stands for Structured Query Language. You create tables, jam them with information, make sure that the table is at least in the 3rd normal form(database normalization) and go wild with the queries.

SQL is a bundle of four parts: DDL, DQL, DML and DCL.

This is a table like you would see in some excel file:

id item amount
1 egg 12
2 milk 2

This is the same table defined in SQL:

CREATE TABLE (
    id PRIMARY KEY,
    item TEXT,
    amount INTEGER
);

SQL vs relational algebra: Relations are sets without duplicates. SQL Tables are multisets and may contain duplicates. Look at the picture below and skip it with great pain:

relational algebra

DDL

DDL is used to define database structure. You create, modify and delete tables using three commands: CREATE, ALTER and DROP.

CREATE

All attributes are defined in CREATE statement. There are many different database implementations and they have different available data types that you can use for attributes. SQLite has NULL, INTEGER, REAL, TEXT and BLOB. PostgreSQL will have a whole magnitude more. Make sure you follow the documentation of your chosen database, or you will run into problems. It is good to pick the smallest available data type for your column, since smaller data types requires less storage. If you cannot fill your chosen data type, pick a smaller one. Unless you are planning to fill them up.

CREATE TABLE students (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT UNIQUE
)

-- Create table if it does not exist yet.
CREATE TABLE IF NOT EXISTS nuclear_warheads (<attrs>);

ALTER

Unless you are using SQLite, you can also modify existing tables!

-- Add semester column to students table.
ALTER TABLE students ADD COLUMN semester INTEGER;

DROP

-- Delete table
DROP TABLE important;

-- Delete table if it exists
DROP TABLE IF EXISTS vaults;

-- Delete table and all related tables !DANGEROUS!
DROP TABLES customers CASCADE;

-- Auto increment 
create sequence PersonId;
create sequence PersonId start with 1 increment by 1;
insert into Professoren values (nextval("PersonId"), "Alex", "A1");

Integrity Requirements

You can specify requirements for attributes in your table columns. There are PRIMARY KEY, FOREIGN KEY, NOT NULL, DEFAULT VALUE, CHECK, UNIQUE checks. Keys are discussed in previous block. Scroll up.

PRIMARY KEY

Primary keys are needed to…

You can specify keys in two ways, either directly appending it to the attribute, or using PRIMARY KEY () at the end of your creation statement. This way, you can join multiple attributes into joint primary key!

CREATE TABLE professors (
    id INTEGER PRIMARY KEY,
    name VARCHAR(80) NOT NULL
);


CREATE TABLE messages (
    id INTEGER, 
    chat_id INTEGER, 
    content TEXT, 
    PRIMARY KEY (id, chat_id)
);
FOREIGN KEY

Foreign keys are used to reference stuff, make stuff connected. You can specify multiple foreign keys. They can be used to trigger actions when the referenced table is updated. ON UPDATE, ON DELETE are such triggers. They work with modifiers SET NULL, CASCADE, RESTRICT, which are executed when those triggers happen. This is the reason you want foreign keys. If a professor quits, you can then delete lectures held by him or set the professor to NULL and then look for a new professor to replace the old one.

CREATE TABLE lectures (
    id INTEGER PRIMARY KEY,
    prof_id INTEGER NOT NULL,
    -- Connect lectures table with professors table using FK.
    FOREIGN KEY(prof_id) REFERENCES professors(id)
);

! If you are going to use sqlite3 in Python, make sure that you execute con.execute("PRAGMA foreign_keys = ON") before deleting stuff. Otherwise it will ignore ON DELETE and you will wonder why your schema was ignored ! I spend like 30 minutes trying to find bugs and typos before I found this answer.

AUTO INCREMENT

This varies from database to database. https://sqlite.org/autoinc.html

-- Autoincrement in SQLite.
CREATE TABLE products (
    id INTEGER AUTOINCREMENT,
    name TEXT UNIQUE,
    cost INTEGER NOT NULL,
    price INTEGER NOT NULL,
    made_in TEXT NOT NULL
);
NOT NULL

NOT NULL specifies that a value cannot be blank. When inserting into a table, leaving that attribute blank will raise error. Primary keys imply NOT NULL. You do not have to add it yourself.

CREATE TABLE students (
    id INTEGER PRIMARY KEY,
    name VARCHAR(80) NOT NULL,
);
DEFAULT VALUE

DEFAULT VALUE is a value that will be inserted if no value will be given during the insertion.

CREATE TABLE students (
    id INTEGER PRIMARY KEY,
    name VARCHAR(80) NOT NULL,
    -- Default value is the birthday you use on Steam.
    birthday DATE DEFAULT "01.01.1990"
);
CHECK

CHECK is used to check if the given value fits in the defined range. Imagine you are building a database of your fridge. There is no chance you can fit 999 bottles of beer in it.

CREATE TABLE fridge_contents (
    id INTEGER PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    amount INTEGER CHECK (amount > 0 and amount < 50)  
);

-- There are also more ways to check stuff:
CREATE TABLE fridge_contents (
    id INTEGER PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    amount INTEGER CHECK amount between 1 and 50 
);

CREATE TABLE ranks (
    id INTEGER PRIMARY KEY,
    rank VARCHAR(2) CHECK rank IN ("b1", "s1", "g1", "d1")
);

CHECK (id NOT IN (
            SELECT students.id 
            FROM students 
            WHERE module.id = listens.id)
      )
UNIQUE

UNIQUE makes sure that value is unique to one thing in your table. Columns with UNIQUE constraint may have one NULL value, but not more. It is similar to PRIMARY KEY, but you can define more than one UNIQUE column, but only one PRIMARY KEY.

CREATE TABLE students (
    id INTEGER PRIMARY KEY,
    name VARCHAR(80) NOT NULL,
    email VARCHAR(120) UNIQUE
);
ON UPDATE
ON DELETE
SET NULL

This can be used instead of CASCADE to set children rows to NULL where they were connected with FOREIGN KEY, instead of deleting the whole row. Data loss is ugly.

CASCADE

If you have tables that utilize FOREIGN KEY on some columns, you can delete those entries when referenced parent is deleted. You have a store that sells items? Item probably has a column with store identifier, where it is located. If you delete store you may want to delete those items. This does exactly this. But it is also very dangerous. You might want to sell those items after closing store? Also, making a lot of nested tables depending on each other may cause a havoc in your database. It is enough to have one of those and lose a lot of data.

-- Delete table and all related tables !DANGEROUS!
DROP TABLES customers CASCADE;
RESTRICT

DML

DML is used to manipulate data in the database. You have INSERT, ALTER, UPDATE, DELETE methods to work with your data. Since they are kinda self-explanatory, look at the examples below. I find that looking at many examples is a bit easier that reading theory:

INSERT

Inserting values… You declared a table, defined attributes and their modifiers. Now it is time to add some data. Well, you will probably do this from your program using some kind of ORM, but it is nice to know how it works.

-- If you are inserting all attributes, you can just pass an ordered tuple with the values. 
-- Otherwise, you need to tell the table what kind of values you are inserting.
INSERT INTO customers
VALUES ("Tom", "Joe", "Cool Street 4", "Badlands");

INSERT INTO customers (name, lastname)
VALUES ("Tom", "Joe");

-- You can also insert many values at once, just pass multiple tuples!
INSERT INTO customers
VALUES ("Tom", "Joe", "Cool Street 4", "Badlands"),
("Bob", "Tom", "Bad Street 111", "Badlands");

UPDATE

-- Change the street of our customer Bob Tom.
UPDATE customers
SET street = "Badder Street 111"
WHERE name == "Bob"
AND lastname == "Tom";

-- Examples with students table.
CREATE TABLE students (
    id INTEGER PRIMARY KEY,
    name VARCHAR(80) NOT NULL,
    email VARCHAR(120) UNIQUE,
    will_fail_math INTEGER DEFAULT 0,
    semester INTEGER DEFAULT 1
);

-- Update data in a table.
UPDATE students 
SET will_fail_math = 0;

UPDATE students 
SET semester = semester + 1;

UPDATE students 
SET semester = 15
WHERE name = "Old Bob";

DELETE FROM

Delete data from matching table. If you are not going to specify conditions, it will just nuke the whole table. DROP table deletes table and contents, DELETE FROM without conditions will leave the table empty.

-- Table used in examples.
CREATE TABLE students (
    id INTEGER PRIMARY KEY,
    name VARCHAR(80) NOT NULL,
    email VARCHAR(120) UNIQUE,
    will_fail_math INTEGER DEFAULT 0,
    semester INTEGER DEFAULT 1
);

-- Delete from a table, will delete everything!
DELETE FROM students;

-- Delete only certain data.
DELETE FROM students
WHERE will_fail_math = 1;

DML & DQL

WHERE

Specify requirements. If you want to get something specific, you need WHERE clause in your query.

-- id that is equal to 3.
WHERE id = 3;

-- All ids that are not 3.
WHERE id != 3;

-- All ids that are bigger than 3.
WHERE id > 3;

-- Values between two values.
WHERE id BETWEEN 1 AND 300;

-- More examples below.
SELECT manufacturer, city
FROM factories
WHERE country == "Germany";
LIKE

You can use LIKE on strings. https://www.w3schools.com/SQL/sql_like.asp

AND OR

Lets you join multiple filters.

-- Get all people that have Tom as last and first name, "Tom Tom".
SELECT *
FROM customers
WHERE first_name == "Tom"
AND last_name == "Tom";

-- Get all people that have Tom as last or first name.
SELECT *
FROM customers
WHERE first_name == "Tom"
OR last_name == "Tom";

-- () do matter! First example returns all people that have first name Tom 
-- and then all Toms that paid or have lastname Tom. Second query returns 
-- either people who are named Tom Tom or have paid for their orders!
SELECT *
FROM customers
WHERE first_name == "Tom"
AND (last_name == "Tom";
OR paid == 1);

SELECT *
FROM customers
WHERE (first_name == "Tom"
AND last_name == "Tom")
OR paid == 1;;

DQL

DQL is used to query from the database. This is the part of SQL you will be doing most of the time. Construting queries is art.

SELECT

* is a wildcard that selects everything!

-- Select specific columns:
SELECT id, amount FROM storage;

-- Select all rows with *, returns the whole table.
SELECT * FROM storage;

-- You can also temporary rename attributes in your result table. Is very handy when joining multiple tables to reduce bloat.
SELECT id as customer_id, name
FROm customers;

-- You can also modify values in return table, for example add VAT to price.
SELECT price * 1,21
FROM items;

SELECT *, cost - price AS revenue_per_item
FROM products;

-- You can capitalize return values.
select UPPER(name) from students where semester = 69;

-- Select from multiple tables.
select studentId, module 
from students, lectures 
where studentId = attendeeId 
and title = "Inf";
DISTINCT

Lets you get values without duplicates. Beware, if you select multiple columns with distinct, it will make sure that the whole tuple is distinct, not the separate items!

-- Select values without duplicates!
SELECT DISTINCT first_name FROM employees;

-- This will return all names, since ids are unique.
SELECT DISTINCT id, first_name FROM employees;

ORDER BY

You can sorty columns by their row values in ascending or descending order. You can sort multiple columns at once.

-- Select orders with more than 99 odered items and sort by city name in descending order.
SELECT *
FROM orders
WHERE amount >= 100
ORDER BY city DESC;

-- Order by name and lastname. If firstnames are equal, order by lastname.
SELECT *
FROM students
ORDER BY firstname, lastname ASC;

select name, price from inventory order by price desc, name asc;

LIMIT

Limit the amount of results you will retrieve from the query.

SELECT *
FROM customer
LIMIT 10;

NULL

You can look up values based on the fact that they are NULL.

-- Get all customers that do not have street set.
SELECT *
FROM customers
WHERE street IS NULL

Joins

xd0kgoexwon21

JOIN INNER JOIN
NATURAL JOIN
LEFT JOIN
RIGHT JOIN
FULL OUTER JOIN
-- join 
select Name, Titel
from Studenten, hoeren, Vorlesung
where Studenten.MatrNr = hoeren.MatrNr
and hoeren.MatrNr = Vorlesung.VorlNr;

-- example above can be written using join keywprd, returns the same result
select s.Name, v.Titel
from Studenten s
join hoeren h on s.MatrNr = h.MatrNr
join Vorlesung v on h.VorlNr = v.VorlNr

-- outer join
-- studentend auch ohne vorlesung
select s.Name, v.Titel
from Studenten s
left outer join hoeren h 
on s.MatrNr = h.MatrNr
left outer join Vorlesungen v 
on h.VorlNr = v.VorlNr;
-- vorlesungen auch ohne studenten
select s.Name, v.Titel
from Studenten s
join hoeren h 
on s.MatrNr = h.MatrNr
right outer join Vorlesungen v 
on h.VorlNr = v.VorlNr;
-- alternative mit klammerung
select s.Name, v.Titel
from Studenten s 
left outer join (hoerenh joinVorlesungen v onh.VorlNr= v.VorlNr) 
on s. MatrNr= h.MatrNr

Set Operations

UNION

No duplicates.

(select Name from Students) union (select Name from Professors)
UNION ALL

Duplicates are allowed.

INTERSECT
MINUS EXCEPT

Aggregation and Grouping

GROUP BY

SQL erzeugt pro eindeutiger Kombination von Attributwerten in der GROUP BY Klauser ein Ergebnistuple.

HAVING
AVG MAX MIN SUM

Die Aggregationsfunktionen sind avg, max, min, count, count(distinct), sum. Alle in der SELECT Klausel ausgeführten Attribute(außer aggregierte Werte) müssen auch in der GROUP BY Klauser ausgeführt werden!

-- Find out the average study time of students.
SELECT AVG(semester)
FROM students;

-- Find the cheapest products you sell.
SELECT name, MIN(price)
FROM products
ORDER BY MIN(price)
LIMIT 10;

-- Get total sum of all orders.
SELECT SUM(price) as estimated_income
FROM orders;
COUNT
-- Get all orders that have above average order price.
SELECT COUNT(id)
FROM orders
WHERE price > 
    (SELECT AVG(price) FROM orders);

Query Nesting

SQL allows nested queries with SELECT, FROM, WHERE, GROUP BY and HAVING.

IN

Lets you create subtables or sub-tables for your query and then select from them!

-- Prints out customer data for people that have the same name as customer with the id 44.
SELECT *
FROM customers
WHERE name 
    -- Select name of customer with id 44.
    IN (SELECT name FROM customers WHERE id = 44);

DCL

Nothing here yet.

Advanced

Stuff about building index and ORM?

SQLite

SQLite is my favorite database. Read the nicely done FAQ. It is just a file, you do not have to create any clients, setup a server, manage connections and etc. Just make a database file, use sqlite3 database driver in Python and go wild. But beware, it is not thread safe. INSERT, UPDATE, DELETE will lock the database, if you have multiple functions writing to one database from multiple processes - you will have problems.

DATE strftime

If you ever had a problem that required working with time, this is the right place. SQLite has a nice overview of those functions.

-- Lasft scrobbles per day
SELECT DATE(date), COUNT(DATE(date)) 
FROM lastfm_scrobbles
GROUP BY DATE(date) 
ORDER BY DATE(date) DESC

-- Lastfm Scrobbles by hour for the last 7 days.
SELECT strftime('%H', date) as hour, COUNT(strftime('%H', date)) 
FROM (
    SELECT * 
    FROM lastfm_scrobbles 
    WHERE date >= datetime('now', '-6 days')
    ORDER BY date DESC
    )
GROUP BY hour
ORDER BY hour DESC

GENERATED ALWAYS AS

https://sqlite.org/gencol.html

Value generation to ease your job. Can be generated as VIRTUAL or STORED. First one generates values on database read, without storing actual data in the file. Second will store the data after computing it on write. Decide for yourself. What is more expensive in your environment, storage or cpu time?

RANDOM

Random function returns an integer. Can be used to get random row from the table in a format like this:

-- Get a random product from products table.
SELECT *
FROM products
ORDER BY random()
LIMIT 1

WINDOW FUNCTIONS

Overview

Q & A

Examples

Examples from the real world to give you some more advanced examples to work with.

-- Discord bot shenanigans:
select count(author_id) as messages, author_name 
from messages
where server_id==705007538294226945 
group by author_name 
order by count(author_name) desc;

-- Count messages by channel_id. Returns a table of total messages per channel_id.
select count(*) as messages, channel_id 
from messages 
where server_id==705007538294226945
group by channel_id 
order by count(*) desc

-- Counts messages by author per channel.
select count(message_id), channel_name, author_name 
from messages inner JOIN channels on channels.channel_id=messages.channel_id 
group by author_name 
order by count(message_id) desc

-- Get reactions and total amount of those reactions.
select count(reaction_id), reaction_id 
from message_reactions
group by reaction_id
order by count(reaction_id) desc

-- Get messages by most reactions.
select count(messages.message_id), messages.message_id
from message_reactions inner join messages on messages.message_id == message_reactions.message_id
where server_id==686598539136073780 and is_deleted==0
group by messages.message_id
order by count(reaction_id) desc

-- Get top 10 messages by total reactions!
select count(), messages.author_name, messages.date, channels.channel_name
from message_reactions
    inner join messages on messages.message_id == message_reactions.message_id
    inner join channels on messages.channel_id == channels.channel_id
where server_id==686598539136073780 and is_deleted==0
group by messages.message_id
order by count() desc
limit 10

-- Extract firefox browsing history.
SELECT url, title, datetime((visit_date/1000000), 'unixepoch', 'localtime') AS visit_date 
FROM moz_places INNER JOIN moz_historyvisits on moz_historyvisits.place_id = moz_places.id 
ORDER BY visit_date;

-- Get messages by hour UTC of messages for last 7 days
SELECT strftime('%H', date) AS hour
    ,COUNT(strftime('%H', date))
FROM (
    SELECT *
    FROM messages
    WHERE date >= DATETIME (
            'now'
            ,'-6 days'
            )
        AND server_id == 686598539136073780
    ORDER BY DATE DESC
    )
GROUP BY hour
ORDER BY hour DESC

-- Get total messages per month?
SELECT strftime('%Y-%m', date) as f, COUNT(strftime('%Y-%m', date)) 
FROM messages
GROUP BY f
ORDER BY f DESC

-- heatmap aka messages per hour per day aka that fancy colored block graph 8)
SELECT strftime('%H', date) as hour, strftime('%w', date) as weekday, COUNT(strftime('%H', date)) as amount
from messages
group by hour, weekday

https://www.sqlitetutorial.net/