You want to store data in a structured manner? Relational database is the right answer. Modern society would not be there without databases. It allows to store insane amounts of data in a structured manner and allows easy data query and manipulation with SQL. SQL stands for Structured Query Language. You create tables, jam them with information, make sure that the table is at least in the 3rd normal form(database normalization) and go wild with the queries.
SQL is a bundle of four parts: DDL, DQL, DML and DCL.
- DDL: Data Definition Language
- DQL: Data Query Language
- DML: Data Manipulation Language
- DCL: Data Control Language
This is a table like you would see in some excel file:
This is the same table defined in SQL:
CREATE TABLE ( id PRIMARY KEY, item TEXT, amount INTEGER );
SQL vs relational algebra: Relations are sets without duplicates. SQL Tables are multisets and may contain duplicates. Look at the picture below and skip it with great pain:
DDL is used to define database structure. You create, modify and delete tables using three commands: CREATE, ALTER and DROP.
All attributes are defined in CREATE statement. There are many different database implementations and they have different available data types that you can use for attributes. SQLite has NULL, INTEGER, REAL, TEXT and BLOB. PostgreSQL will have a whole magnitude more. Make sure you follow the documentation of your chosen database, or you will run into problems. It is good to pick the smallest available data type for your column, since smaller data types requires less storage. If you cannot fill your chosen data type, pick a smaller one. Unless you are planning to fill them up.
CREATE TABLE students ( id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT UNIQUE ) -- Create table if it does not exist yet. CREATE TABLE IF NOT EXISTS nuclear_warheads (<attrs>);
Unless you are using SQLite, you can also modify existing tables!
-- Add semester column to students table. ALTER TABLE students ADD COLUMN semester INTEGER;
-- Delete table DROP TABLE important; -- Delete table if it exists DROP TABLE IF EXISTS vaults; -- Delete table and all related tables !DANGEROUS! DROP TABLES customers CASCADE; -- Auto increment create sequence PersonId; create sequence PersonId start with 1 increment by 1; insert into Professoren values (nextval("PersonId"), "Alex", "A1");
You can specify requirements for attributes in your table columns. There are PRIMARY KEY, FOREIGN KEY, NOT NULL, DEFAULT VALUE, CHECK, UNIQUE checks. Keys are discussed in previous block. Scroll up.
Primary keys are needed to…
You can specify keys in two ways, either directly appending it to the attribute, or using PRIMARY KEY () at the end of your creation statement. This way, you can join multiple attributes into joint primary key!
CREATE TABLE professors ( id INTEGER PRIMARY KEY, name VARCHAR(80) NOT NULL ); CREATE TABLE messages ( id INTEGER, chat_id INTEGER, content TEXT, PRIMARY KEY (id, chat_id) );
Foreign keys are used to reference stuff, make stuff connected. You can specify multiple foreign keys. They can be used to trigger actions when the referenced table is updated. ON UPDATE, ON DELETE are such triggers. They work with modifiers SET NULL, CASCADE, RESTRICT, which are executed when those triggers happen. This is the reason you want foreign keys. If a professor quits, you can then delete lectures held by him or set the professor to NULL and then look for a new professor to replace the old one.
CREATE TABLE lectures ( id INTEGER PRIMARY KEY, prof_id INTEGER NOT NULL, -- Connect lectures table with professors table using FK. FOREIGN KEY(prof_id) REFERENCES professors(id) );
! If you are going to use sqlite3 in Python, make sure that you execute
con.execute("PRAGMA foreign_keys = ON") before deleting stuff. Otherwise it will ignore ON DELETE and you will wonder why your schema was ignored ! I spend like 30 minutes trying to find bugs and typos before I found this answer.
This varies from database to database. https://sqlite.org/autoinc.html
-- Autoincrement in SQLite. CREATE TABLE products ( id INTEGER AUTOINCREMENT, name TEXT UNIQUE, cost INTEGER NOT NULL, price INTEGER NOT NULL, made_in TEXT NOT NULL );
NOT NULL specifies that a value cannot be blank. When inserting into a table, leaving that attribute blank will raise error. Primary keys imply NOT NULL. You do not have to add it yourself.
CREATE TABLE students ( id INTEGER PRIMARY KEY, name VARCHAR(80) NOT NULL, );
DEFAULT VALUE is a value that will be inserted if no value will be given during the insertion.
CREATE TABLE students ( id INTEGER PRIMARY KEY, name VARCHAR(80) NOT NULL, -- Default value is the birthday you use on Steam. birthday DATE DEFAULT "01.01.1990" );
CHECK is used to check if the given value fits in the defined range. Imagine you are building a database of your fridge. There is no chance you can fit 999 bottles of beer in it.
CREATE TABLE fridge_contents ( id INTEGER PRIMARY KEY, name VARCHAR(100) NOT NULL, amount INTEGER CHECK (amount > 0 and amount < 50) ); -- There are also more ways to check stuff: CREATE TABLE fridge_contents ( id INTEGER PRIMARY KEY, name VARCHAR(100) NOT NULL, amount INTEGER CHECK amount between 1 and 50 ); CREATE TABLE ranks ( id INTEGER PRIMARY KEY, rank VARCHAR(2) CHECK rank IN ("b1", "s1", "g1", "d1") ); CHECK (id NOT IN ( SELECT students.id FROM students WHERE module.id = listens.id) )
UNIQUE makes sure that value is unique to one thing in your table. Columns with UNIQUE constraint may have one NULL value, but not more. It is similar to PRIMARY KEY, but you can define more than one UNIQUE column, but only one PRIMARY KEY.
CREATE TABLE students ( id INTEGER PRIMARY KEY, name VARCHAR(80) NOT NULL, email VARCHAR(120) UNIQUE );
This can be used instead of CASCADE to set children rows to NULL where they were connected with FOREIGN KEY, instead of deleting the whole row. Data loss is ugly.
If you have tables that utilize FOREIGN KEY on some columns, you can delete those entries when referenced parent is deleted. You have a store that sells items? Item probably has a column with store identifier, where it is located. If you delete store you may want to delete those items. This does exactly this. But it is also very dangerous. You might want to sell those items after closing store? Also, making a lot of nested tables depending on each other may cause a havoc in your database. It is enough to have one of those and lose a lot of data.
-- Delete table and all related tables !DANGEROUS! DROP TABLES customers CASCADE;
DML is used to manipulate data in the database. You have INSERT, ALTER, UPDATE, DELETE methods to work with your data. Since they are kinda self-explanatory, look at the examples below. I find that looking at many examples is a bit easier that reading theory:
Inserting values… You declared a table, defined attributes and their modifiers. Now it is time to add some data. Well, you will probably do this from your program using some kind of ORM, but it is nice to know how it works.
-- If you are inserting all attributes, you can just pass an ordered tuple with the values. -- Otherwise, you need to tell the table what kind of values you are inserting. INSERT INTO customers VALUES ("Tom", "Joe", "Cool Street 4", "Badlands"); INSERT INTO customers (name, lastname) VALUES ("Tom", "Joe"); -- You can also insert many values at once, just pass multiple tuples! INSERT INTO customers VALUES ("Tom", "Joe", "Cool Street 4", "Badlands"), ("Bob", "Tom", "Bad Street 111", "Badlands");
-- Change the street of our customer Bob Tom. UPDATE customers SET street = "Badder Street 111" WHERE name == "Bob" AND lastname == "Tom"; -- Examples with students table. CREATE TABLE students ( id INTEGER PRIMARY KEY, name VARCHAR(80) NOT NULL, email VARCHAR(120) UNIQUE, will_fail_math INTEGER DEFAULT 0, semester INTEGER DEFAULT 1 ); -- Update data in a table. UPDATE students SET will_fail_math = 0; UPDATE students SET semester = semester + 1; UPDATE students SET semester = 15 WHERE name = "Old Bob";
Delete data from matching table. If you are not going to specify conditions, it will just nuke the whole table. DROP table deletes table and contents, DELETE FROM without conditions will leave the table empty.
-- Table used in examples. CREATE TABLE students ( id INTEGER PRIMARY KEY, name VARCHAR(80) NOT NULL, email VARCHAR(120) UNIQUE, will_fail_math INTEGER DEFAULT 0, semester INTEGER DEFAULT 1 ); -- Delete from a table, will delete everything! DELETE FROM students; -- Delete only certain data. DELETE FROM students WHERE will_fail_math = 1;
DML & DQL
Specify requirements. If you want to get something specific, you need WHERE clause in your query.
-- id that is equal to 3. WHERE id = 3; -- All ids that are not 3. WHERE id != 3; -- All ids that are bigger than 3. WHERE id > 3; -- Values between two values. WHERE id BETWEEN 1 AND 300; -- More examples below. SELECT manufacturer, city FROM factories WHERE country == "Germany";
You can use LIKE on strings. https://www.w3schools.com/SQL/sql_like.asp
Lets you join multiple filters.
-- Get all people that have Tom as last and first name, "Tom Tom". SELECT * FROM customers WHERE first_name == "Tom" AND last_name == "Tom"; -- Get all people that have Tom as last or first name. SELECT * FROM customers WHERE first_name == "Tom" OR last_name == "Tom"; -- () do matter! First example returns all people that have first name Tom -- and then all Toms that paid or have lastname Tom. Second query returns -- either people who are named Tom Tom or have paid for their orders! SELECT * FROM customers WHERE first_name == "Tom" AND (last_name == "Tom"; OR paid == 1); SELECT * FROM customers WHERE (first_name == "Tom" AND last_name == "Tom") OR paid == 1;;
DQL is used to query from the database. This is the part of SQL you will be doing most of the time. Construting queries is art.
* is a wildcard that selects everything!
-- Select specific columns: SELECT id, amount FROM storage; -- Select all rows with *, returns the whole table. SELECT * FROM storage; -- You can also temporary rename attributes in your result table. Is very handy when joining multiple tables to reduce bloat. SELECT id as customer_id, name FROm customers; -- You can also modify values in return table, for example add VAT to price. SELECT price * 1,21 FROM items; SELECT *, cost - price AS revenue_per_item FROM products; -- You can capitalize return values. select UPPER(name) from students where semester = 69; -- Select from multiple tables. select studentId, module from students, lectures where studentId = attendeeId and title = "Inf";
Lets you get values without duplicates. Beware, if you select multiple columns with distinct, it will make sure that the whole tuple is distinct, not the separate items!
-- Select values without duplicates! SELECT DISTINCT first_name FROM employees; -- This will return all names, since ids are unique. SELECT DISTINCT id, first_name FROM employees;
You can sorty columns by their row values in ascending or descending order. You can sort multiple columns at once.
-- Select orders with more than 99 odered items and sort by city name in descending order. SELECT * FROM orders WHERE amount >= 100 ORDER BY city DESC; -- Order by name and lastname. If firstnames are equal, order by lastname. SELECT * FROM students ORDER BY firstname, lastname ASC; select name, price from inventory order by price desc, name asc;
Limit the amount of results you will retrieve from the query.
SELECT * FROM customer LIMIT 10;
You can look up values based on the fact that they are NULL.
-- Get all customers that do not have street set. SELECT * FROM customers WHERE street IS NULL
- DQL supports multiple joins and joins can be combined in one query:
- join aka inner join
- left join
- right join
- full outer join
- Joins werden von links nach rehcts ausgeführt, wichtig für outer joins, dürfen aber durch Klammerung angepasst werden.
JOIN INNER JOIN
FULL OUTER JOIN
-- join select Name, Titel from Studenten, hoeren, Vorlesung where Studenten.MatrNr = hoeren.MatrNr and hoeren.MatrNr = Vorlesung.VorlNr; -- example above can be written using join keywprd, returns the same result select s.Name, v.Titel from Studenten s join hoeren h on s.MatrNr = h.MatrNr join Vorlesung v on h.VorlNr = v.VorlNr -- outer join -- studentend auch ohne vorlesung select s.Name, v.Titel from Studenten s left outer join hoeren h on s.MatrNr = h.MatrNr left outer join Vorlesungen v on h.VorlNr = v.VorlNr; -- vorlesungen auch ohne studenten select s.Name, v.Titel from Studenten s join hoeren h on s.MatrNr = h.MatrNr right outer join Vorlesungen v on h.VorlNr = v.VorlNr; -- alternative mit klammerung select s.Name, v.Titel from Studenten s left outer join (hoerenh joinVorlesungen v onh.VorlNr= v.VorlNr) on s. MatrNr= h.MatrNr
(select Name from Students) union (select Name from Professors)
Duplicates are allowed.
Aggregation and Grouping
SQL erzeugt pro eindeutiger Kombination von Attributwerten in der GROUP BY Klauser ein Ergebnistuple.
AVG MAX MIN SUM
Die Aggregationsfunktionen sind avg, max, min, count, count(distinct), sum. Alle in der SELECT Klausel ausgeführten Attribute(außer aggregierte Werte) müssen auch in der GROUP BY Klauser ausgeführt werden!
-- Find out the average study time of students. SELECT AVG(semester) FROM students; -- Find the cheapest products you sell. SELECT name, MIN(price) FROM products ORDER BY MIN(price) LIMIT 10; -- Get total sum of all orders. SELECT SUM(price) as estimated_income FROM orders;
-- Get all orders that have above average order price. SELECT COUNT(id) FROM orders WHERE price > (SELECT AVG(price) FROM orders);
SQL allows nested queries with SELECT, FROM, WHERE, GROUP BY and HAVING.
Lets you create subtables or sub-tables for your query and then select from them!
-- Prints out customer data for people that have the same name as customer with the id 44. SELECT * FROM customers WHERE name -- Select name of customer with id 44. IN (SELECT name FROM customers WHERE id = 44);
Nothing here yet.
Stuff about building index and ORM?
SQLite is my favorite database. Read the nicely done FAQ. It is just a file, you do not have to create any clients, setup a server, manage connections and etc. Just make a database file, use sqlite3 database driver in Python and go wild. But beware, it is not thread safe. INSERT, UPDATE, DELETE will lock the database, if you have multiple functions writing to one database from multiple processes - you will have problems.
If you ever had a problem that required working with time, this is the right place. SQLite has a nice overview of those functions.
-- Lasft scrobbles per day SELECT DATE(date), COUNT(DATE(date)) FROM lastfm_scrobbles GROUP BY DATE(date) ORDER BY DATE(date) DESC -- Lastfm Scrobbles by hour for the last 7 days. SELECT strftime('%H', date) as hour, COUNT(strftime('%H', date)) FROM ( SELECT * FROM lastfm_scrobbles WHERE date >= datetime('now', '-6 days') ORDER BY date DESC ) GROUP BY hour ORDER BY hour DESC
GENERATED ALWAYS AS
Value generation to ease your job. Can be generated as VIRTUAL or STORED. First one generates values on database read, without storing actual data in the file. Second will store the data after computing it on write. Decide for yourself. What is more expensive in your environment, storage or cpu time?
Random function returns an integer. Can be used to get random row from the table in a format like this:
-- Get a random product from products table. SELECT * FROM products ORDER BY random() LIMIT 1
Q & A
- What is the difference between SQLite and PostgreSQL or MySQL?
- SQLite is embedded. PostgreSQL and MySQL are client/server model databases. More work and more power.
Examples from the real world to give you some more advanced examples to work with.
-- Discord bot shenanigans: select count(author_id) as messages, author_name from messages where server_id==705007538294226945 group by author_name order by count(author_name) desc; -- Count messages by channel_id. Returns a table of total messages per channel_id. select count(*) as messages, channel_id from messages where server_id==705007538294226945 group by channel_id order by count(*) desc -- Counts messages by author per channel. select count(message_id), channel_name, author_name from messages inner JOIN channels on channels.channel_id=messages.channel_id group by author_name order by count(message_id) desc -- Get reactions and total amount of those reactions. select count(reaction_id), reaction_id from message_reactions group by reaction_id order by count(reaction_id) desc -- Get messages by most reactions. select count(messages.message_id), messages.message_id from message_reactions inner join messages on messages.message_id == message_reactions.message_id where server_id==686598539136073780 and is_deleted==0 group by messages.message_id order by count(reaction_id) desc -- Get top 10 messages by total reactions! select count(), messages.author_name, messages.date, channels.channel_name from message_reactions inner join messages on messages.message_id == message_reactions.message_id inner join channels on messages.channel_id == channels.channel_id where server_id==686598539136073780 and is_deleted==0 group by messages.message_id order by count() desc limit 10 -- Extract firefox browsing history. SELECT url, title, datetime((visit_date/1000000), 'unixepoch', 'localtime') AS visit_date FROM moz_places INNER JOIN moz_historyvisits on moz_historyvisits.place_id = moz_places.id ORDER BY visit_date; -- Get messages by hour UTC of messages for last 7 days SELECT strftime('%H', date) AS hour ,COUNT(strftime('%H', date)) FROM ( SELECT * FROM messages WHERE date >= DATETIME ( 'now' ,'-6 days' ) AND server_id == 686598539136073780 ORDER BY DATE DESC ) GROUP BY hour ORDER BY hour DESC -- Get total messages per month? SELECT strftime('%Y-%m', date) as f, COUNT(strftime('%Y-%m', date)) FROM messages GROUP BY f ORDER BY f DESC -- heatmap aka messages per hour per day aka that fancy colored block graph 8) SELECT strftime('%H', date) as hour, strftime('%w', date) as weekday, COUNT(strftime('%H', date)) as amount from messages group by hour, weekday