《SQL必知必會》讀書筆記，30分鐘入門SQL！

2019 年 11 月 27 日
筆記

來源：https://segmentfault.com/p/1210000011760973/read

猴哥推薦寄語：

數據分析中有個數據倉庫的概念，它本質上是通過資料庫技術來存儲數據的。只不過數據倉庫更適用於數量龐大的場景，同時它按照更方便於數據分析和數據挖掘的形式來存儲。

本篇文章是《SQL 必知必會》的讀書筆記，SQL必知必會的英文名叫做 Sams Teach Yourself in 10 Minutes。但是，我肯定是不能夠在10分鐘就能學會本書所有涉及到的sql，所以就起個名字叫30分鐘學會SQL語句。

目前手邊的資料庫是 mysql，所以以下示例均是由 mysql 演示。由於現在大部分工具都支援語法高亮，所以以下關鍵字都使用小寫。

一. 準備

1、工具

mycli，一個使用python編寫的終端工具，支援語法高亮，自動補全，多行模式，並且如果你熟悉vi的話，可以使用vi-mode快速移動，編輯。總之，vi + mycli 簡直是神器！

同樣， postgreSQL 可以使用pgcli。

pip install -U mycli    # 默認你已經安裝了pip

2、樣例表

示例中有兩個表，分為 student 學生表與 class 班級表。student 表中有 class_id 關聯 class 表。以下是兩個表數據的 sql。另外，最後有三道小練習題會用到樣例表。

create table class (    id int(11) not null auto_increment,    name varchar(50) not null,    primary key (id)  );    create table student (    id int(11) not null auto_increment,    name varchar(50) not null,    age smallint default 20,    sex enum('male', 'famale'),    score tinyint comment '入學成績',    class_id int(11),    createTime timestamp default current_timestamp,    primary key (id),    foreign key (class_id) references class (id)  );    insert into class (name) values ('軟體工程'), ('市場營銷');    insert into student (name, age, sex, score, class_id) values ('張三', 21, 'male', 100, 1);  insert into student (name, age, sex, score, class_id) values ('李四', 22, 'male', 98, 1);  insert into student (name, age, sex, score, class_id) values ('王五', 22, 'male', 99, 1);  insert into student (name, age, sex, score, class_id) values ('燕七', 21, 'famale', 34, 2);  insert into student (name, age, sex, score, class_id) values ('林仙兒', 23, 'famale', 78, 2);

2. SQL 基礎

1、術語

Database

資料庫值一系列有關聯數據的集合，而操作和管理這些數據的是DBMS，包括MySQL，PostgreSQL，MongoDB，Oracle，SQLite等等。RDBMS 是基於關係模型的資料庫，使用 SQL 管理和操縱數據。另外也有一些 NoSQL 資料庫，比如 MongoDB。因為 NoSQL為非關係型資料庫，一般不支援join操作，因此會有一些非正則化(denormalization)的數據，查詢也比較快。

Table

具有特定屬性的結構化文件。比如學生表，學生屬性有學號，年齡，性別等。schema (模式) 用來描述這些資訊。NoSQL 不需要固定列，一般沒有 schema，同時也利於垂直擴展。

Column

表中的特定屬性，如學生的學號，年齡。每一列都具有數據類型。

Data Type

每一列都具有數據類型，如 char, varchar，int，text，blob, datetime，timestamp。根據數據的粒度為列選擇合適的數據類型，避免無意義的空間浪費。如下有一些類型對比

char, varchar

需要存儲數據的長度方差小的時候適合存儲`char`，否則`varchar`。  `varchar` 會使用額外長度存儲字元串長度，佔用存儲空間較大。  兩者對字元串末尾的空格處理的策略不同，不同的DBMS又有不同的策略，設計資料庫的時候應當注意到這個區別。

datetime, timestamp

`datetime` 存儲時間範圍從1001年到9999年。  `timestamp` 保存了自1970年1月1日的秒數，因為存儲範圍比較小，自然存儲空間佔用也比較小。  日期類型可以設置更新行時自動更新日期，建議日期時間類型根據精度存儲為這兩個類型。  如今 DBMS 能夠存儲微秒級別的精度，比如 `mysql` 默認存儲精度為秒，但可以指定到微秒級別，即小數點後六位小數

enum

對於一些固定，不易變動的狀態碼建議存儲為 `enum` 類型，具有更好的可讀性，更小的存儲空間，並且可以保證數據有效性。

小問題：如何存儲IP地址？

Row

數據表的每一行記錄。如學生張三。

2、檢索數據

# 檢索單列  select name from student;    # 檢索多列  select name, age, class from student;    # 檢索所有列  select * from student;    # 對某列去重  select distinct class from student;    # 檢索列-選擇區間  # offset 基數為0，所以 `offset 1` 代表從第2行開始  select * from student limit 1, 10;  select * from student limit 10 offset 1;

3、排序

默認排序是 ASC，所以一般升序的時候不需指定，降序的關鍵字是 DESC。使用 B-Tree 索引可以提高排序性能，但只限最左匹配。關於索引可以查看以下 FAQ。

# 根據學號降序排列  select * from student order by number desc;    # 添加索引 (score, name) 可以提高排序性能  # 但是索引 (name, score) 對性能毫無幫助，此謂最左匹配  select * from student order by score desc, name;

4、數據過濾

# 找到學號為1的學生  select * from student where number = 1;    # 找到學號為在 [1, 10] 的學生(閉區間)  select * from student where number between 1 and 10;    # 找到未設置電子郵箱的學生  # 注意不能使用 =  select * from student where email is null;    # 找到一班中大於23歲的學生  select * from student where class_id = 1 and age > 23;    # 找到一班或者大於23歲的學生  select * from student where class_id = 1 or age > 22;    # 找到一班與二班的學生  select * from student where class_id in (1, 2);    # 找到不是一班二班的學生  select * from student where class_id not in (1, 2);

5、計算欄位

CONCAT

select concat(name, '(', age, ')') as nameWithAge from student;      select concat('hello', 'world') as helloworld;

Math

select age - 18 as relativeAge from student;      select 3 * 4 as n;

更多函數可以查看 API 手冊，同時也可以自定義函數(User Define Function)。

可以直接使用 select 調用函數

select now();  select concat('hello', 'world');

6、數據匯總

聚集函數，一些對數據進行匯總的函數，常見有 COUNT， MIN， MAX， AVG， SUM 五種。

# 統計1班人數  select count(*) from student where class_id = 1;

7、數據分組

使用 group by 進行數據分組，可以使用聚合函數對分組數據進行匯總，使用 having 對分組數據進行篩選。

# 按照班級進行分組並統計各班人數  select class_id, count(*) from student group by class_id;    # 列出大於三個學生的班級  select class_id, count(*) as cnt from student group by class_id having cnt > 3;

8、子查詢

# 列出軟體工程班級中的學生  select * from student where class_id in (    select id from class where class_id = '軟體工程'  );

9、關聯聯接

雖然兩個表擁有公共欄位便可以創建聯接，但是使用外鍵可以更好地保證數據完整性。比如當對一個學生插入一條不存在的班級的時候，便會插入失敗。一般來說，聯接比子查詢擁有更好的性能。

# 列出軟體工程班級中的學生  select * from student, class  where student.class_id = class.id and class.name = '軟體工程';

內聯接

內聯接又叫等值聯接。

# 列出軟體工程班級中的學生    select * from student    inner join class on student.class_id = class.id    where class.name = '軟體工程';

自聯接

# 列出與張三同一班級的學生    select * from student s1    inner join student s2 on s1.class_id = s2.class_id    where s1.name = '張三';

外聯接

# 列出每個學生的班級，弱沒有班級則為null    select name, class.name from student    left join class on student.class_id = class.id;

10、插入數據

可以採用以下方法插入一條數據，不過嚴重依賴表中列的順序關係，推薦指定列名插入數據，並且可以插入部分列。

# 插入一條數據  insert into student values(8, '陸小鳳', 24, 1, 3);    insert into student(name, age, sex, class_id) values(9, '花無缺', 25, 1, 3);

11、修改數據

更新

# 修改張三的班級    update student set class_id = 2 where name = '張三';

刪除

# 刪除張三的數據    delete from student where name = '張三';      # 刪除表中所有數據    delete from student;      # 更快地刪除表中所有數據    truncate table student;

12、創建表與更新表

# 創建學生表  create table student (    id int(11) not null auto_increment,    name varchar(50) not null,    age smallint default 20,    sex enum('male', 'famale'),    score tinyint comment '入學成績',    class_id int(11),    createTime timestamp default current_timestamp,    primary key (id),    foreign key (class_id) references class (id)  );    # 根據舊錶創建新表  create table student_copy as select * from student;    # 刪除 age 列  alter table student drop column age;    # 添加 age 列  alter table student add column age smallint;    # 刪除學生表  drop table student;

13、視圖

視圖是一種虛擬的表，便於更好地在多個表中檢索數據，視圖也可以作寫操作，不過最好作為只讀。在需要多個表聯接的時候可以使用視圖。

create view v_student_with_classname as  select student.name name, class.name class_name  from student left join class  where student.class_id = class.id;    select * from v_student_with_classname;

14、約束

primiry key

任意兩行絕對沒有相同的主鍵，且任一行不會有兩個主鍵且主鍵絕不為空。使用主鍵可以加快索引。

alter table student add constraint primary key (id);

foreign key

外鍵可以保證數據的完整性。有以下兩種情況。

插入張三丰5班到student表中會失敗，因為5班在class表中不存在。
class表刪除3班會失敗，因為陸小鳳和楚留香還在3班。

alter table student add constraint foreign key (class_id) references class (id);

unique key

唯一索引保證該列值是唯一的，但可以允許有null。

alter table student add constraint unique key (name);

check

檢查約束可以使列滿足特定的條件，如果學生表中所有的人的年齡都應該大於0。

不過很可惜mysql不支援，可以使用觸發器代替

alter table student add constraint check (age > 0);

index

索引可以更快地檢索數據，但是降低了更新操作的性能。

create index index_on_student_name on student (name);      alter table student add constraint key(name );

15、觸發器

可以在插入，更新，刪除行的時候觸發事件。

# 創建觸發器  # 比如mysql中沒有check約束，可以使用創建觸發器，當插入數據小於0時，置為0。  create trigger reset_age before insert on student for each row  begin    if NEW.age < 0 then      set NEW.age = 0;    end if;  end;    # 列印觸發器列表  show triggers;

16、存儲過程

存儲過程可以視為一個函數，根據輸入執行一系列的 sql 語句。存儲過程也可以看做對一系列資料庫操作的封裝，一定程度上可以提高資料庫的安全性。

# 創建存儲過程  create procedure create_student(name varchar(50))  begin    insert into students(name) values (name);  end;    # 調用存儲過程  call create_student('shanyue');

3.SQL 練習

1. 根據班級學生的分數進行排名，如果分數相等則為同一名次

select id, name, score, (    select count(distinct score) from student s2 where s2.score >= s1.score  ) as rank  from student s1 order by s1.score desc;

在where以及排序中經常用到的欄位需要添加Btree索引，因此 score 上可以添加索引。

Result:

id	name	score	rank
1	張三	100	1
3	王五	99	2
2	李四	98	3
5	林仙兒	78	4
4	燕七	34	5

2. 寫一個函數，獲取第 N 高的分數

create function getNthHighestScore(N int) return int  begin    declare M int default N-1;    return (      select distinct score from student order by score desc limit M, 1;    )  end;    select getNthHighestScore(2);

Result:

getNthHighestScore(2)
99

3. 檢索每個班級分數前兩名學生，並顯示排名

select class.id class_id, class.name class_name,  s.name student_name, score, rank  from (    select *, (      select count(distinct score) from student s2      where s2.score >= s1.score and s2.class_id = s1.class_id    ) as rank from student s1  ) as s left join class on s.class_id = class.id where rank <= 2;    # 如果不想在from中包含select子句，也可以像如下檢索，不過不顯示排名  select class.id class_id, class.name class_name, s1.name name, score  from student s1 left join class on s1.class_id = class.id  where (select count(*) from student s2    where s2.class_id = s1.class_id and s1.score <= s2.score  ) <= 2  order by s1.class_id, score desc;

Result:

class_name	student_name	score	rank
軟體工程	張三	100	1
軟體工程	王五	99	2
市場營銷	燕七	34	2
市場營銷	林仙兒	78	1

FAQ

1. `inner join` 與 `outer join` 的區別是什麼

2. 如何根據一個表的數據更新另一個表

比如以上 student 表保存著成績，另有一表 score_correct 記憶體因失誤而需修改的學生成績。

在mysql中，可以使用如下語法

update student, score_correct  set student.score = score_correct.score  where student.id = score_correct.uid;

3. 索引是如何工作的

簡單來說，索引分為 hash 和 B-Tree 兩種。hash 查找的時間複雜度為O(1)。B-Tree 其實是 B+Tree，一種自平衡多叉搜索數，自平衡代表每次插入和刪除數據都會需要動態調整樹高，以降低平衡因子。B+Tree 只有葉子節點會存儲資訊，並且會使用鏈錶鏈接起來。因此適合範圍查找以及排序，不過只能搜索最左前綴，如只能索引以 a開頭的姓名，卻無法索引以 a結尾的姓名。另外，Everything is trade off。B+Tree的自平衡特性保證能夠快速查找的同時也降低了更新的性能，需要權衡利弊。

4. 如何聯接多個行的欄位

在mysql中，可以使用 group_concat

select group_concat(name) from student;

5. 如何在一個sql語句中插入多行數據

values 使用逗號相隔，可以插入多行數據

insert into student(id, name) values (), (), ()

6. 如何在 `select`中使用條件表達式

示例，在student表中，查詢所有人成績，小於60則顯示為0

select id, name, if(score < 60, 0, score) score from student;

7. 如何找到重複項

select name, sex, count(*) times from student  group by name, sex  having times > 1;

8. 什麼是SQL注入

如有一條查詢語句為

"select * from (" + table + ");"

當table取值 student);drop table student;-- 時，語句變為了，會刪掉表，造成攻擊。

"select * from (student); drop table student; --);"

—End—