A Detailed Guide to Managing Hierarchical Data in MySQL
Source: SegmentFault
Date: 2023-01-26 15:43:32
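I first came across the article Managing Hierarchical Data in MySQL on the official MySQL website (after MySQL was acquired by Oracle along with Sun, it can only be retrieved from archive.org), and later found it again on the personal website of its author, Mike Hillyer.
These notes rework the original article on MySQL 5.7. The examples have been changed, and the SQL statements differ slightly from the original to suit my own habits and the changes between MySQL versions.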
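Overview
As we know, tables in a relational database are better suited to flat lists than to hierarchical parent-child data, which a format like XML can represent directly and intuitively.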
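First, let us define the hierarchy under discussion: a set of data in which each item has exactly one parent and zero or more children (the only exception is the root item, which has no parent). Many database-backed applications deal with hierarchical data, for example threads in a forum or mailing list, a company's organization chart, or the category tree of a content management system or online shop. We use the following data as an example:
[Figure: a tree rooted at Linux. Its children are Debian (with children Knoppix and Ubuntu), Gentoo, and Red Hat (with children Fedora Core and RHEL; RHEL in turn has children CentOS and Oracle Linux).]
The data comes from a Wikipedia page. Why these particular entries were chosen, and whether the hierarchy is accurate, is not examined here.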
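Mike Hillyer considers two different models for implementing this hierarchy: the adjacency list and the nested set.
The Adjacency List Model
The structure shown in the figure can be stored quite intuitively in the following way.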
Create a table named distributions:
CREATE TABLE distributions (
    id INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(32) NOT NULL,
    parent INT NULL DEFAULT NULL,
    PRIMARY KEY (id)
)
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8;
Insert the data:
INSERT INTO distributions VALUES
    (1, 'Linux', NULL),
    (2, 'Debian', 1),
    (3, 'Knoppix', 2),
    (4, 'Ubuntu', 2),
    (5, 'Gentoo', 1),
    (6, 'Red Hat', 1),
    (7, 'Fedora Core', 6),
    (8, 'RHEL', 6),
    (9, 'CentOS', 8),
    (10, 'Oracle Linux', 8);
Run:
SELECT * FROM distributions;
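The table contents look like this:
+----+--------------+--------+
| id | name         | parent |
+----+--------------+--------+
|  1 | Linux        |   NULL |
|  2 | Debian       |      1 |
|  3 | Knoppix      |      2 |
|  4 | Ubuntu       |      2 |
|  5 | Gentoo       |      1 |
|  6 | Red Hat      |      1 |
|  7 | Fedora Core  |      6 |
|  8 | RHEL         |      6 |
|  9 | CentOS       |      8 |
| 10 | Oracle Linux |      8 |
+----+--------------+--------+
In the adjacency list model, every record in the table holds a pointer to its parent record. The top-level record (Linux in this example) has NULL in this field. With a chain of self-joins, one per level, the whole tree can be retrieved: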
SELECT
    t1.name AS level1,
    t2.name AS level2,
    t3.name AS level3,
    t4.name AS level4
FROM
    distributions AS t1
    LEFT JOIN distributions AS t2 ON t2.parent = t1.id
    LEFT JOIN distributions AS t3 ON t3.parent = t2.id
    LEFT JOIN distributions AS t4 ON t4.parent = t3.id
WHERE
    t1.name = 'Linux';
The result:
+--------+---------+-------------+--------------+
| level1 | level2  | level3      | level4       |
+--------+---------+-------------+--------------+
| Linux  | Red Hat | RHEL        | CentOS       |
| Linux  | Red Hat | RHEL        | Oracle Linux |
| Linux  | Debian  | Knoppix     | NULL         |
| Linux  | Debian  | Ubuntu      | NULL         |
| Linux  | Red Hat | Fedora Core | NULL         |
| Linux  | Gentoo  | NULL        | NULL         |
+--------+---------+-------------+--------------+
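As you can see, this result is not easy for client code to process either. Compared with the original article, the order of the returned rows also turns out to be nondeterministic, so the query has little practical value. However, by adding a condition to the WHERE clause we can obtain the full path of a single node:
SELECT
    t1.name AS level1,
    t2.name AS level2,
    t3.name AS level3,
    t4.name AS level4
FROM
    distributions AS t1
    LEFT JOIN distributions AS t2 ON t2.parent = t1.id
    LEFT JOIN distributions AS t3 ON t3.parent = t2.id
    LEFT JOIN distributions AS t4 ON t4.parent = t3.id
WHERE
    t1.name = 'Linux'
    AND t4.name = 'CentOS';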
The result:
+--------+---------+--------+--------+
| level1 | level2  | level3 | level4 |
+--------+---------+--------+--------+
| Linux  | Red Hat | RHEL   | CentOS |
+--------+---------+--------+--------+
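The fixed chain of joins ties the query to a depth known in advance. MySQL 5.7, the version used in these notes, has no recursive queries, but MySQL 8.0 added WITH RECURSIVE, which can follow parent pointers to any depth. The following is a minimal sketch of that idea, not part of the original article. It assumes Python's built-in sqlite3 module as a stand-in for MySQL 8.0 so that it runs without a server, and the names `PATH_SQL` and `full_path` are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL 8.0+ here (an assumption made so the
# sketch runs anywhere); the table mirrors `distributions` from the article.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE distributions ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, parent INTEGER)"
)
conn.executemany(
    "INSERT INTO distributions VALUES (?, ?, ?)",
    [(1, "Linux", None), (2, "Debian", 1), (3, "Knoppix", 2), (4, "Ubuntu", 2),
     (5, "Gentoo", 1), (6, "Red Hat", 1), (7, "Fedora Core", 6), (8, "RHEL", 6),
     (9, "CentOS", 8), (10, "Oracle Linux", 8)],
)

# Start at the named node and repeatedly join to its parent; `hops` counts
# how many steps away from the starting node each ancestor is.
PATH_SQL = """
WITH RECURSIVE ancestors(id, name, parent, hops) AS (
    SELECT id, name, parent, 0 FROM distributions WHERE name = ?
    UNION ALL
    SELECT d.id, d.name, d.parent, a.hops + 1
    FROM distributions AS d
    JOIN ancestors AS a ON d.id = a.parent
)
SELECT name FROM ancestors ORDER BY hops DESC
"""

def full_path(conn, name):
    """Hypothetical helper: return the path from the root down to `name`."""
    return [row[0] for row in conn.execute(PATH_SQL, (name,))]

print(full_path(conn, "CentOS"))  # ['Linux', 'Red Hat', 'RHEL', 'CentOS']
```

Ordering by `hops` in descending order puts the root first, so the path comes back in a deterministic order regardless of how deep the node sits.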
Finding all leaf nodes
Using a LEFT JOIN from each node to its children, we can find every node that has no child:
SELECT
    distributions.id,
    distributions.name
FROM
    distributions
    LEFT JOIN distributions AS child ON distributions.id = child.parent
WHERE
    child.id IS NULL;
The result:
+----+--------------+
| id | name         |
+----+--------------+
|  3 | Knoppix      |
|  4 | Ubuntu       |
|  5 | Gentoo       |
|  7 | Fedora Core  |
|  9 | CentOS       |
| 10 | Oracle Linux |
+----+--------------+
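Limitations of the adjacency list model
Working with the adjacency list model in pure SQL is difficult at best. Before we can retrieve the full path of a node, we need to know how deep in the hierarchy it sits. In addition, deleting a node requires extra care, because it can orphan an entire subtree in the process (for example, deleting Red Hat orphans all of its children). Some of these limitations can be addressed in client code or in stored procedures. In a stored procedure, for instance, we can walk the structure from the bottom up to return the whole tree or a single path. We can also use a stored procedure to delete a node without orphaning its subtree, by promoting one of its children and re-parenting all the other children to that promoted node.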
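The delete-with-promotion idea can be sketched in client code as well as in a stored procedure. The example below is an illustration under stated assumptions rather than the author's implementation: it uses Python's built-in sqlite3 module in place of MySQL, picks the child with the lowest id as the one to promote (an arbitrary choice), and the function name `delete_promoting_first_child` is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption); same data as the article.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE distributions ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, parent INTEGER)"
)
conn.executemany(
    "INSERT INTO distributions VALUES (?, ?, ?)",
    [(1, "Linux", None), (2, "Debian", 1), (3, "Knoppix", 2), (4, "Ubuntu", 2),
     (5, "Gentoo", 1), (6, "Red Hat", 1), (7, "Fedora Core", 6), (8, "RHEL", 6),
     (9, "CentOS", 8), (10, "Oracle Linux", 8)],
)

def delete_promoting_first_child(conn, node_id):
    """Delete a node; its lowest-id child takes its place and adopts the rest."""
    row = conn.execute(
        "SELECT parent FROM distributions WHERE id = ?", (node_id,)
    ).fetchone()
    if row is None:
        return  # nothing to delete
    grandparent = row[0]
    children = [r[0] for r in conn.execute(
        "SELECT id FROM distributions WHERE parent = ? ORDER BY id", (node_id,)
    )]
    with conn:  # one transaction: commit on success, roll back on error
        if children:
            heir = children[0]
            # Move the promoted child up one level first ...
            conn.execute(
                "UPDATE distributions SET parent = ? WHERE id = ?",
                (grandparent, heir),
            )
            # ... then hang the remaining children underneath it.
            conn.execute(
                "UPDATE distributions SET parent = ? WHERE parent = ?",
                (heir, node_id),
            )
        conn.execute("DELETE FROM distributions WHERE id = ?", (node_id,))

delete_promoting_first_child(conn, 6)  # delete Red Hat
parents = dict(conn.execute("SELECT name, parent FROM distributions"))
print(parents["Fedora Core"], parents["RHEL"])  # 1 7
```

After the call, Fedora Core sits directly under Linux and RHEL has become a child of Fedora Core, so no row points at the deleted id.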
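The Nested Set Model
Because working with the adjacency list model in pure SQL is so inconvenient, Mike Hillyer goes on to introduce the nested set model in earnest. With this model, we put the nodes and paths of the hierarchy out of mind and picture them as nested containers instead:
[Figure: the hierarchy drawn as nested containers, with Linux as the outermost container enclosing Debian, Gentoo and Red Hat, each of which encloses its own children.]
The hierarchy itself is unchanged: larger containers enclose their child containers. We use each container's left and right values to build the table: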
CREATE TABLE nested (
    id INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(32) NOT NULL,
    `left` INT NOT NULL,
    `right` INT NOT NULL,
    PRIMARY KEY (id)
)
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8;
Note that left and right are reserved words in MySQL, so they must be quoted with backticks. Insert the data:
INSERT INTO nested VALUES
    (1, 'Linux', 1, 20),
    (2, 'Debian', 2, 7),
    (3, 'Knoppix', 3, 4),
    (4, 'Ubuntu', 5, 6),
    (5, 'Gentoo', 8, 9),
    (6, 'Red Hat', 10, 19),
    (7, 'Fedora Core', 11, 12),
    (8, 'RHEL', 13, 18),
    (9, 'CentOS', 14, 15),
    (10, 'Oracle Linux', 16, 17);
View the contents:
SELECT * FROM nested ORDER BY id;
The output:
+----+--------------+------+-------+
| id | name         | left | right |
+----+--------------+------+-------+
|  1 | Linux        |    1 |    20 |
|  2 | Debian       |    2 |     7 |
|  3 | Knoppix      |    3 |     4 |
|  4 | Ubuntu       |    5 |     6 |
|  5 | Gentoo       |    8 |     9 |
|  6 | Red Hat      |   10 |    19 |
|  7 | Fedora Core  |   11 |    12 |
|  8 | RHEL         |   13 |    18 |
|  9 | CentOS       |   14 |    15 |
| 10 | Oracle Linux |   16 |    17 |
+----+--------------+------+-------+
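How are the left and right numbers determined? As the figure below shows, all it takes is counting:
[Figure: the nested containers with their edges numbered from left to right, starting at 1 on the left edge of Linux and ending at 20 on its right edge.]
Returning to the tree view, anyone with some background in data structures will recognize from the figure below that a slightly modified preorder traversal produces this numbering:
[Figure: the tree with each node labelled by its left and right numbers, assigned in the order the traversal enters and leaves the node.]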
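The numbering step can be sketched in a few lines: a counter is incremented once when the traversal enters a node (giving its left number) and once more when it leaves (giving its right number). This is an illustrative sketch rather than code from the original article. The function name `number_tree` and the `ROWS` list are hypothetical, and children are visited in the order their rows appear.

```python
# Adjacency-list rows as (id, name, parent), matching the article's data.
ROWS = [
    (1, "Linux", None), (2, "Debian", 1), (3, "Knoppix", 2), (4, "Ubuntu", 2),
    (5, "Gentoo", 1), (6, "Red Hat", 1), (7, "Fedora Core", 6), (8, "RHEL", 6),
    (9, "CentOS", 8), (10, "Oracle Linux", 8),
]

def number_tree(rows):
    """Return {name: (left, right)} using a modified preorder traversal."""
    names = {node_id: name for node_id, name, _ in rows}
    children = {node_id: [] for node_id in names}
    roots = []
    for node_id, _, parent in rows:
        (roots if parent is None else children[parent]).append(node_id)

    numbering = {}
    counter = 0

    def visit(node_id):
        nonlocal counter
        counter += 1            # entering the node: this is its left number
        left = counter
        for child in children[node_id]:
            visit(child)
        counter += 1            # leaving the node: this is its right number
        numbering[names[node_id]] = (left, counter)

    for root in roots:
        visit(root)
    return numbering

numbering = number_tree(ROWS)
print(numbering["Linux"], numbering["RHEL"])  # (1, 20) (13, 18)
```

Run against the adjacency list data, this reproduces the left and right values stored in the nested table.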
Retrieving the whole tree
A node's left number always lies between the left and right numbers of its parent (and of every other ancestor). Using this property and a self-join to the parent node, we can retrieve the whole tree:
SELECT
    node.name
FROM
    nested AS node,
    nested AS parent
WHERE
    node.`left` BETWEEN parent.`left` AND parent.`right`
    AND parent.name = 'Linux'
ORDER BY node.`left`;
The result:
+--------------+
| name         |
+--------------+
| Linux        |
| Debian       |
| Knoppix      |
| Ubuntu       |
| Gentoo       |
| Red Hat      |
| Fedora Core  |
| RHEL         |
| CentOS       |
| Oracle Linux |
+--------------+
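But this way we lose the level information. What can we do? Using COUNT() together with GROUP BY, we can compute the depth of each node:
SELECT
    node.name, (COUNT(parent.name) - 1) AS depth
FROM
    nested AS node,
    nested AS parent
WHERE
    node.`left` BETWEEN parent.`left` AND parent.`right`
GROUP BY node.name
ORDER BY ANY_VALUE(node.`left`);
Note that starting with MySQL 5.7.5, ONLY_FULL_GROUP_BY is enabled by default, so the column in the ORDER BY clause has to be wrapped in ANY_VALUE(); otherwise the query fails with the following error: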
ERROR 1055 (42000): Expression #1 of ORDER BY clause is not in GROUP BY clause and contains nonaggregated column 'test.node.left' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
With ANY_VALUE() in place, the result is:
+--------------+-------+
| name         | depth |
+--------------+-------+
| Linux        |     0 |
| Debian       |     1 |
| Knoppix      |     2 |
| Ubuntu       |     2 |
| Gentoo       |     1 |
| Red Hat      |     1 |
| Fedora Core  |     2 |
| RHEL         |     2 |
| CentOS       |     3 |
| Oracle Linux |     3 |
+--------------+-------+
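Another way to satisfy ONLY_FULL_GROUP_BY is to list every non-aggregated column in the GROUP BY clause, so that ANY_VALUE() is not needed at all. The sketch below shows that variant. It is an alternative formulation, not the query from the original article, and it runs on Python's built-in sqlite3 module as a stand-in for MySQL (SQLite also accepts backtick-quoted identifiers). The name `DEPTH_SQL` is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption); same data as `nested`.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nested (id INTEGER PRIMARY KEY, name TEXT NOT NULL,"
    " `left` INTEGER NOT NULL, `right` INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO nested (name, `left`, `right`) VALUES (?, ?, ?)",
    [("Linux", 1, 20), ("Debian", 2, 7), ("Knoppix", 3, 4), ("Ubuntu", 5, 6),
     ("Gentoo", 8, 9), ("Red Hat", 10, 19), ("Fedora Core", 11, 12),
     ("RHEL", 13, 18), ("CentOS", 14, 15), ("Oracle Linux", 16, 17)],
)

# Each node is matched with every container that encloses it (itself
# included); the number of enclosing containers minus one is its depth.
DEPTH_SQL = """
SELECT node.name, COUNT(parent.id) - 1 AS depth
FROM nested AS node
JOIN nested AS parent
  ON node.`left` BETWEEN parent.`left` AND parent.`right`
GROUP BY node.id, node.name, node.`left`
ORDER BY node.`left`
"""

depths = conn.execute(DEPTH_SQL).fetchall()
for name, depth in depths:
    print(" " * depth + name)  # indent each name by its depth
```

Grouping by the primary key as well as the name also keeps two nodes that happen to share a name from being merged into one row.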
With the depth available, the node names can be indented with CONCAT and REPEAT to show the hierarchy:
SELECT
    CONCAT(REPEAT(' ', COUNT(parent.name) - 1), node.name) AS name
FROM
    nested AS node,
    nested AS parent
WHERE
    node.`left` BETWEEN parent.`left` AND parent.`right`
GROUP BY node.name
ORDER BY ANY_VALUE(node.`left`);
The result:
+-----------------+
| name            |
+-----------------+
| Linux           |
|  Debian         |
|   Knoppix       |
|   Ubuntu        |
|  Gentoo         |
|  Red Hat        |
|   Fedora Core   |
|   RHEL          |
|    CentOS       |
|    Oracle Linux |
+-----------------+
To get the depth of each node within a subtree, measured from the root of that subtree (Red Hat here), a third self-join and a subquery are needed:
SELECT
    node.name, (COUNT(parent.name) - ANY_VALUE(sub_tree.depth) - 1) AS depth
FROM
    nested AS node,
    nested AS parent,
    nested AS sub_parent,
    (
        SELECT
            node.name, (COUNT(parent.name) - 1) AS depth
        FROM
            nested AS node,
            nested AS parent
        WHERE
            node.`left` BETWEEN parent.`left` AND parent.`right`
            AND node.name = 'Red Hat'
        GROUP BY node.name, node.`left`
        ORDER BY node.`left`
    ) AS sub_tree
WHERE
    node.`left` BETWEEN parent.`left` AND parent.`right`
    AND node.`left` BETWEEN sub_parent.`left` AND sub_parent.`right`
    AND sub_parent.name = sub_tree.name
GROUP BY node.name
ORDER BY ANY_VALUE(node.`left`);
The result:
+--------------+-------+
| name         | depth |
+--------------+-------+
| Red Hat      |     0 |
| Fedora Core  |     1 |
| RHEL         |     1 |
| CentOS       |     2 |
| Oracle Linux |     2 |
+--------------+-------+
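The relative depth can also be computed without a subquery, by counting only those enclosing containers that themselves lie inside the subtree root. The sketch below shows this alternative formulation. It is not the article's query; it uses Python's built-in sqlite3 module in place of MySQL, and the function name `subtree_depths` and the alias `sub_root` are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption); same data as `nested`.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nested (id INTEGER PRIMARY KEY, name TEXT NOT NULL,"
    " `left` INTEGER NOT NULL, `right` INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO nested (name, `left`, `right`) VALUES (?, ?, ?)",
    [("Linux", 1, 20), ("Debian", 2, 7), ("Knoppix", 3, 4), ("Ubuntu", 5, 6),
     ("Gentoo", 8, 9), ("Red Hat", 10, 19), ("Fedora Core", 11, 12),
     ("RHEL", 13, 18), ("CentOS", 14, 15), ("Oracle Linux", 16, 17)],
)

def subtree_depths(conn, root_name):
    """Return (name, depth) pairs for the subtree, depth 0 at its root."""
    sql = """
    SELECT node.name, COUNT(parent.id) - 1 AS depth
    FROM nested AS sub_root
    JOIN nested AS node
      ON node.`left` BETWEEN sub_root.`left` AND sub_root.`right`
    JOIN nested AS parent
      ON node.`left` BETWEEN parent.`left` AND parent.`right`
     AND parent.`left` BETWEEN sub_root.`left` AND sub_root.`right`
    WHERE sub_root.name = ?
    GROUP BY node.id, node.name, node.`left`
    ORDER BY node.`left`
    """
    return conn.execute(sql, (root_name,)).fetchall()

red_hat = subtree_depths(conn, "Red Hat")
print(red_hat)
# Immediate children are simply the nodes at relative depth 1.
print([name for name, depth in red_hat if depth == 1])
```

Restricting `parent` to the interval of `sub_root` drops the ancestors above the subtree from the count, which is what the subtraction of the subquery's depth achieves in the longer query.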
Finding the immediate children of a node
With the adjacency list model this is quite simple. With nested sets, we can take the query above that returns the depth of each node in a subtree and add a HAVING clause:
SELECT
    node.name, (COUNT(parent.name) - ANY_VALUE(sub_tree.depth) - 1) AS depth
FROM
    nested AS node,
    nested AS parent,
    nested AS sub_parent,
    (
        SELECT
            node.name, (COUNT(parent.name) - 1) AS depth
        FROM
            nested AS node,
            nested AS parent
        WHERE
            node.`left` BETWEEN parent.`left` AND parent.`right`
            AND node.name = 'Red Hat'
        GROUP BY node.name, node.`left`
        ORDER BY node.`left`
    ) AS sub_tree
WHERE
    node.`left` BETWEEN parent.`left` AND parent.`right`
    AND node.`left` BETWEEN sub_parent.`left` AND sub_parent.`right`
    AND sub_parent.name = sub_tree.name
GROUP BY node.name
HAVING depth = 1
ORDER BY ANY_VALUE(node.`left`);
The result:
+-------------+-------+
| name        | depth |
+-------------+-------+
| Fedora Core |     1 |
| RHEL        |     1 |
+-------------+-------+
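Finding all leaf nodes
Looking at the numbered nested model, leaf nodes are easy to identify: a leaf is any node whose right number is exactly one greater than its left number: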
SELECT id, name FROM nested WHERE `right` = `left` + 1;
+----+--------------+
| id | name         |
+----+--------------+
|  3 | Knoppix      |
|  4 | Ubuntu       |
|  5 | Gentoo       |
|  7 | Fedora Core  |
|  9 | CentOS       |
| 10 | Oracle Linux |
+----+--------------+
Retrieving the full path of a single node
This again uses the self-join technique, but now there is no need to worry about the depth of the node:
SELECT
    parent.name
FROM
    nested AS node,
    nested AS parent
WHERE
    node.`left` BETWEEN parent.`left` AND parent.`right`
    AND node.name = 'CentOS'
ORDER BY parent.`left`;
+---------+
| name    |
+---------+
| Linux   |
| Red Hat |
| RHEL    |
| CentOS  |
+---------+
Aggregate operations
To demonstrate aggregate queries, first add a releases table whose rows reference the distributions in nested:
CREATE TABLE releases (
    id INT NOT NULL AUTO_INCREMENT,
    distribution_id INT NULL,
    name VARCHAR(32) NOT NULL,
    PRIMARY KEY (id),
    INDEX distribution_id_idx (distribution_id ASC),
    CONSTRAINT distribution_id
        FOREIGN KEY (distribution_id)
        REFERENCES nested (id)
        ON DELETE CASCADE
        ON UPDATE CASCADE
)
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8;
Then insert some sample data:
INSERT INTO releases (distribution_id, name) VALUES
    (2, '7'), (2, '8'),
    (4, '14.04 LTS'), (4, '15.10'),
    (7, '22'), (7, '23'),
    (9, '5'), (9, '6'), (9, '7');
The following query then reports how many releases fall under each node. If this were the list of releases supported by some piece of software, testers might use it to find out how many kinds of virtual machine they need to prepare:
SELECT
    parent.name, COUNT(releases.name)
FROM
    nested AS node,
    nested AS parent,
    releases
WHERE
    node.`left` BETWEEN parent.`left` AND parent.`right`
    AND node.id = releases.distribution_id
GROUP BY parent.name
ORDER BY ANY_VALUE(parent.`left`);
+-------------+----------------------+
| name        | COUNT(releases.name) |
+-------------+----------------------+
| Linux       |                    9 |
| Debian      |                    4 |
| Ubuntu      |                    2 |
| Red Hat     |                    5 |
| Fedora Core |                    2 |
| RHEL        |                    3 |
| CentOS      |                    3 |
+-------------+----------------------+
If the hierarchy is a category tree, the same technique can be used to count how many products belong to each category.
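The inner join above leaves out every node that has no releases anywhere beneath it (Knoppix, Gentoo and Oracle Linux in this data). If those nodes should appear with a count of 0, a LEFT JOIN to releases is one option. The sketch below illustrates this variant, which is not part of the original article. It runs on Python's built-in sqlite3 module as a stand-in for MySQL, omits the foreign key for brevity, and the name `COUNT_SQL` is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption); data as in the article.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nested (id INTEGER PRIMARY KEY, name TEXT NOT NULL,"
    " `left` INTEGER NOT NULL, `right` INTEGER NOT NULL)"
)
conn.execute(
    "CREATE TABLE releases (id INTEGER PRIMARY KEY,"
    " distribution_id INTEGER, name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO nested VALUES (?, ?, ?, ?)",
    [(1, "Linux", 1, 20), (2, "Debian", 2, 7), (3, "Knoppix", 3, 4),
     (4, "Ubuntu", 5, 6), (5, "Gentoo", 8, 9), (6, "Red Hat", 10, 19),
     (7, "Fedora Core", 11, 12), (8, "RHEL", 13, 18), (9, "CentOS", 14, 15),
     (10, "Oracle Linux", 16, 17)],
)
conn.executemany(
    "INSERT INTO releases (distribution_id, name) VALUES (?, ?)",
    [(2, "7"), (2, "8"), (4, "14.04 LTS"), (4, "15.10"), (7, "22"), (7, "23"),
     (9, "5"), (9, "6"), (9, "7")],
)

# COUNT(releases.id) ignores the NULLs produced by the LEFT JOIN, so a node
# without any release in its subtree is reported with 0 instead of vanishing.
COUNT_SQL = """
SELECT parent.name, COUNT(releases.id) AS release_count
FROM nested AS parent
JOIN nested AS node
  ON node.`left` BETWEEN parent.`left` AND parent.`right`
LEFT JOIN releases
  ON releases.distribution_id = node.id
GROUP BY parent.id, parent.name, parent.`left`
ORDER BY parent.`left`
"""

counts = conn.execute(COUNT_SQL).fetchall()
for name, release_count in counts:
    print(f"{name}: {release_count}")
```

For a category tree, this is the form that lists empty categories alongside populated ones.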
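Adding nodes
To add a new node as a sibling immediately after an existing one, for example Slackware after Gentoo, shift every number to the right of Gentoo up by 2 and insert the new node into the gap:
LOCK TABLE nested WRITE;
SELECT @baseIndex := `right` FROM nested WHERE name = 'Gentoo';
UPDATE nested SET `right` = `right` + 2 WHERE `right` > @baseIndex;
UPDATE nested SET `left` = `left` + 2 WHERE `left` > @baseIndex;
INSERT INTO nested (name, `left`, `right`) VALUES
    ('Slackware', @baseIndex + 1, @baseIndex + 2);
UNLOCK TABLES;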
Check the result with the indentation query:
SELECT
    CONCAT(REPEAT(' ', COUNT(parent.name) - 1), node.name) AS name
FROM
    nested AS node,
    nested AS parent
WHERE
    node.`left` BETWEEN parent.`left` AND parent.`right`
GROUP BY node.name
ORDER BY ANY_VALUE(node.`left`);
+-----------------+
| name            |
+-----------------+
| Linux           |
|  Debian         |
|   Knoppix       |
|   Ubuntu        |
|  Gentoo         |
|  Slackware      |
|  Red Hat        |
|   Fedora Core   |
|   RHEL          |
|    CentOS       |
|    Oracle Linux |
+-----------------+
If the parent of the new node was previously a leaf, the code above needs a small adjustment: the base index is the parent's left number rather than a sibling's right number. For example, to add Slax as a child of Slackware:
LOCK TABLE nested WRITE;
SELECT @baseIndex := `left` FROM nested WHERE name = 'Slackware';
UPDATE nested SET `right` = `right` + 2 WHERE `right` > @baseIndex;
UPDATE nested SET `left` = `left` + 2 WHERE `left` > @baseIndex;
INSERT INTO nested (name, `left`, `right`) VALUES
    ('Slax', @baseIndex + 1, @baseIndex + 2);
UNLOCK TABLES;
The tree now looks like this:
+-----------------+
| name            |
+-----------------+
| Linux           |
|  Debian         |
|   Knoppix       |
|   Ubuntu        |
|  Gentoo         |
|  Slackware      |
|   Slax          |
|  Red Hat        |
|   Fedora Core   |
|   RHEL          |
|    CentOS       |
|    Oracle Linux |
+-----------------+
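Both insertions follow the same pattern and differ only in which number serves as the base index. The sketch below wraps that pattern in one helper and runs the shift and the insert in a single transaction, which plays the role of the table lock in the MySQL statements. It is an illustration under stated assumptions: Python's built-in sqlite3 module stands in for MySQL, and the function names `insert_after_index`, `add_sibling_after` and `add_first_child` are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption); same data as `nested`.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nested (id INTEGER PRIMARY KEY, name TEXT NOT NULL,"
    " `left` INTEGER NOT NULL, `right` INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO nested (name, `left`, `right`) VALUES (?, ?, ?)",
    [("Linux", 1, 20), ("Debian", 2, 7), ("Knoppix", 3, 4), ("Ubuntu", 5, 6),
     ("Gentoo", 8, 9), ("Red Hat", 10, 19), ("Fedora Core", 11, 12),
     ("RHEL", 13, 18), ("CentOS", 14, 15), ("Oracle Linux", 16, 17)],
)

def insert_after_index(conn, name, base):
    """Open a gap of width 2 after `base` and place the new leaf in it."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "UPDATE nested SET `right` = `right` + 2 WHERE `right` > ?", (base,))
        conn.execute(
            "UPDATE nested SET `left` = `left` + 2 WHERE `left` > ?", (base,))
        conn.execute(
            "INSERT INTO nested (name, `left`, `right`) VALUES (?, ?, ?)",
            (name, base + 1, base + 2))

def add_sibling_after(conn, name, sibling):
    """New node goes right after `sibling`: base is the sibling's right."""
    base = conn.execute(
        "SELECT `right` FROM nested WHERE name = ?", (sibling,)).fetchone()[0]
    insert_after_index(conn, name, base)

def add_first_child(conn, name, parent):
    """New node becomes the first child of `parent`: base is its left."""
    base = conn.execute(
        "SELECT `left` FROM nested WHERE name = ?", (parent,)).fetchone()[0]
    insert_after_index(conn, name, base)

add_sibling_after(conn, "Slackware", "Gentoo")
add_first_child(conn, "Slax", "Slackware")
bounds = {n: (l, r) for n, l, r in
          conn.execute("SELECT name, `left`, `right` FROM nested")}
print(bounds["Slackware"], bounds["Slax"])  # (10, 13) (11, 12)
```

Using the parent's left number as the base works whether or not the parent already has children; the new node simply becomes the first child.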
Deleting nodes
To delete a node together with all of its descendants, for example Slackware (which also removes Slax), delete every row inside the node's interval and then close the gap by the width of that interval:
LOCK TABLE nested WRITE;
SELECT
    @nodeLeft := `left`,
    @nodeRight := `right`,
    @nodeWidth := `right` - `left` + 1
FROM nested
WHERE name = 'Slackware';
DELETE FROM nested WHERE `left` BETWEEN @nodeLeft AND @nodeRight;
UPDATE nested SET `right` = `right` - @nodeWidth WHERE `right` > @nodeRight;
UPDATE nested SET `left` = `left` - @nodeWidth WHERE `left` > @nodeRight;
UNLOCK TABLES;
Running the indentation query again shows:
+-----------------+
| name            |
+-----------------+
| Linux           |
|  Debian         |
|   Knoppix       |
|   Ubuntu        |
|  Gentoo         |
|  Red Hat        |
|   Fedora Core   |
|   RHEL          |
|    CentOS       |
|    Oracle Linux |
+-----------------+
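The subtree deletion can be sketched as a small client-side function. This is an illustrative example, not the author's code: it uses Python's built-in sqlite3 module in place of MySQL, relies on a transaction instead of LOCK TABLE, and the function name `delete_subtree` is hypothetical. The demo deletes Red Hat from the original ten-node table.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption); same data as `nested`.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nested (id INTEGER PRIMARY KEY, name TEXT NOT NULL,"
    " `left` INTEGER NOT NULL, `right` INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO nested (name, `left`, `right`) VALUES (?, ?, ?)",
    [("Linux", 1, 20), ("Debian", 2, 7), ("Knoppix", 3, 4), ("Ubuntu", 5, 6),
     ("Gentoo", 8, 9), ("Red Hat", 10, 19), ("Fedora Core", 11, 12),
     ("RHEL", 13, 18), ("CentOS", 14, 15), ("Oracle Linux", 16, 17)],
)

def delete_subtree(conn, name):
    """Remove `name` and all its descendants, then close the numbering gap."""
    row = conn.execute(
        "SELECT `left`, `right` FROM nested WHERE name = ?", (name,)).fetchone()
    if row is None:
        return  # nothing to delete
    node_left, node_right = row
    width = node_right - node_left + 1  # how many numbers the subtree used
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "DELETE FROM nested WHERE `left` BETWEEN ? AND ?",
            (node_left, node_right))
        conn.execute(
            "UPDATE nested SET `right` = `right` - ? WHERE `right` > ?",
            (width, node_right))
        conn.execute(
            "UPDATE nested SET `left` = `left` - ? WHERE `left` > ?",
            (width, node_right))

delete_subtree(conn, "Red Hat")
remaining = {n: (l, r) for n, l, r in
             conn.execute("SELECT name, `left`, `right` FROM nested")}
print(remaining)
```

Red Hat spans 10 to 19, so ten numbers are freed and the right number of Linux drops from 20 to 10.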
With a small adjustment, we can delete a single node and promote its children one level: shift the numbers of the nodes that lie between the deleted node's left and right numbers down by 1, and shift the numbers to the right of the deleted node down by 2. For example, let us try deleting RHEL:
LOCK TABLE nested WRITE;
SELECT
    @nodeLeft := `left`,
    @nodeRight := `right`
FROM nested
WHERE name = 'RHEL';
DELETE FROM nested WHERE `left` = @nodeLeft;
UPDATE nested SET `right` = `right` - 1, `left` = `left` - 1 WHERE `left` BETWEEN @nodeLeft AND @nodeRight;
UPDATE nested SET `right` = `right` - 2 WHERE `right` > @nodeRight;
UPDATE nested SET `left` = `left` - 2 WHERE `left` > @nodeRight;
UNLOCK TABLES;
SELECT
    CONCAT(REPEAT(' ', COUNT(parent.name) - 1), node.name) AS name
FROM
    nested AS node,
    nested AS parent
WHERE
    node.`left` BETWEEN parent.`left` AND parent.`right`
GROUP BY node.name
ORDER BY ANY_VALUE(node.`left`);
+----------------+
| name           |
+----------------+
| Linux          |
|  Debian        |
|   Knoppix      |
|   Ubuntu       |
|  Gentoo        |
|  Red Hat       |
|   Fedora Core  |
|   CentOS       |
|   Oracle Linux |
+----------------+
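After any structural change it is worth checking that the numbering is still intact: the left and right values of n nodes should be exactly the integers 1 through 2n. The sketch below repeats the promotion of RHEL's children and then runs that check. It is an illustration under stated assumptions: Python's built-in sqlite3 module stands in for MySQL, and the function names `delete_and_promote` and `numbers_are_contiguous` are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption); same data as `nested`.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nested (id INTEGER PRIMARY KEY, name TEXT NOT NULL,"
    " `left` INTEGER NOT NULL, `right` INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO nested (name, `left`, `right`) VALUES (?, ?, ?)",
    [("Linux", 1, 20), ("Debian", 2, 7), ("Knoppix", 3, 4), ("Ubuntu", 5, 6),
     ("Gentoo", 8, 9), ("Red Hat", 10, 19), ("Fedora Core", 11, 12),
     ("RHEL", 13, 18), ("CentOS", 14, 15), ("Oracle Linux", 16, 17)],
)

def delete_and_promote(conn, name):
    """Delete one node and move its children up to the deleted node's level."""
    row = conn.execute(
        "SELECT `left`, `right` FROM nested WHERE name = ?", (name,)).fetchone()
    if row is None:
        return  # nothing to delete
    node_left, node_right = row
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM nested WHERE `left` = ?", (node_left,))
        # Descendants lose the one number the deleted node held to their left.
        conn.execute(
            "UPDATE nested SET `left` = `left` - 1, `right` = `right` - 1"
            " WHERE `left` BETWEEN ? AND ?", (node_left, node_right))
        # Everything to the right loses both of the deleted node's numbers.
        conn.execute(
            "UPDATE nested SET `right` = `right` - 2 WHERE `right` > ?",
            (node_right,))
        conn.execute(
            "UPDATE nested SET `left` = `left` - 2 WHERE `left` > ?",
            (node_right,))

def numbers_are_contiguous(conn):
    """True if all left/right values together are exactly 1..2n."""
    values = sorted(
        v for row in conn.execute("SELECT `left`, `right` FROM nested")
        for v in row)
    return values == list(range(1, len(values) + 1))

delete_and_promote(conn, "RHEL")
after = {n: (l, r) for n, l, r in
         conn.execute("SELECT name, `left`, `right` FROM nested")}
print(after["Red Hat"], after["CentOS"], after["Oracle Linux"])
print(numbers_are_contiguous(conn))  # True
```

The contiguity check catches gaps and duplicates in the numbering; it does not by itself prove that the intervals are properly nested.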
This article was first published on SegmentFault.com on 2016-03-23.