首页 > Golang > Go教程

深入了解Golang的map增量扩容

来源：脚本之家

时间：2022-12-31 16:12:10 442浏览收藏

来到golang学习网的大家，相信都是编程学习爱好者，希望在这里学习Golang相关编程知识。下面本篇文章就来带大家聊聊《深入了解Golang的map增量扩容》，介绍一下map、增量、扩容，希望对大家的知识积累有所帮助，助力实战开发！

核心思想

以空间换时间，访问速度与填充因子有关

扩容hash表的时候每次都增大2倍，hash表大小始终为2的整数倍，有(hash mod 2^B) == (hash & (2^B-1))，方便于简化运算，避免取余操作

扩容前后的 hash mod 容量大小是不等的，因此要重新计算每一项在hash表中的位置，扩容后需要将old pair重新hash到新的hash表上（就是一个evacuate的过程）。这个过程不是一次性完成的，在每次insert、remove的时候会搬移1-2个pair。就是使用的是增量扩容

每个旧桶的键值对都会分流到2个不同的新桶中

为什么要使用增量扩容?

主要是缩短map容器的响应时间。如果不用增量扩容，当一个map存储很多元素后进行扩容，会阻塞很长时间无法响应请求。增量扩容的本质其实就是将总的扩容时间分摊到了每一次hash操作上

在搬数据的时候，并不会把旧的bucket从oldbucket中删除，只是加上了一个已删除的标记

扩容期间一部分数据在oldbucket中，一部分在bucket中，会对hash表的insert，remove，lookup操作的处理逻辑产生影响，如耗时更长等

只有当oldbucket中所有bucket移动到新表后，才会将oldbucket释放掉

扩容方式

如果grow的太频繁，空间的利用率就会很低，如果很久才grow，会形成很多的overflow buckets，查找速率会下降

map的填充因子是6.5

即当count / 2^B > 6.5时会触发一次grow.翻倍扩容

如果负载因子没有超标，但是使用的溢出桶较多，也会触发扩容。但是是等量扩容

原因是原桶中有太多的键值对被删除，等量扩容可以使得剩余的键值对排列更加紧凑，节省空间

	// Maximum average load of a bucket that triggers growth is 6.5.
	// Represent as loadFactorNum/loadFactorDen, to allow integer math.
	loadFactorNum = 13
	loadFactorDen = 2

这个6.5来源于作者的一个测试程序，取了一个相对适中的值

// Picking loadFactor: too large and we have lots of overflow
// buckets, too small and we waste a lot of space. I wrote
// a simple program to check some stats for different loads:
// (64-bit, 8 byte keys and elems)
//  loadFactor    %overflow  bytes/entry     hitprobe    missprobe
//        4.00         2.13        20.77         3.00         4.00
//        4.50         4.05        17.30         3.25         4.50
//        5.00         6.85        14.77         3.50         5.00
//        5.50        10.55        12.94         3.75         5.50
//        6.00        15.27        11.67         4.00         6.00
//        6.50        20.90        10.79         4.25         6.50
//        7.00        27.14        10.15         4.50         7.00
//        7.50        34.03         9.73         4.75         7.50
//        8.00        41.10         9.40         5.00         8.00
//
// %overflow   = percentage of buckets which have an overflow bucket
// bytes/entry = overhead bytes used per key/elem pair
// hitprobe    = # of entries to check when looking up a present key
// missprobe   = # of entries to check when looking up an absent key
//
// Keep in mind this data is for maximally loaded tables, i.e. just
// before the table grows. Typical tables will be somewhat less loaded.

源码分析

// 只是分配新的buckets，并将老的buckets挂到oldbuckets字段上
// 真正搬迁的动作在growWork()中
func hashGrow(t *maptype, h *hmap) {
	// If we've hit the load factor, get bigger.
	// Otherwise, there are too many overflow buckets,
	// so keep the same number of buckets and "grow" laterally.
    // B+1 相当于之前的2倍空间
	bigger := uint8(1)
    // 对应条件2
	if !overLoadFactor(h.count+1, h.B) {
        // 进行等量扩容，B不变
		bigger = 0
		h.flags |= sameSizeGrow
	}
    // 将oldbuckets挂到buckets上
	oldbuckets := h.buckets
    // 申请新的buckets空间
	newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)

    // 对标志位的处理
    // &^表示 按位置0
    // x=01010011
    // y=01010100
    // z=x&^y=00000011
    // 如果y的bit位为1，那么z相应的bit位就为0
    // 否则z对应的bit位就和x对应的bit位相同
    // 
    // 其实就是将h.flags的iterator和oldItertor位置为0
    // 如果发现iterator位为1，那就把它转接到 oldIterator 位
    // 使得 oldIterator 标志位变成 1
    // bucket挂到了oldbuckets下，那么标志位也一样转移过去
	flags := h.flags &^ (iterator | oldIterator)
	if h.flags&iterator != 0 {
		flags |= oldIterator
	}
    
    // // 可能有迭代器使用 buckets
    // iterator     = 1
    // 可能有迭代器使用 oldbuckets
    // oldIterator  = 2
    // 有协程正在向 map 中写入 key
    // hashWriting  = 4
    // 等量扩容（对应条件 2）
    // sameSizeGrow = 8
    // 提交grow的动作
	// commit the grow (atomic wrt gc)
	h.B += bigger
	h.flags = flags
	h.oldbuckets = oldbuckets
	h.buckets = newbuckets
    // 搬迁进度为0
	h.nevacuate = 0
    // 溢出bucket数量为0
	h.noverflow = 0

	if h.extra != nil && h.extra.overflow != nil {
		// Promote current overflow buckets to the old generation.
		if h.extra.oldoverflow != nil {
			throw("oldoverflow is not nil")
		}
		h.extra.oldoverflow = h.extra.overflow
		h.extra.overflow = nil
	}
	if nextOverflow != nil {
		if h.extra == nil {
			h.extra = new(mapextra)
		}
		h.extra.nextOverflow = nextOverflow
	}

	// the actual copying of the hash table data is done incrementally
	// by growWork() and evacuate().
}

// growWork 真正执行搬迁工作的函数
// 调用其的动作在mapssign和mapdelete函数中，也就是插入、修改或删除的时候都会尝试进行搬迁
func growWork(t *maptype, h *hmap, bucket uintptr) {
	// make sure we evacuate the oldbucket corresponding
	// to the bucket we're about to use
    // 确保搬迁的老bucket对应的正在使用的新bucket
    // bucketmask 作用就是将key算出来的hash值与bucketmask相&，得到key应该落入的bucket
    // 只有hash值低B位决策key落入那个bucket
	evacuate(t, h, bucket&h.oldbucketmask())

	// evacuate one more oldbucket to make progress on growing
    // 再搬迁一个bucket，加快搬迁进度，这就是说为什么可能每次操作会搬迁1-2个bucket
	if h.growing() {
		evacuate(t, h, h.nevacuate)
	}
}

// 返回扩容前的bucketmask
// 
// 所谓的bucketmask作用就是将 key 计算出来的哈希值与 bucketmask 相与
// 得到的结果就是 key 应该落入的桶
// 比如 B = 5，那么 bucketmask 的低 5 位是 11111，其余位是 0
// hash 值与其相与的意思是，只有 hash 值的低 5 位决策 key 到底落入哪个 bucket。
// oldbucketmask provides a mask that can be applied to calculate n % noldbuckets().
func (h *hmap) oldbucketmask() uintptr {
	return h.noldbuckets() - 1
}

// 检查oldbuckets是否搬迁完
// growing reports whether h is growing. The growth may be to the same size or bigger.
func (h *hmap) growing() bool {
	return h.oldbuckets != nil
}

// evacDst is an evacuation destination.
type evacDst struct {
    // 标识bucket移动的目标地址
	b *bmap          // current destination bucket
    // k-v的索引
	i int            // key/elem index into b
    // 指向k
	k unsafe.Pointer // pointer to current key storage
    // 指向v
	e unsafe.Pointer // pointer to current elem storage
}
// evacuate 核心搬迁函数
func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
    // 定位老的bucket的地址
	b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
	// 结果是2^B，进行计算 如 B = 5，结果为32
    newbit := h.noldbuckets()
    // 如果b没被搬迁过
	if !evacuated(b) {
		// TODO: reuse overflow buckets instead of using new ones, if there
		// is no iterator using the old buckets.  (If !oldIterator.)

		// xy contains the x and y (low and high) evacuation destinations.
        // xy包含了两个可能搬迁到的目的bucket地址
        // 默认是等量扩容的，用x来搬迁
		var xy [2]evacDst
		x := &xy[0]
		x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
		x.k = add(unsafe.Pointer(x.b), dataOffset)
		x.e = add(x.k, bucketCnt*uintptr(t.keysize))

        // 如果不是等量扩容，前后的bucket序号有变
        // 使用y来搬迁
		if !h.sameSizeGrow() {
			// Only calculate y pointers if we're growing bigger.
			// Otherwise GC can see bad pointers.
			y := &xy[1]
            // y代表的bucket序号增加了2^B
			y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
			y.k = add(unsafe.Pointer(y.b), dataOffset)
			y.e = add(y.k, bucketCnt*uintptr(t.keysize))
		}

        // 遍历所有的bucket，包括溢出bucket
        // b是老bucket的地址
		for ; b != nil; b = b.overflow(t) {
			k := add(unsafe.Pointer(b), dataOffset)
			e := add(k, bucketCnt*uintptr(t.keysize))
            // 遍历bucket里所有的cell
			for i := 0; i  newbit {
		stop = newbit
	}
    // 寻找没有搬迁过的bucket
	for h.nevacuate != stop && bucketEvacuated(t, h, h.nevacuate) {
		h.nevacuate++
	}

    // 现在h.nevacuate之前的bucket都被搬迁完毕了

    // 如果所有bucket搬迁完毕
	if h.nevacuate == newbit { // newbit == # of oldbuckets
        // 清除oldbuckets，释放bucket array
		// Growing is all done. Free old main bucket array.
		h.oldbuckets = nil
        // 清除老的溢出bucket
        // [0]表示当前溢出bucket
        // [1]表示老的溢出bucket
		// Can discard old overflow buckets as well.
		// If they are still referenced by an iterator,
		// then the iterator holds a pointers to the slice.
		if h.extra != nil {
			h.extra.oldoverflow = nil
		}
        // 清除正在扩容的标志位
		h.flags &^= sameSizeGrow
	}
}

源码里提到 X, Y part，其实就是我们说的如果是扩容到原来的 2 倍，桶的数量是原来的 2 倍，前一半桶被称为 X part，后一半桶被称为 Y part。一个 bucket 中的 key 会分裂落到 2 个桶中。一个位于 X part，一个位于 Y part。所以在搬迁一个 cell 之前，需要知道这个 cell 中的 key 是落到哪个 Part。

其实很简单，重新计算 cell 中 key 的 hash，并向前“多看”一位，决定落入哪个 Part

设置 key 在原始 buckets 的 tophash 为 evacuatedX 或是 evacuatedY，表示已经搬迁到了新 map 的 x part 或是 y part。新 map 的 tophash 则正常取 key 哈希值的高 8 位。

对于增量扩容来说：某个 key 在搬迁前后 bucket 序号可能和原来相等，也可能是相比原来加上 2^B（原来的 B 值），取决于 hash 值第 6 bit 位是 0 还是 1。

当搬迁碰到 math.NaN() 的 key 时，只通过 tophash 的最低位决定分配到 X part 还是 Y part（如果扩容后是原来 buckets 数量的 2 倍）。如果 tophash 的最低位是 0 ，分配到 X part；如果是 1 ，则分配到 Y part，已搬迁完的key的tophash值是一个状态值，表示key的搬迁去向

到这里，我们也就讲完了《深入了解Golang的map增量扩容》的内容了。个人认为，基础知识的学习和巩固，是为了更好的将其运用到项目中，欢迎关注golang学习网公众号，带你了解更多关于golang的知识点！

golang map 增量扩容

声明：本文转载于：脚本之家如有侵犯，请联系study_golang@163.com删除