Go语言Map的底层原理

概念

map 又称字典，是一种常用的数据结构，核心特征包含下述三点：

（1）存储基于 key-value 对映射的模式；

（2）基于 key 维度实现存储数据的去重；

（3）读、写、删操作控制，时间复杂度 O(1).

****key 的类型要求
map 中，key 的数据类型必须为可比较的类型，切片、map、func不可比较

指针类型是可以比较的。

如果是结构体会怎么样？

结构体中的所有字段的类型都必须是可比较的类型的才能作为map的key

type student struct {name stringage  int
}func TestMap(t *testing.T) {var m map[student]stringm = map[student]string{student{"Jane", 20}: "Jane",}t.Log(m)
}

同理数组也是一样。

遍历

在执行 map 遍历操作时，获取的 key-value 对并没有一个固定的顺序，因此前后两次遍历顺序可能存在差异.

并发冲突

map 不是并发安全的数据结构，倘若存在并发读写行为，会抛出 fatal error.

具体规则是：

（1）并发读没有问题；

（2）并发读写中的“写”是广义上的，包含写入、更新、删除等操作；

（3）读的时候发现其他 goroutine 在并发写，抛出 fatal error；

（4）写的时候发现其他 goroutine 在并发写，抛出 fatal error.

fatal("concurrent map read and map write")
fatal("concurrent map writes")

需要关注，此处并发读写会引发 fatal error，是一种比 panic 更严重的错误，无法使用 recover 操作捕获.

核心原理

map 又称为 hash map，在算法上基于 hash 实现 key 的映射和寻址；在数据结构上基于桶数组实现 key-value 对的存储.

以一组 key-value 对写入 map 的流程为例进行简述：

（1）通过哈希方法取得 key 的 hash 值；

（2）hash 值对桶数组长度取模，确定其所属的桶；

（3）在桶中插入 key-value 对.

hash 的性质，保证了相同的 key 必然产生相同的 hash 值，因此能映射到相同的桶中，通过桶内遍历的方式锁定对应的 key-value 对.

因此，只要在宏观流程上，控制每个桶中 key-value 对的数量，就能保证 map 的几项操作都限制为常数级别的时间复杂度.

Hash

hash 译作散列，是一种将任意长度的输入压缩到某一固定长度的输出摘要的过程，由于这种转换属于压缩映射，输入空间远大于输出空间（这里的压缩是将无限的输入空间压缩成有限的输出域），因此不同输入可能会映射成相同的输出结果. 此外，hash在压缩过程中会存在部分信息的遗失，因此这种映射关系具有不可逆的特质.

（1）hash 的可重入性：相同的 key，必然产生相同的 hash 值；

（2）hash 的离散性：只要两个 key 不相同，不论其相似度的高低，产生的 hash 值会在整个输出域内均匀地离散化；

（3）hash 的单向性：企图通过 hash 值反向映射回 key 是无迹可寻的.

（4）hash 冲突：由于输入域（key）无穷大，输出域（hash 值）有限，因此必然存在不同 key 映射到相同 hash 值的情况，称之为 hash 冲突.

可能会感到有点冲突，但（2）和（4）并不相矛盾，离散型好的哈希函数只是减少冲突的概率，并不能完全避免。

桶数组

map中，会通过长度为2的整数次幂的桶数组进行 key-value 对的存储：

（1）每个桶固定可以存放8个key-value对

（2）倘若超过 8 个 key-value 对打到桶数组的同一个索引当中，此时会通过创建桶链表的方式来化解这一问题。

在这里插入图片描述

1.：hash冲突不同的key，可能会存在相同的hash值

2：不同的hash值通过对桶长度进行取模之后，也有可能会被打到同一个桶中。

综上面两点：不同的 key-value 可能被映射到 map 的同一个桶当中。

拉链法解决hash冲突

拉链法中，将命中同一个桶的元素通过链表的形式进行链接，因此很便于动态扩展.

Go map 的做法（拉链法的优化）：

每个 bucket 固定容纳 8 个 key-value
如果满了，就通过 overflow 字段挂一个溢出桶
实际上是一个 结构化拉链法（struct-based chaining）

开放寻址法解决hash冲突

所有元素都存在 哈希表本身。如果发生冲突，就按照某种“探测序列”寻找下一个可用位置。

常见策略：

线性探测（index+1）
二次探测（index+1^2, +2^2…）
双重哈希（用另一个哈希函数重新计算偏移）

方法	优点
拉链法	简单常用；无需预先为元素分配内存。
开放寻址法	无需额外的指针用于链接元素；内存地址完全连续，可以基于局部性原理，充分利用CPU高速缓存。

Go 的 map 实现确实结合了 开放寻址法 和 拉链法 的思想，但它并不是标准意义上的链表拉链法，而是桶链表 + 定长数组 + 溢出桶机制的混合方案。

（1）每个桶是一个结构体（Go 源码中为bmap），包含：

一个 tophash 数组（快速判断 key 匹配），数组长度为8，也就是说一个桶中可以存储8个Key-Value.

当桶满了，不是在桶内继续链表扩展 key-value，而是通过一个 溢出指针数组

指向额外的桶（overflow bucket）

所以Go的map实际上是：哈希桶数组+每桶最多存储8个kv+桶装满后追加溢出桶形成“桶链表”

// A bucket for a Go map.
type bmap struct {// tophash generally contains the top byte of the hash value// for each key in this bucket. If tophash[0] < minTopHash,// tophash[0] is a bucket evacuation state instead.tophash [abi.OldMapBucketCount]uint8// Followed by bucketCnt keys and then bucketCnt elems.// NOTE: packing all the keys together and then all the elems together makes the// code a bit more complicated than alternating key/elem/key/elem/... but it allows// us to eliminate padding which would be needed for, e.g., map[int64]int8.// Followed by an overflow pointer.
}

（2）key 命中一个桶后，在桶的 8 个位置中寻找空位插入，这就是类似开放寻址法的本地桶内查找

（3）如果桶的 8 个位置都被占满，则找下一个溢出桶，重复第（2）步

（4）如果遍历所有溢出桶都没有空位，则新建溢出桶并插入

Go map 扩容机制

为什么要扩容？

如果桶数组的长度一直保持不变，那么随着key-value对的增长，当一个桶下挂载的key-value达到一定的量级，时间复杂度上升，无法满足诉求。

因此，为了将操作的时间复杂度维持在 O(1)，map 会在满足一定条件时触发扩容，以控制每个桶的平均负载在常量级别。

map 扩容机制的核心点包括：

（1）扩容分为增量扩容和等量扩容；

（2）增量扩容：当 map 的负载因子（key 数量 / 桶数量）大于 6.5 时，会触发增量扩容，将桶数组长度扩大为原值的两倍。

（3) 等量扩容：当桶内溢出桶数量大于等于 2^B 时( B 为桶数组长度的指数，B 最大取 15)，发生等量扩容，桶的长度保持为原值；等量扩容用于处理 hash 分布不均（热点 key）导致的溢出桶爆炸问题，此时不会改变桶数量，而是重新散列所有 key。

（4）采用渐进扩容的方式，当桶被实际操作到时，由使用者负责完成数据迁移，避免因为一次性的全量数据迁移引发性能抖动。渐进迁移通过“读写操作触发搬迁”的方式，把迁移成本分摊到用户的操作中，避免瞬时卡顿。

渐进扩容的核心流程

扩容时，Go map 并不会马上把所有旧桶迁移完，而是：

分配一份新的 bucket 数组（oldbuckets 和 newbuckets 并存）；
设置 oldbuckets 指针指向旧桶；
在后续的 每次写入操作（插入/删除）中，顺便“搬一两个旧桶”的数据到新桶中；
迁移的数据会被重新计算 hash 值并分配到新桶；
同时维护一个 nevacuate 指针记录迁移进度，当 nevacuate 指向了所有旧桶的末尾，说明所有桶都搬迁完毕。
当所有旧桶都迁移完毕后，map 会把旧桶释放，指针 oldbuckets 清空。才切换为新桶数组。

在 map 渐进扩容过程中，如何判断一个旧桶的 key 应该迁移到新桶的哪个桶中？

在 Go 的 map 中，扩容时新桶数量 = 旧桶数量 × 2（增量扩容）。

因此：一个旧桶中的 key-value 对只可能被迁移到两个新桶中的一个。

判断规则是：对于旧桶中每个 key，根据 hash 值重新计算位置，看是落到原桶 b，还是 b + oldBucketCount。

因为旧桶的索引是 h & (oldBucketCount - 1)，即取 hash 的低 B-1 位。

扩容后只是多了一位（第 B 位），也就是说：

如果这第 B 位是 0：落到原桶 b；
如果这第 B 位是 1：落到新桶 b + oldBucketCount。

Go map的桶（bucket）结构是：

每个桶可以存放 最多 8 个 key-value 对；
为了加快桶内查找效率，Go 把 每个 key 的 hash 值的“高 8 位” 存进一个 tophash 数组（长度为 8，对应每个 key）；
而 hash 值的 低若干位（具体取决于桶数量） 用于决定 key 属于哪个桶。

数据结构

hmap

type hmap struct {// Note: the format of the hmap is also encoded in cmd/compile/internal/reflectdata/reflect.go.// Make sure this stays in sync with the compiler's definition.count     int // # live cells == size of map.  Must be first (used by len() builtin)flags     uint8B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for detailshash0     uint32 // hash seedbuckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growingnevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)clearSeq   uint64extra *mapextra // optional fields
}

（1）count：map 中的 key-value 总数；

（2）flags：map 状态标识，可以标识出 map 是否被 goroutine 并发读写；

（3）B：桶数组长度的指数，桶数组长度为 2^B；

（4）noverflow：map 中溢出桶的数量；

（5）hash0：hash 随机因子，生成 key 的 hash 值时会使用到；

（6）buckets：桶数组；

（7）oldbuckets：扩容过程中老的桶数组；

（8）nevacuate：扩容时的进度标识，index 小于 nevacuate 的桶都已经由老桶转移到新桶中；

（9）extra：预申请的溢出桶.

mapextra

这个mapextra是hmap中的一个“辅助字段”，专门用于管理溢出桶（overflow buckets）

type mapextra struct {// If both key and elem do not contain pointers and are inline, then we mark bucket// type as containing no pointers. This avoids scanning such maps.// However, bmap.overflow is a pointer. In order to keep overflow buckets// alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.// overflow and oldoverflow are only used if key and elem do not contain pointers.// overflow contains overflow buckets for hmap.buckets.// oldoverflow contains overflow buckets for hmap.oldbuckets.// The indirection allows to store a pointer to the slice in hiter.overflow    *[]*bmapoldoverflow *[]*bmap// nextOverflow holds a pointer to a free overflow bucket.nextOverflow *bmap
}

在 map 初始化时，倘若容量过大，会提前申请好一批溢出桶，以供后续使用，这部分溢出桶存放在 hmap.mapextra 当中：

（1）mapextra.overflow：供桶数组 buckets 使用的溢出桶；

（2）mapextra.oldoverFlow: 扩容流程中，供老桶数组 oldBuckets 使用的溢出桶；

（3）mapextra.nextOverflow：下一个可用的溢出桶.

Go 会需要一个“溢出桶”来存放新插入的数据，这时候它有一个“拿溢出桶”的优先顺序：

优先从 mapextra.nextOverflow 取：这个是预先分配好的一批溢出桶
如果 nextOverflow 没桶了，就：去 overflow[] 列表里找是否有空闲的桶（可能是之前用过又回收的）
如果都没有，才会向 Go 的内存分配器（heap）重新申请一个新的溢出桶。

bmap

// A bucket for a Go map.
type bmap struct {// tophash generally contains the top byte of the hash value// for each key in this bucket. If tophash[0] < minTopHash,// tophash[0] is a bucket evacuation state instead.tophash [abi.OldMapBucketCount]uint8// Followed by bucketCnt keys and then bucketCnt elems.// NOTE: packing all the keys together and then all the elems together makes the// code a bit more complicated than alternating key/elem/key/elem/... but it allows// us to eliminate padding which would be needed for, e.g., map[int64]int8.// Followed by an overflow pointer.
}