Golang Runtime Memory Model

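Preface

Notes from reading the second volume of 《Go语言学习笔记》 (Go Language Study Notes).

This was a quick first pass, recording what seemed worth noting and remembering. Assembly-level material is mostly left out.

The source code quoted while compiling these notes may come from different Go versions, namely 1.5.1 and 1.16.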

init

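During startup the runtime initializes the memory allocator, the garbage collector and the concurrent scheduler, along with things like environment variables and debug signals. Once this runtime initialization is done, runtime.main runs. It sets up the execution stack limits, starts the background system monitor (GC and scheduling tasks), and starts the garbage collector's background work.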

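What matters to developers here is how package initialization functions get executed.

The relevant functions are runtime_init and main_init. Both are generated by the compiler.

Each of the init functions inside runtime and main is given a unique symbol name, and runtime.init and main.init then call them all in one place.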

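zversion.go is also a generated file; it defines version information and the like.

The init function of main is mainly responsible for calling the init functions of libraries and imported packages.

Finally, two things to remember: all init functions run in the same goroutine, and main.main runs only after every init function has finished.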
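A minimal sketch of the ordering rule above. The variable name order is made up for illustration. A single file may declare init more than once; those functions run in the order they appear, and all of them finish before main.main starts.

```go
package main

import "fmt"

// order records which function ran, in sequence (hypothetical name, for illustration only).
var order []string

// A file may declare init more than once; each one gets its own unique symbol.
func init() { order = append(order, "init#1") }

func init() { order = append(order, "init#2") }

func main() {
	order = append(order, "main")
	fmt.Println(order) // [init#1 init#2 main]
}
```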

Memory Management

Background

As we know, the memory of a running program is laid out roughly like this.

← toward high addresses
| kernel | stack | unused | shared libraries | unused | heap | data segment | code segment | reserved |

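The stack grows from high addresses toward low addresses.
The heap grows from low addresses toward high addresses.

The segments discussed here live in the virtual address space.

The segments of a process's address space vary in size, and the operating system's paging mechanism splits them into equally sized pages.

These equally sized (virtual) pages are then scattered across (physical) pages in the physical address space.

The process does not care about the virtual-to-physical mapping; the operating system handles it.

In the rest of this article, memory space, memory page and memory segment all refer to the virtual address space.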
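As a small illustration of paging, the sketch below splits a virtual address into a page number and an offset within that page. The helper name splitAddr is hypothetical. os.Getpagesize reports the operating system's page size, which is a separate value from the 8 KB page the Go runtime uses for spans later in this article.

```go
package main

import (
	"fmt"
	"os"
)

// splitAddr breaks a virtual address into a page number and an offset inside that page.
// pageSize must be greater than zero.
func splitAddr(addr, pageSize uintptr) (page, offset uintptr) {
	return addr / pageSize, addr % pageSize
}

func main() {
	pageSize := uintptr(os.Getpagesize()) // OS page size, commonly 4096
	page, off := splitAddr(0x12345, pageSize)
	fmt.Printf("page size %d: address 0x12345 is page %d, offset %d\n", pageSize, page, off)
}
```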

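Allocation

Because Go ships with its own runtime, it drops the traditional approach of going to the operating system for each allocation. This avoids the performance cost of system calls and also makes GC easier to handle.

Go's basic allocation strategy is:

Request a large block of memory each time, so that small-object operations do not trigger frequent system calls.
Cut the large block into pieces and link them into a list.
To allocate, take a suitably sized piece off the list.
On GC, return the piece to the list.
If the list grows too large, try to return memory to the operating system.

In short: request a big block, manage it yourself, and give memory back as needed.

TIPS: the memory manager does not care about object state. Memory is reclaimed only after the garbage collector has collected the object.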
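The strategy above can be sketched as a toy free-list pool. This is not the runtime's code: the types pool and block and their fields are hypothetical, and make([]byte, ...) merely stands in for one large request to the operating system.

```go
package main

import "fmt"

// block is one fixed-size piece carved out of a larger chunk.
type block struct {
	next *block
	data []byte
}

// pool is a hypothetical free-list allocator for blocks of a single size.
type pool struct {
	free      *block // head of the free list
	blockSize int
	perChunk  int // blocks carved from each large chunk
	chunks    int // how many large chunks were requested so far
}

// grow requests one large chunk and links its pieces into the free list.
func (p *pool) grow() {
	chunk := make([]byte, p.blockSize*p.perChunk) // stands in for one big OS request
	for i := 0; i < p.perChunk; i++ {
		lo, hi := i*p.blockSize, (i+1)*p.blockSize
		p.free = &block{next: p.free, data: chunk[lo:hi:hi]}
	}
	p.chunks++
}

// alloc pops a block off the free list, growing the pool only when the list is empty.
func (p *pool) alloc() *block {
	if p.free == nil {
		p.grow()
	}
	b := p.free
	p.free = b.next
	b.next = nil
	return b
}

// release pushes a block back onto the free list so it can be reused.
func (p *pool) release(b *block) {
	b.next = p.free
	p.free = b
}

func main() {
	p := &pool{blockSize: 64, perChunk: 4}
	var held []*block
	for i := 0; i < 5; i++ {
		held = append(held, p.alloc())
	}
	fmt.Println("chunks after 5 allocs:", p.chunks) // 2
	p.release(held[0])
	p.alloc() // reuses the released block, no new chunk needed
	fmt.Println("chunks after reuse:", p.chunks) // 2
}
```

Only two of the six allocations reach the "operating system" here; the rest are served from the list.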

Span

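A span is the basic unit of memory management in Go. Each span manages a block of memory of a given size, measured in pages, and its pages are contiguous in address. A span's size is always a multiple of 8 KB. There are 67 span size classes in total (class IDs 0 through 66).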

// from runtime/sizeclasses.go

// class  bytes/obj  bytes/span  objects  tail waste  max waste
//     1          8        8192     1024           0     87.50%
//     2         16        8192      512           0     43.75%
//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
//    65      28672       57344        2           0      4.91%
//    66      32768       32768        1           0     12.50%

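The table above covers 66 span size classes (IDs 1 through 66; the middle rows are omitted here). Together with class 0, which is used for large objects, that makes 67.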
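The objects and tail waste columns follow directly from the first two columns. The sketch below recomputes them for the four rows quoted above; the type sizeClass and the function layout are hypothetical names used only for this illustration.

```go
package main

import "fmt"

// sizeClass holds the class ID, bytes/obj and bytes/span columns of one table row.
type sizeClass struct {
	id        int
	objBytes  int
	spanBytes int
}

// layout derives the "objects" and "tail waste" columns from a row.
func layout(c sizeClass) (objects, tailWaste int) {
	return c.spanBytes / c.objBytes, c.spanBytes % c.objBytes
}

func main() {
	rows := []sizeClass{
		{id: 1, objBytes: 8, spanBytes: 8192},
		{id: 2, objBytes: 16, spanBytes: 8192},
		{id: 65, objBytes: 28672, spanBytes: 57344},
		{id: 66, objBytes: 32768, spanBytes: 32768},
	}
	for _, c := range rows {
		n, waste := layout(c)
		// 8192 is the 8 KB page size, so spanBytes/8192 is the span's page count.
		fmt.Printf("class %2d: %4d objects, tail waste %d, %d page(s)\n", c.id, n, waste, c.spanBytes/8192)
	}
}
```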

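For example, in the span with ID 1, each object is 8 bytes (class_to_size[1]=8); the span is 8 KB, i.e. 1 page (class_to_allocnpages[1]=1); and the span holds 1024 objects.
An object in these classes is at most 32 KB. Objects larger than 32 KB (large objects) are handled separately and use class ID 0.
The span structure, mspan, stores the start of the span, its page count, the bookkeeping for objects waiting to be allocated, and so on. mspans are linked to one another as a doubly linked list.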

type mspan struct {
    next *mspan     // next span in list, or nil if none
    prev *mspan     // previous span in list, or nil if none
    list *mSpanList // For debugging. TODO: Remove.

    startAddr uintptr // address of first byte of span aka s.base()
    npages    uintptr // number of pages in span

    manualFreeList gclinkptr // list of free objects in mSpanManual spans

    // Object n starts at address n*elemsize + (start << pageShift).
    freeindex uintptr
    nelems uintptr // number of object in the span.

    allocCache uint64

    allocBits  *gcBits
    gcmarkBits *gcBits

    // sweep generation:
    // if sweepgen == h->sweepgen - 2, the span needs sweeping
    // if sweepgen == h->sweepgen - 1, the span is currently being swept
    // if sweepgen == h->sweepgen, the span is swept and ready to use
    // if sweepgen == h->sweepgen + 1, the span was cached before sweep began and is still cached, and needs sweeping
    // if sweepgen == h->sweepgen + 3, the span was swept and then cached and is still cached
    // h->sweepgen is incremented by 2 after every GC

    sweepgen    uint32
    divMul      uint16        // for divide by elemsize - divMagic.mul
    baseMask    uint16        // if non-0, elemsize is a power of 2, & this will get object allocation base
    allocCount  uint16        // number of allocated objects
    spanclass   spanClass     // size class and noscan (uint8)
    state       mSpanStateBox // mSpanInUse etc; accessed atomically (get/set methods)
    needzero    uint8         // needs to be zeroed before allocation
    divShift    uint8         // for divide by elemsize - divMagic.shift
    divShift2   uint8         // for divide by elemsize - divMagic.shift2
    elemsize    uintptr       // computed from sizeclass or from npages
    limit       uintptr       // end of data in span
    speciallock mutex         // guards specials list
    specials    *special      // linked list of special records sorted by offset.
}
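The comment "Object n starts at address n*elemsize + ..." in the struct above is plain address arithmetic. Here is a sketch using a trimmed-down, hypothetical span type that keeps only the three fields involved; the base address is an arbitrary example value.

```go
package main

import "fmt"

// span is a trimmed-down, hypothetical stand-in for mspan.
type span struct {
	startAddr uintptr // address of the first byte of the span
	elemsize  uintptr // bytes per object
	nelems    uintptr // number of objects in the span
}

// objAddr returns the address where object n starts: base + n*elemsize.
func (s span) objAddr(n uintptr) uintptr { return s.startAddr + n*s.elemsize }

// objIndex maps an address inside the span back to the index of its object.
func (s span) objIndex(addr uintptr) uintptr { return (addr - s.startAddr) / s.elemsize }

func main() {
	s := span{startAddr: 0x10000, elemsize: 16, nelems: 512}
	fmt.Printf("object 3 starts at %#x\n", s.objAddr(3))                  // 0x10030
	fmt.Println("address 0x10038 belongs to object", s.objIndex(0x10038)) // 3
}
```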

Object

The Span section already mentioned objects.

A span is cut into slots of one fixed size (see the table above), and each slot stores one object.

mCache

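Every runtime worker thread (M) has an mcache, which it gets through the P it is bound to. Since an mcache is used by only one thread at a time, it needs no lock. The worker thread (M) uses the mcache to serve the memory needs of each goroutine (G), i.e. it takes free spans from the mcache.

Depending on the size of the object being allocated, the runtime uses different allocation paths. See the function mallocgc() for details.

<16B: uses the tiny allocator, which mainly works with the mcache.tinyXXX fields (only for objects that contain no pointers)
16B-32KB: allocated from the mcache under the P
>32KB: allocated directly from mheap (large objects)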
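The three size ranges can be written as one small decision function. This is a sketch, not the body of mallocgc: allocPath is a hypothetical helper, and the noscan flag (the object holds no pointers) reflects the extra condition the runtime places on the tiny path.

```go
package main

import "fmt"

const (
	maxTinySize  = 16       // sizes below this may use the tiny allocator
	maxSmallSize = 32 << 10 // 32 KB: the largest size served from mcache
)

// allocPath reports which path an allocation of the given size would take.
// noscan is true when the object contains no pointers.
func allocPath(size uintptr, noscan bool) string {
	switch {
	case size > maxSmallSize:
		return "large" // straight from mheap
	case noscan && size < maxTinySize:
		return "tiny" // mcache tiny allocator
	default:
		return "small" // mcache.alloc span of the matching class
	}
}

func main() {
	fmt.Println(allocPath(8, true))       // tiny
	fmt.Println(allocPath(8, false))      // small
	fmt.Println(allocPath(1024, false))   // small
	fmt.Println(allocPath(64<<10, false)) // large
}
```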

type mcache struct {
    nextSample uintptr 
    scanAlloc  uintptr 

    // tiny objects
    tiny       uintptr
    tinyoffset uintptr
    tinyAllocs uintptr

    // spans for 16B-32KB objects live here, so a span with class ID 0 never appears in this array
    alloc [numSpanClasses]*mspan

    stackcache [_NumStackOrders]stackfreelist // stack space bound to each G
    flushGen uint32
}

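Each size class is stored twice in alloc: one span for objects that contain pointers and one for objects that do not. This distinction makes the garbage collector's job easier, because it never has to scan spans that hold no pointers.
The first group holds objects with pointers and is called scan, meaning the GC must scan it; the second group holds no pointers and is called noscan. This speeds up GC scanning.
An mcache starts out with no spans. A G first requests a span from central on demand, and the span is then cached in the mcache.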
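One way to picture "stored twice" is the index into mcache.alloc: the size class shifted left by one bit, with the low bit marking noscan. The sketch below re-implements that encoding for illustration and assumes the 67 size classes described earlier; treat it as an approximation of the runtime's spanClass helpers rather than a copy of them.

```go
package main

import "fmt"

// spanClass packs a size class and a noscan flag into one small integer.
type spanClass uint8

// makeSpanClass puts the size class in the upper bits and noscan in bit 0.
func makeSpanClass(sizeclass uint8, noscan bool) spanClass {
	sc := spanClass(sizeclass << 1)
	if noscan {
		sc |= 1
	}
	return sc
}

// sizeclass recovers the size class ID.
func (sc spanClass) sizeclass() uint8 { return uint8(sc >> 1) }

// noscan reports whether spans of this class hold pointer-free objects.
func (sc spanClass) noscan() bool { return sc&1 != 0 }

func main() {
	const numSizeClasses = 67 // assumption: the count given in the Span section
	fmt.Println("slots in mcache.alloc:", numSizeClasses*2) // 134

	sc := makeSpanClass(2, true)
	fmt.Println(sc, sc.sizeclass(), sc.noscan()) // 5 2 true
}
```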

Central

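Central groups spans by span class and chains them into lists. When an mcache runs out of memory in its span of some class, it asks mcentral for one span of that class. Central is a cache shared by all threads and is accessed by multiple threads, so access requires locking.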

//go 1.16
type mcentral struct {
    spanclass spanClass //(uint8)
    partial [2]spanSet // list of spans with a free object
    full    [2]spanSet // list of spans with no free objects
}

type spanSet struct { // the implementation is somewhat like a slice
    spineLock mutex
    spine     unsafe.Pointer // *[N]*spanSetBlock, accessed atomically
    spineLen  uintptr        // Spine array length, accessed atomically
    spineCap  uintptr        // Spine array cap, accessed under lock
    index headTailIndex
}

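Each mcentral holds two kinds of span sets, full and partial (in 1.16 each is a [2]spanSet pair, one set for swept spans and one for unswept spans):

full: the spans that have no free objects, plus the spans currently cached in an mcache. In older versions this was a doubly linked span list. When a span here is released, it is moved to partial.
partial: the spans that have free objects, also a doubly linked list in older versions. When a new span is requested from mcentral, mcentral takes one from here and moves it to full.
Tips: in 1.5.1 a central had a single lock. The code above is from 1.16, where the locking is finer grained, and I would guess performance improved as a result. The old mSpanList also became spanSet, and the names changed: personally I find full and partial in the new source easier to understand than empty and nonempty.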
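The movement between partial and full can be sketched with two slices behind one lock. This is a toy model in the spirit of the 1.5.1 design (a single mutex), not the 1.16 spanSet code; the type and method names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// span only tracks how many free objects it still has.
type span struct{ free int }

// central is a toy shared cache for spans of one class.
type central struct {
	mu      sync.Mutex
	partial []*span // spans with at least one free object
	full    []*span // spans handed to a cache, or with no free objects
}

// cacheSpan hands a span to a cache and records it in full.
func (c *central) cacheSpan() *span {
	c.mu.Lock()
	defer c.mu.Unlock()
	if len(c.partial) == 0 {
		return nil // a real allocator would ask the heap here
	}
	s := c.partial[len(c.partial)-1]
	c.partial = c.partial[:len(c.partial)-1]
	c.full = append(c.full, s)
	return s
}

// uncacheSpan takes a span back; it returns to partial only if it has free objects.
func (c *central) uncacheSpan(s *span) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if s.free == 0 {
		return // still no free objects: it stays in full
	}
	for i, f := range c.full {
		if f == s {
			c.full = append(c.full[:i], c.full[i+1:]...)
			c.partial = append(c.partial, s)
			return
		}
	}
}

func main() {
	c := &central{partial: []*span{&span{free: 4}, &span{free: 2}}}
	s := c.cacheSpan()
	fmt.Println(len(c.partial), len(c.full)) // 1 1
	c.uncacheSpan(s)
	fmt.Println(len(c.partial), len(c.full)) // 2 0
}
```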

Heap

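An mcentral manages spans of one particular size only, so there has to be a higher-level structure that manages all the centrals: mheap. It organizes the memory pages obtained from the OS into spans and keeps them. When an mcentral runs short of spans it asks mheap; when mheap runs short it asks the OS. Requests to the OS are made in pages, and the pages obtained are then organized into spans. This also requires locking. Large objects (>32KB) are allocated directly from mheap.

The structure of mheap is fairly complex. When a Go program starts, it reserves a region of virtual address space from the operating system. This is only virtual address space: physical memory is committed only when it is actually needed, through a page fault at the OS level. Since Go 1.11, the reserved memory is tracked in an array of heapArena, declared as arenas [1 << arenaL1Bits]*[1 << arenaL2Bits]*heapArena, and is used for the application's allocations. According to the formula in the source, each arena is 64MB on 64-bit non-Windows systems and 4MB on 64-bit Windows.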
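To put the arena sizes in terms of pages, here is a small sketch. It covers only the two 64-bit cases quoted above, and its arenaIndex is deliberately simplified: it divides the address by the arena size and ignores the base offset the real runtime subtracts first. Both function names are hypothetical.

```go
package main

import "fmt"

const pageSize = 8 << 10 // the runtime's 8 KB page

// heapArenaBytes returns the arena size quoted in the text for 64-bit systems.
func heapArenaBytes(windows bool) uintptr {
	if windows {
		return 4 << 20 // 4 MB on 64-bit Windows
	}
	return 64 << 20 // 64 MB on other 64-bit systems
}

// arenaIndex is a simplified mapping from an address to the arena that contains it.
// Assumption: no base offset is applied, unlike in the real runtime.
func arenaIndex(addr uintptr, windows bool) uintptr {
	return addr / heapArenaBytes(windows)
}

func main() {
	for _, win := range []bool{false, true} {
		b := heapArenaBytes(win)
		fmt.Printf("windows=%v: arena %d MB, %d pages per arena\n", win, b>>20, b/pageSize)
	}
}
```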

type mheap struct {
    // lock must only be acquired on the system stack, otherwise a g
    // could self-deadlock if its stack grows with the lock held.
    lock      mutex
    pages     pageAlloc // page allocation data structure
    sweepgen  uint32    // sweep generation, see comment in mspan; written during STW
    sweepdone uint32    // all spans are swept
    sweepers  uint32    // number of active sweepone calls

    allspans []*mspan // all spans out there

    _ uint32 // align uint64 fields on 32-bit for atomics

    pagesInUse         uint64  // pages of spans in stats mSpanInUse; updated atomically
    pagesSwept         uint64  // pages swept this cycle; updated atomically
    pagesSweptBasis    uint64  // pagesSwept to use as the origin of the sweep ratio; updated atomically
    sweepHeapLiveBasis uint64  // value of heap_live to use as the origin of sweep ratio; written with lock, read without
    sweepPagesPerByte  float64 // proportional sweep ratio; written with lock, read without
    scavengeGoal uint64

    reclaimIndex uint64

    reclaimCredit uintptr

    arenas [1 << arenaL1Bits]*[1 << arenaL2Bits]*heapArena

    heapArenaAlloc linearAlloc

    arenaHints *arenaHint

    arena linearAlloc

    allArenas []arenaIdx

    sweepArenas []arenaIdx

    markArenas []arenaIdx

    curArena struct {
        base, end uintptr
    }

    _ uint32 // ensure 64-bit alignment of central

    central [numSpanClasses]struct {
        mcentral mcentral
        pad      [cpu.CacheLinePadSize - unsafe.Sizeof(mcentral{})%cpu.CacheLinePadSize]byte
    }

    spanalloc             fixalloc // allocator for span*
    cachealloc            fixalloc // allocator for mcache*
    specialfinalizeralloc fixalloc // allocator for specialfinalizer*
    specialprofilealloc   fixalloc // allocator for specialprofile*
    speciallock           mutex    // lock for special record allocators.
    arenaHintAlloc        fixalloc // allocator for arenaHints

    unused *specialfinalizer // never set, just here to force the specialfinalizer type into DWARF
}

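Allocation rules

Tiny objects: ask the tiny allocator in mcache first. If it is out of space, ask the span of class tinySpanClass in mcache. If that has nothing, ask mcentral for an mspan of that class; if mcentral has none, ask mheap; and if everything is used up, ask the operating system.
Small objects: ask the current thread's mcache first. If its mspan has no free space, ask mcentral for an mspan of the matching class. If mcentral has none of that class, ask mheap for the needed pages and initialize a new mspan. If mheap has none either, ask the operating system for pages.
Large objects: ask mheap directly, with span class ID 0. If mheap has nothing, ask the operating system.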

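Allocation flow

Compute the object's size class.
Find the span of that class in cache.alloc.
Take a free object from the span (from span.freelist in 1.5.1; in the 1.16 struct shown earlier, free slots are located through freeindex and allocBits, and manualFreeList is used only by mSpanManual spans).
If the span has no space left, get a new span from central.
If central has no span of that class, get one from heap and cut it into a list of objects.
If heap has no span of a suitable size, request a new block of memory from the operating system.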
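The fallback chain in these steps can be sketched as three toy layers. Everything here is hypothetical and heavily simplified: one size class only, no locking, and each simulated OS request is assumed to yield four spans.

```go
package main

import "fmt"

// span only tracks how many objects it can still hand out.
type span struct{ free int }

// heap keeps a count of idle spans and of simulated OS requests.
type heap struct{ spans, osRequests int }

// allocSpan hands out one span, "asking the OS" when the heap has none left.
func (h *heap) allocSpan(objs int) *span {
	if h.spans == 0 {
		h.osRequests++ // stands in for requesting a new block from the OS
		h.spans += 4   // assumption: each request yields 4 spans
	}
	h.spans--
	return &span{free: objs}
}

// central serves spans from its own list first, then falls back to the heap.
type central struct {
	partial []*span
	heap    *heap
}

func (c *central) cacheSpan(objs int) *span {
	if n := len(c.partial); n > 0 {
		s := c.partial[n-1]
		c.partial = c.partial[:n-1]
		return s
	}
	return c.heap.allocSpan(objs)
}

// cache is the per-thread layer: it owns one current span.
type cache struct {
	cur     *span
	central *central
}

// alloc takes one object from the cached span, refilling from central when it is used up.
func (c *cache) alloc(objs int) {
	if c.cur == nil || c.cur.free == 0 {
		c.cur = c.central.cacheSpan(objs)
	}
	c.cur.free--
}

func main() {
	h := &heap{}
	c := &cache{central: &central{heap: h}}
	for i := 0; i < 10; i++ {
		c.alloc(2) // 2 objects per span, so 10 allocations need 5 spans
	}
	fmt.Println("OS requests:", h.osRequests, "idle spans left in heap:", h.spans) // OS requests: 2 idle spans left in heap: 3
}
```

Ten allocations touch the cache every time, central five times, and the "operating system" only twice, which is the point of the layering.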

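Release flow

Objects marked as reclaimable are returned to their span (to span.freelist in 1.5.1).
The span is put back into central, where any cache can pick it up again.
If all of a span's objects have been reclaimed, the span is returned to heap so it can be re-cut and reused.
Idle spans in heap are scanned periodically and the memory they occupy is released.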
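A sketch of the first and third steps, using bitmaps in the spirit of the allocBits and gcmarkBits fields in the mspan struct shown earlier instead of a free list. The span type here is a hypothetical toy limited to 64 objects, and the sweep logic is a simplification rather than the runtime's actual sweeper.

```go
package main

import (
	"fmt"
	"math/bits"
)

// span tracks up to 64 objects with two bitmaps.
type span struct {
	allocBits uint64 // bit i set: slot i holds an allocated object
	markBits  uint64 // bit i set: the GC found object i reachable
}

// sweep frees every allocated object that was not marked.
// It returns how many objects were freed and whether the span is now empty,
// in which case it could be handed back to the heap.
func (s *span) sweep() (freed int, empty bool) {
	freed = bits.OnesCount64(s.allocBits &^ s.markBits)
	s.allocBits &= s.markBits // only marked objects survive
	s.markBits = 0            // fresh mark bits for the next GC cycle
	return freed, s.allocBits == 0
}

func main() {
	s := &span{allocBits: 0xF, markBits: 0x5} // 4 objects allocated, 2 still reachable
	freed, empty := s.sweep()
	fmt.Println(freed, empty) // 2 false

	freed, empty = s.sweep() // nothing was marked in this cycle
	fmt.Println(freed, empty) // 2 true
}
```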

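Other notes

Large objects are allocated from and returned to heap directly.
The unshared cache is the core of the high performance (no locks). Central improves object utilization across caches and avoids waste.
Reclamation trims the free part out of a span and returns it to central.
Spans are eventually returned to heap in order to balance demand between objects of different size classes, for example when demand for one size class surges for a short time.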

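Title: Golang Runtime Memory Model
Link: https://blog.dextercai.com/archives/150.html
Unless otherwise stated, the text on this site is licensed under the Creative Commons CC-BY-NC-ND 4.0 license, and the copyright belongs to the author.
Unless otherwise stated, the copyright of images on this site belongs to the author; any form of use without permission is strictly prohibited.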
