OC方法缓存策略底层探究

源码结构分析

我们先来看一下OC类的最新结构定义：

struct objc_class : objc_object {
    // Class ISA;
    Class superclass;
    cache_t cache;             // formerly cache pointer and vtable
    class_data_bits_t bits;    // class_rw_t * plus custom rr/alloc flags
    // 函数部分...
}
struct objc_object {
private:
    isa_t isa;
public:
    // 函数部分...
}

之前的文章中我们分析了bits这个属性，里面存放的就是类的子结构，比如方法，属性，协议等。今天我们再来探究下上面的cache内部的结构和底层原理。

实操

相信大家对cache都不会陌生，即OC中的方法缓存，在objc_msgSend的流程中，最先查找的就是这个列表，那OC是如何维护这个列表的呢，内部的存储结构又是如何？今天我们就来一探究竟。

之前我们使用lldb调试了bits内部的存储结构，但是这个方式比较繁琐，今天我们换一种简单点的方式来Debug，我们首先镜像一个objc_class结构体，然后对镜像的结构体进行分析。

struct lly_bucket_t {
    SEL _sel;
    IMP _imp;
};
struct lly_cache_t {
    struct lly_bucket_t * _buckets;
    uint32_t _mask;
    uint16_t _flags;
    uint16_t _occupied;
};
struct lly_class_data_bits_t {
    Class objc_class;
    // Values are the FAST_ flags above.
    uintptr_t bits;
};
struct lly_objc_class {
    Class ISA;
    Class superclass;
    struct lly_cache_t cache;
    struct lly_class_data_bits_t bits;
};
void printCache(Class model) {
    
    struct lly_objc_class * llyClass = (__bridge struct lly_objc_class *)(model);
    struct lly_cache_t cache = llyClass->cache;
    NSLog(@"_occupied = %d,_mask = %d",cache._occupied,cache._mask);
    for (uint32_t i = 0; i < cache._mask; i++) {
        struct lly_bucket_t bucket = cache._buckets[i];
        NSLog(@"method : sel = %@, imp = %p",NSStringFromSelector(bucket._sel),bucket._imp);
    }
    
}
int main(int argc, const char * argv[]) {
    @autoreleasepool {
	    // insert code here...
	    LLYModel *objc2 = [LLYModel alloc];
	    Class llyClass = [LLYModel class];
	    printCache(llyClass);
    }
}

首先我们不调用任何实例方法，查看打印结果

1	2021-01-03 15:59:51.995108+0800 LLYObjc[1738:6603344] _occupied = 0,_mask = 0

可以看到，初始状态都是0，里面的for也没有进说明当前缓存列表为空。

然后我们分次调用方法并打印：

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        // insert code here...
        LLYModel *objc2 = [LLYModel alloc];
        Class llyClass = [LLYModel class];
        printCache(llyClass);
        
        [objc2 fun0];
        [objc2 fun1];
        
        printCache(llyClass);
                
        [objc2 fun2];
        [objc2 fun3];
        
        printCache(llyClass);
    }
    return 0;
}

查看打印结果如下：

2021-01-03 16:05:09.335088+0800 LLYModel[1804:6607181] _occupied = 0,_mask = 0
2021-01-03 16:05:09.335668+0800 LLYModel[1804:6607181] ---------------------------------------------------------------------------
2021-01-03 16:05:09.335785+0800 LLYModel[1804:6607181] -[LLYModel fun0]
2021-01-03 16:05:09.335875+0800 LLYModel[1804:6607181] -[LLYModel fun1]
2021-01-03 16:05:09.335959+0800 LLYModel[1804:6607181] _occupied = 2,_mask = 3
2021-01-03 16:05:09.336174+0800 LLYModel[1804:6607181] method : sel = fun1, imp = 0xbf90
2021-01-03 16:05:09.336245+0800 LLYModel[1804:6607181] method : sel = (null), imp = 0x0
2021-01-03 16:05:09.336344+0800 LLYModel[1804:6607181] method : sel = fun0, imp = 0xbfc0
2021-01-03 16:05:09.336397+0800 LLYModel[1804:6607181] ---------------------------------------------------------------------------
2021-01-03 16:05:09.336447+0800 LLYModel[1804:6607181] -[LLYModel fun2]
2021-01-03 16:05:09.336497+0800 LLYModel[1804:6607181] -[LLYModel fun3]
2021-01-03 16:05:09.336542+0800 LLYModel[1804:6607181] _occupied = 2,_mask = 7
2021-01-03 16:05:09.336618+0800 LLYModel[1804:6607181] method : sel = (null), imp = 0x0
2021-01-03 16:05:09.336737+0800 LLYModel[1804:6607181] method : sel = fun3, imp = 0xbf30
2021-01-03 16:05:09.336818+0800 LLYModel[1804:6607181] method : sel = (null), imp = 0x0
2021-01-03 16:05:09.344380+0800 LLYModel[1804:6607181] method : sel = (null), imp = 0x0
2021-01-03 16:05:09.344554+0800 LLYModel[1804:6607181] method : sel = fun2, imp = 0xbf60
2021-01-03 16:05:09.344657+0800 LLYModel[1804:6607181] method : sel = (null), imp = 0x0
2021-01-03 16:05:09.344746+0800 LLYModel[1804:6607181] method : sel = (null), imp = 0x0
2021-01-03 16:05:09.344833+0800 LLYModel[1804:6607181] ---------------------------------------------------------------------------

分析打印结果，我们存在几个疑惑的地方：

_occupied 和 _mask 分别是什么，为什么调用方法会改变它们的值？
方法的调用顺序和缓存列表的顺序为什么不一致？
当我们调用后面的方法后，前面缓存的方法为什么会丢失？

带着上面的问题，我们去源码中看看能不能找到满意的答案。

源码逻辑分析

通过对occupied关键字的搜索，我们最终定位到下面这个函数：

ALWAYS_INLINE
void cache_t::insert(Class cls, SEL sel, IMP imp, id receiver)
{	 
	// 去掉我们不关系的旁枝末节
    // Use the cache as-is if it is less than 3/4 full
    // 每次新插入缓存时occupied + 1，这里我们大概就能猜到它的含义了，就是保存当前已缓存方法的数量。
    mask_t newOccupied = occupied() + 1;
    unsigned oldCapacity = capacity(), capacity = oldCapacity;
    
    // 如果当前缓存列表为空 去创建一个
    if (slowpath(isConstantEmptyCache())) {
        // Cache is read-only. Replace it.
        // INIT_CACHE_SIZE_LOG2 = 2,
    	 // INIT_CACHE_SIZE      = (1 << INIT_CACHE_SIZE_LOG2),
    	 // 看这里，初始的缓存列表的容量是 1 << 2 = 4。
        if (!capacity) capacity = INIT_CACHE_SIZE;
        reallocate(oldCapacity, capacity, /* freeOld */false);
    }
    else if (fastpath(newOccupied + CACHE_END_MARKER <= capacity / 4 * 3)) {
        // Cache is less than 3/4 full. Use it as-is.
        // (当前缓存列表用到的容量 + 1 ) <= 总容量的四分之三时 说明还够用 啥也不用干
    }
    else {
    	 // 要扩容了，新的容量大小为当前容量的2倍哦 当然有最大值的限制哈。 
    	 // 具体扩容逻辑下看面的扩容函数注释
        capacity = capacity ? capacity * 2 : INIT_CACHE_SIZE;
        if (capacity > MAX_CACHE_SIZE) {
            capacity = MAX_CACHE_SIZE;
        }
        reallocate(oldCapacity, capacity, true);
    }
    bucket_t *b = buckets();
    // _mask = 当前容量 - 1
    mask_t m = capacity - 1;
    // 缓存的索引通过这个hash计算 
    mask_t begin = cache_hash(sel, m);
    mask_t i = begin;
    // Scan for the first unused slot and insert there.
    // There is guaranteed to be an empty slot because the
    // minimum size is 4 and we resized at 3/4 full.
    do {
    	 // 如果当前索引没有数据，直接插入
        if (fastpath(b[i].sel() == 0)) {
        	  // _occupied++;
            incrementOccupied();
            b[i].set<Atomic, Encoded>(sel, imp, cls);
            return;
        }
        // 如果当前索引内存的就是传入的方法，直接返回
        if (b[i].sel() == sel) {
            // The entry was added to the cache by some other thread
            // before we grabbed the cacheUpdateLock.
            return;
        }
        // 否则就是寻找下一个索引
    } while (fastpath((i = cache_next(i, m)) != begin));
    cache_t::bad_cache(receiver, (SEL)sel, cls);
}
/// 初始化和扩容都走这个函数
ALWAYS_INLINE
void cache_t::reallocate(mask_t oldCapacity, mask_t newCapacity, bool freeOld)
{
    bucket_t *oldBuckets = buckets();
    // 分配内存
    bucket_t *newBuckets = allocateBuckets(newCapacity);
    // Cache's old contents are not propagated. 
    // This is thought to save cache memory at the cost of extra cache fills.
    // fixme re-measure this
    ASSERT(newCapacity > 0);
    ASSERT((uintptr_t)(mask_t)(newCapacity-1) == newCapacity-1);
	
	// 存储新创建缓存列表 _occupied置0
    setBucketsAndMask(newBuckets, newCapacity - 1);
    
    // 要扩容了 free调旧值
    if (freeOld) {
        cache_collect_free(oldBuckets, oldCapacity);
    }
}
// 当前索引 
static inline mask_t cache_hash(SEL sel, mask_t mask) 
{
    return (mask_t)(uintptr_t)sel & mask;
}
// 下一个索引
static inline mask_t cache_next(mask_t i, mask_t mask) {
    return (i+1) & mask;
}

这又是一个内联函数，意味这你在调用堆栈里面是看不到的。但是我们通过源码还是可以Debug进来的。
通过以上源码的分析过程，应该能够回答我们上面的疑问了。

_occupied表示当前已缓存的的方法数，_mask标识当前缓存列表最大数-1，有新的方法调用_occupied就会更新，缓存列表扩容时_mask会更新。
存储到缓存列表中的方法并不一定是连续的，和具体的hash算法有关。
当缓存列表扩容后，之前缓存过的方法都会被清除，所以会丢失。

小结

当前的探索因为工程比较小，可能还不能看出方法cache的好处，在实际的工程中，方法的调用是大量且频繁的，这时就能体现出方法缓存的实际意义。之前的学习中对方法缓存丢失的逻辑不太了解，经过这次的探索，对整个方法缓存的策略有了更深刻的理解。