redis數據結構及內部編碼-string數據結構

  • 2019 年 10 月 4 日
  • 筆記

在redis中,當我們想要知道一個key的類型的時候,我們可以使用type命令 eg

127.0.0.1:6379> set a "123"  OK  127.0.0.1:6379> type a  string

如果這個key不存在的話,會返回none eg:

127.0.0.1:6379> type abcd  none

type命令實際返回的就是當前鍵的數據結構類型,它們分別是:

  • string(字符串)
  • hash(哈希)
  • list(列表)
  • set(集合)
  • zset(有序集合) 但這些只是Redis對外的數據結構。每種數據結構都有自己底層的內部實現,並且每個都有多種實現,這樣方便redis在合適的場景選擇適合當前的編碼方式。 下圖是redis每種數據結構對應的內部編碼

redis數據結構內部編碼 我們 可以通過 object encoding命令查詢 eg:

127.0.0.1:6379> set hello "sss"  OK  127.0.0.1:6379> object encoding hello  "embstr"  127.0.0.1:6379> set hel "123"  OK  127.0.0.1:6379> object encoding hel  "int"  127.0.0.1:6379> set bigstr "dddddddddddfffffffffffdddddddddddddddddddddddddddddddddddddddddddsssssss"  OK  127.0.0.1:6379> object encoding bigstr  "raw"

從上面查詢的結果我們可以看到,redis的string數據結構會根據輸入的value不同使用不同的數據結構。 下面我們從源碼(基於redis 5.0.5)來分析下 在redis中,的每個鍵值內部都是使用一個名字叫做 redisObject 這個 C語言結構體保存的,其代碼如下:

typedef struct redisObject {      unsigned type:4;      unsigned encoding:4;      unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or                              * LFU data (least significant 8 bits frequency                              * and most significant 16 bits access time). */      int refcount;      void *ptr;  } robj;
  • type:表示鍵值的數據類型,包括 String、List、Set、ZSet、Hash
  • encoding:表示鍵值的內部編碼方式,從 Redis源碼看目前取值有如下幾種:
/* Objects encoding. Some kind of objects like Strings and Hashes can be   * internally represented in multiple ways. The 'encoding' field of the object   * is set to one of this fields for this object. */  #define OBJ_ENCODING_RAW 0     /* Raw representation */  #define OBJ_ENCODING_INT 1     /* Encoded as integer */  #define OBJ_ENCODING_HT 2      /* Encoded as hash table */  #define OBJ_ENCODING_ZIPMAP 3  /* Encoded as zipmap */  #define OBJ_ENCODING_LINKEDLIST 4 /* No longer used: old list encoding. */  #define OBJ_ENCODING_ZIPLIST 5 /* Encoded as ziplist */  #define OBJ_ENCODING_INTSET 6  /* Encoded as intset */  #define OBJ_ENCODING_SKIPLIST 7  /* Encoded as skiplist */  #define OBJ_ENCODING_EMBSTR 8  /* Embedded sds string encoding */  #define OBJ_ENCODING_QUICKLIST 9 /* Encoded as linked list of ziplists */  #define OBJ_ENCODING_STREAM 10 /* Encoded as a radix tree of listpacks */
  • refcount:表示該鍵值被引用的數量,即一個鍵值可被多個鍵引用。

String類型的內部編碼 在了解string類型的內部編碼之前,我們先看下SDS:

SDS(簡單動態字符串): 當你在閱讀源碼的時候,你可以很容易見到這個這個詞。在代碼里定義了5種SDS(源碼在sds.h)

  /* Note: sdshdr5 is never used, we just access the flags byte directly.   * However is here to document the layout of type 5 SDS strings. */  struct __attribute__ ((__packed__)) sdshdr5 {      unsigned char flags; /* 3 lsb of type, and 5 msb of string length */      char buf[];  };  struct __attribute__ ((__packed__)) sdshdr8 {      uint8_t len; /* used */      uint8_t alloc; /* excluding the header and null terminator */      unsigned char flags; /* 3 lsb of type, 5 unused bits */      char buf[];  };  struct __attribute__ ((__packed__)) sdshdr16 {      uint16_t len; /* used */      uint16_t alloc; /* excluding the header and null terminator */      unsigned char flags; /* 3 lsb of type, 5 unused bits */      char buf[];  };  struct __attribute__ ((__packed__)) sdshdr32 {      uint32_t len; /* used */      uint32_t alloc; /* excluding the header and null terminator */      unsigned char flags; /* 3 lsb of type, 5 unused bits */      char buf[];  };  struct __attribute__ ((__packed__)) sdshdr64 {      uint64_t len; /* used */      uint64_t alloc; /* excluding the header and null terminator */      unsigned char flags; /* 3 lsb of type, 5 unused bits */      char buf[];  };

從上面的代碼片段中,我們可以看出每個struct內的變量都差不多

  • len:字符串的長度(實際使用的長度)
  • alloc:分配內存的大小
  • flags:標誌位,低三位表示類型,其餘五位未使用
  • buf:字符數組

通過上面的一系列枯燥的鋪墊,我們開始切入正題

1. INT 編碼方式

當字符串鍵值的內容可以用一個64位有符號整型表示的時候,redis會將鍵值轉換為long類型來存儲,其對應的編碼類型為:OBJ_ENCODING_INT

對於set hel "123"命令,內存結構如下

Redis 啟動時會預先建立 10000 個分別存儲 0~9999 的 redisObject 變量作為共享對象,這就意味着如果 set字符串的鍵值在 0~10000 之間的話,則可以 直接指向共享對象 而不需要再建立新對象。

/* Check if we can represent this string as a long integer.     * Note that we are sure that a string larger than 20 chars is not     * representable as a 32 nor 64 bit integer. */    len = sdslen(s);    // 長度小於20 (64位有符號整型)    if (len <= 20 && string2l(s,len,&value)) {        /* This object is encodable as a long. Try to use a shared object.         * Note that we avoid using shared integers when maxmemory is used         * because every object needs to have a private LRU field for the LRU         * algorithm to work well. */        // 當value在[0,1000)的時候,使用字符串的共享策略        if ((server.maxmemory == 0 ||            !(server.maxmemory_policy & MAXMEMORY_FLAG_NO_SHARED_INTEGERS)) &&            value >= 0 &&            value < OBJ_SHARED_INTEGERS)        {            decrRefCount(o);            incrRefCount(shared.integers[value]);            return shared.integers[value];        } else {            if (o->encoding == OBJ_ENCODING_RAW) sdsfree(o->ptr);            o->encoding = OBJ_ENCODING_INT;            o->ptr = (void*) value;            return o;        }    }

2. EMBSTR編碼格式

Redis 在保存長度小於 44 位元組的字符串時會採用 OBJ_ENCODING_EMBSTR 編碼方式,源碼如下(object.c):

/* Create a string object with EMBSTR encoding if it is smaller than   * OBJ_ENCODING_EMBSTR_SIZE_LIMIT, otherwise the RAW encoding is   * used.   *   * The current limit of 44 is chosen so that the biggest string object   * we allocate as EMBSTR will still fit into the 64 byte arena of jemalloc. */  #define OBJ_ENCODING_EMBSTR_SIZE_LIMIT 44  robj *createStringObject(const char *ptr, size_t len) {      //字符串長度小於等於44的時候使用embstr編碼格式,大於44的時候使用raw編碼格式      if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT)          return createEmbeddedStringObject(ptr,len);      else          return createRawStringObject(ptr,len);  }    * Create a string object with encoding OBJ_ENCODING_EMBSTR, that is   * an object where the sds string is actually an unmodifiable string   * allocated in the same chunk as the object itself. */  robj *createEmbeddedStringObject(const char *ptr, size_t len) {      robj *o = zmalloc(sizeof(robj)+sizeof(struct sdshdr8)+len+1);      struct sdshdr8 *sh = (void*)(o+1);        o->type = OBJ_STRING;      o->encoding = OBJ_ENCODING_EMBSTR;      o->ptr = sh+1;      o->refcount = 1;      if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {          o->lru = (LFUGetTimeInMinutes()<<8) | LFU_INIT_VAL;      } else {          o->lru = LRU_CLOCK();      }        sh->len = len;      sh->alloc = len;      sh->flags = SDS_TYPE_8;      if (ptr == SDS_NOINIT)          sh->buf[len] = '';      else if (ptr) {          memcpy(sh->buf,ptr,len);          sh->buf[len] = '';      } else {          memset(sh->buf,0,len+1);      }      return o;  }

指令 set hello 「sss」 所設置的鍵值,其內存結構示意圖如下:

3. RAW 編碼格式

通過上面的源碼分析,當字符串鍵值的長度大於44的時候,redis會將鍵值的內部編碼方式改為OBJ_ENCODING_RAW格式

/* Create a string object with encoding OBJ_ENCODING_RAW, that is a plain   * string object where o->ptr points to a proper sds string. */  robj *createRawStringObject(const char *ptr, size_t len) {      return createObject(OBJ_STRING, sdsnewlen(ptr,len));  }      /* ===================== Creation and parsing of objects ==================== */    robj *createObject(int type, void *ptr) {      robj *o = zmalloc(sizeof(*o));      o->type = type;      o->encoding = OBJ_ENCODING_RAW;      o->ptr = ptr;      o->refcount = 1;        /* Set the LRU to the current lruclock (minutes resolution), or       * alternatively the LFU counter. */      if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {          o->lru = (LFUGetTimeInMinutes()<<8) | LFU_INIT_VAL;      } else {          o->lru = LRU_CLOCK();      }      return o;  }

與上面的 OBJ_ENCODING_EMBSTR 編碼方式的不同之處在於 此時動態字符串 sds 的內存與其依賴的 redisObject 的內存不再連續了