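Preface
As we all know, Redis is an in-memory key-value database. Because memory is volatile and loses its contents on power loss, Redis provides persistence mechanisms.
This article covers the two persistence methods Redis offers, RDB and AOF, and how each is implemented.
RDB
RDB (Redis DataBase) takes a snapshot of the data in memory at a given moment and stores it on disk as a dump.rdb file. Every snapshot that RDB generates contains the full dataset held in Redis.
A snapshot can be generated with either of two commands, save and bgsave. Here are their descriptions: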
127.0.0.1:6379> help save
SAVE -
summary: Synchronously save the dataset to disk
since: 1.0.0
group: server
127.0.0.1:6379> help bgsave
BGSAVE -
summary: Asynchronously save the dataset to disk
since: 1.0.0
group: server
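Judging by the descriptions, the two commands do exactly the same thing, except that save writes to disk synchronously while bgsave does so asynchronously; bg stands for background.
In practice, once save is called the Redis process blocks until the snapshot is complete, and Redis cannot serve requests in the meantime. bgsave instead calls the Linux fork() function to create a child process and lets the child generate the snapshot, so Redis keeps serving requests throughout.
With the RDB commands in mind, consider this question: suppose Redis holds 6G of data. A snapshot of 6G cannot be produced in an instant; it takes a while. So between the moment the snapshot starts (t1) and the moment it finishes (t2), how should data modified in Redis be handled? Should the persisted data be the data as of t1 or as of t2?
With save, Redis cannot serve requests while the snapshot is being generated, so no data is modified between t1 and t2. With bgsave, Redis keeps serving requests during the snapshot, so some data is very likely to be modified. The child process generates the snapshot from the data as of t1, and anything modified between t1 and t2 can only be captured by the next snapshot. Yet values modified between t1 and t2 must be visible to external callers immediately. In other words, Redis has to keep both every value as of the snapshot point (t1) and the latest value of every key. If that were done naively, 6G of data in Redis would instantly become 12G when a snapshot starts.
That is not what actually happens; Redis, known for its performance, could not allow it. So how is the problem solved? This is where the copy on write mechanism comes in.
Copy on write (COW) is an optimization strategy in computer programming.
Its core idea is that when multiple callers request the same resource (such as data in memory or on disk), they all receive pointers to the same resource. Only when one caller tries to modify the resource does the system actually make a private copy for that caller, while the original resource seen by the other callers stays unchanged. The process is transparent to the other callers.
As mentioned above, bgsave calls the Linux fork() function to create a child process that generates the snapshot. fork() implements the copy on write mechanism.
As shown in the figure below, after Redis invokes bgsave, bgsave calls fork. At time t1 the data in memory is not duplicated for the two processes; instead, the pointers in both processes refer to the same memory addresses.
The child process now starts generating the snapshot. Suppose that during this time data in Redis is modified: the value of k3 changes from c to d. The operating system copies only the memory holding k3 (in practice, the memory page that contains it), while the unchanged k1 and k2 are not copied. This is the copy on write mechanism. The child process still sees the data as of t1, while the service Redis provides to clients returns the latest data.
The copy on write optimization pays off only when snapshot generation is short and only a small amount of data changes in the meantime. If all of the data changed during the snapshot, the 6G of data really would become 12G.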
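The observable effect of fork() described above can be reproduced in a few lines. This is only a minimal sketch, assuming a POSIX system where os.fork is available; the function name snapshot_via_fork and the use of JSON over a pipe are illustrative choices and have nothing to do with how Redis writes dump.rdb. The page-level copying itself is done by the operating system and is not visible in the code; the sketch only shows that the child keeps seeing the data as of t1 while the parent sees its own update.

```python
import json
import os
import time


def snapshot_via_fork(data, mutate):
    """Fork a child that serializes `data` as it was at fork time (t1).

    `mutate` runs in the parent after the fork, standing in for writes
    that arrive while the snapshot is being generated.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: its view of memory is the parent's memory at t1.
        os.close(read_fd)
        time.sleep(0.1)  # let the parent modify its own copy first
        with os.fdopen(write_fd, "w") as out:
            json.dump(data, out)
        os._exit(0)  # leave immediately without running parent cleanup
    # Parent process: keeps serving "writes" while the child works.
    os.close(write_fd)
    mutate(data)
    with os.fdopen(read_fd) as src:
        snapshot = json.load(src)
    os.waitpid(pid, 0)
    return snapshot


data = {"k1": "a", "k2": "b", "k3": "c"}
snapshot = snapshot_via_fork(data, lambda d: d.update(k3="d"))
```

After the call, snapshot holds the values from t1 and data holds the parent's latest values, matching the k3 example in the text.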
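Copy on write is a general optimization idea; implementations of it can also be found in the JDK (for example, java.util.concurrent.CopyOnWriteArrayList).
Configuration
As noted above, the commands that generate an RDB snapshot are save and bgsave, yet in day-to-day use of Redis nobody runs these two commands by hand on a schedule. Snapshots can also be triggered automatically, and the relevant settings are found in the configuration file: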
################################ SNAPSHOTTING ################################
#
# Save the DB on disk:
#
# save <seconds> <changes>
#
# Will save the DB if both the given number of seconds and the given
# number of write operations against the DB occurred.
#
# In the example below the behaviour will be to save:
# after 900 sec (15 min) if at least 1 key changed
# after 300 sec (5 min) if at least 10 keys changed
# after 60 sec if at least 10000 keys changed
#
# Note: you can disable saving completely by commenting out all "save" lines.
#
# It is also possible to remove all the previously configured save
# points by adding a save directive with a single empty string argument
# like in the following example:
#
# save ""
save 900 1
save 300 10
save 60 10000
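A save directive triggers bgsave. save 60 10000 means that if at least 10000 keys have been modified within 60 seconds, bgsave is invoked once. Likewise, save 300 10 means that if at least 10 keys have been modified within 300 seconds, bgsave is invoked once. Multiple save directives are not mutually exclusive: when several are configured, bgsave runs as soon as any one of them is satisfied. Configuring several lets Redis adapt to different workloads.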
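The "any save point is enough" rule can be sketched as follows. The function name should_bgsave and the list SAVE_POINTS are hypothetical names chosen for illustration; this is not Redis source code, only the rule as described in the configuration comments.

```python
def should_bgsave(save_points, seconds_since_last_save, changes_since_last_save):
    """Return True if at least one (seconds, changes) save point is satisfied."""
    return any(
        seconds_since_last_save >= seconds and changes_since_last_save >= changes
        for seconds, changes in save_points
    )


# The three save points from the configuration excerpt above.
SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]
```

For example, should_bgsave(SAVE_POINTS, 300, 10) is satisfied by the second save point, while an empty list of save points never triggers a snapshot.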
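Configuring save "" or commenting out every save line disables RDB.
Looking at the save parameters in the configuration file, if bgsave runs once every 60 seconds and the server crashes at the 59th second, the data modified during those 59 seconds is lost because no snapshot was taken in time. Losing that much data is unacceptable in many cases. For this reason Redis offers another persistence method: AOF.
AOF
AOF (Append Only File) records the commands that modify Redis in a specific format in a designated file. In other words, RDB records a snapshot of the data, whereas AOF records commands. AOF is disabled by default.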
############################## APPEND ONLY MODE ###############################
# By default Redis asynchronously dumps the dataset on disk. This mode is
# good enough in many applications, but an issue with the Redis process or
# a power outage may result into a few minutes of writes lost (depending on
# the configured save points).
#
# The Append Only File is an alternative persistence mode that provides
# much better durability. For instance using the default data fsync policy
# (see later in the config file) Redis can lose just one second of writes in a
# dramatic event like a server power outage, or a single write if something
# wrong with the Redis process itself happens, but the operating system is
# still running correctly.
#
# AOF and RDB persistence can be enabled at the same time without problems.
# If the AOF is enabled on startup Redis will load the AOF, that is the file
# with the better durability guarantees.
#
# Please check http://redis.io/topics/persistence for more information.
appendonly no
# The name of the append only file (default: "appendonly.aof")
appendfilename "appendonly.aof"
If AOF is enabled, the corresponding commands are recorded in the appendonly.aof file.
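To make "records commands" concrete: the AOF stores each command in the Redis protocol (RESP) format, as an array of bulk strings. The sketch below shows that encoding; encode_command is a hypothetical helper name, and the sketch covers only the basic command framing, not everything a real AOF file may contain.

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]  # array header: number of arguments
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        # Each argument: "$<byte length>\r\n<bytes>\r\n"
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


entry = encode_command("SET", "key1", "a")
```

Appending entry to a file is, in spirit, what recording one set key1 a command amounts to.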
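The contents of appendonly.aof must themselves be written to disk. If the server crashes before the contents of appendonly.aof reach the disk, those contents are lost, the recorded write commands are lost with them, and so is the data they modified.
For this reason Redis provides three settings that control how appendonly.aof is persisted: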
# The fsync() call tells the Operating System to actually write data on disk
# instead of waiting for more data in the output buffer. Some OS will really flush
# data on disk, some other OS will just try to do it ASAP.
#
# Redis supports three different modes:
#
# no: don't fsync, just let the OS flush the data when it wants. Faster.
# always: fsync after every write to the append only log. Slow, Safest.
# everysec: fsync only one time every second. Compromise.
#
# The default is "everysec", as that's usually the right compromise between
# speed and data safety. It's up to you to understand if you can relax this to
# "no" that will let the operating system flush the output buffer when
# it wants, for better performances (but if you can live with the idea of
# some data loss consider the default persistence mode that's snapshotting),
# or on the contrary, use "always" that's very slow but a bit safer than
# everysec.
#
# More details please check the following article:
# http://antirez.com/post/redis-persistence-demystified.html
#
# If unsure, use "everysec".
# appendfsync always
appendfsync everysec
# appendfsync no
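All three modes are selected through the appendfsync parameter.
- no: this does not mean no persistence. Data is only written to the OS buffer, and the operating system decides when to write it to disk. This is the fastest mode.
- always: every time content is appended to appendonly.aof, fsync() is called to write the data to disk. This is the slowest mode but the safest.
- everysec: the default. fsync() is called once per second to write the data to disk. This is a compromise.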
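The three policies can be sketched with a small writer class. AofWriter and count_fsyncs are hypothetical names, and the everysec branch is a simplification: here the check runs inline on each append, which is only an assumption made to keep the example short and is not how Redis schedules its once-per-second fsync.

```python
import os
import tempfile
import time


class AofWriter:
    """Append-only writer that applies an appendfsync-style policy."""

    def __init__(self, path, policy="everysec"):
        if policy not in ("no", "always", "everysec"):
            raise ValueError("unknown policy: %r" % policy)
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.policy = policy
        self.last_fsync = time.monotonic()
        self.fsync_calls = 0

    def append(self, payload):
        os.write(self.fd, payload)  # data reaches the OS buffer only
        now = time.monotonic()
        due = self.policy == "everysec" and now - self.last_fsync >= 1.0
        if self.policy == "always" or due:
            os.fsync(self.fd)  # ask the OS to flush the data to disk
            self.last_fsync = now
            self.fsync_calls += 1

    def close(self):
        os.close(self.fd)


def count_fsyncs(policy, writes=3):
    """Append a few entries under `policy` and report how many fsyncs ran."""
    with tempfile.TemporaryDirectory() as tmp:
        writer = AofWriter(os.path.join(tmp, "appendonly.aof"), policy)
        for i in range(writes):
            writer.append(b"SET key%d v\r\n" % i)
        writer.close()
        return writer.fsync_calls
```

With always, every append pays for an fsync; with no, none does; with everysec, a burst of appends within one second shares at most one fsync.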
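From this configuration it follows that if the contents of appendonly.aof are written to disk once per second, a crash between two writes can still lose some commands and therefore some modified data. Compared with RDB, though, the loss is very small.
Beyond that, appendonly.aof is written by appending commands, so on a long-running server the file is bound to grow very large. If the server crashes and the data has to be restored from appendonly.aof, replaying the commands recorded in it will take a considerable amount of time.
To solve the problem of an oversized appendonly.aof, Redis provides a mechanism called bgrewriteaof.
bgrewriteaof
The bgrewriteaof command is described as follows: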
127.0.0.1:6379> help bgrewriteaof
BGREWRITEAOF -
summary: Asynchronously rewrite the append-only file
since: 1.0.0
group: server
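This command uses fork() to create a child process that rewrites the appendonly.aof file. The rewrite is implemented differently before and after Redis 4.0.
Before Redis 4.0, the rewrite achieves two things: it removes commands that cancel each other out and merges repeated commands. Commands that cancel out, such as set key1 a followed by del key1, are dropped entirely. Repeated commands, such as set key1 a followed by set key1 b, are merged into one. This is the net effect rather than a literal edit of the old file: the child writes the commands needed to rebuild the current in-memory dataset into a new file. After the rewrite the AOF file may be much smaller.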
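The effect of such a rewrite can be sketched by replaying the commands into a dictionary and emitting one set per surviving key. rewrite_aof is a hypothetical name, and only set and del are handled; this illustrates the idea and is not the Redis implementation.

```python
def rewrite_aof(commands):
    """Return a minimal command list that rebuilds the same final dataset."""
    dataset = {}
    for command in commands:
        name = command[0].lower()
        if name == "set":
            _, key, value = command
            dataset[key] = value  # a later set overrides an earlier one
        elif name == "del":
            for key in command[1:]:
                dataset.pop(key, None)  # cancels any earlier set
        else:
            raise ValueError("unsupported command in this sketch: %r" % name)
    return [("set", key, value) for key, value in dataset.items()]


history = [
    ("set", "key1", "a"),
    ("del", "key1"),
    ("set", "key2", "a"),
    ("set", "key2", "b"),
]
compacted = rewrite_aof(history)
```

Four recorded commands collapse to a single set key2 b, which is why the rewritten file can be far smaller than the original.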
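Since Redis 4.0, a hybrid RDB and AOF mode is available (controlled by the aof-use-rdb-preamble option). The existing data is written in RDB format at the head of appendonly.aof, and subsequent incremental changes are appended to the same file in AOF format. In other words, the first part of appendonly.aof is snapshot data and the second part is Redis commands.
This hybrid mode combines the strengths of RDB and AOF: it minimizes data loss and also lets Redis restore its data quickly after a restart.
So when is bgrewriteaof triggered? Besides being run manually, it can be triggered automatically through several parameters in the configuration file: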
# Automatic rewrite of the append only file.
# Redis is able to automatically rewrite the log file implicitly calling
# BGREWRITEAOF when the AOF log size grows by the specified percentage.
#
# This is how it works: Redis remembers the size of the AOF file after the
# latest rewrite (if no rewrite has happened since the restart, the size of
# the AOF at startup is used).
#
# This base size is compared to the current size. If the current size is
# bigger than the specified percentage, the rewrite is triggered. Also
# you need to specify a minimal size for the AOF file to be rewritten, this
# is useful to avoid rewriting the AOF file even if the percentage increase
# is reached but it is still pretty small.
#
# Specify a percentage of zero in order to disable the automatic AOF
# rewrite feature.
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
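auto-aof-rewrite-min-size is set to 64mb, which means the AOF file must reach at least 64mb before bgrewriteaof is triggered automatically. Redis records the size of the AOF file after each bgrewriteaof; if no rewrite has happened since startup, the size of the AOF at startup is used as the base size.
auto-aof-rewrite-percentage is set to 100, which means bgrewriteaof is triggered once the current AOF file has grown by that percentage relative to its size after the previous bgrewriteaof. If the AOF was 200mb after the last bgrewriteaof, it must now reach 400mb before bgrewriteaof runs again.
Setting auto-aof-rewrite-percentage to 0 disables automatic bgrewriteaof. The purpose of auto-aof-rewrite-min-size is to prevent bgrewriteaof from being called too frequently while the AOF file is still small, where the percentage threshold is reached very quickly.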
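The two parameters combine into a single trigger condition, sketched below following the description in the configuration comments. should_rewrite and MB are hypothetical names, and the exact boundary comparisons (strictly greater than the minimum size, growth at or above the percentage) are assumptions of this sketch.

```python
MB = 1024 * 1024


def should_rewrite(current_size, base_size, percentage=100, min_size=64 * MB):
    """Decide whether an automatic AOF rewrite should be triggered."""
    if percentage == 0:
        return False  # a percentage of zero disables automatic rewrites
    if current_size <= min_size:
        return False  # file still too small to be worth rewriting
    base = base_size if base_size > 0 else 1  # avoid division by zero
    growth = current_size * 100 // base - 100  # growth over base, in percent
    return growth >= percentage
```

With the defaults, an AOF that was 200mb after the last rewrite triggers again at 400mb, and a 10mb file never triggers regardless of how fast it grew.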
no-appendfsync-on-rewrite
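When the Redis main process writes the AOF file under the always or everysec policy while a child process is rewriting the AOF file, both generate a large amount of I/O. This can cause fsync to block for a long time. To mitigate the problem, Redis provides the no-appendfsync-on-rewrite parameter: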
# When the AOF fsync policy is set to always or everysec, and a background
# saving process (a background save or AOF log background rewriting) is
# performing a lot of I/O against the disk, in some Linux configurations
# Redis may block too long on the fsync() call. Note that there is no fix for
# this currently, as even performing fsync in a different thread will block
# our synchronous write(2) call.
#
# In order to mitigate this problem it's possible to use the following option
# that will prevent fsync() from being called in the main process while a
# BGSAVE or BGREWRITEAOF is in progress.
#
# This means that while another child is saving, the durability of Redis is
# the same as "appendfsync none". In practical terms, this means that it is
# possible to lose up to 30 seconds of log in the worst scenario (with the
# default Linux settings).
#
# If you have latency problems turn this to "yes". Otherwise leave it as
# "no" that is the safest pick from the point of view of durability.
no-appendfsync-on-rewrite no
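If this parameter is enabled, the main thread does not call fsync() when writing to the AOF while bgsave or bgrewriteaof is in progress, which is equivalent to configuring appendfsync no. Write commands may then be lost; with the default Linux settings, up to 30 seconds of data can be lost.
If this parameter is disabled, the main thread still calls fsync() when writing to the AOF while bgsave or bgrewriteaof is in progress, and it may block. This is the safest choice for durability: the guarantees of the configured appendfsync policy continue to hold during the rewrite.
Summary
This article covered the two Redis persistence methods, RDB and AOF, and how they are implemented. It also explained how to deal with an AOF file that has grown too large. Understanding these topics helps us use Redis more effectively.
Author: Sicimike
Original link: https://blog.csdn.net/Baisitao_/article/details/105461153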