I. Configuring CRUSH classes
1. Create the ssd class
By default, all of our OSDs have the class hdd:
# ceph osd crush class ls
[
"hdd"
]
Check the current OSD layout:
# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-8 0 root cache
-7 0 host 192.168.3.9-cache
-1 0.37994 root default
-2 0 host 192.168.3.9
-5 0.37994 host kolla-cloud
0 hdd 0.10999 osd.0 up 1.00000 1.00000
1 hdd 0.10999 osd.1 up 1.00000 1.00000
2 hdd 0.10999 osd.2 up 1.00000 1.00000
3 hdd 0.04999 osd.3 up 1.00000 1.00000
Remove osd.3 from the hdd class:
# ceph osd crush rm-device-class osd.3
done removing class of osd(s): 3
Add osd.3 to the ssd class:
# ceph osd crush set-device-class ssd osd.3
set osd(s) 3 to class 'ssd'
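If several OSDs need to be reclassified at once, both commands accept a list of OSD ids (a sketch; osd.4 and osd.5 are hypothetical here):
# ceph osd crush rm-device-class osd.4 osd.5
# ceph osd crush set-device-class ssd osd.4 osd.5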
After adding it, check the OSD layout again:
# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-8 0 root cache
-7 0 host 192.168.3.9-cache
-1 0.37994 root default
-2 0 host 192.168.3.9
-5 0.37994 host kolla-cloud
0 hdd 0.10999 osd.0 up 1.00000 1.00000
1 hdd 0.10999 osd.1 up 1.00000 1.00000
2 hdd 0.10999 osd.2 up 1.00000 1.00000
3 ssd 0.04999 osd.3 up 1.00000 1.00000
You can see that the class of osd.3 has changed to ssd.
Looking at the CRUSH classes again, a new class named ssd now appears:
# ceph osd crush class ls
[
"hdd",
"ssd"
]
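To list which OSDs currently belong to a class, the class can also be queried directly (a sketch):
# ceph osd crush class ls-osd ssd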
2. Create a CRUSH rule based on the ssd class
Create a rule named ssd_rule that only uses OSDs of class ssd:
# ceph osd crush rule create-replicated ssd_rule default host ssd
List the cluster's rules:
# ceph osd crush rule ls
replicated_rule
disks
ssd_rule
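To inspect a single rule without decompiling the whole map, it can be dumped directly (a sketch):
# ceph osd crush rule dump ssd_rule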
The full crushmap can be inspected as follows:
# ceph osd getcrushmap -o crushmap
26
# crushtool -d crushmap -o crushmap.txt
# cat crushmap.txt
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class ssd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
# buckets
host 192.168.3.9 {
id -2 # do not change unnecessarily
id -3 class hdd # do not change unnecessarily
id -13 class ssd # do not change unnecessarily
# weight 0.000
alg straw2
hash 0 # rjenkins1
}
host kolla-cloud {
id -5 # do not change unnecessarily
id -6 class hdd # do not change unnecessarily
id -14 class ssd # do not change unnecessarily
# weight 0.380
alg straw2
hash 0 # rjenkins1
item osd.2 weight 0.110
item osd.1 weight 0.110
item osd.0 weight 0.110
item osd.3 weight 0.050
}
root default {
id -1 # do not change unnecessarily
id -4 class hdd # do not change unnecessarily
id -15 class ssd # do not change unnecessarily
# weight 0.380
alg straw2
hash 0 # rjenkins1
item 192.168.3.9 weight 0.000
item kolla-cloud weight 0.380
}
host 192.168.3.9-cache {
id -7 # do not change unnecessarily
id -9 class hdd # do not change unnecessarily
id -11 class ssd # do not change unnecessarily
# weight 0.000
alg straw2
hash 0 # rjenkins1
}
root cache {
id -8 # do not change unnecessarily
id -10 class hdd # do not change unnecessarily
id -12 class ssd # do not change unnecessarily
# weight 0.000
alg straw2
hash 0 # rjenkins1
item 192.168.3.9-cache weight 0.000
}
# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
rule disks {
id 1
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
rule ssd_rule {
id 2
type replicated
min_size 1
max_size 10
step take default class ssd
step chooseleaf firstn 0 type host
step emit
}
# end crush map
Edit crushmap.txt and change step take default in the disks rule to step take default class hdd, so that the existing pools stay on the hdd OSDs:
rule disks {
id 1
type replicated
min_size 1
max_size 10
step take default class hdd
step chooseleaf firstn 0 type host
step emit
}
Recompile the crushmap and inject it back into the cluster:
# crushtool -c crushmap.txt -o crushmap.new
# ceph osd setcrushmap -i crushmap.new
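Optionally, crushtool can simulate placements against the new map to sanity-check a rule before injecting it (a sketch; rule id 2 is ssd_rule in the map above):
# crushtool --test -i crushmap.new --rule 2 --num-rep 3 --show-mappings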
3. Create a pool based on the ssd_rule rule
Create a storage pool that uses the ssd_rule rule:
# ceph osd pool create cache 64 64 ssd_rule
pool 'cache' created
Query the cache pool and confirm that its crush_rule is ssd_rule:
# ceph osd pool get cache crush_rule
crush_rule: ssd_rule
Checking which rules the pools use, the new cache pool uses crush_rule 2 (ssd_rule), while the existing pools use crush_rule 1:
# ceph osd dump | grep -i size
pool 1 'images' replicated size 1 min_size 1 crush_rule 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 80 lfor 0/71 flags hashpspool stripe_width 0 application rbd
pool 2 'volumes' replicated size 1 min_size 1 crush_rule 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 89 lfor 0/73 flags hashpspool stripe_width 0 application rbd
pool 3 'backups' replicated size 1 min_size 1 crush_rule 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 84 lfor 0/75 flags hashpspool stripe_width 0 application rbd
pool 4 'vms' replicated size 1 min_size 1 crush_rule 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 86 lfor 0/77 flags hashpspool stripe_width 0 application rbd
pool 5 'cache' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 64 pgp_num 64 last_change 108 flags hashpspool stripe_width 0
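If an existing pool should also be moved onto the ssd OSDs, its rule can be changed in place (a sketch; volumes is just one of the pools listed above):
# ceph osd pool set volumes crush_rule ssd_rule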
II. Configure the cache pool
1. Create a cache pool and a backing storage pool
The cache pool was already created in section I.3 above (the pool named cache); refer back to that section.
Create the backing storage pool:
# ceph osd pool create volumes2 64 64
2. Set up the cache tier
Attach the cache pool created above to the front of the storage pool; volumes2 is our backing storage pool:
# ceph osd tier add volumes2 cache
pool 'cache' is now (or already was) a tier of 'volumes2'
Set the cache mode to writeback:
# ceph osd tier cache-mode cache writeback
set cache-mode for pool 'cache' to writeback
Direct all client requests from the standard pool to the cache pool:
# ceph osd tier set-overlay volumes2 cache
overlay for 'volumes2' is now (or already was) 'cache'
At this point, looking at the details of the storage pool and the cache pool shows the tiering configuration:
# ceph osd dump |egrep 'volumes2|cache'
pool 5 'cache' replicated size 1 min_size 1 crush_rule 2 object_hash rjenkins pg_num 64 pgp_num 64 last_change 125 lfor 125/125 flags hashpspool,incomplete_clones tier_of 6 cache_mode writeback stripe_width 0
pool 6 'volumes2' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 125 lfor 125/125 flags hashpspool tiers 5 read_tier 5 write_tier 5 stripe_width 0
3. Cache-tier parameters
For production deployments, only the bloom filter data structure can currently be used for hit sets (judging from the official documentation, it appears to be the only supported filter at the moment):
ceph osd pool set cache hit_set_type bloom
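The hit set count and period (and the read recency required for promotion) are usually configured alongside the hit set type; the values below are illustrative, not tuned recommendations:
ceph osd pool set cache hit_set_count 1
ceph osd pool set cache hit_set_period 3600
ceph osd pool set cache min_read_recency_for_promote 1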
Set how many bytes or how many objects the cache pool may hold before the tiering agent starts flushing objects to the backing storage pool and evicting them:
# Start flushing and evicting when the cache pool holds 1 TB of data
ceph osd pool set cache target_max_bytes 1099511627776
# Start flushing and evicting when the cache pool holds 10 million objects
ceph osd pool set cache target_max_objects 10000000
Define the minimum age before the cache tier flushes an object to the storage tier or evicts it:
ceph osd pool set cache cache_min_flush_age 600
ceph osd pool set cache cache_min_evict_age 600
Define at what proportion of dirty (modified) objects the tiering agent starts flushing objects from the cache tier to the storage tier:
# Start flushing when dirty objects reach 40% of the cache
ceph osd pool set cache cache_target_dirty_ratio 0.4
# Flush at a higher rate when dirty objects reach 60% of the cache
ceph osd pool set cache cache_target_dirty_high_ratio 0.6
When the cache pool's usage reaches a given percentage of its capacity, the tiering agent evicts objects to keep free space available (once this limit is reached, the cache pool is considered full); clean (unmodified) objects are evicted at this point:
ceph osd pool set cache cache_target_full_ratio 0.8
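The current values can be read back at any time to verify them (a sketch):
ceph osd pool get cache target_max_bytes
ceph osd pool get cache cache_target_dirty_ratio
ceph osd pool get cache cache_target_full_ratio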
4. Test the cache pool
Once the cache pool is configured, first lower its minimum flush and evict ages to 60 seconds:
ceph osd pool set cache cache_min_evict_age 60
ceph osd pool set cache cache_min_flush_age 60
Also set the dirty-object ratio so that the tiering agent starts flushing from the cache tier to the storage tier once dirty objects reach just 0.1%:
ceph osd pool set cache cache_target_dirty_ratio 0.001
Then write an object into the storage pool:
rados -p volumes2 put test MySQL-community-client-5.7.31-1.el7.x86_64.rpm
Listing the storage pool at this point should not show the object, while listing the cache pool shows that the data currently lives in the cache pool:
rados -p volumes2 ls |grep test
rados -p cache ls |grep test
After 60 seconds the data is flushed to the backing pool; the object can then be seen in the storage pool and is evicted from the cache pool.
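To watch the object move between the two pools, the listings can simply be polled (a sketch):
watch -n 10 'rados -p cache ls | grep test; rados -p volumes2 ls | grep test'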
III. Delete the cache pool
Note that the procedure for removing a cache pool differs depending on the cache mode.
1. Remove a read-only cache pool
Because a read-only cache holds no modified data, it can simply be disabled and removed without losing any recent changes to objects in the cache.
Change the cache mode to none to disable the cache:
ceph osd tier cache-mode cache none
Remove the cache pool:
# Unbind the tier from the backing pool (volumes2 in our example)
ceph osd tier remove volumes2 cache
2. Remove a writeback cache pool
Because a writeback cache may hold modified data, steps must be taken to ensure that no recent changes to objects in the cache are lost before disabling and removing it.
Change the cache mode to forward so that new and modified objects are flushed to the backing storage pool:
ceph osd tier cache-mode cache forward
Check the cache pool to make sure all objects have been flushed (this may take a while):
rados -p cache ls
If there are still objects left in the cache pool, they can also be flushed and evicted manually:
rados -p cache cache-flush-evict-all
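The per-pool object counts reported by rados df are another quick way to confirm the cache pool has drained (a sketch):
rados df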
Remove the overlay so that clients no longer direct traffic to the cache:
ceph osd tier remove-overlay volumes2
Unbind the cache pool from the storage pool:
ceph osd tier remove volumes2 cache
As an aside, when a pool is created for RBD use, remember to enable the rbd application tag on it, for example:
ceph osd pool application enable sata-pool rbd