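Native Nginx modules

When using nginx as a reverse proxy, we almost always rely on the following two modules:

1. ngx_http_proxy_module

Allows requests to be passed to another server. Commonly used directives in this module: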

proxy_pass
proxy_cache
proxy_connect_timeout
proxy_read_timeout
proxy_send_timeout
proxy_next_upstream

2. ngx_http_upstream_module

Defines groups of servers that can be referenced by proxy_pass, fastcgi_pass, and similar directives. Commonly used directives in this module:

upstream
server
ip_hash

Default load-balancing configuration

http {
    upstream myapp1 {
        server srv1.example.com;
        server srv2.example.com;
        server srv3.example.com;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://myapp1;
        }
    }
}

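Here nginx uses round-robin, its default load-balancing method. Several other parameters also take their default values; written out explicitly, the configuration above is equivalent to: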

http {
    upstream myapp1 {
        server srv1.example.com weight=1 max_fails=1 fail_timeout=10;
        server srv2.example.com weight=1 max_fails=1 fail_timeout=10;
        server srv3.example.com weight=1 max_fails=1 fail_timeout=10;
    }

    server {
        listen 80;
        proxy_send_timeout 60;
        proxy_connect_timeout 60;
        proxy_read_timeout 60;
        proxy_next_upstream error timeout;

        location / {
            proxy_pass http://myapp1;
        }
    }
}

Two features are involved here: failover and health checks.

Failover

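Syntax:   proxy_read_timeout time;
Default:  proxy_read_timeout 60s;
Context:  http, server, location
Defines a timeout for reading a response from the proxied server. The timeout applies only between two successive read operations, not to the transmission of the whole response. If the proxied server transmits nothing within this time, the connection is closed.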

Syntax:   proxy_connect_timeout time;
Default:  proxy_connect_timeout 60s;
Context:  http, server, location
Defines a timeout for establishing a connection with the proxied server. Note that this timeout usually cannot exceed 75 seconds.

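Syntax:   proxy_send_timeout time;
Default:  proxy_send_timeout 60s;
Context:  http, server, location
Sets a timeout for transmitting a request to the proxied server. The timeout applies only between two successive write operations, not to the transmission of the whole request. If the proxied server receives nothing within this time, the connection is closed.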

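Syntax:   proxy_next_upstream error | timeout | invalid_header | http_500 | http_502 | http_503 | http_504 | http_403 | http_404 | http_429 | non_idempotent | off ...;
Default:  proxy_next_upstream error timeout;
Context:  http, server, location
Specifies in which cases a failed request should be passed to the next backend server:
error           an error occurred while establishing a connection with the backend, sending the request to it, or reading the response header
timeout         a timeout occurred while establishing a connection with the backend, sending the request to it, or reading the response header
invalid_header  the backend returned an empty or invalid response
http_500        the backend returned a response with status code 500
http_502        the backend returned a response with status code 502
http_503        the backend returned a response with status code 503
http_504        the backend returned a response with status code 504
http_403        the backend returned a response with status code 403
http_404        the backend returned a response with status code 404
http_429        the backend returned a response with status code 429
non_idempotent  also allows retrying requests with a non-idempotent method (POST, LOCK, PATCH), which are normally not passed to the next server once they have been sent to a backend
off             disables passing a request to the next backend server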

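As these directives show, with the default configuration nginx fails over through proxy_next_upstream as soon as a backend node hits an error or a timeout, automatically moving requests bound for the unhealthy node to a healthy one. What counts as a timeout is governed by proxy_send_timeout, proxy_connect_timeout, and proxy_read_timeout. Beyond error and timeout, more specific trigger conditions such as http_502 and http_503 can be configured.
Note: passing a request to the next backend server is only possible if nothing has been sent to the client yet. If an error or timeout occurs in the middle of transferring the response to the client, it cannot be recovered from.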
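
The failover rule described above can be sketched as a small Python model. This is only an illustration of the decision logic, not nginx's implementation: the function forward, the attempt callback, and the outcome labels are hypothetical names chosen to mirror the proxy_next_upstream conditions.

```python
from typing import AbstractSet, Callable, Iterable, Tuple

# Mirrors the default "proxy_next_upstream error timeout;".
DEFAULT_RETRY_ON = frozenset({"error", "timeout"})


def forward(
    backends: Iterable[str],
    attempt: Callable[[str], Tuple[str, bool]],
    retry_on: AbstractSet[str] = DEFAULT_RETRY_ON,
) -> Tuple[str, str]:
    """Try backends in order and return (backend, outcome) of the last attempt.

    attempt(backend) is a hypothetical callback returning
    (outcome, sent_to_client): outcome is "ok", "error", "timeout",
    "http_502" and so on; sent_to_client tells whether any part of the
    response has already reached the client.
    """
    last = ("", "no_backend")
    for backend in backends:
        outcome, sent_to_client = attempt(backend)
        last = (backend, outcome)
        if outcome == "ok":
            return last
        # Failover is possible only if the failure matches a configured
        # condition and nothing has been sent to the client yet.
        if outcome not in retry_on or sent_to_client:
            return last
    return last


# Assumed outcomes for three example backends.
results = {
    "srv1": ("timeout", False),
    "srv2": ("http_502", False),
    "srv3": ("ok", False),
}
servers = ["srv1", "srv2", "srv3"]

# Default conditions: the 502 from srv2 is returned to the client.
print(forward(servers, results.__getitem__))
# With http_502 added, the request moves on to srv3.
print(forward(servers, results.__getitem__, {"error", "timeout", "http_502"}))
```

With the default conditions the timeout on srv1 triggers a retry, but the 502 from srv2 does not, which is why adding http_502 to proxy_next_upstream changes the result.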

Health checks

Syntax:   server address [parameters];
Default:   —
Context:   upstream
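max_fails=number   Sets the number of unsuccessful attempts to communicate with the server. If this many attempts fail within the period set by fail_timeout, nginx considers the server unavailable and does not try it again for the following fail_timeout period. The default is 1. A value of 0 disables the accounting of attempts, which means no health check is applied to the backend node and the server is always considered available.

fail_timeout=time  Sets both the period during which failed attempts are counted and the period for which the server is then considered unavailable. If the number of failures within this period reaches max_fails, the server is considered unavailable.
The default is 10 seconds.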
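
To make the interplay of max_fails and fail_timeout concrete, here is a minimal Python sketch. It is a simplified model under stated assumptions, not nginx's actual bookkeeping: the class PassivePeer and its methods are hypothetical, and time is passed in explicitly as seconds so the example is deterministic.

```python
class PassivePeer:
    """Simplified model of max_fails / fail_timeout for one upstream server."""

    def __init__(self, max_fails: int = 1, fail_timeout: float = 10.0) -> None:
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.fails = 0            # failures counted in the current window
        self.window_start = 0.0   # time of the first failure in the window
        self.down_until = 0.0     # the server is skipped until this time

    def available(self, now: float) -> bool:
        # max_fails=0 disables the accounting: the server always counts as up.
        return self.max_fails == 0 or now >= self.down_until

    def report_failure(self, now: float) -> None:
        if self.max_fails == 0:
            return
        # Failures older than fail_timeout no longer count.
        if self.fails == 0 or now - self.window_start > self.fail_timeout:
            self.fails = 0
            self.window_start = now
        self.fails += 1
        if self.fails >= self.max_fails:
            # Mark the server unavailable for the next fail_timeout seconds.
            self.down_until = now + self.fail_timeout
            self.fails = 0

    def report_success(self, now: float) -> None:
        self.fails = 0


peer = PassivePeer(max_fails=2, fail_timeout=10)
peer.report_failure(now=0)
print(peer.available(now=1))    # one failure is below max_fails
peer.report_failure(now=3)
print(peer.available(now=5))    # two failures within 10 s: skipped
print(peer.available(now=13))   # fail_timeout has elapsed: tried again
```

Note that the failure is only discovered when a real client request fails, which is the limitation the summary below comes back to.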

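A few points above need explaining:

How is a "failure" defined when counting failed attempts?
According to the official documentation, it is defined by the proxy_next_upstream, fastcgi_next_upstream, uwsgi_next_upstream, scgi_next_upstream, memcached_next_upstream, and grpc_next_upstream directives, that is, by the conditions described earlier such as error, timeout, and the http_xxx status codes.
If max_fails is 0, the health check is disabled. In that case nginx relies solely on proxy_next_upstream to decide when to fail over.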

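Summary

Relying only on the two nginx modules above has the following drawbacks:

Failure detection within the fail_timeout window depends on the configured timeouts, which is inefficient: each detection has to wait for a timeout, and that hurts performance.
When a backend has a problem, nginx skips it only during the fail_timeout period in which it is marked unavailable. At other times nginx still forwards a request to the unhealthy node first and then retries on another server, which wastes one forwarding attempt.

For this reason, besides the built-in nginx modules described above, there is a more specialized module dedicated to health-checking the nodes behind the load balancer: an nginx module developed by the Taobao technical team.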

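The nginx_upstream_check_module module

The nginx_upstream_check_module module, developed by the Taobao technical team, checks the health of the real servers behind nginx. If a backend server becomes unavailable, it is removed from the upstream and no requests are forwarded to it. Once it recovers, it is added back to the upstream.

Taobao's own Tengine ships with this module. You can get that nginx distribution from the official Tengine website, and the module is also available on GitHub. If you are not using Tengine, you can add the module to your own nginx by applying a patch.

Installation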

# Apply the patch
# Note: use the patch file that matches your nginx version
cd nginx-1.6.0
patch -p1 < ../nginx_upstream_check_module-master/check_1.5.12+.patch
 ./configure --user=nginx --group=nginx --prefix=/usr/local/nginx1.6 --sbin-path=/usr/local/nginx1.6 --conf-path=/usr/local/nginx1.6/nginx.conf --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --with-http_ssl_module --with-http_stub_status_module --with-http_gzip_static_module --with-http_gunzip_module --with-http_sub_module --with-pcre=/usr/local/src/nginx/pcre-8.36 --with-zlib=/usr/local/src/nginx/zlib-1.2.8 --add-module=/usr/local/src/nginx/ngx_cache_purge-2.1 --add-module=/usr/local/src/nginx/headers-more-nginx-module-master --add-module=/usr/local/src/nginx/nginx_upstream_check_module-master

make
# Do not run make install

cd /usr/local/nginx1.6
# Back up the existing binary
cp nginx nginx.bak
nginx -s stop
cp -r /usr/local/src/nginx/nginx-1.6.0/objs/nginx .

Configuration

http {

        upstream cluster {

            # simple round-robin
            server 192.168.0.1:80;
            server 192.168.0.2:80;

            check interval=5000 rise=1 fall=3 timeout=4000;

            #check interval=3000 rise=2 fall=5 timeout=1000 type=ssl_hello;

            #check interval=3000 rise=2 fall=5 timeout=1000 type=http;
            #check_http_send "HEAD / HTTP/1.0\r\n\r\n";
            #check_http_expect_alive http_2xx http_3xx;
        }

        server {
            listen 80;

            location / {
                proxy_pass http://cluster;
            }

            location /status {
                check_status;

                access_log   off;
                allow SOME.IP.ADD.RESS;
                deny all;
           }
        }

    }

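Configuration details:

Syntax:  check interval=milliseconds [fall=count] [rise=count] [timeout=milliseconds] [default_down=true|false] [type=tcp|http|ssl_hello|mysql|ajp] [port=check_port]
Default: if no parameters are given, the defaults are: interval=30000 fall=5 rise=2 timeout=1000 default_down=true type=tcp
Context: upstream

This directive enables health checks for the backend servers. Its parameters mean:
interval: the interval between health check packets sent to the backend, in milliseconds.
fall (fall_count): if the number of consecutive failures reaches fall_count, the server is considered down.
rise (rise_count): if the number of consecutive successes reaches rise_count, the server is considered up.
timeout: the timeout for a health check request to the backend, in milliseconds.
default_down: the initial state of the server. If true, the server starts out down; if false, it starts out up. The default is true, so a server is considered unavailable at first and is only treated as healthy after enough health checks have succeeded.
type: the type of health check packet. The following types are supported:
     tcp: a plain TCP connection; if the connection succeeds, the backend is considered healthy.
     ssl_hello: sends an initial SSL hello packet and receives the server's SSL hello packet.
     http: sends an HTTP request and judges whether the backend is alive from the status of its response.
     mysql: connects to a MySQL server and judges whether the backend is alive from the server's greeting packet.
     ajp: sends an AJP CPing packet to the backend and judges whether it is alive from the CPong reply.
port: the port to check on the backend server. It can differ from the port of the real service; for example, if the backend serves an application on port 443, you can check port 80 to judge its health. The default is 0, meaning the same port on which the backend server provides the real service. This option was introduced in Tengine-1.4.0.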
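
The rise, fall, and default_down parameters describe a small state machine. The following Python sketch models it under simplifying assumptions; the class CheckedPeer and its record method are hypothetical names, and the real module also handles intervals, timeouts, and probe types that are left out here.

```python
class CheckedPeer:
    """Simplified model of the rise/fall counters of an active health check."""

    def __init__(self, rise: int = 2, fall: int = 5, default_down: bool = True) -> None:
        self.rise = rise
        self.fall = fall
        self.down = default_down
        self.rise_count = 0  # consecutive successful probes
        self.fall_count = 0  # consecutive failed probes

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return True if the server is up."""
        if probe_ok:
            self.rise_count += 1
            self.fall_count = 0
            if self.down and self.rise_count >= self.rise:
                self.down = False
        else:
            self.fall_count += 1
            self.rise_count = 0
            if not self.down and self.fall_count >= self.fall:
                self.down = True
        return not self.down


# Same rise/fall values as "check interval=5000 rise=1 fall=3 timeout=4000;".
peer = CheckedPeer(rise=1, fall=3, default_down=True)
probes = [True, False, False, True, False, False, False, True]
print([peer.record(ok) for ok in probes])
```

With rise=1 a single successful probe brings the server up, while it takes three consecutive failures to take it down again; an isolated failed probe changes nothing.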
     
Syntax: check_keepalive_requests request_num
Default: 1
Context: upstream
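This directive sets the number of requests sent over one connection. The default is 1, meaning Tengine closes the connection after completing one request.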
 
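Syntax: check_http_send http_packet
Default: "GET / HTTP/1.0\r\n\r\n"
Context: upstream
This directive sets the request that the HTTP health check sends. To reduce the amount of data transferred, the HEAD method is recommended.

When health checks use keep-alive connections, add the keep-alive request header to this directive, for example: "HEAD / HTTP/1.1\r\nConnection: keep-alive\r\n\r\n". When the GET method is used, the size of the requested URI should not be too large, so that the transfer completes within one interval; otherwise the health check module treats it as a backend or network failure.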

Syntax: check_http_expect_alive [ http_2xx | http_3xx | http_4xx | http_5xx ]
Default: http_2xx | http_3xx
Context: upstream
This directive specifies which HTTP response statuses count as healthy. By default, 2xx and 3xx statuses are considered healthy.
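
As a rough illustration of what check_http_send and check_http_expect_alive express, the Python sketch below builds the raw probe bytes and classifies a status line. The helpers build_probe and is_alive are hypothetical and only mimic the behavior described above; the keep-alive variant mirrors the example request shown earlier.

```python
from typing import AbstractSet

DEFAULT_EXPECT = frozenset({"http_2xx", "http_3xx"})


def build_probe(method: str = "HEAD", uri: str = "/", keepalive: bool = False) -> bytes:
    """Return the raw bytes of a health check request.

    Each line ends with CRLF and an empty line terminates the request,
    which is what the trailing CRLF pair in check_http_send stands for.
    """
    if keepalive:
        lines = [f"{method} {uri} HTTP/1.1", "Connection: keep-alive"]
    else:
        lines = [f"{method} {uri} HTTP/1.0"]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def is_alive(status_line: str, expect: AbstractSet[str] = DEFAULT_EXPECT) -> bool:
    """Map a status line such as 'HTTP/1.1 302 Found' to http_3xx and test it."""
    parts = status_line.split()
    if len(parts) < 2 or len(parts[1]) != 3 or not parts[1].isdigit():
        return False
    return f"http_{parts[1][0]}xx" in expect


print(build_probe("HEAD", "/test.jsp"))
print(is_alive("HTTP/1.1 200 OK"))
print(is_alive("HTTP/1.1 503 Service Unavailable"))
```

Printing the probe makes the carriage return and line feed characters visible, which is a quick way to verify that a check_http_send string is terminated correctly.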

Configuration example

# upstream must be declared in the http context, not inside a server block
upstream test {
    server 192.168.3.12:8080 weight=5 max_fails=3 fail_timeout=10s;
    server 192.168.3.13:8080 weight=5 max_fails=3 fail_timeout=10s;

    check interval=5000 rise=1 fall=3 timeout=4000 type=http default_down=false;
    check_http_send "HEAD /test.jsp HTTP/1.0\r\n\r\n";
    check_http_expect_alive http_2xx http_3xx;
}

server {
    listen 80;

    location / {
        proxy_set_header X-Real-IP        $remote_addr;
        proxy_set_header X-Forwarded-For  $proxy_add_x_forwarded_for;
        proxy_pass http://test;
        proxy_next_upstream error timeout http_500 http_502 http_503;
    }

    # Health status page
    location /status {
        check_status;
        access_log off;
    }
}