實驗背景
近一年來發(fā)生幾起的數(shù)據(jù)庫連接被打滿的情況,初步分析是應(yīng)用使用連接數(shù)量“超過了”連接池(DBCP1.4)的上限,導(dǎo)致數(shù)據(jù)庫連接被打滿,其中一個結(jié)論是連接池的bug導(dǎo)致
問題分析
1、DBCP連接池的Bug導(dǎo)致連接數(shù)超過大小設(shè)置,其根本原因是在連接池初始化的時候,有可能創(chuàng)建多個連接池導(dǎo)致
protected synchronized DataSource createDataSource()
throws SQLException {
if (closed) {
throw new SQLException("Data source is closed");
}
// Return the pool if we have already created it
if (dataSource != null) {
return (dataSource);
}
// create factory which returns raw physical connections
ConnectionFactory driverConnectionFactory = createConnectionFactory();
// create a pool for our connections
createConnectionPool();
// Set up statement pool, if desired
GenericKeyedObjectPoolFactory statementPoolFactory = null;
if (isPoolPreparedStatements()) {
statementPoolFactory = new GenericKeyedObjectPoolFactory(null,
-1, // unlimited maxActive (per key)
GenericKeyedObjectPool.WHEN_EXHAUSTED_FAIL,
0, // maxWait
1, // maxIdle (per key)
maxOpenPreparedStatements);
}
// Set up the poolable connection factory
createPoolableConnectionFactory(driverConnectionFactory, statementPoolFactory, abandonedConfig);
// Create and return the pooling data source to manage the connections
createDataSourceInstance();
try {
for (int i = 0 ; i < initialSize ; i++) {
connectionPool.addObject();
}
} catch (Exception e) {
throw new SQLNestedException("Error preloading the connection pool", e);
}
return dataSource;
}
createDataSource方法里邊調(diào)用createConnectionPool方法,如果后面方式執(zhí)行失敗,比如createPoolableConnectionFactory,當(dāng)再次調(diào)用createDataSource,又會再次調(diào)用createConnectionPool,從而初始化多次連接池,下面是createConnectionPool
protected void createConnectionPool() {
// Create an object pool to contain our active connections
GenericObjectPool gop;
if ((abandonedConfig != null) && (abandonedConfig.getRemoveAbandoned())) {
gop = new AbandonedObjectPool(null,abandonedConfig);
}
else {
gop = new GenericObjectPool();
}
gop.setMaxActive(maxActive);
gop.setMaxIdle(maxIdle);
gop.setMinIdle(minIdle);
gop.setMaxWait(maxWait);
gop.setTestOnBorrow(testOnBorrow);
gop.setTestOnReturn(testOnReturn);
gop.setTimeBetweenEvictionRunsMillis(timeBetweenEvictionRunsMillis);
gop.setNumTestsPerEvictionRun(numTestsPerEvictionRun);
gop.setMinEvictableIdleTimeMillis(minEvictableIdleTimeMillis);
gop.setTestWhileIdle(testWhileIdle);
connectionPool = gop;
}
顯然這個情況只能是模塊啟動初始化的時候產(chǎn)生,且連接都不會被使用,而線上出現(xiàn)的問題是連接都在執(zhí)行sql,不是這種情況!在1.4x的這個bug已經(jīng)修復(fù)
2、啟動參數(shù)分析,從啟動參數(shù)入手,看一下是不是連接池主動釋放連接,導(dǎo)致正在使用的連接被釋放,從而創(chuàng)建新的連接,給我們的現(xiàn)象是“連接數(shù)”超過了連接池的限制,下面是一個入庫的連接池參數(shù)
partition1.driverClassName=com.MySQL.jdbc.Driver
partition1.initialSize=2
partition1.maxActive=25
partition1.minIdle=2
partition1.maxIdle=5
partition1.maxWait=3000
partition1.threadPoolSize=10
partition1.logAbandoned=true
partition1.testWhileIdle=true
partition1.testOnReturn=false
partition1.testOnBorrow=true
partition1.validationQuery=select now()
//在每次空閑連接回收器線程(如果有)運行時檢查的連接數(shù)量
partition1.numTestsPerEvictionRun=5
//在空閑連接回收器線程運行期間休眠的時間值,以毫秒為單位
partition1.timeBetweenEvictionRunsMillis=30000
//連接在池中保持空閑而不被空閑連接回收器線程
partition1.minEvictableIdleTimeMillis=180000
//設(shè)置了rmoveAbandoned=true 那么在getNumactive()快要到getMaxActive()的時候,系統(tǒng)會進(jìn)行無效的Connection的回收,回收的 Connection為removeAbandonedTimeout(默認(rèn)300秒)中設(shè)置的秒數(shù)后沒有使用的Connection
partition1.removeAbandoned=true
//強(qiáng)制回收連接的時間,單位秒
partition1.removeAbandonedTimeout=18
注意connectionProperties參數(shù)
<bean id="partition[j]" class="org.Apache.commons.dbcp.BasicDataSource"
destroy-method="close">
<property name="driverClassName" value="${partition[j].driverClassName}" ></property>
<property name="url" value="${partition[j].url}" ></property>
<property name="username" value="${partition[j].username}" ></property>
<property name="password" value="${partition[j].password}" ></property>
<property name="defaultAutoCommit" value="false" ></property>
<property name="maxActive" value="${partition[j].maxActive}" ></property>
<property name="maxIdle" value="${partition[j].maxIdle}" ></property>
<property name="maxWait" value="${partition[j].maxWait}" ></property>
<property name="initialSize" value="${partition[j].initialSize}" ></property>
<property name="minIdle" value="${partition[j].minIdle}" ></property>
<property name="logAbandoned" value="${partition[j].logAbandoned}" ></property>
<property name="testWhileIdle" value="${partition[j].testWhileIdle}" ></property>
<property name="testOnReturn" value="${partition[j].testOnReturn}" ></property>
<property name="testOnBorrow" value="${partition[j].testOnBorrow}" ></property>
<property name="validationQuery" value="${partition[j].validationQuery}" ></property>
<property name="numTestsPerEvictionRun" value="${partition[j].numTestsPerEvictionRun}" ></property>
<property name="timeBetweenEvictionRunsMillis" value="${partition[j].timeBetweenEvictionRunsMillis}" ></property>
<property name="minEvictableIdleTimeMillis" value="${partition[j].minEvictableIdleTimeMillis}" ></property>
<property name="removeAbandoned" value="${partition[j].removeAbandoned}" ></property>
<property name="removeAbandonedTimeout" value="${partition[j].removeAbandonedTimeout}" ></property>
<property name="connectionProperties" value="useUnicode=true;
characterEncoding=utf8;initialTimeout=1;connectTimeout=1000;socketTimeout=6000;
rewriteBatchedStatements=true;autoReconnectForPools=true;autoReconnect=true;maxReconnects=1;
failOverReadOnly=false;roundRobinLoadBalance=true;allowMultiQueries=true"></property>
</bean>
從上面參數(shù)我們重點關(guān)注removeAbandonedTimeout 這個參數(shù)的意義
//創(chuàng)建連接
public Object borrowObject() throws Exception {
if (config != null
&& config.getRemoveAbandoned()
&& (getNumIdle() < 2)
&& (getNumActive() > getMaxActive() - 3) ) {
removeAbandoned();
}
Object obj = super.borrowObject();
if (obj instanceof AbandonedTrace) {
((AbandonedTrace) obj).setStackTrace();
}
if (obj != null && config != null && config.getRemoveAbandoned()) {
synchronized (trace) {
trace.add(obj);
}
}
...
private void removeAbandoned() {
// Generate a list of abandoned connections to remove
long now = System.currentTimeMillis();
long timeout = now - (config.getRemoveAbandonedTimeout() * 1000);
ArrayList remove = new ArrayList();
synchronized (trace) {
Iterator it = trace.iterator();
while (it.hasNext()) {
AbandonedTrace pc = (AbandonedTrace) it.next();
if (pc.getLastUsed() > timeout) {
continue;
}
if (pc.getLastUsed() > 0) {
remove.add(pc);
}
}
}
// Now remove the abandoned connections
Iterator it = remove.iterator();
while (it.hasNext()) {
AbandonedTrace pc = (AbandonedTrace) it.next();
if (config.getLogAbandoned()) {
pc.printStackTrace();
}
try {
invalidateObject(pc);
} catch (Exception e) {
e.printStackTrace();
}
}
}
代從代碼可以看出,配置了removeAbandonedOnBorrow和removeAbandonedTimeout就是觸發(fā)正在使用的連接remove,假如這個時候,數(shù)據(jù)庫正在執(zhí)行sql,就會出現(xiàn)連接池已經(jīng)斷開連接,而數(shù)據(jù)庫的連接還在執(zhí)行,造成數(shù)據(jù)庫的連接超過連接池的顯現(xiàn),如果少量慢sql執(zhí)行時間大于removeAbandonedTimeout,不會出現(xiàn)問題,出現(xiàn)大量的慢sql,就會導(dǎo)致數(shù)據(jù)庫中的這種連接越來越多,最后把數(shù)據(jù)庫打滿
實驗過程
創(chuàng)建測試庫
mkdir /usr/local/data/mysql
Docker run -d -e MYSQL_ROOT_PASSWORD=root --name centos/mysql-57-centos7 -v /usr/local/data/mysql:/var/lib/mysql -p 3306:3306 mysql --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci --lower_case_table_names=1
創(chuàng)建測試庫
create database test ;
構(gòu)建測試類,核心代碼
public void setUp() throws Exception {
ds = createDataSource();
ds.setDriverClassName("com.mysql.jdbc.Driver");
ds.setUrl("jdbc:mysql://127.0.0.1:3306/test?allowMultiQueries=true&characterEncoding=UTF-8&autoReconnect=true&failOverReadOnly=false&useSSL=false&allowPublicKeyRetrieval=true");
ds.setUsername("root");
ds.setPassword("Htbuy@2016");
ds.setMaxActive(1);
ds.setMaxWait(1000);
ds.setTestWhileIdle(true);
ds.setTestOnBorrow(true);
ds.setTestOnReturn(false);
ds.setValidationQuery("select now()");
ds.setNumTestsPerEvictionRun(5);
ds.setMinEvictableIdleTimeMillis(2000);
ds.setLogAbandoned(true);
ds.setRemoveAbandoned(true);
ds.setRemoveAbandonedTimeout(1);
}
public void testAbandoned() throws Exception {
for (int i = 0; i < 20; i++) {
Thread t = new Thread(new Runnable() {
@Override
public void run() {
try {
Connection conn = ds.getConnection();
Statement statement = conn.createStatement();
ResultSet resultSet = null;
#模擬慢sql
resultSet = statement.executeQuery("select sleep(100),now()");
while (resultSet.next()) {
System.out.println("result+" + resultSet.getString(1));
}
resultSet.close();
statement.close();
conn.close();
} catch (Exception ex) {
System.out.println(ex.getMessage());
}
System.out.println(Thread.currentThread().getName() + "---------------------- end----------------------");
}
});
t.setName(i + "");
t.start();
Thread.sleep((i + 1) * 1000);
}
System.out.println(Thread.currentThread().getName() + "---------------------- end----------------------");
Thread.sleep(1000000);
}
通過運行上面代碼,并沒有像我們所期望那樣,超過連接的最大限制1,通過RemoveAbandoned=true和RemoveAbandonedTimeout=1 應(yīng)該已經(jīng)觸發(fā)了連接池的Abandoned機(jī)制,但是都阻塞到下面的一行代碼,DelegatingStatement的close方法
/**
* Close this DelegatingStatement, and close
* any ResultSets that were not explicitly closed.
*/
public void close() throws SQLException {
try {
try {
if (_conn != null) {
_conn.removeTrace(this);
_conn = null;
}
// The JDBC spec requires that a statment close any open
// ResultSet's when it is closed.
// FIXME The PreparedStatement we're wrApping should handle this for us.
// See bug 17301 for what could happen when ResultSets are closed twice.
List resultSets = getTrace();
if( resultSets != null) {
ResultSet[] set = (ResultSet[]) resultSets.toArray(new ResultSet[resultSets.size()]);
for (int i = 0; i < set.length; i++) {
set[i].close();
}
clearTrace();
}
//阻塞地方
_stmt.close();
}
catch (SQLException e) {
handleException(e);
}
}
finally {
_closed = true;
}
}
當(dāng)我們在url設(shè)置socketTimeout=1000,這時候阻塞的地方成功執(zhí)行完成,超過數(shù)據(jù)庫出現(xiàn)大量的連接數(shù)1,問題重現(xiàn)!同時出現(xiàn)典型的日志The last packet successfully received from the server was 1,001 milliseconds ago. The last packet sent successfully to the server was 1,001 milliseconds ago.
當(dāng)設(shè)置socketTimeout=1000,RemoveAbandoned=false ,并沒有重現(xiàn)問題,且都是大量的等待連接超時
原因是下面的代碼差異,開啟RemoveAbandoned,使用的是AbandonedObjectPool,而不開啟是默認(rèn)的GenericObjectPool,AbandonedObjectPool增加了Abandoned邏輯
*/
protected void createConnectionPool() {
// Create an object pool to contain our active connections
GenericObjectPool gop;
if ((abandonedConfig != null) && (abandonedConfig.getRemoveAbandoned())) {
gop = new AbandonedObjectPool(null,abandonedConfig);
}
else {
gop = new GenericObjectPool();
}
gop.setMaxActive(maxActive);
gop.setMaxIdle(maxIdle);
gop.setMinIdle(minIdle);
gop.setMaxWait(maxWait);
gop.setTestOnBorrow(testOnBorrow);
gop.setTestOnReturn(testOnReturn);
gop.setTimeBetweenEvictionRunsMillis(timeBetweenEvictionRunsMillis);
gop.setNumTestsPerEvictionRun(numTestsPerEvictionRun);
gop.setMinEvictableIdleTimeMillis(minEvictableIdleTimeMillis);
gop.setTestWhileIdle(testWhileIdle);
connectionPool = gop;
}
總結(jié)
當(dāng)RemoveAbandoned=true, 且執(zhí)行時間超過socketTimeout ,達(dá)到RemoveAbandonedTimeout的觸發(fā)點時,就會導(dǎo)致數(shù)據(jù)庫連接數(shù)超過連接池的限制,注意這種情況關(guān)閉模塊是沒用的,sql還在數(shù)據(jù)庫中執(zhí)行,應(yīng)該直接kill或者切庫操作!!
優(yōu)化方案
1、關(guān)閉RemoveAbandoned設(shè)置,目前wms系統(tǒng)的還是AP和TP混合型,有很大的幾率觸發(fā)這種情況
2、經(jīng)過邱玉堃復(fù)核模塊測試,需要關(guān)閉socketTimeout參數(shù),引用的1.4.jar版本與源碼的1.4版本存在差異
3、 注意當(dāng)mysql-connector-JAVA 版本小于5.1.45就會出現(xiàn)這個bug