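Author: valar

Preface

Long read ahead. This article describes how an OOM in production was located, what caused it, and how it was fixed. To show the cause in detail, it walks through part of the source of FileSystem, the main class of the Java API that HDFS provides. That source walkthrough is very long, so feel free to skip to the conclusion at the end; read it if you are interested.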

The Incident

One day a burst of production alerts came in. The logs showed many threads reporting OOM errors:

Exception in thread "http-nio-8182-exec-29" java.lang.OutOfMemoryError: Java heap space

Next, jstat was used to check the memory usage of the process: jstat -gcutil 12492 1000 100

 S0 S1 E O M CCS YGC YGCT FGC FGCT GCT
 0.00 0.00 100.00 99.89 96.78 94.41 200 1.272 2925 328.850 330.122
 0.00 0.00 99.89 99.89 96.78 94.41 200 1.272 2935 329.908 331.180
 0.00 0.00 100.00 99.89 96.78 94.41 200 1.272 2944 330.853 332.125
 0.00 0.00 99.89 99.89 96.78 94.41 200 1.272 2955 332.002 333.274
 0.00 0.00 100.00 99.89 96.78 94.41 200 1.272 2964 332.940 334.212
 0.00 0.00 100.00 99.89 96.78 94.41 200 1.272 2973 333.924 335.196

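The old generation of the process was exhausted, which caused the OOM and triggered frequent full GCs (FGC). The configured heap size was more than enough for the application, so the other nodes were checked as well: their old-generation usage was above 98% and their FGC counts were also climbing.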
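To make the reading of the jstat output concrete, here is a small sketch that subtracts two consecutive rows of the first sample. The class and method names are illustrative and not part of any tool; only the column order of jstat -gcutil is taken from the output above. Between the first two rows the FGC counter rises by 10 and FGCT by about 1.058 seconds, while the sampling interval is 1000 ms, so the JVM was spending essentially all of its time in full GC.

```java
import java.util.Locale;

/** Sketch: derive full-GC pressure from two consecutive `jstat -gcutil` rows. */
class GcUtilDelta {
    // Column order as printed above: S0 S1 E O M CCS YGC YGCT FGC FGCT GCT
    static final int FGC = 8;
    static final int FGCT = 9;

    /** Splits one data row on whitespace and parses every column as a number. */
    static double[] parse(String row) {
        String[] cols = row.trim().split("\\s+");
        double[] values = new double[cols.length];
        for (int i = 0; i < cols.length; i++) {
            values[i] = Double.parseDouble(cols[i]);
        }
        return values;
    }

    /** Number of full GCs that happened between two samples. */
    static long fullGcCount(String first, String second) {
        return Math.round(parse(second)[FGC] - parse(first)[FGC]);
    }

    /** Seconds spent in full GC between two samples. */
    static double fullGcSeconds(String first, String second) {
        return parse(second)[FGCT] - parse(first)[FGCT];
    }

    public static void main(String[] args) {
        String first  = "0.00 0.00 100.00 99.89 96.78 94.41 200 1.272 2925 328.850 330.122";
        String second = "0.00 0.00 99.89 99.89 96.78 94.41 200 1.272 2935 329.908 331.180";
        System.out.printf(Locale.ROOT, "full GCs: %d, seconds in full GC: %.3f%n",
                fullGcCount(first, second), fullGcSeconds(first, second));
    }
}
```

Note that YGC stays at 200 in every row: no young collection succeeds any more, which fits an old generation that is full of objects that are still reachable.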

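Because the production issue was affecting business, a heap dump was taken and the nodes were restarted as a stopgap. Memory usage after the restart: jstat -gcutil 18190 1000 10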

 S0 S1 E O M CCS YGC YGCT FGC FGCT GCT 
 1.04 0.00 50.39 22.87 95.96 93.41 1680 20.542 4 0.136 20.679
 1.04 0.00 50.39 22.87 95.96 93.41 1680 20.542 4 0.136 20.679
 1.04 0.00 50.39 22.87 95.96 93.41 1680 20.542 4 0.136 20.679

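Service was restored for the moment, but the underlying problem still had to be solved. The data so far suggest a memory leak that leads to OOM after the process has been running for a while.

Locating the Problem

Loading the heap dump into MAT did not reveal any particularly large object, but it did show a large number of org.apache.hadoop.conf.Configuration instances. The code uses the HDFS API to operate on HDFS, and this class holds the configuration for connecting to HDFS.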

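Next, a process running the same code as production was started locally in debug mode and a heap dump was taken. MAT showed only a single instance of Configuration. At this point it was fairly clear that the problem arose when talking to HDFS through the Java API, leaving some objects that could not be collected.

A test endpoint was then written locally to access HDFS. With each call the number of Configuration instances grew, and GC could not reclaim them.

The source of the leak had been found; explaining why it happened required reading the code.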

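Cause

It was now fairly certain that the leak came from a bug in the code that talks to HDFS. The class in the project that accesses HDFS contained code equivalent to the following: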

 public Path createDir(String name) throws IOException, InterruptedException {
 Path path = new Path(name);
 Configuration configuration = new Configuration();
 FileSystem fileSystem = FileSystem.get(URI.create("hdfs://***:8020"), configuration, "hdfs");
 if (fileSystem.mkdirs(path)) {
 return path;
 }
 return null;
 }

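In other words, every interaction with HDFS opens a new connection and creates a FileSystem object, but close() is never called afterwards to release it.
A natural question: the Configuration and FileSystem instances here are local variables, so both should be collectable once the method returns. How can they leak?

How FileSystem Is Obtained

Answering that requires reading the FileSystem class. Its get method is: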

 public static FileSystem get(URI uri, Configuration conf) throws IOException {
 String scheme = uri.getScheme();
 String authority = uri.getAuthority();

 if (scheme == null && authority == null) { // use default FS
 return get(conf);
 }

 if (scheme != null && authority == null) { // no authority
 URI defaultUri = getDefaultUri(conf);
 if (scheme.equals(defaultUri.getScheme()) // if scheme matches default
 && defaultUri.getAuthority() != null) { // & default has authority
 return get(defaultUri, conf); // return default
 }
 }
 
 String disableCacheName = String.format("fs.%s.impl.disable.cache", scheme);
 if (conf.getBoolean(disableCacheName, false)) {
 return createFileSystem(uri, conf);
 }

 return CACHE.get(uri, conf);
 }

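Focus on the last six lines. When connecting to HDFS, String.format("fs.%s.impl.disable.cache", scheme) yields the parameter name fs.hdfs.impl.disable.cache, and the fifth line from the end shows that its default value is false. So by default the FileSystem is returned from the CACHE object.

Now look at CACHE.get: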

 FileSystem get(URI uri, Configuration conf) throws IOException{
 Key key = new Key(uri, conf);
 return getInternal(uri, conf, key);
 }
 
 private FileSystem getInternal(URI uri, Configuration conf, Key key) throws IOException{
 FileSystem fs;
 synchronized (this) {
 fs = map.get(key);
 }
 if (fs != null) {
 return fs;
 }

 fs = createFileSystem(uri, conf);
 synchronized (this) { // refetch the lock again
 FileSystem oldfs = map.get(key);
 if (oldfs != null) { // a file system is created while lock is releasing
 fs.close(); // close the new file system
 return oldfs; // return the old file system
 }
 
 // now insert the new file system into the map
 if (map.isEmpty()
 && !ShutdownHookManager.get().isShutdownInProgress()) {
 ShutdownHookManager.get().addShutdownHook(clientFinalizer, SHUTDOWN_HOOK_PRIORITY);
 }
 fs.key = key;
 map.put(key, fs);
 if (conf.getBoolean("fs.automatic.close", true)) {
 toAutoClose.add(key);
 }
 return fs;
 }
 }

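This code shows that:

The Cache class maintains a Map that caches connected FileSystem objects, keyed by Cache.Key. Every lookup goes through a Cache.Key, and a FileSystem is created only when the lookup misses.
The Cache class also maintains a Set (toAutoClose) of the connections that should be closed automatically. They are closed when the client shuts down.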
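The check, create, re-check pattern of getInternal can be reduced to a few lines. The following is a simplified JDK-only model, not the Hadoop class: SimpleCache, the Supplier parameter and the URI string "hdfs://nn:8020" are all made up for illustration, and the shutdown hook and toAutoClose set are left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Simplified model of the check / create / re-check pattern in getInternal. */
class SimpleCache<K, V> {
    private final Map<K, V> map = new HashMap<>();

    V get(K key, Supplier<V> factory) {
        V value;
        synchronized (this) {
            value = map.get(key);           // 1. fast path: already cached
        }
        if (value != null) {
            return value;
        }
        value = factory.get();              // 2. create outside the lock
        synchronized (this) {
            V existing = map.get(key);      // 3. re-check: another thread may have won
            if (existing != null) {
                return existing;            // the real code also closes the loser
            }
            map.put(key, value);
            return value;
        }
    }

    synchronized int size() {
        return map.size();
    }

    public static void main(String[] args) {
        SimpleCache<String, Object> cache = new SimpleCache<>();
        Object first = cache.get("hdfs://nn:8020", Object::new);   // hypothetical URI
        Object second = cache.get("hdfs://nn:8020", Object::new);
        System.out.println(first == second);   // same key, same cached instance
        System.out.println(cache.size());
    }
}
```

The cache only helps when two lookups produce keys that are equal to each other, which is why the rest of the analysis concentrates on Cache.Key.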

Next, see how FileSystem holds the CACHE variable:

 /** FileSystem cache */
 static final Cache CACHE = new Cache();

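So the CACHE object lives as long as the class does and is never collected, and every FileSystem that is created is stored in the Map inside Cache with a Cache.Key as key and the FileSystem as value. Whether the same HDFS URI can be cached more than once depends on the hashCode method of Cache.Key: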

 @Override
 public int hashCode() {
 return (scheme + authority).hashCode() + ugi.hashCode() + (int)unique;
 }

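scheme and authority are Strings, so for the same URI their hash code is the same. unique can be ignored for the FileSystem.get API because it is always 0 there. That leaves ugi.hashCode() as the part to look at.

To summarize so far:

FileSystem has a built-in static Cache, which holds a Map that caches FileSystem connections already obtained.
The parameter fs.hdfs.impl.disable.cache controls whether FileSystem caching is disabled; it defaults to false, meaning caching is on.
The key of the Map in Cache is Cache.Key, which is determined by four fields: scheme, authority, UserGroupInformation and unique, as the hashCode method above shows.

One question remains. Since FileSystem provides a cache, repeated requests for the same HDFS connection should not add a new FileSystem to the Map each time. The only explanation is that the Cache.Key differs on every call. Within Cache.Key.hashCode, whether the same HDFS URI produces the same hash code is decided by UserGroupInformation.hashCode, shown next.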

UserGroupInformation.hashCode

It is defined as:

 @Override
 public int hashCode() {
 return System.identityHashCode(subject);
 }

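It calls the native method System.identityHashCode, which returns a hash code based on object identity and ignores any hashCode() override in the object's class, so different objects generally get different values. UserGroupInformation.equals likewise compares subject by reference. The question therefore becomes whether the subject of the UserGroupInformation is the same object each time the hash code is computed.
Since this hashCode is called when computing the hash code of Cache.Key, the next step is to see how Cache.Key initializes its UserGroupInformation: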

 Key(URI uri, Configuration conf, long unique) throws IOException {
 scheme = uri.getScheme()==null ?
 "" : StringUtils.toLowerCase(uri.getScheme());
 authority = uri.getAuthority()==null ?
 "" : StringUtils.toLowerCase(uri.getAuthority());
 this.unique = unique;
 
 this.ugi = UserGroupInformation.getCurrentUser();
 }
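The effect of identity-based hashing on a map can be shown without Hadoop. The sketch below uses a hypothetical IdentityKey class that hashes and compares its subject by identity, in the way described above for UserGroupInformation; two String objects with equal content stand in for two Subject instances.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: a map key whose equals/hashCode use object identity. */
class IdentityKeyDemo {
    static final class IdentityKey {
        final Object subject;

        IdentityKey(Object subject) {
            this.subject = subject;
        }

        @Override public int hashCode() {
            return System.identityHashCode(subject);   // ignores subject.hashCode()
        }

        @Override public boolean equals(Object o) {
            return o instanceof IdentityKey && ((IdentityKey) o).subject == subject;
        }
    }

    /** Inserts one entry per given subject and returns the resulting map size. */
    static int entriesFor(Object... subjects) {
        Map<IdentityKey, String> map = new HashMap<>();
        for (Object s : subjects) {
            map.put(new IdentityKey(s), "fs");
        }
        return map.size();
    }

    public static void main(String[] args) {
        String a = new String("hdfs");
        String b = new String("hdfs");   // equal content, different object
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // true
        System.out.println(entriesFor(a, b)); // 2: distinct objects give distinct keys
        System.out.println(entriesFor(a, a)); // 1: the same object gives the same key
    }
}
```

So even when two subjects describe the same user, a map keyed this way treats them as unrelated entries.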

That leads to UserGroupInformation.getCurrentUser(). It first calls AccessController.getContext(), which is implemented as follows:

 public static AccessControlContext getContext()
 {
 AccessControlContext acc = getStackAccessControlContext();
 if (acc == null) {
 // all we had was privileged system code. We don't want
 // to return null though, so we construct a real ACC.
 return new AccessControlContext(null, true);
 } else {
 return acc.optimize();
 }
 }

The key call is getStackAccessControlContext, a native method:

 private static native AccessControlContext getStackAccessControlContext();

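It returns an AccessControlContext carrying the protection-domain permissions of the current call stack. (I did not dig further into this method; corrections from those who know it better are welcome.)

So why is a different Subject returned each time? In this case the FileSystem is obtained through the get(final URI uri, final Configuration conf, final String user) API, so go back to that method: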

 public static FileSystem get(final URI uri, final Configuration conf,
 final String user) throws IOException, InterruptedException {
 String ticketCachePath =
 conf.get(CommonConfigurationKeys.KERBEROS_TICKET_CACHE_PATH);
 UserGroupInformation ugi =
 UserGroupInformation.getBestUGI(ticketCachePath, user);
 return ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
 @Override
 public FileSystem run() throws IOException {
 return get(uri, conf);
 }
 });
 }

This method first obtains a UserGroupInformation through UserGroupInformation.getBestUGI, then calls get(URI uri, Configuration conf) through that object's doAs method.

Start with UserGroupInformation.getBestUGI and its two arguments, ticketCachePath and user. ticketCachePath is the value of the configuration key hadoop.security.kerberos.ticket.cache.path; it is not set in this case, so ticketCachePath is null. user is the user name passed in by the caller, so it is not null. The implementation:

 public static UserGroupInformation getBestUGI(
 String ticketCachePath, String user) throws IOException {
 if (ticketCachePath != null) {
 return getUGIFromTicketCache(ticketCachePath, user);
 } else if (user == null) {
 return getCurrentUser();
 } else {
 return createRemoteUser(user);
 } 
 }

As analyzed above, ticketCachePath is null and user is not, so getBestUGI ends up calling createRemoteUser:

 public static UserGroupInformation createRemoteUser(String user) {
 return createRemoteUser(user, AuthMethod.SIMPLE);
 }
 
 public static UserGroupInformation createRemoteUser(String user, AuthMethod authMethod) {
 if (user == null || user.isEmpty()) {
 throw new IllegalArgumentException("Null user");
 }
 Subject subject = new Subject();
 subject.getPrincipals().add(new User(user));
 UserGroupInformation result = new UserGroupInformation(subject);
 result.setAuthenticationMethod(authMethod);
 return result;
 }

createRemoteUser creates a new Subject and builds the UserGroupInformation from it, so every call returns a UserGroupInformation backed by a brand-new Subject. That completes UserGroupInformation.getBestUGI.
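It may help to see why a fresh Subject per call matters even when the user name never changes. The sketch below uses only JDK classes: newSubject is a hypothetical helper that mimics the shape of createRemoteUser, and X500Principal stands in for Hadoop's User principal, which is an assumption made to keep the example free of Hadoop dependencies.

```java
import javax.security.auth.Subject;
import javax.security.auth.x500.X500Principal;

/** Sketch: two Subjects for the same user are equal by content but are different objects. */
class SubjectIdentityDemo {
    /** Builds a new Subject holding a single principal, like createRemoteUser does. */
    static Subject newSubject(String name) {
        Subject subject = new Subject();
        subject.getPrincipals().add(new X500Principal("CN=" + name));
        return subject;
    }

    public static void main(String[] args) {
        Subject first = newSubject("hdfs");
        Subject second = newSubject("hdfs");
        // Subject.equals compares the principal and credential sets.
        System.out.println(first.equals(second));   // true
        // Identity comparison, which is what System.identityHashCode reflects.
        System.out.println(first == second);        // false
    }
}
```

Any code that keys on the identity of the Subject, rather than on its content, will therefore see every call as a new user.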

Next is UserGroupInformation.doAs, the last call made by FileSystem.get(final URI uri, final Configuration conf, final String user):

 public <T> T doAs(PrivilegedExceptionAction<T> action
 ) throws IOException, InterruptedException {
 try {
 logPrivilegedAction(subject, action);
 return Subject.doAs(subject, action);
 // ... remainder omitted

It in turn calls Subject.doAs:

 public static <T> T doAs(final Subject subject,
 final java.security.PrivilegedExceptionAction<T> action)
 throws java.security.PrivilegedActionException {

 java.lang.SecurityManager sm = System.getSecurityManager();
 if (sm != null) {
 sm.checkPermission(AuthPermissionHolder.DO_AS_PERMISSION);
 }

 if (action == null)
 throw new NullPointerException
 (ResourcesMgr.getString("invalid.null.action.provided"));

 // set up the new Subject-based AccessControlContext for doPrivileged
 final AccessControlContext currentAcc = AccessController.getContext();

 // call doPrivileged and push this new context on the stack
 return java.security.AccessController.doPrivileged
 (action,
 createContext(subject, currentAcc));
 }

Finally, AccessController.doPrivileged is called:

 public static native <T> T
 doPrivileged(PrivilegedExceptionAction<T> action,
 AccessControlContext context)
 throws PrivilegedActionException;

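This is a native method. It executes the PrivilegedExceptionAction under the given AccessControlContext, that is, it calls the action's run method, which here is FileSystem.get(uri, conf). Because that context was built from the new Subject, Subject.getSubject(context) inside getCurrentUser() returns that Subject, and the Cache.Key ends up holding a UserGroupInformation that wraps it.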

This explains why, when a FileSystem is created through get(final URI uri, final Configuration conf, final String user), the Cache.Key stored in the FileSystem cache has a different hash code every time. In summary:

Each call to get(final URI uri, final Configuration conf, final String user) creates a new UserGroupInformation and a new Subject.
When Cache.Key computes its hash code, the deciding term is UserGroupInformation.hashCode.
UserGroupInformation.hashCode is computed as System.identityHashCode(subject): the same Subject object gives the same hash code. Here the Subject is a different object on every call, so the hash codes differ.
As a result the Cache.Key never matches an existing entry, and a new FileSystem is written into the cache on every call.

The Two FileSystem get Methods

Two of FileSystem's overloaded get methods matter here:

 public static FileSystem get(final URI uri, final Configuration conf,
 final String user) 
 
 public static FileSystem get(URI uri, Configuration conf)

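The first was analyzed in detail above, and the code shows that it ends up calling the second. The only difference lies in how the UserGroupInformation is obtained when Cache.Key is initialized: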

 Key(URI uri, Configuration conf, long unique) throws IOException {
 scheme = uri.getScheme()==null ?
 "" : StringUtils.toLowerCase(uri.getScheme());
 authority = uri.getAuthority()==null ?
 "" : StringUtils.toLowerCase(uri.getAuthority());
 this.unique = unique;
 
 this.ugi = UserGroupInformation.getCurrentUser();
 }

The constructor calls UserGroupInformation.getCurrentUser:

 public synchronized
 static UserGroupInformation getCurrentUser() throws IOException {
 AccessControlContext context = AccessController.getContext();
 Subject subject = Subject.getSubject(context);
 if (subject == null || subject.getPrincipals(User.class).isEmpty()) {
 return getLoginUser();
 } else {
 return new UserGroupInformation(subject);
 }
 }

When get(URI uri, Configuration conf) is called directly, no Subject has been installed the way get(final URI uri, final Configuration conf, final String user) does, so Subject.getSubject returns null here and execution continues into getLoginUser:

 public synchronized 
 static UserGroupInformation getLoginUser() throws IOException {
 if (loginUser == null) {
 loginUserFromSubject(null);
 }
 return loginUser;
 }

The static field loginUser is the key. Its declaration:

 /**
 * Information about the logged in user.
 */
 private static UserGroupInformation loginUser = null;

Once loginUser has been initialized, the same object is used from then on. As shown in the previous section, UserGroupInformation.hashCode then returns the same value every time, so the FileSystem cache is actually hit.
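The difference between the two paths can be reproduced with a small JDK-only model. Everything in it is illustrative: FsCacheModel, its Key class and the authority "nn:8020" are made up, plain Object instances stand in for the Subject behind each UserGroupInformation, and the cache is an instance field (it is static in FileSystem) so that separate runs do not interfere.

```java
import java.util.HashMap;
import java.util.Map;

/** JDK-only model of the two lookup paths; every name here is illustrative. */
class FsCacheModel {
    /** Mirrors Cache.Key: scheme, authority and the identity of the user object. */
    static final class Key {
        final String scheme;
        final String authority;
        final Object user;

        Key(String scheme, String authority, Object user) {
            this.scheme = scheme;
            this.authority = authority;
            this.user = user;
        }

        @Override public int hashCode() {
            return (scheme + authority).hashCode() + System.identityHashCode(user);
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return scheme.equals(other.scheme)
                    && authority.equals(other.authority)
                    && user == other.user;            // identity, not content
        }
    }

    private final Map<Key, Object> cache = new HashMap<>();
    private final Object loginUser = new Object();    // stands in for the static loginUser

    /** Models get(uri, conf): the key uses the shared login user. */
    Object get(String scheme, String authority) {
        return cache.computeIfAbsent(new Key(scheme, authority, loginUser), k -> new Object());
    }

    /** Models get(uri, conf, user): a fresh user object per call; the name never enters the key. */
    Object getAs(String scheme, String authority, String userName) {
        Object freshUser = new Object();              // stands in for createRemoteUser(userName)
        return cache.computeIfAbsent(new Key(scheme, authority, freshUser), k -> new Object());
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        FsCacheModel model = new FsCacheModel();
        for (int i = 0; i < 1000; i++) model.get("hdfs", "nn:8020");
        System.out.println(model.size());   // 1: every call hits the same entry
        for (int i = 0; i < 1000; i++) model.getAs("hdfs", "nn:8020", "hdfs");
        System.out.println(model.size());   // 1001: every call adds a new entry
    }
}
```

In the real class the map is reachable from a static field, so each of those extra entries, together with the FileSystem and Configuration it references, stays on the heap.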

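Solution

Use public static FileSystem get(URI uri, Configuration conf): this method does use the FileSystem cache, so there is only one FileSystem connection object per HDFS URI. With this API the user can be set through System.setProperty("HADOOP_USER_NAME", "hive"). (If there is a more elegant way, please point it out.) By default fs.automatic.close=true, so all connections are closed by a ShutdownHook.
Use public static FileSystem get(final URI uri, final Configuration conf, final String user): as analyzed above, this method defeats the FileSystem cache, and each call adds an entry to the Cache's Map that can never be collected. Two ways to use it safely: make sure that only one FileSystem connection object exists per HDFS URI, or call close after each use, which removes the FileSystem from the Cache.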
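The second remedy, closing after each use, can be sketched with try-with-resources. This is again a JDK-only model under stated assumptions: CloseOnUseModel and Handle are hypothetical names, Handle plays the role of a FileSystem obtained through the user-taking get method, and its close() removes the cache entry as described above for FileSystem.close.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: closing a handle removes its cache entry, so nothing accumulates. */
class CloseOnUseModel {
    private final Map<Object, Handle> cache = new HashMap<>();

    final class Handle implements AutoCloseable {
        private final Object key = new Object();   // fresh key per call, as in the leaking path

        @Override public void close() {
            cache.remove(key);                     // the entry is dropped on close
        }
    }

    /** Creates a handle and registers it in the cache, like each leaking get call. */
    Handle open() {
        Handle handle = new Handle();
        cache.put(handle.key, handle);
        return handle;
    }

    int cachedHandles() {
        return cache.size();
    }

    public static void main(String[] args) {
        CloseOnUseModel leaking = new CloseOnUseModel();
        for (int i = 0; i < 1000; i++) {
            leaking.open();                        // never closed
        }
        System.out.println(leaking.cachedHandles());   // 1000

        CloseOnUseModel closing = new CloseOnUseModel();
        for (int i = 0; i < 1000; i++) {
            try (Handle handle = closing.open()) {
                // use the handle here
            }
        }
        System.out.println(closing.cachedHandles());   // 0
    }
}
```

The trade-off is that closing after every call gives up connection reuse, whereas the first remedy keeps a single long-lived FileSystem per URI.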

FileSystem also provides newInstance and related APIs. They return a new FileSystem on every call; see the FileSystem source for details.