Details
Bug
Resolution: Fixed
Blocker
Lustre 2.12.0
None
CentOS 7.6 - 3.10.0-957.1.3.el7_lustre.x86_64
3
Description
We just hit what looks like a deadlock on fir-md1-s2, which serves MDT0001 and MDT0003.
I took a crash dump because the MDS was unusable and the filesystem was hanging.
Attaching the output of foreach bt (bt.all) and vmcore-dmesg.txt.
Also, here is some output from the crash session:
    crash> foreach bt > bt.all
    crash> kmem -i
                     PAGES        TOTAL      PERCENTAGE
        TOTAL MEM  65891316     251.4 GB         ----
             FREE  13879541      52.9 GB   21% of TOTAL MEM
             USED  52011775     198.4 GB   78% of TOTAL MEM
           SHARED  32692923     124.7 GB   49% of TOTAL MEM
          BUFFERS  32164243     122.7 GB   48% of TOTAL MEM
           CACHED    781150         3 GB    1% of TOTAL MEM
             SLAB  13801776      52.6 GB   20% of TOTAL MEM

       TOTAL HUGE         0            0         ----
        HUGE FREE         0            0    0% of TOTAL HUGE

       TOTAL SWAP   1048575         4 GB         ----
        SWAP USED         0            0    0% of TOTAL SWAP
        SWAP FREE   1048575         4 GB  100% of TOTAL SWAP

     COMMIT LIMIT  33994233     129.7 GB         ----
        COMMITTED    228689     893.3 MB    0% of TOTAL LIMIT

    crash> ps | grep ">"
    >      0      0   0  ffffffffb4218480  RU   0.0       0      0  [swapper/0]
    >      0      0   1  ffff8ec129410000  RU   0.0       0      0  [swapper/1]
    >      0      0   2  ffff8ed129f10000  RU   0.0       0      0  [swapper/2]
    >      0      0   3  ffff8ee129ebe180  RU   0.0       0      0  [swapper/3]
    >      0      0   4  ffff8eb1a9ba6180  RU   0.0       0      0  [swapper/4]
    >      0      0   5  ffff8ec129416180  RU   0.0       0      0  [swapper/5]
    >      0      0   6  ffff8ed129f16180  RU   0.0       0      0  [swapper/6]
    >      0      0   7  ffff8ee129eba080  RU   0.0       0      0  [swapper/7]
    >      0      0   8  ffff8eb1a9ba30c0  RU   0.0       0      0  [swapper/8]
    >      0      0   9  ffff8ec129411040  RU   0.0       0      0  [swapper/9]
    >      0      0  10  ffff8ed129f11040  RU   0.0       0      0  [swapper/10]
    >      0      0  11  ffff8ee129ebd140  RU   0.0       0      0  [swapper/11]
    >      0      0  12  ffff8eb1a9ba5140  RU   0.0       0      0  [swapper/12]
    >      0      0  13  ffff8ec129415140  RU   0.0       0      0  [swapper/13]
    >      0      0  14  ffff8ed129f15140  RU   0.0       0      0  [swapper/14]
    >      0      0  15  ffff8ee129ebb0c0  RU   0.0       0      0  [swapper/15]
    >      0      0  16  ffff8eb1a9ba4100  RU   0.0       0      0  [swapper/16]
    >      0      0  17  ffff8ec129412080  RU   0.0       0      0  [swapper/17]
    >      0      0  19  ffff8ee129ebc100  RU   0.0       0      0  [swapper/19]
    >      0      0  20  ffff8eb1a9408000  RU   0.0       0      0  [swapper/20]
    >      0      0  21  ffff8ec129414100  RU   0.0       0      0  [swapper/21]
    >      0      0  22  ffff8ed129f14100  RU   0.0       0      0  [swapper/22]
    >      0      0  23  ffff8ee129f38000  RU   0.0       0      0  [swapper/23]
    >      0      0  24  ffff8eb1a940e180  RU   0.0       0      0  [swapper/24]
    >      0      0  25  ffff8ec1294130c0  RU   0.0       0      0  [swapper/25]
    >      0      0  26  ffff8ed129f130c0  RU   0.0       0      0  [swapper/26]
    >      0      0  27  ffff8ee129f3e180  RU   0.0       0      0  [swapper/27]
    >      0      0  28  ffff8eb1a9409040  RU   0.0       0      0  [swapper/28]
    >      0      0  29  ffff8ec129430000  RU   0.0       0      0  [swapper/29]
    >      0      0  30  ffff8ed129f50000  RU   0.0       0      0  [swapper/30]
    >      0      0  31  ffff8ee129f39040  RU   0.0       0      0  [swapper/31]
    >      0      0  32  ffff8eb1a940d140  RU   0.0       0      0  [swapper/32]
    >      0      0  33  ffff8ec129436180  RU   0.0       0      0  [swapper/33]
    >      0      0  34  ffff8ed129f56180  RU   0.0       0      0  [swapper/34]
    >      0      0  35  ffff8ee129f3d140  RU   0.0       0      0  [swapper/35]
    >      0      0  36  ffff8eb1a940a080  RU   0.0       0      0  [swapper/36]
    >      0      0  37  ffff8ec129431040  RU   0.0       0      0  [swapper/37]
    >      0      0  38  ffff8ed129f51040  RU   0.0       0      0  [swapper/38]
    >      0      0  39  ffff8ee129f3a080  RU   0.0       0      0  [swapper/39]
    >      0      0  40  ffff8eb1a940c100  RU   0.0       0      0  [swapper/40]
    >      0      0  41  ffff8ec129435140  RU   0.0       0      0  [swapper/41]
    >      0      0  42  ffff8ed129f55140  RU   0.0       0      0  [swapper/42]
    >      0      0  43  ffff8ee129f3c100  RU   0.0       0      0  [swapper/43]
    >      0      0  44  ffff8eb1a940b0c0  RU   0.0       0      0  [swapper/44]
    >      0      0  45  ffff8ec129432080  RU   0.0       0      0  [swapper/45]
    >      0      0  46  ffff8ed129f52080  RU   0.0       0      0  [swapper/46]
    >      0      0  47  ffff8ee129f3b0c0  RU   0.0       0      0  [swapper/47]
    > 109549 109543  18  ffff8ee05406c100  RU   0.0  115440   2132  bash
I noticed a lot of threads blocked on quota commands.
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34926/
Subject: LU-12178 osd: do not rebalance quota under memory pressure
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 8c0b1c9af812140bde14180a318ace834d077d4b
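The ticket gives only the patch subject, not its code. As a rough model of the idea, the Python sketch below skips an optional quota rebalance (in this model, handing surplus grant back to the quota master) whenever the caller is running in memory reclaim, so that reclaim never waits on quota work. The QuotaSlave class, the in_memory_reclaim flag and the grant arithmetic are all hypothetical; this is not the code from change 34926. In the kernel, a check of this kind could plausibly test PF_MEMALLOC in current->flags, which is set for tasks performing reclaim, but the ticket does not say how the patch detects memory pressure.

```python
from dataclasses import dataclass


@dataclass
class QuotaSlave:
    """Toy model of a per-target quota slave; every name here is hypothetical."""
    granted: int              # space currently granted by the (modeled) master
    used: int                 # space actually consumed
    rebalance_calls: int = 0  # how many rebalances were performed

    def maybe_rebalance(self, in_memory_reclaim: bool) -> bool:
        """Release surplus grant unless the caller is inside memory reclaim.

        Returns True if a rebalance was performed, False if it was skipped
        or there was nothing to release.
        """
        if in_memory_reclaim:
            # Skip the optional work; a later call outside reclaim can do it.
            return False
        surplus = self.granted - self.used
        if surplus > 0:
            self.granted -= surplus  # hand the surplus back to the master
            self.rebalance_calls += 1
            return True
        return False
```

Skipping is harmless in this model because rebalancing is only an optimization: the surplus simply stays granted to the slave until the next call made outside reclaim releases it.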