Details
Bug
Resolution: Fixed
Blocker
Lustre 2.12.0
None
CentOS 7.6 - 3.10.0-957.1.3.el7_lustre.x86_64
3
Description
We just got some kind of deadlock on fir-md1-s2 that serves MDT0001 and MDT0003.
I took a crash dump because the MDS was not usable and the filesystem was hanging.
Attaching the output of foreach bt (bt.all) and vmcore-dmesg.txt.
Also, this is from the crash session:
crash> foreach bt >bt.all
crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
   TOTAL MEM  65891316     251.4 GB         ----
        FREE  13879541      52.9 GB   21% of TOTAL MEM
        USED  52011775     198.4 GB   78% of TOTAL MEM
      SHARED  32692923     124.7 GB   49% of TOTAL MEM
     BUFFERS  32164243     122.7 GB   48% of TOTAL MEM
      CACHED    781150         3 GB    1% of TOTAL MEM
        SLAB  13801776      52.6 GB   20% of TOTAL MEM
  TOTAL HUGE         0            0         ----
   HUGE FREE         0            0    0% of TOTAL HUGE
  TOTAL SWAP   1048575         4 GB         ----
   SWAP USED         0            0    0% of TOTAL SWAP
   SWAP FREE   1048575         4 GB  100% of TOTAL SWAP
COMMIT LIMIT  33994233     129.7 GB         ----
   COMMITTED    228689     893.3 MB    0% of TOTAL LIMIT
crash> ps | grep ">"
> 0 0 0 ffffffffb4218480 RU 0.0 0 0 [swapper/0]
> 0 0 1 ffff8ec129410000 RU 0.0 0 0 [swapper/1]
> 0 0 2 ffff8ed129f10000 RU 0.0 0 0 [swapper/2]
> 0 0 3 ffff8ee129ebe180 RU 0.0 0 0 [swapper/3]
> 0 0 4 ffff8eb1a9ba6180 RU 0.0 0 0 [swapper/4]
> 0 0 5 ffff8ec129416180 RU 0.0 0 0 [swapper/5]
> 0 0 6 ffff8ed129f16180 RU 0.0 0 0 [swapper/6]
> 0 0 7 ffff8ee129eba080 RU 0.0 0 0 [swapper/7]
> 0 0 8 ffff8eb1a9ba30c0 RU 0.0 0 0 [swapper/8]
> 0 0 9 ffff8ec129411040 RU 0.0 0 0 [swapper/9]
> 0 0 10 ffff8ed129f11040 RU 0.0 0 0 [swapper/10]
> 0 0 11 ffff8ee129ebd140 RU 0.0 0 0 [swapper/11]
> 0 0 12 ffff8eb1a9ba5140 RU 0.0 0 0 [swapper/12]
> 0 0 13 ffff8ec129415140 RU 0.0 0 0 [swapper/13]
> 0 0 14 ffff8ed129f15140 RU 0.0 0 0 [swapper/14]
> 0 0 15 ffff8ee129ebb0c0 RU 0.0 0 0 [swapper/15]
> 0 0 16 ffff8eb1a9ba4100 RU 0.0 0 0 [swapper/16]
> 0 0 17 ffff8ec129412080 RU 0.0 0 0 [swapper/17]
> 0 0 19 ffff8ee129ebc100 RU 0.0 0 0 [swapper/19]
> 0 0 20 ffff8eb1a9408000 RU 0.0 0 0 [swapper/20]
> 0 0 21 ffff8ec129414100 RU 0.0 0 0 [swapper/21]
> 0 0 22 ffff8ed129f14100 RU 0.0 0 0 [swapper/22]
> 0 0 23 ffff8ee129f38000 RU 0.0 0 0 [swapper/23]
> 0 0 24 ffff8eb1a940e180 RU 0.0 0 0 [swapper/24]
> 0 0 25 ffff8ec1294130c0 RU 0.0 0 0 [swapper/25]
> 0 0 26 ffff8ed129f130c0 RU 0.0 0 0 [swapper/26]
> 0 0 27 ffff8ee129f3e180 RU 0.0 0 0 [swapper/27]
> 0 0 28 ffff8eb1a9409040 RU 0.0 0 0 [swapper/28]
> 0 0 29 ffff8ec129430000 RU 0.0 0 0 [swapper/29]
> 0 0 30 ffff8ed129f50000 RU 0.0 0 0 [swapper/30]
> 0 0 31 ffff8ee129f39040 RU 0.0 0 0 [swapper/31]
> 0 0 32 ffff8eb1a940d140 RU 0.0 0 0 [swapper/32]
> 0 0 33 ffff8ec129436180 RU 0.0 0 0 [swapper/33]
> 0 0 34 ffff8ed129f56180 RU 0.0 0 0 [swapper/34]
> 0 0 35 ffff8ee129f3d140 RU 0.0 0 0 [swapper/35]
> 0 0 36 ffff8eb1a940a080 RU 0.0 0 0 [swapper/36]
> 0 0 37 ffff8ec129431040 RU 0.0 0 0 [swapper/37]
> 0 0 38 ffff8ed129f51040 RU 0.0 0 0 [swapper/38]
> 0 0 39 ffff8ee129f3a080 RU 0.0 0 0 [swapper/39]
> 0 0 40 ffff8eb1a940c100 RU 0.0 0 0 [swapper/40]
> 0 0 41 ffff8ec129435140 RU 0.0 0 0 [swapper/41]
> 0 0 42 ffff8ed129f55140 RU 0.0 0 0 [swapper/42]
> 0 0 43 ffff8ee129f3c100 RU 0.0 0 0 [swapper/43]
> 0 0 44 ffff8eb1a940b0c0 RU 0.0 0 0 [swapper/44]
> 0 0 45 ffff8ec129432080 RU 0.0 0 0 [swapper/45]
> 0 0 46 ffff8ed129f52080 RU 0.0 0 0 [swapper/46]
> 0 0 47 ffff8ee129f3b0c0 RU 0.0 0 0 [swapper/47]
> 109549 109543 18 ffff8ee05406c100 RU 0.0 115440 2132 bash
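In the listing above, every CPU is running its idle swapper thread except CPU 18, which is running bash. A minimal sketch of how that filtering could be scripted is shown here; it assumes the usual crash ps column order (marker, PID, PPID, CPU, TASK, ST, %MEM, VSZ, RSS, COMM), and the function name active_non_idle is hypothetical.

```python
def active_non_idle(ps_text):
    """Return (cpu, pid, command) for active tasks that are not idle swapper threads.

    Assumes each line looks like the output of: crash> ps | grep ">"
    i.e.  > PID PPID CPU TASK ST %MEM VSZ RSS COMM
    """
    tasks = []
    for line in ps_text.splitlines():
        fields = line.split()
        # Skip anything that is not an active-task line with all ten columns.
        if len(fields) < 10 or fields[0] != ">":
            continue
        pid = int(fields[1])
        cpu = int(fields[3])
        comm = " ".join(fields[9:])
        # Idle threads are named [swapper/N]; everything else is real work.
        if not comm.startswith("[swapper/"):
            tasks.append((cpu, pid, comm))
    return tasks
```

An almost entirely idle set of CPUs would be consistent with threads sleeping on a lock rather than spinning, though the backtraces are what confirm that.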
I noticed a lot of threads blocked on quota commands.