Details
Type: Bug
Resolution: Unresolved
Priority: Critical
Affects Version: Lustre 2.12.0
Environment: CentOS 7.6 Lustre 2.12.0+patches
Severity: 3
Description
On our 2.12 Fir filesystem, a directory hosted on MDT0000 appears to be no longer accessible:
/fir/users/bjing/caspposes/CASP11/taskdir1
FID: 0x200029d02:0x1b59c:0x0
[root@fir-rbh01 ~]# lfs fid2path /fir 0x200029d02:0x1b59c:0x0
/fir/users/bjing/caspposes/CASP11/taskdir1
[root@fir-rbh01 ~]# lfs getdirstripe /fir/users/bjing/caspposes/CASP11/taskdir1
lmv_stripe_count: 0 lmv_stripe_offset: 0 lmv_hash_type: none
strace of ls:
stat("/fir/users/bjing/caspposes/CASP11/", {st_mode=S_IFDIR|S_ISGID|0775, st_size=12288, ...}) = 0
openat(AT_FDCWD, "/fir/users/bjing/caspposes/CASP11/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
getdents(3,
Logs showing the FID on the MDS of MDT0000:
[root@fir-md1-s1 ~]# journalctl -n 100000 -k | grep 0x200029d02:0x1b59c:0x0
Aug 11 19:29:07 fir-md1-s1 kernel: LustreError: 20378:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 57s: evicting client at 10.8.26.28@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8f32cb2f7740/0x5d9ee6c5054b1779 lrc: 4/0,0 mode: PR/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 40 type: IBT flags: 0x60200400000020 nid: 10.8.26.28@o2ib6 remote: 0xff0b1f607b1120a1 expref: 3155 pid: 97646 timeout: 4692007 lvb_type: 0
Aug 11 19:29:47 fir-md1-s1 kernel: LustreError: 23597:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565576897, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f3483a0ec00/0x5d9ee6c5055796b8 lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 34 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23597 timeout: 0 lvb_type: 0
Aug 11 19:30:50 fir-md1-s1 kernel: LustreError: 21003:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1565576960, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f3245e35580/0x5d9ee6c5057392f2 lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 33 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21003 timeout: 0 lvb_type: 0
Aug 27 13:10:04 fir-md1-s1 kernel: LustreError: 21452:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566936514, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2d9e083180/0x5d9ee6e65c38e54b lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 28 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21452 timeout: 0 lvb_type: 0
Aug 27 14:34:43 fir-md1-s1 kernel: LustreError: 23645:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566941593, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f28934e18c0/0x5d9ee6e686d8d27d lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 30 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23645 timeout: 0 lvb_type: 0
Aug 27 14:35:13 fir-md1-s1 kernel: LustreError: 50442:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566941623, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2ee5499f80/0x5d9ee6e686e6af45 lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 30 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 50442 timeout: 0 lvb_type: 0
Aug 27 22:18:01 fir-md1-s1 kernel: LustreError: 10504:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566969391, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f1275822640/0x5d9ee6e70bada6a3 lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 31 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 10504 timeout: 0 lvb_type: 0
Aug 27 22:20:00 fir-md1-s1 kernel: LustreError: 20457:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566969510, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f07c5f1ba80/0x5d9ee6e70bdaec42 lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 33 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 20457 timeout: 0 lvb_type: 0
Aug 27 22:20:30 fir-md1-s1 kernel: LustreError: 23607:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1566969540, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2ac3114a40/0x5d9ee6e70be681a2 lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 33 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23607 timeout: 0 lvb_type: 0
Aug 28 10:28:39 fir-md1-s1 kernel: LustreError: 21681:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1567013229, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f4136e41b00/0x5d9ee6e786713af9 lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 35 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21681 timeout: 0 lvb_type: 0
Aug 28 10:29:09 fir-md1-s1 kernel: LustreError: 23681:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1567013259, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2ace24c140/0x5d9ee6e78694ca10 lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 35 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23681 timeout: 0 lvb_type: 0
Aug 28 10:57:40 fir-md1-s1 kernel: LustreError: 23603:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1567014969, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2ab0598480/0x5d9ee6e78df1fd5f lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 37 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 23603 timeout: 0 lvb_type: 0
Aug 28 10:58:10 fir-md1-s1 kernel: LustreError: 50447:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1567014999, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8f2920b8cec0/0x5d9ee6e78e1a0d54 lrc: 3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 37 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 50447 timeout: 0 lvb_type: 0
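As a rough aid for reading these entries, the following Python sketch pulls the lock fields out of each line and lists the non-local NIDs that appear with a granted lock on the FID. It only summarizes what the MDS already logged; it does not query the live lock state. The helper names `parse_lock_line` and `remote_holders` are hypothetical, and the sketch assumes the `mode:` field reads granted/requested (so `--/PR` is a PR request that has not been granted).

```python
import re
from datetime import datetime, timezone

# Field layout copied from the LDLM lines in this ticket.
# Assumption: "mode: A/B" means granted mode A, requested mode B.
LOCK_RE = re.compile(
    r"mode: (?P<granted>[^/\s]+)/(?P<requested>\S+) "
    r"res: \[(?P<fid>[^\]]+)\].*? "
    r"nid: (?P<nid>\S+) remote: (?P<remote>\S+)"
)
ENQUEUED_RE = re.compile(r"enqueued at (?P<epoch>\d+)")

# Two entries from the log above, trimmed to the part after "###".
SAMPLE = [
    "### lock callback timer expired after 57s: evicting client at "
    "10.8.26.28@o2ib6 ns: mdt-fir-MDT0000_UUID lock: "
    "ffff8f32cb2f7740/0x5d9ee6c5054b1779 lrc: 4/0,0 mode: PR/PR res: "
    "[0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 rrc: 40 type: IBT flags: "
    "0x60200400000020 nid: 10.8.26.28@o2ib6 remote: 0xff0b1f607b1120a1 "
    "expref: 3155 pid: 97646 timeout: 4692007 lvb_type: 0",
    "### lock timed out (enqueued at 1565576897, 90s ago); not entering "
    "recovery in server code, just going back to sleep ns: "
    "mdt-fir-MDT0000_UUID lock: ffff8f3483a0ec00/0x5d9ee6c5055796b8 lrc: "
    "3/1,0 mode: --/PR res: [0x200029d02:0x1b59c:0x0].0x0 bits 0x13/0x0 "
    "rrc: 34 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 "
    "expref: -99 pid: 23597 timeout: 0 lvb_type: 0",
]


def parse_lock_line(line):
    """Return the lock fields of one log line as a dict, or None if absent."""
    match = LOCK_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    enq = ENQUEUED_RE.search(line)
    # The "enqueued at" value is a Unix epoch; convert it to UTC if present.
    info["enqueued"] = (
        datetime.fromtimestamp(int(enq.group("epoch")), tz=timezone.utc)
        if enq
        else None
    )
    return info


def remote_holders(lines, fid):
    """Sorted NIDs that the log shows with a granted lock on `fid`."""
    holders = set()
    for line in lines:
        info = parse_lock_line(line)
        if info is None or info["fid"] != fid:
            continue
        if info["granted"] != "--" and info["nid"] != "local":
            holders.add(info["nid"])
    return sorted(holders)


if __name__ == "__main__":
    print(remote_holders(SAMPLE, "0x200029d02:0x1b59c:0x0"))
    print(parse_lock_line(SAMPLE[1])["enqueued"].isoformat())
```

On the two sample entries this reports only 10.8.26.28@o2ib6, the client the MDS evicted on Aug 11. Every later entry is a server-side (`nid: local`) request still waiting, so the log by itself does not name whoever holds the lock now.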
My guess is that a client is still holding the lock on it. Is there a way to find out which client, given the FID?
Thanks!
Stephane