Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.7.0, Lustre 2.5.4
-
None
-
3
-
16571
Description
Bug hit with soak tests while testing MDT failover:
LDISKFS-fs warning (device dm-2): ldiskfs_multi_mount_protect: MMP interval 42 higher than expected, please wait. LDISKFS-fs (dm-2): recovery complete LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. quota=on. Opts: Lustre: soaked-MDT0000: used disk, loading LustreError: 11-0: soaked-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11. Lustre: soaked-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450 LDISKFS-fs (dm-3): recovery complete LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. quota=on. Opts: Lustre: home-MDT0001: used disk, loading Lustre: home-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450 Lustre: soaked-MDT0000: Will be in recovery for at least 2:30, or until 9 clients reconnect LustreError: 4283:0:(mdt_open.c:1652:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid. req@ffff88081283c800 x14844821187 66556/t0(77329207333) o101->e0f06817-48a4-23ba-7529-dacbe1c5c96b@192.168.1.116@o2ib100:0/0 lens 1408/3512 e 0 to 0 dl 1416337962 ref 1 fl Interpret:/4/0 rc 0/0 Lustre: home-MDT0001: Will be in recovery for at least 2:30, or until 9 clients reconnect LustreError: 4653:0:(mdt_open.c:1652:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid. req@ffff8804122ef800 x14844820751 33836/t0(77310248349) o101->8fe2e201-b336-c2b3-1690-2b8671ccb3ac@192.168.1.121@o2ib100:0/0 lens 568/3512 e 0 to 0 dl 1416337966 ref 1 fl Interpret:/4/0 rc 0/0 LustreError: 4653:0:(mdt_open.c:1652:mdt_reint_open()) Skipped 6 previous similar messages LustreError: 4275:0:(mdt_open.c:1652:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid. req@ffff88041ec27c00 x14844821336 81580/t0(77313286391) o101->5af8cf20-f59d-b674-4342-95d4f9187d5e@192.168.1.126@o2ib100:0/0 lens 1744/3512 e 0 to 0 dl 1416337976 ref 1 fl Interpret:/4/0 rc 0/0 LustreError: 4275:0:(mdt_open.c:1652:mdt_reint_open()) Skipped 26 previous similar messages LustreError: 4284:0:(mdt_open.c:1652:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid. req@ffff880836688000 x14844821165 12776/t0(77315789759) o101->0a6d6736-23b0-a177-bd78-0dc387bbcfed@192.168.1.124@o2ib100:0/0 lens 1792/3512 e 0 to 0 dl 1416337994 ref 1 fl Interpret:/4/0 rc 0/0 LustreError: 4284:0:(mdt_open.c:1652:mdt_reint_open()) Skipped 2 previous similar messages LustreError: 4689:0:(mdt_open.c:1652:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid. req@ffff880410c2b400 x14844823276 10436/t0(77382041571) o101->c9895ebb-5dd8-614d-e10a-fe88fed518aa@192.168.1.127@o2ib100:0/0 lens 808/3512 e 0 to 0 dl 1416337998 ref 1 fl Interpret:/4/0 rc 0/0 LustreError: 4689:0:(mdt_open.c:1652:mdt_reint_open()) Skipped 8 previous similar messages Lustre: home-MDT0001: Recovery over after 0:39, of 9 clients 9 recovered and 0 were evicted. Lustre: soaked-MDT0000: Recovery over after 1:24, of 9 clients 9 recovered and 0 were evicted. LustreError: 0:0:(ldlm_lockd.c:344:waiting_locks_callback()) ### lock callback timer expired after 151s: evicting client at 192.168.1 .124@o2ib100 ns: mdt-soaked-MDT0000_UUID lock: ffff88055ce43700/0xeb9796ef206ff833 lrc: 4/0,0 mode: CR/CR res: [0x2ac0000bd6:0xe1db: 0x0].0 bits 0x2 rrc: 6 type: IBT flags: 0x60200000000020 nid: 192.168.1.124@o2ib100 remote: 0x55621c70e3b62760 expref: 76246 pid: 469 4 timeout: 4340788641 lvb_type: 0 LustreError: 4725:0:(mdt_handler.c:4078:mdt_intent_reint()) ASSERTION( rc == 0 ) failed: Error occurred but lock handle is still in use, rc = -2 LustreError: 4725:0:(mdt_handler.c:4078:mdt_intent_reint()) LBUG Pid: 4725, comm: mdt03_011 Call Trace: [<ffffffffa0550895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] [<ffffffffa0550e97>] lbug_with_loc+0x47/0xb0 [libcfs] [<ffffffffa103d79a>] mdt_intent_reint+0x51a/0x520 [mdt] [<ffffffffa103ac4e>] mdt_intent_policy+0x3ae/0x770 [mdt] [<ffffffffa07ed2e5>] ldlm_lock_enqueue+0x135/0x980 [ptlrpc] [<ffffffffa0816dab>] ldlm_handle_enqueue0+0x51b/0x10c0 [ptlrpc] [<ffffffff81069f15>] ? enqueue_entity+0x125/0x450 [<ffffffffa103b116>] mdt_enqueue+0x46/0xe0 [mdt] [<ffffffffa103fcda>] mdt_handle_common+0x52a/0x1470 [mdt] [<ffffffffa107d455>] mds_regular_handle+0x15/0x20 [mdt] [<ffffffffa08470d5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc] [<ffffffffa05514ce>] ? cfs_timer_arm+0xe/0x10 [libcfs] [<ffffffffa05623b5>] ? lc_watchdog_touch+0x65/0x170 [libcfs] [<ffffffffa083e779>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc] [<ffffffff810546b9>] ? __wake_up_common+0x59/0x90 [<ffffffffa084843d>] ptlrpc_main+0xaed/0x1770 [ptlrpc] [<ffffffffa0847950>] ? ptlrpc_main+0x0/0x1770 [ptlrpc] [<ffffffff8109abf6>] kthread+0x96/0xa0 [<ffffffff8100c20a>] child_rip+0xa/0x20 [<ffffffff8109ab60>] ? kthread+0x0/0xa0 [<ffffffff8100c200>] ? child_rip+0x0/0x20 Kernel panic - not syncing: LBUG
Please note that LU-5930 was already created for the "OPEN & CREAT not in open replay/by_fid" message.
Attachments
Issue Links
- duplicates
-
LU-5930 mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid
-
- Closed
-
- is duplicated by
-
LU-6706 :mdt_intent_reint()) ASSERTION( rc == 0 ) failed: Error occurred but lock handle is still in use, rc = -116
-
- Resolved
-
- is related to
-
LU-6189 LustreError: (mdt_handler.c:4078:mdt_intent_reint()) ASSERTION( rc == 0 ) failed: Error occurred but lock handle is still in use, rc = -116
-
- Resolved
-