[LU-13540] MDS "lock on destroyed export", "lock timed out", client hangs Created: 08/May/20 Updated: 02/Feb/22 Resolved: 02/Feb/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Michael Sternberg | Assignee: | WC Triage |
| Resolution: | Duplicate | Votes: | 1 |
| Labels: | None | ||
| Environment: |
OS: CentOS 7.7.1908 * 2 MDS (as 1 HA pair), 6 OSS (as 3 HA pairs), serving 2 Lustre file systems: carbofs2, sandbox2.
|
||
| Attachments: |
|
| Epic/Theme: | Lustre-2.12 |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
We've been struggling for several weeks with recurring issues where at a high turnover rates of files I get MDS hangs like the following: May 8 08:36:16 mds12 kernel: LustreError: 161992:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 172.17.2.4@o2ib ns: mdt-carbofs2-MDT0001_UUID lock: ffff90d827671b00/0xf489323b9f7cfa63 lrc: 3/0,0 mode: PR/PR res: [0x24000b480:0xc5b6:0x0].0x0 bits 0x40/0x0 rrc: 229 type: IBT flags: 0x60200400000020 nid: 172.17.2.4@o2ib remote: 0x819f729a7001b5e5 expref: 74 pid: 162427 timeout: 1597115 lvb_type: 0 May 8 08:36:16 mds12 kernel: LustreError: 162425:0:(ldlm_lockd.c:1348:ldlm_handle_enqueue0()) ### lock on destroyed export ffff90d7f7331c00 ns: mdt-carbofs2-MDT0001_UUID lock: ffff90d827673cc0/0xf489323b9f7cfdf8 lrc: 3/0,0 mode: PW/PW res: [0x24000b480:0xc5b6:0x0].0x0 bits 0x40/0x0 rrc: 207 type: IBT flags: 0x50200400000020 nid: 172.17.2.2@o2ib remote: 0x3006f447d61f79eb expref: 26 pid: 162425 timeout: 0 lvb_type: 0 May 8 08:36:16 mds12 kernel: LustreError: 162425:0:(ldlm_lockd.c:1348:ldlm_handle_enqueue0()) Skipped 195 previous similar messages May 8 08:36:16 mds12 kernel: Lustre: carbofs2-MDT0001: Connection restored to ab1e3886-7712-33ee-ea50-ce69a69a491d (at 172.17.2.4@o2ib) May 8 08:36:16 mds12 kernel: Lustre: Skipped 454 previous similar messages May 8 08:38:04 mds12 kernel: LustreError: 161992:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 99s: evicting client at 172.17.1.221@o2ib ns: mdt-carbofs2-MDT0001_UUID lock: ffff90d7f6fbee40/0xf489323b9f82b37e lrc: 3/0,0 mode: PR/PR res: [0x24000b480:0xc5d4:0x0].0x0 bits 0x40/0x0 rrc: 211 type: IBT flags: 0x60200400000020 nid: 172.17.1.221@o2ib remote: 0x85f23451b667ea0d expref: 138 pid: 162424 timeout: 1597223 lvb_type: 0 May 8 08:38:04 mds12 kernel: LustreError: 161992:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 8 previous similar messages May 8 08:38:54 mds12 kernel: LustreError: 161992:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 172.17.2.5@o2ib ns: mdt-carbofs2-MDT0001_UUID lock: ffff90d61bebe9c0/0xf489323b9f82b53e lrc: 3/0,0 mode: PR/PR res: [0x24000b480:0xc5ca:0x0].0x0 bits 0x40/0x0 rrc: 197 type: IBT flags: 0x60200400000020 nid: 172.17.2.5@o2ib remote: 0xe98766af8c44579e expref: 71 pid: 162423 timeout: 1597273 lvb_type: 0 May 8 08:38:54 mds12 kernel: LustreError: 162415:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-172.17.1.221@o2ib: deadline 138:11s ago#012 req@ffff90ced5611f80 x1662838202338496/t0(0) o101->454af10f-c871-de3c-e61f-fa5345d10e3a@172.17.1.221@o2ib:58/0 lens 480/0 e 0 to 0 dl 1588945123 ref 1 fl Interpret:/0/ffffffff rc 0/-1 May 8 08:38:54 mds12 kernel: LustreError: 180178:0:(ldlm_lockd.c:1348:ldlm_handle_enqueue0()) ### lock on destroyed export ffff90d819744000 ns: mdt-carbofs2-MDT0001_UUID lock: ffff90d276622ac0/0xf489323b9f82ba1c lrc: 3/0,0 mode: PR/PR res: [0x24000b480:0xc5ca:0x0].0x0 bits 0x40/0x0 rrc: 185 type: IBT flags: 0x50200400000020 nid: 172.17.2.5@o2ib remote: 0xe98766af8c4457ac expref: 15 pid: 180178 timeout: 0 lvb_type: 0 May 8 08:38:54 mds12 kernel: LustreError: 180178:0:(ldlm_lockd.c:1348:ldlm_handle_enqueue0()) Skipped 50 previous similar messages May 8 08:38:54 mds12 kernel: Lustre: 162443:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (138:11s); client may timeout. req@ffff90ced5616300 x1662838202338560/t0(0) o101->454af10f-c871-de3c-e61f-fa5345d10e3a@172.17.1.221@o2ib:58/0 lens 480/0 e 0 to 0 dl 1588945123 ref 1 fl Interpret:/0/ffffffff rc 0/-1 May 8 08:38:54 mds12 kernel: Lustre: 162443:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1754 previous similar messages May 8 08:38:54 mds12 kernel: Lustre: carbofs2-MDT0001: Connection restored to 454af10f-c871-de3c-e61f-fa5345d10e3a (at 172.17.1.221@o2ib) May 8 08:38:54 mds12 kernel: LustreError: 162415:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 7 previous similar messages May 8 08:39:34 mds12 kernel: Lustre: carbofs2-MDT0001: Client 454af10f-c871-de3c-e61f-fa5345d10e3a (at 172.17.1.221@o2ib) reconnecting May 8 08:39:34 mds12 kernel: Lustre: Skipped 8 previous similar messages May 8 08:39:34 mds12 kernel: Lustre: carbofs2-MDT0001: Connection restored to 454af10f-c871-de3c-e61f-fa5345d10e3a (at 172.17.1.221@o2ib) May 8 08:39:34 mds12 kernel: Lustre: Skipped 4 previous similar messages May 8 08:39:58 mds12 kernel: LNet: Service thread pid 180635 was inactive for 212.81s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: May 8 08:39:58 mds12 kernel: LNet: Skipped 2 previous similar messages May 8 08:39:58 mds12 kernel: Pid: 180635, comm: mdt00_041 3.10.0-1062.9.1.el7_lustre.x86_64 #1 SMP Tue Feb 11 19:11:56 UTC 2020 May 8 08:39:58 mds12 kernel: Call Trace: May 8 08:39:58 mds12 kernel: [<ffffffffc135ac90>] ldlm_completion_ast+0x430/0x860 [ptlrpc] May 8 08:39:58 mds12 kernel: [<ffffffffc135b7b1>] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] May 8 08:39:58 mds12 kernel: [<ffffffffc17ad5cb>] mdt_object_local_lock+0x50b/0xb20 [mdt] May 8 08:39:58 mds12 kernel: [<ffffffffc17adc50>] mdt_object_lock_internal+0x70/0x360 [mdt] May 8 08:39:58 mds12 kernel: [<ffffffffc17adf60>] mdt_object_lock+0x20/0x30 [mdt] May 8 08:39:58 mds12 kernel: [<ffffffffc17ed0eb>] mdt_brw_enqueue+0x44b/0x760 [mdt] May 8 08:39:58 mds12 kernel: [<ffffffffc179b4bf>] mdt_intent_brw+0x1f/0x30 [mdt] May 8 08:39:58 mds12 kernel: [<ffffffffc17b3d35>] mdt_intent_policy+0x435/0xd80 [mdt] May 8 08:39:58 mds12 kernel: [<ffffffffc1341e06>] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] May 8 08:39:58 mds12 kernel: [<ffffffffc136a516>] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] May 8 08:39:58 mds12 kernel: [<ffffffffc13f2bb2>] tgt_enqueue+0x62/0x210 [ptlrpc] May 8 08:39:58 mds12 kernel: [<ffffffffc13f99ca>] tgt_request_handle+0xada/0x1570 [ptlrpc] May 8 08:39:58 mds12 kernel: [<ffffffffc139e47b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 8 08:39:58 mds12 kernel: [<ffffffffc13a1de4>] ptlrpc_main+0xb34/0x1470 [ptlrpc] May 8 08:39:58 mds12 kernel: [<ffffffff928c61f1>] kthread+0xd1/0xe0 May 8 08:39:58 mds12 kernel: [<ffffffff92f8dd37>] ret_from_fork_nospec_end+0x0/0x39 May 8 08:39:58 mds12 kernel: [<ffffffffffffffff>] 0xffffffffffffffff May 8 08:39:58 mds12 kernel: LustreError: dumping log to /tmp/lustre-log.1588945198.180635 May 8 08:39:59 mds12 kernel: Pid: 180647, comm: mdt00_053 3.10.0-1062.9.1.el7_lustre.x86_64 #1 SMP Tue Feb 11 19:11:56 UTC 2020 May 8 08:39:59 mds12 kernel: Call Trace: May 8 08:39:59 mds12 kernel: [<ffffffffc135ac90>] ldlm_completion_ast+0x430/0x860 [ptlrpc] May 8 08:39:59 mds12 kernel: [<ffffffffc135b7b1>] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] May 8 08:39:59 mds12 kernel: [<ffffffffc17ad5cb>] mdt_object_local_lock+0x50b/0xb20 [mdt] May 8 08:39:59 mds12 kernel: [<ffffffffc17adc50>] mdt_object_lock_internal+0x70/0x360 [mdt] May 8 08:39:59 mds12 kernel: [<ffffffffc17adf60>] mdt_object_lock+0x20/0x30 [mdt] May 8 08:39:59 mds12 kernel: [<ffffffffc17ed0eb>] mdt_brw_enqueue+0x44b/0x760 [mdt] May 8 08:39:59 mds12 kernel: [<ffffffffc179b4bf>] mdt_intent_brw+0x1f/0x30 [mdt] May 8 08:39:59 mds12 kernel: [<ffffffffc17b3d35>] mdt_intent_policy+0x435/0xd80 [mdt] May 8 08:39:59 mds12 kernel: [<ffffffffc1341e06>] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] May 8 08:39:59 mds12 kernel: [<ffffffffc136a516>] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] May 8 08:39:59 mds12 kernel: [<ffffffffc13f2bb2>] tgt_enqueue+0x62/0x210 [ptlrpc] May 8 08:39:59 mds12 kernel: [<ffffffffc13f99ca>] tgt_request_handle+0xada/0x1570 [ptlrpc] May 8 08:39:59 mds12 kernel: [<ffffffffc139e47b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 8 08:39:59 mds12 kernel: [<ffffffffc13a1de4>] ptlrpc_main+0xb34/0x1470 [ptlrpc] May 8 08:39:59 mds12 kernel: [<ffffffff928c61f1>] kthread+0xd1/0xe0 May 8 08:39:59 mds12 kernel: [<ffffffff92f8dd37>] ret_from_fork_nospec_end+0x0/0x39 May 8 08:39:59 mds12 kernel: [<ffffffffffffffff>] 0xffffffffffffffff May 8 08:39:59 mds12 kernel: Pid: 180247, comm: mdt00_035 3.10.0-1062.9.1.el7_lustre.x86_64 #1 SMP Tue Feb 11 19:11:56 UTC 2020 May 8 08:39:59 mds12 kernel: Call Trace: It's been frustratingly difficult to pinpoint a single cause and reproduce those hangs. The stacktrace led me to the following possibly related issues: Thus, I've tried:
There are no lustre-log dumps on the first MDS mds11, and the following on the second MDS mds12. Counts of entry time stamps near the crash time are: 1588930000 = 2020-05-08 04:26:40 CDT -0500 1588940000 = 2020-05-08 07:13:20 CDT -0500 1588950000 = 2020-05-08 10:00:00 CDT -0500 # grep -E -c ':15889[345]....\.' /tmp/lustre-log.15889*[0-9].dump /tmp/lustre-log.1588945198.180635.dump:168133 /tmp/lustre-log.1588945199.180195.dump:131 /tmp/lustre-log.1588945459.92545.dump:2283 /tmp/lustre-log.1588945461.162415.dump:31 /tmp/lustre-log.1588948374.168310.dump:293947 /tmp/lustre-log.1588948526.162419.dump:218 /tmp/lustre-log.1588949132.162453.dump:3906 /tmp/lustre-log.1588949280.92548.dump:7764 /tmp/lustre-log.1588949296.180233.dump:51 /tmp/lustre-log.1588949447.180235.dump:396 /tmp/lustre-log.1588949599.180642.dump:365 /tmp/lustre-log.1588949734.180631.dump:395 /tmp/lustre-log.1588949886.180651.dump:339 /tmp/lustre-log.1588950037.24710.dump:662 /tmp/lustre-log.1588950488.180208.dump:6239 /tmp/lustre-log.1588950643.162456.dump:4 /tmp/lustre-log.1588950795.180657.dump:243 /tmp/lustre-log.1588951549.180643.dump:1670 /tmp/lustre-log.1588952274.180185.dump:2231 /tmp/lustre-log.1588952278.168309.dump:12 /tmp/lustre-log.1588952282.180237.dump:26 /tmp/lustre-log.1588952425.162437.dump:788 /tmp/lustre-log.1588952433.24712.dump:20 /tmp/lustre-log.1588952581.180200.dump:384 /tmp/lustre-log.1588952585.162224.dump:14 /tmp/lustre-log.1588952737.180196.dump:478 /tmp/lustre-log.1588952909.180649.dump:1548 Excerpt from near the crash time, around first error: # grep -E ':15889[345]....\.' /tmp/lustre-log.15889*[0-9].dump | sort -t: -k5,7n ... /tmp/lustre-log.1588945198.180635.dump:00000004:00080000:3.0:1588944970.333403:0:122652:0:(osp_object.c:1592:osp_create()) carbofs2-OST0005-osc-MDT0001: Wrote last used FID: [0x380000400:0xf75cd:0x0], index 5: 0 /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:0.0:1588944970.334436:0:161976:0:(tgt_lastrcvd.c:902:tgt_cb_last_committed()) carbofs2-MDT0001: transno 193291912093 is committed /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:3.0:1588944970.334684:0:122652:0:(tgt_lastrcvd.c:1873:tgt_txn_stop_cb()) More than one transaction 193291912094 /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:3.0:1588944970.335904:0:122652:0:(tgt_lastrcvd.c:1873:tgt_txn_stop_cb()) More than one transaction 193291912095 /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:7.0:1588944970.375259:0:168303:0:(tgt_lastrcvd.c:1873:tgt_txn_stop_cb()) More than one transaction 197573715030 /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:7.0:1588944971.150971:0:168303:0:(tgt_lastrcvd.c:1873:tgt_txn_stop_cb()) More than one transaction 197573715038 /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:0.0:1588944971.151852:0:161974:0:(tgt_lastrcvd.c:902:tgt_cb_last_committed()) sandbox2-MDT0001: transno 197573715038 is committed /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:0.0:1588944971.151861:0:161974:0:(tgt_lastrcvd.c:902:tgt_cb_last_committed()) sandbox2-MDT0001: transno 197573715033 is committed /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:0.0:1588944971.151867:0:161974:0:(tgt_lastrcvd.c:902:tgt_cb_last_committed()) sandbox2-MDT0001: transno 197573715026 is committed /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:7.0:1588944971.966343:0:92555:0:(tgt_lastrcvd.c:1873:tgt_txn_stop_cb()) More than one transaction 193291912098 /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:7.0:1588944971.968021:0:92555:0:(tgt_lastrcvd.c:1873:tgt_txn_stop_cb()) More than one transaction 193291912100 /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:7.0:1588944971.969193:0:168303:0:(tgt_lastrcvd.c:1873:tgt_txn_stop_cb()) More than one transaction 193291912102 /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:7.0:1588944972.404425:0:168303:0:(tgt_lastrcvd.c:1873:tgt_txn_stop_cb()) More than one transaction 197573715052 /tmp/lustre-log.1588945198.180635.dump:00000004:00080000:7.0:1588944972.955662:0:92555:0:(osp_object.c:1592:osp_create()) carbofs2-OST0008-osc-MDT0001: Wrote last used FID: [0x480000400:0xf780d:0x0], index 8: 0 /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:0.0:1588944972.957507:0:92555:0:(tgt_lastrcvd.c:1873:tgt_txn_stop_cb()) More than one transaction 193291912104 /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:0.0:1588944972.958735:0:92555:0:(tgt_lastrcvd.c:1873:tgt_txn_stop_cb()) More than one transaction 193291912105 /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:7.0:1588944974.376516:0:168303:0:(tgt_lastrcvd.c:1873:tgt_txn_stop_cb()) More than one transaction 197573715066 /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:0.0:1588944974.595200:0:92555:0:(tgt_lastrcvd.c:1873:tgt_txn_stop_cb()) More than one transaction 193291912108 /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:0.0:1588944974.596092:0:92555:0:(tgt_lastrcvd.c:1873:tgt_txn_stop_cb()) More than one transaction 193291912110 /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:7.0:1588944974.597258:0:168303:0:(tgt_lastrcvd.c:1873:tgt_txn_stop_cb()) More than one transaction 193291912112 /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:7.0:1588944975.133486:0:168303:0:(tgt_lastrcvd.c:1873:tgt_txn_stop_cb()) More than one transaction 197573715072 /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:0.0:1588944975.486761:0:161976:0:(tgt_lastrcvd.c:902:tgt_cb_last_committed()) carbofs2-MDT0001: transno 193291912113 is committed /tmp/lustre-log.1588945198.180635.dump:00000004:00080000:0.0:1588944975.565523:0:92555:0:(osp_object.c:1592:osp_create()) carbofs2-OST0002-osc-MDT0001: Wrote last used FID: [0x340000400:0xf75cd:0x0], index 2: 0 /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:1.0:1588944975.567362:0:92555:0:(tgt_lastrcvd.c:1873:tgt_txn_stop_cb()) More than one transaction 193291912115 /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:1.0:1588944975.568690:0:92555:0:(tgt_lastrcvd.c:1873:tgt_txn_stop_cb()) More than one transaction 193291912116 /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:0.0:1588944976.478896:0:161974:0:(tgt_lastrcvd.c:902:tgt_cb_last_committed()) sandbox2-MDT0001: transno 197573715077 is committed /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:0.0:1588944976.478911:0:161974:0:(tgt_lastrcvd.c:902:tgt_cb_last_committed()) sandbox2-MDT0001: transno 197573715064 is committed /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:0.0:1588944976.478915:0:161974:0:(tgt_lastrcvd.c:902:tgt_cb_last_committed()) sandbox2-MDT0001: transno 197573715062 is committed /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:0.0:1588944976.478920:0:161974:0:(tgt_lastrcvd.c:902:tgt_cb_last_committed()) sandbox2-MDT0001: transno 197573715057 is committed /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:0.0:1588944976.478922:0:161974:0:(tgt_lastrcvd.c:902:tgt_cb_last_committed()) sandbox2-MDT0001: transno 197573715055 is committed /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:0.0:1588944976.478933:0:161974:0:(tgt_lastrcvd.c:902:tgt_cb_last_committed()) sandbox2-MDT0001: transno 197573715042 is committed /tmp/lustre-log.1588945198.180635.dump:00000001:00080000:7.0:1588944976.494952:0:168303:0:(tgt_lastrcvd.c:1873:tgt_txn_stop_cb()) More than one transaction 197573715078 /tmp/lustre-log.1588945198.180635.dump:00010000:00020000:8.0:1588944976.790931:0:161992:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 172.17.2.4@o2ib ns: mdt-carbofs2-MDT0001_UUID lock: ffff90d827671b00/0xf489323b9f7cfa63 lr c: 3/0,0 mode: PR/PR res: [0x24000b480:0xc5b6:0x0].0x0 bits 0x40/0x0 rrc: 229 type: IBT flags: 0x60200400000020 nid: 172.17.2.4@o2ib remote: 0x819f729a7001b5e5 expref: 74 pid: 162427 timeout: 1597115 lvb_type: 0 /tmp/lustre-log.1588945198.180635.dump:00000020:00080000:8.0:1588944976.795299:0:161992:0:(genops.c:1662:class_fail_export()) disconnecting export ffff90d80e74e400/89c3daaf-89ab-87e8-0414-0b4c5a62ec8e /tmp/lustre-log.1588945198.180635.dump:00000020:00000080:8.0:1588944976.795310:0:161992:0:(genops.c:1501:class_disconnect()) disconnect: cookie 0xf489323b396e171f /tmp/lustre-log.1588945198.180635.dump:00000020:00080000:8.0:1588944976.795456:0:161992:0:(genops.c:1679:class_fail_export()) disconnected export ffff90d80e74e400/89c3daaf-89ab-87e8-0414-0b4c5a62ec8e /tmp/lustre-log.1588945198.180635.dump:00010000:00020000:8.0:1588944976.795466:0:161992:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 172.17.2.5@o2ib ns: mdt-carbofs2-MDT0001_UUID lock: ffff90d5dfcf9d40/0xf489323b9f7cfb35 lrc: 3/0,0 mode: PR/PR res: [0x24000b480:0xc5b6:0x0].0x0 bits 0x40/0x0 rrc: 225 type: IBT flags: 0x60200400000020 nid: 172.17.2.5@o2ib remote: 0xe98766af8c442ce1 expref: 396 pid: 180230 timeout: 1597115 lvb_type: 0 /tmp/lustre-log.1588945198.180635.dump:00000020:00080000:8.0:1588944976.795489:0:161992:0:(genops.c:1662:class_fail_export()) disconnecting export ffff90d7f7330c00/7685d096-da00-7900-01cd-d51973277575 /tmp/lustre-log.1588945198.180635.dump:00000020:00000080:8.0:1588944976.795492:0:161992:0:(genops.c:1501:class_disconnect()) disconnect: cookie 0xf489323b396e1589 /tmp/lustre-log.1588945198.180635.dump:00000020:00080000:8.0:1588944976.795644:0:161992:0:(genops.c:1679:class_fail_export()) disconnected export ffff90d7f7330c00/7685d096-da00-7900-01cd-d51973277575 /tmp/lustre-log.1588945198.180635.dump:00010000:00020000:8.0:1588944976.795652:0:161992:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 172.17.1.227@o2ib ns: mdt-carbofs2-MDT0001_UUID lock: ffff90cfe0a10fc0/0xf489323b9f7cfb58 lrc: 3/0,0 mode: PR/PR res: [0x24000b480:0xc5b6:0x0].0x0 bits 0x40/0x0 rrc: 220 type: IBT flags: 0x60200400000020 nid: 172.17.1.227@o2ib remote: 0xf0c3059052dfd0f8 expref: 423 pid: 162415 timeout: 1597115 lvb_type: 0 /tmp/lustre-log.1588945198.180635.dump:00000020:00080000:8.0:1588944976.795663:0:161992:0:(genops.c:1662:class_fail_export()) disconnecting export ffff90d3dd5f4400/e294594d-e357-7d83-07c6-94cee77191e2 /tmp/lustre-log.1588945198.180635.dump:00000020:00000080:8.0:1588944976.795666:0:161992:0:(genops.c:1501:class_disconnect()) disconnect: cookie 0xf489323b396e144e /tmp/lustre-log.1588945198.180635.dump:00000020:00080000:8.0:1588944976.795715:0:161992:0:(genops.c:1679:class_fail_export()) disconnected export ffff90d3dd5f4400/e294594d-e357-7d83-07c6-94cee77191e2 /tmp/lustre-log.1588945198.180635.dump:00010000:00020000:8.0:1588944976.795722:0:161992:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 172.17.1.227@o2ib ns: mdt-carbofs2-MDT0001_UUID lock: ffff90ce9bb90000/0xf489323b9f7cfb66 lrc: 3/0,0 mode: PR/PR res: [0x24000b480:0xc5b6:0x0].0x0 bits 0x40/0x0 rrc: 220 type: IBT flags: 0x60200400000020 nid: 172.17.1.227@o2ib remote: 0xf0c3059052dfd0f1 expref: 420 pid: 162417 timeout: 1597115 lvb_type: 0 /tmp/lustre-log.1588945198.180635.dump:00000020:00080000:8.0:1588944976.795732:0:161992:0:(genops.c:1657:class_fail_export()) disconnecting dead export ffff90d3dd5f4400/e294594d-e357-7d83-07c6-94cee77191e2; skipping /tmp/lustre-log.1588945198.180635.dump:00010000:00020000:8.0:1588944976.795738:0:161992:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 172.17.1.227@o2ib ns: mdt-carbofs2-MDT0001_UUID lock: ffff90cd4ae01b00/0xf489323b9f7cfba5 lrc: 3/0,0 mode: PR/PR res: [0x24000b480:0xc5b6:0x0].0x0 bits 0x40/0x0 rrc: 220 type: IBT flags: 0x60200400000020 nid: 172.17.1.227@o2ib remote: 0xf0c3059052dfd0ff expref: 420 pid: 162220 timeout: 1597115 lvb_type: 0 /tmp/lustre-log.1588945198.180635.dump:00000020:00080000:8.0:1588944976.795748:0:161992:0:(genops.c:1657:class_fail_export()) disconnecting dead export ffff90d3dd5f4400/e294594d-e357-7d83-07c6-94cee77191e2; skipping /tmp/lustre-log.1588945198.180635.dump:00010000:00020000:8.0:1588944976.795754:0:161992:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 172.17.2.2@o2ib ns: mdt-carbofs2-MDT0001_UUID lock: ffff90d5dfcfec00/0xf489323b9f7cfc70 lrc: 3/0,0 mode: PR/PR res: [0x24000b480:0xc5b6:0x0].0x0 bits 0x40/0x0 rrc: 220 type: IBT flags: 0x60200400000020 nid: 172.17.2.2@o2ib remote: 0x3006f447d61f79e4 expref: 78 pid: 180225 timeout: 1597115 lvb_type: 0 /tmp/lustre-log.1588945198.180635.dump:00000020:00080000:8.0:1588944976.795765:0:161992:0:(genops.c:1662:class_fail_export()) disconnecting export ffff90d7f7331c00/644262f6-0235-7572-f194-d4689a69670b /tmp/lustre-log.1588945198.180635.dump:00000020:00000080:8.0:1588944976.795767:0:161992:0:(genops.c:1501:class_disconnect()) disconnect: cookie 0xf489323b396e15cf /tmp/lustre-log.1588945198.180635.dump:00000020:00080000:8.0:1588944976.795814:0:161992:0:(genops.c:1679:class_fail_export()) disconnected export ffff90d7f7331c00/644262f6-0235-7572-f194-d4689a69670b /tmp/lustre-log.1588945198.180635.dump:00010000:00020000:8.0:1588944976.795822:0:161992:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 172.17.2.10@o2ib ns: mdt-carbofs2-MDT0001_UUID lock: ffff90d5cee0ba80/0xf489323b9f7cf77d lrc: 3/0,0 mode: PR/PR res: [0x24000b480:0xc5c0:0x0].0x0 bits 0x40/0x0 rrc: 204 type: IBT flags: 0x60200400000020 nid: 172.17.2.10@o2ib remote: 0x70316307a2048b75 expref: 66 pid: 180231 timeout: 1597115 lvb_type: 0 /tmp/lustre-log.1588945198.180635.dump:00000020:00080000:8.0:1588944976.795861:0:161992:0:(genops.c:1662:class_fail_export()) disconnecting export ffff90d80cbae400/629b4428-94ac-83ba-f61c-69aa684bee44 /tmp/lustre-log.1588945198.180635.dump:00000020:00000080:8.0:1588944976.795865:0:161992:0:(genops.c:1501:class_disconnect()) disconnect: cookie 0xf489323b396e1703 /tmp/lustre-log.1588945198.180635.dump:00000020:00080000:8.0:1588944976.795920:0:161992:0:(genops.c:1679:class_fail_export()) disconnected export ffff90d80cbae400/629b4428-94ac-83ba-f61c-69aa684bee44 /tmp/lustre-log.1588945198.180635.dump:00010000:00020000:8.0:1588944976.795926:0:161992:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 172.17.2.10@o2ib ns: mdt-carbofs2-MDT0001_UUID lock: ffff90d51f3cb180/0xf489323b9f7cf78b lrc: 3/0,0 mode: PR/PR res: [0x24000b480:0xc5c0:0x0].0x0 bits 0x40/0x0 rrc: 204 type: IBT flags: 0x60200400000020 nid: 172.17.2.10@o2ib remote: 0x70316307a2048b7c expref: 63 pid: 162423 timeout: 1597115 lvb_type: 0 /tmp/lustre-log.1588945198.180635.dump:00000020:00080000:8.0:1588944976.795936:0:161992:0:(genops.c:1657:class_fail_export()) disconnecting dead export ffff90d80cbae400/629b4428-94ac-83ba-f61c-69aa684bee44; skipping /tmp/lustre-log.1588945198.180635.dump:00010000:00020000:8.0:1588944976.795943:0:161992:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 172.17.1.224@o2ib ns: mdt-carbofs2-MDT0001_UUID lock: ffff90d36f8bad00/0xf489323b9f7cf784 lrc: 3/0,0 mode: PR/PR res: [0x24000b480:0xc5c0:0x0].0x0 bits 0x40/0x0 rrc: 204 type: IBT flags: 0x60200400000020 nid: 172.17.1.224@o2ib remote: 0xed5de96130f26ae expref: 108 pid: 180199 timeout: 1597115 lvb_type: 0 /tmp/lustre-log.1588945198.180635.dump:00000020:00080000:8.0:1588944976.795952:0:161992:0:(genops.c:1662:class_fail_export()) disconnecting export ffff90d7f7334000/30180198-3c05-91ca-6f14-1fc4bc7a164e ... |
| Comments |
| Comment by Michael Sternberg [ 08/May/20 ] |
|
PS: Similar to several of the LU issues referenced, stopping and restarting the MDT mounts will usually "unwedge" the file system. |
| Comment by Oleg Drokin [ 02/Feb/22 ] |
|
this is a dup of |