Details
- Bug
- Resolution: Unresolved
- Minor
Description
This issue was created by maloo for S Buisson <sbuisson@ddn.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/65516983-67f2-4883-b7ab-1cb5abdf58e5
test_230n failed with the following error:
Mirroring failed
Test output is:
== sanity test 230n: Dir migration with mirrored file ==== 22:36:31 (1671057391)
lfs mirror mirror: cannot get UNLOCK lease, ext 4: No such file or directory (2)
error: lfs mirror extend: /mnt/lustre/d230n.sanity/f230n.sanity: cannot merge layout: No such file or directory
sanity test_230n: @@@@@@ FAIL: Mirroring failed
In client dmesg, we can see the client was evicted by the MDS:
[13212.828154] LustreError: 866765:0:(file.c:242:ll_close_inode_openhandle()) lustre-clilmv-ffff93f6048ef000: inode [0x200002b15:0x5b9b:0x0] mdc close failed: rc = -2
[13212.831301] LustreError: 11-0: lustre-MDT0000-mdc-ffff93f6048ef000: operation ldlm_enqueue to node 10.240.38.63@tcp failed: rc = -107
[13212.831308] Lustre: lustre-MDT0000-mdc-ffff93f6048ef000: Connection to lustre-MDT0000 (at 10.240.38.63@tcp) was lost; in progress operations using this service will wait for recovery to complete
[13212.833555] LustreError: Skipped 1 previous similar message
[13212.838262] LustreError: 167-0: lustre-MDT0000-mdc-ffff93f6048ef000: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
[13212.847340] Lustre: lustre-MDT0000-mdc-ffff93f6048ef000: Connection restored to 10.240.38.63@tcp (at 10.240.38.63@tcp)
[13213.270161] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity test_230n: @@@@@@ FAIL: Mirroring failed
The MDS reports that the client did not respond to the lock callback within 100s, and evicted it:
[13026.421567] Lustre: DEBUG MARKER: == sanity test 230n: Dir migration with mirrored file ==== 03:41:59 (1679283719)
[13070.199369] Lustre: mdt_rdpg00_003: service thread pid 541635 was inactive for 42.970 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[13070.203029] Pid: 541635, comm: mdt_rdpg00_003 4.18.0-348.23.1.el8_lustre.x86_64 #1 SMP Thu Mar 2 00:54:25 UTC 2023
[13070.204965] Call Trace TBD:
[13070.206179] ldlm_completion_ast+0x7ac/0x900 [ptlrpc]
[13070.207308] ldlm_cli_enqueue_local+0x307/0x860 [ptlrpc]
[13070.208596] mdt_object_local_lock+0x509/0xb30 [mdt]
[13070.209666] mdt_object_lock_internal+0x18d/0x4a0 [mdt]
[13070.210781] mdt_object_lock+0x1b/0x20 [mdt]
[13070.211735] mdt_close_handle_layouts+0x935/0x10b0 [mdt]
[13070.212865] mdt_mfd_close+0x510/0xbc0 [mdt]
[13070.213811] mdt_close_internal+0xcc/0x250 [mdt]
[13070.214833] mdt_close+0x2c0/0x8b0 [mdt]
[13070.215745] tgt_request_handle+0xc8c/0x1950 [ptlrpc]
[13070.216850] ptlrpc_server_handle_request+0x31d/0xbc0 [ptlrpc]
[13070.218096] ptlrpc_main+0xc4e/0x1510 [ptlrpc]
[13127.540308] LustreError: 482890:0:(ldlm_lockd.c:261:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 10.240.38.60@tcp ns: mdt-lustre-MDT0000_UUID lock: 0000000001a8df2b/0x3c9ac870599a2eb7 lrc: 3/0,0 mode: CR/CR res: [0x200002b15:0x5b9b:0x0].0x0 bits 0xd/0x0 rrc: 4 type: IBT gid 0 flags: 0x60200400000020 nid: 10.240.38.60@tcp remote: 0x5d158cc36e2e425f expref: 49 pid: 537456 timeout: 13128 lvb_type: 0
[13128.435958] Lustre: DEBUG MARKER: sanity test_230n: @@@@@@ FAIL: Mirroring failed
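For anyone triaging similar failures, a minimal sketch of how the eviction line could be pulled out of a saved MDS console log. The function name `find_evictions` and both regular expressions are illustrative assumptions based only on the message format quoted above; they are not part of Lustre or Maloo, and the sample log lines are shortened copies of the ones in this ticket.

```python
import re

# Kernel log line with a leading "[seconds.micros]" timestamp (assumed format).
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
# Eviction message as it appears in the expired_lock_main() line quoted above.
EVICT_RE = re.compile(
    r"lock callback timer expired after (\d+)s: evicting client at (\S+)"
)


def find_evictions(log_text):
    """Return (timestamp, timeout_seconds, client_nid) for each eviction line."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines without a kernel timestamp
        ev = EVICT_RE.search(m.group(2))
        if ev:
            events.append((float(m.group(1)), int(ev.group(1)), ev.group(2)))
    return events


# Shortened excerpt of the MDS log from this ticket.
sample = """\
[13026.421567] Lustre: DEBUG MARKER: == sanity test 230n: Dir migration with mirrored file ====
[13127.540308] LustreError: 482890:0:(ldlm_lockd.c:261:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 10.240.38.60@tcp ns: mdt-lustre-MDT0000_UUID
"""

for ts, timeout, nid in find_evictions(sample):
    print(f"{ts:.6f}: client {nid} evicted after {timeout}s lock callback timeout")
```

The timestamps are seconds since boot on the MDS, so they can be compared with the DEBUG MARKER lines in the same log but not directly with the client dmesg.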
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_230n - Mirroring failed
Issue Links
- is related to: LU-16633 obd_get_mod_rpc_slot() is vulnerable to races (Resolved)