[LU-16511] recovery-mds-scale test_failover_ost: client was evicted by lustre-MDT0000: lock callback timer expired
| Created: | 29/Jan/23 | Updated: | 29/Jan/23 |
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Elena <elena.gryaznova@hpe.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/fb180efc-20f3-44e8-9ce3-5eaae34faed0

test_failover_ost failed with the following error:

test_failover_ost returned 1

Test session details:

recovery-mds-scale.test_failover_ost.dmesg.trevis-101vm5.1674753520.log:
[ 573.886865] LNet: Added LNI 10.240.44.244@tcp [8/256/0/180]
[ 573.888136] LNet: Accept all, port 7988
[ 574.964677] Lustre: Mounted lustre-client

recovery-mds-scale.test_failover_ost.dmesg.trevis-101vm5.1674753520.log:
[58558.816712] Lustre: DEBUG MARKER: ==== Check clients loads AFTER failover -- failure NOT OK
[58562.233744] Lustre: DEBUG MARKER: ps auxwww | grep -v grep | grep -q run_IOR.sh
[58563.354293] Lustre: DEBUG MARKER: /usr/sbin/lctl mark ost7 failed over 8 times, and counting...
[58563.979740] Lustre: DEBUG MARKER: ost7 failed over 8 times, and counting...
[59372.896018] LustreError: 11-0: lustre-MDT0000-mdc-ffff9b0de7ee6800: operation mds_close to node 10.240.44.248@tcp failed: rc = -107
[59372.897999] Lustre: lustre-MDT0000-mdc-ffff9b0de7ee6800: Connection to lustre-MDT0000 (at 10.240.44.248@tcp) was lost; in progress operations using this service will wait for recovery to complete
[59372.900740] Lustre: Skipped 2 previous similar messages
[59372.901933] LustreError: 167-0: lustre-MDT0000-mdc-ffff9b0de7ee6800: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
[59372.904876] LustreError: 17673:0:(file.c:242:ll_close_inode_openhandle()) lustre-clilmv-ffff9b0de7ee6800: inode [0x200000bd3:0x4d8d:0x0] mdc close failed: rc = -5
[59372.910935] LustreError: 2562353:0:(file.c:5188:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000007:0x1:0x0] error: rc = -108
[59372.911729] Lustre: lustre-MDT0000-mdc-ffff9b0de7ee6800: Connection restored to 10.240.44.248@tcp (at 10.240.44.248@tcp)
[59372.914684] Lustre: Skipped 2 previous similar messages

console.trevis-101vm9.log:
[58245.625662] Lustre: DEBUG MARKER: ==== Check clients loads AFTER failover -- failure NOT OK
[58250.145970] Lustre: DEBUG MARKER: /usr/sbin/lctl mark ost7 failed over 8 times, and counting...
[58250.950319] Lustre: DEBUG MARKER: ost7 failed over 8 times, and counting...
[59018.942424] Lustre: mdt00_008: service thread pid 9989 was inactive for 62.213 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[59018.945437] Pid: 9989, comm: mdt00_008 4.18.0-348.23.1.el8_lustre.x86_64 #1 SMP Wed Jan 4 16:53:58 UTC 2023
[59018.947001] Call Trace TBD:
[59018.947614] [<0>] ldlm_completion_ast+0x7ac/0x900 [ptlrpc]
[59018.948578] [<0>] ldlm_cli_enqueue_local+0x307/0x860 [ptlrpc]
[59018.949573] [<0>] mdt_object_local_lock+0x506/0xb30 [mdt]
[59018.950486] [<0>] mdt_object_lock_internal+0x18d/0x4a0 [mdt]
[59018.951439] [<0>] mdt_reint_object_lock+0x27/0x60 [mdt]
[59018.952329] [<0>] mdt_reint_striped_lock+0x67/0x490 [mdt]
[59018.953237] [<0>] mdt_reint_unlink+0xac0/0x1580 [mdt]
[59018.954097] [<0>] mdt_reint_rec+0x117/0x270 [mdt]
[59018.954911] [<0>] mdt_reint_internal+0x4bc/0x7d0 [mdt]
[59018.955784] [<0>] mdt_reint+0x5d/0x110 [mdt]
[59018.956553] [<0>] tgt_request_handle+0xc8c/0x19c0 [ptlrpc]
[59018.957506] [<0>] ptlrpc_server_handle_request+0x31d/0xbc0 [ptlrpc]
[59018.958575] [<0>] ptlrpc_main+0xc48/0x1540 [ptlrpc]
[59018.959401] [<0>] kthread+0x116/0x130
[59018.960033] [<0>] ret_from_fork+0x35/0x40
[59059.902193] LustreError: 8396:0:(ldlm_lockd.c:261:expired_lock_main()) ### lock callback timer expired after 103s: evicting client at 10.240.44.244@tcp ns: mdt-lustre-MDT0000_UUID lock: 000000003bcad40d/0x760476a02b02eeb5 lrc: 3 /0,0 mode: PR/PR res: [0x200000bd3:0x4d8d:0x0].0x0 bits 0x12/0x0 rrc: 4 type: IBT gid 0 flags: 0x60200400000020 nid: 10.240.44.244@tcp remote: 0x56d2fb307219d267 expref: 12 pid: 10042 timeout: 59058 lvb_type: 0
[59311.538342] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Duration: 82800

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV |