Details
Bug
Resolution: Unresolved
Minor
Description
This issue was created by maloo for Elena <elena.gryaznova@hpe.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/fb180efc-20f3-44e8-9ce3-5eaae34faed0
test_failover_ost failed with the following error:
test_failover_ost returned 1
Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/91846 - 4.18.0-348.7.1.el8_5.x86_64
servers: https://build.whamcloud.com/job/lustre-reviews/91846 - 4.18.0-348.23.1.el8_lustre.x86_64
recovery-mds-scale.test_failover_ost.dmesg.trevis-101vm5.1674753520.log:
[ 573.886865] LNet: Added LNI 10.240.44.244@tcp [8/256/0/180]
[ 573.888136] LNet: Accept all, port 7988
[ 574.964677] Lustre: Mounted lustre-client
recovery-mds-scale.test_failover_ost.dmesg.trevis-101vm5.1674753520.log:
[58558.816712] Lustre: DEBUG MARKER: ==== Check clients loads AFTER failover -- failure NOT OK
[58562.233744] Lustre: DEBUG MARKER: ps auxwww | grep -v grep | grep -q run_IOR.sh
[58563.354293] Lustre: DEBUG MARKER: /usr/sbin/lctl mark ost7 failed over 8 times, and counting...
[58563.979740] Lustre: DEBUG MARKER: ost7 failed over 8 times, and counting...
[59372.896018] LustreError: 11-0: lustre-MDT0000-mdc-ffff9b0de7ee6800: operation mds_close to node 10.240.44.248@tcp failed: rc = -107
[59372.897999] Lustre: lustre-MDT0000-mdc-ffff9b0de7ee6800: Connection to lustre-MDT0000 (at 10.240.44.248@tcp) was lost; in progress operations using this service will wait for recovery to complete
[59372.900740] Lustre: Skipped 2 previous similar messages
[59372.901933] LustreError: 167-0: lustre-MDT0000-mdc-ffff9b0de7ee6800: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
[59372.904876] LustreError: 17673:0:(file.c:242:ll_close_inode_openhandle()) lustre-clilmv-ffff9b0de7ee6800: inode [0x200000bd3:0x4d8d:0x0] mdc close failed: rc = -5
[59372.910935] LustreError: 2562353:0:(file.c:5188:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000007:0x1:0x0] error: rc = -108
[59372.911729] Lustre: lustre-MDT0000-mdc-ffff9b0de7ee6800: Connection restored to 10.240.44.248@tcp (at 10.240.44.248@tcp)
[59372.914684] Lustre: Skipped 2 previous similar messages
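The rc values in the client excerpt follow the usual Lustre convention of returning negated errno numbers. A minimal sketch, assuming it runs on a Linux host so the errno numbering matches the x86_64 client, decodes them with Python's standard errno and os modules; decode_rc is a hypothetical helper written for this note, not a Lustre tool.

```python
import errno
import os

# rc values copied from the client dmesg excerpt (negated errno codes)
RC_VALUES = [-107, -5, -108]


def decode_rc(rc: int) -> str:
    """Map a negative rc to its errno symbol and message (Linux numbering assumed)."""
    code = -rc
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"rc = {rc}: {name} ({os.strerror(code)})"


if __name__ == "__main__":
    for rc in RC_VALUES:
        print(decode_rc(rc))
```

On Linux, -107 decodes to ENOTCONN, -5 to EIO and -108 to ESHUTDOWN, which reads consistently with the log order: the mds_close RPC finds the connection gone, the client learns it was evicted, and the close and revalidate then fail.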
console.trevis-101vm9.log:
[58245.625662] Lustre: DEBUG MARKER: ==== Check clients loads AFTER failover -- failure NOT OK
[58250.145970] Lustre: DEBUG MARKER: /usr/sbin/lctl mark ost7 failed over 8 times, and counting...
[58250.950319] Lustre: DEBUG MARKER: ost7 failed over 8 times, and counting...
[59018.942424] Lustre: mdt00_008: service thread pid 9989 was inactive for 62.213 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[59018.945437] Pid: 9989, comm: mdt00_008 4.18.0-348.23.1.el8_lustre.x86_64 #1 SMP Wed Jan 4 16:53:58 UTC 2023
[59018.947001] Call Trace TBD:
[59018.947614] [<0>] ldlm_completion_ast+0x7ac/0x900 [ptlrpc]
[59018.948578] [<0>] ldlm_cli_enqueue_local+0x307/0x860 [ptlrpc]
[59018.949573] [<0>] mdt_object_local_lock+0x506/0xb30 [mdt]
[59018.950486] [<0>] mdt_object_lock_internal+0x18d/0x4a0 [mdt]
[59018.951439] [<0>] mdt_reint_object_lock+0x27/0x60 [mdt]
[59018.952329] [<0>] mdt_reint_striped_lock+0x67/0x490 [mdt]
[59018.953237] [<0>] mdt_reint_unlink+0xac0/0x1580 [mdt]
[59018.954097] [<0>] mdt_reint_rec+0x117/0x270 [mdt]
[59018.954911] [<0>] mdt_reint_internal+0x4bc/0x7d0 [mdt]
[59018.955784] [<0>] mdt_reint+0x5d/0x110 [mdt]
[59018.956553] [<0>] tgt_request_handle+0xc8c/0x19c0 [ptlrpc]
[59018.957506] [<0>] ptlrpc_server_handle_request+0x31d/0xbc0 [ptlrpc]
[59018.958575] [<0>] ptlrpc_main+0xc48/0x1540 [ptlrpc]
[59018.959401] [<0>] kthread+0x116/0x130
[59018.960033] [<0>] ret_from_fork+0x35/0x40
[59059.902193] LustreError: 8396:0:(ldlm_lockd.c:261:expired_lock_main()) ### lock callback timer expired after 103s: evicting client at 10.240.44.244@tcp ns: mdt-lustre-MDT0000_UUID lock: 000000003bcad40d/0x760476a02b02eeb5 lrc: 3 /0,0 mode: PR/PR res: [0x200000bd3:0x4d8d:0x0].0x0 bits 0x12/0x0 rrc: 4 type: IBT gid 0 flags: 0x60200400000020 nid: 10.240.44.244@tcp remote: 0x56d2fb307219d267 expref: 12 pid: 10042 timeout: 59058 lvb_type: 0
[59311.538342] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Duration: 82800
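The hung-thread report and the eviction message in the MDS console log are both stamped with that one node's uptime, so their intervals can be back-dated and compared directly (the client dmesg uses a different node's uptime, so its timestamps cannot be subtracted from these). A quick sketch using only the numbers printed in those two messages; back_date is a hypothetical helper, and since the 103 s lock timer is printed rounded to whole seconds, agreement to within about a second is all that can be expected.

```python
# Timestamps copied from the MDS console log (trevis-101vm9); both are
# seconds of uptime on that same node, so subtracting them is meaningful.
HUNG_THREAD_REPORT = 59018.942424  # "mdt00_008 ... was inactive for 62.213 seconds"
INACTIVE_SECONDS = 62.213
EVICTION_REPORT = 59059.902193     # "lock callback timer expired after 103s"
LOCK_TIMER_SECONDS = 103           # printed rounded to whole seconds


def back_date(report_ts: float, elapsed: float) -> float:
    """Return the uptime at which an interval that ended at report_ts began."""
    return report_ts - elapsed


wait_started = back_date(HUNG_THREAD_REPORT, INACTIVE_SECONDS)
timer_started = back_date(EVICTION_REPORT, LOCK_TIMER_SECONDS)

if __name__ == "__main__":
    print(f"mdt00_008 inactive since ~{wait_started:.3f}")
    print(f"lock callback timer started ~{timer_started:.3f}")
    print(f"gap: {timer_started - wait_started:.3f} s")
```

Both intervals start at about 58956.7 to 58956.9 on the MDS clock, less than 0.2 s apart. That is consistent with, though it does not by itself prove, the mdt_reint_unlink in mdt00_008 waiting on the PR lock held by client 10.240.44.244@tcp on [0x200000bd3:0x4d8d:0x0], the same FID whose close later failed on the client.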
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
recovery-mds-scale test_failover_ost - test_failover_ost returned 1