[LU-15488] conf-sanity test_6: client_up failed (MDS hangs cv_wait_common)

| Created: | 27/Jan/22 | Updated: | 27/Jan/22 |
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
This issue was created by maloo for eaujames <eaujames@ddn.com>

This issue relates to the following test suite run:
https://testing.whamcloud.com/test_sets/2df62cb6-8f14-49b5-a26f-3c278cefa18b

test_6 failed with the following error:

client_up failed

This seems to be linked to ZFS on the b2_12 branch.

Client:

[ 2474.629870] LustreError: 11-0: lustre-MDT0003-mdc-ffffa0037a786000: operation mds_connect to node 10.240.22.128@tcp failed: rc = -114
[ 2479.642804] LustreError: 11-0: lustre-MDT0003-mdc-ffffa0037a786000: operation mds_connect to node 10.240.22.128@tcp failed: rc = -114
[ 2479.644961] LustreError: Skipped 2 previous similar messages
[ 2484.650711] LustreError: 11-0: lustre-MDT0002-mdc-ffffa0037a786000: operation mds_connect to node 10.240.22.127@tcp failed: rc = -114
[ 2484.652857] LustreError: Skipped 2 previous similar messages
[ 2489.658307] LustreError: 11-0: lustre-MDT0003-mdc-ffffa0037a786000: operation mds_connect to node 10.240.22.128@tcp failed: rc = -114
[ 2489.660527] LustreError: Skipped 2 previous similar messages
[ 2495.674146] LustreError: 11-0: lustre-MDT0001-mdc-ffffa0037a786000: operation mds_connect to node 10.240.22.128@tcp failed: rc = -114
[ 2495.676310] LustreError: Skipped 6 previous similar messages
[ 2505.689787] LustreError: 11-0: lustre-MDT0000-mdc-ffffa0037a786000: operation mds_connect to node 10.240.22.127@tcp failed: rc = -114

MDS hangs on txg_wait_synced:

[ 2381.629691] Lustre: lustre-MDT0003: Export ffff902853b4b000 already connecting from 10.240.24.235@tcp
[ 2391.652112] Lustre: lustre-MDT0003: Export ffff902853b4b000 already connecting from 10.240.24.235@tcp
[ 2391.653888] Lustre: Skipped 1 previous similar message
[ 2407.683636] Lustre: lustre-MDT0001: Export ffff9027c0051c00 already connecting from 10.240.24.235@tcp
[ 2407.685496] Lustre: Skipped 6 previous similar messages
[ 2416.767892] LNet: Service thread pid 23710 was inactive for 40.13s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[ 2416.770934] Pid: 23710, comm: mdt00_002 3.10.0-1160.49.1.el7_lustre.x86_64 #1 SMP Thu Dec 2 08:52:07 UTC 2021
[ 2416.772714] Call Trace:
[ 2416.773264] [<ffffffffc058b2d5>] cv_wait_common+0x125/0x150 [spl]
[ 2416.774498] [<ffffffffc058b315>] __cv_wait+0x15/0x20 [spl]
[ 2416.775568] [<ffffffffc09992ff>] txg_wait_synced+0xef/0x140 [zfs]
[ 2416.776933] [<ffffffffc141a94b>] osd_trans_stop+0x53b/0x5e0 [osd_zfs]
[ 2416.778189] [<ffffffffc1238051>] tgt_server_data_update+0x201/0x510 [ptlrpc]
[ 2416.779819] [<ffffffffc1239144>] tgt_client_new+0x494/0x610 [ptlrpc]
[ 2416.781095] [<ffffffffc1552495>] mdt_obd_connect+0x465/0x850 [mdt]
[ 2416.782353] [<ffffffffc119d49b>] target_handle_connect+0xecb/0x2b60 [ptlrpc]
[ 2416.783748] [<ffffffffc124690a>] tgt_request_handle+0x4fa/0x1570 [ptlrpc]
[ 2416.785101] [<ffffffffc11ebbcb>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[ 2416.786551] [<ffffffffc11ef534>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
[ 2416.787749] [<ffffffffa46c5e61>] kthread+0xd1/0xe0
[ 2416.788754] [<ffffffffa4d95df7>] ret_from_fork_nospec_end+0x0/0x39
[ 2416.789990] [<ffffffffffffffff>] 0xffffffffffffffff
...
[ 2427.026534] LNet: Service thread pid 23710 completed after 50.39s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).

This seems similar to

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
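For context: rc = -114 corresponds to -EALREADY on Linux, which matches the repeated "Export ... already connecting" messages on the MDS side; the client keeps retrying the connect while the original mds_connect RPC is still held by the service thread blocked in txg_wait_synced. Below is a minimal sketch of how the failing subtest could be rerun against a ZFS backend with the in-tree Lustre test framework; the directory and environment variables are assumptions about a standard test setup, not details taken from this run:

  cd lustre/tests            # assumes a built Lustre tree with the test framework
  export FSTYPE=zfs          # select the ZFS OSD backend, as in this report
  ONLY=6 sh conf-sanity.sh   # run only conf-sanity test_6, where the client_up check fails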