Lustre / LU-15488

conf-sanity test_6: client_up failed (MDS hangs cv_wait_common)


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor

    Description

      This issue was created by maloo for eaujames <eaujames@ddn.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/2df62cb6-8f14-49b5-a26f-3c278cefa18b

      test_6 failed with the following error:

      client_up failed
      

      This seems to be linked to ZFS on the b2_12 branch:

      Client:

      [ 2474.629870] LustreError: 11-0: lustre-MDT0003-mdc-ffffa0037a786000: operation mds_connect to node 10.240.22.128@tcp failed: rc = -114
      [ 2479.642804] LustreError: 11-0: lustre-MDT0003-mdc-ffffa0037a786000: operation mds_connect to node 10.240.22.128@tcp failed: rc = -114
      [ 2479.644961] LustreError: Skipped 2 previous similar messages
      [ 2484.650711] LustreError: 11-0: lustre-MDT0002-mdc-ffffa0037a786000: operation mds_connect to node 10.240.22.127@tcp failed: rc = -114
      [ 2484.652857] LustreError: Skipped 2 previous similar messages
      [ 2489.658307] LustreError: 11-0: lustre-MDT0003-mdc-ffffa0037a786000: operation mds_connect to node 10.240.22.128@tcp failed: rc = -114
      [ 2489.660527] LustreError: Skipped 2 previous similar messages
      [ 2495.674146] LustreError: 11-0: lustre-MDT0001-mdc-ffffa0037a786000: operation mds_connect to node 10.240.22.128@tcp failed: rc = -114
      [ 2495.676310] LustreError: Skipped 6 previous similar messages
      [ 2505.689787] LustreError: 11-0: lustre-MDT0000-mdc-ffffa0037a786000: operation mds_connect to node 10.240.22.127@tcp failed: rc = -114
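The `rc = -114` in these client messages is the kernel errno `-EALREADY` ("Operation already in progress"), which matches the server-side "Export ... already connecting" messages below: the server rejects each reconnect attempt while a previous connect on the same export is still in flight. A quick check of the errno decoding (on Linux; errno values are platform-specific):

```python
import errno
import os

# On Linux, the "rc = -114" from mds_connect decodes to -EALREADY.
print(errno.EALREADY)               # 114
print(os.strerror(errno.EALREADY))  # Operation already in progress
```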
      

      MDS hangs on txg_wait_synced:

      [ 2381.629691] Lustre: lustre-MDT0003: Export ffff902853b4b000 already connecting from 10.240.24.235@tcp
      [ 2391.652112] Lustre: lustre-MDT0003: Export ffff902853b4b000 already connecting from 10.240.24.235@tcp
      [ 2391.653888] Lustre: Skipped 1 previous similar message
      [ 2407.683636] Lustre: lustre-MDT0001: Export ffff9027c0051c00 already connecting from 10.240.24.235@tcp
      [ 2407.685496] Lustre: Skipped 6 previous similar messages
      [ 2416.767892] LNet: Service thread pid 23710 was inactive for 40.13s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [ 2416.770934] Pid: 23710, comm: mdt00_002 3.10.0-1160.49.1.el7_lustre.x86_64 #1 SMP Thu Dec 2 08:52:07 UTC 2021
      [ 2416.772714] Call Trace:
      [ 2416.773264]  [<ffffffffc058b2d5>] cv_wait_common+0x125/0x150 [spl]
      [ 2416.774498]  [<ffffffffc058b315>] __cv_wait+0x15/0x20 [spl]
      [ 2416.775568]  [<ffffffffc09992ff>] txg_wait_synced+0xef/0x140 [zfs]
      [ 2416.776933]  [<ffffffffc141a94b>] osd_trans_stop+0x53b/0x5e0 [osd_zfs]
      [ 2416.778189]  [<ffffffffc1238051>] tgt_server_data_update+0x201/0x510 [ptlrpc]
      [ 2416.779819]  [<ffffffffc1239144>] tgt_client_new+0x494/0x610 [ptlrpc]
      [ 2416.781095]  [<ffffffffc1552495>] mdt_obd_connect+0x465/0x850 [mdt]
      [ 2416.782353]  [<ffffffffc119d49b>] target_handle_connect+0xecb/0x2b60 [ptlrpc]
      [ 2416.783748]  [<ffffffffc124690a>] tgt_request_handle+0x4fa/0x1570 [ptlrpc]
      [ 2416.785101]  [<ffffffffc11ebbcb>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [ 2416.786551]  [<ffffffffc11ef534>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
      [ 2416.787749]  [<ffffffffa46c5e61>] kthread+0xd1/0xe0
      [ 2416.788754]  [<ffffffffa4d95df7>] ret_from_fork_nospec_end+0x0/0x39
      [ 2416.789990]  [<ffffffffffffffff>] 0xffffffffffffffff
      ...
      [ 2427.026534] LNet: Service thread pid 23710 completed after 50.39s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
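Putting the two logs together: the mdt00_002 service thread handling the first connect is stuck in `txg_wait_synced()` (via `tgt_client_new` → `osd_trans_stop`), so every reconnect the client retries in the meantime is bounced with `-EALREADY` until the ZFS txg finally syncs. The toy Python model below (not Lustre code; the names `Export`, `connect`, and the events are illustrative stand-ins) sketches that interaction:

```python
import errno
import threading

class Export:
    """Toy model of the pattern in the logs: the first mds_connect
    handler blocks waiting for a ZFS txg sync, so any retry arriving
    meanwhile is rejected with -EALREADY (-114)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.connecting = False
        self.in_flight = threading.Event()   # first attempt has started
        self.txg_synced = threading.Event()  # stand-in for txg_wait_synced()

    def connect(self):
        with self.lock:
            if self.connecting:
                return -errno.EALREADY       # "Export ... already connecting"
            self.connecting = True
        self.in_flight.set()
        self.txg_synced.wait()               # handler stuck in cv_wait_common
        with self.lock:
            self.connecting = False
        return 0

exp = Export()
first = threading.Thread(target=exp.connect)
first.start()                 # first connect blocks on the txg sync
exp.in_flight.wait()
print(exp.connect())          # retry while first is in flight: -114
exp.txg_synced.set()          # the txg finally syncs
first.join()
print(exp.connect())          # 0 once the stuck attempt has completed
```

In the real failure the txg sync never completes within the test's timeout, so `client_up` gives up while the retries are still being refused.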
      

      This seems similar to LU-12510 or LU-10223.

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      conf-sanity test_6 - client_up failed

People

    Assignee: wc-triage (WC Triage)
    Reporter: maloo (Maloo)