LU-12442

recovery-small test_136: mounts stuck in lnet_discover_peer_locked()

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.13.0
    • None
    • 3

    Description

      This issue was created by maloo for wangshilong <wshilong@ddn.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/d8642aba-90b2-11e9-a77a-52540065bddc

      test_136 failed with the following error:

      Timeout occurred after 387 mins, last suite running was recovery-small, restarting cluster to continue tests
      

      [ 236.158316] Pid: 4733, comm: mdt_out00_001 3.10.0-957.12.2.el7_lustre.x86_64 #1 SMP Wed Jun 5 06:59:00 UTC 2019
      [ 236.159304] Call Trace:
      [ 236.159604] [<ffffffffc0c34894>] lnet_discover_peer_locked+0x124/0x3d0 [lnet]
      [ 236.160389] [<ffffffffc0c34bb0>] LNetPrimaryNID+0x70/0x1a0 [lnet]
      [ 236.161207] [<ffffffffc0fce5fe>] ptlrpc_connection_get+0x3e/0x450 [ptlrpc]
      [ 236.162038] [<ffffffffc0fd2664>] ptlrpc_send_reply+0x394/0x840 [ptlrpc]
      [ 236.162790] [<ffffffffc0fd2bdb>] ptlrpc_send_error+0x9b/0x1b0 [ptlrpc]
      [ 236.163596] [<ffffffffc0fd2d00>] ptlrpc_error+0x10/0x20 [ptlrpc]
      [ 236.164310] [<ffffffffc1041898>] tgt_request_handle+0xad8/0x15c0 [ptlrpc]
      [ 236.165230] [<ffffffffc0fe57ee>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
      [ 236.166085] [<ffffffffc0fe92dc>] ptlrpc_main+0xbac/0x1560 [ptlrpc]
      [ 236.166803] [<ffffffff8c0c1d21>] kthread+0xd1/0xe0
      [ 236.167414] [<ffffffff8c775c37>] ret_from_fork_nospec_end+0x0/0x39
      [ 236.168091] [<ffffffffffffffff>] 0xffffffffffffffff
      [ 240.225660] INFO: task mount.lustre:4609 blocked for more than 120 seconds.
      [ 240.226393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 240.227196] mount.lustre D ffff9be33654b0c0 0 4609 4608 0x00000080
      [ 240.228094] Call Trace:
      [ 240.228362] [<ffffffff8c768e19>] schedule+0x29/0x70
      [ 240.228886] [<ffffffff8c766921>] schedule_timeout+0x221/0x2d0
      [ 240.229600] [<ffffffff8c0d31e2>] ? check_preempt_curr+0x92/0xa0
      [ 240.230222] [<ffffffff8c0d3209>] ? ttwu_do_wakeup+0x19/0xe0
      [ 240.230868] [<ffffffff8c7691cd>] wait_for_completion+0xfd/0x140
      [ 240.231490] [<ffffffff8c0d6ae0>] ? wake_up_state+0x20/0x20
      [ 240.232086] [<ffffffffc0cb0ae4>] llog_process_or_fork+0x244/0x450 [obdclass]
      [ 240.232883] [<ffffffffc0cb0d04>] llog_process+0x14/0x20 [obdclass]
      [ 240.233558] [<ffffffffc0ce3ca5>] class_config_parse_llog+0x125/0x350 [obdclass]
      [ 240.234309] [<ffffffffc0f61fd0>] mgc_process_cfg_log+0x790/0xc40 [mgc]
      [ 240.235079] [<ffffffffc0f654b9>] mgc_process_log+0x3d9/0x8f0 [mgc]
      [ 240.235763] [<ffffffffc0f6614f>] ? config_recover_log_add+0x13f/0x280 [mgc]
      [ 240.236494] [<ffffffffc0cebf00>] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
      [ 240.237269] [<ffffffffc0f66b1b>] mgc_process_config+0x88b/0x13f0 [mgc]
      [ 240.238017] [<ffffffffc0cefb18>] lustre_process_log+0x2d8/0xad0 [obdclass]
      [ 240.238783] [<ffffffffc0b891a7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [ 240.239494] [<ffffffffc0cda839>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
      [ 240.240253] [<ffffffffc0d1e924>] server_start_targets+0x13a4/0x2a20 [obdclass]
      [ 240.241058] [<ffffffffc0cda961>] ? lprocfs_counter_sub+0xc1/0x130 [obdclass]
      [ 240.241835] [<ffffffffc0cebf00>] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
      [ 240.242689] [<ffffffffc0d2106c>] server_fill_super+0x10cc/0x1890 [obdclass]
      [ 240.243430] [<ffffffffc0b891a7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [ 240.244122] [<ffffffffc0cf5798>] lustre_fill_super+0x328/0x950 [obdclass]
      [ 240.244905] [<ffffffffc0cf5470>] ? lustre_common_put_super+0x270/0x270 [obdclass]
      [ 240.245701] [<ffffffff8c2457cf>] mount_nodev+0x4f/0xb0
      [ 240.246283] [<ffffffffc0ced968>] lustre_mount+0x38/0x60 [obdclass]
      [ 240.246925] [<ffffffff8c24634e>] mount_fs+0x3e/0x1b0
      [ 240.247505] [<ffffffff8c263ec7>] vfs_kern_mount+0x67/0x110
      [ 240.248084] [<ffffffff8c2664ef>] do_mount+0x1ef/0xce0
      [ 240.248670] [<ffffffff8c23e7aa>] ? __check_object_size+0x1ca/0x250
      [ 240.249377] [<ffffffff8c21caec>] ? kmem_cache_alloc_trace+0x3c/0x200
      [ 240.250037] [<ffffffff8c267323>] SyS_mount+0x83/0xd0
      [ 240.250599] [<ffffffff8c775ddb>] system_call_fastpath+0x22/0x27
      [ 240.251232] [<ffffffff8c775d21>] ? system_call_after_swapgs+0xae/0x146

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      recovery-small test_136 - Timeout occurred after 387 mins, last suite running was recovery-small, restarting cluster to continue tests

            People

              wc-triage WC Triage
              maloo Maloo