Lustre / LU-5797

Hard Failover replay-dual test_17: OST hung during mounting

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • Lustre 2.7.0
    • client and server: lustre-master build #2695
      server is zfs
    • 3
    • 16261

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/230cf764-598f-11e4-9a49-5254006e85c2.

      The sub-test test_17 failed with the following error:

      test failed to respond and timed out
      

      OST dmesg:

      Lustre: 2795:0:(client.c:1934:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
      INFO: task mount.lustre:3370 blocked for more than 120 seconds.
            Tainted: P           ---------------    2.6.32-431.29.2.el6_lustre.x86_64 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      mount.lustre  D 0000000000000001     0  3370   3369 0x00000080
       ffff88006f53b718 0000000000000082 0000000000000000 ffff88007d793500
       ffff88006f53b698 ffffffff81055783 ffff88007e4c2ad8 ffff880002316880
       ffff88007d793ab8 ffff88006f53bfd8 000000000000fbc8 ffff88007d793ab8
      Call Trace:
       [<ffffffff81055783>] ? set_next_buddy+0x43/0x50
       [<ffffffff8152a5b5>] schedule_timeout+0x215/0x2e0
       [<ffffffff81069f15>] ? enqueue_entity+0x125/0x450
       [<ffffffff8152a233>] wait_for_common+0x123/0x180
       [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
       [<ffffffffa08f75a0>] ? client_lwp_config_process+0x0/0x1978 [obdclass]
       [<ffffffff8152a34d>] wait_for_completion+0x1d/0x20
       [<ffffffffa087eb74>] llog_process_or_fork+0x354/0x540 [obdclass]
       [<ffffffffa087ed74>] llog_process+0x14/0x30 [obdclass]
       [<ffffffffa08ae7f4>] class_config_parse_llog+0x1e4/0x330 [obdclass]
       [<ffffffffa104b3e2>] mgc_process_log+0xeb2/0x1970 [mgc]
       [<ffffffffa1045260>] ? mgc_blocking_ast+0x0/0x810 [mgc]
       [<ffffffffa0ad1700>] ? ldlm_completion_ast+0x0/0x930 [ptlrpc]
       [<ffffffffa104cdb8>] mgc_process_config+0x658/0x1210 [mgc]
       [<ffffffffa08be3cf>] lustre_process_log+0x20f/0xad0 [obdclass]
       [<ffffffffa0772181>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
       [<ffffffffa08bb44f>] ? server_name2fsname+0x6f/0x90 [obdclass]
       [<ffffffffa08f2416>] server_start_targets+0x12b6/0x1af0 [obdclass]
       [<ffffffffa076c3a8>] ? libcfs_log_return+0x28/0x40 [libcfs]
       [<ffffffffa08c1bf6>] ? lustre_start_mgc+0x4b6/0x1e00 [obdclass]
       [<ffffffffa0772181>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
       [<ffffffffa08b9950>] ? class_config_llog_handler+0x0/0x18c0 [obdclass]
       [<ffffffffa08f6ad8>] server_fill_super+0xc58/0x1720 [obdclass]
       [<ffffffffa076c3a8>] ? libcfs_log_return+0x28/0x40 [libcfs]
       [<ffffffffa08c3718>] lustre_fill_super+0x1d8/0x550 [obdclass]
       [<ffffffffa08c3540>] ? lustre_fill_super+0x0/0x550 [obdclass]
       [<ffffffff8118c58f>] get_sb_nodev+0x5f/0xa0
       [<ffffffffa08bb315>] lustre_get_sb+0x25/0x30 [obdclass]
       [<ffffffff8118bbeb>] vfs_kern_mount+0x7b/0x1b0
       [<ffffffff8118bd92>] do_kern_mount+0x52/0x130
       [<ffffffff8119e992>] ? vfs_ioctl+0x22/0xa0
       [<ffffffff811ad76b>] do_mount+0x2fb/0x930
       [<ffffffff811ade30>] sys_mount+0x90/0xe0
       [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      LustreError: 137-5: lustre-OST0006_UUID: not available for connect from 10.1.5.17@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
      LustreError: Skipped 321 previous similar messages
      Lustre: 2795:0:(client.c:1934:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1413918696/real 1413918696]  req@ffff880070cc4400 x1482600826798928/t0(0) o38->lustre-MDT0000-lwp-OST0001@10.1.5.16@tcp:12/10 lens 400/544 e 0 to 1 dl 1413918721 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 2795:0:(client.c:1934:ptlrpc_expire_one_request()) Skipped 15 previous similar messages
      INFO: task mount.lustre:3370 blocked for more than 120 seconds.
            Tainted: P           ---------------    2.6.32-431.29.2.el6_lustre.x86_64 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      mount.lustre  D 0000000000000001     0  3370   3369 0x00000080
       ffff88006f53b718 0000000000000082 0000000000000000 ffff88007d793500
       ffff88006f53b698 ffffffff81055783 ffff88007e4c2ad8 ffff880002316880
       ffff88007d793ab8 ffff88006f53bfd8 000000000000fbc8 ffff88007d793ab8
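The two key items in this log can be extracted with a small Python triage helper. This is our own sketch, not part of Lustre or the maloo test framework; the function names and the trimmed TRACE excerpt are illustrative. In an x86 kernel call trace, frames marked `?` are stack addresses the unwinder could not confirm as part of the call chain. Dropping them leaves the likely wait chain, which here shows mount.lustre blocked in wait_for_completion() under llog_process_or_fork() while the MGC parses the configuration log. That suggests the mount is waiting on llog processing that never completes. The second function reads the timestamps from the ptlrpc_expire_one_request() line. It assumes our reading of Lustre's DEBUG_REQ format: `sent` is the send time in Unix seconds, `dl` is the request deadline, and `oNN` is the RPC opcode (38 should be MDS_CONNECT, if we recall Lustre's opcode table correctly).

```python
import re
from datetime import datetime, timezone

# Trimmed excerpt of the OST call trace above (illustrative subset of frames).
TRACE = """\
 [<ffffffff81055783>] ? set_next_buddy+0x43/0x50
 [<ffffffff8152a5b5>] schedule_timeout+0x215/0x2e0
 [<ffffffff81069f15>] ? enqueue_entity+0x125/0x450
 [<ffffffff8152a233>] wait_for_common+0x123/0x180
 [<ffffffffa08f75a0>] ? client_lwp_config_process+0x0/0x1978 [obdclass]
 [<ffffffff8152a34d>] wait_for_completion+0x1d/0x20
 [<ffffffffa087eb74>] llog_process_or_fork+0x354/0x540 [obdclass]
 [<ffffffffa087ed74>] llog_process+0x14/0x30 [obdclass]
 [<ffffffffa08ae7f4>] class_config_parse_llog+0x1e4/0x330 [obdclass]
 [<ffffffffa104b3e2>] mgc_process_log+0xeb2/0x1970 [mgc]
"""

# The ptlrpc_expire_one_request() line copied from the dmesg above.
RPC_LINE = (
    "Lustre: 2795:0:(client.c:1934:ptlrpc_expire_one_request()) @@@ Request sent "
    "has failed due to network error: [sent 1413918696/real 1413918696]  "
    "req@ffff880070cc4400 x1482600826798928/t0(0) "
    "o38->lustre-MDT0000-lwp-OST0001@10.1.5.16@tcp:12/10 lens 400/544 e 0 to 1 "
    "dl 1413918721 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1"
)

# "[<addr>] [? ]function+offset/size": group 1 is set only for unconfirmed frames.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def reliable_frames(trace: str) -> list[str]:
    """Return names of frames not marked '?', innermost (most recent) first."""
    names = []
    for line in trace.splitlines():
        m = FRAME_RE.search(line)
        if m and m.group(1) is None:
            names.append(m.group(2))
    return names


def rpc_window(line: str) -> dict:
    """Pull send time, deadline, opcode and target out of a DEBUG_REQ-style line.

    Assumes 'sent' and 'dl' are Unix timestamps in seconds.
    """
    sent = int(re.search(r"\[sent (\d+)/", line).group(1))
    deadline = int(re.search(r"\bdl (\d+)", line).group(1))
    opcode = int(re.search(r"\bo(\d+)->", line).group(1))
    target = re.search(r"->([^@\s]+)@", line).group(1)
    return {
        "sent_utc": datetime.fromtimestamp(sent, tz=timezone.utc).isoformat(),
        "timeout_s": deadline - sent,
        "opcode": opcode,
        "target": target,
    }


if __name__ == "__main__":
    print(" <- ".join(reliable_frames(TRACE)))
    print(rpc_window(RPC_LINE))
```

Run as a script, it prints the confirmed frame chain from schedule_timeout up to mgc_process_log. It also shows a 25-second window between send and deadline for the connect request from OST0001's LWP to lustre-MDT0000, which failed with a network error while the MDS side of the failover pair was unavailable.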
      

            People

              Assignee: Mikhail Pershin (tappro)
              Reporter: Maloo (maloo)
              Votes: 0
              Watchers: 2