LU-6273: Hard Failover replay-dual test_17: Failover OST mount hang

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.8.0
    • Affects Version/s: Lustre 2.6.0, Lustre 2.7.0, Lustre 2.8.0
    • Environment: client and server: lustre-master build # 2856, zfs
    • Severity: 3
    • 17590

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/0429703c-ba58-11e4-8053-5254006e85c2.

      The sub-test test_17 failed with the following error:

      test failed to respond and timed out
      

      OST dmesg:

      Lustre: 3053:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
      LustreError: 137-5: lustre-OST0002_UUID: not available for connect from 10.2.4.161@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
      LustreError: Skipped 120 previous similar messages
      INFO: task mount.lustre:3630 blocked for more than 120 seconds.
            Tainted: P           ---------------    2.6.32-431.29.2.el6_lustre.x86_64 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      mount.lustre  D 0000000000000000     0  3630   3629 0x00000080
       ffff88006edf9718 0000000000000082 0000000000000000 ffff88006ef82040
       ffff88006edf9698 ffffffff81055783 ffff88007e4c2ad8 ffff880002216880
       ffff88006ef825f8 ffff88006edf9fd8 000000000000fbc8 ffff88006ef825f8
      Call Trace:
       [<ffffffff81055783>] ? set_next_buddy+0x43/0x50
       [<ffffffff8152a595>] schedule_timeout+0x215/0x2e0
       [<ffffffff81069f15>] ? enqueue_entity+0x125/0x450
       [<ffffffff8152a213>] wait_for_common+0x123/0x180
       [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
       [<ffffffffa090cd00>] ? client_lwp_config_process+0x0/0x1948 [obdclass]
       [<ffffffff8152a32d>] wait_for_completion+0x1d/0x20
       [<ffffffffa0898e14>] llog_process_or_fork+0x354/0x540 [obdclass]
       [<ffffffffa0899014>] llog_process+0x14/0x30 [obdclass]
       [<ffffffffa08c81d4>] class_config_parse_llog+0x1e4/0x330 [obdclass]
       [<ffffffffa10314f2>] mgc_process_log+0xeb2/0x1970 [mgc]
       [<ffffffffa102b1f0>] ? mgc_blocking_ast+0x0/0x810 [mgc]
       [<ffffffffa0ad0860>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
       [<ffffffffa1032ef8>] mgc_process_config+0x658/0x1210 [mgc]
       [<ffffffffa08d9383>] lustre_process_log+0x7e3/0x1130 [obdclass]
       [<ffffffffa07891c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
       [<ffffffffa08d514f>] ? server_name2fsname+0x6f/0x90 [obdclass]
       [<ffffffffa0907496>] server_start_targets+0x12b6/0x1af0 [obdclass]
       [<ffffffffa0783818>] ? libcfs_log_return+0x28/0x40 [libcfs]
       [<ffffffffa08dbfe6>] ? lustre_start_mgc+0x4b6/0x1e00 [obdclass]
       [<ffffffffa07891c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
       [<ffffffffa08d3390>] ? class_config_llog_handler+0x0/0x1a70 [obdclass]
       [<ffffffffa090c255>] server_fill_super+0xbe5/0x1690 [obdclass]
       [<ffffffffa0783818>] ? libcfs_log_return+0x28/0x40 [libcfs]
       [<ffffffffa08dde90>] lustre_fill_super+0x560/0xa80 [obdclass]
       [<ffffffffa08dd930>] ? lustre_fill_super+0x0/0xa80 [obdclass]
       [<ffffffff8118c56f>] get_sb_nodev+0x5f/0xa0
       [<ffffffffa08d4ee5>] lustre_get_sb+0x25/0x30 [obdclass]
       [<ffffffff8118bbcb>] vfs_kern_mount+0x7b/0x1b0
       [<ffffffff8118bd72>] do_kern_mount+0x52/0x130
       [<ffffffff8119e972>] ? vfs_ioctl+0x22/0xa0
       [<ffffffff811ad74b>] do_mount+0x2fb/0x930
       [<ffffffff811ade10>] sys_mount+0x90/0xe0
       [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      Lustre: 3053:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1424565647/real 1424565647]  req@ffff880070449080 x1493765169611180/t0(0) o38->lustre-MDT0000-lwp-OST0001@10.2.4.158@tcp:12/10 lens 400/544 e 0 to 1 dl 1424565672 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 3053:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
      Lustre: 3053:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1424565712/real 1424565712]  req@ffff880070449680 x1493765169611316/t0(0) o38->lustre-MDT0000-lwp-OST0001@10.2.4.158@tcp:12/10 lens 400/544 e 0 to 1 dl 1424565737 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 3053:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
      INFO: task mount.lustre:3630 blocked for more than 120 seconds.
            Tainted: P           ---------------    2.6.32-431.29.2.el6_lustre.x86_64 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      mount.lustre  D 0000000000000000     0  3630   3629 0x00000080
       ffff88006edf9718 0000000000000082 0000000000000000 ffff88006ef82040
       ffff88006edf9698 ffffffff81055783 ffff88007e4c2ad8 ffff880002216880
       ffff88006ef825f8 ffff88006edf9fd8 000000000000fbc8 ffff88006ef825f8
      Call Trace:
       [<ffffffff81055783>] ? set_next_buddy+0x43/0x50
       [<ffffffff8152a595>] schedule_timeout+0x215/0x2e0
       [<ffffffff81069f15>] ? enqueue_entity+0x125/0x450
       [<ffffffff8152a213>] wait_for_common+0x123/0x180
       [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
       [<ffffffffa090cd00>] ? client_lwp_config_process+0x0/0x1948 [obdclass]
       [<ffffffff8152a32d>] wait_for_completion+0x1d/0x20
       [<ffffffffa0898e14>] llog_process_or_fork+0x354/0x540 [obdclass]
       [<ffffffffa0899014>] llog_process+0x14/0x30 [obdclass]
       [<ffffffffa08c81d4>] class_config_parse_llog+0x1e4/0x330 [obdclass]
       [<ffffffffa10314f2>] mgc_process_log+0xeb2/0x1970 [mgc]
       [<ffffffffa102b1f0>] ? mgc_blocking_ast+0x0/0x810 [mgc]
       [<ffffffffa0ad0860>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
       [<ffffffffa1032ef8>] mgc_process_config+0x658/0x1210 [mgc]
       [<ffffffffa08d9383>] lustre_process_log+0x7e3/0x1130 [obdclass]
       [<ffffffffa07891c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
       [<ffffffffa08d514f>] ? server_name2fsname+0x6f/0x90 [obdclass]
       [<ffffffffa0907496>] server_start_targets+0x12b6/0x1af0 [obdclass]
       [<ffffffffa0783818>] ? libcfs_log_return+0x28/0x40 [libcfs]
       [<ffffffffa08dbfe6>] ? lustre_start_mgc+0x4b6/0x1e00 [obdclass]
       [<ffffffffa07891c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
       [<ffffffffa08d3390>] ? class_config_llog_handler+0x0/0x1a70 [obdclass]
       [<ffffffffa090c255>] server_fill_super+0xbe5/0x1690 [obdclass]
       [<ffffffffa0783818>] ? libcfs_log_return+0x28/0x40 [libcfs]
       [<ffffffffa08dde90>] lustre_fill_super+0x560/0xa80 [obdclass]
       [<ffffffffa08dd930>] ? lustre_fill_super+0x0/0xa80 [obdclass]
       [<ffffffff8118c56f>] get_sb_nodev+0x5f/0xa0
       [<ffffffffa08d4ee5>] lustre_get_sb+0x25/0x30 [obdclass]
       [<ffffffff8118bbcb>] vfs_kern_mount+0x7b/0x1b0
       [<ffffffff8118bd72>] do_kern_mount+0x52/0x130
       [<ffffffff8119e972>] ? vfs_ioctl+0x22/0xa0
       [<ffffffff811ad74b>] do_mount+0x2fb/0x930
       [<ffffffff811ade10>] sys_mount+0x90/0xe0
       [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      LustreError: 137-5: lustre-OST0002_UUID: not available for connect from 10.2.4.156@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
      LustreError: Skipped 304 previous similar messages
      Lustre: 3053:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1424565842/real 1424565842]  req@ffff880070449c80 x1493765169611592/t0(0) o38->lustre-MDT0000-lwp-OST0001@10.2.4.158@tcp:12/10 lens 400/544 e 0 to 1 dl 1424565867 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 3053:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 16 previous similar messages
      INFO: task mount.lustre:3630 blocked for more than 120 seconds.
      
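      The traces above show where the mount hangs: mount.lustre is parked in wait_for_completion(), called from llog_process_or_fork() while mgc_process_log() replays the MGS configuration llog, and the repeated o38 (connect) RPC failures to lustre-MDT0000-lwp-OST0001 suggest the llog worker is stuck talking to the MDT that is mid-failover. Below is a minimal, hypothetical C sketch of that fork-and-wait pattern (the struct and function names are illustrative, not the actual Lustre source): an unbounded, uninterruptible wait_for_completion() turns any stall in the worker into an indefinite mount hang, which is exactly what the 120-second hung-task watchdog keeps reporting.

      #include <linux/err.h>
      #include <linux/types.h>
      #include <linux/kthread.h>
      #include <linux/completion.h>

      /* Hypothetical sketch, not the actual Lustre source. */
      struct llog_process_info {
              struct completion  lpi_completion; /* signalled by the worker when done */
              int                lpi_rc;         /* worker's return code */
              int              (*lpi_cb)(void *);/* walks the llog records; may block on RPCs */
              void              *lpi_cbdata;
      };

      static int llog_worker(void *arg)
      {
              struct llog_process_info *lpi = arg;

              /* Walking the config llog can issue RPCs to the MDT; during
               * failover these fail repeatedly (the o38 errors above), so
               * this callback may not return for a long time, or ever. */
              lpi->lpi_rc = lpi->lpi_cb(lpi->lpi_cbdata);
              complete(&lpi->lpi_completion);
              return 0;
      }

      static int llog_process_or_fork_sketch(struct llog_process_info *lpi, bool fork)
      {
              struct task_struct *task;

              if (!fork)
                      return lpi->lpi_cb(lpi->lpi_cbdata);

              init_completion(&lpi->lpi_completion);
              task = kthread_run(llog_worker, lpi, "llog_process_thread");
              if (IS_ERR(task))
                      return PTR_ERR(task);

              /* Unbounded, uninterruptible wait: this is the frame in which
               * mount.lustre sits in the hung-task traces above. */
              wait_for_completion(&lpi->lpi_completion);
              return lpi->lpi_rc;
      }

      A bounded or interruptible wait at this point would let the mount fail with an error instead of hanging, though the actual fix for this ticket may take a different approach.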

            People

              Assignee: Mikhail Pershin (tappro)
              Reporter: Maloo (maloo)