[LU-5797] Hard Failover replay-dual test_17: OST hung during mounting
Created: 23/Oct/14  Updated: 22/Jul/18  Resolved: 22/Jul/18

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: Mikhail Pershin
Resolution: Cannot Reproduce
Votes: 0
Labels: zfs
Environment:

client and server: lustre-master build #2695
server is zfs


Severity: 3
Rank (Obsolete): 16261

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/230cf764-598f-11e4-9a49-5254006e85c2.

The sub-test test_17 failed with the following error:

test failed to respond and timed out

OST dmesg

Lustre: 2795:0:(client.c:1934:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
INFO: task mount.lustre:3370 blocked for more than 120 seconds.
      Tainted: P           ---------------    2.6.32-431.29.2.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mount.lustre  D 0000000000000001     0  3370   3369 0x00000080
 ffff88006f53b718 0000000000000082 0000000000000000 ffff88007d793500
 ffff88006f53b698 ffffffff81055783 ffff88007e4c2ad8 ffff880002316880
 ffff88007d793ab8 ffff88006f53bfd8 000000000000fbc8 ffff88007d793ab8
Call Trace:
 [<ffffffff81055783>] ? set_next_buddy+0x43/0x50
 [<ffffffff8152a5b5>] schedule_timeout+0x215/0x2e0
 [<ffffffff81069f15>] ? enqueue_entity+0x125/0x450
 [<ffffffff8152a233>] wait_for_common+0x123/0x180
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffffa08f75a0>] ? client_lwp_config_process+0x0/0x1978 [obdclass]
 [<ffffffff8152a34d>] wait_for_completion+0x1d/0x20
 [<ffffffffa087eb74>] llog_process_or_fork+0x354/0x540 [obdclass]
 [<ffffffffa087ed74>] llog_process+0x14/0x30 [obdclass]
 [<ffffffffa08ae7f4>] class_config_parse_llog+0x1e4/0x330 [obdclass]
 [<ffffffffa104b3e2>] mgc_process_log+0xeb2/0x1970 [mgc]
 [<ffffffffa1045260>] ? mgc_blocking_ast+0x0/0x810 [mgc]
 [<ffffffffa0ad1700>] ? ldlm_completion_ast+0x0/0x930 [ptlrpc]
 [<ffffffffa104cdb8>] mgc_process_config+0x658/0x1210 [mgc]
 [<ffffffffa08be3cf>] lustre_process_log+0x20f/0xad0 [obdclass]
 [<ffffffffa0772181>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa08bb44f>] ? server_name2fsname+0x6f/0x90 [obdclass]
 [<ffffffffa08f2416>] server_start_targets+0x12b6/0x1af0 [obdclass]
 [<ffffffffa076c3a8>] ? libcfs_log_return+0x28/0x40 [libcfs]
 [<ffffffffa08c1bf6>] ? lustre_start_mgc+0x4b6/0x1e00 [obdclass]
 [<ffffffffa0772181>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa08b9950>] ? class_config_llog_handler+0x0/0x18c0 [obdclass]
 [<ffffffffa08f6ad8>] server_fill_super+0xc58/0x1720 [obdclass]
 [<ffffffffa076c3a8>] ? libcfs_log_return+0x28/0x40 [libcfs]
 [<ffffffffa08c3718>] lustre_fill_super+0x1d8/0x550 [obdclass]
 [<ffffffffa08c3540>] ? lustre_fill_super+0x0/0x550 [obdclass]
 [<ffffffff8118c58f>] get_sb_nodev+0x5f/0xa0
 [<ffffffffa08bb315>] lustre_get_sb+0x25/0x30 [obdclass]
 [<ffffffff8118bbeb>] vfs_kern_mount+0x7b/0x1b0
 [<ffffffff8118bd92>] do_kern_mount+0x52/0x130
 [<ffffffff8119e992>] ? vfs_ioctl+0x22/0xa0
 [<ffffffff811ad76b>] do_mount+0x2fb/0x930
 [<ffffffff811ade30>] sys_mount+0x90/0xe0
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
LustreError: 137-5: lustre-OST0006_UUID: not available for connect from 10.1.5.17@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 321 previous similar messages
Lustre: 2795:0:(client.c:1934:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1413918696/real 1413918696]  req@ffff880070cc4400 x1482600826798928/t0(0) o38->lustre-MDT0000-lwp-OST0001@10.1.5.16@tcp:12/10 lens 400/544 e 0 to 1 dl 1413918721 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 2795:0:(client.c:1934:ptlrpc_expire_one_request()) Skipped 15 previous similar messages
INFO: task mount.lustre:3370 blocked for more than 120 seconds.
      Tainted: P           ---------------    2.6.32-431.29.2.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mount.lustre  D 0000000000000001     0  3370   3369 0x00000080
 ffff88006f53b718 0000000000000082 0000000000000000 ffff88007d793500
 ffff88006f53b698 ffffffff81055783 ffff88007e4c2ad8 ffff880002316880
 ffff88007d793ab8 ffff88006f53bfd8 000000000000fbc8 ffff88007d793ab8
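
When reading the hung-task trace above, it can help to drop the frames marked with "?", since the kernel prints those for addresses it found on the stack but could not confirm as part of the real call chain. The following is a minimal sketch of such a filter, not part of the original report: the helper name reliable_frames, the SAMPLE excerpt, and the "kernel" label for frames without a module tag are assumptions made for illustration.

```python
import re

# Matches kernel call-trace lines such as:
#  [<ffffffffa087eb74>] llog_process_or_fork+0x354/0x540 [obdclass]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def reliable_frames(trace_text):
    """Return (symbol, module) pairs for frames not marked with '?'.

    Frames without a trailing [module] tag are labelled "kernel";
    that label is an illustrative choice, not something dmesg prints.
    """
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.search(line)
        if m and not m.group("unreliable"):
            frames.append((m.group("symbol"), m.group("module") or "kernel"))
    return frames


# Short excerpt copied from the top of the call trace in this ticket.
SAMPLE = """\
 [<ffffffff81055783>] ? set_next_buddy+0x43/0x50
 [<ffffffff8152a5b5>] schedule_timeout+0x215/0x2e0
 [<ffffffff8152a233>] wait_for_common+0x123/0x180
 [<ffffffffa08f75a0>] ? client_lwp_config_process+0x0/0x1978 [obdclass]
 [<ffffffff8152a34d>] wait_for_completion+0x1d/0x20
 [<ffffffffa087eb74>] llog_process_or_fork+0x354/0x540 [obdclass]
 [<ffffffffa087ed74>] llog_process+0x14/0x30 [obdclass]
"""

if __name__ == "__main__":
    for symbol, module in reliable_frames(SAMPLE):
        print(f"{symbol} [{module}]")
```

Applied to the excerpt, the filter leaves the path from llog_process through llog_process_or_fork into wait_for_completion, which matches the reading that mount.lustre is blocked waiting for the config llog processing to complete.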


 Comments   
Comment by Jodi Levi (Inactive) [ 24/Oct/14 ]

Mike,
Could you have a look at this one and comment?
Thank you!
