[LU-3475] Failure on test suite lustre-rsync-test test_1: task mdt00_002:9743 blocked for more than 120 seconds
Created: 14/Jun/13  Updated: 20/Nov/13  Resolved: 20/Nov/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: Emoly Liu
Resolution: Fixed
Votes: 0
Labels: None
Environment:

server and client: lustre-master build #1525
client: SLES11 SP2


Severity: 3
Rank (Obsolete): 8715

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/d8d45b58-d512-11e2-b13b-52540035b04c.

The sub-test test_1 failed with the following error:

test failed to respond and timed out

MDS console shows:

19:00:05:Lustre: DEBUG MARKER: == lustre-rsync-test test 1: Simple Replication ====================================================== 19:00:00 (1371175200)
19:00:05:Lustre: DEBUG MARKER: lctl --device lustre-MDT0000 changelog_register -n
19:00:05:Lustre: lustre-MDD0000: changelog on
19:00:05:Lustre: DEBUG MARKER: lctl get_param -n mdd.lustre-MDT0000.changelog_users
19:00:05:Lustre: DEBUG MARKER: dumpe2fs -h /dev/lvm-MDS/P1 2>&1 | grep -q large_xattr
19:00:05:Lustre: DEBUG MARKER: dumpe2fs -h /dev/lvm-MDS/P1 2>&1
19:00:05:Lustre: DEBUG MARKER: dumpe2fs -h /dev/lvm-MDS/P1 2>&1 | grep -q large_xattr
19:02:07:INFO: task mdt00_002:9743 blocked for more than 120 seconds.
19:02:07:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
19:02:07:mdt00_002     D 0000000000000000     0  9743      2 0x00000080
19:02:07: ffff88007d0eda50 0000000000000046 000200000a0a04dc ffff880037d444c0
19:02:07: ffff88007d0eda00 ffffc90000938030 0000000000000246 0000000000000246
19:02:07: ffff88007d0ebaf8 ffff88007d0edfd8 000000000000fb88 ffff88007d0ebaf8
19:02:07:Call Trace:
19:02:07: [<ffffffffa05eee06>] ? htable_lookup+0x1a6/0x1c0 [obdclass]
19:02:07: [<ffffffffa04796fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
19:02:07: [<ffffffffa05ef413>] lu_object_find_at+0xb3/0x360 [obdclass]
19:02:07: [<ffffffff8127f6de>] ? number+0x2ee/0x320
19:02:07: [<ffffffff81063310>] ? default_wake_function+0x0/0x20
19:02:07: [<ffffffffa05f1aea>] dt_locate_at+0x3a/0x140 [obdclass]
19:02:07: [<ffffffffa05c846d>] llog_osd_dir_get+0xdd/0x1e0 [obdclass]
19:02:07: [<ffffffffa05cf6c7>] llog_osd_open+0x427/0xc00 [obdclass]
19:02:07: [<ffffffffa059a36a>] llog_open+0xba/0x2c0 [obdclass]
19:02:07: [<ffffffffa0790997>] llog_origin_handle_open+0x1f7/0x6f0 [ptlrpc]
19:02:07: [<ffffffffa0dcc592>] mdt_llog_create+0x32/0x50 [mdt]
19:02:07: [<ffffffffa0dd2b78>] mdt_handle_common+0x648/0x1660 [mdt]
19:02:07: [<ffffffffa0e0c205>] mds_regular_handle+0x15/0x20 [mdt]
19:02:07: [<ffffffffa07896a8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
19:02:07: [<ffffffffa04795de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
19:02:07: [<ffffffffa048adaf>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
19:02:07: [<ffffffffa0780a09>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
19:02:07: [<ffffffff81055ab3>] ? __wake_up+0x53/0x70
19:02:07: [<ffffffffa078aa3e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
19:02:07: [<ffffffffa0789f70>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
19:02:07: [<ffffffff8100c0ca>] child_rip+0xa/0x20
19:02:07: [<ffffffffa0789f70>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
19:02:07: [<ffffffffa0789f70>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
19:02:07: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
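
For triage, the hung-task report above can be reduced to the blocked task and the frames the kernel considers reliable (those without a leading "?"). The following is a minimal sketch, assuming the console log keeps the line layout shown in this ticket; the function name parse_hung_task and both regular expressions are hypothetical helpers written for illustration and are not part of Lustre, Maloo, or the kernel.

```python
import re

# Hypothetical patterns, derived only from the log layout shown in this ticket.
HUNG_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<module>\w+)\])?"
)


def parse_hung_task(lines):
    """Return (task, pid, secs, frames) for the first hung-task report, or None.

    frames is a list of (function, module) pairs; module is None for core
    kernel symbols. Frames marked with "?" are skipped.
    """
    task = None
    frames = []
    in_trace = False
    for line in lines:
        if task is None:
            # Still looking for the "blocked for more than N seconds" line.
            m = HUNG_RE.search(line)
            if m:
                task = (m.group("name"), int(m.group("pid")), int(m.group("secs")))
            continue
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            m = FRAME_RE.search(line)
            if not m:
                break  # first non-frame line ends the trace
            if m.group("unreliable") is None:
                frames.append((m.group("func"), m.group("module")))
    if task is None:
        return None
    return task + (frames,)


SAMPLE = """\
19:02:07:INFO: task mdt00_002:9743 blocked for more than 120 seconds.
19:02:07:Call Trace:
19:02:07: [<ffffffffa05eee06>] ? htable_lookup+0x1a6/0x1c0 [obdclass]
19:02:07: [<ffffffffa04796fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
19:02:07: [<ffffffffa05ef413>] lu_object_find_at+0xb3/0x360 [obdclass]
19:02:07: [<ffffffff8100c0ca>] child_rip+0xa/0x20
""".splitlines()

if __name__ == "__main__":
    name, pid, secs, frames = parse_hung_task(SAMPLE)
    print(name, pid, secs)
    for func, module in frames:
        print(f"  {func} [{module or 'kernel'}]")
```

Applied to the trace in this ticket, the first reliable frames are cfs_waitq_wait and lu_object_find_at, which is where the mdt00_002 thread is waiting.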


 Comments   
Comment by Peter Jones [ 27/Aug/13 ]

Emoly

Could you please look at this one?

Thanks

Peter

Comment by Emoly Liu [ 20/Nov/13 ]

We haven't seen this error for several months, so I am closing this ticket. We can reopen it if the error occurs again.
