[LU-4696] Test timeout on sanity test_51ba: nlink before: 70002, created before: 70000 Created: 03/Mar/14  Updated: 09/Jan/20  Resolved: 09/Jan/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: zfs

Issue Links:
Duplicate
is duplicated by LU-4697 Test timeout on sanity test_51ba: nli... Closed
Related
is related to LU-2600 lustre metadata performance is very s... Resolved
Severity: 3
Rank (Obsolete): 12908

Description

This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/27fddb2c-a070-11e3-9f3a-52540035b04c.

The sub-test test_51ba failed with the following error:

test failed to respond and timed out

Info required for matching: sanity 51ba
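
For context on the numbers in the summary: a directory's link count is one for its own "." entry, one for its name in its parent, plus one for each subdirectory's "..", so 70000 created subdirectories give the reported nlink of 70002. Below is a minimal userspace sketch of that relationship, for illustration only; the scratch path and default count are assumptions, not taken from sanity.sh test_51ba.

/*
 * Hypothetical illustration only (not the sanity.sh test_51ba code):
 * create N subdirectories under a scratch directory and verify that the
 * parent's st_nlink grows to N + 2 -- one link for the parent's "." entry,
 * one for its name in its own parent, and one per subdirectory "..".
 */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
        const char *parent = argc > 1 ? argv[1] : "/tmp/d51ba"; /* assumed scratch path */
        long count = argc > 2 ? atol(argv[2]) : 1000;           /* 70000 in the failed run */
        char path[4096];
        struct stat st;
        long i;

        if (mkdir(parent, 0755) != 0 && errno != EEXIST) {
                perror("mkdir parent");
                return 1;
        }
        for (i = 0; i < count; i++) {
                snprintf(path, sizeof(path), "%s/d%ld", parent, i);
                if (mkdir(path, 0755) != 0) {
                        perror("mkdir child");
                        return 1;
                }
        }
        if (stat(parent, &st) != 0) {
                perror("stat");
                return 1;
        }
        /* Expect nlink == count + 2, matching "nlink before: 70002, created before: 70000". */
        printf("nlink: %lu, created: %ld\n", (unsigned long)st.st_nlink, count);
        return (long)st.st_nlink == count + 2 ? 0 : 1;
}

The unlinkmany process in the traces below is the cleanup side of this: it removes those subdirectories one rmdir at a time, which is where the run stalls.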



Comments
Comment by Andreas Dilger [ 12/Mar/14 ]

It looks like the client is doing an unlink:

16:39:05:unlinkmany
16:39:05:Call Trace:
16:39:05: [<ffffffff81528a42>] schedule_timeout+0x192/0x2e0
16:39:05: [<ffffffffa0777a2a>] ptlrpc_set_wait+0x2da/0x860 [ptlrpc]
16:39:05: [<ffffffffa0778037>] ptlrpc_queue_wait+0x87/0x220 [ptlrpc]
16:39:05: [<ffffffffa09c4bf5>] mdc_reint+0x75/0x3b0 [mdc]
16:39:05: [<ffffffffa09c5c30>] mdc_unlink+0x1b0/0x500 [mdc]
16:39:05: [<ffffffffa09783ca>] lmv_unlink+0x31a/0x8b0 [lmv]
16:39:05: [<ffffffffa0b7b6db>] ll_rmdir+0x15b/0x5d0 [lustre]
16:39:05: [<ffffffff81197ab0>] vfs_rmdir+0xc0/0xf0
16:39:05: [<ffffffff8119a9b4>] do_rmdir+0x184/0x1f0
16:39:05: [<ffffffff8119aa76>] sys_rmdir+0x16/0x20

and the server is doing an unlink:

16:39:06:mdt00_004
16:39:06:Call Trace:
16:39:06: [<ffffffff81528823>] io_schedule+0x73/0xc0
16:39:06: [<ffffffffa0142e7c>] cv_wait_common+0x8c/0x100 [spl]
16:39:06: [<ffffffffa0142f08>] __cv_wait_io+0x18/0x20 [spl]
16:39:06: [<ffffffffa02864ab>] zio_wait+0xfb/0x1b0 [zfs]
16:39:06: [<ffffffffa01f2fdd>] dbuf_read+0x3fd/0x750 [zfs]
16:39:06: [<ffffffffa01f34b9>] __dbuf_hold_impl+0x189/0x480 [zfs]
16:39:06: [<ffffffffa01f382f>] dbuf_hold_impl+0x7f/0xb0 [zfs]
16:39:06: [<ffffffffa01f48e0>] dbuf_hold+0x20/0x30 [zfs]
16:39:06: [<ffffffffa01fa6e7>] dmu_buf_hold+0x97/0x1d0 [zfs]
16:39:06: [<ffffffffa0255e17>] zap_lockdir+0x57/0x730 [zfs]
16:39:06: [<ffffffffa02579a4>] zap_cursor_retrieve+0x1e4/0x2f0 [zfs]
16:39:06: [<ffffffffa0e91278>] osd_index_retrieve_skip_dots+0x28/0x60 [osd_zfs]
16:39:06: [<ffffffffa0e91888>] osd_dir_it_next+0x98/0x120 [osd_zfs]
16:39:06: [<ffffffffa1081431>] lod_it_next+0x21/0x90 [lod]
16:39:06: [<ffffffffa10d5fe9>] mdd_may_delete+0x519/0x9f0 [mdd]
16:39:06: [<ffffffffa10d6505>] mdd_unlink_sanity_check+0x45/0x100 [mdd]
16:39:06: [<ffffffffa10dc453>] mdd_unlink+0x233/0xd00 [mdd]
16:39:06: [<ffffffffa0fbb4e8>] mdo_unlink+0x18/0x50 [mdt]
16:39:06: [<ffffffffa0fc1470>] mdt_reint_unlink+0xa10/0x1170 [mdt]
16:39:06: [<ffffffffa0fbb1e1>] mdt_reint_rec+0x41/0xe0 [mdt]
16:39:06: [<ffffffffa0fa0e13>] mdt_reint_internal+0x4c3/0x780 [mdt]
16:39:06: [<ffffffffa0fa165b>] mdt_reint+0x6b/0x120 [mdt]
16:39:06: [<ffffffffa09ef43c>] tgt_request_handle+0x23c/0xac0 [ptlrpc]
16:39:06: [<ffffffffa099e6ea>] ptlrpc_main+0xd1a/0x1980 [ptlrpc]

It isn't clear if this is ZFS just being slow, or if these threads are hung.
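
For reference, the server thread is inside mdd_unlink_sanity_check()/mdd_may_delete(), which iterates the directory being removed (osd_dir_it_next -> zap_cursor_retrieve) to confirm it is empty, and each cold ZAP block that iteration touches becomes a synchronous disk read (dbuf_read -> zio_wait). The following is a rough userspace analogue of that emptiness check, for illustration only; it uses plain readdir and is not the Lustre/ZFS code path.

/*
 * Rough userspace analogue of the emptiness check implied by the server
 * trace (mdd_may_delete iterating entries and skipping "." and ".." via
 * osd_index_retrieve_skip_dots); illustration only, not Lustre code.
 */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if the directory holds no entries other than "." and "..",
 * 0 if it holds at least one, -1 on error. Even a single readdir() call
 * may force directory blocks to be read from disk, which is where the
 * ZFS thread above is waiting (zio_wait under dbuf_read). */
static int dir_is_empty(const char *path)
{
        DIR *dir = opendir(path);
        struct dirent *de;

        if (!dir)
                return -1;
        while ((de = readdir(dir)) != NULL) {
                if (strcmp(de->d_name, ".") && strcmp(de->d_name, "..")) {
                        closedir(dir);
                        return 0;       /* found a real entry: not empty */
                }
        }
        closedir(dir);
        return 1;
}

int main(int argc, char **argv)
{
        const char *path = argc > 1 ? argv[1] : ".";
        int rc = dir_is_empty(path);

        if (rc < 0) {
                perror("opendir");
                return 1;
        }
        printf("%s is %sempty\n", path, rc ? "" : "not ");
        return 0;
}

The check only needs to find one real entry to answer "not empty", but even that first entry can require a disk read on a cold cache, so 70000 back-to-back rmdir RPCs can plausibly push the test past its timeout if the pool is slow to serve those reads.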

Comment by Andreas Dilger [ 12/Mar/14 ]

This may just be caused by ZFS slowness.

Comment by Andreas Dilger [ 09/Jan/20 ]

Close old bug
