[LU-4427] performance-sanity test_6: mdsrate hung (sys_mknod) Created: 03/Jan/14  Updated: 21/Nov/16  Resolved: 21/Nov/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.1, Lustre 2.4.3, Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Jian Yu Assignee: Nathaniel Clark
Resolution: Fixed Votes: 0
Labels: zfs
Environment:

Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/5/
FSTYPE=zfs


Issue Links:
Duplicate
is duplicated by LU-4408 parallel-scale test statahead hung Closed
Related
is related to LU-3786 performance-sanity test_6: mkdir hung... Resolved
Severity: 3
Rank (Obsolete): 12167

 Description   

performance-sanity test_6 hung as follows on the client:

13:08:35:Lustre: DEBUG MARKER: ===== mdsrate-lookup-10dirs.sh Test preparation: creating 10 dirs with 26778 files.
13:08:35:INFO: task mdsrate:19365 blocked for more than 120 seconds.
13:08:35:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
13:08:35:mdsrate       D 0000000000000001     0 19365  19361 0x00000080
13:08:35: ffff88005eba1ac8 0000000000000082 0000000000000000 ffff88006e406800
13:08:35: ffff88005eba1a88 ffffffffa0681629 ffff88007a0aee00 0000000000000003
13:08:35: ffff88006dfd2638 ffff88005eba1fd8 000000000000fb88 ffff88006dfd2638
13:08:35:Call Trace:
13:08:35: [<ffffffffa0681629>] ? __ptlrpc_request_bufs_pack+0x349/0x3c0 [ptlrpc]
13:08:35: [<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
13:08:35: [<ffffffff8150f62b>] mutex_lock+0x2b/0x50
13:08:35: [<ffffffffa084a8dc>] mdc_reint+0x3c/0x3b0 [mdc]
13:08:35: [<ffffffffa084beae>] mdc_create+0x20e/0x780 [mdc]
13:08:35: [<ffffffffa0ae6196>] lmv_create+0x316/0x700 [lmv]
13:08:35: [<ffffffffa09dbf46>] ll_new_node+0x176/0x640 [lustre]
13:08:35: [<ffffffffa09dc578>] ll_mknod_generic+0x168/0x280 [lustre]
13:08:35: [<ffffffff8118e0e3>] ? generic_permission+0x23/0xb0
13:08:35: [<ffffffffa09e2ef1>] ll_create_nd+0x971/0xe80 [lustre]
13:08:35: [<ffffffff8118e4c2>] ? __lookup_hash+0x102/0x160
13:08:35: [<ffffffff8118fbd4>] vfs_create+0xb4/0xe0
13:08:35: [<ffffffff81192800>] sys_mknodat+0x280/0x2a0
13:08:35: [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
13:08:35: [<ffffffff8119283a>] sys_mknod+0x1a/0x20
13:08:35: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

Maloo report: https://maloo.whamcloud.com/test_sets/fc281616-73c4-11e3-b4ff-52540035b04c
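
For context, the call chain above is an ordinary mknod(2) create loop on the Lustre client: each create enters sys_mknod -> vfs_create -> ll_create_nd -> ll_mknod_generic -> mdc_create -> mdc_reint, and in the hang the task blocks in mutex_lock() inside mdc_reint. A minimal C sketch of that client-side pattern (the mount point, directory name, and file count are illustrative, not mdsrate's actual parameters):

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(void)
{
        char path[128];
        int i;

        /* Each mknod() enters the kernel path shown in the trace:
         * sys_mknod -> vfs_create -> ll_create_nd -> ll_mknod_generic
         * -> mdc_create -> mdc_reint; in the hang, mdc_reint blocks in
         * mutex_lock() and never returns.
         * /mnt/lustre/dir0 is an illustrative path; the directory is
         * assumed to already exist. */
        for (i = 0; i < 26778; i++) {
                snprintf(path, sizeof(path), "/mnt/lustre/dir0/f%d", i);
                if (mknod(path, S_IFREG | 0644, 0) != 0) {
                        perror(path);
                        return 1;
                }
        }
        return 0;
}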



 Comments   
Comment by Jian Yu [ 03/Jan/14 ]

By searching on Maloo, I found that performance-sanity passed on Lustre b2_5 build #3 (2.5.0) with FSTYPE=zfs:
https://maloo.whamcloud.com/test_sets/dc6ee504-37f4-11e3-8bc4-52540035b04c

Comment by Jian Yu [ 06/Jan/14 ]

A second run of performance-sanity test 6 with FSTYPE=zfs hit LU-3786.

Comment by Jian Yu [ 06/Jan/14 ]

Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/5/
FSTYPE=zfs

parallel-scale test statahead hit the same failure:
https://maloo.whamcloud.com/test_sets/74e5fee4-76b6-11e3-8c14-52540035b04c

Comment by Jian Yu [ 09/Mar/14 ]

Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/39/ (2.5.1 RC1)
Distro/Arch: RHEL6.5/x86_64
FSTYPE=zfs

The same failure occurred:
https://maloo.whamcloud.com/test_sets/7a0a591c-a607-11e3-8a1b-52540035b04c

Comment by Jian Yu [ 17/Mar/14 ]

Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/73/ (2.4.3 RC1)
Distro/Arch: RHEL6.4/x86_64
FSTYPE=zfs

https://maloo.whamcloud.com/test_sets/e70e3c7c-ac60-11e3-81d7-52540035b04c

Comment by Jian Yu [ 24/Apr/14 ]

Lustre build: http://build.whamcloud.com/job/lustre-b2_5/47/
FSTYPE=zfs

The same failure occurred:
https://maloo.whamcloud.com/test_sets/99e3e576-cb70-11e3-95c9-52540035b04c

Comment by Peter Jones [ 24/Apr/14 ]

Nathaniel

Could you please look into this one?

Thanks

Peter

Comment by Jian Yu [ 05/Jun/14 ]

Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/61/
FSTYPE=zfs

The same failure occurred: https://maloo.whamcloud.com/test_sets/50a6f7f0-ebe5-11e3-82b2-52540035b04c

Comment by Nathaniel Clark [ 12/Jun/14 ]

performance-sanity/6 hangs seem to fall into two categories:
1) performance-sanity/5 fails (LU-4428 or LU-5146), then mdsrate hangs (this bug)
2) test_5 passes and mkdir hangs (LU-3786)

Comment by Jian Yu [ 19/Jan/15 ]

FYI, by searching on Maloo, I found that the failure did not occur on the master branch. I also tried to reproduce it on master build #2820 but could not hit the issue.

Here are the test reports on master build #2820:

performance-sanity:
https://testing.hpdd.intel.com/test_sets/b60591f8-9d6d-11e4-9d48-5254006e85c2 (ldiskfs)
https://testing.hpdd.intel.com/test_sets/b915d632-9d6d-11e4-9d48-5254006e85c2 (ldiskfs)
https://testing.hpdd.intel.com/test_sets/34af27f2-9fe5-11e4-a43c-5254006e85c2 (zfs)
https://testing.hpdd.intel.com/test_sets/98bf8674-9fe5-11e4-a43c-5254006e85c2 (zfs)

parallel-scale:
https://testing.hpdd.intel.com/test_sets/bc82d59a-9d6d-11e4-9d48-5254006e85c2 (ldiskfs)
https://testing.hpdd.intel.com/test_sets/c5ccc930-9d6d-11e4-9d48-5254006e85c2 (ldiskfs)
https://testing.hpdd.intel.com/test_sets/7af16a0c-9fc4-11e4-a450-5254006e85c2 (zfs)
https://testing.hpdd.intel.com/test_sets/a98eeeb6-9fc4-11e4-a450-5254006e85c2 (zfs)
https://testing.hpdd.intel.com/test_sets/bffa1554-9fc4-11e4-a450-5254006e85c2 (zfs)
https://testing.hpdd.intel.com/test_sets/f702fb8e-9ef6-11e4-a23e-5254006e85c2 (zfs)
https://testing.hpdd.intel.com/test_sets/fd7dc124-9ef6-11e4-a23e-5254006e85c2 (zfs)

Comment by Sarah Liu [ 02/Jul/15 ]

Another instance:
https://testing.hpdd.intel.com/test_sets/5ccde712-1251-11e5-bec9-5254006e85c2

Comment by James A Simmons [ 27/Aug/15 ]

Does this still happen?

Comment by Jian Yu [ 01/Sep/15 ]

Does this still happen?

By searching on Maloo, I found that the latest performance-sanity test 6 timeout failures were all LU-3786.

Comment by Alex Zhuravlev [ 23/May/16 ]

I can't find any problem in the report above. Can it be that ZFS is just making progress too slowly?
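
One way to tell slow progress from a true hang is to time each create from the client: a slow ZFS backend shows a low but nonzero rate, while a hang shows a single call that never returns. A hypothetical probe along those lines (the path and iteration count are illustrative):

#include <stdio.h>
#include <time.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(void)
{
        struct timespec t0, t1;
        char path[128];
        int i;

        /* /mnt/lustre/probe is an illustrative path; the directory is
         * assumed to already exist on the Lustre mount. */
        for (i = 0; i < 1000; i++) {
                snprintf(path, sizeof(path), "/mnt/lustre/probe/f%d", i);
                clock_gettime(CLOCK_MONOTONIC, &t0);
                if (mknod(path, S_IFREG | 0644, 0) != 0) {
                        perror(path);
                        return 1;
                }
                clock_gettime(CLOCK_MONOTONIC, &t1);
                /* Per-create latency; in a hang, the last printed index
                 * identifies the create that never completed. */
                printf("create %d took %.3f ms\n", i,
                       (t1.tv_sec - t0.tv_sec) * 1e3 +
                       (t1.tv_nsec - t0.tv_nsec) / 1e6);
        }
        return 0;
}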

Comment by Nathaniel Clark [ 20/Jul/16 ]

The most recent failure of this type is from 2015-03-24 (all other perf-sanity/6 timeouts are LU-3786):
https://testing.hpdd.intel.com/test_sets/d817106c-d325-11e4-a357-5254006e85c2

Generated at Sat Feb 10 01:42:38 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.