[LU-4968] Test failure sanity test_132: umount /mnt/ost2 Created: 28/Apr/14  Updated: 02/Jul/14  Resolved: 12/May/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Nathaniel Clark
Resolution: Fixed Votes: 0
Labels: revzfs, zfs

Issue Links:
Related
is related to LU-2600 lustre metadata performance is very s... Resolved
is related to LU-5242 Test hang sanity test_132, test_133: ... Resolved
is related to LU-5277 sanity test_132: mdt_build_target_lis... Resolved
Severity: 3
Rank (Obsolete): 13757

 Description   

This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/4b88f43c-cdf2-11e3-a23e-52540035b04c.

The sub-test test_132 failed with the following error:

test failed to respond and timed out

Info required for matching: sanity 132

The umount on the OST is hung in zfs:

umount        D 0000000000000001     0 30525  30524 0x00000080
 ffff88007b7af8b8 0000000000000082 0000000000000000 ffff88007b7af87c
 0000000000000282 0000000000000286 ffff88007b7af858 ffffffff8108410c
 ffff88006e0c05f8 ffff88007b7affd8 000000000000fbc8 ffff88006e0c05f8
Call Trace:
 [<ffffffff8108410c>] ? lock_timer_base+0x3c/0x70
 [<ffffffff815291c2>] schedule_timeout+0x192/0x2e0
 [<ffffffff81084220>] ? process_timeout+0x0/0x10
 [<ffffffff8152932e>] schedule_timeout_uninterruptible+0x1e/0x20
 [<ffffffffa020adea>] dnode_special_close+0x2a/0x60 [zfs]
 [<ffffffffa01ff652>] dmu_objset_evict+0x92/0x400 [zfs]
 [<ffffffffa0210c50>] dsl_dataset_evict+0x30/0x1b0 [zfs]
 [<ffffffffa01f0dd9>] dbuf_evict_user+0x49/0x80 [zfs]
 [<ffffffffa01f2087>] dbuf_rele_and_unlock+0xf7/0x1e0 [zfs]
 [<ffffffffa01f24e0>] dmu_buf_rele+0x30/0x40 [zfs]
 [<ffffffffa0216170>] dsl_dataset_disown+0xb0/0x1d0 [zfs]
 [<ffffffffa01fe751>] dmu_objset_disown+0x11/0x20 [zfs]
 [<ffffffffa0e8290e>] udmu_objset_close+0x2e/0x40 [osd_zfs]
 [<ffffffffa0e80f86>] osd_device_fini+0x366/0x5c0 [osd_zfs]
 [<ffffffffa073dba3>] class_cleanup+0x573/0xd30 [obdclass]
 [<ffffffffa0714826>] ? class_name2dev+0x56/0xe0 [obdclass]
 [<ffffffffa073f8ca>] class_process_config+0x156a/0x1ad0 [obdclass]
 [<ffffffffa05ea4e8>] ? libcfs_log_return+0x28/0x40 [libcfs]
 [<ffffffffa0737be2>] ? lustre_cfg_new+0x312/0x6e0 [obdclass]
 [<ffffffffa073ffa9>] class_manual_cleanup+0x179/0x6f0 [obdclass]
 [<ffffffffa05ea4e8>] ? libcfs_log_return+0x28/0x40 [libcfs]
 [<ffffffffa0e8012d>] osd_obd_disconnect+0x1bd/0x1c0 [osd_zfs]
 [<ffffffffa074258b>] lustre_put_lsi+0x1ab/0x11a0 [obdclass]
 [<ffffffffa074ab48>] lustre_common_put_super+0x5d8/0xbe0 [obdclass]
 [<ffffffffa0778c70>] server_put_super+0x180/0xe40 [obdclass]
 [<ffffffff8118b31b>] generic_shutdown_super+0x5b/0xe0
 [<ffffffff8118b406>] kill_anon_super+0x16/0x60
 [<ffffffffa0741e66>] lustre_kill_super+0x36/0x60 [obdclass]
 [<ffffffff8118bba7>] deactivate_super+0x57/0x80
 [<ffffffff811aabdf>] mntput_no_expire+0xbf/0x110
 [<ffffffff811ab72b>] sys_umount+0x7b/0x3a0
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
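For triaging similar hangs across several Maloo runs, a trace like the one above can be reduced to the first reliable frame that belongs to a loadable module. The following is only a minimal sketch, not part of the Lustre test framework: the names `FRAME_RE`, `parse_frames` and `first_reliable_module_frame` are hypothetical, and it assumes frames follow the `[<addr>] func+0xoff/0xlen [module]` layout shown in this ticket, with a leading `?` marking an unreliable frame.

```python
import re

# Matches frames such as:
#   [<ffffffffa020adea>] dnode_special_close+0x2a/0x60 [zfs]
#   [<ffffffff8108410c>] ? lock_timer_base+0x3c/0x70
# The frame layout is an assumption based on the trace in this ticket.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(trace):
    """Return one dict per stack frame found in the trace text."""
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.search(line)
        if m is None:
            continue  # skip register dumps, "Call Trace:" and blank lines
        frames.append({
            "func": m.group("func"),
            # Frames without a [module] suffix belong to the core kernel.
            "module": m.group("module") or "kernel",
            # A leading "?" marks a frame the unwinder could not confirm.
            "reliable": m.group("unreliable") is None,
        })
    return frames


def first_reliable_module_frame(frames):
    """Return the topmost reliable frame inside a loadable module, or None."""
    for frame in frames:
        if frame["reliable"] and frame["module"] != "kernel":
            return frame
    return None


# A few frames copied from the trace in this ticket.
SAMPLE = """\
 [<ffffffff8108410c>] ? lock_timer_base+0x3c/0x70
 [<ffffffff815291c2>] schedule_timeout+0x192/0x2e0
 [<ffffffff8152932e>] schedule_timeout_uninterruptible+0x1e/0x20
 [<ffffffffa020adea>] dnode_special_close+0x2a/0x60 [zfs]
 [<ffffffffa0e8290e>] udmu_objset_close+0x2e/0x40 [osd_zfs]
"""

frames = parse_frames(SAMPLE)
top = first_reliable_module_frame(frames)
print(top["module"], top["func"])
```

For the sample frames this reports `dnode_special_close` in the `zfs` module, which is where the umount thread is sleeping in the trace above.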


 Comments   
Comment by Nathaniel Clark [ 28/Apr/14 ]

Another possible occurrence: lustre-rsync-test/tear-down
https://maloo.whamcloud.com/test_sets/7719844a-cd93-11e3-a3da-52540035b04c

Comment by Andreas Dilger [ 28/Apr/14 ]

At first glance these test failures appear to be unrelated to the patches being tested (two different patches on two different bugs).

Comment by James Nunez (Inactive) [ 28/Apr/14 ]

Here's another case of sanity test 132 timeout: https://maloo.whamcloud.com/test_sets/c9ce6e18-cd24-11e3-a3da-52540035b04c

Comment by Nathaniel Clark [ 05/May/14 ]

Another case: sanity-lfsck/18d, but during MDS umount:
https://maloo.whamcloud.com/test_sets/8b4be268-d393-11e3-8935-52540035b04c

Comment by Nathaniel Clark [ 08/May/14 ]

Affected Tests:
sanity/132
sanity-quota/7a
lustre-rsync-test/- (cleanup/teardown)

Comment by Nathaniel Clark [ 08/May/14 ]

Tracing back through test results, this regression seems to have been introduced around Apr 22, 2014. The failing instance with the earliest parent commit I can find is for "5565877 LU-3953 build: cache the results of configure between runs", but I suspect the regression was introduced by that commit's parent, "72accbe LU-2600 osd-zfs: batched object accounting", which adjusts object accounting in osd-zfs but never seems to have been run through review-zfs.

Comment by Jodi Levi (Inactive) [ 08/May/14 ]

Oleg is reverting change 7157.

Comment by Oleg Drokin [ 08/May/14 ]

I am reverting the LU-2600 patch: http://review.whamcloud.com/#/c/10271/

Comment by Nathaniel Clark [ 09/May/14 ]

Revert not yet merged

Comment by Alex Zhuravlev [ 12/May/14 ]

I'm discussing this with Brian B...

Comment by Nathaniel Clark [ 12/May/14 ]

Patch landed to master
