[LU-5105] Test failure sanity-lfsck test_18d: umount mds hung Created: 27/May/14 Updated: 23/Oct/15 Resolved: 23/Oct/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Nathaniel Clark | Assignee: | nasf (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | zfs |
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 14085 |
| Description |
|
This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>. This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/6c3d597e-e351-11e3-93d9-52540035b04c. The sub-test test_18d failed with the following error:
Info required for matching: sanity-lfsck 18d

MDS syslog:

umount D 0000000000000000 0 19510 19509 0x00000080
 ffff880051eaf8b8 0000000000000082 0000000000000000 ffff880051eaf87c
 0000000000000282 0000000000000282 ffff880051eaf858 ffffffff8108410c
 ffff8800554de5f8 ffff880051eaffd8 000000000000fbc8 ffff8800554de5f8
Call Trace:
 [<ffffffff8108410c>] ? lock_timer_base+0x3c/0x70
 [<ffffffff815291c2>] schedule_timeout+0x192/0x2e0
 [<ffffffff81084220>] ? process_timeout+0x0/0x10
 [<ffffffff8152932e>] schedule_timeout_uninterruptible+0x1e/0x20
 [<ffffffffa123ddea>] dnode_special_close+0x2a/0x60 [zfs]
 [<ffffffffa1232652>] dmu_objset_evict+0x92/0x400 [zfs]
 [<ffffffffa1243c50>] dsl_dataset_evict+0x30/0x1b0 [zfs]
 [<ffffffffa1223dd9>] dbuf_evict_user+0x49/0x80 [zfs]
 [<ffffffffa1225087>] dbuf_rele_and_unlock+0xf7/0x1e0 [zfs]
 [<ffffffffa12254e0>] dmu_buf_rele+0x30/0x40 [zfs]
 [<ffffffffa1249170>] dsl_dataset_disown+0xb0/0x1d0 [zfs]
 [<ffffffffa1231751>] dmu_objset_disown+0x11/0x20 [zfs]
 [<ffffffffa18f690e>] udmu_objset_close+0x2e/0x40 [osd_zfs]
 [<ffffffffa18f4f86>] osd_device_fini+0x366/0x5c0 [osd_zfs]
 [<ffffffffa0d9dd53>] class_cleanup+0x573/0xd30 [obdclass]
 [<ffffffffa0d757a6>] ? class_name2dev+0x56/0xe0 [obdclass]
 [<ffffffffa0d9fa7a>] class_process_config+0x156a/0x1ad0 [obdclass]
 [<ffffffffa0d97d53>] ? lustre_cfg_new+0x2d3/0x6e0 [obdclass]
 [<ffffffffa0da0159>] class_manual_cleanup+0x179/0x6f0 [obdclass]
 [<ffffffffa0d73c7b>] ? class_export_put+0x10b/0x2c0 [obdclass]
 [<ffffffffa18f412d>] osd_obd_disconnect+0x1bd/0x1c0 [osd_zfs]
 [<ffffffffa0da273b>] lustre_put_lsi+0x1ab/0x11a0 [obdclass]
 [<ffffffffa0daacf8>] lustre_common_put_super+0x5d8/0xbe0 [obdclass]
 [<ffffffffa0dd8c70>] server_put_super+0x180/0xe40 [obdclass]
 [<ffffffff8118b31b>] generic_shutdown_super+0x5b/0xe0
 [<ffffffff8118b406>] kill_anon_super+0x16/0x60
 [<ffffffffa0da2016>] lustre_kill_super+0x36/0x60 [obdclass]
 [<ffffffff8118bba7>] deactivate_super+0x57/0x80
 [<ffffffff811aabdf>] mntput_no_expire+0xbf/0x110
 [<ffffffff811ab72b>] sys_umount+0x7b/0x3a0
|
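The umount thread is stuck in dnode_special_close(), which dmu_objset_evict() calls for the objset's metadnode. In ZFS releases of that vintage this function busy-waits until the last hold on the dnode is dropped, and delay() ends up in schedule_timeout_uninterruptible(), which matches the trace above. A paraphrased sketch of that loop (not a verbatim copy of the ZFS source; exact details may differ by version):

```c
/*
 * Paraphrased from the ZFS dnode.c of that era.  The thread spins here
 * until every hold on the special (meta) dnode is released; if a hold is
 * leaked by a caller, umount hangs exactly as in the trace above.
 */
void
dnode_special_close(dnode_t *dn)
{
	/* Wait for the final references to the dnode to clear. */
	while (refcount_count(&dn->dn_holds) > 0)
		delay(1);	/* maps to schedule_timeout_uninterruptible() */
	dnode_destroy(dn);
}
```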
| Comments |
| Comment by nasf (Inactive) [ 09/Oct/14 ] |
|
Another failure instance: |
| Comment by nasf (Inactive) [ 07/Jan/15 ] |
|
For the failures in https://testing.hpdd.intel.com/test_sets/adb9bee6-4b17-11e4-941e-5254006e85c2: |
| Comment by nasf (Inactive) [ 08/Jan/15 ] |
|
Alex, do you have any idea about the MDS umount hang with a ZFS-based backend? |
| Comment by Alex Zhuravlev [ 08/Jan/15 ] |
|
Well, in that specific case it looks like some dnode was still referenced, so the metadnode can't go away, which blocks umount. But this seems to be some old version? We don't have the udmu wrappers anymore. |
| Comment by nasf (Inactive) [ 08/Jan/15 ] |
|
Yes, and there seems to be no way to know which dnode is still referenced. The original issue was hit with the patch http://review.whamcloud.com/#/c/10223/. I am not sure whether it is specific to that patch or not, but since the patch has landed on master, similar trouble could appear on the master branch as well. It is also possible that the problem has since been fixed incidentally by another patch. |
| Comment by Alex Zhuravlev [ 11/Jan/15 ] |
|
a patch to dump referenced dnodes. |
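The patch itself is not reproduced in this export. Purely for illustration, a debug helper in that spirit might walk the objset's dnode list and report any dnode that still has holds at eviction time; the function name osd_debug_dump_held_dnodes is hypothetical, and the fields and helpers used (os_lock, os_dnodes, dn_holds, refcount_count()) follow the ZFS code base of that era and may differ in other versions:

```c
/*
 * Illustrative sketch only -- NOT the attached patch.  Walks the objset's
 * dnode list under os_lock and reports dnodes with outstanding holds, to
 * help identify which object is keeping the metadnode from closing.
 */
static void
osd_debug_dump_held_dnodes(objset_t *os)
{
	dnode_t *dn;

	mutex_enter(&os->os_lock);
	for (dn = list_head(&os->os_dnodes); dn != NULL;
	     dn = list_next(&os->os_dnodes, dn)) {
		int64_t holds = refcount_count(&dn->dn_holds);

		if (holds > 0)
			printk(KERN_WARNING
			       "dnode %llu still has %lld holds\n",
			       (unsigned long long)dn->dn_object,
			       (long long)holds);
	}
	mutex_exit(&os->os_lock);
}
```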
| Comment by nasf (Inactive) [ 23/Oct/15 ] |
|
Closing this since the issue has only been reported on a very old version. |