[LU-5161] replay-single test_42: OBD refcount is 5 Created: 09/Jun/14 Updated: 29/Jul/16 Resolved: 29/Jul/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
||||||||
| Severity: | 3 |
| Rank (Obsolete): | 14231 |
| Description |
|
This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>
This issue relates to the following test suite run:
The sub-test test_42 failed with the following error:
Info required for matching: replay-single 42
OST Console log:
20:09:19:Lustre: DEBUG MARKER: umount -d /mnt/ost1
20:09:19:Lustre: Failing over lustre-OST0000
20:09:19:Lustre: lustre-OST0000: Not available for connect from 10.1.4.231@tcp (stopping)
20:09:19:Lustre: Skipped 2 previous similar messages
20:09:19:Lustre: lustre-OST0000: Not available for connect from 10.1.4.224@tcp (stopping)
20:09:19:Lustre: Skipped 1 previous similar message
20:09:19:Lustre: lustre-OST0000: Not available for connect from 10.1.4.225@tcp (stopping)
20:09:19:Lustre: lustre-OST0000 is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 5. Is it stuck?
20:09:19:Lustre: lustre-OST0000: Not available for connect from 10.1.4.231@tcp (stopping)
20:09:19:Lustre: Skipped 4 previous similar messages
20:09:19:Lustre: lustre-OST0000 is waiting for obd_unlinked_exports more than 16 seconds. The obd refcount = 5. Is it stuck?
20:09:19:Lustre: lustre-OST0000: Not available for connect from 10.1.4.225@tcp (stopping)
20:09:19:Lustre: Skipped 9 previous similar messages
20:09:19:Lustre: lustre-OST0000 is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 5. Is it stuck?
20:09:19:Lustre: lustre-OST0000: Not available for connect from 10.1.4.231@tcp (stopping)
20:09:19:Lustre: Skipped 19 previous similar messages
20:20:16:Lustre: lustre-OST0000 is waiting for obd_unlinked_exports more than 64 seconds. The obd refcount = 5. Is it stuck?
20:20:16:Lustre: lustre-OST0000: Not available for connect from 10.1.4.231@tcp (stopping)
20:20:16:Lustre: Skipped 38 previous similar messages
20:20:16:Lustre: lustre-OST0000 is waiting for obd_unlinked_exports more than 128 seconds. The obd refcount = 5. Is it stuck?
20:20:16:Lustre: lustre-OST0000: Not available for connect from 10.1.4.231@tcp (stopping)
20:20:16:Lustre: Skipped 77 previous similar messages
20:20:16:INFO: task umount:32610 blocked for more than 120 seconds.
20:20:16: Tainted: P W --------------- 2.6.32-431.17.1.el6_lustre.g357ff2e.x86_64 #1
20:20:16:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
20:20:16:umount D 0000000000000001 0 32610 32609 0x00000080
20:20:16: ffff88001f005aa8 0000000000000082 ffff88001f005a08 ffff880074636800
20:20:16: ffffffffa078dbf7 0000000000000000 ffff88001bc2cfd4 ffffffffa078dbf7
20:20:16: ffff880021917ab8 ffff88001f005fd8 000000000000fbc8 ffff880021917ab8
20:20:16:Call Trace:
20:20:16: [<ffffffff81528e82>] schedule_timeout+0x192/0x2e0
20:20:16: [<ffffffff81083e90>] ? process_timeout+0x0/0x10
20:20:16: [<ffffffffa0710a5b>] obd_exports_barrier+0xab/0x180 [obdclass]
20:20:16: [<ffffffffa0f5e46f>] ofd_device_fini+0x5f/0x260 [ofd]
20:20:16: [<ffffffffa0737443>] class_cleanup+0x573/0xd30 [obdclass]
20:20:16: [<ffffffffa0712836>] ? class_name2dev+0x56/0xe0 [obdclass]
20:20:16: [<ffffffffa073916a>] class_process_config+0x156a/0x1ad0 [obdclass]
20:20:16: [<ffffffffa073150b>] ? lustre_cfg_new+0x2cb/0x680 [obdclass]
20:20:16: [<ffffffffa0739849>] class_manual_cleanup+0x179/0x6f0 [obdclass]
20:20:16: [<ffffffffa0712836>] ? class_name2dev+0x56/0xe0 [obdclass]
20:20:16: [<ffffffffa0777039>] server_put_super+0x8f9/0xe50 [obdclass]
20:20:16: [<ffffffff8118af0b>] generic_shutdown_super+0x5b/0xe0
20:20:16: [<ffffffff8118aff6>] kill_anon_super+0x16/0x60
20:20:16: [<ffffffffa073b726>] lustre_kill_super+0x36/0x60 [obdclass]
20:20:16: [<ffffffff8118b797>] deactivate_super+0x57/0x80
20:20:16: [<ffffffff811aa79f>] mntput_no_expire+0xbf/0x110
20:20:16: [<ffffffff811ab2eb>] sys_umount+0x7b/0x3a0
20:20:16: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b |
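For triage, the repeated "waiting for obd_unlinked_exports" warnings in the console log can be pulled out mechanically to see whether the refcount ever moves. The following is only a minimal sketch, not part of Lustre or its test framework: the function names `find_export_waits` and `refcount_is_stuck` are hypothetical, the regular expression assumes the exact message wording shown in the log above, and the "stuck" heuristic (same refcount across two or more warnings) is an assumption made for illustration.

```python
import re

# Assumes the warning wording seen in the console log of this ticket.
WAIT_RE = re.compile(
    r"(?P<target>\S+) is waiting for obd_unlinked_exports "
    r"more than (?P<seconds>\d+) seconds\. "
    r"The obd refcount = (?P<refcount>\d+)\."
)


def find_export_waits(log_text):
    """Return a (target, seconds, refcount) tuple for each warning found."""
    return [
        (m.group("target"), int(m.group("seconds")), int(m.group("refcount")))
        for m in WAIT_RE.finditer(log_text)
    ]


def refcount_is_stuck(waits):
    """Hypothetical heuristic: two or more warnings, all with one refcount."""
    refcounts = {refcount for _, _, refcount in waits}
    return len(waits) >= 2 and len(refcounts) == 1


# Two warnings copied from the log above, used as sample input.
SAMPLE = (
    "20:09:19:Lustre: lustre-OST0000 is waiting for obd_unlinked_exports "
    "more than 8 seconds. The obd refcount = 5. Is it stuck?\n"
    "20:09:19:Lustre: lustre-OST0000 is waiting for obd_unlinked_exports "
    "more than 16 seconds. The obd refcount = 5. Is it stuck?\n"
)

if __name__ == "__main__":
    waits = find_export_waits(SAMPLE)
    for target, seconds, refcount in waits:
        print(target, seconds, refcount)
    print("refcount unchanged:", refcount_is_stuck(waits))
```

Applied to the full log above, this would list the 8, 16, 32, 64 and 128 second warnings, all reporting a refcount of 5, which matches the hung umount seen in the call trace.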
| Comments |
| Comment by James Nunez (Inactive) [ 02/Mar/16 ] |
|
There's another possible instance of this error seen on replay-single test_120. The MDS is the server that cannot unmount. Logs are at https://testing.hpdd.intel.com/test_sets/55181848-dfc5-11e5-9400-5254006e85c2 |
| Comment by Andreas Dilger [ 29/Jul/16 ] |
|
Given the lack of activity on this ticket, and the fact there is a fix for a similar problem on |