[LU-7022] recovery-small test_100: hung on umount Created: 19/Aug/15 Updated: 16/Nov/17 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>.

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/cbf89e7c-4675-11e5-bedf-5254006e85c2.

The sub-test test_100 failed with the following error:

test failed to respond and timed out

syslog from the OST shows "obd refcount = 4. Is it stuck?":

Aug 19 01:10:10 onyx-30vm4 kernel: Lustre: lustre-OST0000: Not available for connect from 10.2.4.97@tcp (stopping)
Aug 19 01:10:10 onyx-30vm4 kernel: Lustre: Skipped 77 previous similar messages
Aug 19 01:13:53 onyx-30vm4 kernel: INFO: task umount:9683 blocked for more than 120 seconds.
Aug 19 01:13:53 onyx-30vm4 kernel: Tainted: P --------------- 2.6.32-504.30.3.el6_lustre.g107be2b.x86_64 #1
Aug 19 01:13:53 onyx-30vm4 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 19 01:13:53 onyx-30vm4 kernel: umount        D 0000000000000001     0  9683   9682 0x00000080
Aug 19 01:13:53 onyx-30vm4 kernel: ffff880043057a78 0000000000000082 0000000000000000 ffff880043057a18
Aug 19 01:13:53 onyx-30vm4 kernel: ffff8800430579d8 ffffffffa21c8983 0000100ed9610762 0000000000000000
Aug 19 01:13:53 onyx-30vm4 kernel: ffff8800657dd044 000000010108d4b9 ffff88006308e5f8 ffff880043057fd8
Aug 19 01:13:53 onyx-30vm4 kernel: Call Trace:
Aug 19 01:13:53 onyx-30vm4 kernel: [<ffffffff8152b222>] schedule_timeout+0x192/0x2e0
Aug 19 01:13:53 onyx-30vm4 kernel: [<ffffffff81087540>] ? process_timeout+0x0/0x10
Aug 19 01:13:53 onyx-30vm4 kernel: [<ffffffffa2157c66>] obd_exports_barrier+0xb6/0x190 [obdclass]
Aug 19 01:13:53 onyx-30vm4 kernel: [<ffffffffa0a4556f>] ofd_device_fini+0x5f/0x250 [ofd]
Aug 19 01:13:53 onyx-30vm4 kernel: [<ffffffffa21747b2>] class_cleanup+0x572/0xd30 [obdclass]
Aug 19 01:13:53 onyx-30vm4 kernel: [<ffffffffa2154726>] ? class_name2dev+0x56/0xe0 [obdclass]
Aug 19 01:13:53 onyx-30vm4 kernel: [<ffffffffa2176e06>] class_process_config+0x1e96/0x2800 [obdclass]
Aug 19 01:13:53 onyx-30vm4 kernel: [<ffffffffa1e11c01>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
Aug 19 01:13:53 onyx-30vm4 kernel: [<ffffffffa2177c2f>] class_manual_cleanup+0x4bf/0x8e0 [obdclass]
Aug 19 01:13:53 onyx-30vm4 kernel: [<ffffffffa2154726>] ? class_name2dev+0x56/0xe0 [obdclass]
Aug 19 01:13:53 onyx-30vm4 kernel: [<ffffffffa21b10b2>] server_put_super+0x9e2/0xeb0 [obdclass]
Aug 19 01:13:53 onyx-30vm4 kernel: [<ffffffff811ac776>] ? invalidate_inodes+0xf6/0x190
Aug 19 01:13:53 onyx-30vm4 kernel: [<ffffffff81190b7b>] generic_shutdown_super+0x5b/0xe0
Aug 19 01:13:53 onyx-30vm4 kernel: [<ffffffff81190c66>] kill_anon_super+0x16/0x60
Aug 19 01:13:53 onyx-30vm4 kernel: [<ffffffffa217aae6>] lustre_kill_super+0x36/0x60 [obdclass]
Aug 19 01:13:53 onyx-30vm4 kernel: [<ffffffff81191407>] deactivate_super+0x57/0x80
Aug 19 01:13:53 onyx-30vm4 kernel: [<ffffffff811b10df>] mntput_no_expire+0xbf/0x110
Aug 19 01:13:53 onyx-30vm4 kernel: [<ffffffff811b1c2b>] sys_umount+0x7b/0x3a0
Aug 19 01:13:53 onyx-30vm4 kernel: [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
Aug 19 01:14:25 onyx-30vm4 kernel: Lustre: lustre-OST0000 is waiting for obd_unlinked_exports more than 256 seconds. The obd refcount = 4. Is it stuck?

Info required for matching: recovery-small 100 |
| Comments |
| Comment by Bob Glossman (Inactive) [ 17/Oct/17 ] |
|
Another occurrence on master. From the OST console log:

[18891.658372] Lustre: lustre-OST0001: Export ffff880067fdd800 already connecting from 10.2.8.140@tcp
[18891.662227] Lustre: Skipped 51 previous similar messages
[18960.200464] INFO: task umount:12854 blocked for more than 120 seconds.
[18960.201372] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[18960.202152] umount          D ffffffffc0ede848     0 12854  12853 0x00000080
[18960.202932] ffff88005e097b20 0000000000000086 ffff880079aacf10 ffff88005e097fd8
[18960.203750] ffff88005e097fd8 ffff88005e097fd8 ffff880079aacf10 ffffffffc0ede840
[18960.204544] ffffffffc0ede844 ffff880079aacf10 00000000ffffffff ffffffffc0ede848
[18960.205363] Call Trace:
[18960.205887] [<ffffffff816aa3e9>] schedule_preempt_disabled+0x29/0x70
[18960.206664] [<ffffffff816a8317>] __mutex_lock_slowpath+0xc7/0x1d0
[18960.207297] [<ffffffff816a772f>] mutex_lock+0x1f/0x2f
[18960.208775] [<ffffffffc0e4a04d>] nm_config_file_deregister_tgt+0x3d/0x1f0 [ptlrpc]
[18960.209658] [<ffffffffc10fc66e>] ofd_device_fini+0xce/0x2d0 [ofd]
[18960.210740] [<ffffffffc0b7f4dc>] class_cleanup+0x86c/0xc40 [obdclass]
[18960.211411] [<ffffffffc0b818b6>] class_process_config+0x1996/0x23e0 [obdclass]
[18960.212279] [<ffffffffc05ddba7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[18960.212951] [<ffffffffc0b824c6>] class_manual_cleanup+0x1c6/0x710 [obdclass]
[18960.213746] [<ffffffffc0bb203e>] server_put_super+0x8de/0xcd0 [obdclass]
[18960.214470] [<ffffffff81203722>] generic_shutdown_super+0x72/0x100
[18960.215086] [<ffffffff81203af2>] kill_anon_super+0x12/0x20
[18960.215648] [<ffffffffc0b84dc2>] lustre_kill_super+0x32/0x50 [obdclass]
[18960.216305] [<ffffffff81203ea9>] deactivate_locked_super+0x49/0x60
[18960.216910] [<ffffffff81204616>] deactivate_super+0x46/0x60
[18960.217475] [<ffffffff8122185f>] cleanup_mnt+0x3f/0x80
[18960.217991] [<ffffffff812218f2>] __cleanup_mnt+0x12/0x20
[18960.218582] [<ffffffff810ad265>] task_work_run+0xc5/0xf0
[18960.219146] [<ffffffff8102ab62>] do_notify_resume+0x92/0xb0
[18960.219722] [<ffffffff816b527d>] int_signal+0x12/0x17
[18994.563233] Lustre: lustre-OST0000: Not available for connect from 10.2.8.134@tcp (stopping)
[18994.564200] Lustre: Skipped 77 previous similar messages |
| Comment by Jinshan Xiong (Inactive) [ 16/Nov/17 ] |
|
Happened again at: https://testing.hpdd.intel.com/sub_tests/77f3d8ac-cad3-11e7-8027-52540065bddc |