[LU-9113] insanity test_0 umount fails for /mnt/lustre-mds1, "Fail all nodes" test can't start

| Created: | 14/Feb/17 |
| Updated: | 12/Apr/17 |
| Resolved: | 07/Apr/17 |
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Casper | Assignee: | WC Triage |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Environment: | onyx-30vm1-3/7/8, Full Group test |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
https://testing.hpdd.intel.com/test_sets/3c80e50a-efe9-11e6-8c0d-5254006e85c2

Client tries multiple times (unsuccessfully) to unmount mds1, but eventually times out. From the MDS console:

02:50:15:[ 4080.084137] INFO: task umount:19374 blocked for more than 120 seconds.
02:50:15:[ 4080.086174] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
02:50:15:[ 4080.088282] umount D ffff8800793b7fc0 0 19374 19373 0x00000080
02:50:15:[ 4080.090408] ffff880056d43bd0 0000000000000086 ffff8800422ebec0 ffff880056d43fd8
02:50:15:[ 4080.092523] ffff880056d43fd8 ffff880056d43fd8 ffff8800422ebec0 ffff8800793b7fb8
02:50:15:[ 4080.094612] ffff8800793b7fbc ffff8800422ebec0 00000000ffffffff ffff8800793b7fc0
02:50:15:[ 4080.096724] Call Trace:
02:50:15:[ 4080.098391] [<ffffffff8168cad9>] schedule_preempt_disabled+0x29/0x70
02:50:15:[ 4080.100447] [<ffffffff8168a735>] __mutex_lock_slowpath+0xc5/0x1c0
02:50:15:[ 4080.102429] [<ffffffff81689b9f>] mutex_lock+0x1f/0x2f
02:50:15:[ 4080.104322] [<ffffffffa0ce6a56>] mgc_process_config+0x7d6/0x1400 [mgc]
02:50:15:[ 4080.106336] [<ffffffff810bc064>] ? __wake_up+0x44/0x50
02:50:15:[ 4080.108272] [<ffffffffa0b37225>] obd_process_config.constprop.14+0x85/0x2d0 [obdclass]
02:50:15:[ 4080.110413] [<ffffffffa0b375f0>] ? lustre_cfg_new+0x180/0x400 [obdclass]
02:50:15:[ 4080.112481] [<ffffffffa0b39440>] lustre_end_log+0xf0/0x5c0 [obdclass]
02:50:15:[ 4080.114533] [<ffffffffa0b61d2e>] server_put_super+0x7de/0xcd0 [obdclass]
02:50:15:[ 4080.116595] [<ffffffff81200802>] generic_shutdown_super+0x72/0xf0
02:50:15:[ 4080.118594] [<ffffffff81200bd2>] kill_anon_super+0x12/0x20
02:50:15:[ 4080.120545] [<ffffffffa0b36db2>] lustre_kill_super+0x32/0x50 [obdclass]
02:50:15:[ 4080.122589] [<ffffffff81200f89>] deactivate_locked_super+0x49/0x60
02:50:15:[ 4080.124609] [<ffffffff81201586>] deactivate_super+0x46/0x60
02:50:15:[ 4080.126559] [<ffffffff8121e9c5>] mntput_no_expire+0xc5/0x120
02:50:15:[ 4080.128491] [<ffffffff8121fb00>] SyS_umount+0xa0/0x3b0
02:50:15:[ 4080.130375] [<ffffffff81696949>] system_call_fastpath+0x16/0x1b
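For anyone hitting this, a minimal sketch of capturing the wedged-umount state on the MDS, assuming the standard /proc interfaces on the 3.10.x kernel shown in the trace; this is generic hung-task diagnostics, not a procedure from this ticket:

```bash
#!/bin/bash
# Capture the kernel-side state of a stuck umount on the MDS.

# Find the blocked umount task (19374 in the console log above).
pid=$(pgrep -x umount | head -n1)

if [ -n "$pid" ]; then
    # Kernel stack of the blocked task; for this hang it should show the
    # mutex_lock -> mgc_process_config -> lustre_end_log path seen above.
    cat "/proc/$pid/stack"
fi

# Dump backtraces of all D-state (uninterruptible) tasks to the console,
# the same output the hung-task watchdog printed after 120 seconds.
echo w > /proc/sysrq-trigger
dmesg | tail -n 200
```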
| Comments |
| Comment by James Casper [ 03/Apr/17 ] |
Just saw this with a patch test that was trying to run replay-dual five times in a row: https://testing.hpdd.intel.com/test_sessions/a32a6368-1702-4ce5-a99b-a7375a0aea8b

replay-dual passes consistently when it follows lustre initialization (with test 21b excepted), but not when it follows a passing replay-dual test set.
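A rough sketch of that reproducer, assuming the installed lustre/tests framework; the test directory path is illustrative, EXCEPT is the framework's usual subtest skip list, and the loop count mirrors the five-run session linked above:

```bash
#!/bin/bash
# Run replay-dual back to back and stop at the first failing pass.
cd /usr/lib64/lustre/tests || exit 1

for i in 1 2 3 4 5; do
    echo "=== replay-dual pass $i ==="
    # EXCEPT skips the listed subtests; 21b matches the exclusion noted above.
    EXCEPT="21b" bash replay-dual.sh || { echo "failed on pass $i"; exit 1; }
done
```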
| Comment by James Casper [ 07/Apr/17 ] |
Looked at the traces below the one pasted above. They contain the following:

mgs_ir_fini_fs+0x27e/0x2ec [mgs]

Closing this ticket as a dupe of
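A quick way to check the other node logs for that frame, assuming the autotest console logs have been pulled down locally (the file glob is illustrative):

```bash
#!/bin/bash
# List which console logs contain the distinguishing MGS frame,
# then print each occurrence with surrounding trace lines.
grep -l 'mgs_ir_fini_fs' console-*.log
grep -B 5 -A 5 'mgs_ir_fini_fs' console-*.log
```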
| Comment by James Casper [ 07/Apr/17 ] |
Dupe of