[LU-6705] MDT hung at umount under DNE mode Created: 10/Jun/15 Updated: 14/Jul/15 Resolved: 14/Jul/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | nasf (Inactive) | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
It can be reproduced via the following steps: The MDT then hangs at the last umount. |
| Comments |
| Comment by Di Wang [ 10/Jun/15 ] |
|
Just discussed with Fan Yong. It seems the umount thread is blocked because it cannot stop the recovery update thread. The recovery update thread is trying to retrieve the update records from a remote MDT, but that MDT has already been shut down.

update recovery thread:
[<ffffffffa0526c18>] ? ptlrpc_set_wait+0x188/0x900 [ptlrpc]
[<ffffffffa051c5b0>] ? ptlrpc_interrupted_set+0x0/0x110 [ptlrpc]
[<ffffffffa051da74>] ? ptlrpc_request_pack+0x24/0x70 [ptlrpc]
[<ffffffffa0527411>] ? ptlrpc_queue_wait+0x81/0x220 [ptlrpc]
[<ffffffffa073655b>] ? fld_client_rpc+0x15b/0x510 [fld]
[<ffffffffa073c42e>] ? fld_server_lookup+0x14e/0x330 [fld]
[<ffffffffa0c804ff>] ? lod_fld_lookup+0x34f/0x520 [lod]
[<ffffffff8116fef2>] ? kmem_cache_alloc+0x182/0x190
[<ffffffffa0c958e3>] ? lod_object_init+0x103/0x3c0 [lod]
[<ffffffffa1294298>] ? lu_object_alloc+0xd8/0x320 [obdclass]
[<ffffffffa12957a1>] ? lu_object_find_try+0x151/0x260 [obdclass]
[<ffffffffa1295961>] ? lu_object_find_at+0xb1/0xe0 [obdclass]
[<ffffffffa0d06fc2>] ? dt_update_request_destroy+0x1c2/0x270 [osp]
[<ffffffffa1296ebc>] ? dt_locate_at+0x1c/0xa0 [obdclass]
[<ffffffffa125346f>] ? llog_osd_open+0xdf/0xde0 [obdclass]
[<ffffffffa124a375>] ? llog_open+0x145/0x470 [obdclass]
[<ffffffffa0caa82e>] ? lod_sub_prep_llog+0x19e/0x7a0 [lod]
[<ffffffffa0c8088e>] ? lod_sub_recovery_thread+0x1be/0x980 [lod]
[<ffffffff81061c62>] ? default_wake_function+0x12/0x20
[<ffffffffa0c806d0>] ? lod_sub_recovery_thread+0x0/0x980 [lod]
[<ffffffff8109ab46>] ? kthread+0x96/0xa0
[<ffffffff8100c20a>] ? child_rip+0xa/0x20
[<ffffffff8109aab0>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ? child_rip+0x0/0x20

So we need to stop the recovery thread during umount:

static void mdt_fini()
{
.....
target_recovery_fini(obd);
ping_evictor_stop();
mdt_stack_pre_fini(env, m, md2lu_dev(m->mdt_child));
......
Simply moving mdt_stack_pre_fini() before target_recovery_fini() should make the hang go away, but that would cause other problems. |
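
The hang described above follows a general pattern: a shutdown path tries to join a worker thread that is waiting on a reply which will never arrive. The sketch below is a user-space analogy using pthreads, not Lustre code and not the change made in the patch. All names (`struct recovery_thread`, `recovery_main`, `recovery_start`, `recovery_stop`) are hypothetical. It assumes the wait can be made to observe a stop request, so the stop path wakes the waiter before joining it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the recovery thread state; not a Lustre type. */
struct recovery_thread {
    pthread_t       tid;
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            reply_arrived;   /* never set here: the remote peer is down */
    bool            stop_requested;  /* set by the shutdown path */
    bool            exited_on_stop;  /* records why the worker left its wait */
};

/* Worker: waits for a remote reply, but also wakes up on a stop request. */
void *recovery_main(void *arg)
{
    struct recovery_thread *rt = arg;

    pthread_mutex_lock(&rt->lock);
    /* Without the stop_requested check this loop would never end,
     * and the join in recovery_stop() would block forever. */
    while (!rt->reply_arrived && !rt->stop_requested)
        pthread_cond_wait(&rt->cond, &rt->lock);
    rt->exited_on_stop = rt->stop_requested && !rt->reply_arrived;
    pthread_mutex_unlock(&rt->lock);
    return NULL;
}

/* Initialise the state and start the worker; returns 0 on success. */
int recovery_start(struct recovery_thread *rt)
{
    rt->reply_arrived = false;
    rt->stop_requested = false;
    rt->exited_on_stop = false;
    pthread_mutex_init(&rt->lock, NULL);
    pthread_cond_init(&rt->cond, NULL);
    return pthread_create(&rt->tid, NULL, recovery_main, rt);
}

/* Shutdown path: request the stop, wake the waiter, then join it. */
void recovery_stop(struct recovery_thread *rt)
{
    int rc;

    pthread_mutex_lock(&rt->lock);
    rt->stop_requested = true;
    pthread_cond_broadcast(&rt->cond);
    pthread_mutex_unlock(&rt->lock);

    rc = pthread_join(rt->tid, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_cond_destroy(&rt->cond);
    pthread_mutex_destroy(&rt->lock);
}
```

Calling `recovery_start()` followed by `recovery_stop()` returns even though no reply ever arrives, because the wait condition includes the stop flag. In the ticket, the equivalent wait sits inside an RPC to the remote MDT, so the real fix has to deal with that RPC rather than a condition variable.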
| Comment by Gerrit Updater [ 11/Jun/15 ] |
|
wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/15210 |
| Comment by Gerrit Updater [ 08/Jul/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15210/ |