[LU-2665] LBUG while unmounting client Created: 22/Jan/13 Updated: 22/Nov/13 Resolved: 18/Sep/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0, Lustre 2.1.3 |
| Fix Version/s: | Lustre 2.5.0, Lustre 2.4.2 |
| Type: | Bug | Priority: | Major |
| Reporter: | Sebastien Buisson (Inactive) | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | mn1, ptr | ||
| Attachments: |
|
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 6217 |
| Description |
|
Hi,
When trying to unmount a Lustre client, we got the following problem:

Lustre: DEBUG MARKER: Wed Nov 21 06:25:01 2012
LustreError: 11559:0:(ldlm_lock.c:1697:ldlm_lock_cancel()) ### lock still has references ns: ptmp-MDT0000-mdc-ffff88030871bc00 lock: ffff88060dbd2d80/0x4618f3ec8d79d8be lrc: 4/0,1 mode: PW/PW res: 8590405073/266 rrc: 2 type: FLK pid: 4414 [0->551] flags: 0x22002890 remote: 0xc8980c051f8f6afd expref: -99 pid: 4414 timeout: 0
LustreError: 11559:0:(ldlm_lock.c:1698:ldlm_lock_cancel()) LBUG
Pid: 11559, comm: umount
Call Trace:
[<ffffffffa040d7f5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa040de07>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa063343d>] ldlm_lock_cancel+0x1ad/0x1b0 [ptlrpc]
[<ffffffffa064d245>] ldlm_cli_cancel_local+0xb5/0x380 [ptlrpc]
[<ffffffffa06510b8>] ldlm_cli_cancel+0x58/0x3b0 [ptlrpc]
[<ffffffffa063ae18>] cleanup_resource+0x168/0x300 [ptlrpc]
[<ffffffffa063afda>] ldlm_resource_clean+0x2a/0x50 [ptlrpc]
[<ffffffffa041e28f>] cfs_hash_for_each_relax+0x17f/0x380 [libcfs]
[<ffffffffa063afb0>] ? ldlm_resource_clean+0x0/0x50 [ptlrpc]
[<ffffffffa063afb0>] ? ldlm_resource_clean+0x0/0x50 [ptlrpc]
[<ffffffffa041fcaf>] cfs_hash_for_each_nolock+0x7f/0x1c0 [libcfs]
[<ffffffffa0637a69>] ldlm_namespace_cleanup+0x29/0xb0 [ptlrpc]
[<ffffffffa0638adb>] __ldlm_namespace_free+0x4b/0x540 [ptlrpc]
[<ffffffffa06502d0>] ? ldlm_cli_hash_cancel_unused+0x0/0xa0 [ptlrpc]
[<ffffffffa06502d0>] ? ldlm_cli_hash_cancel_unused+0x0/0xa0 [ptlrpc]
[<ffffffffa06502d0>] ? ldlm_cli_hash_cancel_unused+0x0/0xa0 [ptlrpc]
[<ffffffffa041fcb7>] ? cfs_hash_for_each_nolock+0x87/0x1c0 [libcfs]
[<ffffffffa063903f>] ldlm_namespace_free_prior+0x6f/0x230 [ptlrpc]
[<ffffffffa063fc4c>] client_disconnect_export+0x23c/0x460 [ptlrpc]
[<ffffffffa0b42a44>] lmv_disconnect+0x644/0xc70 [lmv]
[<ffffffffa0a470bc>] client_common_put_super+0x46c/0xe80 [lustre]
[<ffffffffa0a47ba0>] ll_put_super+0xd0/0x360 [lustre]
[<ffffffff8117e01c>] ? dispose_list+0x11c/0x140
[<ffffffff8117e4a8>] ? invalidate_inodes+0x158/0x1a0
[<ffffffff8116542b>] generic_shutdown_super+0x5b/0x110
[<ffffffff81165546>] kill_anon_super+0x16/0x60
[<ffffffffa050897a>] lustre_kill_super+0x4a/0x60 [obdclass]
[<ffffffff811664e0>] deactivate_super+0x70/0x90
[<ffffffff811826bf>] mntput_no_expire+0xbf/0x110
[<ffffffff81183188>] sys_umount+0x78/0x3c0
[<ffffffff810030f2>] system_call_fastpath+0x16/0x1b
Kernel panic - not syncing: LBUG
Pid: 11559, comm: umount Not tainted 2.6.32-220.23.1.bl6.Bull.28.8.x86_64 #1
Call Trace:
[<ffffffff81484650>] ? panic+0x78/0x143
[<ffffffffa040de5b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
[<ffffffffa063343d>] ? ldlm_lock_cancel+0x1ad/0x1b0 [ptlrpc]
[<ffffffffa064d245>] ? ldlm_cli_cancel_local+0xb5/0x380 [ptlrpc]
[<ffffffffa06510b8>] ? ldlm_cli_cancel+0x58/0x3b0 [ptlrpc]
[<ffffffffa063ae18>] ? cleanup_resource+0x168/0x300 [ptlrpc]
[<ffffffffa063afda>] ? ldlm_resource_clean+0x2a/0x50 [ptlrpc]
[<ffffffffa041e28f>] ? cfs_hash_for_each_relax+0x17f/0x380 [libcfs]
[<ffffffffa063afb0>] ? ldlm_resource_clean+0x0/0x50 [ptlrpc]
[<ffffffffa063afb0>] ? ldlm_resource_clean+0x0/0x50 [ptlrpc]
[<ffffffffa041fcaf>] ? cfs_hash_for_each_nolock+0x7f/0x1c0 [libcfs]
[<ffffffffa0637a69>] ? ldlm_namespace_cleanup+0x29/0xb0 [ptlrpc]
[<ffffffffa0638adb>] ? __ldlm_namespace_free+0x4b/0x540 [ptlrpc]
[<ffffffffa06502d0>] ? ldlm_cli_hash_cancel_unused+0x0/0xa0 [ptlrpc]
[<ffffffffa06502d0>] ? ldlm_cli_hash_cancel_unused+0x0/0xa0 [ptlrpc]
[<ffffffffa06502d0>] ? ldlm_cli_hash_cancel_unused+0x0/0xa0 [ptlrpc]
[<ffffffffa041fcb7>] ? cfs_hash_for_each_nolock+0x87/0x1c0 [libcfs]
[<ffffffffa063903f>] ? ldlm_namespace_free_prior+0x6f/0x230 [ptlrpc]
[<ffffffffa063fc4c>] ? client_disconnect_export+0x23c/0x460 [ptlrpc]
[<ffffffffa0b42a44>] ? lmv_disconnect+0x644/0xc70 [lmv]
[<ffffffffa0a470bc>] ? client_common_put_super+0x46c/0xe80 [lustre]
[<ffffffffa0a47ba0>] ? ll_put_super+0xd0/0x360 [lustre]
[<ffffffff8117e01c>] ? dispose_list+0x11c/0x140
[<ffffffff8117e4a8>] ? invalidate_inodes+0x158/0x1a0
[<ffffffff8116542b>] ? generic_shutdown_super+0x5b/0x110
[<ffffffff81165546>] ? kill_anon_super+0x16/0x60
[<ffffffffa050897a>] ? lustre_kill_super+0x4a/0x60 [obdclass]
[<ffffffff811664e0>] ? deactivate_super+0x70/0x90
[<ffffffff811826bf>] ? mntput_no_expire+0xbf/0x110
[<ffffffff81183188>] ? sys_umount+0x78/0x3c0
[<ffffffff810030f2>] ? system_call_fastpath+0x16/0x1b

This issue is exactly the same as the one described in
Can you please advise?
TIA, |
| Comments |
| Comment by Bruno Faccini (Inactive) [ 23/Jan/13 ] |
|
Hello Seb !! Thanks in advance. |
| Comment by Andreas Dilger [ 23/Jan/13 ] |
|
Looks like this is a flock lock, and there were several patches landed to master to fix those. Maybe one needs to be back-ported? |
| Comment by Sebastien Buisson (Inactive) [ 24/Jan/13 ] |
|
Hi Bruno!! Antoine already sent us the content of the ldlm_lock structure:
crash> p *(struct ldlm_lock *)0xffff88060dbd2d80
$3 = {
l_handle = {
h_link = {
next = 0xffffc90016ae91d8,
prev = 0xffffc90016ae91d8
},
h_cookie = 5051055179407415486,
h_addref = 0xffffffffa062f130 <lock_handle_addref>,
h_lock = {
raw_lock = {
slock = 196611
}
},
h_ptr = 0x0,
h_free_cb = 0,
h_rcu = {
next = 0x0,
func = 0
},
h_size = 0,
h_in = 1 '\001',
h_unused = "\000\000"
},
l_refc = {
counter = 4
},
l_lock = {
raw_lock = {
slock = 851980
}
},
l_resource = 0xffff8802f8a14300,
l_lru = {
next = 0xffff88060dbd2de0,
prev = 0xffff88060dbd2de0
},
l_res_link = {
next = 0xffff8802f8a14320,
prev = 0xffff8802f8a14320
},
l_tree_node = 0x0,
l_exp_hash = {
next = 0x0,
pprev = 0x0
},
l_req_mode = LCK_PW,
l_granted_mode = LCK_PW,
l_completion_ast = 0xffffffffa065ec20 <ldlm_flock_completion_ast>,
l_blocking_ast = 0,
l_glimpse_ast = 0,
l_weigh_ast = 0,
l_export = 0x0,
l_conn_export = 0xffff88030871b800,
l_remote_handle = {
cookie = 14454316220189469437
},
l_policy_data = {
l_extent = {
start = 0,
end = 551,
gid = 18446612146206140608
},
l_flock = {
start = 0,
end = 551,
owner = 18446612146206140608,
blocking_owner = 0,
blocking_export = 0x0,
pid = 4414
},
l_inodebits = {
bits = 0
}
},
l_flags = 570435728,
l_readers = 0,
l_writers = 1,
l_destroyed = 0 '\000',
l_ns_srv = 0 '\000',
l_waitq = {
lock = {
raw_lock = {
slock = 196611
}
},
task_list = {
next = 0xffff88060dbd2ea8,
prev = 0xffff88060dbd2ea8
}
},
l_last_activity = 1353415544,
l_last_used = 0,
l_req_extent = {
start = 0,
end = 0,
gid = 0
},
l_lvb_len = 0,
l_lvb_data = 0x0,
l_ast_data = 0xffff8805e2161c38,
l_client_cookie = 0,
l_pending_chain = {
next = 0xffff88060dbd2f00,
prev = 0xffff88060dbd2f00
},
l_callback_timeout = 0,
l_pid = 4414,
l_bl_ast = {
next = 0xffff88060dbd2f20,
prev = 0xffff88060dbd2f20
},
l_cp_ast = {
next = 0xffff88060dbd2f30,
prev = 0xffff88060dbd2f30
},
l_rk_ast = {
next = 0xffff88060dbd2f40,
prev = 0xffff88060dbd2f40
},
l_blocking_lock = 0x0,
l_bl_ast_run = 0,
l_sl_mode = {
next = 0xffff88060dbd2f60,
prev = 0xffff88060dbd2f60
},
l_sl_policy = {
next = 0xffff88060dbd2f70,
prev = 0xffff88060dbd2f70
},
l_reference = {<No data fields>},
l_exp_refs_nr = 0,
l_exp_refs_link = {
next = 0xffff88060dbd2f88,
prev = 0xffff88060dbd2f88
},
l_exp_refs_target = 0x0,
l_exp_list = {
next = 0xffff88060dbd2fa0,
prev = 0xffff88060dbd2fa0
}
}
I will ask him for the 'foreach bt' output. Cheers, |
| Comment by Bruno Faccini (Inactive) [ 24/Jan/13 ] |
|
Thanks Sebastien. BTW, the fact that a flock is involved suggests to Andreas and other teammates that it could be a different situation than the one in . Also, do we know what has been running on this particular node? I mean, is it a compute node or a service (login/batch/...) node? |
| Comment by Bruno Faccini (Inactive) [ 20/Feb/13 ] |
|
Sebastien, still no news about the requested "foreach bt" info? |
| Comment by Sebastien Buisson (Inactive) [ 21/Feb/13 ] |
|
Hi Bruno, still no news; I will ping the Support team! Sebastien. |
| Comment by Sebastien Buisson (Inactive) [ 07/Mar/13 ] |
|
Hi, Here is the requested information. Inside the tarball, you have:
Sebastien. |
| Comment by Bruno Faccini (Inactive) [ 19/Mar/13 ] |
|
Sorry for the delay, Seb. As the last piece of information I may need for this, can you or somebody from the site (I am sure Antoine kept this ...) double-check in the node's syslog/dmesg whether there were any previous messages referencing the lock (0xffff88060dbd2d80), the PID (4414) or the owner/task (0xffff88033c05e8c0)? |
| Comment by Bruno Faccini (Inactive) [ 25/Mar/13 ] |
|
Talking directly with Antoine: as I expected, there may be earlier related messages in the log, which he will provide/comment on soon. Using this new information we may get to the root cause faster. He also confirmed that there have been a few more occurrences since January, with crash-dumps available. |
| Comment by Antoine Percher [ 28/Mar/13 ] |
|
Yes, unfortunately I have lost the crash dump that produced the LBUG, so from the logs we have:
LustreError: 28052:0:(file.c:158:ll_close_inode_openhandle()) inode 147840184874990214 mdc close failed: rc = -4
I can't link the inode to the LBUG, but that is the PID that had the lock issue:
LustreError: 12577:0:(ldlm_lock.c:1697:ldlm_lock_cancel()) ### lock still has references ns: ptmp2-MDT0000-mdc-ffff8804739f5c00 lock: ffff88024b7496c0/0x141b572a020ffcdd lrc: 4/0,1 mode: PW/PW res: 8811961703/31366 rrc: 2 type: FLK pid: 28052 [0->9223372036854775807] flags: 0x22002890 remote: 0xdf8e5c6f462dccb6 expref: -99 pid: 28052 timeout: 0
So maybe Lustre misses releasing the lock when there is an issue during the close.
I have attached the whole trace ... and sorry for the delay |
| Comment by Sebastien Buisson (Inactive) [ 11/Apr/13 ] |
|
Hi Bruno, Now that Antoine has provided you with additional information, are you able to investigate this issue? Thanks, |
| Comment by Bruno Faccini (Inactive) [ 16/Apr/13 ] |
|
Yes, and my current focus is to try to determine how an FLock reference from a dead process can remain ... There are known quick-and-dirty fixes, like the very simple one from . I will call and work remotely with Antoine in order to identify how we can end up in this situation. |
| Comment by Bruno Faccini (Inactive) [ 16/Apr/13 ] |
|
I set up what I think is a reproducer. But in the case of an evict event, ldlm_namespace_cleanup()/ldlm_resource_clean()/cleanup_resource() are called with LDLM_FL_LOCAL_ONLY unconditionally, and the locks are then treated as local, so ldlm_cli_cancel()/ldlm_cli_cancel_local()/ldlm_lock_cancel() are not called to let the Server know of the cancel, since it is assumed the Server did it on its own. This is also what happens in the case of "umount -f", which could already be used as a work-around on-site !!... Or when the localflock mount option (instead of flock) is used. |
| Comment by Bruno Faccini (Inactive) [ 17/Apr/13 ] |
|
From the crash-dumps available, all l_flags of the involved ldlm_locks are set to 0x22002890, which means LDLM_FL_CBPENDING/LDLM_FL_CANCEL/LDLM_FL_FAILED/LDLM_FL_CANCELING/LDLM_FL_CLEANED/LDLM_FL_BL_DONE. All these flags should have been set as part of the current cleanup_resource()/ldlm_cli_cancel[_local]() call sequence. The l_last_activity time-stamp is a long time ago, but only a few minutes before the ll_close_inode_openhandle() error message for the PID owning the lock (pointed out by Antoine), when present. Working further with Antoine to understand whether something could have gone wrong on the Server/MDS side, we found that, at least for the last 2 occurrences of the problem, the MDS had crashed/rebooted around the same time as l_last_activity and the ll_close_inode_openhandle() error message. Thus, the problem can only be caused by wrong handling of the MDS reboot condition during FLock unlock/clean-up. Will try to reproduce it now locally with an MDS re-boot and full traces enabled. |
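For reference, here is a tiny decoder for that flag word. It is only a sketch: the bit values below are inferred from the flag breakdown in this comment (they do OR up to 0x22002890), and should be double-checked against the lustre_dlm.h of the exact release being debugged.

/* Minimal sketch: decode l_flags = 0x22002890 into the flag names listed
 * above.  The bit values are assumptions inferred from this comment and
 * must be verified against lustre_dlm.h for the release being debugged. */
#include <stdio.h>

struct flag_name { unsigned long long bit; const char *name; };

static const struct flag_name ldlm_flags[] = {
        { 0x00000010ULL, "LDLM_FL_CBPENDING" },
        { 0x00000080ULL, "LDLM_FL_CANCEL" },
        { 0x00000800ULL, "LDLM_FL_FAILED" },
        { 0x00002000ULL, "LDLM_FL_CANCELING" },
        { 0x02000000ULL, "LDLM_FL_CLEANED" },
        { 0x20000000ULL, "LDLM_FL_BL_DONE" },
};

int main(void)
{
        unsigned long long l_flags = 0x22002890ULL;
        unsigned int i;

        for (i = 0; i < sizeof(ldlm_flags) / sizeof(ldlm_flags[0]); i++)
                if (l_flags & ldlm_flags[i].bit)
                        printf("%s\n", ldlm_flags[i].name);
        return 0;
}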
| Comment by Bruno Faccini (Inactive) [ 25/Apr/13 ] |
|
Antoine just found a new occurrence that happened during March. But for this new case, the MDS crash/reboot did not happen around the l_last_activity time, but much later (in fact this Client was not unmounted/stopped before the Servers during a full Cluster shutdown ...). And the concerned FLock was from an application that had been killed with -9 ... Thus I requested that they check whether the FLock clean-up really happens during application exit/kill, by writing a small reproducer that grants a FLock and then sleeps, killing/signaling it with full Lustre debug traces enabled, and seeing whether the automatic unlock occurs (a minimal sketch of such a reproducer follows below). |
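Here is a minimal sketch of such a reproducer (my own illustration, not the actual test used on-site): it grants a write FLock through fcntl() on a file of the Lustre mount and then sleeps until killed, so that the automatic unlock on exit can be looked for in the Lustre debug traces. The file path is just an example.

/* Reproducer sketch (illustration only): grab a write lock on a file,
 * then sleep for ever so the process can be killed/signaled.
 * Usage: ./flock_sleep /mnt/lustre/lockfile   (path is an example) */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        struct flock fl = {
                .l_type   = F_WRLCK,    /* matches the PW FLock seen in the logs */
                .l_whence = SEEK_SET,
                .l_start  = 0,
                .l_len    = 0,          /* 0 = lock the whole file */
        };
        int fd;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <file on lustre>\n", argv[0]);
                return 1;
        }
        fd = open(argv[1], O_RDWR | O_CREAT, 0644);
        if (fd < 0) {
                perror("open");
                return 1;
        }
        if (fcntl(fd, F_SETLKW, &fl) < 0) {     /* blocking lock request */
                perror("fcntl(F_SETLKW)");
                return 1;
        }
        printf("flock granted (pid %d), sleeping for ever...\n", getpid());
        for (;;)
                pause();                /* stay here until killed/signaled */
        return 0;
}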
| Comment by Bruno Faccini (Inactive) [ 02/May/13 ] |
|
Good news, I am able to get the crash with a reproducer that mimics what we identified as the possible scenario in our last findings/comments:
_ run a cmd/bin doing/granting a flock and then going to sleep for ever. I modified flocks_test for this.
_ force the MDS to crash.
_ wait a bit and then kill/^C to abort the cmd. We get the "(file.c:158:ll_close_inode_openhandle()) inode <INO> mdc close failed: rc = -4" msg in the Client log.
_ reboot the MDS, re-mount the MDT.
_ umount the lustre-FS on the Client. -->> LBUG!
more to come. |
| Comment by Bruno Faccini (Inactive) [ 03/May/13 ] |
|
Humm, I took a full Lustre trace during a new reproducer run and here is what I discovered:
_ when the application is killed/signaled and exits, the Kernel tries to do some automatic cleanup/close on its open files.
_ locks_remove_flock(), to terminate any current FLock, will then force-create a request for an UnLock on the whole file range.
_ unfortunately, if the MDS is down at this time, the UnLock request will fail and leave the currently granted FLocks in place.
_ they will even be replayed when the MDS/MDT is back.
_ and on an even much later umount, they can still be found with references and cause the LBUG.
So, how can we fix this to handle the MDS in-between crash case? Anyway, the fix has to be as generic as possible and it may not be obvious. To make the Kernel-generated request concrete, a small userspace illustration follows below. |
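This is only my own sketch of the equivalent userspace operation, not Lustre code: an F_UNLCK covering the whole file range (the [0->9223372036854775807] range visible in the log above). On a Lustre mount, this unlock is what turns into the RPC that gets lost when the MDS is down.

/* Illustration only: the userspace equivalent of the unlock the Kernel
 * issues when a process holding an fcntl lock exits -- F_UNLCK over the
 * whole range (seen as [0->9223372036854775807] in the log above). */
#include <fcntl.h>
#include <unistd.h>

static int unlock_whole_file(int fd)
{
        struct flock fl = {
                .l_type   = F_UNLCK,
                .l_whence = SEEK_SET,
                .l_start  = 0,
                .l_len    = 0,          /* 0 = up to the maximum offset */
        };

        /* On Lustre this becomes an FLock UNLCK request to the MDS; if the
         * MDS is down at that moment, this is the request that is dropped. */
        return fcntl(fd, F_SETLK, &fl);
}

int main(int argc, char **argv)
{
        /* example path only */
        int fd = open(argc > 1 ? argv[1] : "/tmp/lockfile", O_RDWR | O_CREAT, 0644);

        return fd < 0 ? 1 : unlock_whole_file(fd);
}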
| Comment by Bruno Faccini (Inactive) [ 14/May/13 ] |
|
Having a better/detailed look at the full lustre debug traces, I may have found the reason why the UNLCK request is failed+thrown: it is generated automatically by the Kernel during fatal signal processing, but then, if any problem occurs during communications with the MDS, Lustre interrupts and trashes RPCs/requests when it detects a signal is pending/current in ptlrpc_set_wait()/ptlrpc_check_set() !!.... The problem is that this behavior is incompatible with an FLock/F_UNLCK because, in any case (MDS crash/reboot + replay, or temporary comms/LNet problem), the Server/MDS will never know that the covered locks can be released, and we finally end up with the LBUG during a later umount due to these orphan granted Lustre-FLocks left behind. Thus, we need to find a way (keep track or retry) to get FLock requests (at least F_UNLCKs) through in such cases; the sketch below illustrates the retry idea. |
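Conceptually the fix has to make sure the F_UNLCK survives the pending signal instead of being trashed, e.g. by retrying or re-queuing it until the server has seen it. The sketch below only illustrates that retry idea with hypothetical names (send_unlock_rpc() is a stub standing in for the real RPC path); it is not the code of http://review.whamcloud.com/6415.

/* Conceptual sketch only -- hypothetical names, NOT the actual patch.
 * Idea: an FLock F_UNLCK must not be dropped just because a signal is
 * pending; it has to be retried (or re-queued) until the server has seen
 * it, otherwise the MDS never learns the lock is free and the client
 * later LBUGs in ldlm_lock_cancel() at umount time. */
#include <errno.h>
#include <stdio.h>

/* Stub standing in for the real RPC: pretend the first two attempts are
 * interrupted by a pending signal, then succeed. */
static int send_unlock_rpc(int lock_id)
{
        static int attempts;

        (void)lock_id;
        if (attempts++ < 2)
                return -EINTR;          /* "signal pending, request trashed" */
        return 0;                       /* server acknowledged the unlock */
}

static int cancel_flock_reliably(int lock_id)
{
        int rc;

        do {
                rc = send_unlock_rpc(lock_id);
                if (rc == -EINTR)
                        fprintf(stderr, "unlock interrupted, retrying\n");
        } while (rc == -EINTR);

        return rc;
}

int main(void)
{
        return cancel_flock_reliably(42) ? 1 : 0;
}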
| Comment by Bruno Faccini (Inactive) [ 22/May/13 ] |
|
I have a patch that fixes the LBUG against the reproducer (crash of the MDS while an FLock is set, kill/exit in-between, re-start the MDT, umount on the Client) and also passes auto-tests. Since the problem is also in master (I ran the reproducer against the latest master build, i.e. 2.4.50!), I will submit the same change to master and see how it runs against the full auto-tests set. |
| Comment by Bruno Faccini (Inactive) [ 23/May/13 ] |
|
Ok, patch for b2_1 is at http://review.whamcloud.com/6407. Master version is at http://review.whamcloud.com/6415. |
| Comment by Cory Spitz [ 03/Jul/13 ] |
|
What's the status of #6415? Can it land? |
| Comment by Bruno Faccini (Inactive) [ 15/Jul/13 ] |
|
Hello Cory, |
| Comment by Cory Spitz [ 15/Jul/13 ] |
|
Yes, we did. We've adopted change #6415 and it appears to be the solution to our problem. |
| Comment by Bruno Faccini (Inactive) [ 15/Jul/13 ] |
|
Thanks for your feedback !! Nothing can be better for a patch than real "life"/production exposure. |
| Comment by Peter Jones [ 17/Jul/13 ] |
|
Sebastien,
Any word on the b2_1 version of the patch? Does this address your original concern? Can we close the ticket?
Peter |
| Comment by Sebastien Buisson (Inactive) [ 25/Jul/13 ] |
|
Hi Bruno, We have retrieved the patch for b2_1. It will be rolled out at CEA's next maintenance, planned for early September. Cheers, |
| Comment by Cory Spitz [ 28/Aug/13 ] |
|
Related to |
| Comment by Andriy Skulysh [ 28/Aug/13 ] |
|
I don't like the idea of resending flocks. There is no guarantee that the request will succeed. |
| Comment by Bruno Faccini (Inactive) [ 29/Aug/13 ] |
|
Cory: Thanks for adding a reference to .
Andriy: If you read carefully the history/comments for this/ |
| Comment by Andriy Skulysh [ 29/Aug/13 ] |
|
But "retry" mechanism doesn't guarantee that lock will reach MDS. cleanup_resource() should deal with orphaned locks also. |
| Comment by Bruno Faccini (Inactive) [ 29/Aug/13 ] |
|
> But "retry" mechanism doesn't guarantee that lock will reach MDS > cleanup_resource() should deal with orphaned locks also |
| Comment by Andreas Dilger [ 17/Sep/13 ] |
|
Patches are landed to b2_1 and master for 2.5.0. Can this bug be closed? |
| Comment by Sebastien Buisson (Inactive) [ 18/Sep/13 ] |
|
Sure, this bug can be closed, as additional work was carried out by Bruno in .
Thanks, |
| Comment by Peter Jones [ 18/Sep/13 ] |
|
ok - thanks Sebastien! |
| Comment by Jian Yu [ 22/Nov/13 ] |
|
Patch http://review.whamcloud.com/6415 was cherry-picked to Lustre b2_4 branch. |