[LU-6368] ASSERTION( cur->oe_dlmlock == victim->oe_dlmlock ) failed Created: 16/Mar/15 Updated: 19/Mar/19 Resolved: 09/May/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Frank Zago (Inactive) | Assignee: | WC Triage |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch |
| Environment: |
centos 6 + Lustre head of tree (2.7+) |
| Attachments: |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
I can consistently crash the Lustre client with the attached reproducer. Info from the logs:

<0>LustreError: 26474:0:(osc_cache.c:519:osc_extent_merge()) ASSERTION( cur->oe_dlmlock == victim->oe_dlmlock ) failed:
<0>LustreError: 26474:0:(osc_cache.c:519:osc_extent_merge()) LBUG

Stack trace from crash:

crash> bt
PID: 26474 TASK: ffff88003747caa0 CPU: 3 COMMAND: "llsendfile3"
#0 [ffff88001a2835f0] machine_kexec at ffffffff81038f3b
#1 [ffff88001a283650] crash_kexec at ffffffff810c5b62
#2 [ffff88001a283720] panic at ffffffff815285a3
#3 [ffff88001a2837a0] lbug_with_loc at ffffffffa0ac8eeb [libcfs]
#4 [ffff88001a2837c0] osc_extent_merge at ffffffffa06ce57d [osc]
#5 [ffff88001a2838d0] osc_extent_release at ffffffffa06d3efb [osc]
#6 [ffff88001a283900] osc_io_end at ffffffffa06c520f [osc]
#7 [ffff88001a283920] cl_io_end at ffffffffa0dfc270 [obdclass]
#8 [ffff88001a283950] lov_io_end_wrapper at ffffffffa070f3b1 [lov]
#9 [ffff88001a283970] lov_io_call at ffffffffa070f0fe [lov]
#10 [ffff88001a2839a0] lov_io_end at ffffffffa0710fbc [lov]
#11 [ffff88001a2839c0] cl_io_end at ffffffffa0dfc270 [obdclass]
#12 [ffff88001a2839f0] cl_io_loop at ffffffffa0e00b52 [obdclass]
#13 [ffff88001a283a20] ll_file_io_generic at ffffffffa125e20c [lustre]
#14 [ffff88001a283b40] ll_file_aio_write at ffffffffa125e933 [lustre]
#15 [ffff88001a283ba0] ll_file_write at ffffffffa125edd9 [lustre]
#16 [ffff88001a283c10] vfs_write at ffffffff81188df8
#17 [ffff88001a283c50] kernel_write at ffffffff811b8ded
#18 [ffff88001a283c80] write_pipe_buf at ffffffff811b8e5a
#19 [ffff88001a283cc0] splice_from_pipe_feed at ffffffff811b7a92
#20 [ffff88001a283d10] __splice_from_pipe at ffffffff811b84ee
#21 [ffff88001a283d50] splice_from_pipe at ffffffff811b8551
#22 [ffff88001a283da0] default_file_splice_write at ffffffff811b858d
#23 [ffff88001a283dc0] do_splice_from at ffffffff811b862e
#24 [ffff88001a283e00] direct_splice_actor at ffffffff811b8680
#25 [ffff88001a283e10] splice_direct_to_actor at ffffffff811b8956
#26 [ffff88001a283e80] do_splice_direct at ffffffff811b8a9d
#27 [ffff88001a283ed0] do_sendfile at ffffffff811891fc
#28 [ffff88001a283f30] sys_sendfile64 at ffffffff81189294
#29 [ffff88001a283f80] system_call_fastpath at ffffffff8100b072
RIP: 0000003a522df7da RSP: 00007fffe6f8add8 RFLAGS: 00010206
RAX: 0000000000000028 RBX: ffffffff8100b072 RCX: 0000000000a00000
RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000004
RBP: 0000000000000004 R8: 0000003a5258f300 R9: 0000003a51a0e9f0
R10: 0000000000a00000 R11: 0000000000000206 R12: 0000000000000000
R13: 00007fffe6f8aed0 R14: 0000000000401b90 R15: 0000000000000003
ORIG_RAX: 0000000000000028 CS: 0033 SS: 002b
This is related to the group lock on the target file. If the group lock is commented out, then no crash happens. |
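For illustration only, here is a minimal user-space sketch of the kind of reproducer described above and in the backtrace. The actual attached llsendfile3 is not reproduced here; the paths, file size, and group-lock gid below are placeholders, and the LL_IOC_GROUP_LOCK / LL_IOC_GROUP_UNLOCK ioctls are assumed to come from lustre/lustre_user.h. It takes a group lock on the target file and then sendfile()s a multi-page source file into it:

/* Sketch of a group-lock + sendfile reproducer; not the original attachment. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>
#include <lustre/lustre_user.h>   /* LL_IOC_GROUP_LOCK, LL_IOC_GROUP_UNLOCK */

int main(int argc, char **argv)
{
        int gid = 1234;           /* arbitrary group-lock id */
        struct stat st;
        off_t off = 0;

        if (argc < 3) {
                fprintf(stderr, "usage: %s <src file > 4k> <dst file on lustre>\n", argv[0]);
                return 1;
        }

        int in = open(argv[1], O_RDONLY);                   /* source, e.g. 100 KiB */
        int out = open(argv[2], O_WRONLY | O_CREAT, 0644);  /* target on the Lustre mount */
        if (in < 0 || out < 0 || fstat(in, &st) < 0) {
                perror("open/fstat");
                return 1;
        }

        /* Group-lock the *target* file; per the description, commenting
         * this out makes the crash go away. */
        if (ioctl(out, LL_IOC_GROUP_LOCK, gid) < 0)
                perror("LL_IOC_GROUP_LOCK");

        /* sendfile() reaches ll_file_write page by page (see the backtrace),
         * which is where the second group lock gets requested. */
        if (sendfile(out, in, &off, st.st_size) < 0)
                perror("sendfile");

        ioctl(out, LL_IOC_GROUP_UNLOCK, gid);
        close(in);
        close(out);
        return 0;
}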
| Comments |
| Comment by Patrick Farrell (Inactive) [ 16/Mar/15 ] |
|
A few notes. The file must be > 1 page in size, so > 4k. (It definitely happens with a 100 KiB file; I haven't tested intervening sizes.) The crash happens during the write of the second page (it seems that with sendfile, ll_file_write is called for each page). I'll get some dk logs attached shortly, but here's my initial analysis. The core problem is that two separate group locks are issued: without the first lock ever being released, the client gets another group lock. This happens in the sendfile code path after it fails to match the first group lock. I've got an instrumented version of search_queue that shows the reason for the failed match:

        /* llite sometimes wants to match locks that will be
         * canceled when their users drop, but we allow it to match
         * if it passes in CBPENDING and the lock still has users.
         * this is generally only going to be used by children
         * whose parents already hold a lock so forward progress
         * can still happen. */
        if (ldlm_is_cbpending(lock) &&
            !(flags & LDLM_FL_CBPENDING)) {
                reason = 3;
                continue;
        }
So, trying to match a requested group lock against the existing one:

ns: centss01-OST0000-osc-ffff880139645000 lock: ffff88011286ad80/0x4166f63c6cfea1fd lrc: 4/0,1 mode: GROUP/GROUP res: [0xf2:0x0:0x0].0 rrc: 2 type: EXT [0->18446744073709551615] stride: 0 (req 0->18446744073709551615) flags: 0x20400020000 nid: local remote: 0x82dec12a52ea2ff2 expref: -99 pid: 6805 timeout: 0 lvb_type: 1

the match fails with reason 3, as noted above. So, at first glance, the cause of this problem is two group locks being issued which cover the same extent. I'll give my further thoughts on this in another comment, and I'll also attach these dk logs. (I can make the dump available as well.) |
| Comment by Patrick Farrell (Inactive) [ 16/Mar/15 ] |
|
So: the first thought I have is that a group lock should still be matchable even if a callback is pending on the existing lock, since group locks cannot be called back. But that makes me wonder: is it correct to allow the callback-pending flag to be set on a group lock at all? That seems like it might be wrong as well, for the same reason as above: group locks cannot be called back, so callback pending has no meaning for them. Finally, and separately, I have not yet looked into this, but it seems clearly wrong that the server should issue two group locks on the same resource to the same client. I'm not sure how best to handle that, but it definitely shouldn't grant such a lock, unless I've really missed something. (I am not quite sure how/why the OSC layer decides to request a group lock for the second lock, rather than a normal PW lock.) |
| Comment by Patrick Farrell (Inactive) [ 16/Mar/15 ] |
|
For reference, here is dump_namespaces from the server shortly after a client crashed with this test (this is not the same run as the rest of the DK logs): |
| Comment by Gerrit Updater [ 16/Mar/15 ] |
|
Patrick Farrell (paf@cray.com) uploaded a new patch: http://review.whamcloud.com/14093 |
| Comment by Patrick Farrell (Inactive) [ 16/Mar/15 ] |
|
The patch at http://review.whamcloud.com/14093 fixes the LBUG described here by attacking my "first thought" above: it causes search_queue to ignore the CBPENDING flag on existing group locks and match those locks anyway. I'm still concerned about the other two thoughts. Should the CBPENDING flag be set at all? Does it have any meaning for group locks? If not, should we be refusing to set it? And what about the fact that this lock was granted? It's a robustness thing rather than a fix for this bug, but it seems like we should have some sanity check that prevents granting this lock. I'll take a stab at writing one in search_queue, but I'm not sure I like what I'm thinking of. |
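As a rough illustration of that approach (a sketch only, not the verbatim change in 14093; lock->l_granted_mode and LCK_GROUP are assumed to be the field and constant that identify a granted group lock), the CBPENDING check quoted earlier could exempt group locks like so:

        /* Sketch: group locks are never called back, so a pending
         * callback on an existing group lock should not prevent
         * matching it. */
        if (ldlm_is_cbpending(lock) &&
            lock->l_granted_mode != LCK_GROUP &&
            !(flags & LDLM_FL_CBPENDING)) {
                reason = 3;
                continue;
        }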
| Comment by Patrick Farrell (Inactive) [ 16/Mar/15 ] |
|
Client dk logs with special debug added. (The debug is part of my strided lock patches and so includes a lot of extra info in other places as well.) |
| Comment by Jinshan Xiong (Inactive) [ 17/Mar/15 ] |
|
I agree that it's wrong to give up a group lock; therefore a group lock should never have had the CBPENDING bit set. |
| Comment by Patrick Farrell (Inactive) [ 17/Mar/15 ] |
|
Jinshan - That flag is being set by ldlm_lock_decref_and_cancel, which is called from osc_cancel_base specifically for group locks; the only difference from the regular ldlm_lock_decref is that CBPENDING is set on the lock in decref_and_cancel. Here's the relevant excerpt from it:

        LASSERT(lock != NULL);
        LDLM_DEBUG(lock, "ldlm_lock_decref(%s)", ldlm_lockname[mode]);

So it's definitely intentional that CBPENDING be set on group locks. I don't fully understand the life cycle of LDLM and OSC locks, but here's what I see: the group lock request (from the ioctl) takes a writer reference on the lock, and that reference is not released until the lock is cancelled by the usual manual cancellation request. So I've decided to try removing the use of CBPENDING for group locks and instead make ldlm_lock_decref_internal group-lock aware: it will cancel group locks once they have zero reader and writer references, whether or not they have CBPENDING set. I'll put that patch up in Gerrit in a moment. (Haven't tested it yet, but will do.) |
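A minimal sketch of that idea, assuming the standard ldlm lock fields l_granted_mode, l_readers, and l_writers (this is not the actual Gerrit patch, and the reference-counting and locking details inside ldlm_lock_decref_internal are omitted):

        /* Sketch: once a group lock has dropped its last reader and
         * writer reference, cancel it right away instead of relying on
         * CBPENDING, which has no meaning for group locks since they
         * are never called back. */
        if (lock->l_granted_mode == LCK_GROUP &&
            lock->l_readers == 0 && lock->l_writers == 0)
                ldlm_lock_cancel(lock);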
| Comment by Gerrit Updater [ 28/Apr/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14093/ |
| Comment by Peter Jones [ 09/May/15 ] |
|
Landed for 2.8 |