Details
Bug
Resolution: Fixed
Blocker
Lustre 2.7.0
centos 6 + Lustre head of tree (2.7+)
3
Description
I can consistently crash the Lustre client with the attached reproducer.
Info from the logs:
<0>LustreError: 26474:0:(osc_cache.c:519:osc_extent_merge()) ASSERTION( cur->oe_dlmlock == victim->oe_dlmlock ) failed:
<0>LustreError: 26474:0:(osc_cache.c:519:osc_extent_merge()) LBUG
Stack trace from crash:
crash> bt
PID: 26474  TASK: ffff88003747caa0  CPU: 3  COMMAND: "llsendfile3"
 #0 [ffff88001a2835f0] machine_kexec at ffffffff81038f3b
 #1 [ffff88001a283650] crash_kexec at ffffffff810c5b62
 #2 [ffff88001a283720] panic at ffffffff815285a3
 #3 [ffff88001a2837a0] lbug_with_loc at ffffffffa0ac8eeb [libcfs]
 #4 [ffff88001a2837c0] osc_extent_merge at ffffffffa06ce57d [osc]
 #5 [ffff88001a2838d0] osc_extent_release at ffffffffa06d3efb [osc]
 #6 [ffff88001a283900] osc_io_end at ffffffffa06c520f [osc]
 #7 [ffff88001a283920] cl_io_end at ffffffffa0dfc270 [obdclass]
 #8 [ffff88001a283950] lov_io_end_wrapper at ffffffffa070f3b1 [lov]
 #9 [ffff88001a283970] lov_io_call at ffffffffa070f0fe [lov]
#10 [ffff88001a2839a0] lov_io_end at ffffffffa0710fbc [lov]
#11 [ffff88001a2839c0] cl_io_end at ffffffffa0dfc270 [obdclass]
#12 [ffff88001a2839f0] cl_io_loop at ffffffffa0e00b52 [obdclass]
#13 [ffff88001a283a20] ll_file_io_generic at ffffffffa125e20c [lustre]
#14 [ffff88001a283b40] ll_file_aio_write at ffffffffa125e933 [lustre]
#15 [ffff88001a283ba0] ll_file_write at ffffffffa125edd9 [lustre]
#16 [ffff88001a283c10] vfs_write at ffffffff81188df8
#17 [ffff88001a283c50] kernel_write at ffffffff811b8ded
#18 [ffff88001a283c80] write_pipe_buf at ffffffff811b8e5a
#19 [ffff88001a283cc0] splice_from_pipe_feed at ffffffff811b7a92
#20 [ffff88001a283d10] __splice_from_pipe at ffffffff811b84ee
#21 [ffff88001a283d50] splice_from_pipe at ffffffff811b8551
#22 [ffff88001a283da0] default_file_splice_write at ffffffff811b858d
#23 [ffff88001a283dc0] do_splice_from at ffffffff811b862e
#24 [ffff88001a283e00] direct_splice_actor at ffffffff811b8680
#25 [ffff88001a283e10] splice_direct_to_actor at ffffffff811b8956
#26 [ffff88001a283e80] do_splice_direct at ffffffff811b8a9d
#27 [ffff88001a283ed0] do_sendfile at ffffffff811891fc
#28 [ffff88001a283f30] sys_sendfile64 at ffffffff81189294
#29 [ffff88001a283f80] system_call_fastpath at ffffffff8100b072
    RIP: 0000003a522df7da  RSP: 00007fffe6f8add8  RFLAGS: 00010206
    RAX: 0000000000000028  RBX: ffffffff8100b072  RCX: 0000000000a00000
    RDX: 0000000000000000  RSI: 0000000000000003  RDI: 0000000000000004
    RBP: 0000000000000004   R8: 0000003a5258f300   R9: 0000003a51a0e9f0
    R10: 0000000000a00000  R11: 0000000000000206  R12: 0000000000000000
    R13: 00007fffe6f8aed0  R14: 0000000000401b90  R15: 0000000000000003
    ORIG_RAX: 0000000000000028  CS: 0033  SS: 002b
The crash is related to the group lock on the target file: if the group lock is commented out, no crash happens.
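For readers without access to the attachment, the following is a minimal sketch of the kind of sequence the stack trace suggests (a sendfile() into a target file while a group lock is held on it). It is not the attached reproducer. The function name `sendfile_copy`, the `gid` parameter, and the ioctl request values are assumptions for illustration; the request values are derived from `_IOW('f', 158, long)` and `_IOW('f', 159, long)` on 64-bit Linux and should be checked against `lustre_user.h`.

```python
import fcntl
import os

# ASSUMPTION: values of the Lustre group-lock ioctls on 64-bit Linux,
# i.e. _IOW('f', 158, long) and _IOW('f', 159, long). Verify against
# lustre_user.h before relying on them.
LL_IOC_GROUP_LOCK = 0x4008669E
LL_IOC_GROUP_UNLOCK = 0x4008669F


def sendfile_copy(src_path, dst_path, gid=None):
    """Copy src to dst with sendfile(); hold a group lock on dst if gid is set.

    Hypothetical helper: with gid=None the group lock is skipped, which
    corresponds to the "group lock commented out" case.
    """
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            if gid is not None:
                # Only meaningful on a Lustre file; raises OSError elsewhere.
                fcntl.ioctl(dst_fd, LL_IOC_GROUP_LOCK, gid)
            try:
                remaining = os.fstat(src_fd).st_size
                offset = 0
                while remaining > 0:
                    sent = os.sendfile(dst_fd, src_fd, offset, remaining)
                    if sent == 0:
                        break  # unexpected EOF on the source
                    offset += sent
                    remaining -= sent
            finally:
                if gid is not None:
                    fcntl.ioctl(dst_fd, LL_IOC_GROUP_UNLOCK, gid)
        finally:
            os.close(dst_fd)
    finally:
        os.close(src_fd)
    return offset
```

Calling `sendfile_copy(src, dst, gid=1234)` with `dst` on a Lustre mount would exercise the group-locked path; calling it with `gid=None` exercises the same write path without the group lock. Whether this sketch triggers the assertion is not confirmed here.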
Client dk logs were captured with special debug added. (The debug is part of my strided lock patches, so it also includes a lot of extra info in other places.)