Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.10.0, Lustre 2.11.0
-
None
-
3
-
9223372036854775807
Description
The primary MDS and then its backup panic when replaying layout change requests on a PFL file.
[ 842.061520] LustreError: 22966:0:(lod_lov.c:633:lod_free_comp_buffer()) ASSERTION( entry->llc_stripe == ((void *)0) ) failed: [ 842.075148] LustreError: 22966:0:(lod_lov.c:633:lod_free_comp_buffer()) LBUG [ 842.083308] Pid: 22966, comm: mdt03_001 crash_x86_64> bt PID: 22966 TASK: ffff881f0cbfdee0 CPU: 18 COMMAND: "mdt03_001" #0 [ffff881f0c4f7680] machine_kexec at ffffffff8105d77b #1 [ffff881f0c4f76e0] __crash_kexec at ffffffff81108742 #2 [ffff881f0c4f77b0] panic at ffffffff816a863f #3 [ffff881f0c4f7830] lbug_with_loc at ffffffffc0b65854 [libcfs] #4 [ffff881f0c4f7850] lod_free_comp_buffer at ffffffffc151b62f [lod] #5 [ffff881f0c4f7898] lod_free_comp_entries at ffffffffc151ff57 [lod] #6 [ffff881f0c4f78b0] lod_parse_striping at ffffffffc1522140 [lod] #7 [ffff881f0c4f7920] lod_load_striping_locked at ffffffffc152262a [lod] #8 [ffff881f0c4f7958] lod_declare_layout_change at ffffffffc153d0bf [lod] #9 [ffff881f0c4f79c0] mdd_declare_layout_change at ffffffffc13e9102 [mdd] #10 [ffff881f0c4f79e8] mdd_layout_change at ffffffffc13f110d [mdd] #11 [ffff881f0c4f7a40] mdt_layout_change at ffffffffc145aaf8 [mdt] #12 [ffff881f0c4f7a88] mdt_intent_layout at ffffffffc146220a [mdt] #13 [ffff881f0c4f7b40] mdt_intent_opc at ffffffffc1458315 [mdt] #14 [ffff881f0c4f7ba0] mdt_intent_policy at ffffffffc145ff63 [mdt] #15 [ffff881f0c4f7be0] ldlm_lock_enqueue at ffffffffc0e92f0e [ptlrpc] #16 [ffff881f0c4f7c48] ldlm_handle_enqueue0 at ffffffffc0ebb753 [ptlrpc] #17 [ffff881f0c4f7cd8] tgt_enqueue at ffffffffc0f41b32 [ptlrpc] #18 [ffff881f0c4f7cf8] tgt_request_handle at ffffffffc0f454da [ptlrpc] #19 [ffff881f0c4f7d40] ptlrpc_server_handle_request at ffffffffc0eed733 [ptlrpc] #20 [ffff881f0c4f7de0] ptlrpc_main at ffffffffc0ef0ee2 [ptlrpc] #21 [ffff881f0c4f7ec8] kthread at ffffffff810b4031 #22 [ffff881f0c4f7f50] ret_from_fork at ffffffff816c155d
During recovery, the MDS can not handle the fact that the file is already striped. The client is replaying a layout operation which completed. The server tries to free the component buffers but asserts when it finds striping info defined that it doesn't expect.
We have a dump of the LBUG with -1 debug logs. The scenario in the dump is the replay of 2 layout change rpcs on a PFL file with 4 components:
Process A - pid 22961 00000020:00000040:4.0:1531845858.765709:000000:22961:0:(tgt_handler.c:578:tgt_handle_recovery()) @@@ Got new replay req@ffff881f1041ce00 x1605550737582320/t0(1065152839981) o101->5fa746bc-5d86-4a17-630c-889714734378@175@gni6:148/0 lens 848/0 e 0 to 0 dl 1531845938 ref 1 fl Interpret:/4/ffffffff rc 0/-1 00000004:00000040:4.0:1531845858.765745:000000:22961:0:(mdt_handler.c:3807:mdt_intent_layout()) [0x340077935:0x99b3:0x0]got layout change request from client: opc:2 flags:0x0 extent [0x1000000, 0x1100000) Process B - pid 00000020:00000040:18.0:1531845858.766672:000000:22966:0:(tgt_handler.c:578:tgt_handle_recovery()) @@@ Got new replay req@ffff881e83a8bf00 x1605550737582560/t0(1065152849881) o101->5fa746bc-5d86-4a17-630c-889714734378@175@gni6:148/0 lens 920/0 e 0 to 0 dl 1531845938 ref 1 fl Interpret:/4/ffffffff rc 0/-1 00000004:00000040:18.0:1531845858.766710:000000:22966:0:(mdt_handler.c:3807:mdt_intent_layout()) [0x340077935:0x99b3:0x0]got layout change request from client: opc:2 flags:0x0 extent [0x2000000, 0x2100000)
Process A runs first. It:
- finds object O for [0x340077935:0x99b3:0x0] in the lu_site object cache,
- follows the normal layout request path and loads component definitions from disk into O
- detects that the request is for replay: deletes all components and loads component definitions from the rpc buffers into O. Only the first 2 of the 4 components are flagged as initialized.
- defines striping info for those two components
Process A leaves the object in cache. Some components have striping info, some don't. The O.ldo_comp_cached flag is NOT set.
Process B then runs. It tries to follow the same path as Process A. It:
- finds O in the object cache
- tries to remove the components of O before loading them from disk
- finds the llc_stripe info set on the first component and LBUGs
If O.lldo_comp_cached flag was set, then Process B would not try to load the components from disk. Instead it would remove the components defined for O and replace them with the layout from the rpc buffer, which would be correct.
The following notes outline the function flow involved in the bug. Process A and Process B both call lod_declare_layout_change(); the calls are sequential.
lod_declare_layout_change() - layout from rpc lod_load_striping_locked() - returns if obj striping is cached - creates obj.comps; lbuf from lu_env lod_parse_striping() - allocs stripes according to obj <--- Process B LBUGs sets obj.ldo_comp_cached layout_lock_data_init() lod_declare_update_plain() - comp info from mlc_buf(rpc) if replay lod_object_free_striping() - removes stripes from obj unsets obj.ldo_comp_cached lod_use_defined_striping() - obj, mlc_buf(rpc), - builds comps,stripes from rpc lod_alloc_comp_entries() lod_get_lov_ea() - gets ea from disk puts in lov_env_info loop over obj components if obj.e_start > rpc.e_end break from loop if replay skip comps already initialized on disk add comp to reconstruct list if no comps added to reconstruct list return EALREADY <--- Process A exits through this path lod_obj_inc_layout_gen() lod_declare_instantiate_components()
The replay logic in lod_declare_update plain leaves the object in an unexpected state: components have llc_stripe set but ldo_comp_cached is false. The simplest fix is just to add "obj.ldo_comp_cached = 1" following the call to lod_use_defined_striping().