Details
- Bug
- Resolution: Unresolved
- Minor
- None
- Lustre 2.10.7
- soak is running on b2_10-ib #98 with PFL enabled.
- 3
- 9223372036854775807
Description
Soak ran on b2_10-ib #98 over the weekend with no crash but many application failures. The logs showed errors like these:
[ 267.880824] LustreError: 12558:0:(lod_dev.c:419:lod_sub_recovery_thread()) soaked-MDT0002-osp-MDT0001 getting update log failed: rc = -2
[ 268.388707] LustreError: 12556:0:(lod_dev.c:419:lod_sub_recovery_thread()) soaked-MDT0001-osd getting update log failed: rc = -108
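The rc values in these messages are negated Linux errno codes: -2 is ENOENT and -108 is ESHUTDOWN. As a minimal sketch for pulling the device and rc out of a saved console log, assuming one message per line in the format shown above (the function name `update_log_failures` is hypothetical, not part of any Lustre tool):

```shell
# Hypothetical helper: read console log lines on stdin and print
# "<device> <rc>" for every "getting update log failed" message.
update_log_failures() {
    sed -n 's/.*lod_sub_recovery_thread()) \([^ ]*\) getting update log failed: rc = \(-*[0-9][0-9]*\).*/\1 \2/p'
}
```

For example, `dmesg | update_log_failures` would list each affected device next to its return code.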
Then the update logs were removed on all MDS nodes:
pdsh -w soak-8 "mount -t ldiskfs /dev/disk/by-id/dm-name-360080e50002ff4f0000037dd59ba4f5f /mnt/soaked-mdt0"
pdsh -w soak-9 "mount -t ldiskfs /dev/disk/by-id/dm-name-360080e50002ff4f0000037e159ba4ffd /mnt/soaked-mdt1"
pdsh -w soak-10 "mount -t ldiskfs /dev/disk/by-id/dm-name-360080e50001fedb80000015752012949 /mnt/soaked-mdt2"
pdsh -w soak-11 "mount -t ldiskfs /dev/disk/by-id/dm-name-360080e50001fedb80000015952012962 /mnt/soaked-mdt3"
pdsh -g mds "df -h /mnt/soaked-mdt?"
pdsh -g mds "rm -rf /mnt/soaked-mdt?/update_log*"
pdsh -g mds "ls /mnt/soaked-mdt? | grep update_log"
pdsh -g mds "umount /mnt/soaked-mdt?"
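The removal sequence above can also be written as a loop over the host/device pairs, which makes it easier to review before touching the MDTs. This is only a sketch under the assumption that the pairs are exactly those listed in this report; `DRY_RUN`, `run`, and `remove_update_logs` are hypothetical names, and by default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: remove update logs on every MDT, using the host/device pairs
# from this report. DRY_RUN, run() and remove_update_logs() are
# illustrative names, not part of pdsh or Lustre.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        # Print the command instead of executing it.
        printf '%s\n' "$*"
    else
        # Keep the command from consuming the loop's here-document.
        "$@" </dev/null
    fi
}

remove_update_logs() {
    i=0
    # Mount each MDT as ldiskfs on its own MDS.
    while read -r host dev; do
        run pdsh -w "$host" "mount -t ldiskfs /dev/disk/by-id/$dev /mnt/soaked-mdt$i"
        i=$((i + 1))
    done <<EOF
soak-8 dm-name-360080e50002ff4f0000037dd59ba4f5f
soak-9 dm-name-360080e50002ff4f0000037e159ba4ffd
soak-10 dm-name-360080e50001fedb80000015752012949
soak-11 dm-name-360080e50001fedb80000015952012962
EOF
    # Check the mounts, delete the update logs, verify, and unmount.
    run pdsh -g mds "df -h /mnt/soaked-mdt?"
    run pdsh -g mds "rm -rf /mnt/soaked-mdt?/update_log*"
    run pdsh -g mds "ls /mnt/soaked-mdt? | grep update_log"
    run pdsh -g mds "umount /mnt/soaked-mdt?"
}
```

Calling `remove_update_logs` with the default `DRY_RUN=1` prints the eight commands (without their original quoting); setting `DRY_RUN=0` would execute them.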
After remounting and restarting soak, one of the MDSs crashed right away:
[15895.912762] sd 1:0:1:44: rdac: array soak-netapp5660-1, ctlr 1, MODE_SELECT completed
[15896.567246] LustreError: 13958:0:(lod_qos.c:862:lod_comp_ost_in_use()) ASSERTION( inuse->op_count * sizeof(inuse->op_array[0]) < inuse->op_size ) failed:
[15896.582712] LustreError: 13958:0:(lod_qos.c:862:lod_comp_ost_in_use()) LBUG
[15896.590512] Pid: 13958, comm: mdt00_005 3.10.0-957.el7_lustre.x86_64 #1 SMP Mon Jan 7 20:06:41 UTC 2019
[15896.601014] Call Trace:
[15896.603759] [<ffffffffc0b5c7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[15896.611129] [<ffffffffc0b5c87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[15896.618071] [<ffffffffc1416f96>] lod_env_info.part.10+0x0/0x36 [lod]
[15896.625307] [<ffffffffc140c31f>] lod_alloc_rr.constprop.18+0xf2f/0x1000 [lod]
[15896.633437] [<ffffffffc1410749>] lod_qos_prep_create+0x12b9/0x17f0 [lod]
[15896.641097] [<ffffffffc14111bd>] lod_prepare_create+0x25d/0x360 [lod]
[15896.648422] [<ffffffffc14059de>] lod_declare_striped_create+0x1ee/0x970 [lod]
[15896.656559] [<ffffffffc1407e54>] lod_declare_create+0x1e4/0x540 [lod]
[15896.663881] [<ffffffffc1473b22>] mdd_declare_create_object_internal+0xe2/0x2f0 [mdd]
[15896.672667] [<ffffffffc14651a3>] mdd_declare_create+0x53/0xe30 [mdd]
[15896.679936] [<ffffffffc1469059>] mdd_create+0x879/0x1400 [mdd]
[15896.686573] [<ffffffffc1339df5>] mdt_reint_open+0x2175/0x3190 [mdt]
[15896.693729] [<ffffffffc132ec73>] mdt_reint_rec+0x83/0x210 [mdt]
[15896.700469] [<ffffffffc131018b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
[15896.707825] [<ffffffffc13106b2>] mdt_intent_reint+0x162/0x430 [mdt]
[15896.714956] [<ffffffffc13134bb>] mdt_intent_opc+0x1eb/0xaf0 [mdt]
[15896.721881] [<ffffffffc131bd28>] mdt_intent_policy+0x138/0x320 [mdt]
[15896.729102] [<ffffffffc0e632cd>] ldlm_lock_enqueue+0x38d/0x980 [ptlrpc]
[15896.736653] [<ffffffffc0e8cab3>] ldlm_handle_enqueue0+0x943/0x15b0 [ptlrpc]
[15896.744589] [<ffffffffc0f12372>] tgt_enqueue+0x62/0x210 [ptlrpc]
[15896.751463] [<ffffffffc0f162aa>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
[15896.759197] [<ffffffffc0ebed5b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
[15896.767814] [<ffffffffc0ec24a2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[15896.774881] [<ffffffff876c1c31>] kthread+0xd1/0xe0
[15896.780351] [<ffffffff87d74c37>] ret_from_fork_nospec_end+0x0/0x39
[15896.787380] [<ffffffffffffffff>] 0xffffffffffffffff
[15896.792969] Kernel panic - not syncing: LBUG
[15896.797750] CPU: 16 PID: 13958 Comm: mdt00_005 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7_lustre.x86_64 #1
[15896.811058] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[15896.823578] Call Trace:
[15896.826310] [<ffffffff87d61dc1>] dump_stack+0x19/0x1b
[15896.832059] [<ffffffff87d5b4d0>] panic+0xe8/0x21f
[15896.837412] [<ffffffffc0b5c8cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[15896.844308] [<ffffffffc1416f96>] lod_comp_ost_in_use.part.8+0x36/0x36 [lod]
[15896.852185] [<ffffffffc140c31f>] lod_alloc_rr.constprop.18+0xf2f/0x1000 [lod]
[15896.860254] [<ffffffffc1410749>] lod_qos_prep_create+0x12b9/0x17f0 [lod]
[15896.867835] [<ffffffffc14111bd>] lod_prepare_create+0x25d/0x360 [lod]
[15896.875125] [<ffffffffc14059de>] lod_declare_striped_create+0x1ee/0x970 [lod]
[15896.883191] [<ffffffffc1407e54>] lod_declare_create+0x1e4/0x540 [lod]
[15896.890480] [<ffffffffc1473b22>] mdd_declare_create_object_internal+0xe2/0x2f0 [mdd]
[15896.899224] [<ffffffffc14651a3>] mdd_declare_create+0x53/0xe30 [mdd]
[15896.906416] [<ffffffffc1469059>] mdd_create+0x879/0x1400 [mdd]
[15896.913034] [<ffffffffc1339df5>] mdt_reint_open+0x2175/0x3190 [mdt]
[15896.920159] [<ffffffffc0c9e4c1>] ? upcall_cache_get_entry+0x211/0x8f0 [obdclass]
[15896.928517] [<ffffffffc131ece3>] ? ucred_set_jobid+0x53/0x70 [mdt]
[15896.935517] [<ffffffffc132ec73>] mdt_reint_rec+0x83/0x210 [mdt]
[15896.942227] [<ffffffffc131018b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
[15896.949517] [<ffffffffc13106b2>] mdt_intent_reint+0x162/0x430 [mdt]
[15896.956613] [<ffffffffc13134bb>] mdt_intent_opc+0x1eb/0xaf0 [mdt]
[15896.963542] [<ffffffffc0eb4c90>] ? lustre_swab_ldlm_policy_data+0x30/0x30 [ptlrpc]
[15896.972095] [<ffffffffc131bd28>] mdt_intent_policy+0x138/0x320 [mdt]
[15896.979306] [<ffffffffc0e632cd>] ldlm_lock_enqueue+0x38d/0x980 [ptlrpc]
[15896.986810] [<ffffffffc0e8cab3>] ldlm_handle_enqueue0+0x943/0x15b0 [ptlrpc]
[15896.994710] [<ffffffffc0eb4d10>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
[15897.003095] [<ffffffffc0f12372>] tgt_enqueue+0x62/0x210 [ptlrpc]
[15897.009931] [<ffffffffc0f162aa>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
[15897.017635] [<ffffffffc0ebed5b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
[15897.026209] [<ffffffffc0ebb388>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[15897.033809] [<ffffffff876d67c2>] ? default_wake_function+0x12/0x20
[15897.040833] [<ffffffff876cba9b>] ? __wake_up_common+0x5b/0x90
[15897.047368] [<ffffffffc0ec24a2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[15897.054393] [<ffffffffc0ec1a10>] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
[15897.062648] [<ffffffff876c1c31>] kthread+0xd1/0xe0
[15897.068097] [<ffffffff876c1b60>] ? insert_kthread_work+0x40/0x40
[15897.074896] [<ffffffff87d74c37>] ret_from_fork_nospec_begin+0x21/0x21
[15897.082179] [<ffffffff876c1b60>] ? insert_kthread_work+0x40/0x40
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
Issue Links
- duplicates LU-10429 soak, LBUG lod_comp_ost_in_use()) ASSERTION( inuse->op_count * sizeof(inuse->op_array[0]) < inuse->op_size ) failed: (Resolved)