Lustre / LU-11935

MDS hit LBUG: (lod_qos.c:862:lod_comp_ost_in_use()) LBUG

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.10.7
    • soak is running on b2_10-ib #98 with PFL enabled.
    • 3

    Description

      Soak ran on b2_10-ib #98 over the weekend without a crash but with many application failures. The logs show errors like the following:

      [  267.880824] LustreError: 12558:0:(lod_dev.c:419:lod_sub_recovery_thread()) soaked-MDT0002-osp-MDT0001 getting update log failed: rc = -2
      [  268.388707] LustreError: 12556:0:(lod_dev.c:419:lod_sub_recovery_thread()) soaked-MDT0001-osd getting update log failed: rc = -108
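
For anyone triaging similar lines, here is a minimal sketch of pulling the fields out of a LustreError line and naming the return code. It assumes the usual kernel convention that rc is a negated Linux errno. The names decode, LINE_RE and ERRNO_NAMES are hypothetical helpers for illustration, not part of Lustre, and the table only covers the two codes seen above.

```python
import re

# Linux errno names for the two return codes in the log above.
# Assumption: rc is a negated Linux errno; only these two are listed.
ERRNO_NAMES = {
    2: "ENOENT",       # No such file or directory
    108: "ESHUTDOWN",  # Cannot send after transport endpoint shutdown
}

# Matches lines shaped like:
# LustreError: <pid>:<n>:(<file>:<line>:<func>()) <message>: rc = <code>
LINE_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"(?P<msg>.*?): rc = (?P<rc>-?\d+)"
)


def decode(log_line):
    """Return the parsed fields of one LustreError line, or None if it does not match."""
    m = LINE_RE.search(log_line)
    if m is None:
        return None
    rc = int(m.group("rc"))
    return {
        "pid": int(m.group("pid")),
        "where": f'{m.group("file")}:{m.group("line")}',
        "func": m.group("func"),
        "msg": m.group("msg"),
        "rc": rc,
        # Look up the positive errno value; unknown codes are reported as such.
        "errno": ERRNO_NAMES.get(-rc, "unknown"),
    }
```

Under that assumption, the first line reports ENOENT (the update log could not be found through the MDT0002 OSP device on MDT0001) and the second reports ESHUTDOWN on the local MDT0001 OSD.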
      

      I then removed the update logs on all MDS nodes:

      pdsh -w soak-8 "mount -t ldiskfs /dev/disk/by-id/dm-name-360080e50002ff4f0000037dd59ba4f5f /mnt/soaked-mdt0"
      pdsh -w soak-9 "mount -t ldiskfs /dev/disk/by-id/dm-name-360080e50002ff4f0000037e159ba4ffd /mnt/soaked-mdt1"
      pdsh -w soak-10 "mount -t ldiskfs /dev/disk/by-id/dm-name-360080e50001fedb80000015752012949 /mnt/soaked-mdt2"
      pdsh -w soak-11 "mount -t ldiskfs /dev/disk/by-id/dm-name-360080e50001fedb80000015952012962 /mnt/soaked-mdt3"
      pdsh -g mds "df -h /mnt/soaked-mdt?"
      pdsh -g mds "rm -rf /mnt/soaked-mdt?/update_log*"
      pdsh -g mds "ls /mnt/soaked-mdt? | grep update_log"
      pdsh -g mds "umount /mnt/soaked-mdt?"
      

      After remounting the MDTs and restarting soak, one of the MDS nodes crashed right away:

      [15895.912762] sd 1:0:1:44: rdac: array soak-netapp5660-1, ctlr 1, MODE_SELECT completed
      [15896.567246] LustreError: 13958:0:(lod_qos.c:862:lod_comp_ost_in_use()) ASSERTION( inuse->op_count * sizeof(inuse->op_array[0]) < inuse->op_size ) failed: 
      [15896.582712] LustreError: 13958:0:(lod_qos.c:862:lod_comp_ost_in_use()) LBUG
      [15896.590512] Pid: 13958, comm: mdt00_005 3.10.0-957.el7_lustre.x86_64 #1 SMP Mon Jan 7 20:06:41 UTC 2019
      [15896.601014] Call Trace:
      [15896.603759]  [<ffffffffc0b5c7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [15896.611129]  [<ffffffffc0b5c87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [15896.618071]  [<ffffffffc1416f96>] lod_env_info.part.10+0x0/0x36 [lod]
      [15896.625307]  [<ffffffffc140c31f>] lod_alloc_rr.constprop.18+0xf2f/0x1000 [lod]
      [15896.633437]  [<ffffffffc1410749>] lod_qos_prep_create+0x12b9/0x17f0 [lod]
      [15896.641097]  [<ffffffffc14111bd>] lod_prepare_create+0x25d/0x360 [lod]
      [15896.648422]  [<ffffffffc14059de>] lod_declare_striped_create+0x1ee/0x970 [lod]
      [15896.656559]  [<ffffffffc1407e54>] lod_declare_create+0x1e4/0x540 [lod]
      [15896.663881]  [<ffffffffc1473b22>] mdd_declare_create_object_internal+0xe2/0x2f0 [mdd]
      [15896.672667]  [<ffffffffc14651a3>] mdd_declare_create+0x53/0xe30 [mdd]
      [15896.679936]  [<ffffffffc1469059>] mdd_create+0x879/0x1400 [mdd]
      [15896.686573]  [<ffffffffc1339df5>] mdt_reint_open+0x2175/0x3190 [mdt]
      [15896.693729]  [<ffffffffc132ec73>] mdt_reint_rec+0x83/0x210 [mdt]
      [15896.700469]  [<ffffffffc131018b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
      [15896.707825]  [<ffffffffc13106b2>] mdt_intent_reint+0x162/0x430 [mdt]
      [15896.714956]  [<ffffffffc13134bb>] mdt_intent_opc+0x1eb/0xaf0 [mdt]
      [15896.721881]  [<ffffffffc131bd28>] mdt_intent_policy+0x138/0x320 [mdt]
      [15896.729102]  [<ffffffffc0e632cd>] ldlm_lock_enqueue+0x38d/0x980 [ptlrpc]
      [15896.736653]  [<ffffffffc0e8cab3>] ldlm_handle_enqueue0+0x943/0x15b0 [ptlrpc]
      [15896.744589]  [<ffffffffc0f12372>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [15896.751463]  [<ffffffffc0f162aa>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [15896.759197]  [<ffffffffc0ebed5b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [15896.767814]  [<ffffffffc0ec24a2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [15896.774881]  [<ffffffff876c1c31>] kthread+0xd1/0xe0
      [15896.780351]  [<ffffffff87d74c37>] ret_from_fork_nospec_end+0x0/0x39
      [15896.787380]  [<ffffffffffffffff>] 0xffffffffffffffff
      [15896.792969] Kernel panic - not syncing: LBUG
      [15896.797750] CPU: 16 PID: 13958 Comm: mdt00_005 Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.el7_lustre.x86_64 #1
      [15896.811058] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
      [15896.823578] Call Trace:
      [15896.826310]  [<ffffffff87d61dc1>] dump_stack+0x19/0x1b
      [15896.832059]  [<ffffffff87d5b4d0>] panic+0xe8/0x21f
      [15896.837412]  [<ffffffffc0b5c8cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
      [15896.844308]  [<ffffffffc1416f96>] lod_comp_ost_in_use.part.8+0x36/0x36 [lod]
      [15896.852185]  [<ffffffffc140c31f>] lod_alloc_rr.constprop.18+0xf2f/0x1000 [lod]
      [15896.860254]  [<ffffffffc1410749>] lod_qos_prep_create+0x12b9/0x17f0 [lod]
      [15896.867835]  [<ffffffffc14111bd>] lod_prepare_create+0x25d/0x360 [lod]
      [15896.875125]  [<ffffffffc14059de>] lod_declare_striped_create+0x1ee/0x970 [lod]
      [15896.883191]  [<ffffffffc1407e54>] lod_declare_create+0x1e4/0x540 [lod]
      [15896.890480]  [<ffffffffc1473b22>] mdd_declare_create_object_internal+0xe2/0x2f0 [mdd]
      [15896.899224]  [<ffffffffc14651a3>] mdd_declare_create+0x53/0xe30 [mdd]
      [15896.906416]  [<ffffffffc1469059>] mdd_create+0x879/0x1400 [mdd]
      [15896.913034]  [<ffffffffc1339df5>] mdt_reint_open+0x2175/0x3190 [mdt]
      [15896.920159]  [<ffffffffc0c9e4c1>] ? upcall_cache_get_entry+0x211/0x8f0 [obdclass]
      [15896.928517]  [<ffffffffc131ece3>] ? ucred_set_jobid+0x53/0x70 [mdt]
      [15896.935517]  [<ffffffffc132ec73>] mdt_reint_rec+0x83/0x210 [mdt]
      [15896.942227]  [<ffffffffc131018b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
      [15896.949517]  [<ffffffffc13106b2>] mdt_intent_reint+0x162/0x430 [mdt]
      [15896.956613]  [<ffffffffc13134bb>] mdt_intent_opc+0x1eb/0xaf0 [mdt]
      [15896.963542]  [<ffffffffc0eb4c90>] ? lustre_swab_ldlm_policy_data+0x30/0x30 [ptlrpc]
      [15896.972095]  [<ffffffffc131bd28>] mdt_intent_policy+0x138/0x320 [mdt]
      [15896.979306]  [<ffffffffc0e632cd>] ldlm_lock_enqueue+0x38d/0x980 [ptlrpc]
      [15896.986810]  [<ffffffffc0e8cab3>] ldlm_handle_enqueue0+0x943/0x15b0 [ptlrpc]
      [15896.994710]  [<ffffffffc0eb4d10>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
      [15897.003095]  [<ffffffffc0f12372>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [15897.009931]  [<ffffffffc0f162aa>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [15897.017635]  [<ffffffffc0ebed5b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [15897.026209]  [<ffffffffc0ebb388>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      [15897.033809]  [<ffffffff876d67c2>] ? default_wake_function+0x12/0x20
      [15897.040833]  [<ffffffff876cba9b>] ? __wake_up_common+0x5b/0x90
      [15897.047368]  [<ffffffffc0ec24a2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [15897.054393]  [<ffffffffc0ec1a10>] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
      [15897.062648]  [<ffffffff876c1c31>] kthread+0xd1/0xe0
      [15897.068097]  [<ffffffff876c1b60>] ? insert_kthread_work+0x40/0x40
      [15897.074896]  [<ffffffff87d74c37>] ret_from_fork_nospec_begin+0x21/0x21
      [15897.082179]  [<ffffffff876c1b60>] ? insert_kthread_work+0x40/0x40
      [    0.000000] Initializing cgroup subsys cpuset
      [    0.000000] Initializing cgroup subsys cpu
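
The failed assertion compares the bytes already occupied by op_count entries against op_size, so it appears to require room for at least one more entry in op_array. The toy model below sketches that inequality only. It is not the Lustre source: the class InUsePool, its methods, and the 4-byte entry size are assumptions made for illustration. Only the field names op_count, op_array and op_size come from the log.

```python
from dataclasses import dataclass, field

# Assumed sizeof(inuse->op_array[0]) in bytes; the report does not state it.
ENTRY_SIZE = 4


@dataclass
class InUsePool:
    """Hypothetical stand-in for the `inuse` structure named in the assertion."""
    op_size: int                                   # capacity of op_array, in bytes
    op_array: list = field(default_factory=list)   # OST indexes recorded so far

    @property
    def op_count(self):
        return len(self.op_array)

    def has_room(self):
        # Same inequality as the failed ASSERTION in the log.
        return self.op_count * ENTRY_SIZE < self.op_size

    def mark_in_use(self, ost_idx):
        # Refuse to record another OST once the byte capacity is exhausted.
        if not self.has_room():
            raise AssertionError(
                f"count {self.op_count}, size {self.op_size}: no room for OST {ost_idx}"
            )
        self.op_array.append(ost_idx)
```

In this model, a pool created with op_size=8 accepts two entries and rejects the third, which is the condition the MDS hit: the allocator tried to record one more OST than the array had been sized for.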
      

            People

              wc-triage WC Triage
              sarah Sarah Liu