
memory corruption caused by bug in qmt_seed_glbe_all


    Description

      The code in qmt_seed_glbe_all() does not handle the case where an OST index is larger than the number of OSTs.

      BUG: unable to handle kernel paging request at ffff91c13a4ad868
      IP: [<ffffffffc1001931>] qmt_lvbo_update+0x261/0xe60 [lquota]
      CPU: 0 PID: 5100 Comm: qmt_reba_lustre 3.10.0-1160.83.1.el7_lustre.ddn17.x86_64 #1
      RIP: 0010:[<ffffffffc1001931>]  [<ffffffffc1001931>] qmt_lvbo_update+0x261/0xe60 [lquota]
      Call Trace:
        [<ffffffffc13158db>] mdt_lvbo_update+0xbb/0x140 [mdt]
        [<ffffffffc0ba0002>] ldlm_cb_interpret+0x122/0x740 [ptlrpc]
        [<ffffffffc0bbaa67>] ptlrpc_check_set+0x3f7/0x2230 [ptlrpc]
        [<ffffffffc0bbcabb>] ptlrpc_set_wait+0x21b/0x7e0 [ptlrpc]
        [<ffffffffc0b780b5>] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
        [<ffffffffc0b9b4cb>] ldlm_glimpse_locks+0x3b/0x110 [ptlrpc]
        [<ffffffffc0fffe6f>] qmt_glimpse_lock.isra.15+0x39f/0xa50 [lquota]
        [<ffffffffc10009c4>] qmt_reba_thread+0x4a4/0xa80 [lquota]
        [<ffffffff9c0cb511>] kthread+0xd1/0xe0
      

      For example, consider a system with 4 OSTs whose indexes are 0001, 0002, 00c9 and 00ca. As can be seen from the code below, index 00c9 causes a write outside lqeg_arr, which has 64 elements by default.

      void qmt_seed_glbe_all(const struct lu_env *env, struct lqe_glbl_data *lgd,
                             bool qunit, bool edquot)
      {
      ...
                      for (j = 0; j < slaves_cnt; j++) {
                              idx = qmt_sarr_get_idx(qpi, j);
                              LASSERT(idx >= 0);
      
                              if (edquot) {
                                      int lge_edquot, new_edquot, edquot_nu;
      
                                      lge_edquot = lgd->lqeg_arr[idx].lge_edquot;
                                      edquot_nu = lgd->lqeg_arr[idx].lge_edquot_nu;
                                      new_edquot = lqe->lqe_edquot;
      
                                      if (lge_edquot == new_edquot ||
                                          (edquot_nu && lge_edquot == 1))
                                              goto qunit_lbl;
                                      lgd->lqeg_arr[idx].lge_edquot = new_edquot;

      Three conditions are required to trigger this bug:

      • quota is enabled (quota_slave.enabled != 0) and quota limits are set for at least one ID (user/group/project)
      • at least one OST pool exists in the system
      • at least one OST in that pool has an index >= 64 (QMT_INIT_SLV_CNT)

      This bug may cause different kinds of kernel panics. On the system where it occurred most often, in 80% of all cases it corrupted the UUID and NID rhashtables. All of these panics are described in LU-16930. By default the size of lqeg_arr is 64*16=1024 bytes, so an out-of-bounds write is very likely to corrupt the neighboring kmalloc-1024 region.
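The arithmetic above can be checked with a small userspace sketch. It is not Lustre code: the `sketch_*` names and both constants are hypothetical stand-ins, chosen to match the figures quoted in this description (64 slots of 16 bytes each). It computes where a write to a given index would start and counts how many OST indexes from the example fall outside the default array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; values follow the description (64 slots, 16 bytes each). */
#define SKETCH_INIT_SLV_CNT 64
#define SKETCH_ENTRY_SIZE   16

/* Byte offset at which a write to slot idx would begin. */
static size_t sketch_write_offset(unsigned int idx)
{
        return (size_t)idx * SKETCH_ENTRY_SIZE;
}

/* True when idx addresses a slot inside the default-sized array. */
static bool sketch_idx_in_bounds(unsigned int idx)
{
        return idx < SKETCH_INIT_SLV_CNT;
}

/* Count the OST indexes in the list that would land outside the array. */
static int sketch_count_out_of_bounds(const unsigned int *idx, size_t n)
{
        int bad = 0;
        size_t i;

        assert(idx != NULL || n == 0);
        for (i = 0; i < n; i++)
                if (!sketch_idx_in_bounds(idx[i]))
                        bad++;
        return bad;
}
```

For the OST indexes 0x1, 0x2, 0xc9 and 0xca, two of the four are out of bounds, and a write to index 0xc9 would start at byte 201*16 = 3216, well past the end of a 1024-byte allocation.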


          Activity

            [LU-17034] memory corruption caused by bug in qmt_seed_glbe_all

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55035/
            Subject: LU-17034 quota: tmp fix against memory corruption
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set:
            Commit: 92c75b7e9fc0616fa660fce3a69f823524297d1c


            "Sergey Cheremencev <scherementsev@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55035
            Subject: LU-17034 quota: tmp fix against memory corruption
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: dd80f6e0a1359f0233db3e151c2b5c153e9bcf5c


            "Stephane Thiell <sthiell@stanford.edu>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55026
            Subject: LU-17034 quota: lqeg_arr memmory corruption
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: 396293cefde28dc42458370095a9f30f6582ff95

            pjones Peter Jones added a comment -

            Landed for 2.16


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52094/
            Subject: LU-17034 quota: lqeg_arr memmory corruption
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 67f90e42889ff22d574e82cc647f6076e48c65a5


            "Sergey Cheremencev <scherementsev@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52293
            Subject: LU-17034 tests: memory corruption in PQ
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 3cf0ee70e918030f33f2efba4f7a9974afe96c9f

            yujian Jian Yu added a comment -

            With sparse OST indexes "OST_INDEX_LIST=[0,10,20,40,55,60,80]" (for OSTCOUNT=7) and "ENABLE_QUOTA=yes", performance-sanity test 2 and sanity-benchmark test dbench crashed on master branch:
            https://testing.whamcloud.com/test_sets/aa85d42f-f125-48a0-9b9f-c001b6ec3349
            https://testing.whamcloud.com/test_sets/2a8e95b6-fb76-40fe-bebc-809f9a5959df

            [  265.154037] Lustre: DEBUG MARKER: == sanity-benchmark test dbench: dbench ================== 01:14:05 (1693358045)
            [  265.448184] LustreError: 16616:0:(qmt_entry.c:865:qmt_adjust_edquot_qunit_notify()) ASSERTION( idx <= lgd->lqeg_num_used ) failed: 
            [  265.450565] LustreError: 16616:0:(qmt_entry.c:865:qmt_adjust_edquot_qunit_notify()) LBUG
            [  265.452116] Pid: 16616, comm: mdt_rdpg00_003 4.18.0-477.15.1.el8_lustre.x86_64 #1 SMP Tue Aug 1 06:59:39 UTC 2023
            [  265.454013] Call Trace TBD:
            [  265.454761] [<0>] libcfs_call_trace+0x6f/0xa0 [libcfs]
            [  265.455838] [<0>] lbug_with_loc+0x3f/0x70 [libcfs]
            [  265.456807] [<0>] qmt_adjust_edquot_qunit_notify+0x4e1/0x4f0 [lquota]
            [  265.458122] [<0>] qmt_dqacq0+0x1b00/0x2430 [lquota]
            [  265.459108] [<0>] qmt_intent_policy+0x942/0xfe0 [lquota]
            [  265.460151] [<0>] mdt_intent_opc+0xa66/0xc30 [mdt]
            [  265.461270] [<0>] mdt_intent_policy+0xe8/0x460 [mdt]
            [  265.462259] [<0>] ldlm_lock_enqueue+0x455/0xaf0 [ptlrpc]
            [  265.463809] [<0>] ldlm_handle_enqueue+0x645/0x1870 [ptlrpc]
            [  265.464983] [<0>] tgt_enqueue+0xa8/0x230 [ptlrpc]
            [  265.466042] [<0>] tgt_request_handle+0xd20/0x19c0 [ptlrpc]
            [  265.467193] [<0>] ptlrpc_server_handle_request+0x31d/0xbc0 [ptlrpc]
            [  265.468460] [<0>] ptlrpc_main+0xc91/0x15a0 [ptlrpc]
            [  265.469535] [<0>] kthread+0x134/0x150
            [  265.470333] [<0>] ret_from_fork+0x35/0x40
            [  265.471167] Kernel panic - not syncing: LBUG
            [  265.472006] CPU: 0 PID: 16616 Comm: mdt_rdpg00_003 Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-477.15.1.el8_lustre.x86_64 #1
            [  265.474318] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [  265.475391] Call Trace:
            [  265.475914]  dump_stack+0x41/0x60
            [  265.476588]  panic+0xe7/0x2ac
            [  265.477194]  ? ret_from_fork+0x35/0x40
            [  265.477931]  lbug_with_loc.cold.8+0x18/0x18 [libcfs]
            [  265.478883]  qmt_adjust_edquot_qunit_notify+0x4e1/0x4f0 [lquota]
            [  265.480027]  qmt_dqacq0+0x1b00/0x2430 [lquota]
            [  265.480909]  ? qmt_intent_policy+0x942/0xfe0 [lquota]
            [  265.481906]  qmt_intent_policy+0x942/0xfe0 [lquota]
            [  265.482863]  mdt_intent_opc+0xa66/0xc30 [mdt]
            [  265.483752]  ? lprocfs_counter_add+0x12a/0x1a0 [obdclass]
            [  265.485025]  mdt_intent_policy+0xe8/0x460 [mdt]
            [  265.485920]  ldlm_lock_enqueue+0x455/0xaf0 [ptlrpc]
            [  265.486933]  ? cfs_hash_bd_add_locked+0x1f/0x90 [libcfs]
            [  265.487962]  ? cfs_hash_multi_bd_lock+0xa0/0xa0 [libcfs]
            [  265.488978]  ldlm_handle_enqueue+0x645/0x1870 [ptlrpc]
            [  265.490054]  tgt_enqueue+0xa8/0x230 [ptlrpc]
            [  265.490977]  tgt_request_handle+0xd20/0x19c0 [ptlrpc]
            [  265.492024]  ptlrpc_server_handle_request+0x31d/0xbc0 [ptlrpc]
            [  265.493246]  ? lprocfs_counter_add+0x12a/0x1a0 [obdclass]
            [  265.494312]  ptlrpc_main+0xc91/0x15a0 [ptlrpc]
            [  265.495246]  ? __schedule+0x2d9/0x870
            [  265.495972]  ? ptlrpc_wait_event+0x590/0x590 [ptlrpc]
            [  265.497025]  kthread+0x134/0x150
            [  265.497677]  ? set_kthread_struct+0x50/0x50
            [  265.498474]  ret_from_fork+0x35/0x40
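The sparse layout used in this test run illustrates why an OST index cannot double as an array subscript when the array is sized by OST count. The following is a minimal sketch, not taken from any of the patches on this ticket: the `sketch_*` helpers are hypothetical. It maps a sparse OST index to its dense position in the index list and counts how many indexes exceed the number of OSTs.

```c
#include <stddef.h>

/* Return the dense slot of ost_idx within index_list, or -1 if it is absent. */
static int sketch_slot_of(const unsigned int *index_list, size_t count,
                          unsigned int ost_idx)
{
        size_t i;

        for (i = 0; i < count; i++)
                if (index_list[i] == ost_idx)
                        return (int)i;
        return -1;
}

/* Count the indexes that are larger than the number of OSTs in the list. */
static int sketch_count_exceeding(const unsigned int *index_list, size_t count)
{
        int n = 0;
        size_t i;

        for (i = 0; i < count; i++)
                if (index_list[i] > count)
                        n++;
        return n;
}
```

With OST_INDEX_LIST=[0,10,20,40,55,60,80] and OSTCOUNT=7, six of the seven indexes are larger than the OST count, while the dense slot of index 80 is 6.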
            

            "Sergey Cheremencev <scherementsev@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52094
            Subject: LU-17034 quota: lqeg_arr memmory corruption
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 3db2668fd0e161875ed20ac8b14184de1a8046b9


            People

              scherementsev Sergey Cheremencev