Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: Lustre 2.16.0, Lustre 2.15.5
Affects Version/s: None
Labels: None
Severity: 3

Description

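The code in qmt_seed_glbe_all() does not handle the case where an OST index is larger than the number of OSTs. The OST index is used directly as an offset into lgd->lqeg_arr, which has only 64 entries by default, so a sparse, high OST index leads to an out-of-bounds access.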

BUG: unable to handle kernel paging request at ffff91c13a4ad868
IP: [<ffffffffc1001931>] qmt_lvbo_update+0x261/0xe60 [lquota]
CPU: 0 PID: 5100 Comm: qmt_reba_lustre 3.10.0-1160.83.1.el7_lustre.ddn17.x86_64 #1
RIP: 0010:[<ffffffffc1001931>]  [<ffffffffc1001931>] qmt_lvbo_update+0x261/0xe60 [lquota]
Call Trace:
  [<ffffffffc13158db>] mdt_lvbo_update+0xbb/0x140 [mdt]
  [<ffffffffc0ba0002>] ldlm_cb_interpret+0x122/0x740 [ptlrpc]
  [<ffffffffc0bbaa67>] ptlrpc_check_set+0x3f7/0x2230 [ptlrpc]
  [<ffffffffc0bbcabb>] ptlrpc_set_wait+0x21b/0x7e0 [ptlrpc]
  [<ffffffffc0b780b5>] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
  [<ffffffffc0b9b4cb>] ldlm_glimpse_locks+0x3b/0x110 [ptlrpc]
  [<ffffffffc0fffe6f>] qmt_glimpse_lock.isra.15+0x39f/0xa50 [lquota]
  [<ffffffffc10009c4>] qmt_reba_thread+0x4a4/0xa80 [lquota]
  [<ffffffff9c0cb511>] kthread+0xd1/0xe0

For example, suppose the system has 4 OSTs with indexes 0001, 0002, 00c9 and 00ca. As the code below shows, index 00c9 (201 in decimal) causes a write outside lqeg_arr, which has 64 elements by default.

void qmt_seed_glbe_all(const struct lu_env *env, struct lqe_glbl_data *lgd,
                       bool qunit, bool edquot)
{
...
                for (j = 0; j < slaves_cnt; j++) {
                        idx = qmt_sarr_get_idx(qpi, j);
                        LASSERT(idx >= 0);

                        if (edquot) {
                                int lge_edquot, new_edquot, edquot_nu;

                                lge_edquot = lgd->lqeg_arr[idx].lge_edquot;
                                edquot_nu = lgd->lqeg_arr[idx].lge_edquot_nu;
                                new_edquot = lqe->lqe_edquot;

                                if (lge_edquot == new_edquot ||
                                    (edquot_nu && lge_edquot == 1))
                                        goto qunit_lbl;
                                lgd->lqeg_arr[idx].lge_edquot = new_edquot;

Three conditions are required to make this bug possible:

1. Quota is enabled (quota_slave.enabled != 0) and quota limits are set for at least one ID (user, group or project).
2. At least one OST pool exists in the system.
3. At least one OST in an OST pool has an index of 64 (QMT_INIT_SLV_CNT) or higher.

This bug may cause different kinds of kernel panics. On the system where it occurred most often, in 80% of cases it corrupted the UUID and NID rhashtables. All of these panics are described in LU-16930. By default, lqeg_arr is 64 * 16 = 1024 bytes (64 entries of 16 bytes each), so it comes from the kmalloc-1024 slab cache, and an out-of-bounds write will most likely corrupt a neighboring kmalloc-1024 object.

Issue Links

is duplicated by

LU-17790 BUG: unable to handle kernel paging request IP: qmt_lvbo_update [lquota]

Open

LU-16930 BUG: nid_keycmp+0x6

Resolved

is related to

LU-16189 memory corruption in conf-sanity test 111

Resolved

LU-17037 Tests should run with high and sparse index numbers for OSTs and MDTs

In Progress

LU-17033 Add RCU protect for export nid operation

Closed

People

Assignee: Sergey Cheremencev

Reporter: Sergey Cheremencev

Votes: 0

Watchers: 10

Dates

Created: 16/Aug/23 1:30 PM

Updated: 02/Oct/24 8:19 PM

Resolved: 18/Nov/23 9:53 PM