[LU-17033] Add RCU protect for export nid operation Created: 16/Aug/23  Updated: 23/Sep/23  Resolved: 23/Sep/23

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Question/Request Priority: Minor
Reporter: Yang Sheng Assignee: Yang Sheng
Resolution: Not a Bug Votes: 0
Labels: None

Issue Links:
Duplicate
Related
is related to LU-17034 memory corruption caused by bug in qm... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Several crashes relate to exp_nid_hash. It looks like the hash was operated on without RCU protection.

[  257.896656] BUG: unable to handle kernel NULL pointer dereference at 00000000000000e2
[  257.897791] IP: [<ffffffffc0cf1eb0>] ldebugfs_rhash_seq_show+0xa0/0x1e0 [obdclass]
[  257.898814] PGD 21c80e0067 PUD 21bab0c067 PMD 0
[  257.899472] Oops: 0000 [#1] SMP
[  257.914018] CPU: 9 PID: 13241 Comm: lctl Kdump: loaded Tainted: G           OE  ------------ T 3.10.0-1160.95.1.el7_lustre.ddn17.x86_64 #1
[  257.915601] Hardware name: DDN SFA400NVX2E, BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[  257.916811] task: ffffa1678707d280 ti: ffffa168c6f54000 task.ti: ffffa168c6f54000
[  257.917773] RIP: 0010:[<ffffffffc0cf1eb0>]  [<ffffffffc0cf1eb0>] ldebugfs_rhash_seq_show+0xa0/0x1e0 [obdclass]
[  257.919093] RSP: 0018:ffffa168c6f57d78  EFLAGS: 00010246
[  257.944326] Call Trace:
[  257.945836]  [<ffffffff8c084e93>] ? seq_printf+0x53/0x80
[  257.947705]  [<ffffffffc0cf20b0>] lprocfs_hash_seq_show+0x60/0x90 [obdclass]
[  257.949770]  [<ffffffffc15ff862>] mgs_hash_seq_show+0x12/0x20 [mgs]
[  257.951731]  [<ffffffff8c0857f8>] seq_read+0x138/0x460
[  257.953549]  [<ffffffff8c0d7ad0>] proc_reg_read+0x40/0x80
[  257.955357]  [<ffffffff8c05bb2f>] vfs_read+0x9f/0x170
[  257.957088]  [<ffffffff8c05c9a5>] SyS_read+0x55/0xd0
[  257.958780]  [<ffffffff8c5c639a>] system_call_fastpath+0x25/0x2a

.....

[ 8320.870019] BUG: unable to handle kernel NULL pointer dereference at 00000000000001ca
[ 8320.872531] IP: [<ffffffff98db7459>] rht_deferred_worker+0x209/0x430
[ 8320.874773] PGD 0
[ 8320.876458] Oops: 0000 [#1] SMP
[ 8320.904160] CPU: 13 PID: 3272 Comm: kworker/13:1 Kdump: loaded Tainted: G           OE  ------------ T 3.10.0-1160.88.1.el7_lustre.ddn17.x86_64 #1
[ 8320.907100] Hardware name: DDN SFA400NVX2E, BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 8320.909544] Workqueue: events rht_deferred_worker
[ 8320.911387] task: ffff89c6dfdb3180 ti: ffff89e8c3994000 task.ti: ffff89e8c3994000
[ 8320.913572] RIP: 0010:[<ffffffff98db7459>]  [<ffffffff98db7459>] rht_deferred_worker+0x209/0x430
[ 8320.939508] Call Trace:
[ 8320.940810]  [<ffffffff98ac32ef>] process_one_work+0x17f/0x440
[ 8320.942542]  [<ffffffff98ac4436>] worker_thread+0x126/0x3c0
[ 8320.944188]  [<ffffffff98ac4310>] ? manage_workers.isra.26+0x2b0/0x2b0
[ 8320.946001]  [<ffffffff98acb621>] kthread+0xd1/0xe0
[ 8320.947555]  [<ffffffff98acb550>] ? insert_kthread_work+0x40/0x40
[ 8320.949308]  [<ffffffff991c61dd>] ret_from_fork_nospec_begin+0x7/0x21
[ 8320.951057]  [<ffffffff98acb550>] ? insert_kthread_work+0x40/0x40



 Comments   
Comment by Gerrit Updater [ 16/Aug/23 ]

"Yang Sheng <ys@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51957
Subject: LU-17033 obdclass: obd_nid_hash was corruption
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f0e9b2a27324fe01340f83b9d53937e461fd02b1

Comment by Yang Sheng [ 16/Aug/23 ]

Hi, Neil,

As you asked, here is the faddr2line result:

VMrhel7# LANG=C bash   ~/git/linux/scripts/faddr2line vmlinux rht_deferred_worker+0x209/0x430
rht_deferred_worker+0x209/0x430:
rhashtable_rehash_one at lib/rhashtable.c:275
(inlined by) rhashtable_rehash_chain at lib/rhashtable.c:315
(inlined by) rhashtable_rehash_table at lib/rhashtable.c:363
(inlined by) rht_deferred_worker at lib/rhashtable.c:464
.........
       rht_for_each(entry, old_tbl, old_hash) {
                err = 0;
                next = rht_dereference_bucket(entry->next, old_tbl, old_hash);   <<<--------

                if (rht_is_a_nulls(next))
                        break;

                pprev = &entry->next;
        }

The main problem is as follows.

For this stack:
[  471.820067] BUG: unable to handle kernel NULL pointer dereference at 0000000000000142
[  471.822528] IP: [<ffffffffa07b7536>] rht_deferred_worker+0x226/0x430
[  471.851583] CPU: 23 PID: 316 Comm: kworker/23:2 Kdump: loaded Tainted: G           OE  ------------ T 3.10.0-1160.95.1.el7_lustre.ddn17.x86_64 #1
[  471.854631] Hardware name: DDN SFA400NVX2E, BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[  471.857301] Workqueue: events rht_deferred_worker
[  471.859330] task: ffff9ed3a5770000 ti: ffff9ed3b6960000 task.ti: ffff9ed3b6960000
[  471.861664] RIP: 0010:[<ffffffffa07b7536>]  [<ffffffffa07b7536>] rht_deferred_worker+0x226/0x430
[  471.864180] RSP: 0018:ffff9ed3b6963da0  EFLAGS: 00010246
[  471.866235] RAX: ffff9ed3e63944b8 RBX: 0000000000000142 RCX: 0000000000000000
[  471.868508] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9ed3d8c46c8c
[  471.870756] RBP: ffff9ed3b6963e18 R08: ffff9ed5d67608b0 R09: 0000000000000598
[  471.872993] R10: 00000000a77101d6 R11: 00000000c7893a1b R12: 0000000000000139
[  471.875213] R13: ffff9ed494bbe000 R14: ffff9ed3e63944b8 R15: ffff9ed457ea2498
[  471.877431] FS:  0000000000000000(0000) GS:ffff9ed6315c0000(0000) knlGS:0000000000000000
[  471.879730] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  471.881757] CR2: 0000000000000142 CR3: 00000023001ea000 CR4: 0000000000760fe0
[  471.883901] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  471.886028] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  471.888141] PKRU: 00000000
[  471.889741] Call Trace:

The table is exp_nid_hash:

crash> bucket_table 0xffff9ed3f1c73000
struct bucket_table {
  size = 256,
  nest = 0,
  rehash = 163,
  hash_rnd = 2859063006,
  locks_mask = 127,
  locks = 0xffff9ed3d8c46c00,
  walkers = {
    next = 0xffff9ed3f1c73020,
    prev = 0xffff9ed3f1c73020
  },
  rcu = {
    next = 0x0,
    func = 0x0
  },
  future_tbl = 0xffff9ed494bbe000,
  buckets = 0xffff9ed3f1c73080
}

Then look into the buckets:
.......
ffff9ed3f1c73580:  0000000000000141 0000000000000143   A.......C.......
ffff9ed3f1c73590:  0000000000000145 ffff9ed3e63944b8  <<<<------    E........D9.....
ffff9ed3f1c735a0:  0000000000000149 000000000000014b   I.......K..............
crash> rd ffff9ed3e63944b8
ffff9ed3e63944b8:  0000000000000142 <<<<----- it should be 0x147, the nulls marker that ends the chain, but it was set to 0x142.
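
The arithmetic behind that annotation can be checked with a small userspace sketch. It assumes the nulls marker for bucket i is encoded as (i << 1) | 1 with a nulls base of 0, which is consistent with the neighboring values in the dump (0x141, 0x143, 0x145, 0x149, 0x14b). The helper names are hypothetical and are not the kernel's own macros; the addresses are copied from the crash output above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding: the end of the chain for bucket i holds (i << 1) | 1. */
static uint64_t nulls_marker(uint64_t bucket)
{
	return (bucket << 1) | 1;
}

/* A word counts as a nulls marker only when its low bit is set;
 * any other value is followed as a pointer by a chain walker. */
static int is_a_nulls(uint64_t word)
{
	return (int)(word & 1);
}

/* Index of a bucket slot, given the start of the buckets array
 * (one 8-byte pointer per bucket on x86_64). */
static uint64_t bucket_index(uint64_t buckets_base, uint64_t slot_addr)
{
	return (slot_addr - buckets_base) / sizeof(uint64_t);
}

/* Re-derive the values seen in the dump. */
static void check_crash_dump_values(void)
{
	const uint64_t base = 0xffff9ed3f1c73080ULL; /* bucket_table.buckets */
	const uint64_t slot = 0xffff9ed3f1c73598ULL; /* slot holding ffff9ed3e63944b8 */
	uint64_t idx = bucket_index(base, slot);

	assert(idx == 163);                      /* same as "rehash = 163" */
	assert(nulls_marker(idx - 1) == 0x145);  /* left neighbor in the dump */
	assert(nulls_marker(idx) == 0x147);      /* expected chain terminator */
	assert(nulls_marker(idx + 1) == 0x149);  /* right neighbor in the dump */
	assert(is_a_nulls(0x147));               /* expected value ends the walk */
	assert(!is_a_nulls(0x142));              /* observed value does not */
}
```

Under this assumption, the observed 0x142 has its low bit clear, so the rehash loop would treat it as the next entry and dereference it, which matches the faulting address in CR2 (0x142).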

A few other instances show the same pattern, so I suspect that exp_nid_hash is missing some locking or a barrier.

Thanks,
YangSheng
