[LU-13018] NULL pointer dereference in osd_attr_get while consuming changelogs and reading changelog_size Created: 26/Nov/19 Updated: 01/Feb/20 Resolved: 01/Feb/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.8 |
| Fix Version/s: | None |
| Type: | Task | Priority: | Minor |
| Reporter: | Olaf Faaland | Assignee: | Mikhail Pershin |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | llnl | ||
| Environment: |
zfs-0.7.11-9llnl |
||
| Issue Links: |
|
||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
We monitor changelog_size on our MDTs, fetching it every 5 minutes under normal operation. We also consume changelogs constantly via Starfish. We occasionally see the following BUG: Lustre: 29422:0:(mdd_device.c:1577:mdd_changelog_clear()) Skipped 22 previous similar messages BUG: unable to handle kernel NULL pointer dereference at 000000000000001c IP: [<ffffffffc14d20a2>] osd_attr_get+0x62/0x340 [osd_zfs] PGD 0 Oops: 0000 [#1] SMP CPU: 4 PID: 21384 Comm: snmpd Kdump: loaded Tainted: P OE ------------ T 3.10.0-1062.1.1.1chaos.ch6.x86_64 #1 Hardware name: Intel Corporation S2600WTTR/S2600WTTR, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016 task: ffff9b1a2ab2b150 ti: ffff9b170ece0000 task.ti: ffff9b170ece0000 RIP: 0010:[<ffffffffc14d20a2>] [<ffffffffc14d20a2>] osd_attr_get+0x62/0x340 [osd_zfs] Call Trace: [<ffffffffc12f73b5>] llog_size+0x35/0xe0 [obdclass] [<ffffffffc13006fe>] ? llog_cat_id2handle+0x30e/0x5b0 [obdclass] [<ffffffffb9232fb8>] ? kmem_cache_alloc+0x48/0x240 [<ffffffffc1300b9c>] llog_cat_size_cb+0x1fc/0x3d0 [obdclass] [<ffffffffc12fb9db>] llog_process_thread+0x87b/0x1470 [obdclass] [<ffffffffb9232c89>] ? ___slab_alloc+0x209/0x4f0 [<ffffffffc13009a0>] ? llog_cat_id2handle+0x5b0/0x5b0 [obdclass] [<ffffffffc12fc68c>] llog_process_or_fork+0xbc/0x450 [obdclass] [<ffffffffc13009a0>] ? llog_cat_id2handle+0x5b0/0x5b0 [obdclass] [<ffffffffc1300f09>] llog_cat_process_or_fork+0x199/0x2a0 [obdclass] [<ffffffffc1301098>] llog_cat_size+0x58/0x80 [obdclass] [<ffffffffc19d7a32>] mdd_changelog_size_ctxt+0x92/0x320 [mdd] [<ffffffffc19d7d35>] mdd_changelog_size_seq_show+0x75/0xe0 [mdd] [<ffffffffb9284d18>] seq_read+0x138/0x460 [<ffffffffb92d94b0>] proc_reg_read+0x40/0x80 [<ffffffffb925a71c>] vfs_read+0xbc/0x1c0 [<ffffffffb925b68f>] SyS_read+0x7f/0xf0 [<ffffffffb97ba11e>] system_call_fastpath+0x25/0x2a |
| Comments |
| Comment by Olaf Faaland [ 26/Nov/19 ] |
|
I haven't tried the obvious thing of reading changelog_size in a tight loop while clearing changelogs to see how easy it is to reproduce. I'll do that today. |
| Comment by Peter Jones [ 26/Nov/19 ] |
|
Mike Could you please advise? Thanks Peter |
| Comment by Olaf Faaland [ 26/Nov/19 ] |
|
I wasn't able to quickly reproduce it as described above. I'll let you know if running for a longer duration works. |
| Comment by Mikhail Pershin [ 27/Nov/19 ] |
|
Olaf, is that possible to find out what is happening at 0x62 there: |
| Comment by Olaf Faaland [ 03/Dec/19 ] |
|
Mike, Sorry, I meant to do that in the first place. Here you go: (gdb) l *(osd_attr_get+0x62) 0x90d2 is in osd_attr_get (/usr/src/debug/lustre-2.10.8_4.chaos/lustre/osd-zfs/osd_object.c:788). 783 uint32_t blksize; 784 int rc = 0; 785 786 down_read(&obj->oo_guard); 787 788 if (unlikely(!dt_object_exists(dt) || obj->oo_destroyed)) 789 GOTO(out, rc = -ENOENT); 790 791 if (unlikely(fid_is_acct(lu_object_fid(&dt->do_lu)))) 792 GOTO(out, rc = 0); Thanks |
| Comment by Mikhail Pershin [ 09/Dec/19 ] |
|
As I understand this offset 0x1c is for loh_attr field in lo_header while checking dt_object_exists(). That means we have a race here between llog deletion and polling its size. I will check related code. Olaf, can this be reproduced easily or this is quite rare situation? |
| Comment by Olaf Faaland [ 09/Dec/19 ] |
|
Hi Mike, Thanks for looking. I haven't been able to reproduce it so far. It has not been frequent in production, but has happened multiple times.
|
| Comment by Mikhail Pershin [ 10/Dec/19 ] |
|
Olaf, speaking of possible use-after-free for lu_object, there is NULL pointer and I wonder why it is not poisoned, can you check kernel config options for SLAB_DEBUG or any other poisoning things? Are they turned off? I mean that if allocation debug is being used then this case is unlikely use-after-free |
| Comment by Olaf Faaland [ 11/Dec/19 ] |
|
Hi Mike, I know of the concept but don't know the different kernel config options to search for. Here's one - let me know what others to look for. [root@zinc2:~]# grep SLUB_DEBUG /boot/config-3.10.0-1062.1.1.1chaos.ch6.x86_64 CONFIG_SLUB_DEBUG=y # CONFIG_SLUB_DEBUG_ON is not set I confirmed that all of /sys/kernel/slab/*/poison are "0", which I believe means no poisoning is being done for that memory. If I'm wrong, let me know. The kernel is a patched version of RHEL 7.7 kernel, 3.10.0-1062.1.1.1chaos.ch6.x86_64.
|
| Comment by Olaf Faaland [ 12/Dec/19 ] |
|
For my reference: My local issue is TOSS4656 |
| Comment by Mikhail Pershin [ 29/Jan/20 ] |
|
this issue has the same reason as |
| Comment by Mikhail Pershin [ 29/Jan/20 ] |
|
Patch was added under |
| Comment by Olaf Faaland [ 29/Jan/20 ] |
|
Thanks, Mikhail. You can close this issue as dupe. |