[LU-13018] NULL pointer dereference in osd_attr_get while consuming changelogs and reading changelog_size Created: 26/Nov/19  Updated: 01/Feb/20  Resolved: 01/Feb/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.8
Fix Version/s: None

Type: Task Priority: Minor
Reporter: Olaf Faaland Assignee: Mikhail Pershin
Resolution: Duplicate Votes: 0
Labels: llnl
Environment:

zfs-0.7.11-9llnl
Lustre 2.10.8_4.chaos
See https://github.com/LLNL/lustre/commits/2.10.8_4.chaos
RHEL 7.7 based OS


Issue Links:
Related
is related to LU-10198 GPF llog_osd_declare_write_rec+0xb6/0... Resolved

 Description   

We monitor changelog_size on our MDTs, fetching it every 5 minutes under normal operation.  We also consume changelogs constantly via Starfish.  We occasionally see the following BUG:

Lustre: 29422:0:(mdd_device.c:1577:mdd_changelog_clear()) Skipped 22 previous similar messages
BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
IP: [<ffffffffc14d20a2>] osd_attr_get+0x62/0x340 [osd_zfs]
PGD 0 
Oops: 0000 [#1] SMP 
CPU: 4 PID: 21384 Comm: snmpd Kdump: loaded Tainted: P           OE  ------------ T 3.10.0-1062.1.1.1chaos.ch6.x86_64 #1
Hardware name: Intel Corporation S2600WTTR/S2600WTTR, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
task: ffff9b1a2ab2b150 ti: ffff9b170ece0000 task.ti: ffff9b170ece0000
RIP: 0010:[<ffffffffc14d20a2>]  [<ffffffffc14d20a2>] osd_attr_get+0x62/0x340 [osd_zfs]
Call Trace:
 [<ffffffffc12f73b5>] llog_size+0x35/0xe0 [obdclass]
 [<ffffffffc13006fe>] ? llog_cat_id2handle+0x30e/0x5b0 [obdclass]
 [<ffffffffb9232fb8>] ? kmem_cache_alloc+0x48/0x240
 [<ffffffffc1300b9c>] llog_cat_size_cb+0x1fc/0x3d0 [obdclass]
 [<ffffffffc12fb9db>] llog_process_thread+0x87b/0x1470 [obdclass]
 [<ffffffffb9232c89>] ? ___slab_alloc+0x209/0x4f0
 [<ffffffffc13009a0>] ? llog_cat_id2handle+0x5b0/0x5b0 [obdclass]
 [<ffffffffc12fc68c>] llog_process_or_fork+0xbc/0x450 [obdclass]
 [<ffffffffc13009a0>] ? llog_cat_id2handle+0x5b0/0x5b0 [obdclass]
 [<ffffffffc1300f09>] llog_cat_process_or_fork+0x199/0x2a0 [obdclass]
 [<ffffffffc1301098>] llog_cat_size+0x58/0x80 [obdclass]
 [<ffffffffc19d7a32>] mdd_changelog_size_ctxt+0x92/0x320 [mdd]
 [<ffffffffc19d7d35>] mdd_changelog_size_seq_show+0x75/0xe0 [mdd]
 [<ffffffffb9284d18>] seq_read+0x138/0x460
 [<ffffffffb92d94b0>] proc_reg_read+0x40/0x80
 [<ffffffffb925a71c>] vfs_read+0xbc/0x1c0
 [<ffffffffb925b68f>] SyS_read+0x7f/0xf0
 [<ffffffffb97ba11e>] system_call_fastpath+0x25/0x2a 


 Comments   
Comment by Olaf Faaland [ 26/Nov/19 ]

I haven't tried the obvious thing of reading changelog_size in a tight loop while clearing changelogs to see how easy it is to reproduce.  I'll do that today.
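
Something along these lines, for the record (the device name "lustre-MDT0000", changelog user "cl1", and the client mount point are placeholders for our real setup; the real consumer is Starfish):

# On the MDS: poll changelog_size in a tight loop.
while true; do
    lctl get_param -n mdd.*.changelog_size > /dev/null
done &

# On a client: generate changelog records and clear them repeatedly.
while true; do
    touch /mnt/lustre/tmpfile.$RANDOM
    lfs changelog_clear lustre-MDT0000 cl1 0
done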

Comment by Peter Jones [ 26/Nov/19 ]

Mike

Could you please advise?

Thanks

Peter

Comment by Olaf Faaland [ 26/Nov/19 ]

I wasn't able to quickly reproduce it as described above.  I'll let you know if running for a longer duration works.

Comment by Mikhail Pershin [ 27/Nov/19 ]

Olaf, is it possible to find out what is happening at offset 0x62 here:
RIP: 0010:[<ffffffffc14d20a2>] [<ffffffffc14d20a2>] osd_attr_get+0x62/0x340 [osd_zfs]
or maybe attach the osd_object.o binary or the .ko module?

Comment by Olaf Faaland [ 03/Dec/19 ]

Mike,

Sorry, I meant to do that in the first place.  Here you go:

(gdb) l *(osd_attr_get+0x62)
0x90d2 is in osd_attr_get (/usr/src/debug/lustre-2.10.8_4.chaos/lustre/osd-zfs/osd_object.c:788).
783		uint32_t		 blksize;
784		int			 rc = 0;
785	
786		down_read(&obj->oo_guard);
787	
788		if (unlikely(!dt_object_exists(dt) || obj->oo_destroyed))
789			GOTO(out, rc = -ENOENT);
790	
791		if (unlikely(fid_is_acct(lu_object_fid(&dt->do_lu))))
792			GOTO(out, rc = 0); 

Thanks

Comment by Mikhail Pershin [ 09/Dec/19 ]

As I understand it, offset 0x1c is the loh_attr field of lu_object_header, which is dereferenced via lo_header while checking dt_object_exists(). That means we have a race here between llog deletion and polling its size. I will check the related code.
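
To illustrate, a userspace sketch (not Lustre source; the real definitions are in lustre/include/lu_object.h, and the layout below is my reading of the 2.10 structures):

/* Illustration only: shows how a NULL lo_header turns the
 * dt_object_exists() check into a fault at address 0x1c. */
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

struct lu_fid { uint64_t f_seq; uint32_t f_oid; uint32_t f_ver; }; /* 16 bytes */

struct lu_object_header {
	struct lu_fid loh_fid;    /* 0x00..0x0f */
	unsigned long loh_flags;  /* 0x10..0x17 on x86_64 */
	int           loh_ref;    /* 0x18, stand-in for atomic_t */
	uint32_t      loh_attr;   /* 0x1c  <-- the faulting offset */
};

struct lu_object { struct lu_object_header *lo_header; };

int main(void)
{
	/* dt_object_exists() reduces to
	 *     o->lo_header->loh_attr & LOH_EXISTS
	 * so with lo_header == NULL the load faults at the offset of
	 * loh_attr, matching "NULL pointer dereference at
	 * 000000000000001c" in the oops. */
	printf("offsetof(lu_object_header, loh_attr) = 0x%zx\n",
	       offsetof(struct lu_object_header, loh_attr));
	return 0;
}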

Olaf, can this be reproduced easily, or is it quite a rare situation?

Comment by Olaf Faaland [ 09/Dec/19 ]

Hi Mike,

Thanks for looking.  I haven't been able to reproduce it so far.  It has not been frequent in production, but has happened multiple times.

 

Comment by Mikhail Pershin [ 10/Dec/19 ]

Olaf, speaking of a possible use-after-free of the lu_object: the pointer is NULL, and I wonder why it is not poisoned. Can you check the kernel config options for SLUB_DEBUG or any other poisoning features? Are they turned off? If allocation debugging is in use, then this is unlikely to be a use-after-free.

Comment by Olaf Faaland [ 11/Dec/19 ]

Hi Mike,

I know of the concept but don't know which kernel config options to search for.  Here's one; let me know what others to look for.

[root@zinc2:~]# grep SLUB_DEBUG /boot/config-3.10.0-1062.1.1.1chaos.ch6.x86_64
CONFIG_SLUB_DEBUG=y
# CONFIG_SLUB_DEBUG_ON is not set

I confirmed that all of /sys/kernel/slab/*/poison are "0", which I believe means no poisoning is being done for that memory.  If I'm wrong, let me know.
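
For reference, the spot-check was along these lines (sort -u collapses the per-cache values):

[root@zinc2:~]# cat /sys/kernel/slab/*/poison | sort -u
0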

The kernel is a patched version of RHEL 7.7 kernel, 3.10.0-1062.1.1.1chaos.ch6.x86_64.

 

Comment by Olaf Faaland [ 12/Dec/19 ]

For my reference: My local issue is TOSS4656

Comment by Mikhail Pershin [ 29/Jan/20 ]

This issue has the same root cause as LU-10198; I will prepare a patch soon.

Comment by Mikhail Pershin [ 29/Jan/20 ]

The patch was added under LU-10198; here is the link: https://review.whamcloud.com/#/c/37367/

Comment by Olaf Faaland [ 29/Jan/20 ]

Thanks, Mikhail.  You can close this issue as a dupe.
