[LU-17076] ptlrpc_nrs_req_stop_nolock() use after free with ORR NRS policy Created: 01/Sep/23  Updated: 13/Nov/23  Resolved: 16/Oct/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Major
Reporter: Mikhail Pershin Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: None

Attachments: File crash_orr_eaujames.tgz    
Issue Links:
Related
is related to LU-10465 increase default stripe size to 4MB Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanityn.sh test 77c crashed as below:

[13898.279218] Lustre: DEBUG MARKER: == sanityn test 77c: check ORR NRS policy ================ 11:42:47 (1693482167)
[13898.512652] Lustre: DEBUG MARKER: lctl set_param ost.OSS.ost_io.nrs_policies=orr ost.OSS.*.nrs_orr_quantum=1 ost.OSS.*.nrs_orr_offset_type=physical ost.OSS.*.nrs_orr_supported=reads
[13900.571898] Lustre: DEBUG MARKER: lctl set_param ost.OSS.*.nrs_orr_supported=writes ost.OSS.*.nrs_orr_quantum=64
[13903.067814] Lustre: DEBUG MARKER: lctl set_param ost.OSS.*.nrs_orr_supported=reads_and_writes ost.OSS.*.nrs_orr_offset_type=logical
[13905.203004] general protection fault, probably for non-canonical address 0x5a5a5a5a5a5a5a5a: 0000 [#1] SMP PTI
[13905.205028] CPU: 1 PID: 250894 Comm: ll_ost_io00_027 Kdump: loaded Tainted: P           OE    --------- -  - 4.18.0-477.15.1.el8_lustre.x86_64 #1
[13905.207489] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[13905.208620] RIP: 0010:ptlrpc_nrs_req_stop_nolock+0x34/0x140 [ptlrpc]
[13905.210443] Code: 00 00 a8 04 75 05 e9 0b 08 f0 c1 53 a8 01 74 65 a8 08 0f 85 8d 00 00 00 8b 87 08 01 00 00 48 8b 84 c7 f8 00 00 00 48 8b 58 08 <48> 8b 43 68 48 8b 40 20 48 8b 40 50 48 85 c0 74 0f 48 8d b7 f8 00
[13905.213924] RSP: 0018:ffff9c0ec3513da8 EFLAGS: 00010246
[13905.214961] RAX: ffff8ecb454357d0 RBX: 5a5a5a5a5a5a5a5a RCX: 0000000000000000
[13905.216321] RDX: 0000000000000001 RSI: ffff8ecb309ca400 RDI: ffff8ecb309ca400
[13905.217685] RBP: ffff8ecb05c4d8e8 R08: 00000000000000b8 R09: abcc77118461cefd
[13905.219044] R10: ffff9c0ec3513d88 R11: ffff8ecb34982646 R12: ffff8ecb05c4d800
[13905.220422] R13: ffff8ecb43bfb200 R14: 00000000000089b6 R15: ffff8ecb309ca400
[13905.221866] FS:  0000000000000000(0000) GS:ffff8ecbbfd00000(0000) knlGS:0000000000000000
[13905.223781] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13905.224902] CR2: 00007ffc748ef221 CR3: 0000000076810004 CR4: 00000000001706e0
[13905.226268] Call Trace:
[13905.226846]  ptlrpc_server_finish_active_request+0x26/0x120 [ptlrpc]
[13905.228184]  ptlrpc_server_handle_request+0x4a2/0xbc0 [ptlrpc]
[13905.229415]  ptlrpc_main+0xc91/0x15a0 [ptlrpc]
[13905.230399]  ? __schedule+0x2d9/0x870
[13905.231251]  ? ptlrpc_wait_event+0x590/0x590 [ptlrpc]
[13905.232347]  kthread+0x134/0x150
[13905.233064]  ? set_kthread_struct+0x50/0x50
[13905.233920]  ret_from_fork+0x35/0x40

report link: https://testing.whamcloud.com/test_sets/79713f31-a8a9-4056-8478-4fb529a14077

It can be related to recent changes in ORR/TRR code in 'LU-8130 nrs: convert NRS ORR/TRR to rhashtable' landed two weeks  ago



 Comments   
Comment by Mikhail Pershin [ 01/Sep/23 ]

On further check it seems that all recent occurrences are from LU-10465 patch for 4M default stripe size, so it can be related to that patch. 

Comment by James A Simmons [ 01/Sep/23 ]

Strange bug since  nrs_orr_req_stop() is the under lying function being called and the debug data is not being handled by RCU so its not a rhashtable issue.  I was thinking it was due to the lquota memory corruption bug but if you can show its LU-10465 then we have a path to look at it.

Comment by Mikhail Pershin [ 02/Sep/23 ]

https://testing.whamcloud.com/test_sets/49dd0ad5-bff7-4dba-a84e-7d8dcae4402a

Another occurrence without relation to  LU-10465, so it seems bug is just started to appear sometimes

 

Comment by Mikhail Pershin [ 02/Sep/23 ]

It happened prior related policy operation ..._req_stop call. I did objdump for related code and it looks like that is due to freed resource:

static inline void nrs_request_stop(struct ptlrpc_nrs_request *nrq)
{
    struct ptlrpc_nrs_policy *policy = nrs_request_policy(nrq);

    which is in turn:

    struct ptlrpc_nrs_policy *nrs_request_policy(struct ptlrpc_nrs_request *nrq)
    {
        return nrs_request_resource(nrq)->res_policy; <--- here res_policy is filled from resource in %rbx
    }

so it looks like nrs_request_resource returned by nrs_request_policy() is freed and contains that 0x5a5a5a5a5a 

Comment by Etienne Aujames [ 05/Sep/23 ]

+1 on master: https://testing.whamcloud.com/test_sets/49dd0ad5-bff7-4dba-a84e-7d8dcae4402a

Comment by Etienne Aujames [ 06/Sep/23 ]

Hi,

I tried to reproduce this bug because I suspected the "c098c09 LU-14976 nrs: change nrs policies at run time".
But it seems this is the "LU-8130 nrs: convert NRS ORR/TRR to rhashtable" patch causing the crash.

I reproduced this crash with debug=all on zfs (it took arround 5-10min):

ONLY_REPEAT=1000 FSTYPE=zfs auster -rv sanityn --only "77c"

From the vmcore-dmesg:

[ 1187.404425] RIP: 0010:[<ffffffffc0c33ddb>]  [<ffffffffc0c33ddb>] ptlrpc_nrs_req_stop_nolock+0x3b/0x150 [ptlrpc]  
[ 1187.404509] RSP: 0018:ffff904365ab7ce0  EFLAGS: 00010246                                                         
[ 1187.404528] RAX: ffff9043bbae4b40 RBX: 5a5a5a5a5a5a5a5a RCX: 0000000000000000                                    
[ 1187.404552] RDX: 0000000000000001 RSI: ffff9043639ca880 RDI: ffff9043639ca880                                    
[ 1187.404576] RBP: ffff904365ab7ce8 R08: 000000000000000a R09: 0000000000000038                                    
[ 1187.404600] R10: 0000000000000000 R11: ffff904365ab7af6 R12: ffff9043797b8800                                    
[ 1187.404624] R13: ffff9043797b88f0 R14: ffff9043baad0e00 R15: 00064f83754e9e80                                    
[ 1187.404648] FS:  0000000000000000(0000) GS:ffff9043bfc00000(0000) knlGS:0000000000000000                         
[ 1187.404676] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033                                                    
[ 1187.404695] CR2: 00007fc038264d00 CR3: 0000000078f9e000 CR4: 00000000000606f0                                    
[ 1187.404731] Call Trace:                                                                                          
[ 1187.404780]  [<ffffffffc0bfcb8b>] ptlrpc_server_finish_active_request+0x2b/0x140 [ptlrpc]                        
[ 1187.404846]  [<ffffffffc0c0080e>] ptlrpc_server_handle_request+0x40e/0xc00 [ptlrpc]                              
[ 1187.404903]  [<ffffffffc0c0232c>] ptlrpc_main+0xc8c/0x1680 [ptlrpc]                                              
[ 1187.404929]  [<ffffffffb5e2b59e>] ? __switch_to+0xce/0x580                                                       
[ 1187.404990]  [<ffffffffc0c016a0>] ? ptlrpc_wait_event+0x5c0/0x5c0 [ptlrpc]                                       
[ 1187.405789]  [<ffffffffb5ec5e61>] kthread+0xd1/0xe0                                                              
[ 1187.406600]  [<ffffffffb5ec5d90>] ? insert_kthread_work+0x40/0x40                                                
[ 1187.407483]  [<ffffffffb65aadf7>] ret_from_fork_nospec_begin+0x21/0x21                                           
[ 1187.408317]  [<ffffffffb5ec5d90>] ? insert_kthread_work+0x40/0x40                                                

From the crash:

crash> ptlrpc_request ffff9043639ca880
....
      sr_nrq = {                                                         
        nr_res_ptrs = {0xffff904398815680, 0xffff9043bbae4b40},   
....

crash> kmem 0xffff9043bbae4b40                                                    
CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  
ffff9043bd16c900 Acpi-State                80         31       306      6     4k  
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE                
  ffffe2c481eeb900  ffff9043bbae4000     0     51         10    41                
  FREE / [ALLOCATED]                                                              
   ffff9043bbae4b40  (cpu 0 cache)                                                
                                                                                  
      PAGE       PHYSICAL      MAPPING       INDEX CNT FLAGS                      
ffffe2c481eeb900 7bae4000                0 ffff9043bbae4b40  1 1fffff00000080 slab

crash> struct nrs_orr_data -x -l nrs_orr_data.od_res  0xffff9043bbae4b40 | head -n10   
struct nrs_orr_data {                                                                  
  od_res = {                                                                           
    res_parent = 0xffff9043bbae4cd0,                                                   
    res_policy = 0x5a5a5a5a5a5a5a5a                                                    
  },                                                                                   
  od_binheap = 0x5a5a5a5a5a5a5a5a,                                                     
  od_obj_hash = {                                                                      
    tbl = 0x5a5a5a5a5a5a5a5a,                                                          
    nelems = {                                                                         
      counter = 0x5a5a5a5a                                                             

The orr object was freed.

The debug log confirmed this:

00000100:00000010:1.1:1693989787.815785:0:10036:0:(nrs_orr.c:832:nrs_orr_object_free()) slab-freed '(orro)': 80 at ffff9043bbae4b40.
Comment by Etienne Aujames [ 06/Sep/23 ]

I think the culprit is the following code:

static int nrs_orr_res_get(struct ptlrpc_nrs_policy *policy,                     
                           struct ptlrpc_nrs_request *nrq,                       
                           const struct ptlrpc_nrs_resource *parent,             
                           struct ptlrpc_nrs_resource **resp, bool moving_req)   
....
        orro2 = rhashtable_lookup_get_insert_fast(&orrd->od_obj_hash,               
                                                  &orro->oo_rhead,                  
                                                  nrs_orr_hash_params);             
        /* A returned non-error orro2 means it already exist */                     
        if (!IS_ERR_OR_NULL(orro2)) {                                               <------ no ref taken if rhashtable_lookup_get_insert_fast returns an existing object
                OBD_SLAB_FREE_PTR(orro, orrd->od_cache);                            
                orro = orro2;                                                       
        }                                                                           
                                                                                    
        /* insertion failed */                                                      
        if (IS_ERR(orro2)) {                                                        

A ref needs to be taken if rhashtable_lookup_get_insert_fast() returns an object, otherwise the object could be freed before nrs_orr_res_put()/ptlrpc_nrs_req_stop_nolock().

Comment by Gerrit Updater [ 06/Sep/23 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52295
Subject: LU-17076 orr: take a ref after lookup_get_insert_fast()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: de641ac0edf93607966bbc7a7ca795edfb1d3049

Comment by Gerrit Updater [ 26/Sep/23 ]

"Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52515
Subject: LU-17076 nrs: wait for RCU completion
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 18adf51db65b8226435765cef4c099722ad5627f

Comment by Gerrit Updater [ 28/Sep/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52295/
Subject: LU-17076 orr: take a ref after lookup_get_insert_fast()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d7e70e38c942baebfdc186fcd5ad48cdcc2ebde3

Comment by Gerrit Updater [ 16/Oct/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52515/
Subject: LU-17076 nrs: wait for RCU completion
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a9411a9856a0a1539ea2c8dc9c7eb1bd8fa2c409

Comment by Peter Jones [ 16/Oct/23 ]

Landed for 2.16

Comment by Gerrit Updater [ 13/Nov/23 ]

"Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53113
Subject: LU-17076 tests: reproducer
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: fbcff6be03f5e30e1811cf5790d52cc8daed69dd

Generated at Sat Feb 10 03:32:26 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.