[LU-16144] OST crash at umount in ptlrpc_nrs_req_stop_nolock (with TBF policy). Created: 08/Sep/22  Updated: 19/Jun/23  Resolved: 04/Oct/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug
Priority: Minor
Reporter: Etienne Aujames
Assignee: Etienne Aujames
Resolution: Fixed
Votes: 0
Labels: tbf

Issue Links:
Related
is related to LU-16253 sanityn: ASSERTION( orro->oo_ref == 0... Resolved
Severity: 3

 Description   

OST calltrace:

[5839915.258394] BUG: unable to handle kernel NULL pointer dereference at 0000000000000114  
[5839915.260256] IP: [<ffffffffc0d9e965>] ptlrpc_nrs_req_stop_nolock+0x5/0x150 [ptlrpc]    
.....
[5839915.319008]  [<ffffffffc0d6861b>] ? ptlrpc_server_finish_active_request+0x2b/0x140 [ptlrpc]        
[5839915.320846]  [<ffffffffc0d68867>] ptlrpc_service_purge_all+0x137/0x920 [ptlrpc]                    
[5839915.322159]  [<ffffffffc0d6ac37>] ptlrpc_unregister_service+0xe7/0x6f0 [ptlrpc]                    
[5839915.323521]  [<ffffffffc09090f2>] ost_cleanup+0x52/0x1b0 [ost]                                      
[5839915.324585]  [<ffffffffc0a4db2d>] class_free_dev+0x21d/0x720 [obdclass]                            
[5839915.325761]  [<ffffffffc0a4e220>] class_export_put+0x1f0/0x2c0 [obdclass]                          
[5839915.327088]  [<ffffffffc0a4fc95>] class_unlink_export+0x135/0x170 [obdclass]                        
[5839915.328496]  [<ffffffffc0a659e0>] class_decref+0x80/0x160 [obdclass]                                
[5839915.329883]  [<ffffffffc0a65e43>] class_detach+0x1b3/0x2e0 [obdclass]                              
[5839915.331131]  [<ffffffffc0a6ca48>] class_process_config+0x1a38/0x2830 [obdclass]                    
[5839915.332602]  [<ffffffffb08d3b0a>] ? complete+0x4a/0x60                                              
[5839915.333756]  [<ffffffffb0ba14fd>] ? list_del+0xd/0x30                                              
[5839915.334904]  [<ffffffffb0f814fe>] ? wait_for_completion+0x4e/0x140                                  
[5839915.336336]  [<ffffffffc0a6da20>] class_manual_cleanup+0x1e0/0x710 [obdclass]                      
[5839915.337972]  [<ffffffffc0a99835>] server_stop_servers+0xd5/0x160 [obdclass]                        
[5839915.339302]  [<ffffffffc0a9ef9d>] server_put_super+0x12d/0xd00 [obdclass]                          
[5839915.340450]  [<ffffffffb0a4d53d>] generic_shutdown_super+0x6d/0x100                                
[5839915.341528]  [<ffffffffb0a4d942>] kill_anon_super+0x12/0x20                                        
[5839915.342542]  [<ffffffffc0a70852>] lustre_kill_super+0x32/0x50 [obdclass]                            
[5839915.343693]  [<ffffffffb0a4dd1e>] deactivate_locked_super+0x4e/0x70                                
[5839915.344791]  [<ffffffffb0a4e4a6>] deactivate_super+0x46/0x60                                        
[5839915.345863]  [<ffffffffb0a6d03f>] cleanup_mnt+0x3f/0x80                                            
[5839915.346952]  [<ffffffffb0a6d0d2>] __cleanup_mnt+0x12/0x20                                          
[5839915.347897]  [<ffffffffb08c2e5b>] task_work_run+0xbb/0xe0                                          
[5839915.348805]  [<ffffffffb082cc65>] do_notify_resume+0xa5/0xc0                                        
[5839915.349916]  [<ffffffffb0f8e23b>] int_signal+0x12/0x17                                              

ptlrpc_server_request_get() returns a NULL pointer in ptlrpc_service_purge_all(), and the loop passes it unchecked to ptlrpc_server_finish_active_request():

 ptlrpc_service_purge_all(struct ptlrpc_service *svc)
....
                 while (ptlrpc_server_request_pending(svcpt, true)) {       
                         req = ptlrpc_server_request_get(svcpt, true);      
                         ptlrpc_server_finish_active_request(svcpt, req);   
                 }                                                          
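
The failure mode of this loop can be reproduced with a small userspace model. This is only a sketch under stated assumptions: every toy_* name is hypothetical and none of it is the real ptlrpc code. It assumes the "pending" check reports queued requests regardless of throttling, while the "get" side may refuse to hand one out. Instead of dereferencing NULL, the model returns -1 at the point where the unguarded kernel loop would crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued request; not struct ptlrpc_request. */
struct toy_req {
	bool finished;
};

/* Hypothetical stand-in for a service partition and its NRS policy. */
struct toy_svcpt {
	struct toy_req reqs[4];
	size_t nreqs;        /* number of queued requests */
	size_t next;         /* index of the next request to hand out */
	bool throttling;     /* models nrs_throttling */
	bool honors_force;   /* does the policy implement force mode? */
};

/* Assumption: with force set, "pending" only asks whether anything is queued. */
static bool toy_request_pending(const struct toy_svcpt *s, bool force)
{
	(void)force;
	return s->next < s->nreqs;
}

/* A throttling policy that ignores force returns NULL although work is queued. */
static struct toy_req *toy_request_get(struct toy_svcpt *s, bool force)
{
	if (s->next >= s->nreqs)
		return NULL;
	if (s->throttling && !(force && s->honors_force))
		return NULL;
	return &s->reqs[s->next++];
}

/*
 * Same shape as the purge loop above. Returns the number of requests
 * finished, or -1 where the unguarded loop would dereference NULL.
 */
static int toy_purge_all(struct toy_svcpt *s)
{
	int finished = 0;

	while (toy_request_pending(s, true)) {
		struct toy_req *req = toy_request_get(s, true);

		if (req == NULL)
			return -1;
		req->finished = true;
		finished++;
	}
	return finished;
}
```

With throttling set and honors_force cleared, toy_purge_all() returns -1 on the first iteration. With honors_force set, it drains every queued request.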

It seems that nrs_tbf_req_get() does not implement force mode: while the NRS is throttling, it returns NULL for any non-peek call, even when force is set:

static                                                                          
struct ptlrpc_nrs_request *nrs_tbf_req_get(struct ptlrpc_nrs_policy *policy,    
                                           bool peek, bool force)               
{                                                                               
        struct nrs_tbf_head       *head = policy->pol_private;                  
        struct ptlrpc_nrs_request *nrq = NULL;                                  
        struct nrs_tbf_client     *cli;                                         
        struct binheap_node       *node;                                        
                                                                                
        assert_spin_locked(&policy->pol_nrs->nrs_svcpt->scp_req_lock);          
                                                                                
        if (!peek && policy->pol_nrs->nrs_throttling)                           <---------
                return NULL;                                                    
....
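
One possible shape for a force mode is sketched below on a toy token bucket. This is an illustration under assumptions, not the patch uploaded to Gerrit: the toy_tbf names are hypothetical, and the sketch assumes that force should bypass both the throttling check and the token check so that a purge at umount can always drain the queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy policy state; not the real struct nrs_tbf_head. */
struct toy_tbf {
	int  queued;      /* requests waiting in the policy */
	int  tokens;      /* tokens currently available */
	bool throttling;  /* models pol_nrs->nrs_throttling */
};

/*
 * Returns 1 if a request is handed out (or, with peek, is available),
 * and 0 otherwise. With force set, rate limiting is skipped entirely.
 */
static int toy_tbf_req_get(struct toy_tbf *tbf, bool peek, bool force)
{
	if (tbf->queued == 0)
		return 0;
	if (peek)
		return 1;               /* report only, nothing is dequeued */
	if (!force) {
		if (tbf->throttling || tbf->tokens == 0)
			return 0;       /* normal path: obey the rate limit */
		tbf->tokens--;
	}
	tbf->queued--;
	return 1;
}

/* Dequeue until the policy refuses; returns how many requests came out. */
static int toy_tbf_drain(struct toy_tbf *tbf, bool force)
{
	int n = 0;

	while (toy_tbf_req_get(tbf, false, force))
		n++;
	return n;
}
```

In this model, a non-forced drain of a throttled policy with no tokens dequeues nothing, while a forced drain empties the queue without consuming tokens.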


 Comments   
Comment by Etienne Aujames [ 08/Sep/22 ]

This issue could be linked to LU-14976, but I do not have a crash dump to confirm it.

Comment by Gerrit Updater [ 09/Sep/22 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/48494
Subject: LU-16144 tbf: implement force mode for nrs_tbf_req_get()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5da90080fce3e34ca5a76ee88c91ffbb2a43999c

Comment by Gerrit Updater [ 04/Oct/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48494/
Subject: LU-16144 nrs: implement force mode for nrs_tbf_req_get()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1bba7dd425d3fc9ef3f51ee68a99bef36e2dcf90

Comment by Peter Jones [ 04/Oct/22 ]

Landed for 2.16

Comment by Gerrit Updater [ 19/Dec/22 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49448
Subject: LU-16144 nrs: implement force mode for nrs_tbf_req_get()
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: b3fa2b0b616728275645eafbb2721d11c1f257ba

Comment by Gerrit Updater [ 19/Jun/23 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51363
Subject: LU-16144 nrs: implement force mode for nrs_tbf_req_get()
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: d4f1032ca64ee4b31065bfec639f8989b1b2fa65
