
"lfs migrate -b" can cause thread starvation on OSS

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.16.0
    • Environment: Reproduced on VMs with ldiskfs Lustre 2.15.54:
      1 client
      1 MDS: 2 MDTs
      1 OSS: 2 OSTs

    Description

      I am not 100% sure, but I think we hit the following issue in production with parallel
      "lfs migrate -b" instances on files with several hard links:

      OSS setup:

      [root@oss ~]# lctl get_param ost.OSS.ost.threads_* cpu_partition_table
      ost.OSS.ost.threads_max=78
      ost.OSS.ost.threads_min=3
      ost.OSS.ost.threads_started=4
      cpu_partition_table=0   : 0 1
      [root@oss ~]# mount -tlustre
      /dev/mapper/ost1_flakey on /...
      /dev/mapper/ost2_flakey on /...
      

      Reproducer with 1 client:

      [root@client test]# lfs setstripe  -c2 .
      [root@client test]# dd if=/dev/zero of=test1 bs=1M count=100                                   
      [root@client test]# mkdir links                                              
      [root@client test]# printf "%s\n" link{1..100} | xargs -I{} ln test1 links/{}
      [root@client test]# find -type f | xargs -P100 -I{} lfs migrate -c2 {}
      # --> the command hangs
      

      Node states
      Requests queued in the OSS NRS policy:

      [root@oss ~]# lctl get_param -n ost.OSS.ost.nrs_policies    
      
      regular_requests:
        - name: fifo
          state: started
          fallback: yes
          queued: 125                 
          active: 76                  
      ...
      

      OSS dmesg:

      [369238.270582] LustreError: 13117:0:(ldlm_request.c:124:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1676651767, 300s ago); not entering recovery in server code, just going back to sleep ns: 
      filter-lustrefs-OST0000_UUID lock: ffff967687a826c0/0xdb3bda27f476b2cd lrc: 3/1,0 mode: --/PR res: [0xc46:0x0:0x0].0x0 rrc: 40 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) gid 0 flags: 0x400
      10000000000 nid: local remote: 0x0 expref: -99 pid: 13117 timeout: 0 lvb_type: 0
      [369238.288701] LustreError: 13117:0:(ldlm_request.c:124:ldlm_expired_completion_wait()) Skipped 72 previous similar messages
      [369688.315515] Lustre: 13182:0:(service.c:1453:ptlrpc_at_send_early_reply()) @@@ Could not add any time (5/-150), not sending early reply  req@ffff9676632edf80 x1758012803008832/t0(0) o101->d58673c6-592c-4241-b
      a90-fd6a89dece56@10.0.2.6@tcp:617/0 lens 328/0 e 0 to 0 dl 1676652522 ref 2 fl New:/0/ffffffff rc 0/-1 job:'client.29583'
      [369688.315528] Lustre: 13182:0:(service.c:1453:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message
      [369693.860227] Lustre: 13182:0:(service.c:1453:ptlrpc_at_send_early_reply()) @@@ Could not add any time (5/-150), not sending early reply  req@ffff96769dcb5b00 x1758005042515584/t0(0) o6->lustrefs-MDT0000-mdtlo
      v_UUID@10.0.2.4@tcp:622/0 lens 544/0 e 0 to 0 dl 1676652527 ref 2 fl New:/0/ffffffff rc 0/-1 job:'osp-syn-0-0.0'
      [369693.860239] Lustre: 13182:0:(service.c:1453:ptlrpc_at_send_early_reply()) Skipped 163 previous similar messages
      [369694.392072] Lustre: lustrefs-OST0000: Client d58673c6-592c-4241-ba90-fd6a89dece56 (at 10.0.2.6@tcp) reconnecting
      [369694.392084] Lustre: Skipped 1 previous similar message
      [369699.869588] Lustre: lustrefs-OST0000: Client lustrefs-MDT0000-mdtlov_UUID (at 10.0.2.4@tcp) reconnecting
      [369699.869591] Lustre: Skipped 1 previous similar message
      [370140.452612] ptlrpc_watchdog_fire: 72 callbacks suppressed
      [370140.452620] Lustre: ll_ost00_025: service thread pid 13134 was inactive for 1202.140 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging p
      urposes:
      [370140.452623] Lustre: Skipped 2 previous similar messages
      [370140.452626] Pid: 13134, comm: ll_ost00_025 3.10.0-1160.59.1.el7.centos.plus.x86_64 #1 SMP Wed Feb 23 17:40:21 UTC 2022
      [370140.452631] Call Trace:
      [370140.452705] [<0>] ldlm_completion_ast+0x62d/0xa40 [ptlrpc]
      [370140.452767] [<0>] ldlm_cli_enqueue_local+0x25c/0x880 [ptlrpc]
      [370140.452818] [<0>] tgt_extent_lock+0xea/0x2a0 [ptlrpc]
      [370140.452827] [<0>] ofd_getattr_hdl+0x385/0x750 [ofd]
      [370140.452874] [<0>] tgt_request_handle+0x92f/0x19c0 [ptlrpc]
      [370140.452929] [<0>] ptlrpc_server_handle_request+0x253/0xc00 [ptlrpc]
      [370140.452963] [<0>] ptlrpc_main+0xc3c/0x15f0 [ptlrpc]
      [370140.452968] [<0>] kthread+0xd1/0xe0
      [370140.452972] [<0>] ret_from_fork_nospec_begin+0x21/0x21
      [370140.452993] [<0>] 0xfffffffffffffffe
      ....
      

      Client dmesg (reconnections):

      [373336.926476] Lustre: lustrefs-OST0000-osc-ffff9bf4bb70f000: Connection restored to 10.0.2.5@tcp (at 10.0.2.5@tcp)
      [373336.926481] Lustre: Skipped 1 previous similar message
      [374092.440057] Lustre: 3473:0:(client.c:2305:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1676654035/real 1676654035]  req@ffff9bf48c66e400 x1758012803097024/t0(0) o17->lustrefs-OST0001-osc-ffff9bf4bb70f000@10.0.2.5@tcp:28/4 lens 456/432 e 0 to 1 dl 1676654791 ref 1 fl Rpc:XQr/2/ffffffff rc -11/-1 job:''
      [374092.440068] Lustre: 3473:0:(client.c:2305:ptlrpc_expire_one_request()) Skipped 164 previous similar messages
      [374092.440098] Lustre: lustrefs-OST0000-osc-ffff9bf4bb70f000: Connection to lustrefs-OST0000 (at 10.0.2.5@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [374092.440106] Lustre: Skipped 2 previous similar messages
      [374092.450347] Lustre: lustrefs-OST0000-osc-ffff9bf4bb70f000: Connection restored to 10.0.2.5@tcp (at 10.0.2.5@tcp)
      [374092.450354] Lustre: Skipped 1 previous similar message
      [374848.233487] Lustre: 3473:0:(client.c:2305:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1676654791/real 1676654791]  req@ffff9bf44b5b9f80 x1758012803195712/t0(0) o17->lustrefs-OST0000-osc-ffff9bf4bb70f000@10.0.2.5@tcp:28/4 lens 456/432 e 0 to 1 dl 1676655547 ref 1 fl Rpc:XQr/2/ffffffff rc -11/-1 job:''
      [374848.233499] Lustre: 3473:0:(client.c:2305:ptlrpc_expire_one_request()) Skipped 164 previous similar messages
      [374848.233550] Lustre: lustrefs-OST0000-osc-ffff9bf4bb70f000: Connection to lustrefs-OST0000 (at 10.0.2.5@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [374848.246634] Lustre: lustrefs-OST0000-osc-ffff9bf4bb70f000: Connection restored to 10.0.2.5@tcp (at 10.0.2.5@tcp)
      [374848.246645] Lustre: Skipped 1 previous similar message
      [375031.816746] Lustre: 3473:0:(client.c:2305:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1676655547/real 1676655547]  req@ffff9bf44b5bb180 x1758012803283392/t0(0) o17->lustrefs-OST0001-osc-ffff9bf4bb70f000@10.0.2.5@tcp:28/4 lens 456/432 e 7 to 1 dl 1676655730 ref 1 fl Rpc:XQr/2/ffffffff rc -11/-1 job:''
      [375031.816758] Lustre: 3473:0:(client.c:2305:ptlrpc_expire_one_request()) Skipped 168 previous similar messages
      [375031.816775] Lustre: lustrefs-OST0000-osc-ffff9bf4bb70f000: Connection to lustrefs-OST0000 (at 10.0.2.5@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [375031.816785] Lustre: Skipped 2 previous similar messages
      

      At this point, the OSS is not able to reply to requests for the OST service.

      All the threads on the OSS are waiting for an extent lock on the OST objects:

      crash> foreach 'll_ost00' bt | grep -c "ldlm_completion_ast"                                  
      76
      

      OST resources:

      crash> ldlm_resource.lr_name -x 0xffff96768edc7600,
        lr_name = {
          name = {0x7fc, 0x0, 0x0, 0x0}
        }
      crash> ldlm_resource.lr_name -x 0xffff96768edc7300  
        lr_name = {
          name = {0xd06, 0x0, 0x0, 0x0}
        }
      
      
      crash> ldlm_resource.lr_granted -o 0xffff96768edc7600
      struct ldlm_resource {
        [ffff96768edc7620] struct list_head lr_granted;
      }
      crash> list -H ffff96768edc7620 -s ldlm_lock.l_granted_mode  ldlm_lock.l_res_link        
      ffff967646789200
        l_granted_mode = LCK_GROUP
      crash> ldlm_resource.lr_waiting -o 0xffff96768edc7600
      struct ldlm_resource {
        [ffff96768edc7630] struct list_head lr_waiting;
      }
      crash> list -H ffff96768edc7630 -s ldlm_lock.l_req_mode  ldlm_lock.l_res_link         
      ffff96766a297d40
        l_req_mode = LCK_PR
      ...
      crash> list -H ffff96768edc7630 | wc -l
      38
      
      Same thing for the 0xffff96768edc7300 resource:
      ....
      crash> ldlm_resource.lr_waiting -o 0xffff96768edc7300 
      struct ldlm_resource {
        [ffff96768edc7330] struct list_head lr_waiting;
      }
      crash> list -H ffff96768edc7330 | wc -l
      38
      

      38 + 38 = 76 locks are waiting for 2 group locks on the 2 OST objects of the file.

      On the client, the "lfs migrate" process that has the group lock is waiting for a reply from the OSS:

      PID: 6136   TASK: ffff9bf44e0b2100  CPU: 0   COMMAND: "lfs"                
       #0 [ffff9bf4450a7b38] __schedule at ffffffffa619d018                      
       #1 [ffff9bf4450a7ba0] schedule at ffffffffa619d3e9                        
       #2 [ffff9bf4450a7bb0] cl_sync_io_wait at ffffffffc0b32d55 [obdclass]      
       #3 [ffff9bf4450a7c30] cl_lock_request at ffffffffc0b2e783 [obdclass]      
       #4 [ffff9bf4450a7c68] cl_glimpse_lock at ffffffffc11b4339 [lustre]        
       #5 [ffff9bf4450a7cb0] cl_glimpse_size0 at ffffffffc11b4705 [lustre]       
       #6 [ffff9bf4450a7d08] ll_file_aio_read at ffffffffc1163706 [lustre]       
       #7 [ffff9bf4450a7de8] ll_file_read at ffffffffc1163fa1 [lustre]           
       #8 [ffff9bf4450a7ed0] vfs_read at ffffffffa5c4e3ff                        
       #9 [ffff9bf4450a7f00] sys_pread64 at ffffffffa5c4f472                     
      #10 [ffff9bf4450a7f50] system_call_fastpath at ffffffffa61aaf92            
      

      The other "lfs migrate" are calling stat() or llapi_get_data_version() (before taking the group lock):

      PID: 6137   TASK: ffff9bf44e0b3180  CPU: 1   COMMAND: "lfs"                    
       #0 [ffff9bf4ba1d7ac0] __schedule at ffffffffa619d018                          
       #1 [ffff9bf4ba1d7b28] schedule at ffffffffa619d3e9                            
       #2 [ffff9bf4ba1d7b38] schedule_timeout at ffffffffa619b0b1                    
       #3 [ffff9bf4ba1d7be8] wait_for_completion at ffffffffa619d79d                 
       #4 [ffff9bf4ba1d7c48] osc_io_data_version_end at ffffffffc10311e9 [osc]       
       #5 [ffff9bf4ba1d7c80] cl_io_end at ffffffffc0b3075f [obdclass]                
       #6 [ffff9bf4ba1d7cb0] lov_io_end_wrapper at ffffffffc10864eb [lov]            
       #7 [ffff9bf4ba1d7cd0] lov_io_data_version_end at ffffffffc10874a8 [lov]       
       #8 [ffff9bf4ba1d7cf8] cl_io_end at ffffffffc0b3075f [obdclass]                
       #9 [ffff9bf4ba1d7d28] cl_io_loop at ffffffffc0b334dd [obdclass]               
      #10 [ffff9bf4ba1d7d60] ll_ioc_data_version at ffffffffc1159f0b [lustre]        
      #11 [ffff9bf4ba1d7da8] ll_file_ioctl at ffffffffc11728c3 [lustre]              
      #12 [ffff9bf4ba1d7e80] do_vfs_ioctl at ffffffffa5c63ad0                        
      #13 [ffff9bf4ba1d7f00] sys_ioctl at ffffffffa5c63d81                           
      #14 [ffff9bf4ba1d7f50] system_call_fastpath at ffffffffa61aaf92                
      
      PID: 6158   TASK: ffff9bf4b9145280  CPU: 0   COMMAND: "lfs"                           
       #0 [ffff9bf444cd3bb8] __schedule at ffffffffa619d018                                 
       #1 [ffff9bf444cd3c20] schedule at ffffffffa619d3e9                                   
       #2 [ffff9bf444cd3c30] cl_sync_io_wait at ffffffffc0b32d55 [obdclass]                 
       #3 [ffff9bf444cd3cb0] cl_lock_request at ffffffffc0b2e783 [obdclass]                 
       #4 [ffff9bf444cd3ce8] cl_glimpse_lock at ffffffffc11b4339 [lustre]                   
       #5 [ffff9bf444cd3d30] cl_glimpse_size0 at ffffffffc11b4705 [lustre]                  
       #6 [ffff9bf444cd3d88] ll_getattr_dentry at ffffffffc116c5ee [lustre]                 
       #7 [ffff9bf444cd3e38] ll_getattr at ffffffffc116cafe [lustre]                        
       #8 [ffff9bf444cd3e48] vfs_getattr at ffffffffa5c53ee9                                
       #9 [ffff9bf444cd3e78] vfs_fstat at ffffffffa5c53f65                                  
      #10 [ffff9bf444cd3eb8] SYSC_newfstat at ffffffffa5c544d4                              
      #11 [ffff9bf444cd3f40] sys_newfstat at ffffffffa5c548ae                               
      

      Conclusion:
      The requests from the "lfs migrate" process that holds the group lock are stuck in the NRS policy of the OST service, because all the OSS threads are waiting for that group lock.

      Workaround
      Temporarily increase the number of threads to dequeue all the requests:

      [root@oss ~]# lctl set_param ost.OSS.ost.threads_max=200
      

          Activity

            [LU-16571] "lfs migrate -b" can cause thread starvation on OSS

            adilger Andreas Dilger added a comment -
            Patch was included in 2.15.55.

            "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51114
            Subject: LU-16571 utils: fix parallel "lfs migrate -b" on hard links
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: aee0ce61a24ff4f965b6e0e5b263e0c3038cc8c1

            gerrit Gerrit Updater added a comment - "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51114 Subject: LU-16571 utils: fix parallel "lfs migrate -b" on hard links Project: fs/lustre-release Branch: b2_12 Current Patch Set: 1 Commit: aee0ce61a24ff4f965b6e0e5b263e0c3038cc8c1

            "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51112
            Subject: LU-16571 utils: fix parallel "lfs migrate -b" on hard links
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: b708deae6307eea61527ecf1dfac23ca4c901a64

            gerrit Gerrit Updater added a comment - "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51112 Subject: LU-16571 utils: fix parallel "lfs migrate -b" on hard links Project: fs/lustre-release Branch: b2_15 Current Patch Set: 1 Commit: b708deae6307eea61527ecf1dfac23ca4c901a64

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50113/
            Subject: LU-16571 utils: fix parallel "lfs migrate -b" on hard links
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 2310f4b8a6b6050cccedd4982ce80aa1cfbd3fe1

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50113/ Subject: LU-16571 utils: fix parallel "lfs migrate -b" on hard links Project: fs/lustre-release Branch: master Current Patch Set: Commit: 2310f4b8a6b6050cccedd4982ce80aa1cfbd3fe1

            "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50113
            Subject: LU-16571 utils: fix parallel "lfs migrate -b" on hard links
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 1891b6e3a86159359e55855525868bf061f02615

            gerrit Gerrit Updater added a comment - "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50113 Subject: LU-16571 utils: fix parallel "lfs migrate -b" on hard links Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 1891b6e3a86159359e55855525868bf061f02615

            eaujames Etienne Aujames added a comment -
            Andreas,
            I think the issue here is "llapi_get_data_version()" outside the group lock.

            For "data version" the LDLM lock is taken server side:

            static int osc_io_data_version_start(const struct lu_env *env,
                                                 const struct cl_io_slice *slice)
            {
            .......
                    if (dv->dv_flags & (LL_DV_RD_FLUSH | LL_DV_WR_FLUSH)) {
                            oa->o_valid |= OBD_MD_FLFLAGS;
                            oa->o_flags |= OBD_FL_SRVLOCK;                                       <--------------
                            if (dv->dv_flags & LL_DV_WR_FLUSH)
                                    oa->o_flags |= OBD_FL_FLUSH;
                    }
            
            
            enum obdo_flags {
            ....
                    OBD_FL_CREATE_CROW  = 0x00000400, /* object should be create on write */
                    OBD_FL_SRVLOCK      = 0x00000800, /* delegate DLM locking to server */          <--------------
                    OBD_FL_CKSUM_CRC32  = 0x00001000, /* CRC32 checksum type */
            ....
            }
            

            Therefore, it will monopolize an OSS thread to wait for the lock if it conflicts, and the server-side lock wait has no timeout:

            static int ofd_getattr_hdl(struct tgt_session_info *tsi)                      
            {                                                                             
            ......                  
                                                                                          
                    srvlock = tsi->tsi_ost_body->oa.o_valid & OBD_MD_FLFLAGS &&           
                              tsi->tsi_ost_body->oa.o_flags & OBD_FL_SRVLOCK;             
                                                                                          
                    if (srvlock) {                                                        
                            if (unlikely(tsi->tsi_ost_body->oa.o_flags & OBD_FL_FLUSH))   
                                    lock_mode = LCK_PW;                                   
                                                                                          
                            rc = tgt_extent_lock(tsi->tsi_env,                            
                                                 tsi->tsi_tgt->lut_obd->obd_namespace,    
                                                 &tsi->tsi_resid, 0, OBD_OBJECT_EOF, &lh, 
                                                 lock_mode, &flags);                      
                            if (rc != 0)                                                  
                                    RETURN(rc);                                           
                    }    
            

            From crash output on the OSS:

            crash> bt 17443                                                                  
            PID: 17443  TASK: ffff8c4ef5841080  CPU: 1   COMMAND: "ll_ost00_010"
             #0 [ffff8c4f17c9b9a0] __schedule at ffffffff94b9d018
             #1 [ffff8c4f17c9ba08] schedule at ffffffff94b9d3e9
             #2 [ffff8c4f17c9ba18] ldlm_completion_ast at ffffffffc12ff2cd [ptlrpc]
             #3 [ffff8c4f17c9bab8] ldlm_cli_enqueue_local at ffffffffc12fd50c [ptlrpc]
             #4 [ffff8c4f17c9bb58] tgt_extent_lock at ffffffffc1349f4a [ptlrpc]
             #5 [ffff8c4f17c9bc20] ofd_getattr_hdl at ffffffffc1917315 [ofd]
             #6 [ffff8c4f17c9bc98] tgt_request_handle at ffffffffc135277f [ptlrpc]
             #7 [ffff8c4f17c9bd28] ptlrpc_server_handle_request at ffffffffc129d5d3 [ptlrpc]
             #8 [ffff8c4f17c9bde0] ptlrpc_main at ffffffffc129f25c [ptlrpc]
             #9 [ffff8c4f17c9bec8] kthread at ffffffff944c5e61
            
            crash> p ((struct tgt_session_info*)0xffff8c4f17848900)->tsi_jobid      
            $5 = 0xffff8c4ec2e535c0 "lfs.0"
            crash> p/x ((struct tgt_session_info*)0xffff8c4f17848900)->tsi_ost_body->oa.o_flags  
            $2 = 0x800
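
            To make the ordering concrete, here is a minimal userspace sketch - not the actual patch - of the sequence being discussed. It assumes liblustreapi's llapi_group_lock()/llapi_group_unlock() and llapi_get_data_version(): the group lock is taken first (a conflicting enqueue then waits on the client side instead of pinning an OSS service thread), and the data version is queried without LL_DV_RD_FLUSH/LL_DV_WR_FLUSH, so no OBD_FL_SRVLOCK getattr is sent to the OST.

            /*
             * Hedged sketch only, not the LU-16571 patch. Link with -llustreapi.
             * The group id (1234) is an arbitrary value for the example.
             */
            #include <fcntl.h>
            #include <stdio.h>
            #include <unistd.h>
            #include <lustre/lustreapi.h>

            int migrate_like_sequence(const char *path)
            {
                    __u64 dv;
                    int gid = 1234; /* arbitrary group lock id */
                    int fd, rc;

                    fd = open(path, O_RDONLY);
                    if (fd < 0)
                            return -1;

                    /* 1. group lock first: a conflict waits on the client side */
                    rc = llapi_group_lock(fd, gid);
                    if (rc < 0)
                            goto out_close;

                    /* 2. data version without flush flags: no OBD_FL_SRVLOCK */
                    rc = llapi_get_data_version(fd, &dv, 0);
                    if (rc == 0)
                            printf("%s: data_version=%llu\n", path,
                                   (unsigned long long)dv);

                    llapi_group_unlock(fd, gid);
            out_close:
                    close(fd);
                    return rc;
            }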
            
            paf0186 Patrick Farrell added a comment - - edited

            Note also:
            Since group locks can't be called back by the server, including for timeout (they are not subject to timeouts), it's always possible to use them to cause a hang.  Request a group lock on a file and try to access it from other threads.  Accesses to that file hang indefinitely.

            There are also more complicated scenarios, like group locking one file and then accessing another file without a group lock. In that case you've created a group lock -> regular lock dependency, where you could get thread exhaustion even if group locks have their own portal, because that regular lock request happens while the group lock is being held. So we can get a loop where the group lock causes thread exhaustion and the regular lock request can't be serviced.  (That's what we're seeing here, actually - though the glimpse might not be on a separate file?)
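
            As a concrete illustration of the first scenario above (a sketch, not taken from the ticket): one process takes a group lock with llapi_group_lock() and simply sleeps; a plain read() of the same file from another process (without the group id) then blocks until the lock is dropped, with no server-side timeout to break the wait.

            /*
             * Sketch of the "hold a group lock, other accesses hang" case.
             * The group id (42) is an arbitrary value for the example.
             */
            #include <fcntl.h>
            #include <unistd.h>
            #include <lustre/lustreapi.h>

            int hold_group_lock(const char *path, unsigned int seconds)
            {
                    int gid = 42; /* arbitrary group id */
                    int fd, rc;

                    fd = open(path, O_RDWR);
                    if (fd < 0)
                            return -1;

                    rc = llapi_group_lock(fd, gid);
                    if (rc == 0) {
                            /* readers/writers of 'path' without gid block here */
                            sleep(seconds);
                            llapi_group_unlock(fd, gid);
                    }
                    close(fd);
                    return rc;
            }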

            paf0186 Patrick Farrell added a comment - - edited

            OK, that makes sense, especially re the CPT concern.  I'm still a little skeptical about this as an issue in practice, but that does make more sense.

            RE: the lfs process.  You're saying "the file", is that the same file it has a group lock on?  Like I said, it shouldn't need to do a glimpse if it holds a group lock, the client should see it has a write lock on the whole file and not glimpse.  That's some sort of bug, if it is for sure trying to glimpse a file it has group locked.

            So if that's indeed the case, we should be able to solve that, but it would be possible for this client to get stuck doing I/O instead of requesting a glimpse lock.  If it were trying to do a read or a write RPC, it would have the same issue with thread exhaustion because the other threads are waiting for this lock.

            We have a dependency loop where the other threads/clients are waiting for this lock, and this lock is waiting for either a lock request or an I/O request.  If the lock request is on the same file, then that's a bug - it has a group lock, which should let the client do anything.  If the lock request is on a different file, then we have a problem because the group lock can't be called back.  We could group lock that second file first, but it would be possible to get stuck if the other client requests arrived between the two group lock requests, exhausting the OST threads so the second group lock request can't be serviced.

            If we moved group lock requests to their own portal, the problem with group locks goes away, as long as any program using group locks always uses group locks for all of its file accesses.

            But the basic problem of thread exhaustion under a write lock is still possible - client 1 has a PW lock on a file and is going to do a write (or a read, the key part is the type of lock), and other clients/client threads all try to access the same file.  They achieve OST thread exhaustion with lock requests while client 1 is holding the lock but before it can do the write.

            So group locks can create group lock <-> regular lock dependency chains, and regular locks can create lock <-> I/O dependency chains.  In both cases, the chains can be turned into loops by thread exhaustion, so the second item - the second lock request while holding the first lock, or I/O while holding a lock - cannot be acquired.

            In both cases, the 'canonical' solution used on the MDT side is to put the different request types into different portals.  So in our case, that would mean group locks to their own portal, and I/O and lock requests to separate portals.  Yuck, but I don't see another way around it in the general case.


            eaujames Etienne Aujames added a comment -
            Patrick,
            I use the default configuration for OST threads.
            The number of threads is low because of the VM used to reproduce this issue: 2 cores, 1 CPT, 1.8 GB of RAM:
            OSS_THR_FACTOR = 7
            OSS_NTHRS_BASE = 64
            nbt_threads = 64 + 64 * 0 / 2 + 2 * 7 = 78

            But I think this can be reproduced on an OSS with a lot of CPTs: it can starve all the threads of a particular CPT, because client NIDs are bound to CPTs.
            e.g.:
            On a production OSS (DDN 18KX VM), we have 9 CPTs with the default threads_max=576, so only 64 threads per CPT before hanging that CPT.
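
            For reference, a small sketch reproducing the arithmetic above. The shape of the formula (base + base * (ncpts - 1) / 2 + ncores * factor) is reconstructed from the numbers quoted in this comment, not copied from the ptlrpc service code, so treat it as an assumption.

            /* Assumed reconstruction of the thread-count arithmetic above. */
            #include <stdio.h>

            #define OSS_NTHRS_BASE  64
            #define OSS_THR_FACTOR   7

            static int oss_threads_max(int ncpts, int ncores)
            {
                    /* 64 + 64 * 0 / 2 + 2 * 7 = 78 for 1 CPT and 2 cores */
                    return OSS_NTHRS_BASE +
                           OSS_NTHRS_BASE * (ncpts - 1) / 2 +
                           ncores * OSS_THR_FACTOR;
            }

            int main(void)
            {
                    printf("threads_max = %d\n", oss_threads_max(1, 2));
                    return 0;
            }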

            The "lfs" process that has the group lock is in:

             #0 [ffff9bf4450a7b38] __schedule at ffffffffa619d018                      
             #1 [ffff9bf4450a7ba0] schedule at ffffffffa619d3e9                        
             #2 [ffff9bf4450a7bb0] cl_sync_io_wait at ffffffffc0b32d55 [obdclass]      
             #3 [ffff9bf4450a7c30] cl_lock_request at ffffffffc0b2e783 [obdclass]      
             #4 [ffff9bf4450a7c68] cl_glimpse_lock at ffffffffc11b4339 [lustre]        
             #5 [ffff9bf4450a7cb0] cl_glimpse_size0 at ffffffffc11b4705 [lustre]       
             #6 [ffff9bf4450a7d08] ll_file_aio_read at ffffffffc1163706 [lustre]       
             #7 [ffff9bf4450a7de8] ll_file_read at ffffffffc1163fa1 [lustre]           
             #8 [ffff9bf4450a7ed0] vfs_read at ffffffffa5c4e3ff                        
             #9 [ffff9bf4450a7f00] sys_pread64 at ffffffffa5c4f472                     
            #10 [ffff9bf4450a7f50] system_call_fastpath at ffffffffa61aaf92  
            

            It tries to read the file. The glimpse request is stuck on the OSS in the FIFO policy before ptlrpc_server_handle_request() (no more threads to handle the request). So the process is not able to release its group lock, and this hangs the whole CPT.

            This has the same behavior as LU-15132, but for an OSS.


            paf0186 Patrick Farrell added a comment -
            I guess I'm making the point that I'm not sure it's a real problem in practice, and I'm curious whether it was seen with a normal configuration, what the trigger was, etc.  Typical max thread counts on an OSS - a few years ago, and I think the ClusterStor OSSes are on the larger side, but still - were something like 768.

            People

              Assignee: eaujames Etienne Aujames
              Reporter: eaujames Etienne Aujames
              Votes: 0
              Watchers: 6
