[LU-16830] mdtest SEL jobs aborted with ENOSPC during automated FOFB testing Created: 15/May/23 Updated: 28/Sep/23 Resolved: 09/Jun/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Alexander Boyko | Assignee: | Alexander Boyko |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
lod_ost_alloc_rr() has 3 speed loops (speed 0, 1, 2). At 0,1 it loops OSTs without waiting on objects. And at speed 2 it waits obd_timeout(60s) for OST object creation. 00000004:00080000:5.0:1681399745.413238:0:52376:0:(osp_precreate.c:1552:osp_precreate_reserve()) snx11922-OST0001-osc-MDT0000: slow creates, last=[0x100010000:0x856c7ce1:0x0], next=[0x100010000:0x856c7ce1:0x0], reserved=0, sync_changes=0, sync_rpcs_in_progress=0, status=-19 00020000:00001000:5.0:1681399745.413246:0:52376:0:(lod_qos.c:415:lod_qos_declare_object_on()) can't declare creation on #1: -110 00020000:00001000:5.0:1681399745.413269:0:52376:0:(lod_qos.c:822:lod_ost_alloc_rr()) #3 strt 4 act 0 strp 0 ary 0 idx 0 00000004:00000040:5.0:1681399745.413304:0:52376:0:(osp_precreate.c:1538:osp_precreate_reserve()) snx11922-OST0000-osc-MDT0000: Sleeping on objects 00000004:00080000:5.0:1681399806.853255:0:52376:0:(osp_precreate.c:1552:osp_precreate_reserve()) snx11922-OST0000-osc-MDT0000: slow creates, last=[0x100000000:0x86f5beb8:0x0], next=[0x100000000:0x86f5beb8:0x0], reserved=0, sync_changes=0, sync_rpcs_in_progress=0, status=-19 00020000:00001000:5.0:1681399806.853278:0:52376:0:(lod_qos.c:415:lod_qos_declare_object_on()) can't declare creation on #0: -110 00000100:00100000:5.0:1681399806.853525:0:52376:0:(service.c:2368:ptlrpc_server_handle_request()) Handled RPC req@0000000032eefb47 pname:cluuid+ref:pid:xid:nid:opc:job mdt01_013:deadcc71-76dd-42b1-be5d-3b7c0cd701f9+310925:10959:x1762034564784640:12345-104@gni4:101: Request processed in 247686653us (247686679us total) trans 0 rc 301/301 00000100:00001000:5.0:1681399806.853545:0:52376:0:(service.c:2384:ptlrpc_server_handle_request()) @@@ sent 9 early replies before finishing in 247s req@0000000032eefb47 x1762034564784640/t0(0) o101->deadcc71-76dd-42b1-be5d-3b7c0cd701f9@104@gni4:316/0 lens 984/600 e 9 to 0 dl 1681399836 ref 1 fl Complete:/0/0 rc 301/301 job:'' Here is enospc logs, without -28, but it is really it. 00000004:00080000:4.0:1681509372.244328:0:96679:0:(osp_precreate.c:1552:osp_precreate_reserve()) snx11922-OST0000-osc-MDT0001: slow creates, last=[0x280000400:0x9685b1fc:0x0], next=[0x280000400:0x9685b1fc:0x0], reserved=0, sync_changes=83616, sync_rpcs_in_progress=0, status=-19 00020000:00001000:4.0:1681509372.244340:0:96679:0:(lod_qos.c:415:lod_qos_declare_object_on()) can't declare creation on #0: -110 00020000:00001000:4.0:1681509372.244345:0:96679:0:(lod_qos.c:693:lod_check_and_reserve_ost()) can't declare new object on #0: -110 00020000:00001000:4.0:1681509372.244347:0:96679:0:(lod_qos.c:822:lod_ost_alloc_rr()) #1 strt 14 act 0 strp 0 ary 2 idx 1 00000004:00001000:4.0:1681509372.244351:0:96679:0:(osp_dev.c:806:osp_statfs()) snx11922-OST0001-osc-MDT0001: 30062590209 blocks, 20088647533 free, 19785479608 avail, 4096 bsize, 114679 reserved mb low, 229359 reserved mb high, 118423424 files, 82006030 free files 00000004:00000040:4.0:1681509372.244362:0:96679:0:(osp_precreate.c:1538:osp_precreate_reserve()) snx11922-OST0001-osc-MDT0001: Sleeping on objects 00000004:00080000:4.0:1681509433.684392:0:96679:0:(osp_precreate.c:1552:osp_precreate_reserve()) snx11922-OST0001-osc-MDT0001: slow creates, last=[0x300000400:0x953923b1:0x0], next=[0x300000400:0x953923b1:0x0], reserved=0, sync_changes=76494, sync_rpcs_in_progress=0, status=-19 00020000:00001000:4.0:1681509433.684395:0:96679:0:(lod_qos.c:415:lod_qos_declare_object_on()) can't declare creation on #1: -110 00020000:00001000:4.0:1681509433.684396:0:96679:0:(lod_qos.c:693:lod_check_and_reserve_ost()) can't declare new object on #1: -110 00020000:00001000:4.0:1681509433.684397:0:96679:0:(lod_qos.c:822:lod_ost_alloc_rr()) #2 strt 12 act 0 strp 0 ary 0 idx 0 00000004:00001000:4.0:1681509433.684398:0:96679:0:(osp_dev.c:806:osp_statfs()) snx11922-OST0000-osc-MDT0001: 30062590209 blocks, 20054934974 free, 19751767042 avail, 4096 bsize, 114679 reserved mb low, 229359 reserved mb high, 118423424 files, 81874401 free files 00000004:00000040:4.0:1681509433.684401:0:96679:0:(osp_precreate.c:1538:osp_precreate_reserve()) snx11922-OST0000-osc-MDT0001: Sleeping on objects 00000004:00080000:4.0:1681509495.124293:0:96679:0:(osp_precreate.c:1552:osp_precreate_reserve()) snx11922-OST0000-osc-MDT0001: slow creates, last=[0x280000400:0x9685b1fc:0x0], next=[0x280000400:0x9685b1fc:0x0], reserved=0, sync_changes=83616, sync_rpcs_in_progress=0, status=-19 00020000:00001000:4.0:1681509495.124299:0:96679:0:(lod_qos.c:415:lod_qos_declare_object_on()) can't declare creation on #0: -110 00020000:00001000:4.0:1681509495.124301:0:96679:0:(lod_qos.c:693:lod_check_and_reserve_ost()) can't declare new object on #0: -110 00020000:00001000:4.0:1681509495.124302:0:96679:0:(lod_qos.c:822:lod_ost_alloc_rr()) #3 strt 6 act 0 strp 0 ary 2 idx 1 00000004:00001000:4.0:1681509495.124304:0:96679:0:(osp_dev.c:806:osp_statfs()) snx11922-OST0001-osc-MDT0001: 30062590209 blocks, 20088647533 free, 19785479608 avail, 4096 bsize, 114679 reserved mb low, 229359 reserved mb high, 118423424 files, 82006030 free files 00000004:00000040:4.0:1681509495.124310:0:96679:0:(osp_precreate.c:1538:osp_precreate_reserve()) snx11922-OST0001-osc-MDT0001: Sleeping on objects 00000004:00080000:4.0:1681509556.564304:0:96679:0:(osp_precreate.c:1552:osp_precreate_reserve()) snx11922-OST0001-osc-MDT0001: slow creates, last=[0x300000400:0x953923b1:0x0], next=[0x300000400:0x953923b1:0x0], reserved=0, sync_changes=76494, sync_rpcs_in_progress=0, status=-19 00020000:00001000:4.0:1681509556.564310:0:96679:0:(lod_qos.c:415:lod_qos_declare_object_on()) can't declare creation on #1: -110 00020000:00001000:4.0:1681509556.564313:0:96679:0:(lod_qos.c:693:lod_check_and_reserve_ost()) can't declare new object on #1: -110 00000004:00000040:4.0:1681509556.564324:0:96679:0:(mdt_handler.c:6369:mdt_object_free()) object free, fid = [0x2400f42e8:0xc2de:0x0] 00000004:00000040:4.0:1681509556.564342:0:96679:0:(mdt_lib.c:735:mdt_fix_reply()) Shrink to md_size = 0 cookie/acl_size = 0 00000100:00001000:4.0:1681509556.564350:0:96679:0:(import.c:1941:at_measured()) add 248 to 00000000f15f2540 time=111 v=253 (253 153 23 25) 00000100:00000040:4.0:1681509556.564352:0:96679:0:(connection.c:145:ptlrpc_connection_addref()) conn=00000000cd9dc6e2 refcount 3 to 112@gni4 00000100:00000040:4.0:1681509556.564355:0:96679:0:(niobuf.c:58:ptl_send_buf()) peer_id 12345-112@gni4 00000100:00000040:4.0:1681509556.564366:0:96679:0:(lustre_net.h:1945:ptlrpc_connection_put()) PUT conn=00000000cd9dc6e2 refcount 2 to 112@gni4 00000100:00000040:4.0:1681509556.564368:0:96679:0:(lustre_net.h:2404:ptlrpc_rqphase_move()) @@@ move request phase from Interpret to Complete req@00000000d5ffd1e9 x1761995533613248/t0(0) o101->00e2d691-47e0-4194-a67c-1aa5d855ebc6@112@gni4:456/0 lens 984/600 e 8 to 0 dl 1681509561 ref 1 fl Interpret:/0/0 rc 301/301 job:'' 00000100:00100000:4.0:1681509556.564373:0:96679:0:(service.c:2368:ptlrpc_server_handle_request()) Handled RPC req@00000000d5ffd1e9 pname:cluuid+ref:pid:xid:nid:opc:job mdt01_028:00e2d691-47e0-4194-a67c-1aa5d855ebc6+191645:4033:x1761995533613248:12345-112@gni4:101: Request processed in 247828965us (247829037us total) trans 0 rc 301/301 It looks like lod_ost_alloc_rr() logic generates OST idx for allocation always equal to a failover indexes, 0 and 1. |
| Comments |
| Comment by Gerrit Updater [ 15/May/23 ] |
|
"Alexander Boyko <alexander.boyko@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50996 |
| Comment by Gerrit Updater [ 09/Jun/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50996/ |
| Comment by Peter Jones [ 09/Jun/23 ] |
|
Landed for 2.16 |
| Comment by Gerrit Updater [ 15/Sep/23 ] |
|
"Alexander Boyko <alexander.boyko@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52393 |
| Comment by Gerrit Updater [ 28/Sep/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52393/ |