[LU-18487] ll_ost_io03_049: page allocation failure: order:3, mode:0x484020(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.15.6
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      Server and client: 2.15.6-rc1

      One of the server VMs hit the following error. File system operations on the client side (ls/cd) get no response.

      dmesg on the server VM:

      [103494.568312] ll_ost_io03_049: page allocation failure: order:3, mode:0x484020(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
      [103494.568971]  obd_commitrw+0x1b6/0x370 [ptlrpc]
      [103494.573015]  tgt_brw_write+0x1374/0x1cb0 [ptlrpc]
      [103494.574496]  ? flush_work+0x42/0x1d0
      [103494.575567]  ? internal_add_timer+0x42/0x70
      [103494.576764]  ? _cond_resched+0x15/0x30
      [103494.577886]  ? mutex_lock+0xe/0x30
      [103494.578924]  tgt_request_handle+0xccd/0x1a20 [ptlrpc]
      [103494.580373]  ? ptlrpc_nrs_req_get_nolock0+0xff/0x1f0 [ptlrpc]
      [103494.581948]  ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc]
      [103494.583534]  ptlrpc_main+0xbec/0x1530 [ptlrpc]
      [103494.584849]  ? ptlrpc_wait_event+0x590/0x590 [ptlrpc]
      [103494.586306]  kthread+0x134/0x150
      [103494.587276]  ? set_kthread_struct+0x50/0x50
      [103494.588431]  ret_from_fork+0x1f/0x40
      [103494.589452] warn_alloc_show_mem: 9 callbacks suppressed
      [103494.589454] CPU: 12 PID: 527326 Comm: ll_ost_io03_049 Kdump: loaded Tainted: G           OE     -------- -  - 4.18.0-553.27.1.el8_lustre.x86_64 #1
      [103494.589461] Mem-Info:
      [103494.590814] Hardware name: DDN SFA18KXE, BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
      [103494.593745] active_anon:26628 inactive_anon:79658 isolated_anon:0
                       active_file:20525660 inactive_file:15332357 isolated_file:1
                       unevictable:0 dirty:1077 writeback:0
                       slab_reclaimable:1037863 slab_unreclaimable:649098
                       mapped:18143 shmem:145 pagetables:5906 bounce:0
                       free:355951 free_pcp:504 free_cma:0
      [103494.594550] Call Trace:
      [103494.596747] Node 0 active_anon:106512kB inactive_anon:318632kB active_file:82104092kB inactive_file:61329428kB unevictable:0kB isolated(anon):0kB isolated(file):4kB mapped:72572kB dirty:4308kB writeback:0kB shmem:580kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:36384kB pagetables:23624kB all_unreclaimable? no
      [103494.604648]  dump_stack+0x41/0x60
      [103494.605501] Node 0 
      [103494.611172]  warn_alloc.cold.127+0x7b/0x108
      [103494.612174] DMA free:11264kB min:4kB low:16kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
      [103494.612954]  __alloc_pages_slowpath+0xcb2/0xcd0
      [103494.614138] lowmem_reserve[]:
      [103494.618870]  ? __alloc_pages_nodemask+0x166/0x330
      [103494.620186]  0
      [103494.621005]  __alloc_pages_nodemask+0x2e2/0x330
      [103494.622224]  913
      [103494.622937]  kmalloc_order+0x28/0x90
      [103494.624247]  150007
      [103494.624881]  kmalloc_order_trace+0x1d/0xb0
      [103494.626008]  150007
      [103494.626623]  __kmalloc+0x203/0x250
      [103494.627813]  150007
      [103494.628630]  virtqueue_add+0x493/0xc70
      
      [103494.630530]  ? finish_wait+0x80/0x80
      [103494.631671] Node 0 
      [103494.632312]  virtqueue_add_sgs+0x80/0xa0
      [103494.633363] DMA32 free:596392kB min:300kB low:1232kB high:2164kB active_anon:380kB inactive_anon:4kB active_file:12kB inactive_file:100kB unevictable:0kB writepending:0kB present:2080608kB managed:966496kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
      [103494.634050]  __virtscsi_add_cmd+0x148/0x270 [virtio_scsi]
      [103494.635242] lowmem_reserve[]: 0
      [103494.639370]  ? scsi_alloc_sgtables+0x84/0x1c0
      [103494.640900]  0
      [103494.641787]  virtscsi_add_cmd+0x38/0xa0 [virtio_scsi]
      [103494.643086]  149093
      [103494.643847]  virtscsi_queuecommand+0x186/0x2d0 [virtio_scsi]
      [103494.645268]  149093
      [103494.646125]  scsi_queue_rq+0x512/0xb10
      [103494.647693]  149093
      [103494.648548]  __blk_mq_try_issue_directly+0x163/0x200
      
      [103494.650565]  blk_mq_request_issue_directly+0x4e/0xb0
      [103494.651855] Node 0 
      [103494.652620]  blk_mq_try_issue_list_directly+0x62/0x100
      [103494.653645] Normal free:791628kB min:49268kB low:201940kB high:354612kB active_anon:106132kB inactive_anon:318612kB active_file:82124008kB inactive_file:61333432kB unevictable:0kB writepending:4308kB present:155189248kB managed:152680600kB mlocked:0kB bounce:0kB free_pcp:1968kB local_pcp:4kB free_cma:0kB
      [103494.654481]  blk_mq_sched_insert_requests+0xa4/0xf0
      [103494.655525] lowmem_reserve[]:
      [103494.661555]  blk_mq_flush_plug_list+0x135/0x220
      [103494.662784]  0
      [103494.663652]  blk_flush_plug_list+0xd7/0x100
      [103494.664946]  0
      [103494.665661]  blk_finish_plug+0x25/0x36
      [103494.666908]  0
      [103494.667536]  osd_do_bio.constprop.49+0xd86/0xeb0 [osd_ldiskfs]
      [103494.668645]  0
      [103494.669357]  ? ldiskfs_map_blocks+0x607/0x610 [ldiskfs]
      [103494.670845]  0
      [103494.671568]  ? osd_ldiskfs_map_inode_pages+0x8c0/0x930 [osd_ldiskfs]
      
      [103494.673707]  osd_ldiskfs_map_inode_pages+0x8c0/0x930 [osd_ldiskfs]
      [103494.675277] Node 0 
      [103494.675954]  osd_write_commit+0x5e2/0x990 [osd_ldiskfs]
      [103494.677481] DMA: 
      [103494.678198]  ofd_commitrw_write+0x77e/0x1ad0 [ofd]
      [103494.679523] 0*4kB 
      [103494.680255]  ofd_commitrw+0x5b4/0xd20 [ofd]
      [103494.681504] 0*8kB 
      [103494.682154]  ? obd_commitrw+0x1b6/0x370 [ptlrpc]
      [103494.683263] 0*16kB 
      [103494.684077]  obd_commitrw+0x1b6/0x370 [ptlrpc]
      [103494.685340] 0*32kB 
      [103494.686141]  tgt_brw_write+0x1374/0x1cb0 [ptlrpc]
      [103494.687254] 0*64kB 
      [103494.688000]  ? newidle_balance+0x2b6/0x3b0
      [103494.689268] 0*128kB 
      [103494.690085]  ? internal_add_timer+0x42/0x70
      [103494.691203] 0*256kB 
      [103494.691952]  ? _cond_resched+0x15/0x30
      [103494.693140] 0*512kB 
      [103494.693836]  ? mutex_lock+0xe/0x30
      [103494.694927] 1*1024kB 
      [103494.695689]  tgt_request_handle+0xccd/0x1a20 [ptlrpc]
      [103494.696614] (U) 
      [103494.697398]  ? ptlrpc_nrs_req_get_nolock0+0xff/0x1f0 [ptlrpc]
      [103494.698750] 1*2048kB 
      [103494.699398]  ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc]
      [103494.700870] (M) 
      [103494.701709]  ? finish_wait+0x80/0x80
      [103494.703192] 2*4096kB 
      [103494.703760]  ptlrpc_main+0xbec/0x1530 [ptlrpc]
      [103494.704514] (M) 
      [103494.705099]  ? ptlrpc_wait_event+0x590/0x590 [ptlrpc]
      [103494.705966] = 11264kB
      [103494.706497]  kthread+0x134/0x150
      [103494.707457] Node 0 
      [103494.708035]  ? set_kthread_struct+0x50/0x50
      [103494.708730] DMA32: 
      [103494.709417]  ret_from_fork+0x1f/0x40
      [103494.710522] 34*4kB (UM) 36*8kB (UMH) 84*16kB (UMEH) 35*32kB (MEH) 26*64kB (UMH) 36*128kB (UMH) 36*256kB (UM) 35*512kB (UMEH) 31*1024kB (UM) 34*2048kB (UM) 112*4096kB (UM) = 596424kB
      [103494.716318] Node 0 Normal: 146273*4kB (UME) 21305*8kB (UME) 2810*16kB (UMEH) 1*32kB (H) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 800524kB
      [103494.719426] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
      [103494.721471] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
      [103494.723325] 35857993 total pagecache pages
      [103494.724499] 1252 pages in swap cache
      [103494.725624] Swap cache stats: add 1899282, delete 1898069, find 114983/139707
      [103494.727326] Free swap  = 4434568kB
      [103494.728247] Total swap = 11075580kB
      [103494.729078] 39321462 pages RAM
      [103494.729880] 0 pages HighMem/MovableOnly
      [103494.731082] 905848 pages reserved
      [103494.732085] 0 pages hwpoisoned
      [126130.353319] mlx5_core 0000:03:00.0: temp_warn:173:(pid 0): High temperature on sensors with bit set 1 8000000000000000
      [126130.367840] mlx5_core 0000:04:00.0: temp_warn:173:(pid 0): High temperature on sensors with bit set 1 8000000000000000
      [126274.777416] LustreError: 185190:0:(ldlm_lockd.c:261:expired_lock_main()) ### lock callback timer expired after 82s: evicting client at 172.25.80.23@tcp  ns: mdt-sfa18k03-MDT0001_UUID lock: 00000000fb44c62b/0xcf4e9953b3f4d8c1 lrc: 3/0,0 mode: PW/PW res: [0x28003bd3e:0x19a11:0x0].0x0 bits 0x41/0x0 rrc: 6 type: IBT gid 0 flags: 0x60200400000020 nid: 172.25.80.23@tcp remote: 0x5876d76d8d3e86a3 expref: 2566 pid: 531728 timeout: 126263 lvb_type: 0
      [126274.787334] LustreError: 185190:0:(client.c:1256:ptlrpc_import_delay_req()) @@@ IMP_CLOSED  req@00000000ef2e8167 x1816365546510848/t0(0) o104->sfa18k03-MDT0001@172.25.80.23@tcp:15/16 lens 328/224 e 0 to 0 dl 0 ref 1 fl Rpc:QU/0/ffffffff rc 0/-1 job:''
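
A rough way to read the free-list summary in the dump above: an order:3 request needs 2^3 contiguous 4kB pages, i.e. one 32kB block. The sketch below parses the "Node 0 Normal:" line copied from the log and counts the free blocks of at least that size. The helper names (`parse_free_list`, `blocks_at_or_above`) are made up for illustration and are not part of Lustre or the kernel. The reading of the type letters (U unmovable, M movable, E reclaimable, H highatomic) follows the kernel's show_free_areas output. The Normal zone reports free:791628kB against min:49268kB, so the numbers look more like fragmentation than an overall shortage, though this is an interpretation and not a confirmed root cause.

```python
import re

PAGE_KB = 4  # base page size implied by the "*4kB" entries in the log

# Copied verbatim from the "Node 0 Normal:" line in the dmesg output.
NORMAL_ZONE = (
    "146273*4kB (UME) 21305*8kB (UME) 2810*16kB (UMEH) 1*32kB (H) "
    "0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 800524kB"
)


def parse_free_list(text):
    """Return ({block_size_kb: (count, type_letters)}, reported_total_kb)."""
    blocks = {}
    # Each entry looks like "2810*16kB (UMEH)"; the type suffix is optional.
    for count, size_kb, types in re.findall(r"(\d+)\*(\d+)kB(?: \((\w+)\))?", text):
        blocks[int(size_kb)] = (int(count), types)
    reported_total = int(re.search(r"= (\d+)kB", text).group(1))
    return blocks, reported_total


def blocks_at_or_above(blocks, order):
    """Count free blocks large enough to satisfy an allocation of this order."""
    need_kb = PAGE_KB << order
    return sum(count for size, (count, _) in blocks.items() if size >= need_kb)


blocks, reported_total = parse_free_list(NORMAL_ZONE)
computed_total = sum(size * count for size, (count, _) in blocks.items())
order3_kb = PAGE_KB << 3
usable = blocks_at_or_above(blocks, 3)

print(f"order-3 request size: {order3_kb}kB")
print(f"free memory in zone:  {computed_total}kB (log reports {reported_total}kB)")
print(f"free blocks >= {order3_kb}kB: {usable}, types: {blocks[order3_kb][1]!r}")
```

In this zone almost all free memory sits in 4kB to 16kB blocks. The single 32kB block is tagged H only, which is presumably the highatomic reserve.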
      
      

          Activity

            adilger Andreas Dilger added a comment - This looks similar to the allocation failures in virtio scsi in LU-18225.
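
To check whether a server is in a similar state before the failure shows up in dmesg, the per-order free block counts can be read from /proc/buddyinfo. The sketch below is only an assumption about how one might monitor this; the function name `high_order_free` is hypothetical. The sample text is reconstructed from the counts in the dmesg dump in this ticket and is not actual /proc/buddyinfo output captured from the affected host.

```python
def high_order_free(buddyinfo_text, min_order=3):
    """Map "node/zone" to the number of free blocks of order >= min_order.

    Expects /proc/buddyinfo format: "Node N, zone NAME c0 c1 ... c10",
    where cK is the number of free blocks of order K.
    """
    result = {}
    for line in buddyinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue  # skip anything that is not a zone line
        node = fields[1].rstrip(",")
        zone = fields[3]
        counts = [int(c) for c in fields[4:]]
        result[f"{node}/{zone}"] = sum(counts[min_order:])
    return result


# Reconstructed from the free-list counts in the dmesg dump (illustrative only).
SAMPLE = """\
Node 0, zone      DMA      0      0      0      0      0      0      0      0      1      1      2
Node 0, zone    DMA32     34     36     84     35     26     36     36     35     31     34    112
Node 0, zone   Normal 146273  21305   2810      1      0      0      0      0      0      0      0
"""

summary = high_order_free(SAMPLE)
for zone, count in summary.items():
    print(f"{zone}: {count} free blocks of order >= 3")

# On a live server the same function could be fed the real file:
#   with open("/proc/buddyinfo") as f:
#       print(high_order_free(f.read()))
```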

            People

              Assignee: WC Triage (wc-triage)
              Reporter: Sarah Liu (sarah)
              Votes: 0
              Watchers: 4