Details
- Bug
- Resolution: Unresolved
- Minor
- None
- Lustre 2.15.6
- None
- 3
- 9223372036854775807
Description
Server and client: 2.15.6-rc1
On one of the server VMs, we hit the following error. Operations on the filesystem from the client side (ls/cd) get no response.
dmesg on the server VM:
[103494.568312] ll_ost_io03_049: page allocation failure: order:3, mode:0x484020(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
[103494.568971] obd_commitrw+0x1b6/0x370 [ptlrpc]
[103494.573015] tgt_brw_write+0x1374/0x1cb0 [ptlrpc]
[103494.574496] ? flush_work+0x42/0x1d0
[103494.575567] ? internal_add_timer+0x42/0x70
[103494.576764] ? _cond_resched+0x15/0x30
[103494.577886] ? mutex_lock+0xe/0x30
[103494.578924] tgt_request_handle+0xccd/0x1a20 [ptlrpc]
[103494.580373] ? ptlrpc_nrs_req_get_nolock0+0xff/0x1f0 [ptlrpc]
[103494.581948] ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc]
[103494.583534] ptlrpc_main+0xbec/0x1530 [ptlrpc]
[103494.584849] ? ptlrpc_wait_event+0x590/0x590 [ptlrpc]
[103494.586306] kthread+0x134/0x150
[103494.587276] ? set_kthread_struct+0x50/0x50
[103494.588431] ret_from_fork+0x1f/0x40
[103494.589452] warn_alloc_show_mem: 9 callbacks suppressed
[103494.589454] CPU: 12 PID: 527326 Comm: ll_ost_io03_049 Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.27.1.el8_lustre.x86_64 #1
[103494.589461] Mem-Info:
[103494.590814] Hardware name: DDN SFA18KXE, BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[103494.593745] active_anon:26628 inactive_anon:79658 isolated_anon:0 active_file:20525660 inactive_file:15332357 isolated_file:1 unevictable:0 dirty:1077 writeback:0 slab_reclaimable:1037863 slab_unreclaimable:649098 mapped:18143 shmem:145 pagetables:5906 bounce:0 free:355951 free_pcp:504 free_cma:0
[103494.594550] Call Trace:
[103494.596747] Node 0 active_anon:106512kB inactive_anon:318632kB active_file:82104092kB inactive_file:61329428kB unevictable:0kB isolated(anon):0kB isolated(file):4kB mapped:72572kB dirty:4308kB writeback:0kB shmem:580kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:36384kB pagetables:23624kB all_unreclaimable? no
[103494.604648] dump_stack+0x41/0x60
[103494.605501] Node 0
[103494.611172] warn_alloc.cold.127+0x7b/0x108
[103494.612174] DMA free:11264kB min:4kB low:16kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[103494.612954] __alloc_pages_slowpath+0xcb2/0xcd0
[103494.614138] lowmem_reserve[]:
[103494.618870] ? __alloc_pages_nodemask+0x166/0x330
[103494.620186] 0
[103494.621005] __alloc_pages_nodemask+0x2e2/0x330
[103494.622224] 913
[103494.622937] kmalloc_order+0x28/0x90
[103494.624247] 150007
[103494.624881] kmalloc_order_trace+0x1d/0xb0
[103494.626008] 150007
[103494.626623] __kmalloc+0x203/0x250
[103494.627813] 150007
[103494.628630] virtqueue_add+0x493/0xc70
[103494.630530] ? finish_wait+0x80/0x80
[103494.631671] Node 0
[103494.632312] virtqueue_add_sgs+0x80/0xa0
[103494.633363] DMA32 free:596392kB min:300kB low:1232kB high:2164kB active_anon:380kB inactive_anon:4kB active_file:12kB inactive_file:100kB unevictable:0kB writepending:0kB present:2080608kB managed:966496kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[103494.634050] __virtscsi_add_cmd+0x148/0x270 [virtio_scsi]
[103494.635242] lowmem_reserve[]: 0
[103494.639370] ? scsi_alloc_sgtables+0x84/0x1c0
[103494.640900] 0
[103494.641787] virtscsi_add_cmd+0x38/0xa0 [virtio_scsi]
[103494.643086] 149093
[103494.643847] virtscsi_queuecommand+0x186/0x2d0 [virtio_scsi]
[103494.645268] 149093
[103494.646125] scsi_queue_rq+0x512/0xb10
[103494.647693] 149093
[103494.648548] __blk_mq_try_issue_directly+0x163/0x200
[103494.650565] blk_mq_request_issue_directly+0x4e/0xb0
[103494.651855] Node 0
[103494.652620] blk_mq_try_issue_list_directly+0x62/0x100
[103494.653645] Normal free:791628kB min:49268kB low:201940kB high:354612kB active_anon:106132kB inactive_anon:318612kB active_file:82124008kB inactive_file:61333432kB unevictable:0kB writepending:4308kB present:155189248kB managed:152680600kB mlocked:0kB bounce:0kB free_pcp:1968kB local_pcp:4kB free_cma:0kB
[103494.654481] blk_mq_sched_insert_requests+0xa4/0xf0
[103494.655525] lowmem_reserve[]:
[103494.661555] blk_mq_flush_plug_list+0x135/0x220
[103494.662784] 0
[103494.663652] blk_flush_plug_list+0xd7/0x100
[103494.664946] 0
[103494.665661] blk_finish_plug+0x25/0x36
[103494.666908] 0
[103494.667536] osd_do_bio.constprop.49+0xd86/0xeb0 [osd_ldiskfs]
[103494.668645] 0
[103494.669357] ? ldiskfs_map_blocks+0x607/0x610 [ldiskfs]
[103494.670845] 0
[103494.671568] ? osd_ldiskfs_map_inode_pages+0x8c0/0x930 [osd_ldiskfs]
[103494.673707] osd_ldiskfs_map_inode_pages+0x8c0/0x930 [osd_ldiskfs]
[103494.675277] Node 0
[103494.675954] osd_write_commit+0x5e2/0x990 [osd_ldiskfs]
[103494.677481] DMA:
[103494.678198] ofd_commitrw_write+0x77e/0x1ad0 [ofd]
[103494.679523] 0*4kB
[103494.680255] ofd_commitrw+0x5b4/0xd20 [ofd]
[103494.681504] 0*8kB
[103494.682154] ? obd_commitrw+0x1b6/0x370 [ptlrpc]
[103494.683263] 0*16kB
[103494.684077] obd_commitrw+0x1b6/0x370 [ptlrpc]
[103494.685340] 0*32kB
[103494.686141] tgt_brw_write+0x1374/0x1cb0 [ptlrpc]
[103494.687254] 0*64kB
[103494.688000] ? newidle_balance+0x2b6/0x3b0
[103494.689268] 0*128kB
[103494.690085] ? internal_add_timer+0x42/0x70
[103494.691203] 0*256kB
[103494.691952] ? _cond_resched+0x15/0x30
[103494.693140] 0*512kB
[103494.693836] ? mutex_lock+0xe/0x30
[103494.694927] 1*1024kB
[103494.695689] tgt_request_handle+0xccd/0x1a20 [ptlrpc]
[103494.696614] (U)
[103494.697398] ? ptlrpc_nrs_req_get_nolock0+0xff/0x1f0 [ptlrpc]
[103494.698750] 1*2048kB
[103494.699398] ptlrpc_server_handle_request+0x323/0xbe0 [ptlrpc]
[103494.700870] (M)
[103494.701709] ? finish_wait+0x80/0x80
[103494.703192] 2*4096kB
[103494.703760] ptlrpc_main+0xbec/0x1530 [ptlrpc]
[103494.704514] (M)
[103494.705099] ? ptlrpc_wait_event+0x590/0x590 [ptlrpc]
[103494.705966] = 11264kB
[103494.706497] kthread+0x134/0x150
[103494.707457] Node 0
[103494.708035] ? set_kthread_struct+0x50/0x50
[103494.708730] DMA32:
[103494.709417] ret_from_fork+0x1f/0x40
[103494.710522] 34*4kB (UM) 36*8kB (UMH) 84*16kB (UMEH) 35*32kB (MEH) 26*64kB (UMH) 36*128kB (UMH) 36*256kB (UM) 35*512kB (UMEH) 31*1024kB (UM) 34*2048kB (UM) 112*4096kB (UM) = 596424kB
[103494.716318] Node 0 Normal: 146273*4kB (UME) 21305*8kB (UME) 2810*16kB (UMEH) 1*32kB (H) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 800524kB
[103494.719426] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[103494.721471] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[103494.723325] 35857993 total pagecache pages
[103494.724499] 1252 pages in swap cache
[103494.725624] Swap cache stats: add 1899282, delete 1898069, find 114983/139707
[103494.727326] Free swap = 4434568kB
[103494.728247] Total swap = 11075580kB
[103494.729078] 39321462 pages RAM
[103494.729880] 0 pages HighMem/MovableOnly
[103494.731082] 905848 pages reserved
[103494.732085] 0 pages hwpoisoned
[126130.353319] mlx5_core 0000:03:00.0: temp_warn:173:(pid 0): High temperature on sensors with bit set 1 8000000000000000
[126130.367840] mlx5_core 0000:04:00.0: temp_warn:173:(pid 0): High temperature on sensors with bit set 1 8000000000000000
[126274.777416] LustreError: 185190:0:(ldlm_lockd.c:261:expired_lock_main()) ### lock callback timer expired after 82s: evicting client at 172.25.80.23@tcp ns: mdt-sfa18k03-MDT0001_UUID lock: 00000000fb44c62b/0xcf4e9953b3f4d8c1 lrc: 3/0,0 mode: PW/PW res: [0x28003bd3e:0x19a11:0x0].0x0 bits 0x41/0x0 rrc: 6 type: IBT gid 0 flags: 0x60200400000020 nid: 172.25.80.23@tcp remote: 0x5876d76d8d3e86a3 expref: 2566 pid: 531728 timeout: 126263 lvb_type: 0
[126274.787334] LustreError: 185190:0:(client.c:1256:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@00000000ef2e8167 x1816365546510848/t0(0) o104->sfa18k03-MDT0001@172.25.80.23@tcp:15/16 lens 328/224 e 0 to 0 dl 0 ref 1 fl Rpc:QU/0/ffffffff rc 0/-1 job:''
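A rough way to read the free-list summary in the dmesg output: the failed request is order:3, i.e. 2^3 contiguous pages (32 kB, assuming 4 kB pages), and GFP_ATOMIC allocations cannot sleep to wait for reclaim or compaction. The sketch below is only an illustration, not part of any Lustre or kernel tooling. It takes the "Node 0 Normal:" line copied from the log and adds up how much free memory sits in blocks large enough for an order-3 request. The helper names (free_blocks, free_kb_at_or_above) are made up for this example.

```python
import re

# Buddy-allocator summary copied from the "Node 0 Normal:" entry in the dmesg output.
NORMAL_ZONE = (
    "146273*4kB (UME) 21305*8kB (UME) 2810*16kB (UMEH) 1*32kB (H) "
    "0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 800524kB"
)

PAGE_KB = 4  # assumption: 4 kB pages (the x86_64 default)


def free_blocks(line):
    """Parse 'count*sizekB' pairs into {block_size_kB: count}."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def free_kb_at_or_above(blocks, order):
    """Total free kB held in blocks of at least 2**order pages."""
    min_kb = PAGE_KB << order
    return sum(size * count for size, count in blocks.items() if size >= min_kb)


blocks = free_blocks(NORMAL_ZONE)
total_kb = free_kb_at_or_above(blocks, 0)   # every free block counts
order3_kb = free_kb_at_or_above(blocks, 3)  # only blocks of 32 kB or larger
print(f"total free: {total_kb} kB, in order>=3 blocks: {order3_kb} kB")
```

For this log line the total matches the 800524kB printed by the kernel, while only a single 32 kB block (flagged H) is at or above order 3. That is consistent with the Normal zone being heavily fragmented at the time of the failure, although the summary was printed slightly after the failed allocation and may not reflect the exact state the allocator saw.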
Issue Links
- is related to LU-18225 LDISKFS-fs: initial error at time 1724092382: ldiskfs_generic_delete_entry (Reopened)
This looks similar to the allocation failures in virtio scsi in LU-18225.