
LDISKFS-fs: initial error at time 1724092382: ldiskfs_generic_delete_entry

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.16.0
    • Severity: 3

    Description

      Loading the build from https://build.whamcloud.com/job/lustre-reviews/107525/ on both the servers and clients of WR-SOAK, with no HA enabled, soak hit the following issue:

      [63239.265986] Lustre: Lustre: Build Version: 2.15.90_24_ge863f95f

      Both VMs from the 2nd controller rebooted; see the following errors in the console log.

      vcontroller vm0

      [63664.348413] Lustre: sfa18k03-OST0025: new connection from sfa18k03-MDT0003-mdtlov (cleaning up unused objects from 0x340000401:20963886 to 0x340000401:20968833)
      [63664.813575] Lustre: sfa18k03-OST0026: new connection from sfa18k03-MDT0003-mdtlov (cleaning up unused objects from 0xc40000400:20942873 to 0xc40000400:20948833)
      [63664.817519] Lustre: sfa18k03-OST0024: new connection from sfa18k03-MDT0003-mdtlov (cleaning up unused objects from 0x5c0000400:20671906 to 0x5c0000400:20675073)
      [63670.858601] Lustre: sfa18k03-OST0024: new connection from sfa18k03-MDT0001-mdtlov (cleaning up unused objects from 0x5c0000401:19996851 to 0x5c0000401:19999185)
      [63670.878393] Lustre: sfa18k03-OST001c: new connection from sfa18k03-MDT0001-mdtlov (cleaning up unused objects from 0x780000400:19109574 to 0x780000400:19111665)
      [63705.015859] Lustre: sfa18k03-OST0029-osc-MDT0001: Connection restored to 172.25.80.53@tcp (at 172.25.80.53@tcp)
      [63705.019483] Lustre: Skipped 14 previous similar messages
      [63711.321154] LustreError: sfa18k03-OST0031-osc-MDT0001: operation ost_connect to node 172.25.80.53@tcp failed: rc = -19
      [63711.323272] LustreError: Skipped 10 previous similar messages
      [63936.604947] LDISKFS-fs (sdao): error count since last fsck: 38
      [63936.604946] LDISKFS-fs (sdap): error count since last fsck: 2
      [63936.604966] LDISKFS-fs (sdap): initial error at time 1723874458: ldiskfs_find_dest_de:2297
      [63936.609560] LDISKFS-fs (sdao): initial error at time 1723660739: ldiskfs_find_dest_de:2297
      [63936.610905] : inode 182190082
      [63936.612597] : inode 73138214
      [63936.614252] : block 5830242584
      [63936.615147] : block 60217102
      [63936.615989] 
      [63936.616849] 
      [63936.617671] LDISKFS-fs (sdap): last error at time 1723874458: ldiskfs_evict_inode:257
      [63936.618286] LDISKFS-fs (sdao): last error at time 1723878231: ldiskfs_evict_inode:257
      [63936.618893] 
      [63936.620394] 
      [75516.204047] LustreError: sfa18k03-OST002c: not available for connect from 172.25.80.50@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
      [75516.206751] LustreError: sfa18k03-OST002b-osc-MDT0001: operation ost_statfs to node 172.25.80.53@tcp failed: rc = -107
      [75516.206773] Lustre: sfa18k03-MDT0003-osp-MDT0001: Connection to sfa18k03-MDT0003 (at 172.25.80.53@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [75516.206775] Lustre: Skipped 1 previous similar message
      [75516.209189] LustreError: Skipped 238 previous similar messages
      [75516.211099] LustreError: Skipped 10 previous similar messages
      [  OK  ] Stopped target resource-agents dependencies.
               Stopping Restore /run/initramfs on shutdown...
               Stopping LVM event activation on device 8:0...
               Stopping LVM event activation on device 8:48...
               Stopping LVM event activation on device 252:3...
      [  OK  ] Stopped target rpc_pipefs.target.
               Unmounting RPC Pipe File System...
               Stopping LVM event activation on device 8:32...
               Stopping LVM event activation on device 252:2...
               Stopping LVM event activation on device 8:64...
               Stopping Hostname Service...
      [  OK  ] Stopped t[  OK  ] Stopped irqbalance daemon.
      [  OK  ] Stopped Self Monitoring and Reporting Technology (SMART) Daemon.
      [  OK  ] Stopped Prometheus exporter for Lustre filesystem.
      [  OK  ] Stopped Hostname Service.
      
      

      Attachments

        1. debugfs-bd
          16 kB
        2. debugfs-c-R-1014
          15 kB
        3. debugfs-Rbd
          16 kB
        4. debugfs-stat
          15 kB
        5. e2fsck-fn-1015
          3 kB
        6. e2fsck-output
          12 kB
        7. messages-20240923.gz
          1.11 MB
        8. sdao-dump
          2 kB

        Issue Links

          Activity

            [LU-18225] LDISKFS-fs: initial error at time 1724092382: ldiskfs_generic_delete_entry
            sarah Sarah Liu added a comment -

            sure, I will load the patch and update the ticket if it happens again

            dongyang Dongyang Li added a comment -

             Looks like the block content has already changed since Sep 24; the name_len is now 6:

            4120  1ef4 7401 1000 0601 6631 6239 6630 0000  ..t.....f1b9f0..
            

            I guess after the e2fsck the corrupted data is all gone.
             I just made a debug patch, which should give us more info when the error is triggered again. Could you give that a try? Thanks.
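
             For reference, the bytes above decode as a standard ldiskfs/ext4 directory entry. Below is a minimal sketch of the on-disk layout (the struct mirrors the kernel's ext4_dir_entry_2; the decoded values in the comments are read straight from the dump above):

             /* Sketch only: same field layout as struct ext4_dir_entry_2. */
             #include <stdint.h>

             struct ldiskfs_dir_entry_2 {
                     uint32_t inode;      /* le32: 1e f4 74 01 -> 0x0174f41e = 24441886 */
                     uint16_t rec_len;    /* le16: 10 00       -> 16 bytes              */
                     uint8_t  name_len;   /* 06 -> 6                                    */
                     uint8_t  file_type;  /* 01 -> regular file, upper 4 bits clear     */
                     char     name[];     /* 66 31 62 39 66 30 -> "f1b9f0"              */
             };

             The inode field is still the 24441886 reported in the original console error; only the name has changed (name_len 3 -> 6), consistent with the corrupted entry having been rewritten since.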


            "Li Dongyang <dongyangli@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56706
            Subject: LU-18225 ldiskfs: add ext4-de-debug.patch
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 738d9c83967e75370ae333a1b43cac5d7536f74f

            sarah Sarah Liu added a comment - - edited

             I can collect the info. The server VM is 10.25.80.52.

             Please check debugfs-Rbd and e2fsck-fn-1015.

            dongyang Dongyang Li added a comment -

             And this is the same inode, with the same block 2349208195 and the same offset 2128.
             Could we dump the block with

            debugfs -R "bd 2349208195" /dev/sdao
            

             and the output of a read-only e2fsck (sdao needs to be unmounted first):

            e2fsck -fn -m [num of cpus] /dev/sdao
            

             And if it's easier, could you point me to the server so I can log in and collect/debug there?

            sarah Sarah Liu added a comment -

             I hit the same ldiskfs error on the same node, the same sdao device and inode, on Sep 24:

            [root@snp-18k-03-v3 log]# less messages|grep "LDISKFS-fs er"
            Sep 24 13:30:57 snp-18k-03-v3 kernel: LDISKFS-fs error (device sdao): ldiskfs_find_dest_de:2295: inode #73400343: block 2349208195: comm ll_ost03_040: bad entry in directory: rec_len is too small for name_len - offset=2128, inode=24441886, rec_len=16, name_len=3, size=4096
            

             I uploaded the output of "debugfs -c -R 'stat <73400343>' /dev/sdao". I have not run e2fsck yet; please let me know if any other information is needed.


            "Li Dongyang <dongyangli@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56644
            Subject: LU-18225 kernel: silent page allocation failure in virtio_scsi
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 01cfbb23defae93dbdeef344ce6d9687410d8b4c


            If the allocation failure is handled transparently at a higher level, then using __GFP_NOWARN to quiet this failure is fine. Possibly that should also go upstream, or is there already a patch fixing this code that could be backported?

            It looks like the GFP flag is passed down from the caller through a few layers, so it is a bit tricky to know where is the right place to add __GFP_NOWARN. Ideally, this would be done at the same layer that is doing the allocation fallback, since it knows that the allocation failure is not harmful.
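
             As a rough illustration of that last point (a sketch only, not the actual LU-18225 patch; the function name and the call are simplified from drivers/virtio/virtio_ring.c): the order-3 allocation in the trace is the indirect descriptor table in virtqueue_add(), and the caller can already fall back to direct ring descriptors when it returns NULL, so the warning could be silenced right at that allocation site:

             #include <linux/slab.h>
             #include <linux/virtio_ring.h>

             /* Hypothetical sketch: an allocation failure here is handled by falling
              * back to direct descriptors, so it is not worth a dmesg splat. */
             static struct vring_desc *alloc_indirect(unsigned int total_sg, gfp_t gfp)
             {
                     struct vring_desc *desc;

                     desc = kmalloc_array(total_sg, sizeof(*desc), gfp | __GFP_NOWARN);
                     if (!desc)
                             return NULL;    /* caller falls back to direct descriptors */
                     return desc;
             }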

            dongyang Dongyang Li added a comment -

             Yes, the allocation failure is a separate problem that needs looking into.

            sarah Sarah Liu added a comment -

             Hi dongyang, no, there was no ldiskfs error on vm1. After I ran e2fsck and restarted the test, there was no ldiskfs error on any server VM when it timed out. Sorry, I don't recall the output of e2fsck.

             I rebooted all the server VMs since the fs was not responding (ls hung). It ran overnight and testing seems to be ongoing, but I found page allocation errors on all server VMs; dmesg looks like the following.
             Should I open a new ticket for this?

            [16247.823902] Lustre: 68562:0:(mdt_recovery.c:148:mdt_req_from_lrd()) @@@ restoring transno  req@000000001dd58399 x1811746499175296/t523986305680(0) o101->7ec4b321-fc8d-49b2-9fc1-4ae12b6cd4c0@172.25.80.33@tcp:657/0 lens 376/1784 e 0 to 0 dl 1728427442 ref 1 fl Interpret:H/202/0 rc 0/0 job:'' uid:0 gid:0
            [22473.618001] Lustre: 127072:0:(mdt_recovery.c:148:mdt_req_from_lrd()) @@@ restoring transno  req@00000000abb6e3d3 x1811759166812672/t523989095606(0) o101->ce0abb5d-3cd7-43eb-bb94-be1985fd7ad4@172.25.80.22@tcp:63/0 lens 376/1864 e 0 to 0 dl 1728433643 ref 1 fl Interpret:H/202/0 rc 0/0 job:'' uid:0 gid:0
            [22474.401775] Lustre: 129905:0:(mdt_recovery.c:148:mdt_req_from_lrd()) @@@ restoring transno  req@000000007252b13d x1811759170103296/t523989095619(0) o101->ce0abb5d-3cd7-43eb-bb94-be1985fd7ad4@172.25.80.22@tcp:64/0 lens 376/1864 e 0 to 0 dl 1728433644 ref 1 fl Interpret:H/202/0 rc 0/0 job:'' uid:0 gid:0
            [22581.834967] ll_ost_io02_039: page allocation failure: order:3, mode:0x484020(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
            [22581.838485] CPU: 8 PID: 127347 Comm: ll_ost_io02_039 Kdump: loaded Tainted: G           OE     -------- -  - 4.18.0-553.16.1.el8_lustre.x86_64 #1
            [22581.841877] Hardware name: DDN SFA18KXE, BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
            [22581.844339] Call Trace:
            [22581.845495]  dump_stack+0x41/0x60
            [22581.846780]  warn_alloc.cold.127+0x7b/0x108
            [22581.848191]  __alloc_pages_slowpath+0xcb2/0xcd0
            [22581.849511]  __alloc_pages_nodemask+0x2e2/0x330
            [22581.850575]  kmalloc_order+0x28/0x90
            [22581.851486]  kmalloc_order_trace+0x1d/0xb0
            [22581.852635]  __kmalloc+0x25c/0x2d0
            [22581.853670]  virtqueue_add+0x493/0xc70
            [22581.854731]  ? finish_wait+0x80/0x80
            [22581.855904]  virtqueue_add_sgs+0x80/0xa0
            [22581.857186]  __virtscsi_add_cmd+0x148/0x270 [virtio_scsi]
            [22581.858910]  ? scsi_alloc_sgtables+0x84/0x1c0
            [22581.860249]  virtscsi_add_cmd+0x38/0xa0 [virtio_scsi]
            [22581.861689]  virtscsi_queuecommand+0x186/0x2d0 [virtio_scsi]
            [22581.863242]  scsi_queue_rq+0x512/0xb10
            [22581.864494]  __blk_mq_try_issue_directly+0x163/0x200
            [22581.866006]  blk_mq_request_issue_directly+0x4e/0xb0
            [22581.867558]  blk_mq_try_issue_list_directly+0x62/0x100
            [22581.869078]  blk_mq_sched_insert_requests+0xa4/0xf0
            [22581.870562]  blk_mq_flush_plug_list+0x135/0x220
            [22581.871979]  blk_flush_plug_list+0xd7/0x100
            [22581.873287]  blk_finish_plug+0x25/0x36
            [22581.874491]  osd_do_bio+0x7f1/0xc70 [osd_ldiskfs]
            [22581.876012]  ? ldiskfs_map_blocks+0x607/0x610 [ldiskfs]
            [22581.877596]  ? osd_ldiskfs_map_inode_pages+0x2c2/0xb20 [osd_ldiskfs]
            [22581.879360]  osd_ldiskfs_map_inode_pages+0x2c2/0xb20 [osd_ldiskfs]
            [22581.881043]  osd_write_commit+0x4c6/0x990 [osd_ldiskfs]
            [22581.882571]  ofd_commitrw_write+0x9df/0x1b00 [ofd]
            [22581.884021]  ofd_commitrw+0x5dc/0xd80 [ofd]
            [22581.885272]  ? obd_commitrw+0x1b6/0x370 [ptlrpc]
            [22581.887103]  obd_commitrw+0x1b6/0x370 [ptlrpc]
            [22581.888655]  tgt_brw_write+0xf54/0x1f90 [ptlrpc]
            [22581.890213]  ? _cond_resched+0x15/0x30
            [22581.891400]  ? flush_work+0x42/0x1d0
            [22581.892500]  ? internal_add_timer+0x42/0x70
            [22581.893720]  ? _cond_resched+0x15/0x30
            [22581.894884]  ? mutex_lock+0xe/0x30
            [22581.895949]  tgt_request_handle+0x3f4/0x1a30 [ptlrpc]
            [22581.897495]  ? ptlrpc_update_export_timer+0x3d/0x500 [ptlrpc]
            [22581.899225]  ptlrpc_server_handle_request+0x2aa/0xcf0 [ptlrpc]
            [22581.900939]  ? lprocfs_counter_add+0x10e/0x180 [obdclass]
            [22581.902509]  ptlrpc_main+0xc9e/0x15c0 [ptlrpc]
            [22581.903979]  ? ptlrpc_wait_event+0x5b0/0x5b0 [ptlrpc]
            [22581.904744] ll_ost_io01_034: page allocation failure: order:3, mode:0x484020(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
            [22581.905528]  kthread+0x134/0x150
            [22581.907356] ll_ost_io03_029: page allocation failure: order:3, mode:0x484020(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
            [22581.909345] ll_ost_io01_064: page allocation failure: order:3, mode:0x484020(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
            [22581.911672]  ? set_kthread_struct+0x50/0x50
            [22581.912135] ll_ost_io02_073: page allocation failure: order:3, mode:0x484020(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
            [22581.917402]  ret_from_fork+0x1f/0x40
            [22581.918250] Mem-Info:
            [22581.918251] CPU: 12 PID: 127683 Comm: ll_ost_io03_029 Kdump: loaded Tainted: G           OE     -------- -  - 4.18.0-553.16.1.el8_lustre.x86_64 #1
            [22581.918886] active_anon:9586 inactive_anon:235852 isolated_anon:0
                            active_file:26214948 inactive_file:9997372 isolated_file:34
                            unevictable:0 dirty:172 writeback:0
                            slab_reclaimable:462546 slab_unreclaimable:544066
                            mapped:29737 shmem:197 pagetables:3250 bounce:0
                            free:558197 free_pcp:4297 free_cma:0
            [22581.921880] Hardware name: DDN SFA18KXE, BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
            [22581.928597] Node 0 active_anon:38344kB inactive_anon:943408kB active_file:104869260kB inactive_file:39989456kB unevictable:0kB isolated(anon):0kB isolated(file):620kB mapped:118948kB dirty:688kB writeback:0kB shmem:788kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:34656kB pagetables:13000kB all_unreclaimable? no
            [22581.930768] Call Trace:
            [22581.936536] Node 0 DMA free:11264kB min:4kB low:16kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
            [22581.937430]  dump_stack+0x41/0x60
            [22581.941697] lowmem_reserve[]: 0 913 150007 150007 150007
            [22581.942766]  warn_alloc.cold.127+0x7b/0x108
            [22581.943921] Node 0 DMA32 free:596536kB min:300kB low:1232kB high:2164kB active_anon:0kB inactive_anon:83612kB active_file:176000kB inactive_file:60624kB unevictable:0kB writepending:0kB present:2080608kB managed:966496kB mlocked:0kB bounce:0kB free_pcp:15096kB local_pcp:372kB free_cma:0kB
            [22581.946334]  __alloc_pages_slowpath+0xcb2/0xcd0
            [22581.951399] lowmem_reserve[]: 0 0 149093 149093 149093
            [22581.952769]  __alloc_pages_nodemask+0x2e2/0x330
            [22581.954004] Node 0 Normal free:1694996kB min:49268kB low:201940kB high:354612kB active_anon:38344kB inactive_anon:859796kB active_file:104623944kB inactive_file:39919140kB unevictable:0kB writepending:688kB present:155189248kB managed:152680592kB mlocked:0kB bounce:0kB free_pcp:2208kB local_pcp:24kB free_cma:0kB
            [22581.955501]  kmalloc_order+0x28/0x90
            [22581.961168] lowmem_reserve[]: 0 0 0 0 0
            [22581.962424]  kmalloc_order_trace+0x1d/0xb0
            [22581.963433] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 
            [22581.964729]  __kmalloc+0x25c/0x2d0
            [22581.964734] 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
            [22581.967434]  virtqueue_add+0x493/0xc70
            [22581.968864] Node 0 DMA32: 8556*4kB (UME) 4961*8kB (UME) 2066*16kB (ME) 
            [22581.970182]  virtqueue_add_sgs+0x80/0xa0
            [22581.970186] 725*32kB (UME) 311*64kB (UMEH) 122*128kB (UME) 73*256kB (UM) 51*512kB (UM) 17*1024kB (ME) 18*2048kB 
            [22581.973014]  __virtscsi_add_cmd+0x148/0x270 [virtio_scsi]
            [22581.973019] (UM) 81*4096kB (UM) = 596536kB
            [22581.976714]  ? scsi_alloc_sgtables+0x84/0x1c0
            [22581.977790] Node 0 Normal: 279769*4kB (UMEH) 108102*8kB (UMEH) 9366*16kB (UMH) 12*32kB 
            [22581.979216]  virtscsi_add_cmd+0x38/0xa0 [virtio_scsi]
            [22581.980930] (UMEH) 6*64kB (EH) 4*128kB (E) 2*256kB (E) 1*512kB (E) 0*1024kB 0*2048kB 0*4096kB = 2136052kB
            [22581.982477]  virtscsi_queuecommand+0x186/0x2d0 [virtio_scsi]
            [22581.984429] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
            [22581.986082]  scsi_queue_rq+0x512/0xb10
            [22581.987909] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
            [22581.989257]  __blk_mq_try_issue_directly+0x163/0x200
            [22581.991057] 36019216 total pagecache pages
            [22581.992606]  blk_mq_request_issue_directly+0x4e/0xb0
            [22581.993679] 4589 pages in swap cache
            [22581.995184]  blk_mq_try_issue_list_directly+0x62/0x100
            [22581.996150] Swap cache stats: add 236588, delete 231998, find 24220/31008
            [22581.997572]  blk_mq_sched_insert_requests+0xa4/0xf0
            [22581.999028] Free swap  = 10341124kB
            [22582.000489]  blk_mq_flush_plug_list+0x135/0x220
            [22582.001436] Total swap = 11075580kB
            [22582.002794]  blk_flush_plug_list+0xd7/0x100
            [22582.003723] 39321462 pages RAM
            [22582.005003]  blk_finish_plug+0x25/0x36
            [22582.005933] 0 pages HighMem/MovableOnly
            [22582.007125]  osd_do_bio+0x7f1/0xc70 [osd_ldiskfs]
            [22582.008103] 905850 pages reserved
            [22582.009447]  ? ldiskfs_map_blocks+0x607/0x610 [ldiskfs]
            [22582.010354] 0 pages hwpoisoned
            [22582.011810]  ? osd_ldiskfs_map_inode_pages+0x2c2/0xb20 [osd_ldiskfs]
            [22582.014521]  osd_ldiskfs_map_inode_pages+0x2c2/0xb20 [osd_ldiskfs]
            [22582.016448]  osd_write_commit+0x4c6/0x990 [osd_ldiskfs]
            [22582.017985]  ofd_commitrw_write+0x9df/0x1b00 [ofd]
            [22582.019536]  ofd_commitrw+0x5dc/0xd80 [ofd]
            [22582.020816]  ? obd_commitrw+0x1b6/0x370 [ptlrpc]
            [22582.022466]  obd_commitrw+0x1b6/0x370 [ptlrpc]
            [22582.023992]  tgt_brw_write+0xf54/0x1f90 [ptlrpc]
            [22582.025495]  ? _cond_resched+0x15/0x30
            [22582.026671]  ? flush_work+0x42/0x1d0
            [22582.027752]  ? internal_add_timer+0x42/0x70
            [22582.028976]  ? _cond_resched+0x15/0x30
            [22582.030064]  ? mutex_lock+0xe/0x30
            [22582.031072]  tgt_request_handle+0x3f4/0x1a30 [ptlrpc]
            [22582.032636]  ? ptlrpc_update_export_timer+0x3d/0x500 [ptlrpc]
            [22582.034103]  ptlrpc_server_handle_request+0x2aa/0xcf0 [ptlrpc]
            [22582.035813]  ? lprocfs_counter_add+0x10e/0x180 [obdclass]
            [22582.037335]  ptlrpc_main+0xc9e/0x15c0 [ptlrpc]
            [22582.038763]  ? ptlrpc_wait_event+0x5b0/0x5b0 [ptlrpc]
            [22582.040292]  kthread+0x134/0x150
            [22582.041227]  ? set_kthread_struct+0x50/0x50
            [22582.042376]  ret_from_fork+0x1f/0x40
            [22582.043397] CPU: 9 PID: 127537 Comm: ll_ost_io02_073 Kdump: loaded Tainted: G           OE     -------- -  - 4.18.0-553.16.1.el8_lustre.x86_64 #1
            [22582.045848] Hardware name: DDN SFA18KXE, BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
            [22582.047678] Call Trace:
            
            dongyang Dongyang Li added a comment -

            Do we see the same ldiskfs error on vm1 for the timeout? If so it would be great to dump the debugfs output and the e2fsck log.

             Back to sdao: dumpe2fs confirms dirdata is not enabled, and the directory inode is large with many entries. It looks like one of the d[0-31] directories?
            The dump of block 2349208195 does look like a directory block, with the entry in question:

            4120  1ef4 7401 1000 0601 6631 6239 6630 0000  ..t.....f1b9f0..
            

             So it looks like ofd was precreating the objects and the entries were being added to d[0-31].
             As ldiskfs goes through the dir block trying to find space for the new entry, it sanity-checks every entry it walks past; in this case the entry at offset 2128 fails the check.

             Given that e2fsck has been run and the entry looks OK, the only thing I can think of is that the file_type field somehow had a bit flipped in its upper 4 bits, making ldiskfs think there is dirdata after the name, so the expected rec_len was off.
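
             To make that concrete (a simplified sketch of a dirdata-aware length check, not the actual ldiskfs patch; the helper name and layout are illustrative):

             /* Without dirdata the minimum record length is the 8-byte header plus the
              * name, rounded up to 4 bytes: for name_len=3 that is 12, so the on-disk
              * rec_len=16 is perfectly valid.  With the dirdata patch, each set bit in
              * the upper nibble of file_type means "a length-prefixed blob follows the
              * name", so a flipped bit makes the code read whatever byte happens to
              * sit after the name as a blob length, and the required size can exceed
              * 16 -> "rec_len is too small for name_len". */
             static unsigned int required_rec_len(const unsigned char *de,
                                                  unsigned int name_len,
                                                  unsigned char file_type)
             {
                     unsigned int len = 8 + name_len;        /* header + name          */
                     unsigned int i;

                     for (i = 0; i < 4; i++)                 /* upper 4 bits only      */
                             if (file_type & (0x10u << i))
                                     len += de[len];         /* 1st blob byte = length */

                     return (len + 3) & ~3u;                 /* 4-byte alignment       */
             }

             With file_type=0x01 and name_len=3 this returns 12, which is why the entry reported in the console log (rec_len=16, name_len=3) would otherwise be perfectly valid.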

             The dir block bh was read from disk, so the flip happened either on disk or in memory? Just in case: are there any hardware errors, like MCE events, on the hypervisor?

             And sarah, do you recall anything related to the dir block from the e2fsck run?


            People

              Assignee: dongyang Dongyang Li
              Reporter: sarah Sarah Liu
              Votes: 0
              Watchers: 11
