
LDISKFS-fs: initial error at time 1724092382: ldiskfs_generic_delete_entry

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.16.0
    • Severity: 3

    Description

      Loading the build from https://build.whamcloud.com/job/lustre-reviews/107525/ on both the servers and clients of WR-SOAK, with no HA enabled, soak hit the following issue:

      [63239.265986] Lustre: Lustre: Build Version: 2.15.90_24_ge863f95f

      Both VMs from the 2nd controller rebooted; see the following errors in the console log.

      vcontroller vm0

      [63664.348413] Lustre: sfa18k03-OST0025: new connection from sfa18k03-MDT0003-mdtlov (cleaning up unused objects from 0x340000401:20963886 to 0x340000401:20968833)
      [63664.813575] Lustre: sfa18k03-OST0026: new connection from sfa18k03-MDT0003-mdtlov (cleaning up unused objects from 0xc40000400:20942873 to 0xc40000400:20948833)
      [63664.817519] Lustre: sfa18k03-OST0024: new connection from sfa18k03-MDT0003-mdtlov (cleaning up unused objects from 0x5c0000400:20671906 to 0x5c0000400:20675073)
      [63670.858601] Lustre: sfa18k03-OST0024: new connection from sfa18k03-MDT0001-mdtlov (cleaning up unused objects from 0x5c0000401:19996851 to 0x5c0000401:19999185)
      [63670.878393] Lustre: sfa18k03-OST001c: new connection from sfa18k03-MDT0001-mdtlov (cleaning up unused objects from 0x780000400:19109574 to 0x780000400:19111665)
      [63705.015859] Lustre: sfa18k03-OST0029-osc-MDT0001: Connection restored to 172.25.80.53@tcp (at 172.25.80.53@tcp)
      [63705.019483] Lustre: Skipped 14 previous similar messages
      [63711.321154] LustreError: sfa18k03-OST0031-osc-MDT0001: operation ost_connect to node 172.25.80.53@tcp failed: rc = -19
      [63711.323272] LustreError: Skipped 10 previous similar messages
      [63936.604947] LDISKFS-fs (sdao): error count since last fsck: 38
      [63936.604946] LDISKFS-fs (sdap): error count since last fsck: 2
      [63936.604966] LDISKFS-fs (sdap): initial error at time 1723874458: ldiskfs_find_dest_de:2297
      [63936.609560] LDISKFS-fs (sdao): initial error at time 1723660739: ldiskfs_find_dest_de:2297
      [63936.610905] : inode 182190082
      [63936.612597] : inode 73138214
      [63936.614252] : block 5830242584
      [63936.615147] : block 60217102
      [63936.615989] 
      [63936.616849] 
      [63936.617671] LDISKFS-fs (sdap): last error at time 1723874458: ldiskfs_evict_inode:257
      [63936.618286] LDISKFS-fs (sdao): last error at time 1723878231: ldiskfs_evict_inode:257
      [63936.618893] 
      [63936.620394] 
      [75516.204047] LustreError: sfa18k03-OST002c: not available for connect from 172.25.80.50@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
      [75516.206751] LustreError: sfa18k03-OST002b-osc-MDT0001: operation ost_statfs to node 172.25.80.53@tcp failed: rc = -107
      [75516.206773] Lustre: sfa18k03-MDT0003-osp-MDT0001: Connection to sfa18k03-MDT0003 (at 172.25.80.53@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [75516.206775] Lustre: Skipped 1 previous similar message
      [75516.209189] LustreError: Skipped 238 previous similar messages
      [75516.211099] LustreError: Skipped 10 previous similar messages
      [  OK  ] Stopped target resource-agents dependencies.
               Stopping Restore /run/initramfs on shutdown...
               Stopping LVM event activation on device 8:0...
               Stopping LVM event activation on device 8:48...
               Stopping LVM event activation on device 252:3...
      [  OK  ] Stopped target rpc_pipefs.target.
               Unmounting RPC Pipe File System...
               Stopping LVM event activation on device 8:32...
               Stopping LVM event activation on device 252:2...
               Stopping LVM event activation on device 8:64...
               Stopping Hostname Service...
      [  OK  ] Stopped t[  OK  ] Stopped irqbalance daemon.
      [  OK  ] Stopped Self Monitoring and Reporting Technology (SMART) Daemon.
      [  OK  ] Stopped Prometheus exporter for Lustre filesystem.
      [  OK  ] Stopped Hostname Service.
      
      

      Attachments

        1. debugfs-bd
          16 kB
        2. debugfs-c-R-1014
          15 kB
        3. debugfs-Rbd
          16 kB
        4. debugfs-stat
          15 kB
        5. e2fsck-fn-1015
          3 kB
        6. e2fsck-output
          12 kB
        7. messages-20240923.gz
          1.11 MB
        8. sdao-dump
          2 kB

        Issue Links

          Activity

            [LU-18225] LDISKFS-fs: initial error at time 1724092382: ldiskfs_generic_delete_entry
            sarah Sarah Liu added a comment -

            sure, I will load the patch and update the ticket if it happens again

            dongyang Dongyang Li added a comment -

             Looks like the block content has already changed since Sep 24; the name_len is now 6:

            4120  1ef4 7401 1000 0601 6631 6239 6630 0000  ..t.....f1b9f0..
            

            I guess after the e2fsck the corrupted data is all gone.
             I just made a debug patch, which should give us more info when the error is triggered again. Could you give that a try? Thanks.
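
             For reference, the bytes above decode as a standard ldiskfs/ext4 directory entry. Below is a minimal sketch of the on-disk layout (the struct mirrors the kernel's ext4_dir_entry_2; the decoded values in the comments are read straight from the dump above):

             /* Sketch only: same field layout as struct ext4_dir_entry_2. */
             #include <stdint.h>

             struct ldiskfs_dir_entry_2 {
                     uint32_t inode;      /* le32: 1e f4 74 01 -> 0x0174f41e = 24441886 */
                     uint16_t rec_len;    /* le16: 10 00       -> 16 bytes              */
                     uint8_t  name_len;   /* 06 -> 6                                    */
                     uint8_t  file_type;  /* 01 -> regular file, upper 4 bits clear     */
                     char     name[];     /* 66 31 62 39 66 30 -> "f1b9f0"              */
             };

             The inode field is still the 24441886 reported in the original console error; only the name has changed (name_len 3 -> 6), consistent with the corrupted entry having been rewritten since.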


            "Li Dongyang <dongyangli@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56706
            Subject: LU-18225 ldiskfs: add ext4-de-debug.patch
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 738d9c83967e75370ae333a1b43cac5d7536f74f

            sarah Sarah Liu added a comment - - edited

             I can collect the info. The server VM is 10.25.80.52.

             Please check debugfs-Rbd and e2fsck-fn-1015.

            dongyang Dongyang Li added a comment -

             And this is the same inode, with the same block 2349208195 and the same offset 2128.
             Could we dump the block with

            debugfs -R "bd 2349208195" /dev/sdao
            

             and the output of a read-only e2fsck (sdao needs to be unmounted first):

            e2fsck -fn -m [num of cpus] /dev/sdao
            

             And if it's easier, could you point me to the server so I can log in and collect/debug there?

            sarah Sarah Liu added a comment -

             I hit the same ldiskfs error on the same node, the same sdao device and inode, on Sep 24:

            [root@snp-18k-03-v3 log]# less messages|grep "LDISKFS-fs er"
            Sep 24 13:30:57 snp-18k-03-v3 kernel: LDISKFS-fs error (device sdao): ldiskfs_find_dest_de:2295: inode #73400343: block 2349208195: comm ll_ost03_040: bad entry in directory: rec_len is too small for name_len - offset=2128, inode=24441886, rec_len=16, name_len=3, size=4096
            

             I uploaded the output of "debugfs -c -R 'stat <73400343>' /dev/sdao". I have not run e2fsck yet; please let me know if any other information is needed.


            "Li Dongyang <dongyangli@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56644
            Subject: LU-18225 kernel: silent page allocation failure in virtio_scsi
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 01cfbb23defae93dbdeef344ce6d9687410d8b4c


            If the allocation failure is handled transparently at a higher level, then using __GFP_NOWARN to quiet this failure is fine. Possibly that should also go upstream, or is there already a patch fixing this code that could be backported?

            It looks like the GFP flag is passed down from the caller through a few layers, so it is a bit tricky to know where is the right place to add __GFP_NOWARN. Ideally, this would be done at the same layer that is doing the allocation fallback, since it knows that the allocation failure is not harmful.
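
             As a rough illustration of that last point (a sketch only, not the actual LU-18225 patch; the function name and the call are simplified from drivers/virtio/virtio_ring.c): the order-3 allocation in the trace is the indirect descriptor table in virtqueue_add(), and the caller can already fall back to direct ring descriptors when it returns NULL, so the warning could be silenced right at that allocation site:

             #include <linux/slab.h>
             #include <linux/virtio_ring.h>

             /* Hypothetical sketch: an allocation failure here is handled by falling
              * back to direct descriptors, so it is not worth a dmesg splat. */
             static struct vring_desc *alloc_indirect(unsigned int total_sg, gfp_t gfp)
             {
                     struct vring_desc *desc;

                     desc = kmalloc_array(total_sg, sizeof(*desc), gfp | __GFP_NOWARN);
                     if (!desc)
                             return NULL;    /* caller falls back to direct descriptors */
                     return desc;
             }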

            dongyang Dongyang Li added a comment -

             Yes, the allocation failure is a separate problem that needs looking into.

            sarah Sarah Liu added a comment -

             Hi dongyang, no, there was no ldiskfs error on vm1. After I ran e2fsck and restarted the test, there was no ldiskfs error on any server VM when it timed out. Sorry, I don't recall the output of e2fsck.

             I rebooted all the server VMs since the fs was not responding (ls hung). It ran overnight and testing seems to be ongoing, but I found page allocation errors on all server VMs; dmesg looks like the following.
             Should I open a new ticket for this?

            [16247.823902] Lustre: 68562:0:(mdt_recovery.c:148:mdt_req_from_lrd()) @@@ restoring transno  req@000000001dd58399 x1811746499175296/t523986305680(0) o101->7ec4b321-fc8d-49b2-9fc1-4ae12b6cd4c0@172.25.80.33@tcp:657/0 lens 376/1784 e 0 to 0 dl 1728427442 ref 1 fl Interpret:H/202/0 rc 0/0 job:'' uid:0 gid:0
            [22473.618001] Lustre: 127072:0:(mdt_recovery.c:148:mdt_req_from_lrd()) @@@ restoring transno  req@00000000abb6e3d3 x1811759166812672/t523989095606(0) o101->ce0abb5d-3cd7-43eb-bb94-be1985fd7ad4@172.25.80.22@tcp:63/0 lens 376/1864 e 0 to 0 dl 1728433643 ref 1 fl Interpret:H/202/0 rc 0/0 job:'' uid:0 gid:0
            [22474.401775] Lustre: 129905:0:(mdt_recovery.c:148:mdt_req_from_lrd()) @@@ restoring transno  req@000000007252b13d x1811759170103296/t523989095619(0) o101->ce0abb5d-3cd7-43eb-bb94-be1985fd7ad4@172.25.80.22@tcp:64/0 lens 376/1864 e 0 to 0 dl 1728433644 ref 1 fl Interpret:H/202/0 rc 0/0 job:'' uid:0 gid:0
            [22581.834967] ll_ost_io02_039: page allocation failure: order:3, mode:0x484020(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
            [22581.838485] CPU: 8 PID: 127347 Comm: ll_ost_io02_039 Kdump: loaded Tainted: G           OE     -------- -  - 4.18.0-553.16.1.el8_lustre.x86_64 #1
            [22581.841877] Hardware name: DDN SFA18KXE, BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
            [22581.844339] Call Trace:
            [22581.845495]  dump_stack+0x41/0x60
            [22581.846780]  warn_alloc.cold.127+0x7b/0x108
            [22581.848191]  __alloc_pages_slowpath+0xcb2/0xcd0
            [22581.849511]  __alloc_pages_nodemask+0x2e2/0x330
            [22581.850575]  kmalloc_order+0x28/0x90
            [22581.851486]  kmalloc_order_trace+0x1d/0xb0
            [22581.852635]  __kmalloc+0x25c/0x2d0
            [22581.853670]  virtqueue_add+0x493/0xc70
            [22581.854731]  ? finish_wait+0x80/0x80
            [22581.855904]  virtqueue_add_sgs+0x80/0xa0
            [22581.857186]  __virtscsi_add_cmd+0x148/0x270 [virtio_scsi]
            [22581.858910]  ? scsi_alloc_sgtables+0x84/0x1c0
            [22581.860249]  virtscsi_add_cmd+0x38/0xa0 [virtio_scsi]
            [22581.861689]  virtscsi_queuecommand+0x186/0x2d0 [virtio_scsi]
            [22581.863242]  scsi_queue_rq+0x512/0xb10
            [22581.864494]  __blk_mq_try_issue_directly+0x163/0x200
            [22581.866006]  blk_mq_request_issue_directly+0x4e/0xb0
            [22581.867558]  blk_mq_try_issue_list_directly+0x62/0x100
            [22581.869078]  blk_mq_sched_insert_requests+0xa4/0xf0
            [22581.870562]  blk_mq_flush_plug_list+0x135/0x220
            [22581.871979]  blk_flush_plug_list+0xd7/0x100
            [22581.873287]  blk_finish_plug+0x25/0x36
            [22581.874491]  osd_do_bio+0x7f1/0xc70 [osd_ldiskfs]
            [22581.876012]  ? ldiskfs_map_blocks+0x607/0x610 [ldiskfs]
            [22581.877596]  ? osd_ldiskfs_map_inode_pages+0x2c2/0xb20 [osd_ldiskfs]
            [22581.879360]  osd_ldiskfs_map_inode_pages+0x2c2/0xb20 [osd_ldiskfs]
            [22581.881043]  osd_write_commit+0x4c6/0x990 [osd_ldiskfs]
            [22581.882571]  ofd_commitrw_write+0x9df/0x1b00 [ofd]
            [22581.884021]  ofd_commitrw+0x5dc/0xd80 [ofd]
            [22581.885272]  ? obd_commitrw+0x1b6/0x370 [ptlrpc]
            [22581.887103]  obd_commitrw+0x1b6/0x370 [ptlrpc]
            [22581.888655]  tgt_brw_write+0xf54/0x1f90 [ptlrpc]
            [22581.890213]  ? _cond_resched+0x15/0x30
            [22581.891400]  ? flush_work+0x42/0x1d0
            [22581.892500]  ? internal_add_timer+0x42/0x70
            [22581.893720]  ? _cond_resched+0x15/0x30
            [22581.894884]  ? mutex_lock+0xe/0x30
            [22581.895949]  tgt_request_handle+0x3f4/0x1a30 [ptlrpc]
            [22581.897495]  ? ptlrpc_update_export_timer+0x3d/0x500 [ptlrpc]
            [22581.899225]  ptlrpc_server_handle_request+0x2aa/0xcf0 [ptlrpc]
            [22581.900939]  ? lprocfs_counter_add+0x10e/0x180 [obdclass]
            [22581.902509]  ptlrpc_main+0xc9e/0x15c0 [ptlrpc]
            [22581.903979]  ? ptlrpc_wait_event+0x5b0/0x5b0 [ptlrpc]
            [22581.904744] ll_ost_io01_034: page allocation failure: order:3, mode:0x484020(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
            [22581.905528]  kthread+0x134/0x150
            [22581.907356] ll_ost_io03_029: page allocation failure: order:3, mode:0x484020(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
            [22581.909345] ll_ost_io01_064: page allocation failure: order:3, mode:0x484020(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
            [22581.911672]  ? set_kthread_struct+0x50/0x50
            [22581.912135] ll_ost_io02_073: page allocation failure: order:3, mode:0x484020(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
            [22581.917402]  ret_from_fork+0x1f/0x40
            [22581.918250] Mem-Info:
            [22581.918251] CPU: 12 PID: 127683 Comm: ll_ost_io03_029 Kdump: loaded Tainted: G           OE     -------- -  - 4.18.0-553.16.1.el8_lustre.x86_64 #1
            [22581.918886] active_anon:9586 inactive_anon:235852 isolated_anon:0
                            active_file:26214948 inactive_file:9997372 isolated_file:34
                            unevictable:0 dirty:172 writeback:0
                            slab_reclaimable:462546 slab_unreclaimable:544066
                            mapped:29737 shmem:197 pagetables:3250 bounce:0
                            free:558197 free_pcp:4297 free_cma:0
            [22581.921880] Hardware name: DDN SFA18KXE, BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
            [22581.928597] Node 0 active_anon:38344kB inactive_anon:943408kB active_file:104869260kB inactive_file:39989456kB unevictable:0kB isolated(anon):0kB isolated(file):620kB mapped:118948kB dirty:688kB writeback:0kB shmem:788kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:34656kB pagetables:13000kB all_unreclaimable? no
            [22581.930768] Call Trace:
            [22581.936536] Node 0 DMA free:11264kB min:4kB low:16kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
            [22581.937430]  dump_stack+0x41/0x60
            [22581.941697] lowmem_reserve[]: 0 913 150007 150007 150007
            [22581.942766]  warn_alloc.cold.127+0x7b/0x108
            [22581.943921] Node 0 DMA32 free:596536kB min:300kB low:1232kB high:2164kB active_anon:0kB inactive_anon:83612kB active_file:176000kB inactive_file:60624kB unevictable:0kB writepending:0kB present:2080608kB managed:966496kB mlocked:0kB bounce:0kB free_pcp:15096kB local_pcp:372kB free_cma:0kB
            [22581.946334]  __alloc_pages_slowpath+0xcb2/0xcd0
            [22581.951399] lowmem_reserve[]: 0 0 149093 149093 149093
            [22581.952769]  __alloc_pages_nodemask+0x2e2/0x330
            [22581.954004] Node 0 Normal free:1694996kB min:49268kB low:201940kB high:354612kB active_anon:38344kB inactive_anon:859796kB active_file:104623944kB inactive_file:39919140kB unevictable:0kB writepending:688kB present:155189248kB managed:152680592kB mlocked:0kB bounce:0kB free_pcp:2208kB local_pcp:24kB free_cma:0kB
            [22581.955501]  kmalloc_order+0x28/0x90
            [22581.961168] lowmem_reserve[]: 0 0 0 0 0
            [22581.962424]  kmalloc_order_trace+0x1d/0xb0
            [22581.963433] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 
            [22581.964729]  __kmalloc+0x25c/0x2d0
            [22581.964734] 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
            [22581.967434]  virtqueue_add+0x493/0xc70
            [22581.968864] Node 0 DMA32: 8556*4kB (UME) 4961*8kB (UME) 2066*16kB (ME) 
            [22581.970182]  virtqueue_add_sgs+0x80/0xa0
            [22581.970186] 725*32kB (UME) 311*64kB (UMEH) 122*128kB (UME) 73*256kB (UM) 51*512kB (UM) 17*1024kB (ME) 18*2048kB 
            [22581.973014]  __virtscsi_add_cmd+0x148/0x270 [virtio_scsi]
            [22581.973019] (UM) 81*4096kB (UM) = 596536kB
            [22581.976714]  ? scsi_alloc_sgtables+0x84/0x1c0
            [22581.977790] Node 0 Normal: 279769*4kB (UMEH) 108102*8kB (UMEH) 9366*16kB (UMH) 12*32kB 
            [22581.979216]  virtscsi_add_cmd+0x38/0xa0 [virtio_scsi]
            [22581.980930] (UMEH) 6*64kB (EH) 4*128kB (E) 2*256kB (E) 1*512kB (E) 0*1024kB 0*2048kB 0*4096kB = 2136052kB
            [22581.982477]  virtscsi_queuecommand+0x186/0x2d0 [virtio_scsi]
            [22581.984429] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
            [22581.986082]  scsi_queue_rq+0x512/0xb10
            [22581.987909] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
            [22581.989257]  __blk_mq_try_issue_directly+0x163/0x200
            [22581.991057] 36019216 total pagecache pages
            [22581.992606]  blk_mq_request_issue_directly+0x4e/0xb0
            [22581.993679] 4589 pages in swap cache
            [22581.995184]  blk_mq_try_issue_list_directly+0x62/0x100
            [22581.996150] Swap cache stats: add 236588, delete 231998, find 24220/31008
            [22581.997572]  blk_mq_sched_insert_requests+0xa4/0xf0
            [22581.999028] Free swap  = 10341124kB
            [22582.000489]  blk_mq_flush_plug_list+0x135/0x220
            [22582.001436] Total swap = 11075580kB
            [22582.002794]  blk_flush_plug_list+0xd7/0x100
            [22582.003723] 39321462 pages RAM
            [22582.005003]  blk_finish_plug+0x25/0x36
            [22582.005933] 0 pages HighMem/MovableOnly
            [22582.007125]  osd_do_bio+0x7f1/0xc70 [osd_ldiskfs]
            [22582.008103] 905850 pages reserved
            [22582.009447]  ? ldiskfs_map_blocks+0x607/0x610 [ldiskfs]
            [22582.010354] 0 pages hwpoisoned
            [22582.011810]  ? osd_ldiskfs_map_inode_pages+0x2c2/0xb20 [osd_ldiskfs]
            [22582.014521]  osd_ldiskfs_map_inode_pages+0x2c2/0xb20 [osd_ldiskfs]
            [22582.016448]  osd_write_commit+0x4c6/0x990 [osd_ldiskfs]
            [22582.017985]  ofd_commitrw_write+0x9df/0x1b00 [ofd]
            [22582.019536]  ofd_commitrw+0x5dc/0xd80 [ofd]
            [22582.020816]  ? obd_commitrw+0x1b6/0x370 [ptlrpc]
            [22582.022466]  obd_commitrw+0x1b6/0x370 [ptlrpc]
            [22582.023992]  tgt_brw_write+0xf54/0x1f90 [ptlrpc]
            [22582.025495]  ? _cond_resched+0x15/0x30
            [22582.026671]  ? flush_work+0x42/0x1d0
            [22582.027752]  ? internal_add_timer+0x42/0x70
            [22582.028976]  ? _cond_resched+0x15/0x30
            [22582.030064]  ? mutex_lock+0xe/0x30
            [22582.031072]  tgt_request_handle+0x3f4/0x1a30 [ptlrpc]
            [22582.032636]  ? ptlrpc_update_export_timer+0x3d/0x500 [ptlrpc]
            [22582.034103]  ptlrpc_server_handle_request+0x2aa/0xcf0 [ptlrpc]
            [22582.035813]  ? lprocfs_counter_add+0x10e/0x180 [obdclass]
            [22582.037335]  ptlrpc_main+0xc9e/0x15c0 [ptlrpc]
            [22582.038763]  ? ptlrpc_wait_event+0x5b0/0x5b0 [ptlrpc]
            [22582.040292]  kthread+0x134/0x150
            [22582.041227]  ? set_kthread_struct+0x50/0x50
            [22582.042376]  ret_from_fork+0x1f/0x40
            [22582.043397] CPU: 9 PID: 127537 Comm: ll_ost_io02_073 Kdump: loaded Tainted: G           OE     -------- -  - 4.18.0-553.16.1.el8_lustre.x86_64 #1
            [22582.045848] Hardware name: DDN SFA18KXE, BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
            [22582.047678] Call Trace:
            
            dongyang Dongyang Li added a comment -

            Do we see the same ldiskfs error on vm1 for the timeout? If so it would be great to dump the debugfs output and the e2fsck log.

             Back to sdao: dumpe2fs confirms dirdata is not enabled, and the directory inode is large with many entries. It looks like one of the d[0-31] directories?
            The dump of block 2349208195 does look like a directory block, with the entry in question:

            4120  1ef4 7401 1000 0601 6631 6239 6630 0000  ..t.....f1b9f0..
            

             So it looks like ofd was precreating the objects and the entries were being added to d[0-31].
             As ldiskfs goes through the dir block trying to find space for the new entry, it sanity-checks every entry it walks past; in this case the entry at offset 2128 fails the check.

             Given that e2fsck has been run and the entry looks OK, the only thing I can think of is that the file_type field somehow had a bit flipped in its upper 4 bits, making ldiskfs think there is dirdata after the name, so the expected rec_len was off.
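
             To make that concrete (a simplified sketch of a dirdata-aware length check, not the actual ldiskfs patch; the helper name and layout are illustrative):

             /* Without dirdata the minimum record length is the 8-byte header plus the
              * name, rounded up to 4 bytes: for name_len=3 that is 12, so the on-disk
              * rec_len=16 is perfectly valid.  With the dirdata patch, each set bit in
              * the upper nibble of file_type means "a length-prefixed blob follows the
              * name", so a flipped bit makes the code read whatever byte happens to
              * sit after the name as a blob length, and the required size can exceed
              * 16 -> "rec_len is too small for name_len". */
             static unsigned int required_rec_len(const unsigned char *de,
                                                  unsigned int name_len,
                                                  unsigned char file_type)
             {
                     unsigned int len = 8 + name_len;        /* header + name          */
                     unsigned int i;

                     for (i = 0; i < 4; i++)                 /* upper 4 bits only      */
                             if (file_type & (0x10u << i))
                                     len += de[len];         /* 1st blob byte = length */

                     return (len + 3) & ~3u;                 /* 4-byte alignment       */
             }

             With file_type=0x01 and name_len=3 this returns 12, which is why the entry reported in the console log (rec_len=16, name_len=3) would otherwise be perfectly valid.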

             The dir block bh was read from disk, so the flip happened either on disk or in memory? Just in case: are there any hardware errors, like MCE events, on the hypervisor?

             And sarah, do you recall anything related to the dir block from the e2fsck run?


            People

              Assignee: dongyang Dongyang Li
              Reporter: sarah Sarah Liu
              Votes: 0
              Watchers: 11
