Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.17.0

    Description

      In this environment, both the kernel and Lustre were built with clang instead of gcc, on Ubuntu 24.04.

      $ sudo dmesg | grep 'Linux version'
      [    0.000000] Linux version 6.8.12clang (ubuntu@u2404-srv1) (Ubuntu clang version 18.1.3 (1ubuntu1), Ubuntu LLD 18.1.3) #1 SMP PREEMPT_DYNAMIC Thu Dec  5 09:13:19 UTC 2024 (Ubuntu 6.8.0-49.49-generic 6.8.12)
      

      The Lustre master branch builds with clang, apart from the issue tracked in LU-18517.

      $ sh ./autogen.sh; ./configure CC=clang 'CFLAGS=-Wno-frame-address -Wno-incompatible-function-pointer-types' --with-linux=/home/ubuntu/linux-source-6.8.0/ LLVM=1
      
      $ sudo modprobe lustre
      $ modinfo lustre
      filename:       /lib/modules/6.8.0-49-generic-clang-2/updates/kernel/fs/lustre/lustre.ko
      alias:          fs-lustre
      author:         OpenSFS, Inc. <http://www.lustre.org/>
      description:    Lustre Client File System
      version:        2.16.50_87_g64e158b
      license:        GPL
      vermagic:       6.8.0-49-generic-clang-2 SMP preempt mod_unload modversions 
      name:           lustre
      retpoline:      Y
      depends:        ptlrpc,libcfs,obdclass,mdc,lmv,lov,lnet,fid
      srcversion:     F6E501204471DEE570F0971
      

      However, mounting a device as Lustre hangs, even when the device is only an MGT.

      $ sudo mkfs.lustre --mgs --device-size=100000 /tmp/mdt
      $ sudo mount /tmp/mdt -o loop /mnt/ -t lustre
      

      The mount does not complete and stays stuck at this point:

      [25204.948835] ------------[ cut here ]------------
      [25204.948843] UBSAN: array-index-out-of-bounds in /home/ubuntu/lustre-release/ldiskfs/htree_lock.c:874:35
      [25204.951242] index 1 is out of range for type 'struct htree_lock_node[0]'
      [25204.952946] CPU: 15 PID: 74322 Comm: mount.lustre Tainted: G           OE      6.8.0-49-generic-clang-2 #4
      [25204.952960] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)/Incus, BIOS unknown 02/02/2022
      [25204.952966] Call Trace:
      [25204.952989]  <TASK>
      [25204.952997]  dump_stack_lvl+0x2f/0xa0
      [25204.953051]  dump_stack+0x10/0x20
      [25204.953058]  __ubsan_handle_out_of_bounds+0xe8/0x110
      [25204.953094]  htree_lock_alloc+0xdf/0x1b0 [ldiskfs]
      [25204.953204]  ldiskfs_htree_lock_alloc+0x18/0x30 [ldiskfs]
      [25204.953287]  osd_key_init+0x1c2/0x370 [osd_ldiskfs]
      [25204.953350]  keys_fill+0x8f/0x120 [obdclass]
      [25204.953503]  lu_context_init+0x15b/0x190 [obdclass]
      [25204.953644]  lu_env_init+0x1a/0x30 [obdclass]
      [25204.953817]  obd_setup+0x1f5/0x3e0 [obdclass]
      [25204.953961]  class_setup+0x487/0x840 [obdclass]
      [25204.954103]  class_process_config+0x1bdf/0x31a0 [obdclass]
      [25204.954247]  ? libcfs_debug_msg+0xab7/0xe20 [libcfs]
      [25204.954291]  ? __kmalloc+0x1bd/0x460
      [25204.954304]  do_lcfg+0x495/0x830 [obdclass]
      [25204.954447]  lustre_start_simple+0x136/0x1e0 [obdclass]
      [25204.954590]  server_fill_super+0x7c8/0x14d0 [ptlrpc]
      [25204.954853]  lustre_fill_super+0x255/0x480 [lustre]
      [25204.954961]  ? __pfx_lustre_fill_super+0x10/0x10 [lustre]
      [25204.955050]  mount_nodev+0x52/0xb0
      [25204.955065]  lustre_mount+0x18/0x30 [lustre]
      [25204.955200]  legacy_get_tree+0x2c/0x60
      [25204.955221]  vfs_get_tree+0x2f/0xf0
      [25204.955228]  do_new_mount+0x143/0x3a0
      [25204.955239]  path_mount+0x311/0x540
      [25204.955245]  __se_sys_mount+0x174/0x1f0
      [25204.955251]  __x64_sys_mount+0x25/0x40
      [25204.955256]  x64_sys_call+0x2726/0x2be0
      [25204.955268]  do_syscall_64+0x89/0x160
      [25204.955276]  ? fsnotify_destroy_marks+0x87/0x1c0
      [25204.955288]  ? call_rcu+0x25/0x60
      [25204.955301]  ? evict+0x202/0x230
      [25204.955318]  ? kmem_cache_free+0x2b/0x2d0
      [25204.955329]  ? fpregs_restore_userregs+0x81/0x150
      [25204.955348]  ? switch_fpu_return+0xe/0x20
      [25204.955354]  ? arch_exit_to_user_mode_prepare+0x68/0x70
      [25204.955365]  ? syscall_exit_to_user_mode+0xb6/0xe0
      [25204.955375]  ? do_syscall_64+0x95/0x160
      [25204.955383]  ? __memcg_slab_free_hook+0x48/0x1d0
      [25204.955391]  ? __fput+0x1a7/0x2d0
      [25204.955396]  ? kmem_cache_free+0x193/0x2d0
      [25204.955402]  ? __fput+0x1a7/0x2d0
      [25204.955407]  ? __fput_sync+0x15/0x20
      [25204.955411]  ? arch_exit_to_user_mode_prepare+0x22/0x70
      [25204.955417]  ? syscall_exit_to_user_mode+0xb6/0xe0
      [25204.955422]  ? do_syscall_64+0x95/0x160
      [25204.955429]  ? irqentry_exit+0x12/0x50
      [25204.955434]  ? exc_page_fault+0x74/0xf0
      [25204.955444]  entry_SYSCALL_64_after_hwframe+0x78/0x80
      [25204.955465] RIP: 0033:0x75a56932af0e
      [25204.955473] Code: 48 8b 0d 0d 7f 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d da 7e 0d 00 f7 d8 64 89 01 48
      [25204.955479] RSP: 002b:00007ffe239b4078 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
      [25204.955487] RAX: ffffffffffffffda RBX: 00007ffe239b83d8 RCX: 000075a56932af0e
      [25204.955492] RDX: 000061f0e643668a RSI: 00007ffe239b7280 RDI: 000061f0e71dd930
      [25204.955495] RBP: 00007ffe239b82b0 R08: 000061f0e71dd950 R09: 0000000000000007
      [25204.955499] R10: 0000000001000000 R11: 0000000000000206 R12: 0000000000000005
      [25204.955503] R13: 0000000000000000 R14: 000061f0e6439b48 R15: 000075a5695a9000
      [25204.955510]  </TASK>
      [25204.955556] ---[ end trace ]---
      

      Attachments

        Activity

          [LU-18518] lustre device mount stuck
          pjones Peter Jones added a comment -

          Merged for 2.17


          gerrit Gerrit Updater added a comment -

          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/57327/
          Subject: LU-18518 ldiskfs: fix htree_lock array-index-out-of-bounds
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 51b425d63a4d9482448da633224d82fa9fb4ec5d

          gerrit Gerrit Updater added a comment -

          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/57333/
          Subject: LU-18518 ldiskfs: fix LC_SRC_NFS_FILLDIR_USE_CTX check
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 7155b844c39f0d444b6b94cad73ab7c9f6a8818f

          timday Tim Day added a comment -

          @Shuichi Ihara Have you tried ZFS? When I was working on Clang build support (i.e. LU-16518), I only ever tested ZFS and never saw hangs. So I think your issue must be ldiskfs somehow.


          gerrit Gerrit Updater added a comment -

          "Sohei Koyama <skoyama@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57333
          Subject: LU-18518 ldiskfs: fix LC_SRC_NFS_FILLDIR_USE_CTX check
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: f21bbf235d992fce174a6cf8f82eb3e98e5c9e22

          sihara Shuichi Ihara added a comment -

          I wonder whether LU-17888 could be related. Are we missing something when Lustre is built with clang instead of gcc?

          Thank you, Jian. The patch worked and the UBSAN array-index-out-of-bounds warning is gone.
          However, mounting the device still hangs, so this is now a real defect rather than just a warning.
          Here is the stack trace of mount.lustre:

          2024-12-07T10:07:39.059352+00:00 u2404-srv1 kernel: task:mount.lustre    state:R  running task     stack:0     pid:1070  tgid:1070  ppid:1069   flags:0x00004002
          2024-12-07T10:07:39.059354+00:00 u2404-srv1 kernel: Call Trace:
          2024-12-07T10:07:39.059355+00:00 u2404-srv1 kernel:  <TASK>
          2024-12-07T10:07:39.059371+00:00 u2404-srv1 kernel:  ? ldiskfs_readdir+0xab4/0xe60 [ldiskfs]
          2024-12-07T10:07:39.059373+00:00 u2404-srv1 kernel:  ? sysvec_apic_timer_interrupt+0x49/0x90
          2024-12-07T10:07:39.059374+00:00 u2404-srv1 kernel:  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
          2024-12-07T10:07:39.059376+00:00 u2404-srv1 kernel:  ? iterate_dir+0x90/0x170
          2024-12-07T10:07:39.059377+00:00 u2404-srv1 kernel:  ? osd_ios_general_scan+0x17f/0x250 [osd_ldiskfs]
          2024-12-07T10:07:39.059378+00:00 u2404-srv1 kernel:  ? __pfx_osd_ios_root_fill+0x10/0x10 [osd_ldiskfs]
          2024-12-07T10:07:39.059380+00:00 u2404-srv1 kernel:  ? osd_scrub_setup+0x7c0/0x14d0 [osd_ldiskfs]
          2024-12-07T10:07:39.059398+00:00 u2404-srv1 kernel:  ? _raw_spin_unlock_irqrestore+0x11/0x50
          2024-12-07T10:07:39.059405+00:00 u2404-srv1 kernel:  ? lprocfs_oh_alloc_pcpu+0x55/0xf0 [obdclass]
          2024-12-07T10:07:39.059871+00:00 u2404-srv1 kernel:  ? osd_device_alloc+0x64f/0xa70 [osd_ldiskfs]
          2024-12-07T10:07:39.059884+00:00 u2404-srv1 kernel:  ? obd_setup+0x21f/0x3e0 [obdclass]
          2024-12-07T10:07:39.059886+00:00 u2404-srv1 kernel:  ? class_setup+0x487/0x840 [obdclass]
          2024-12-07T10:07:39.059888+00:00 u2404-srv1 kernel:  ? class_process_config+0x1bdf/0x31a0 [obdclass]
          2024-12-07T10:07:39.060940+00:00 u2404-srv1 kernel:  ? do_lcfg+0x495/0x830 [obdclass]
          2024-12-07T10:07:39.060960+00:00 u2404-srv1 kernel:  ? lustre_start_simple+0x136/0x1e0 [obdclass]
          2024-12-07T10:07:39.060961+00:00 u2404-srv1 kernel:  ? server_fill_super+0x7c8/0x14d0 [ptlrpc]
          2024-12-07T10:07:39.060963+00:00 u2404-srv1 kernel:  ? lustre_fill_super+0x255/0x480 [lustre]
          2024-12-07T10:07:39.060964+00:00 u2404-srv1 kernel:  ? __pfx_lustre_fill_super+0x10/0x10 [lustre]
          2024-12-07T10:07:39.060966+00:00 u2404-srv1 kernel:  ? mount_nodev+0x52/0xb0
          2024-12-07T10:07:39.060968+00:00 u2404-srv1 kernel:  ? lustre_mount+0x18/0x30 [lustre]
          2024-12-07T10:07:39.060969+00:00 u2404-srv1 kernel:  ? legacy_get_tree+0x2c/0x60
          2024-12-07T10:07:39.060971+00:00 u2404-srv1 kernel:  ? vfs_get_tree+0x2f/0xf0
          2024-12-07T10:07:39.060972+00:00 u2404-srv1 kernel:  ? do_new_mount+0x143/0x3a0
          2024-12-07T10:07:39.060973+00:00 u2404-srv1 kernel:  ? path_mount+0x311/0x540
          2024-12-07T10:07:39.060975+00:00 u2404-srv1 kernel:  ? __se_sys_mount+0x174/0x1f0
          2024-12-07T10:07:39.060976+00:00 u2404-srv1 kernel:  ? __x64_sys_mount+0x25/0x40
          2024-12-07T10:07:39.060978+00:00 u2404-srv1 kernel:  ? x64_sys_call+0x2726/0x2be0
          2024-12-07T10:07:39.060979+00:00 u2404-srv1 kernel:  ? do_syscall_64+0x89/0x160
          2024-12-07T10:07:39.060980+00:00 u2404-srv1 kernel:  ? fsnotify_destroy_marks+0x87/0x1c0
          2024-12-07T10:07:39.060981+00:00 u2404-srv1 kernel:  ? call_rcu+0x25/0x60
          2024-12-07T10:07:39.060982+00:00 u2404-srv1 kernel:  ? evict+0x202/0x230
          2024-12-07T10:07:39.060984+00:00 u2404-srv1 kernel:  ? kmem_cache_free+0x2b/0x2d0
          2024-12-07T10:07:39.060985+00:00 u2404-srv1 kernel:  ? fpregs_restore_userregs+0x81/0x150
          2024-12-07T10:07:39.061009+00:00 u2404-srv1 kernel:  ? switch_fpu_return+0xe/0x20
          2024-12-07T10:07:39.061011+00:00 u2404-srv1 kernel:  ? arch_exit_to_user_mode_prepare+0x68/0x70
          2024-12-07T10:07:39.061013+00:00 u2404-srv1 kernel:  ? syscall_exit_to_user_mode+0xb6/0xe0
          2024-12-07T10:07:39.061014+00:00 u2404-srv1 kernel:  ? do_syscall_64+0x95/0x160
          2024-12-07T10:07:39.061016+00:00 u2404-srv1 kernel:  ? do_syscall_64+0x95/0x160
          2024-12-07T10:07:39.061020+00:00 u2404-srv1 kernel:  ? arch_exit_to_user_mode_prepare+0x22/0x70
          2024-12-07T10:07:39.061021+00:00 u2404-srv1 kernel:  ? irqentry_exit_to_user_mode+0xac/0xd0
          2024-12-07T10:07:39.061023+00:00 u2404-srv1 kernel:  ? irqentry_exit+0x12/0x50
          2024-12-07T10:07:39.061024+00:00 u2404-srv1 kernel:  ? exc_page_fault+0x74/0xf0
          2024-12-07T10:07:39.061026+00:00 u2404-srv1 kernel:  ? entry_SYSCALL_64_after_hwframe+0x78/0x80
          2024-12-07T10:07:39.061027+00:00 u2404-srv1 kernel:  </TASK>
          

          Here is what the Lustre debug log shows:

          00040000:00000001:12.0F:1733566009.851434:0:1076:0:(lquota_lib.c:508:lquota_init()) Process entered
          00040000:00000001:12.0:1733566009.851478:0:1076:0:(qmt_dev.c:477:qmt_glb_init()) Process entered
          00000020:00000001:12.0:1733566009.851480:0:1076:0:(genops.c:240:class_register_type()) Process entered
          00000020:00000010:12.0:1733566009.851486:0:1076:0:(genops.c:255:class_register_type()) kmalloced '(type)': 112 at 0000000087fcbcb5.
          00000020:00000001:12.0:1733566009.851514:0:1076:0:(genops.c:304:class_register_type()) Process leaving (rc=0 : 0 : 0)
          00040000:00000001:12.0:1733566009.851516:0:1076:0:(qmt_dev.c:481:qmt_glb_init()) Process leaving (rc=0 : 0 : 0)
          00040000:00000001:12.0:1733566009.851525:0:1076:0:(lquota_lib.c:529:lquota_init()) Process leaving (rc=0 : 0 : 0)
          00000020:00000001:12.0:1733566009.889191:0:1076:0:(genops.c:240:class_register_type()) Process entered
          00000020:00000010:12.0:1733566009.889198:0:1076:0:(genops.c:255:class_register_type()) kmalloced '(type)': 112 at 000000007ac64fef.
          00000020:00000001:12.0:1733566009.889220:0:1076:0:(genops.c:304:class_register_type()) Process leaving (rc=0 : 0 : 0)
          00080000:00000010:11.0F:1733566016.106071:0:1078:0:(osd_handler.c:1884:osd_trans_commit_cb()) kfreed 'oh': 288 at 00000000597e600a.
          00080000:00000010:11.0:1733566016.106084:0:1078:0:(osd_handler.c:1884:osd_trans_commit_cb()) kfreed 'oh': 288 at 00000000d9d7de44.
          00100000:00000001:6.0:1733566018.126109:0:1070:0:(osd_scrub.c:2186:osd_ios_root_fill()) Process entered
          00100000:00000001:6.0:1733566018.126110:0:1070:0:(osd_scrub.c:2203:osd_ios_root_fill()) Process leaving (rc=0 : 0 : 0)
          00100000:00000001:6.0:1733566018.126110:0:1070:0:(osd_scrub.c:2186:osd_ios_root_fill()) Process entered
          00100000:00000001:6.0:1733566018.126111:0:1070:0:(osd_scrub.c:2203:osd_ios_root_fill()) Process leaving (rc=0 : 0 : 0)
          00100000:00000001:6.0:1733566018.126111:0:1070:0:(osd_scrub.c:2186:osd_ios_root_fill()) Process entered
          00100000:00000001:6.0:1733566018.126112:0:1070:0:(osd_scrub.c:2203:osd_ios_root_fill()) Process leaving (rc=0 : 0 : 0)
          00100000:00000001:6.0:1733566018.126112:0:1070:0:(osd_scrub.c:2186:osd_ios_root_fill()) Process entered
          00100000:00000001:6.0:1733566018.126113:0:1070:0:(osd_scrub.c:2203:osd_ios_root_fill()) Process leaving (rc=0 : 0 : 0)
          
          The log shows osd_ios_root_fill() being entered and left in an endless loop.
          
          sihara Shuichi Ihara added the comment above.
          yujian Jian Yu added a comment -

          Hi sihara,
          Could you please try the above patch?


          gerrit Gerrit Updater added a comment -

          "Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57327
          Subject: LU-18518 ldiskfs: fix htree_lock array-index-out-of-bounds
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 043260c2c82e1e1757079bb2bc90da9b5c615249

          People

            yujian Jian Yu
            sihara Shuichi Ihara
