Details

    • Bug
    • Resolution: Unresolved
    • Minor

    Description

      The client crashes when gdsio is run with a 32k I/O size, which is smaller than the 64K PAGE_SIZE of the ARM CPU:

      # getconf PAGE_SIZE
      65536
      # /usr/local/cuda/gds/tools/gdsio -f /lustre/file -d 0 -n 0 -w 1 -s 1m -i 32k -x 0 -I 1
      
      [66108.386817] Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000fffd787a1000
      [66108.397771] Mem abort info:
      [66108.400627]   ESR = 0x000000009600000f
      [66108.404455]   EC = 0x25: DABT (current EL), IL = 32 bits
      [66108.409886]   SET = 0, FnV = 0
      [66108.413002]   EA = 0, S1PTW = 0
      [66108.416206]   FSC = 0x0f: level 3 permission fault
      [66108.421104] Data abort info:
      [66108.424041]   ISV = 0, ISS = 0x0000000f, ISS2 = 0x00000000
      [66108.429649]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
      [66108.434809]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
      [66108.440239] user pgtable: 64k pages, 48-bit VAs, pgdp=00000001528da000
      [66108.446911] [0000fffd787a1000] pgd=080000048cb30003, p4d=080000048cb30003, pud=080000048cb30003, pmd=08000004953f0003, pte=00e8000642910f43
      [66108.459722] Internal error: Oops: 000000009600000f [#1] SMP
      [66108.465419] Modules linked in: mgc(OE) lustre(OE) mdc(OE) fid(OE) lov(OE) osc(OE) lmv(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlxdevm(OE) ib_uverbs(OE) ib_core(OE) mlx_compat(OE) psample mlxfw(OE) macsec tls pci_hyperv_intf knem(OE) mst_pciconf(OE) crc32_generic rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs nvidia_uvm(OE) nvidia_drm(OE) nvidia_modeset(OE) nvidia(OE) video nouveau drm_exec gpu_sched drm_display_helper cec drm_ttm_helper ttm vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables libcrc32c nfnetlink qrtr sunrpc vfat fat acpi_ipmi spi_nor ipmi_ssif i2c_smbus arm_cspmu_module arm_spe_pmu mtd ipmi_devintf ipmi_msghandler
      [66108.465468]  coresight_stm coresight_tmc stm_core coresight_funnel cppc_cpufreq coresight ext4 mbcache jbd2 ast drm_shmem_helper i2c_algo_bit drm_kms_helper syscopyarea crct10dif_ce sysfillrect ghash_ce sysimgblt sha2_ce fb_sys_fops nvme sha256_arm64 ixgbe drm sha1_ce nvme_core sbsa_gwdt mdio nvme_common spi_tegra210_quad acpi_power_meter fuse [last unloaded: libcfs]
      [66108.587367] CPU: 21 PID: 937378 Comm: gdsio Kdump: loaded Tainted: G           OE     -------  ---  5.14.0-427.18.1.el9_4.aarch64+64k #1
      [66108.599907] Hardware name: Giga Computing H223-V10-AAW1-000/MV13-HD0-000, BIOS F07 05/13/2024
      [66108.608625] pstate: 83400009 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
      [66108.615741] pc : cl_sub_dio_alloc+0x13c/0x300 [obdclass]
      [66108.621202] lr : cl_sub_dio_alloc+0x128/0x300 [obdclass]
      [66108.626648] sp : ffff8000bb4af650
      [66108.630030] x29: ffff8000bb4af650 x28: ffff0002a1389be8 x27: ffff00040ce5c698
      [66108.637325] x26: ffffa02065880000 x25: 0000000000000000 x24: 0000000000000001
      [66108.644619] x23: ffffa020672958c0 x22: ffff8000bb4af880 x21: 0000000000000006
      [66108.651914] x20: ffff0004364ed1a0 x19: ffff00040ab154a8 x18: 0000000000000000
      [66108.659209] x17: 0000000000000000 x16: ffffa020c0d4a100 x15: 0000000000000000
      [66108.666503] x14: 0000000000000fd4 x13: 0000000000000000 x12: 0000000000000fd3
      [66108.673798] x11: 0000000000000040 x10: 000000000002dcd5 x9 : ffffa020c0af9e64
      [66108.681093] x8 : ffff0000de8642e0 x7 : 0000000000000000 x6 : 0000000001704015
      [66108.688388] x5 : ffff6056f2bd0000 x4 : ffff0004538c3f00 x3 : ffff0000de8642d0
      [66108.695683] x2 : ffff6056f2bd0000 x1 : ffff0004538c3f00 x0 : 0000fffd787a1000
      [66108.702977] Call trace:
      [66108.705472]  cl_sub_dio_alloc+0x13c/0x300 [obdclass]
      [66108.710562]  ll_direct_IO_impl+0x328/0xa60 [lustre]
      [66108.715568]  ll_direct_IO+0x18/0x20 [lustre]
      [66108.719940]  generic_file_direct_write+0xd0/0x1dc
      [66108.724759]  __generic_file_write_iter+0x98/0x1b0
      [66108.729565]  vvp_io_write_start+0x32c/0xae0 [lustre]
      [66108.734648]  cl_io_start+0x78/0x140 [obdclass]
      [66108.739220]  cl_io_loop+0xac/0x210 [obdclass]
      [66108.743688]  ll_file_io_generic+0x428/0xc60 [lustre]
      [66108.748784]  do_file_write_iter+0x444/0x680 [lustre]
      [66108.753866]  ll_file_write_iter+0x58/0x120 [lustre]
      [66108.758858]  vfs_write+0x250/0x300
      [66108.762334]  ksys_pwrite64+0x78/0xc0
      [66108.765983]  __arm64_sys_pwrite64+0x24/0x30
      [66108.770255]  invoke_syscall.constprop.0+0x7c/0xd0
      [66108.775067]  do_el0_svc+0xb4/0xd0
      [66108.778449]  el0_svc+0xe8/0x1f4
      [66108.781657]  el0t_64_sync_handler+0x134/0x150
      [66108.786107]  el0t_64_sync+0x17c/0x180
      [66108.789850] Code: 37200560 f9404a63 b4000de3 f9400ec0 (a9400400) 
      [66108.796080] SMP: stopping secondary CPUs
      [66108.802025] Starting crashdump kernel...
      [66108.806032] Bye!
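
As a reading aid for the "Mem abort info" lines in the log above, the following minimal sketch splits the reported ESR value into its top-level fields, using the standard AArch64 ESR_ELx layout for a data abort. The helper name `decode_esr` is hypothetical and exists only for illustration; it is not part of Lustre or the kernel.

```python
def decode_esr(esr: int) -> dict:
    """Split a 32-bit AArch64 ESR_ELx value into its top-level fields."""
    iss = esr & 0x1FFFFFF              # ISS: bits [24:0]
    return {
        "ec": (esr >> 26) & 0x3F,      # exception class: bits [31:26]
        "il": (esr >> 25) & 0x1,       # instruction length: 1 means 32-bit
        "iss": iss,
        "wnr": (iss >> 6) & 0x1,       # data abort: 0 = read, 1 = write
        "fsc": iss & 0x3F,             # fault status code: ISS bits [5:0]
    }


# ESR value copied from the oops above.
fields = decode_esr(0x9600000F)
print({name: hex(value) for name, value in fields.items()})
```

The decoded fields agree with the kernel's own breakdown: EC 0x25 (data abort at the current exception level), a 32-bit instruction, WnR = 0 (the faulting access was a read), and FSC 0x0f (level 3 permission fault).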
      

      With unaligned_dio disabled, gdsio fails because the 32K I/O size is not aligned to PAGE_SIZE. This is expected and fine:

      # lctl set_param llite.*.unaligned_dio=0
      #  /usr/local/cuda/gds/tools/gdsio -f /lustre/file -d 0 -n 0 -w 1 -s 1m -i 32k -x 0 -I 1
      
      write io failed of type 1 size: 32768 , ret: 0 
      failed to submit io of type 1 ret: -5 
      Error: IO failed stopping traffic, fd :35 ret:-5 errno :5
      io failed :ret :-5 errno :5, file offset :0, block size  :32768
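
The alignment arithmetic behind this expected failure can be sketched as follows. This is only an illustration of the size check described above, with the values taken from the `getconf` output and the gdsio `-i 32k` argument; `is_aligned` is a hypothetical helper, and the actual check inside Lustre may be implemented differently.

```python
def is_aligned(value: int, alignment: int) -> bool:
    """Return True when value is a whole multiple of alignment."""
    return value % alignment == 0


PAGE_SIZE = 65536        # from `getconf PAGE_SIZE` on the 64K-page ARM client
IO_SIZE = 32 * 1024      # gdsio -i 32k

# 32K is not a multiple of the 64K page size, so aligned direct I/O rejects it.
print(is_aligned(IO_SIZE, PAGE_SIZE))
# The same I/O size would be aligned on a 4K-page system.
print(is_aligned(IO_SIZE, 4096))
```

The same 32K request is page-aligned on a 4K-page system, which is why the problem shows up specifically on the 64K-page kernel.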
      

      Tested commit: "ede8d928d6 LU-17871 ldlm: FLOCK ownlocks may be not set" on the master branch.

          People

            wc-triage WC Triage
            sihara Shuichi Ihara