Lustre / LU-13377

Potential dead loops on short writes


Details


    Description

      We see the following dead loops at several customer sites:

      [167799.937527] NMI watchdog: BUG: soft lockup - CPU#21 stuck for 22s! [icon:246965]
      [167799.945027] Modules linked in: mst_pciconf(OE) mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) osc(OE) lov(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache libcfs(OE) ve_peermem(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) ve_drv(OE) mlx4_core(OE) vp(OE) sunrpc dm_mirror dm_region_hash dm_log dm_mod vfat fat amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sg ipmi_si ipmi_devintf ipmi_msghandler pcc_cpufreq pinctrl_amd i2c_designware_platform k10temp i2c_piix4 acpi_cpufreq i2c_designware_core knem(OE) binfmt_misc ip_tables xfs libcrc32c mlx5_ib(OE) ib_uverbs(OE) sd_mod crc_t10dif crct10dif_generic ib_core(OE) ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci mlx5_core(OE) drm crct10dif_pclmul mlxfw(OE) crct10dif_common ptp crc32c_intel libahci pps_core devlink libata drm_panel_orientation_quirks mlx_compat(OE) nfit libnvdimm [last unloaded: mst_pci]
      [167800.047456] CPU: 21 PID: 246965 Comm: icon Kdump: loaded Tainted: G           OEL ------------   3.10.0-1062.4.1.el7.x86_64 #1
      [167800.058914] Hardware name: GIGABYTE G292-Z21-NJ/MZ22-G20-00, BIOS R07 01/02/2020
      [167800.066385] task: ffff9fb8d6751070 ti: ffffa0282bf84000 task.ti: ffffa0282bf84000
      [167800.073943] RIP: 0010:[<ffffffffc0ca3696>]  [<ffffffffc0ca3696>] vvp_io_rw_lock+0x346/0x7f0 [lustre]
      [167800.083183] RSP: 0018:ffffa0282bf87a48  EFLAGS: 00000202
      [167800.088571] RAX: 0000000000000000 RBX: 00002ae27e0da6d8 RCX: 0000000000000000
      [167800.095783] RDX: 000000000106f3d8 RSI: 0000000000000000 RDI: ffffa0282bf87a90
      [167800.102994] RBP: ffffa0282bf87b10 R08: 000000000106f400 R09: ffffa0282bf87ac0
      [167800.110206] R10: 00002ae27e0da6d8 R11: 000000000106f401 R12: ffffa0282bf879c0
      [167800.117417] R13: ffffa0282bf87a38 R14: ffffffffa78a8b09 R15: ffffa0282bf879b0
      [167800.124628] FS:  00002ae279437c00(0000) GS:ffffa025e7540000(0000) knlGS:0000000000000000
      [167800.132791] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [167800.138616] CR2: 000000000a390000 CR3: 00000016ab28e000 CR4: 0000000000340fe0
      [167800.145826] Call Trace:
      [167800.148367]  [<ffffffffc0ca3b85>] vvp_io_write_lock+0x45/0x80 [lustre]
      [167800.154990]  [<ffffffffc1d172af>] cl_io_lock+0x5f/0x3d0 [obdclass]
      [167800.161257]  [<ffffffffc1d1783a>] cl_io_loop+0xba/0x1c0 [obdclass]
      [167800.167518]  [<ffffffffc0c5ab8a>] ll_file_io_generic+0x61a/0xb10 [lustre]
      [167800.174392]  [<ffffffffc0c5b54c>] ll_file_aio_write+0x29c/0x6e0 [lustre]
      [167800.181174]  [<ffffffffa7a484fb>] do_sync_readv_writev+0x7b/0xd0
      [167800.187265]  [<ffffffffa7a4a13e>] do_readv_writev+0xce/0x260
      [167800.193009]  [<ffffffffc0c5b2b0>] ? ll_file_splice_read+0x230/0x230 [lustre]
      [167800.200143]  [<ffffffffc0c5b990>] ? ll_file_aio_write+0x6e0/0x6e0 [lustre]
      [167800.207097]  [<ffffffffa7a4bec1>] ? __sb_end_write+0x31/0x70
      [167800.212833]  [<ffffffffa7a4a365>] vfs_writev+0x35/0x60
      [167800.218050]  [<ffffffffa7a4a51f>] SyS_writev+0x7f/0x110
      [167800.223356]  [<ffffffffa7f8bede>] system_call_fastpath+0x25/0x2a
      [167800.229440] Code: e8 d0 78 d1 e6 4c 8b 7d 98 4d 85 ff 0f 84 f9 01 00 00 48 8b 45 80 48 8b 55 90 4c 8b 10 48 8b 40 08 48 29 d0 49 39 c7 49 0f 46 c7 <48> 85 c0 48 89 85 60 ff ff ff 74 be 49 01 d2 48 8b bd 70 ff ff 
      [167827.938477] NMI watchdog: BUG: soft lockup - CPU#21 stuck for 22s! [icon:246965]
      [167827.945977] Modules linked in: mst_pciconf(OE) mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) osc(OE) lov(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache libcfs(OE) ve_peermem(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(O
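      The title ties the hang to short writes. As a hedged illustration only, the C sketch below shows one generic way a loop over `struct iovec` segments can stop making progress. The names `struct iov_cursor` and `remaining_bytes()` are hypothetical and are not Lustre's code. Suppose a short write leaves the cursor exactly at the end of a segment (`offset == iov_len`). The per-segment length then comes out as 0, and a loop that retries without advancing to the next segment never terminates. This is consistent with RAX being 0 in the register dump above, but it is not a confirmed root cause. The sketch avoids the hang by always stepping past the current segment, including an exhausted one.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical cursor over an iovec array (illustrative names only). */
struct iov_cursor {
    const struct iovec *iov;
    size_t nr_segs;
    size_t seg;     /* index of the current segment */
    size_t offset;  /* bytes already consumed in iov[seg] */
    size_t count;   /* bytes still requested overall */
};

/* Count the bytes the remaining request covers. The cursor is passed by
 * value, so the caller's copy is not modified. */
static size_t remaining_bytes(struct iov_cursor c)
{
    size_t total = 0;

    while (c.count > 0 && c.seg < c.nr_segs) {
        /* Guard against unsigned underflow in the subtraction below. */
        assert(c.offset <= c.iov[c.seg].iov_len);
        size_t avail = c.iov[c.seg].iov_len - c.offset;
        size_t len = c.count < avail ? c.count : avail;

        /* A buggy variant would do `if (len == 0) continue;` here without
         * touching c.seg or c.offset, re-reading the same empty chunk
         * forever. Always moving to the next segment guarantees progress,
         * even when len is 0 (cursor parked at a segment's end). */
        total += len;
        c.count -= len;
        c.seg++;
        c.offset = 0;
    }
    return total;
}
```

      For example, with segments of 4, 0 and 8 bytes and a cursor left at offset 4 of the first segment with 8 bytes still requested, `remaining_bytes()` skips the two exhausted segments and returns 8 instead of spinning.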
      
      


          People

            Assignee: Wang Shilong (Inactive)
            Reporter: Wang Shilong (Inactive)
            Votes: 0
            Watchers: 3
