Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-15081

set_nlink() can race with itself corrupting s_remove_count on the client

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.15.0
    • None
    • 3
    • 9223372036854775807

    Description

      Mar  3 16:08:57 c-lmo161 kernel: ------------[ cut here ]------------
      Mar  3 16:08:57 c-lmo161 kernel: WARNING: CPU: 5 PID: 195090 at fs/inode.c:241 __destroy_inode+0xdb/0xf0
      Mar  3 16:08:57 c-lmo161 kernel: Modules linked in: mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) lov(OE) fld(OE) osc(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache mlx5_ib(OE) mlx5_core(OE) mlxfw sunrpc ib_isert(OE) ib_srpt(OE) ib_srp(OE) ib_ucm(OE) sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel iTCO_wdt iTCO_vendor_support lrw gf128mul glue_helper ablk_helper cryptd rpcrdma(OE) svcrdma(OE) xprtrdma(OE) ib_umad(OE) rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_iser(OE) ib_ipoib(OE) ib_cm(OE) pcspkr joydev lpc_ich mei_me sg i2c_i801 mei wmi ioatdma ipmi_si ipmi_devintf ipmi_msghandler ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx4_ib(OE) ib_uverbs(OE) mlx4_en(OE)
      Mar  3 16:08:57 c-lmo161 kernel: ib_core(OE) mgag200 mlx4_core(OE) drm_kms_helper syscopyarea isci sysfillrect igb crct10dif_pclmul crct10dif_common sysimgblt fb_sys_fops ahci crc32c_intel ttm libsas libahci scsi_transport_sas ptp drm pps_core libata dca mlx_compat(OE) drm_panel_orientation_quirks i2c_algo_bit devlink dm_mirror dm_region_hash dm_log dm_mod
      Mar  3 16:08:57 c-lmo161 kernel: CPU: 5 PID: 195090 Comm: mdtest Tainted: G        W  OE  ------------   3.10.0-957.5.1.el7.x86_64 #1
      Mar  3 16:08:57 c-lmo161 kernel: Hardware name: Intel Corporation S2600JF/S2600JF, BIOS SE5C600.86B.02.04.0003.102320141138 10/23/2014
      Mar  3 16:08:57 c-lmo161 kernel: Call Trace:
      Mar  3 16:08:57 c-lmo161 kernel: [<ffffffff9ff61e41>] dump_stack+0x19/0x1b
      Mar  3 16:08:57 c-lmo161 kernel: [<ffffffff9f897688>] __warn+0xd8/0x100
      Mar  3 16:08:57 c-lmo161 kernel: [<ffffffff9f8977cd>] warn_slowpath_null+0x1d/0x20
      Mar  3 16:08:57 c-lmo161 kernel: [<ffffffff9fa5ecbb>] __destroy_inode+0xdb/0xf0
      Mar  3 16:08:57 c-lmo161 kernel: [<ffffffff9fa5ecf2>] destroy_inode+0x22/0x60
      Mar  3 16:08:57 c-lmo161 kernel: [<ffffffff9fa5ee45>] evict+0x115/0x180
      Mar  3 16:08:57 c-lmo161 kernel: [<ffffffff9fa5f6ec>] iput+0xfc/0x190
      Mar  3 16:08:57 c-lmo161 kernel: [<ffffffff9fa5385e>] do_unlinkat+0x1ae/0x2d0
      Mar  3 16:08:57 c-lmo161 kernel: [<ffffffff9fa54916>] SyS_unlink+0x16/0x20
      Mar  3 16:08:57 c-lmo161 kernel: [<ffffffff9ff74ddb>] system_call_fastpath+0x22/0x27
      Mar  3 16:08:57 c-lmo161 kernel: ---[ end trace f53c1dca536ee43c ]---
      

      Despite the fact that s_remove_counter itself is an atomic, set_nlink() handling is not atomic. Two racing set_nlink()'s, depending on the race, will lead to one of the two nlink values. However, this race can corrupt s_remove_counter which will lead to spam in the kernel logs.

      Attachments

        Activity

          People

            panda Andrew Perepechko
            panda Andrew Perepechko
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: