  1. Lustre
  2. LU-5912

locking flaw generates logged errors

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.7.0, Lustre 2.5.4
    • Lustre 2.7.0
    • None
    • el6.6
    • 3
    • 16507

    Description

      We are violating the VFS locking scheme in el6.6 by calling f_op->fsync() directly. The call sequence cfs_tracefile_dump_all_pages() -> filp_fsync() -> fp->f_op->fsync() -> ext4_sync_file() -> ext4_flush_unwritten_io() triggers the following check:

              WARN_ON_ONCE(!mutex_is_locked(&inode->i_mutex) &&
                           !(inode->i_state & I_FREEING));
      

      The comment on ext4_sync_file() says:

      • i_mutex lock is held when entering and exiting this function

      We don't conform to that requirement, thus hitting the WARN_ON_ONCE() check in ext4_flush_unwritten_io().

      Not sure how to repair this problem. We can't simply wrap our filp_fsync() in mutex_lock()/mutex_unlock(), because other kernel versions we actively support take i_mutex inside their own ext4_sync_file() routines: at least el7 and sles11sp3, and possibly others.
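
The dilemma can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code and not a proposed patch: `struct model_inode`, `model_lock()`, `fsync_caller_locks()`, `fsync_self_locks()`, and `filp_fsync_model()` are hypothetical names, a boolean stands in for i_mutex, and a second lock attempt returns -EDEADLK where a real non-recursive mutex would hang. The `callee_locks` flag stands in for a decision that a real build would presumably have to make per kernel version.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode; a bool models i_mutex. */
struct model_inode {
    bool i_mutex_held;
    int  warnings;      /* counts hits of the modeled WARN_ON_ONCE() */
};

static int model_lock(struct model_inode *in)
{
    if (in->i_mutex_held)
        return -EDEADLK; /* a real non-recursive mutex would hang here */
    in->i_mutex_held = true;
    return 0;
}

static void model_unlock(struct model_inode *in)
{
    assert(in->i_mutex_held);
    in->i_mutex_held = false;
}

/* Models an el6.6-style fsync: the caller must already hold the lock. */
static int fsync_caller_locks(struct model_inode *in)
{
    if (!in->i_mutex_held)
        in->warnings++;
    return 0;
}

/* Models an el7/sles11sp3-style fsync: takes the lock itself. */
static int fsync_self_locks(struct model_inode *in)
{
    int rc = model_lock(in);

    if (rc)
        return rc;
    model_unlock(in);
    return 0;
}

typedef int (*fsync_fn)(struct model_inode *);

/* Wrapper that locks around the call only when the callee does not. */
static int filp_fsync_model(struct model_inode *in, fsync_fn fn,
                            bool callee_locks)
{
    int rc;

    if (callee_locks)
        return fn(in);

    rc = model_lock(in);
    if (rc)
        return rc;
    rc = fn(in);
    model_unlock(in);
    return rc;
}
```

In this model, calling `fsync_caller_locks()` without the lock bumps the warning counter, wrapping it with `callee_locks == false` keeps the counter at zero, and wrapping `fsync_self_locks()` the same way returns -EDEADLK. That last case is why an unconditional lock around filp_fsync() is not an option.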

      While this isn't a fatal problem and doesn't cause panics or crashes, it is a regression.

      For example, a typical instance in the syslog looks like this:

      Nov 12 08:59:21 centos65-2 kernel: WARNING: at fs/ext4/inode.c:3929 ext4_flush_unwritten_io+0x74/0x80 [ext4]() (Not tainted)
      Nov 12 08:59:21 centos65-2 kernel: Hardware name: VMware Virtual Platform
      Nov 12 08:59:21 centos65-2 kernel: Modules linked in: jbd sha512_generic crc32c_intel libcfs(U) rfcomm sco bridge bnep l2cap autofs4 8021q garp stp llc ipv6 fuse uinput microcode vmware_balloon btusb bluetooth rfkill snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000 sg i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptspi mptscsih mptbase scsi_transport_spi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: lnet]
      Nov 12 08:59:21 centos65-2 kernel: Pid: 106487, comm: lctl Not tainted 2.6.32.504.1.3.l1111 #1
      Nov 12 08:59:21 centos65-2 kernel: Call Trace:
      Nov 12 08:59:21 centos65-2 kernel: [<ffffffff81074df7>] ? warn_slowpath_common+0x87/0xc0
      Nov 12 08:59:21 centos65-2 kernel: [<ffffffff81074e4a>] ? warn_slowpath_null+0x1a/0x20
      Nov 12 08:59:21 centos65-2 kernel: [<ffffffffa00e8bb4>] ? ext4_flush_unwritten_io+0x74/0x80 [ext4]
      Nov 12 08:59:21 centos65-2 kernel: [<ffffffffa00e4fc8>] ? ext4_sync_file+0x88/0x1d0 [ext4]
      Nov 12 08:59:21 centos65-2 kernel: [<ffffffffa03a3cf8>] ? cfs_tracefile_dump_all_pages+0x178/0x2c0 [libcfs]
      Nov 12 08:59:21 centos65-2 kernel: [<ffffffffa03a3ecb>] ? cfs_trace_dump_debug_buffer_usrstr+0x8b/0x90 [libcfs]
      Nov 12 08:59:21 centos65-2 kernel: [<ffffffffa039a583>] ? __proc_dump_kernel+0x23/0x30 [libcfs]
      Nov 12 08:59:21 centos65-2 kernel: [<ffffffffa0399efb>] ? lprocfs_call_handler+0x2b/0x70 [libcfs]
      Nov 12 08:59:21 centos65-2 kernel: [<ffffffffa0399f95>] ? proc_dump_kernel+0x25/0x30 [libcfs]
      Nov 12 08:59:21 centos65-2 kernel: [<ffffffff81204157>] ? proc_sys_call_handler+0x97/0xd0
      Nov 12 08:59:21 centos65-2 kernel: [<ffffffff812041a4>] ? proc_sys_write+0x14/0x20
      Nov 12 08:59:21 centos65-2 kernel: [<ffffffff8118e4a8>] ? vfs_write+0xb8/0x1a0
      Nov 12 08:59:21 centos65-2 kernel: [<ffffffff8118ee71>] ? sys_write+0x51/0x90
      Nov 12 08:59:21 centos65-2 kernel: [<ffffffff810e5ece>] ? __audit_syscall_exit+0x25e/0x290
      Nov 12 08:59:21 centos65-2 kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
      Nov 12 08:59:21 centos65-2 kernel: ---[ end trace f63906e52fe1f987 ]---
      

            People

              bogl Bob Glossman (Inactive)