Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.7.0
-
None
-
el6.6
-
3
-
16507
Description
We are violating the VFS locking scheme in el6.6 by calling f_op->fsync() directly. The call sequence cfs_tracefile_dump_all_pages() -> filp_fsync() -> fp->f_op->fsync() -> ext4_sync_file() -> ext4_flush_unwritten_io() triggers the following check:
WARN_ON_ONCE(!mutex_is_locked(&inode->i_mutex) && !(inode->i_state & I_FREEING));
Comment in ext4_sync_file() says:
- i_mutex lock is held when entering and exiting this function
We don't conform to that requirement, thus hitting the WARN_ON_ONCE() check in ext4_flush_unwritten_io().
Not sure how to repair this problem. We can't just put a mutex_lock/mutex_unlock inside or around our filp_fsync() as there are other kernel versions we actively support that lock i_mutex inside their ext4_sync_file() routines, at least el7 and sles11sp3, maybe others too.
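One possible direction, sketched here as a user-space model rather than kernel code, is to let the wrapper take the lock only on kernels whose fsync expects the caller to hold it. Everything below is hypothetical and for illustration only: the model_inode and model_kernel types, the fsync_needs_caller_lock flag (in a real tree this would more likely be a configure-time macro) and the filp_fsync_model() name are assumptions, and a boolean stands in for i_mutex. This is not necessarily the fix that was landed.

```c
#include <stdbool.h>

/* User-space model only: these types mimic, but are not, kernel structures. */
struct model_inode {
    bool i_mutex_locked; /* stands in for mutex_is_locked(&inode->i_mutex) */
    int warnings;        /* counts simulated WARN_ON_ONCE() hits */
};

/* el6.6-style fsync: expects the caller to already hold i_mutex. */
static int fsync_caller_locks(struct model_inode *inode)
{
    if (!inode->i_mutex_locked)
        inode->warnings++; /* the check that fires in ext4_flush_unwritten_io() */
    return 0;
}

/* el7/sles11sp3-style fsync: takes i_mutex itself. */
static int fsync_self_locks(struct model_inode *inode)
{
    if (inode->i_mutex_locked)
        return -1; /* a real kernel would deadlock here instead of returning */
    inode->i_mutex_locked = true;
    /* ... flush would happen here ... */
    inode->i_mutex_locked = false;
    return 0;
}

/* Hypothetical per-kernel description; the flag models a build-time choice. */
struct model_kernel {
    int (*fsync)(struct model_inode *inode);
    bool fsync_needs_caller_lock;
};

/* Wrapper that locks around fsync only when the target kernel requires it. */
static int filp_fsync_model(const struct model_kernel *k,
                            struct model_inode *inode)
{
    int rc;

    if (k->fsync_needs_caller_lock)
        inode->i_mutex_locked = true;
    rc = k->fsync(inode);
    if (k->fsync_needs_caller_lock)
        inode->i_mutex_locked = false;
    return rc;
}
```

The model also shows why an unconditional lock is not an option: pairing the self-locking fsync with a caller-side lock ends in the error path that stands in for a deadlock.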
While this isn't a fatal problem and doesn't cause panics & crashes, it is a regression.
As an example, a typical instance looks like this in the syslog:
Nov 12 08:59:21 centos65-2 kernel: WARNING: at fs/ext4/inode.c:3929 ext4_flush_unwritten_io+0x74/0x80 [ext4]() (Not tainted)
Nov 12 08:59:21 centos65-2 kernel: Hardware name: VMware Virtual Platform
Nov 12 08:59:21 centos65-2 kernel: Modules linked in: jbd sha512_generic crc32c_intel libcfs(U) rfcomm sco bridge bnep l2cap autofs4 8021q garp stp llc ipv6 fuse uinput microcode vmware_balloon btusb bluetooth rfkill snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000 sg i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptspi mptscsih mptbase scsi_transport_spi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: lnet]
Nov 12 08:59:21 centos65-2 kernel: Pid: 106487, comm: lctl Not tainted 2.6.32.504.1.3.l1111 #1
Nov 12 08:59:21 centos65-2 kernel: Call Trace:
Nov 12 08:59:21 centos65-2 kernel: [<ffffffff81074df7>] ? warn_slowpath_common+0x87/0xc0
Nov 12 08:59:21 centos65-2 kernel: [<ffffffff81074e4a>] ? warn_slowpath_null+0x1a/0x20
Nov 12 08:59:21 centos65-2 kernel: [<ffffffffa00e8bb4>] ? ext4_flush_unwritten_io+0x74/0x80 [ext4]
Nov 12 08:59:21 centos65-2 kernel: [<ffffffffa00e4fc8>] ? ext4_sync_file+0x88/0x1d0 [ext4]
Nov 12 08:59:21 centos65-2 kernel: [<ffffffffa03a3cf8>] ? cfs_tracefile_dump_all_pages+0x178/0x2c0 [libcfs]
Nov 12 08:59:21 centos65-2 kernel: [<ffffffffa03a3ecb>] ? cfs_trace_dump_debug_buffer_usrstr+0x8b/0x90 [libcfs]
Nov 12 08:59:21 centos65-2 kernel: [<ffffffffa039a583>] ? __proc_dump_kernel+0x23/0x30 [libcfs]
Nov 12 08:59:21 centos65-2 kernel: [<ffffffffa0399efb>] ? lprocfs_call_handler+0x2b/0x70 [libcfs]
Nov 12 08:59:21 centos65-2 kernel: [<ffffffffa0399f95>] ? proc_dump_kernel+0x25/0x30 [libcfs]
Nov 12 08:59:21 centos65-2 kernel: [<ffffffff81204157>] ? proc_sys_call_handler+0x97/0xd0
Nov 12 08:59:21 centos65-2 kernel: [<ffffffff812041a4>] ? proc_sys_write+0x14/0x20
Nov 12 08:59:21 centos65-2 kernel: [<ffffffff8118e4a8>] ? vfs_write+0xb8/0x1a0
Nov 12 08:59:21 centos65-2 kernel: [<ffffffff8118ee71>] ? sys_write+0x51/0x90
Nov 12 08:59:21 centos65-2 kernel: [<ffffffff810e5ece>] ? __audit_syscall_exit+0x25e/0x290
Nov 12 08:59:21 centos65-2 kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Nov 12 08:59:21 centos65-2 kernel: ---[ end trace f63906e52fe1f987 ]---
Issue Links
- is duplicated by LU-6118 locking flaw generates logged errors (Resolved)