Description
Under RHEL5, we hit a kernel bug like following:
kernel BUG at fs/jbd2/commit.c:395
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:09.0/0000:04:00.0/host2/target2:0:0/2:0:0:27/timeout
CPU 9
Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) exportfs ost(U) mgc(U) ldiskfs(U) jbd2 crc16 lustre(U) lov(U) osc(U) lquota(U) mdc(U) fid(U) fld(U) ksocklnd(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) nfs nfs_acl sg autofs4 ib_srp(U) lockd sunrpc rdma_ucm(U) ib_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ib_sa(U) ipv6 xfrm_nalgo crypto_api ib_uverbs(U) ib_umad(U) iw_nes(U) mlx4_ib(U) ib_mthca(U) ib_mad(U) ib_core(U) loop dm_mirror dm_round_robin dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport mlx4_en(U) mlx4_core(U) i7core_edac tpm_tis bnx2 hpilo tpm edac_mc tpm_bios serio_raw pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache shpchp cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 13516, comm: jbd2/dm-14-8 Tainted: G -------------------- 2.6.18-348.6.1.el5_lustre #1
RIP: 0010:[<ffffffff88cadf98>] [<ffffffff88cadf98>] :jbd2:jbd2_journal_commit_transaction+0x50/0x1120
RSP: 0018:ffff810289fcbd70 EFLAGS: 00010246
RAX: 0000000000000008 RBX: ffff8105d7611800 RCX: 0000000000000246
RDX: 0000000000000008 RSI: 0000000000000286 RDI: 0000000000000000
RBP: ffff8105d7611800 R08: ffff8105d7611898 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000280 R12: ffff8105d7611824
R13: ffff8105d76118c0 R14: 0000000000000000 R15: ffff8105d7611890
FS: 0000000000000000(0000) GS:ffff81061fe5dc40(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000928d2c8 CR3: 000000061db55000 CR4: 00000000000006a0
Process jbd2/dm-14-8 (pid: 13516, threadinfo ffff810289fca000, task ffff8105ec0f27f0)
Stack: ffff8105e4eb6000 00000fdcec0f2828 0000000800000000 00000001ffffffff
0000000000000001 0000000084552f72 0000000000000000 00000000000000e4
0000000000000000 0000000000000000 0000000000000000 0000000000000001
Call Trace:
[<ffffffff8003dddd>] lock_timer_base+0x1b/0x3c
[<ffffffff8004b384>] try_to_del_timer_sync+0x7f/0x88
[<ffffffff88cb223e>] :jbd2:kjournald2+0x9a/0x1ec
[<ffffffff800a3cdf>] autoremove_wake_function+0x0/0x2e
[<ffffffff88cb21a4>] :jbd2:kjournald2+0x0/0x1ec
[<ffffffff800a3ac7>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032c4f>] kthread+0xfe/0x132
[<ffffffff8005dfc1>] child_rip+0xa/0x11
[<ffffffff800a3ac7>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032b51>] kthread+0x0/0x132
[<ffffffff8005dfb7>] child_rip+0x0/0x11
it could be reproduced by following steps will make rhel5 kernel crash.
Generating a big enough sequence id.
[root@localhost ~]# dumpe2fs sequence_image1 | grep sequence
dumpe2fs 1.42.9.wc1 (24-Feb-2014)
Journal sequence: 0xb2a5df5c
Then running following program under Ext4 filesystem:
[root@promote mnt]# cat test_fsync.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main(int argc, char **argv)
{
int ret;
int fd = open(argv[1], O_RDWR | O_CREAT);
if (fd < 0)
return 1;
ret = fdatasync(fd);
if (ret < 0)
failed:
close(fd);
return 0;
}
- gcc test_fsync.c
#./a.out new_file ---->kernel will crash
Since @i_sync_tid and @i_datasync_tid is not properly assigned, it is initialized to zero.
these value is passed into ext4_sync_file().
the problem is 0 is a special value, tid_geq() will think 0 is a later commit..this will cause problems here.
This problem did not exist for rhel6, we need backport this patch for Lustre rhel5 series.
I confirm following patch could fix the problem:
commit 688f869ce3bdc892daa993534dc6df18c95df931
Author: Theodore Ts'o <tytso@mit.edu>
Date: Wed Mar 16 17:16:31 2011 -0400
ext4: Initialize fsync transaction ids in ext4_new_inode()
When allocating a new inode, we need to make sure i_sync_tid and
i_datasync_tid are initialized. Otherwise, one or both of these two
values could be left initialized to zero, which could potentially
result in BUG_ON in jbd2_journal_commit_transaction.
(This could happen by having journal->commit_request getting set to
zero, which could wake up the kjournald process even though there is
no running transaction, which then causes a BUG_ON via the
J_ASSERT(j_ruinning_transaction != NULL) statement.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 2fd3b0e..a679a48 100644
— a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -1054,6 +1054,11 @@ got:
}
}
+ if (ext4_handle_valid(handle))
{ + ei->i_sync_tid = handle->h_transaction->t_tid; + ei->i_datasync_tid = handle->h_transaction->t_tid; + }+
err = ext4_mark_inode_dirty(handle, inode);
if (err) {
ext4_std_error(sb, err);