Details
- Bug
- Resolution: Duplicate
- Minor
- None
- Lustre 2.4.1, Lustre 2.5.0
- None
- Lustre 2.5.0-RC1, RHEL 6, OpenSFS cluster with one MDS/MGS, two OSSs with two OSTs each, one node with Lustre mounted/NFS server exporting Lustre, three Lustre clients NFS mounting the file system
- 3
- 11320
Description
When three instances of fsx write to an NFS-mounted Lustre file system exported by a single NFS server, the MDS crashes and reboots. From the MDS console:
Message from syslogd@c03 at Oct 29 13:04:27 ...
kernel:LustreError: 16448:0:(osd_internal.h:963:osd_trans_exec_op()) ASSERTION( rb < OSD_OT_MAX ) failed: rb = 12
Message from syslogd@c03 at Oct 29 13:04:27 ...
kernel:LustreError: 16448:0:(osd_internal.h:963:osd_trans_exec_op()) LBUG
From the crash dump:
crash> bt
PID: 16448 TASK: ffff880738300040 CPU: 4 COMMAND: "mdt00_005"
#0 [ffff8806565c1628] machine_kexec at ffffffff81035d6b
#1 [ffff8806565c1688] crash_kexec at ffffffff810c0e22
#2 [ffff8806565c1758] panic at ffffffff8150de5f
#3 [ffff8806565c17d8] lbug_with_loc at ffffffffa04f0eeb [libcfs]
#4 [ffff8806565c17f8] osd_punch at ffffffffa0d12243 [osd_ldiskfs]
#5 [ffff8806565c1848] llog_osd_write_blob at ffffffffa0639050 [obdclass]
#6 [ffff8806565c18b8] llog_osd_write_rec at ffffffffa063d04e [obdclass]
#7 [ffff8806565c1968] llog_write_rec at ffffffffa0610428 [obdclass]
#8 [ffff8806565c19c8] llog_cat_add_rec at ffffffffa0619299 [obdclass]
#9 [ffff8806565c1a38] llog_add at ffffffffa0610221 [obdclass]
#10 [ffff8806565c1a88] mdd_changelog_store at ffffffffa0f54463 [mdd]
#11 [ffff8806565c1af8] mdd_changelog_data_store at ffffffffa0f465c6 [mdd]
#12 [ffff8806565c1b48] mdd_attr_set at ffffffffa0f4c9b1 [mdd]
#13 [ffff8806565c1bc8] mdt_attr_set at ffffffffa0e1cc28 [mdt]
#14 [ffff8806565c1c18] mdt_reint_setattr at ffffffffa0e1d4dd [mdt]
#15 [ffff8806565c1c88] mdt_reint_rec at ffffffffa0e16eb1 [mdt]
#16 [ffff8806565c1ca8] mdt_reint_internal at ffffffffa0dfec93 [mdt]
#17 [ffff8806565c1ce8] mdt_reint at ffffffffa0dfef94 [mdt]
#18 [ffff8806565c1d08] mdt_handle_common at ffffffffa0e01a8a [mdt]
#19 [ffff8806565c1d58] mds_regular_handle at ffffffffa0e3bc55 [mdt]
#20 [ffff8806565c1d68] ptlrpc_server_handle_request at ffffffffa07f2e25 [ptlrpc]
#21 [ffff8806565c1e48] ptlrpc_main at ffffffffa07f418d [ptlrpc]
#22 [ffff8806565c1ee8] kthread at ffffffff81096a36
#23 [ffff8806565c1f48] kernel_thread at ffffffff8100c0ca
On the OSTs, osd-ldiskfs.track_declares_assert=1
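For anyone trying to reproduce this, the setting above could be applied and checked with lctl roughly as follows. This is only a sketch: the parameter name comes from this report, while the use of lctl set_param/get_param on the server node, the PARAM variable, and the fallback message are assumptions about the test setup, not something recorded in the ticket.

```shell
#!/bin/sh
# Sketch only: enable declare tracking assertions on an ldiskfs-backed server.
# The parameter name is taken from the report; running it through lctl as root
# on each server node is an assumption about how the setting was applied.
PARAM="osd-ldiskfs.track_declares_assert"

if command -v lctl >/dev/null 2>&1; then
    # Turn the assertion on, then read the value back to confirm it.
    lctl set_param "${PARAM}=1"
    lctl get_param "${PARAM}"
else
    # Not a Lustre node (or lctl is not in PATH): nothing is changed.
    echo "lctl not found; run this on a Lustre server node" >&2
fi
```

The setting is not persistent in this form, so it would need to be reapplied after the server reboots.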
From the crash dump, here is the system information:
crash> sys
KERNEL: /home/build/kernel/rpmbuild/BUILD/kernel-2.6.32.358.18.1.el6_lustre/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2013-10-29-13:05:16/vmcore [PARTIAL DUMP]
CPUS: 24
DATE: Tue Oct 29 13:04:30 2013
UPTIME: 3 days, 22:31:58
LOAD AVERAGE: 0.17, 0.06, 0.02
TASKS: 362
NODENAME: c03
RELEASE: 2.6.32-358.18.1.el6_lustre.x86_64
VERSION: #1 SMP Fri Oct 11 16:41:53 PDT 2013
MACHINE: x86_64 (2399 Mhz)
MEMORY: 32 GB
PANIC: "Kernel panic - not syncing: LBUG"
Two of the three fsx commands were:
./fsx -c 5 /misc/export/nfs_x_1
./fsx /misc/export/nfs_x_2