Details
- Type: Bug
- Resolution: Fixed
- Priority: Major
- Affects Version/s: Lustre 2.8.0
- Labels: None
- Environment: RHEL6.7 servers running the latest Lustre 2.8, with the latest Lustre 2.8 clients.
- Severity: 3
Description
In our first attempt to test Lustre 2.8 servers at scale, the MDS crashed at bring-up. The backtrace is:
<2>[ 8011.527071] kernel BUG at fs/jbd2/transaction.c:1028!
<4>[ 8011.532938] invalid opcode: 0000 [#1] SMP
<4>[ 8011.537761] last sysfs file: /sys/devices/virtual/block/dm-7/uevent
<4>[ 8011.544986] CPU 5
<4>[ 8011.547040] Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgc(U) osd_ldiskfs(U) ldiskfs(U) mbcache jbd2 lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ptlrpc(U) obdclass(U) ko2iblnd(U) lnet(U) sha512_generic crc32c_intel libcfs(U) mpt2sas mptctl mptbase dell_rbu autofs4 8021q garp stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_REJECT xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_sa ib_mad ib_core ib_addr dm_mirror dm_region_hash dm_log dm_round_robin scsi_dh_rdac dm_multipath dm_mod sg ipmi_devintf sd_mod crc_t10dif iTCO_wdt iTCO_vendor_support microcode wmi power_meter acpi_ipmi ipmi_si ipmi_msghandler dcdbas mpt3sas scsi_transport_sas raid_class shpchp sb_edac edac_core lpc_ich mfd_core ahci ipv6 nfs lockd fscache auth_rpcgss nfs_acl sunrpc tg3 mlx4_en ptp pps_core mlx4_core [last unloaded: scsi_wait_scan]
<4>[ 8011.646238]
<4>[ 8011.648140] Pid: 30211, comm: mdt01_503 Not tainted 2.6.32-573.26.1.el6.dne2.x86_64 #1 Dell Inc. PowerEdge R630/0CNCJW
<4>[ 8011.660578] RIP: 0010:[<ffffffffa0c9d92d>] [<ffffffffa0c9d92d>] jbd2_journal_dirty_metadata+0x10d/0x150 [jbd2]
<4>[ 8011.672316] RSP: 0018:ffff883e6751f580 EFLAGS: 00010246
<4>[ 8011.678473] RAX: ffff883e3a8be380 RBX: ffff883ce650e3d8 RCX: ffff883e9a7656e0
<4>[ 8011.686676] RDX: 0000000000000000 RSI: ffff883e9a7656e0 RDI: ffff883ce650e3d8
<4>[ 8011.694873] RBP: ffff883e6751f5a0 R08: c010000000000000 R09: 0000000000000000
<4>[ 8011.703072] R10: 0000000000000001 R11: 0000000000000008 R12: ffff883df92c8ba8
<4>[ 8011.716541] R13: ffff883e9a7656e0 R14: ffff884023ca1800 R15: 0000000000000058
<4>[ 8011.724738] FS: 0000000000000000(0000) GS:ffff88018fca0000(0000) knlGS:0000000000000000
<4>[ 8011.734220] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>[ 8011.740863] CR2: 00007fc59a79b380 CR3: 0000000001a8d000 CR4: 00000000001407e0
<4>[ 8011.749059] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[ 8011.757263] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[ 8011.765460] Process mdt01_503 (pid: 30211, threadinfo ffff883e6751c000, task ffff883e6750c040)
<4>[ 8011.775518] Stack:
<4>[ 8011.777983] ffff883ce650e3d8 ffffffffa0d827b0 ffff883e9a7656e0 0000000000000000
<4>[ 8011.786313] <d> ffff883e6751f5e0 ffffffffa0cc0fab ffff883e6751f5e0 ffffffffa0ccf968
<4>[ 8011.795363] <d> 0000000000000000 0000000000000058 0000000000000058 ffff883e9a7656e0
<4>[ 8011.804839] Call Trace:
<4>[ 8011.807798] [<ffffffffa0cc0fab>] __ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
<4>[ 8011.817097] [<ffffffffa0ccf968>] ? ldiskfs_bread+0x18/0x80 [ldiskfs]
<4>[ 8011.824517] [<ffffffffa0d6635c>] osd_ldiskfs_write_record+0xec/0x340 [osd_ldiskfs]
<4>[ 8011.833519] [<ffffffffa0d690a3>] osd_write+0x183/0x5b0 [osd_ldiskfs]
<4>[ 8011.840965] [<ffffffffa05f8b4d>] dt_record_write+0x3d/0x130 [obdclass]
<4>[ 8011.848680] [<ffffffffa05bb44f>] llog_osd_write_rec+0xb6f/0x1ad0 [obdclass]
<4>[ 8011.856787] [<ffffffffa0d7ba0b>] ? dynlock_unlock+0x16b/0x1d0 [osd_ldiskfs]
<4>[ 8011.864894] [<ffffffffa05a9426>] llog_write_rec+0xb6/0x270 [obdclass]
<4>[ 8011.872419] [<ffffffffa05b2893>] llog_cat_add_rec+0x1c3/0x7b0 [obdclass]
<4>[ 8011.880234] [<ffffffffa05a9239>] llog_add+0x89/0x1c0 [obdclass]
<4>[ 8011.887176] [<ffffffffa108641d>] osp_sync_add_rec+0x26d/0x9b0 [osp]
<4>[ 8011.894501] [<ffffffffa1086c07>] osp_sync_add+0x77/0x80 [osp]
<4>[ 8011.901248] [<ffffffffa0fb502e>] ? lod_sub_get_thandle+0x24e/0x3c0 [lod]
<4>[ 8011.909060] [<ffffffffa1077823>] osp_object_destroy+0x173/0x230 [osp]
<4>[ 8011.916582] [<ffffffffa0fb85ad>] lod_sub_object_destroy+0x1fd/0x440 [lod]
It is suspected that concurrent multiple llog append operations, such as concurrently unlinking large striped (large EA) files, can fill up an llog file in a very short time. Because the llog may be shared at the same time by many concurrent modifications, in some extreme cases an unlink of a large striped file may not have declared the creation of a new llog file. Then, during the real llog append that records the striped OST-objects, it finds that the current llog is already full and a new llog file must be created; since the related credits were never declared, the lower layer can hit trouble.
I am not sure whether this is exactly the case. But since the issue is reproducible at the customer site, I will make a patch for verification.
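To illustrate the suspected failure mode, below is a minimal user-space model of the "declare then execute" journal credit accounting; it is not the real jbd2/ldiskfs code, and the names (journal_start, journal_dirty_metadata, buffer_credits) are simplified stand-ins. The point is only that if the execute phase dirties more metadata blocks than were declared, for example because the llog unexpectedly filled up and a new llog file must be created, the equivalent of the BUG at fs/jbd2/transaction.c:1028 fires:

/*
 * Simplified sketch (assumption, not the actual kernel code): each
 * transaction handle is started with a fixed number of buffer credits,
 * and every dirtied metadata block consumes one.
 */
#include <assert.h>
#include <stdio.h>

struct handle {
	int buffer_credits;	/* credits left on this transaction handle */
};

/* declare phase: reserve credits for the blocks we expect to modify */
static void journal_start(struct handle *h, int declared_blocks)
{
	h->buffer_credits = declared_blocks;
}

/* execute phase: each dirtied metadata block consumes one credit */
static void journal_dirty_metadata(struct handle *h)
{
	assert(h->buffer_credits > 0);	/* analogue of transaction.c:1028 */
	h->buffer_credits--;
}

int main(void)
{
	struct handle h;

	/* the unlink declared credits only for appending one llog record */
	journal_start(&h, 1);
	journal_dirty_metadata(&h);	/* the append itself: fine */

	/* but the llog turned out to be full, so extra blocks are dirtied
	 * for the new llog file that was never declared */
	journal_dirty_metadata(&h);	/* credits exhausted -> assert fires */

	printf("not reached\n");
	return 0;
}

If this model matches what happens on the MDS, the fix would be to declare credits for possible llog file creation in the declare phase as well.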