Loading...

Type: Bug
Resolution: Fixed
Priority: Major
Fix Version/s: Lustre 2.9.0
Affects Version/s: Lustre 2.8.0
Labels:
None
Environment:
RHEL6.7 servers running latest lustre 2.8 using latest lustre 2.8 clients.

Severity:
3
Rank (Obsolete):
9223372036854775807

In our first attempt to test lustre 2.8 servers at scale the MDS crashed at bring up. The backtrace is:

<2>[ 8011.527071] kernel BUG at fs/jbd2/transaction.c:1028!
<4>[ 8011.532938] invalid opcode: 0000 1 SMP
<4>[ 8011.537761] last sysfs file: /sys/devices/virtual/block/dm-7/uevent
<4>[ 8011.544986] CPU 5
<4>[ 8011.547040] Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgc(U) osd_ldiskfs(U) ldiskfs(U) mbcache jbd2 lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ptlrpc(U) obdclass(U) ko2iblnd(U) lnet(U) sha512_generic crc32c_intel libcfs(U) mpt2sas mptctl mptbase dell_rbu autofs4 8021q garp stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_REJECT xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_sa ib_mad ib_core ib_addr dm_mirror dm_region_hash dm_log dm_round_robin scsi_dh_rdac dm_multipath dm_mod sg ipmi_devintf sd_mod crc_t10dif iTCO_wdt iTCO_vendor_support microcode wmi power_meter acpi_ipmi ipmi_si ipmi_msghandler dcdbas mpt3sas scsi_transport_sas raid_class shpchp sb_edac edac_core lpc_ich mfd_core ahci ipv6 nfs lockd fscache auth_rpcgss nfs_acl sunrpc tg3 mlx4_en ptp pps_core mlx4_core [last unloaded: scsi_wait_scan]
<4>[ 8011.646238]
<4>[ 8011.648140] Pid: 30211, comm: mdt01_503 Not tainted 2.6.32-573.26.1.el6.dne2.x86_64 #1 Dell Inc. PowerEdge R630/0CNCJW
<4>[ 8011.660578] RIP: 0010:[<ffffffffa0c9d92d>] [<ffffffffa0c9d92d>] jbd2_journal_dirty_metadata+0x10d/0x150 [jbd2]
<4>[ 8011.672316] RSP: 0018:ffff883e6751f580 EFLAGS: 00010246
<4>[ 8011.678473] RAX: ffff883e3a8be380 RBX: ffff883ce650e3d8 RCX: ffff883e9a7656e0
<4>[ 8011.686676] RDX: 0000000000000000 RSI: ffff883e9a7656e0 RDI: ffff883ce650e3d8
<4>[ 8011.694873] RBP: ffff883e6751f5a0 R08: c010000000000000 R09: 0000000000000000
<4>[ 8011.703072] R10: 0000000000000001 R11: 0000000000000008 R12: ffff883df92c8ba8
<4>[ 8011.716541] R13: ffff883e9a7656e0 R14: ffff884023ca1800 R15: 0000000000000058
<4>[ 8011.724738] FS: 0000000000000000(0000) GS:ffff88018fca0000(0000) knlGS:0000000000000000
<4>[ 8011.734220] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>[ 8011.740863] CR2: 00007fc59a79b380 CR3: 0000000001a8d000 CR4: 00000000001407e0
<4>[ 8011.749059] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[ 8011.757263] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[ 8011.765460] Process mdt01_503 (pid: 30211, threadinfo ffff883e6751c000, task ffff883e6750c040)
<4>[ 8011.775518] Stack:
<4>[ 8011.777983] ffff883ce650e3d8 ffffffffa0d827b0 ffff883e9a7656e0 0000000000000000
<4>[ 8011.786313] <d> ffff883e6751f5e0 ffffffffa0cc0fab ffff883e6751f5e0 ffffffffa0ccf968
<4>[ 8011.795363] <d> 0000000000000000 0000000000000058 0000000000000058 ffff883e9a7656e0
<4>[ 8011.804839] Call Trace:
<4>[ 8011.807798] [<ffffffffa0cc0fab>] __ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
<4>[ 8011.817097] [<ffffffffa0ccf968>] ? ldiskfs_bread+0x18/0x80 [ldiskfs]
<4>[ 8011.824517] [<ffffffffa0d6635c>] osd_ldiskfs_write_record+0xec/0x340 [osd_ldiskfs]
<4>[ 8011.833519] [<ffffffffa0d690a3>] osd_write+0x183/0x5b0 [osd_ldiskfs]
<4>[ 8011.840965] [<ffffffffa05f8b4d>] dt_record_write+0x3d/0x130 [obdclass]
<4>[ 8011.848680] [<ffffffffa05bb44f>] llog_osd_write_rec+0xb6f/0x1ad0 [obdclass]
<4>[ 8011.856787] [<ffffffffa0d7ba0b>] ? dynlock_unlock+0x16b/0x1d0 [osd_ldiskfs]
<4>[ 8011.864894] [<ffffffffa05a9426>] llog_write_rec+0xb6/0x270 [obdclass]
<4>[ 8011.872419] [<ffffffffa05b2893>] llog_cat_add_rec+0x1c3/0x7b0 [obdclass]
<4>[ 8011.880234] [<ffffffffa05a9239>] llog_add+0x89/0x1c0 [obdclass]
<4>[ 8011.887176] [<ffffffffa108641d>] osp_sync_add_rec+0x26d/0x9b0 [osp]
<4>[ 8011.894501] [<ffffffffa1086c07>] osp_sync_add+0x77/0x80 [osp]
<4>[ 8011.901248] [<ffffffffa0fb502e>] ? lod_sub_get_thandle+0x24e/0x3c0 [lod]
<4>[ 8011.909060] [<ffffffffa1077823>] osp_object_destroy+0x173/0x230 [osp]
<4>[ 8011.916582] [<ffffffffa0fb85ad>] lod_sub_object_destroy+0x1fd/0x440 [lod]

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List
  - Download All

dmesg.txt
23/Aug/16 4:32 PM
126 kB
James A Simmons

is related to

LU-3102 kernel BUG at fs/jbd2/transaction.c:1033

Resolved

LU-8267 sanity test_129: (osd_internal.h:1137:osd_trans_exec_check()) LBUG

Resolved

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates