Details
- Type: Bug
- Resolution: Fixed
- Priority: Major
- Affects Version/s: Lustre 2.8.0
- Labels: None
- Environment: RHEL6.7 servers running the latest Lustre 2.8, with the latest Lustre 2.8 clients.
- Severity: 3
Description
In our first attempt to test Lustre 2.8 servers at scale, the MDS crashed at bring-up. The backtrace is:
<2>[ 8011.527071] kernel BUG at fs/jbd2/transaction.c:1028!
<4>[ 8011.532938] invalid opcode: 0000 [#1] SMP
<4>[ 8011.537761] last sysfs file: /sys/devices/virtual/block/dm-7/uevent
<4>[ 8011.544986] CPU 5
<4>[ 8011.547040] Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgc(U) osd_ldiskfs(U) ldiskfs(U) mbcache jbd2 lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ptlrpc(U) obdclass(U) ko2iblnd(U) lnet(U) sha512_generic crc32c_intel libcfs(U) mpt2sas mptctl mptbase dell_rbu autofs4 8021q garp stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_REJECT xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_sa ib_mad ib_core ib_addr dm_mirror dm_region_hash dm_log dm_round_robin scsi_dh_rdac dm_multipath dm_mod sg ipmi_devintf sd_mod crc_t10dif iTCO_wdt iTCO_vendor_support microcode wmi power_meter acpi_ipmi ipmi_si ipmi_msghandler dcdbas mpt3sas scsi_transport_sas raid_class shpchp sb_edac edac_core lpc_ich mfd_core ahci ipv6 nfs lockd fscache auth_rpcgss nfs_acl sunrpc tg3 mlx4_en ptp pps_core mlx4_core [last unloaded: scsi_wait_scan]
<4>[ 8011.646238]
<4>[ 8011.648140] Pid: 30211, comm: mdt01_503 Not tainted 2.6.32-573.26.1.el6.dne2.x86_64 #1 Dell Inc. PowerEdge R630/0CNCJW
<4>[ 8011.660578] RIP: 0010:[<ffffffffa0c9d92d>] [<ffffffffa0c9d92d>] jbd2_journal_dirty_metadata+0x10d/0x150 [jbd2]
<4>[ 8011.672316] RSP: 0018:ffff883e6751f580 EFLAGS: 00010246
<4>[ 8011.678473] RAX: ffff883e3a8be380 RBX: ffff883ce650e3d8 RCX: ffff883e9a7656e0
<4>[ 8011.686676] RDX: 0000000000000000 RSI: ffff883e9a7656e0 RDI: ffff883ce650e3d8
<4>[ 8011.694873] RBP: ffff883e6751f5a0 R08: c010000000000000 R09: 0000000000000000
<4>[ 8011.703072] R10: 0000000000000001 R11: 0000000000000008 R12: ffff883df92c8ba8
<4>[ 8011.716541] R13: ffff883e9a7656e0 R14: ffff884023ca1800 R15: 0000000000000058
<4>[ 8011.724738] FS: 0000000000000000(0000) GS:ffff88018fca0000(0000) knlGS:0000000000000000
<4>[ 8011.734220] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>[ 8011.740863] CR2: 00007fc59a79b380 CR3: 0000000001a8d000 CR4: 00000000001407e0
<4>[ 8011.749059] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[ 8011.757263] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[ 8011.765460] Process mdt01_503 (pid: 30211, threadinfo ffff883e6751c000, task ffff883e6750c040)
<4>[ 8011.775518] Stack:
<4>[ 8011.777983] ffff883ce650e3d8 ffffffffa0d827b0 ffff883e9a7656e0 0000000000000000
<4>[ 8011.786313] <d> ffff883e6751f5e0 ffffffffa0cc0fab ffff883e6751f5e0 ffffffffa0ccf968
<4>[ 8011.795363] <d> 0000000000000000 0000000000000058 0000000000000058 ffff883e9a7656e0
<4>[ 8011.804839] Call Trace:
<4>[ 8011.807798] [<ffffffffa0cc0fab>] __ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
<4>[ 8011.817097] [<ffffffffa0ccf968>] ? ldiskfs_bread+0x18/0x80 [ldiskfs]
<4>[ 8011.824517] [<ffffffffa0d6635c>] osd_ldiskfs_write_record+0xec/0x340 [osd_ldiskfs]
<4>[ 8011.833519] [<ffffffffa0d690a3>] osd_write+0x183/0x5b0 [osd_ldiskfs]
<4>[ 8011.840965] [<ffffffffa05f8b4d>] dt_record_write+0x3d/0x130 [obdclass]
<4>[ 8011.848680] [<ffffffffa05bb44f>] llog_osd_write_rec+0xb6f/0x1ad0 [obdclass]
<4>[ 8011.856787] [<ffffffffa0d7ba0b>] ? dynlock_unlock+0x16b/0x1d0 [osd_ldiskfs]
<4>[ 8011.864894] [<ffffffffa05a9426>] llog_write_rec+0xb6/0x270 [obdclass]
<4>[ 8011.872419] [<ffffffffa05b2893>] llog_cat_add_rec+0x1c3/0x7b0 [obdclass]
<4>[ 8011.880234] [<ffffffffa05a9239>] llog_add+0x89/0x1c0 [obdclass]
<4>[ 8011.887176] [<ffffffffa108641d>] osp_sync_add_rec+0x26d/0x9b0 [osp]
<4>[ 8011.894501] [<ffffffffa1086c07>] osp_sync_add+0x77/0x80 [osp]
<4>[ 8011.901248] [<ffffffffa0fb502e>] ? lod_sub_get_thandle+0x24e/0x3c0 [lod]
<4>[ 8011.909060] [<ffffffffa1077823>] osp_object_destroy+0x173/0x230 [osp]
<4>[ 8011.916582] [<ffffffffa0fb85ad>] lod_sub_object_destroy+0x1fd/0x440 [lod]
It is suspected that concurrent multiple llog append operations, such as concurrently unlinking large striped (large EA) files, can fill up an llog file in a very short time. Because the llog may be shared at the same time by many concurrent modifications, in some extreme cases an unlink of a large striped file may not have declared the creation of a new llog file. Then, during the real llog append that records the striped OST-objects, it finds that the current llog is already full and a new llog file must be created; since the related credits were never declared, the lower layer can hit trouble.
I am not sure whether this is exactly the case. But since the issue is reproducible at the customer site, I will make a patch for verification.
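To illustrate the suspected failure mode, below is a minimal user-space model of the "declare then execute" journal credit accounting; it is not the real jbd2/ldiskfs code, and the names (journal_start, journal_dirty_metadata, buffer_credits) are simplified stand-ins. The point is only that if the execute phase dirties more metadata blocks than were declared, for example because the llog unexpectedly filled up and a new llog file must be created, the equivalent of the BUG at fs/jbd2/transaction.c:1028 fires:

/*
 * Simplified sketch (assumption, not the actual kernel code): each
 * transaction handle is started with a fixed number of buffer credits,
 * and every dirtied metadata block consumes one.
 */
#include <assert.h>
#include <stdio.h>

struct handle {
	int buffer_credits;	/* credits left on this transaction handle */
};

/* declare phase: reserve credits for the blocks we expect to modify */
static void journal_start(struct handle *h, int declared_blocks)
{
	h->buffer_credits = declared_blocks;
}

/* execute phase: each dirtied metadata block consumes one credit */
static void journal_dirty_metadata(struct handle *h)
{
	assert(h->buffer_credits > 0);	/* analogue of transaction.c:1028 */
	h->buffer_credits--;
}

int main(void)
{
	struct handle h;

	/* the unlink declared credits only for appending one llog record */
	journal_start(&h, 1);
	journal_dirty_metadata(&h);	/* the append itself: fine */

	/* but the llog turned out to be full, so extra blocks are dirtied
	 * for the new llog file that was never declared */
	journal_dirty_metadata(&h);	/* credits exhausted -> assert fires */

	printf("not reached\n");
	return 0;
}

If this model matches what happens on the MDS, the fix would be to declare credits for possible llog file creation in the declare phase as well.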