[LU-9848] LBUG: ASSERTION( len >= (24) && (len & 0x7) == 0 ) failed Created: 08/Aug/17  Updated: 18/Sep/17  Resolved: 28/Aug/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.1, Lustre 2.11.0

Type: Bug Priority: Major
Reporter: Olaf Faaland Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: llnl
Environment:

Linux version 3.10.0-514.26.2.1chaos.ch6_1.x86_64
Lustre 2.8.0_9.chaos
ZFS v0.7.0-8_g0f976d6


Issue Links:
Related
is related to LU-8422 llog_osd.c:165:llog_osd_pad()) ASSERT... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
LustreError: 43470:0:(llog_osd.c:165:llog_osd_pad()) ASSERTION( len >= (24) && (len & 0x7) == 0 ) failed:
LustreError: 43470:0:(llog_osd.c:165:llog_osd_pad()) LBUG
Pid: 43470, comm: mdt00_081

Kernel panic - not syncing: LBUG
CPU: 2 PID: 43470 Comm: mdt00_081 Tainted: P           OE  ------------   3.10.0-514.26.2.1chaos.ch6_1.x86_64 #1
Hardware name: Intel Corporation S2600WTTR/S2600WTTR, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
 ffffffffa0c50e0f 000000005a5d14f4 ffff887dd2457700 ffffffff8169d4bc
 ffff887dd2457780 ffffffff816966ff ffffffff00000008 ffff887dd2457790
 ffff887dd2457730 000000005a5d14f4 ffffffffa0d861e7 0000000000000246
Call Trace:
 [<ffffffff8169d4bc>] dump_stack+0x19/0x1b
 [<ffffffff816966ff>] panic+0xe3/0x1f2
 [<ffffffff810a6ac0>] ? call_usermodehelper_freeinfo+0x20/0x30
 [<ffffffffa0c34deb>] lbug_with_loc+0xab/0xc0 [libcfs]
 [<ffffffffa0d1727a>] llog_osd_pad+0x3ca/0x440 [obdclass]
 [<ffffffffa0d19967>] llog_osd_write_rec+0xe87/0x14d0 [obdclass]
 [<ffffffffa0d0b8da>] llog_write_rec+0xaa/0x280 [obdclass]
 [<ffffffffa0d100c0>] llog_cat_add_rec+0x210/0x8e0 [obdclass]
 [<ffffffffa0d08a3a>] llog_add+0x7a/0x1a0 [obdclass]
 [<ffffffffa1029f7c>] ? sub_updates_write+0x7f6/0xef8 [ptlrpc]
 [<ffffffffa102a373>] sub_updates_write+0xbed/0xef8 [ptlrpc]
 [<ffffffffa101899f>] top_trans_stop+0x62f/0x970 [ptlrpc]
 [<ffffffffa134c399>] lod_trans_stop+0x259/0x340 [lod]
 [<ffffffffa13c0c32>] ? mdd_links_rename+0x312/0x5d0 [mdd]
 [<ffffffffa13daafd>] mdd_trans_stop+0x1d/0x25 [mdd]
 [<ffffffffa13c5c18>] mdd_link+0x2e8/0x930 [mdd]
 [<ffffffffa0fa42d2>] ? lustre_msg_get_versions+0x22/0xf0 [ptlrpc]
 [<ffffffffa1296b6e>] mdt_reint_link+0xade/0xc30 [mdt]
 [<ffffffff8132f4d2>] ? strlcpy+0x42/0x60
 [<ffffffffa1298ef0>] mdt_reint_rec+0x80/0x210 [mdt]
 [<ffffffffa127bdf1>] mdt_reint_internal+0x5e1/0x990 [mdt]
 [<ffffffffa1285a07>] mdt_reint+0x67/0x140 [mdt]
 [<ffffffffa1004425>] tgt_request_handle+0x915/0x1320 [ptlrpc]
 [<ffffffffa0fb0e7b>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
 [<ffffffffa0c41748>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
 [<ffffffffa0faea4b>] ? ptlrpc_wait_event+0xab/0x350 [ptlrpc]
 [<ffffffff810c8aa2>] ? default_wake_function+0x12/0x20
 [<ffffffff810bdb18>] ? __wake_up_common+0x58/0x90
 [<ffffffffa0fb4f20>] ptlrpc_main+0xa90/0x1db0 [ptlrpc]
 [<ffffffff8102a569>] ? __switch_to+0xd9/0x4e0
 [<ffffffffa0fb4490>] ? ptlrpc_register_service+0xe40/0xe40 [ptlrpc]
 [<ffffffff810b3a0f>] kthread+0xcf/0xe0

Appears to be a regression of LU-8422.



 Comments   
Comment by Olaf Faaland [ 09/Aug/17 ]

Intel has access to the patch stack we are running. See the lustre-release-fe-llnl project in gerrit.

Comment by Giuseppe Di Natale (Inactive) [ 10/Aug/17 ]

I'm able to reproduce this with a sizable simul run that only runs the symlink tests. Something similar to the following hits the issue within minutes on our testbed:

srun -N 72 -n $((72*36)) ./simul -d /p/lquake/dinatale/SIMUL -V 1 -n 20 -i "16,36,38,39,12,18,19,32"

Comment by Peter Jones [ 10/Aug/17 ]

Lai

Can you please advise?

Thanks

Peter

Comment by Olaf Faaland [ 10/Aug/17 ]

Lai,

We found that the directory used by the test, /p/lquake/dinatale/SIMUL/, is a striped directory (DNE2). We only run DNE1 in production, which may explain why we suddenly started encountering the error. We have no plans at this time to use DNE2 in production, so this may not need attention at all.

We will re-test with DNE1 directories and update this ticket.

Comment by Giuseppe Di Natale (Inactive) [ 11/Aug/17 ]

I ran the test against DNE1 directories and we do not hit the assertion. We can leave this ticket open, since I believe the assert is still a problem, but we won't hit it in production.

Comment by Peter Jones [ 11/Aug/17 ]

It would be interesting to know whether this issue still occurs on the latest master (or at least 2.10.x).

Comment by Giuseppe Di Natale (Inactive) [ 14/Aug/17 ]

Peter, are you able to run those tests on your test hardware?

Comment by Gerrit Updater [ 15/Aug/17 ]

Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/28554
Subject: LU-9848 llog: check padding size for update reclen
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 254ac377c0a0dff333cd12bdc89a128a67648968

Comment by Lai Siyao [ 15/Aug/17 ]

This is a duplicate of LU-8422, but I think the fix https://review.whamcloud.com/#/c/21509/ for LU-8422 is not quite right, so I committed a new fix here.

Comment by Gerrit Updater [ 28/Aug/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28554/
Subject: LU-9848 llog: check padding size for update reclen
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 6705d6cc40a83b0e94668d651c444c855482bd01

Comment by Peter Jones [ 28/Aug/17 ]

Landed for 2.11

Comment by Gerrit Updater [ 28/Aug/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28762
Subject: LU-9848 llog: check padding size for update reclen
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 8d1a5d38b16ecde701efc6b0807291d9f76268e7

Comment by Gerrit Updater [ 06/Sep/17 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28762/
Subject: LU-9848 llog: check padding size for update reclen
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 18582ce8e6fbfd864bafbb7f92246f7a21f1681a
