[LU-13615] mdt_big_xattr_get()) ASSERTION( info->mti_big_lmm_used == 0 ) failed Created: 31/May/20  Updated: 04/Oct/22  Resolved: 16/Jan/22

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Mikhail Pershin Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: llnl

Issue Links:
Related
is related to LU-13599 LustreError: 30166:0:(service.c:189:p... Resolved
is related to LU-16206 PCC crashes MDS: mdt_big_xattr_get())... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Recent runs showed the assertion below:

LustreError: 25386:0:(mdt_handler.c:950:mdt_big_xattr_get()) ASSERTION( info->mti_big_lmm_used == 0 ) failed:
LustreError: 25386:0:(mdt_handler.c:950:mdt_big_xattr_get()) LBUG
Pid: 25386, comm: mdt00_003 3.10.0-1062.18.1.el7_lustre.x86_64 #1 SMP Wed May 27 23:19:17 UTC 2020
Call Trace:
[<ffffffffc08bf1dc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[<ffffffffc08bf28c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[<ffffffffc11643a7>] mdt_big_xattr_get+0x667/0x830 [mdt]
[<ffffffffc11647a7>] mdt_stripe_get+0x237/0x410 [mdt]
[<ffffffffc118a92b>] mdt_reint_migrate+0x116b/0x16c0 [mdt]
[<ffffffffc118af03>] mdt_reint_rec+0x83/0x210 [mdt]
[<ffffffffc1163970>] mdt_reint_internal+0x710/0xae0 [mdt]
[<ffffffffc116ed47>] mdt_reint+0x67/0x140 [mdt]
[<ffffffffc0e3ac0a>] tgt_request_handle+0x96a/0x1640 [ptlrpc]
[<ffffffffc0ddc216>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[<ffffffffc0de0744>] ptlrpc_main+0xbb4/0x1550 [ptlrpc]
[<ffffffff998c6321>] kthread+0xd1/0xe0
[<ffffffff99f8ed37>] ret_from_fork_nospec_end+0x0/0x39
[<ffffffffffffffff>] 0xffffffffffffffff

This is the result of inappropriate use of the mti_big_lmm buffer in various places. It was originally introduced for fetching big LOV/LMV EAs and passing them to reply buffers, but it is now also widely used for internal server needs. These two cases should be distinguished: if there is no intention to return the EA in the reply, the flag mti_big_lmm_used should not be set. It may be worth renaming the flag to mti_big_lmm_keep to mark that the buffer is to be kept until the reply is packed.
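
The distinction proposed above can be illustrated with a small user-space sketch. This is not the Lustre code and not the patch under review: the struct, the function names with the _sketch suffix, and the keep_for_reply parameter are all hypothetical, and the real struct mdt_thread_info and mdt_big_xattr_get() differ. The sketch only assumes the idea stated in the description, namely that the flag is set solely when the buffer must survive until the reply is packed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the per-thread MDT state;
 * the real struct mdt_thread_info has many more fields. */
struct thread_info_sketch {
	void   *big_lmm;       /* shared per-thread buffer for large EAs */
	size_t  big_lmm_size;  /* allocated size of big_lmm */
	bool    big_lmm_keep;  /* true only while the reply still needs the buffer */
};

/* Copy an EA into the shared buffer, growing it if needed.
 * keep_for_reply is an assumed parameter: callers that only need the EA
 * for internal processing pass false, so the flag stays clear.
 * Returns 0 on success, -1 if the buffer could not be grown. */
static int big_xattr_get_sketch(struct thread_info_sketch *info,
				const void *ea, size_t ea_size,
				bool keep_for_reply)
{
	/* The buffer must not be overwritten while a reply depends on it. */
	assert(!info->big_lmm_keep);

	if (info->big_lmm_size < ea_size) {
		void *buf = realloc(info->big_lmm, ea_size);

		if (buf == NULL)
			return -1;
		info->big_lmm = buf;
		info->big_lmm_size = ea_size;
	}
	if (ea_size > 0)
		memcpy(info->big_lmm, ea, ea_size);

	/* Mark the buffer as reserved only when it feeds the reply. */
	info->big_lmm_keep = keep_for_reply;
	return 0;
}

/* Release the reservation once the reply has been packed. */
static void reply_packed_sketch(struct thread_info_sketch *info)
{
	info->big_lmm_keep = false;
}
```

In this sketch an internal caller can fetch an EA with keep_for_reply set to false any number of times without tripping the assertion, while a caller that returns the EA to the client sets it to true and the reservation is cleared by reply_packed_sketch().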



 Comments   
Comment by Sebastien Buisson [ 02/Jun/20 ]

+1 while running sanity-lfsck test_40a:
https://testing.whamcloud.com/test_sets/a41a98c7-e637-4c82-8320-3a8d1e646f04

Comment by Olaf Faaland [ 29/Jul/20 ]

Hit this migrating files on our test system, jet (server), with 
kernel-3.10.0-1127.13.1.1chaos.ch6.x86_64
zfs-0.7.11-9.4llnl.ch6.x86_64
lustre-2.12.5_2.chaos-1.ch6.x86_64

Comment by Mikhail Pershin [ 30/Jul/20 ]

The current fix for this problem on 2.12 is here:
https://review.whamcloud.com/39521

It is not the final solution, but it fixes the incorrect assertion.

Comment by Olaf Faaland [ 15/Sep/20 ]

Mike,
Is the assertion fix likely all that is needed for Lustre 2.12, and are the fixes to the usage of mti_big_lmm appropriate only for master?
thanks

Comment by Mikhail Pershin [ 16/Sep/20 ]

Olaf, yes, all further changes are improvements intended for master.
