[LU-7558] niobuf.c:721:ptl_send_rpc() LASSERT(AT_OFF || imp_state != LUSTRE_IMP_FULL || imp_msghdr_flags & MSGHDR_AT_SUPPORT ...) Created: 15/Dec/15  Updated: 08/Apr/19  Resolved: 16/Jan/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: Lustre 2.9.0, Lustre 2.13.0, Lustre 2.10.7, Lustre 2.12.1

Type: Bug Priority: Critical
Reporter: Olaf Faaland Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: llnl
Environment:

BG/Q I/O nodes
lustre-client-ion-2.5.4-4chaos_2.6.32_504.8.2.bgq.3blueos.V1R2M3.bl2.2_1.ppc64.ppc64


Issue Links:
Duplicate
is duplicated by LU-9628 LBUG (niobuf.c:773:ptl_send_rpc()) AS... Resolved
Related
is related to LU-5528 Race - connect vs resend Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

/bgsys/logs/BGQ.sn/R04-ID-J00.log (among many others)

LustreError: 28558:0: (niobuf.c:721:ptl_send_rpc()) ASSERTION( (at_max == 0) || request->rq_import->imp_state != LUSTRE_IMP_FULL || (request->rq_import->imp_msghdr_flags & 0x1) || ! (request->rq_import->imp_connect_data.ocd_connect_flags & 0x1000000ULL) ) failed:
LustreError: 28558:0: (niobuf.c:721:ptl_send_rpc()) LBUG
Call Trace:
show_stack
libcfs_debug_dumpstack
lbug_with_loc
ptl_send_rpc
ptlrpc_send_new_req
ptlrpc_set_wait
ll_statfs_internal
ll_statfs
statfs_by_dentry
vfs_statfs
user_statfs
SyS_statfs
syscall_exit

This occurred on many tens of I/O nodes, then on many tens more within the next 24 hours. It is continuing to occur.

We have not seen this issue before. The patch that introduced this assert was in the patch stack for our tag 2.5.4-1chaos, rolled out in April. We do not know what triggered this now.

c389652 LU-5528 ptlrpc: fix race between connect vs resend

There are no crash dumps for these nodes, nor much in the console logs.

Because several conditions were ASSERTed in a single statement, it is unknown from the console output which of them failed.
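
For illustration, one way the assertion could have reported the actual state when firing (a hedged sketch using LASSERTF; this is not the shipped code):

	/* sketch only: same compound check, but print the import state and
	 * flags on failure so the console shows what was inconsistent
	 * (imp here stands for request->rq_import; OBD_CONNECT_AT is the
	 * 0x1000000ULL flag in the assertion text above) */
	LASSERTF(AT_OFF || imp->imp_state != LUSTRE_IMP_FULL ||
		 (imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) ||
		 !(imp->imp_connect_data.ocd_connect_flags & OBD_CONNECT_AT),
		 "%s: imp_state %d msghdr_flags %#x\n",
		 imp->imp_obd->obd_name, imp->imp_state,
		 imp->imp_msghdr_flags);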



 Comments   
Comment by Olaf Faaland [ 15/Dec/15 ]

Patch stack running on those nodes:

* ae35897 (tag: 2.5.4-4chaos) LU-2232 debug: print debug for prolonged lock
* 458c1f9 Revert "LU-2232 debug: lock handle in IO request may be stale"
* f69e69e (tag: 2.5.4-3chaos) LU-6389 llite: restart short read/write for normal IO
* 1e8f823 Revert "LU-6389 llite: restart short read/write for normal IO"
* 3028245 (tag: 2.5.4-2chaos) LU-6152 osd-zfs: ZFS large block compat
* fb6b94c Revert "LU-5053 ptlrpc: Add schedule point to ptlrpc_check_set()"
* 0dc7041 Fix ldiskfs source autodetect for CentOS 6
* 42edddb LU-6536 llapi: lmm_stripe_count used unswabbed
* 3d25e65 LU-2182 llapi: implementation of new llapi_layout API
* fe5c3dd LLNL-0000 llapi: get OST count from proc
* c1c8672 LU-4107 build: fix lustre_user.h to C++ compatible
* f139e17 LU-5042 ldlm: delay filling resource's LVB upon replay
* 35e5a78 (tag: 2.5.4-1chaos) LU-6389 llite: restart short read/write for normal IO
* a25d1d1 LU-1032 build: Honor --disable-modules option in spec file
* 01df569 LU-1032 build: Add Lustre DKMS spec file
* 1f47ba6 LU-6038 osd-zfs: sa_spill_alloc()/sa_spill_free() compat
* 16b79c6 LU-6038 osd-zfs: Avoid redefining KM_SLEEP
* 36eca32 LU-5326 libcfs: remove umode_t typedef
* 685c9ac LU-3353 ptlrpc: Suppress error message when imp_sec is freed
* 9ea4c83 LU-5984 obd: fix lastid file name in compat code
* c389652 LU-5528 ptlrpc: fix race between connect vs resend
* bba812b LU-5579 ldlm: re-sent enqueue vs lock destroy race
Comment by Olaf Faaland [ 15/Dec/15 ]

The console logs indicate a pair of OSSs (a failover pair, in fact) had crashed and come back up. This error occurred after they timed out of recovery and evicted the clients. There's nothing else in the console log. The nodes are unresponsive after they hit this LBUG, so I'm not able to gather more information unless I can reproduce the situation while monitoring the node. It's not clear right now how to do that without affecting the whole machine.

It seems wrong to ASSERT anything to do with connection state when not holding a lock on the import. It is also not helpful to ASSERT several conditions collectively instead of individually, which would make the bad state clearer. So I'm looking into changing the code to return failure instead.

Comment by Olaf Faaland [ 15/Dec/15 ]

My statement about ASSERTing collectively is mistaken - the conditions are ORed, not ANDed, so all of them must have been false. But the assertion still doesn't produce enough useful information; the patch will need to change that.

I don't understand why they chose to ASSERT here instead of failing. There is discussion of this ASSERT in the code review for the patch, but I don't see a rationale for crashing the node.
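
For illustration, a minimal sketch of the error-return alternative being discussed, assuming imp = request->rq_import inside ptl_send_rpc(); this is a sketch, not the code that eventually landed:

	/* sketch: fail this RPC with a logged error instead of LBUG()ing
	 * the node when the import state and flags are inconsistent */
	if (!AT_OFF && imp->imp_state == LUSTRE_IMP_FULL &&
	    !(imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) &&
	    (imp->imp_connect_data.ocd_connect_flags & OBD_CONNECT_AT)) {
		DEBUG_REQ(D_ERROR, request,
			  "import FULL without MSGHDR_AT_SUPPORT, msghdr_flags %#x",
			  imp->imp_msghdr_flags);
		RETURN(-EPROTO);
	}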

Comment by Peter Jones [ 15/Dec/15 ]

Mike

Could you please advise?

Peter

Comment by Christopher Morrone [ 15/Dec/15 ]

This bug is taking out a large number of the I/O Nodes on Sequoia, our largest production supercomputer, so I bumped the severity to 1.

Comment by Peter Jones [ 15/Dec/15 ]

Chris

Just to acknowledge this - I am discussing with engineering

Peter

Comment by John Hammond [ 15/Dec/15 ]

Is 2.5.4-15chaos available somewhere? I do not see it on https://github.com/chaos/lustre.

Comment by Peter Jones [ 15/Dec/15 ]

John

I can direct you to it

Peter

Comment by Peter Jones [ 15/Dec/15 ]

Olaf

When did the last software update get applied to these nodes and which tag preceded it? Have any changes to the configuration been made in recent weeks?

Peter

Comment by Christopher Morrone [ 15/Dec/15 ]

The nodes are only running 2.5.4-4chaos; the newer tags haven't made it to the BG/Q systems.

Comment by Christopher Morrone [ 15/Dec/15 ]

No software changes have happened on these BG/Q clients in months. No recent change on the servers either.

Comment by Olaf Faaland [ 15/Dec/15 ]

Servers are at lustre-2.5.4-13chaos

Comment by Hongchao Zhang [ 16/Dec/15 ]

A possible cause (though I'm not sure about it!) is a race between the threads setting the state of the obd_import.

In ptlrpc_connect_import(), imp_state is set to LUSTRE_IMP_CONNECTING. If the imp_state check in ptlrpc_import_recovery_state_machine() passed before that assignment, imp_state can afterwards be set to LUSTRE_IMP_FULL by the delayed reply to the last ping of recovery in signal_completed_replay().

The race is:

thread 1: ptlrpc_import_recovery_state_machine()
...
	if (imp->imp_state == LUSTRE_IMP_RECOVER) {
		struct ptlrpc_connection *conn = imp->imp_connection;

		rc = ptlrpc_resend(imp);
		if (rc)
			GOTO(out, rc);

		IMPORT_SET_STATE(imp, LUSTRE_IMP_FULL);
		ptlrpc_activate_import(imp);
...

thread 2: ptlrpc_connect_import() is called while ptlrpc_resend() is running; imp_state is set to LUSTRE_IMP_CONNECTING and MSGHDR_AT_SUPPORT is cleared.

thread 1: imp_state is set to LUSTRE_IMP_FULL, and the LBUG is triggered.
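
Condensed, the window looks like this (an illustrative interleaving based on the description above, not verbatim source):

	/* thread 1: ptlrpc_import_recovery_state_machine() */
	if (imp->imp_state == LUSTRE_IMP_RECOVER) {
		rc = ptlrpc_resend(imp);	/* runs without imp_lock held */
		/* thread 2 runs ptlrpc_connect_import() here:
		 *     imp_state = LUSTRE_IMP_CONNECTING;
		 *     imp_msghdr_flags &= ~MSGHDR_AT_SUPPORT;
		 */
		IMPORT_SET_STATE(imp, LUSTRE_IMP_FULL);
		/* import is now FULL with MSGHDR_AT_SUPPORT cleared;
		 * the next ptl_send_rpc() hits the LASSERT */
	}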

Comment by Mikhail Pershin [ 16/Dec/15 ]

This assertion was added by the patch for LU-5528 (c389652 LU-5528 ptlrpc: fix race between connect vs resend), so this situation is a result of that commit. Either that patch is incomplete or the assertion is not correct.

Comment by Olaf Faaland [ 16/Dec/15 ]

What is special about LUSTRE_IMP_FULL? Why would the assertion check for that state, but none of the other import states like _EVICTED or _CLOSED?

Also, there are many places within ptl_send_rpc() where the function can return an error. Do you know why the patch author chose to ASSERT on import state instead of returning an error?

thanks

Comment by Mikhail Pershin [ 16/Dec/15 ]

Olaf, it looks to me like the assertion was added to prevent situations similar to the one LU-5528 fixed, but I agree that error handling with a message giving details is better. Is it possible to revert this patch for now? We are going to check the correctness of that assertion and replace it with error handling instead.

Comment by Olaf Faaland [ 16/Dec/15 ]

Mikhail, yes, we could revert the patch. I need to talk it over with Chris and the BG team. It's trickier than it normally would be, because lots of people are out of the office next week and so it's difficult to handle anything new that comes up.

Should we add any debug code?

Comment by Mikhail Pershin [ 16/Dec/15 ]

Olaf, let me think a bit; I will provide a patch with debug output instead of the assertion.

Comment by Mikhail Pershin [ 16/Dec/15 ]

I agree with Hongchao: this assertion failure is the result of the race he mentioned, and that race itself was introduced by the LU-5528 patch. That patch fixes the case where the import flags are set after the import state is set to LUSTRE_IMP_FULL, but opens a new race when the flags are set before LUSTRE_IMP_FULL. Strictly speaking, the import flags and the import state should be changed together.
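
A minimal sketch of that "change them together" idea, assuming the usual imp_lock discipline; import_set_state_nolock() is a hypothetical helper here, not necessarily the real API:

	spin_lock(&imp->imp_lock);
	/* update the connection flags from the connect reply and move to
	 * FULL in one critical section, so no sender can observe FULL
	 * with stale flags */
	imp->imp_msghdr_flags |= MSGHDR_AT_SUPPORT;
	import_set_state_nolock(imp, LUSTRE_IMP_FULL);	/* hypothetical */
	spin_unlock(&imp->imp_lock);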

Comment by Mikhail Pershin [ 16/Dec/15 ]

Olaf, I think it is better to revert this commit. Even if we remove the assertion, the import will still have wrong flags, causing all requests to behave wrongly, so that situation is no better than before the commit. Meanwhile I will think about how to solve this properly; I don't see a quick solution so far.

The possible solution I have in mind is to add a new boolean import field, imp_connected, indicating that the import has reconnected (and the import flags are already updated) even if it is not in the FULL state yet. ptlrpc_connect_import() would check this flag (it currently checks the import state) and would not initiate a new connect, and therefore would not clear the flags of an import that is connected but not yet FULL. I am not sure this is the only solution; it has to be discussed.
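
A hedged sketch of that approach; the imp_connected name comes from this comment, but the field placement and exact checks are assumptions, not the merged patch:

	/* new bit in struct obd_import (sketch):
	 *	unsigned int imp_connected:1;
	 * set once the connect reply has been interpreted and the import
	 * flags are valid; cleared on disconnect */

	/* guard at the top of ptlrpc_connect_import() (sketch): */
	spin_lock(&imp->imp_lock);
	if (imp->imp_connected) {
		/* connect already completed and the flags are valid;
		 * starting another connect here would clear them again */
		spin_unlock(&imp->imp_lock);
		RETURN(0);
	}
	spin_unlock(&imp->imp_lock);
	/* ... otherwise proceed with the normal connect path ... */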

Comment by Olaf Faaland [ 16/Dec/15 ]

Mikhail, OK, we are reverting this patch for now.

Comment by Christopher Morrone [ 25/Mar/16 ]

This is still broken on master and b2_8. Since we hope to move to 2.8 in the near future, we would really like to see this fixed as soon as possible.

Comment by Gerrit Updater [ 04/Apr/16 ]

Mike Pershin (mike.pershin@intel.com) uploaded a new patch: http://review.whamcloud.com/19312
Subject: LU-7558 import: don't reconnect during connect interpret
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 02f60229e4abb25e7da20ebaebe7e7bca18e99c3

Comment by Mikhail Pershin [ 04/Apr/16 ]

The patch implements the approach I mentioned in the comment above. It blocks new connect requests while the current connect interpret is running, preventing the import connection flags from being reset.

Comment by Gerrit Updater [ 02/May/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19312/
Subject: LU-7558 import: don't reconnect during connect interpret
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d10320bafdb942a8dbc5a8ba9176873134a5ffa3

Comment by Joseph Gmitter (Inactive) [ 03/May/16 ]

Landed to master for 2.9.0

Comment by Alexey Lyashkov [ 20/Jul/16 ]

Mikhail,

I think this is not a very good approach. It would be simpler to use a check like

if (imp->imp_state != LUSTRE_IMP_DISCONN)

instead of adding a new flag.

Comment by Oleg Drokin [ 06/Aug/18 ]

I am still hitting this regularly in my testing.

[16176.517655] LustreError: 2151:0:(client.c:1179:ptlrpc_import_delay_req()) @@@ invalidate in flight  req@ffff8802f4d5fc80 x1608074126206960/t0(0) o38->lustre-MDT0000-mdc-ffff8802acac9800@0@lo:12/10 lens 520/544 e 0 to 0 dl 0 ref 1 fl Rpc:N/0/ffffffff rc 0/-1
[16178.021121] LustreError: 2309:0:(niobuf.c:782:ptl_send_rpc()) ASSERTION( (at_max == 0) || imp->imp_state != LUSTRE_IMP_FULL || (imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) || !(imp->imp_connect_data.ocd_connect_flags & 0x1000000ULL) ) failed: 
[16178.032291] LustreError: 2309:0:(niobuf.c:782:ptl_send_rpc()) LBUG
[16178.033617] CPU: 1 PID: 2309 Comm: rm Kdump: loaded Tainted: P           OE  ------------   3.10.0-7.5-debug #1
[16178.036021] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[16178.037385] Call Trace:
[16178.038659]  [<ffffffff8176fc9a>] dump_stack+0x19/0x1b
[16178.042848]  [<ffffffffa01b37c2>] libcfs_call_trace+0x72/0x80 [libcfs]
[16178.044440]  [<ffffffffa01b384c>] lbug_with_loc+0x4c/0xb0 [libcfs]
[16178.046617]  [<ffffffffa05854e9>] ptl_send_rpc+0xb79/0xe80 [ptlrpc]
[16178.047953]  [<ffffffffa01b9f97>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[16178.049215]  [<ffffffffa0579d40>] ptlrpc_send_new_req+0x460/0xa70 [ptlrpc]
[16178.050257]  [<ffffffffa057e819>] ptlrpc_set_wait+0x289/0x790 [ptlrpc]
[16178.051188]  [<ffffffff810630a3>] ? kvm_clock_read+0x33/0x40
[16178.052007]  [<ffffffff810630b9>] ? kvm_clock_get_cycles+0x9/0x10
[16178.053405]  [<ffffffffa058abda>] ? lustre_msg_set_jobid+0x9a/0x110 [ptlrpc]
[16178.054486]  [<ffffffffa057ed9d>] ptlrpc_queue_wait+0x7d/0x220 [ptlrpc]
[16178.055636]  [<ffffffffa08a3ca7>] mdc_reint+0x57/0x160 [mdc]
[16178.056599]  [<ffffffffa08a4df6>] mdc_unlink+0x206/0x450 [mdc]
[16178.057700]  [<ffffffffa0500125>] lmv_unlink+0x5a5/0x950 [lmv]
[16178.059013]  [<ffffffffa146c764>] ? ll_i2gids+0x24/0xb0 [lustre]
[16178.060357]  [<ffffffffa1471b31>] ll_unlink+0x171/0x5e0 [lustre]
[16178.061621]  [<ffffffff8121d326>] vfs_unlink+0x106/0x190
[16178.062807]  [<ffffffff8121ff6e>] do_unlinkat+0x26e/0x2b0
[16178.063937]  [<ffffffff8178387b>] ? system_call_after_swapgs+0xc8/0x160
[16178.065792]  [<ffffffff8178386f>] ? system_call_after_swapgs+0xbc/0x160
[16178.068348]  [<ffffffff8178387b>] ? system_call_after_swapgs+0xc8/0x160
[16178.069707]  [<ffffffff8178386f>] ? system_call_after_swapgs+0xbc/0x160
[16178.070768]  [<ffffffff8178387b>] ? system_call_after_swapgs+0xc8/0x160
[16178.071896]  [<ffffffff8178386f>] ? system_call_after_swapgs+0xbc/0x160
[16178.075009]  [<ffffffff81220eab>] SyS_unlinkat+0x1b/0x40
[16178.076048]  [<ffffffff81783929>] system_call_fastpath+0x16/0x1b
[16178.077068]  [<ffffffff8178387b>] ? system_call_after_swapgs+0xc8/0x160

It fails in multiple different tests too, so it's not tied to any particular test case.

Comment by Gerrit Updater [ 26/Nov/18 ]

Andriy Skulysh (c17819@cray.com) uploaded a new patch: https://review.whamcloud.com/33718
Subject: LU-7558 ptlrpc: connect vs import invalidate race
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 994e02633ae373d454c20a9a6e5150d2d5312499

Comment by Gerrit Updater [ 16/Jan/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33718/
Subject: LU-7558 ptlrpc: connect vs import invalidate race
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b1827ff1da829ae5f320a417217757221eedda5f

Comment by Joseph Gmitter (Inactive) [ 16/Jan/19 ]

Landed for 2.13.0

Comment by Gerrit Updater [ 23/Feb/19 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34290
Subject: LU-7558 ptlrpc: connect vs import invalidate race
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 54f405709b1bef208d02468601a067a005dbb8e0

Comment by Gerrit Updater [ 25/Feb/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34293
Subject: LU-7558 ptlrpc: connect vs import invalidate race
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 9e4e28e244d0c2579ac5dc5f828be6dd77d708f6

Comment by Gerrit Updater [ 02/Mar/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34290/
Subject: LU-7558 ptlrpc: connect vs import invalidate race
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: d5a51f0b718ecf6fca81e15c396e56141b62df6c

Comment by Gerrit Updater [ 08/Apr/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34293/
Subject: LU-7558 ptlrpc: connect vs import invalidate race
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 69d1e9805172c3e8da59ad99f470831951253695
