[LU-617] LBUG: (mdt_recovery.c:787:mdt_last_rcvd_update()) ASSERTION(req_is_replay(req)) failed Created: 22/Aug/11 Updated: 29/Jan/16 Resolved: 04/Jan/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0 |
| Fix Version/s: | Lustre 2.2.0, Lustre 2.1.2 |
| Type: | Bug | Priority: | Major |
| Reporter: | Jian Yu | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Environment: | Lustre Tag: v2_1_0_0_RC0 |
| Attachments: | |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 4790 |
| Description |
|
While running the racer test, the MDS node (fat-amd-3-ib) hit an LBUG as follows:

LustreError: 10364:0:(mdt_recovery.c:787:mdt_last_rcvd_update()) ASSERTION(req_is_replay(req)) failed
LustreError: 10364:0:(mdt_recovery.c:787:mdt_last_rcvd_update()) LBUG
Pid: 10364, comm: mdt_11

Call Trace:
[<ffffffffa03ad855>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa03ade95>] lbug_with_loc+0x75/0xe0 [libcfs]
[<ffffffffa03b8d96>] libcfs_assertion_failed+0x66/0x70 [libcfs]
[<ffffffffa09ca90d>] mdt_txn_stop_cb+0xbbd/0xbd0 [mdt]
[<ffffffffa04bf2a2>] dt_txn_hook_stop+0x62/0x80 [obdclass]
[<ffffffffa09598d8>] osd_trans_stop+0x98/0x290 [osd_ldiskfs]
[<ffffffffa0958470>] ? osd_object_write_unlock+0x60/0xd0 [osd_ldiskfs]
[<ffffffffa0922e3a>] mdd_trans_stop+0x1a/0x20 [mdd]
[<ffffffffa0919cbe>] mdd_rename+0x31e/0x21b0 [mdd]
[<ffffffffa03bcbef>] ? cfs_hash_bd_from_key+0x3f/0xc0 [libcfs]
[<ffffffffa0986239>] ? cmm_mode_get+0x109/0x320 [cmm]
[<ffffffffa0986d5a>] cml_rename+0x33a/0xbb0 [cmm]
[<ffffffffa03bd007>] ? cfs_hash_bd_get+0x37/0x90 [libcfs]
[<ffffffffa09864bd>] ? cmm_is_subdir+0x6d/0x2f0 [cmm]
[<ffffffffa04bcef2>] ? lu_object_put+0x92/0x210 [obdclass]
[<ffffffffa09c5196>] mdt_reint_rename+0x1f96/0x23e0 [mdt]
[<ffffffffa03c394b>] ? upcall_cache_get_entry+0x28b/0xa14 [libcfs]
[<ffffffffa09bd74f>] ? mdt_rename_unpack+0x44f/0x6a0 [mdt]
[<ffffffffa091de56>] ? md_ucred+0x26/0x60 [mdd]
[<ffffffffa09bd9df>] mdt_reint_rec+0x3f/0x100 [mdt]
[<ffffffffa05b1984>] ? lustre_msg_get_flags+0x34/0xa0 [ptlrpc]
[<ffffffffa09b5e84>] mdt_reint_internal+0x6d4/0x9f0 [mdt]
[<ffffffffa09ab946>] ? mdt_reint_opcode+0x96/0x160 [mdt]
[<ffffffffa09b61ec>] mdt_reint+0x4c/0x120 [mdt]
[<ffffffffa05b1458>] ? lustre_msg_check_version+0xc8/0xe0 [ptlrpc]
[<ffffffffa09a9bb5>] mdt_handle_common+0x8d5/0x1810 [mdt]
[<ffffffffa05af104>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
[<ffffffffa09aabc5>] mdt_regular_handle+0x15/0x20 [mdt]
[<ffffffffa05bfc7e>] ptlrpc_main+0xb8e/0x1900 [ptlrpc]
[<ffffffffa05bf0f0>] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
[<ffffffff8100c1ca>] child_rip+0xa/0x20
[<ffffffffa05bf0f0>] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
[<ffffffffa05bf0f0>] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
[<ffffffff8100c1c0>] ? child_rip+0x0/0x20
Kernel panic - not syncing: LBUG

Maloo report: https://maloo.whamcloud.com/test_sets/10c0e904-ccc9-11e0-8d02-52540025f9af |
| Comments |
| Comment by Peter Jones [ 22/Aug/11 ] |
|
Mike, Oleg thought that this ticket might be one that you would be interested in. Peter |
| Comment by Jian Yu [ 23/Aug/11 ] |
|
Lustre Tag: v2_1_0_0_RC0

The same failure occurred: https://maloo.whamcloud.com/test_sets/b4b80e1a-cccf-11e0-8d02-52540025f9af |
| Comment by Peter Jones [ 23/Aug/11 ] |
|
Niu, could you please look into this one as your top priority? Thanks, Peter |
| Comment by Mikhail Pershin [ 23/Aug/11 ] |
|
Peter, I'd like to track this because I wonder how that can happen at all. I'll try to help Niu with this bug. |
| Comment by Niu Yawei (Inactive) [ 24/Aug/11 ] |
|
After looking into the code, I found something interesting: there are two pairs of last_xid/last_transno in the last_rcvd file, one for close and the other for all other reint operations, which means we can have only one reint RPC and one close RPC in flight; the cl_rpc_lock and cl_close_lock on the client guarantee this. However, I found that a setattr with ATTR_FROM_OPEN breaks the above rule: it can run in parallel with other reint operations (it is serialized by a separate cl_setattr_lock), yet it uses the same last_xid/last_transno slot on the server as the other reint operations. I think that will break resend and recovery badly (and, of course, it could trigger this assert as well). Hi, Mike |
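For readers unfamiliar with the last_rcvd layout, a minimal sketch of the per-client record that carries the two slots described above; the field names are modeled on Lustre's lsd_client_data, but the exact layout shown here is an assumption for illustration, not a copy from the source tree:

    /* Illustrative only: the two independent last_xid/last_transno slots.
     * All non-close reint operations (including setattr-from-open) share
     * the first pair; close uses the second pair. */
    struct lsd_client_data_sketch {
            __u8  lcd_uuid[40];            /* client UUID */
            __u64 lcd_last_transno;        /* slot for non-close reint ops */
            __u64 lcd_last_xid;            /* xid of last non-close reint */
            __u32 lcd_last_result;
            __u32 lcd_last_data;
            __u64 lcd_last_close_transno;  /* separate slot for close */
            __u64 lcd_last_close_xid;
            __u32 lcd_last_close_result;
            __u32 lcd_last_close_data;
    };

The issue Niu describes is that setattr with ATTR_FROM_OPEN is serialized only by cl_setattr_lock, yet it records into the same non-close pair as requests serialized by cl_rpc_lock.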
| Comment by Alexey Lyashkov [ 24/Aug/11 ] |
|
Niu, I was able to replicate that bug without any recovery. |
| Comment by Niu Yawei (Inactive) [ 24/Aug/11 ] |
|
Hi, Alex. Right, since some setattr can run in parallel with other reint operations, there is a race in mdt_txn_stop_cb(): the setattr updates its mti_transno to 'lut_last_transno + 1' while another reint operation updates its mti_transno to 'lut_last_transno + 2'; the reint operation calls mdt_last_rcvd_update() first and sets lcd_last_transno to 'lut_last_transno + 2', then the setattr calls mdt_last_rcvd_update() and triggers the assert. The same race applies to last_xid: the last_xid on the server sometimes will not belong to the last request sent from the client, so resend is broken too. |
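To spell out the interleaving: both transactions are assigned their transno before either last_rcvd update runs, so the thread with the smaller transno can update last_rcvd second. A rough sketch of the check that then fires in mdt_last_rcvd_update(), reconstructed from the assertion text in this ticket (the real function differs in detail):

    /* Thread A: setattr-from-open      Thread B: rename (or other reint)
     * mti_transno = last + 1
     *                                  mti_transno = last + 2
     *                                  mdt_last_rcvd_update():
     *                                      lcd_last_transno = last + 2
     * mdt_last_rcvd_update():
     *     sees mti_transno < lcd_last_transno, roughly: */
    if (mti->mti_transno < lcd->lcd_last_transno)
            /* Only a replayed request may legitimately carry an older
             * transno; a racing normal request trips the LBUG here. */
            LASSERT(req_is_replay(req));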
| Comment by Mikhail Pershin [ 24/Aug/11 ] |
|
Niu, note that the assert occurs for a non-recovery request, which means a logic issue on the server: it assigns a transaction number to the rename operation (in the logs), but by the time it is being written a larger transaction number is already on disk. You may be right about the parallel setattr: the transaction-number increment is done before the last_rcvd update, so there can be a race. Can you reproduce the bug locally by running racer and check that? For testing purposes it would be enough to just use cl_close_lock for setattr. |
| Comment by Alexey Lyashkov [ 24/Aug/11 ] |
|
Mike, Niu, if you wait an hour I will upload the full debug logs from that issue. It's easy for me to replicate with racer. |
| Comment by Niu Yawei (Inactive) [ 24/Aug/11 ] |
|
Hi, Alex. Did you reproduce it with a rhel5 client or a rhel6 client? I noticed that ATTR_OPEN is missing in the rhel5 kernel, so if the root cause is the race I mentioned in the above comment, it should only be reproducible with a rhel6 client. |
| Comment by Alexey Lyashkov [ 24/Aug/11 ] |
|
Niu, I am able to reproduce this with a RHEL6 client/server. This is the trace for the affected RPC:

00000100:00000040:12.0:1314164305.382194:0:9730:0:(lustre_net.h:1649:ptlrpc_rqphase_move()) @@@ move req "New"

And one note: req_is_replay() is

static inline int req_is_replay(struct ptlrpc_request *req)

so the client needs to have marked that request as a replay, but that is only done in ptlrpc_retain_replayable_request :/ so I think it's something like memory corruption, or forgetting to clear the area before use. |
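For reference, a plausible reconstruction of the predicate named in the assertion, based on the pb_flags / MSG_REPLAY (0x0004) discussion later in this thread; the real definition in the Lustre headers may differ slightly:

    /* Reconstruction for illustration -- not copied verbatim from the tree. */
    static inline int req_is_replay(struct ptlrpc_request *req)
    {
            LASSERT(req->rq_reqmsg);
            /* A request counts as a replay only if the client set the
             * MSG_REPLAY bit in the request message flags. */
            return !!(lustre_msg_get_flags(req->rq_reqmsg) & MSG_REPLAY);
    }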
| Comment by Alexey Lyashkov [ 24/Aug/11 ] |
|
btw, my backtrace is quite different from the reported one, and it looks like the mdt server process was only creating requests at that time. |
| Comment by Niu Yawei (Inactive) [ 24/Aug/11 ] |
|
Thanks for your help, Alex. Could you try to apply this patch to your client to see if you can still reproduce it? |
| Comment by Alexey Lyashkov [ 24/Aug/11 ] |
|
btw, if it helps, we can join a Skype chat once I get any info from the kernel crash dump. |
| Comment by Alexey Lyashkov [ 24/Aug/11 ] |
|
Not sure that patch can help. So:

crash> struct ptlrpc_request ffff8802b006d000
crash> struct lustre_msg_v2 0xffff8802db93a000

and

crash> p/x 0xffff8802db93a000+6*sizeof(__u32)+sizeof(struct lustre_msg_v2)

, pb_padding = {0, 0, 0, 0} }

so pb_flags == 0, but MSG_REPLAY (0x0004) should have been set. In that case I think it's better to verify whether the assert has a correct pointer to the ptlrpc request. |
| Comment by Mikhail Pershin [ 24/Aug/11 ] |
|
Alexey, it expects a replay: ASSERTION(req_is_replay(req)), so the triggered assertion shows that this is NOT a replay. Therefore what Niu found about ATTR_FROM_OPEN can be right: this assertion is not about replay, it fires for normal requests when they change last_transno improperly. |
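A tiny illustration of the inverted logic, since it tripped up the discussion above: LASSERT() LBUGs when its condition is false, so the console message reports the failed expectation, not the actual state.

    /* Illustrative macro, not the real libcfs definition. */
    #define LASSERT_SKETCH(cond)  do { if (!(cond)) LBUG(); } while (0)

    /* "ASSERTION(req_is_replay(req)) failed" therefore means
     * req_is_replay(req) returned 0: the request was NOT a replay. */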
| Comment by Alexey Lyashkov [ 24/Aug/11 ] |
|
Ah, I forgot about the inverted logic in LASSERT. |
| Comment by Mikhail Pershin [ 24/Aug/11 ] |
|
That's very good news, thanks for testing! I am not sure the patch is fully correct, but at least we know the reason and can think about this further. |
| Comment by Alexey Lyashkov [ 24/Aug/11 ] |
|
Always welcome |
| Comment by Niu Yawei (Inactive) [ 24/Aug/11 ] |
|
Actually, the ATTR_FROM_OPEN logic looks wrong to me. On the server side, we don't acquire a DLM lock for the setattr from open:

/* In the raw-setattr case, we lock the child inode.
 * In the write-back case or if being called from open,
 * the client holds a lock already.
 * We use the ATTR_FROM_OPEN (translated into MRF_SETATTR_LOCKED by
 * mdt_setattr_unpack()) flag to tell these cases apart. */

However, the truth is that the client doesn't hold any lock at all, which means we could sometimes set attributes on a file without any DLM lock! I think this piece of ancient code was introduced to resolve some kind of deadlock/livelock problem; with later code changes that problem no longer exists, but this code was not removed accordingly. In the current Lustre implementation we don't guarantee the atomicity of open(O_TRUNC) (no DLM lock is held through open+trunc); the open(O_TRUNC) can be regarded as an open followed by an immediate truncate, so it looks safe to remove the ATTR_FROM_OPEN and SETATTR portal stuff. There might be a slight impact on the open cache when removing ATTR_FROM_OPEN: the truncate from open(O_TRUNC) could cause its open lock to be revoked, which is not very efficient, but I think it's necessary for correctness. |
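To make the open(O_TRUNC) point concrete, a small client-side illustration in plain POSIX terms (hypothetical example code, not Lustre internals): under the behaviour described above, the first form gives no more atomicity than the second.

    #include <fcntl.h>
    #include <unistd.h>

    void open_trunc_equivalence(const char *path)
    {
            /* Form 1: truncate as part of open. */
            int fd1 = open(path, O_RDWR | O_TRUNC);

            /* Form 2: open, then an immediate truncate. With no DLM lock
             * held across open+truncate, form 1 is effectively this. */
            int fd2 = open(path, O_RDWR);
            if (fd2 >= 0)
                    (void)ftruncate(fd2, 0);

            if (fd1 >= 0)
                    close(fd1);
            if (fd2 >= 0)
                    close(fd2);
    }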
| Comment by Oleg Drokin [ 25/Aug/11 ] |
|
OK, I tracked this change to commit 040033cef24c5aca2967daf2da7a862abcd074cf from Phil on March 2nd, 2003. It seems two things happened: a new setattr portal was added and at the same time a new lock was added, as if we could safely send multiple RPCs to different portals back then (we certainly cannot now). Anyway, the patch seems to be totally correct. Can you please get rid of all cl_setattr_lock references and submit a patch like this to gerrit? |
| Comment by Oleg Drokin [ 25/Aug/11 ] |
|
Ah, on the other stuff you asked about: yes, getting rid of attr_from_open and the setattr portal seems to be possible. On the other hand, open with O_TRUNC is probably not that great a target for the open lock anyway. |
| Comment by Niu Yawei (Inactive) [ 25/Aug/11 ] |
|
Thank you, Oleg. Patch for master: http://review.whamcloud.com/#change,1295 Will backport it to b1_8 once it has passed review. |
| Comment by James A Simmons [ 26/Aug/11 ] |
|
I tried this patch and still got errors. I posted the file mds-kern-log to help give insight into what is going wrong for runracer. |
| Comment by Niu Yawei (Inactive) [ 26/Aug/11 ] |
|
Hi, James. The error you got is different from this bug. Did you use a rhel5 client or a rhel6 client? Can you reproduce it without the patch? Thanks. |
| Comment by James A Simmons [ 26/Aug/11 ] |
|
It was with rhel5 clients. Yes, I can run it without the patch and will open a new bug. |
| Comment by Build Master (Inactive) [ 31/Aug/11 ] |
|
Integrated in Oleg Drokin : 71fc2a4d9d37c488b64ef882bc83e5096171c703
|
| Comment by Peter Jones [ 01/Sep/11 ] |
|
Dropping in priority as the workaround landed for 2.1, but keeping this open for now for the alternative fix to land for 2.2. |
| Comment by Niu Yawei (Inactive) [ 05/Nov/11 ] |
|
fix for 2.2: http://review.whamcloud.com/1654 |
| Comment by Build Master (Inactive) [ 04/Jan/12 ] |
|
Integrated in Result = SUCCESS
|
| Comment by Peter Jones [ 04/Jan/12 ] |
|
Landed for 2.2 |
| Comment by Build Master (Inactive) [ 04/Jan/12 ] |
|
Integrated in Result = SUCCESS
|