[LU-6071] LBUG during mkdir -i 1 from 2.6.0 client to 2.5.3 server Created: 26/Dec/14  Updated: 04/Dec/19  Resolved: 22/Jan/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0, Lustre 2.5.3
Fix Version/s: Lustre 2.7.0

Type: Bug Priority: Critical
Reporter: Artem Blagodarenko (Inactive) Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: HB, mq115, patch
Environment:

Patchless client from branch b2_6
07-19-2014 Intel Corporation

  • version 2.6.0

Server from b2_5 (e835226b17309ec21fd7b46cf397e5fd557049bd)
09-03-2014 Intel Corporation

  • version 2.5.3

Issue Links:
Duplicate
Related
Severity: 3
Rank (Obsolete): 16896

 Description   
 kernel:LustreError: 3652:0:(mdt_handler.c:2706:mdt_object_lock0()) ASSERTION( !(ibits & (MDS_INODELOCK_UPDATE | MDS_INODELOCK_PERM)) ) failed: lustre-MDT0001: wrong bit 0x2 for remote obj [0x200000007:0x1:0x0]
 kernel:LustreError: 3652:0:(mdt_handler.c:2706:mdt_object_lock0()) LBUG
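
For readers unfamiliar with the assertion, the condition can be read as a plain bit test on the requested inode-lock bits. The following is a minimal sketch, not the code from mdt_handler.c: the helper name remote_obj_bad_bits is hypothetical, and the numeric flag values are assumptions modelled on Lustre's MDS_INODELOCK_* definitions (the log only confirms that 0x2 trips the check).

```c
#include <stdint.h>

/* Assumed values modelled on Lustre's MDS_INODELOCK_* flags; verify them
 * against the headers of the release in use. */
#define MDS_INODELOCK_LOOKUP 0x01ULL
#define MDS_INODELOCK_UPDATE 0x02ULL
#define MDS_INODELOCK_PERM   0x10ULL

/* Hypothetical helper: returns the subset of ibits that the assertion
 * rejects for a remote object, or 0 if the request would pass. */
static uint64_t remote_obj_bad_bits(uint64_t ibits)
{
	return ibits & (MDS_INODELOCK_UPDATE | MDS_INODELOCK_PERM);
}
```

Under these assumptions, remote_obj_bad_bits(0x2) is non-zero, which matches the "wrong bit 0x2" in the log, while a request carrying only the lookup bit would pass.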

A two-node cluster is enough to reproduce the issue.

On server:

# MDSCOUNT=3 lustre/tests/llmount.sh
# lctl set_param mdt.*.enable_remote_dir=1
# lctl set_param mdt.*.enable_remote_dir_gid=1122

On client:

# mount -t lustre 172.16.157.136@tcp0:/lustre /mnt/lustre
# lustre/utils/lfs mkdir -i 1 /mnt/lustre/mdt1

The server then fails with the LBUG above.



 Comments   
Comment by Gerrit Updater [ 26/Dec/14 ]

Artem Blagodarenko (artem_blagodarenko@xyratex.com) uploaded a new patch: http://review.whamcloud.com/13189
Subject: LU-6071 client: Fix mkdir -i 1 from DNE2 client to DNE1 server
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 590ee871adccda9ac52d02d888b2c27916be22cb

Comment by Gerrit Updater [ 26/Dec/14 ]

Artem Blagodarenko (artem_blagodarenko@xyratex.com) uploaded a new patch: http://review.whamcloud.com/13190
Subject: LU-6071 client: Fix mkdir -i 1 from DNE2 client to DNE1 server
Project: fs/lustre-release
Branch: b2_6
Current Patch Set: 1
Commit: cc680fa5575c8ccc110e8293d1e5bff3cd92007d

Comment by Andreas Dilger [ 28/Dec/14 ]

Artem, thanks for the patch. We're not planning to land any patches for b2_6, since that is not a maintenance branch. Is this patch needed for b2_5 or master? Is the MDS LASSERT() removed on master or can it still be triggered by a bad client? Could you please also submit a patch for b2_5 to make the LASSERT() just return an error in this case (-EREMOTE? or even better pass the RPC on to the proper MDT).

Comment by Artem Blagodarenko (Inactive) [ 29/Dec/14 ]

>Artem, thanks for the patch. We're not planning to land any patches for b2_6, since that is not a maintenance branch. Is this patch needed for b2_5 or master?

Andreas, all clients >= b2_6 should be patched. So I believe I need to rebase this patch onto the master branch and upload it again. Should I?

>Is the MDS LASSERT() removed on master or can it still be triggered by a bad client?

The MDS LASSERT is removed on master. An MDS from master accepts both old and new clients without problems.

>Could you please also submit a patch for b2_5 to make the LASSERT() just return an error in this case (-EREMOTE? or even better pass the RPC on to the proper MDT).

So we need one more patch, for the b2_5 server? Not a problem, I will make it. I believe returning -EREMOTE is the better option. A server from master returns much the same error when it gets a request sent to the wrong MDT (see the CLIENT-MDT compatibility chapter in http://wiki.opensfs.org/images/f/ff/DNE_StripedDirectories_HighLevelDesign.pdf):
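
The idea of returning an error instead of asserting could look roughly like the sketch below. It is an illustration only, not the patch uploaded for review: the function name check_remote_lock_bits and its parameters are hypothetical, and the flag values are assumptions modelled on Lustre's MDS_INODELOCK_* definitions.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed values modelled on Lustre's MDS_INODELOCK_* flags. */
#define MDS_INODELOCK_UPDATE 0x02ULL
#define MDS_INODELOCK_PERM   0x10ULL

/* Hypothetical stand-in for the check in mdt_object_lock0(): instead of
 * asserting (and crashing the server with an LBUG), reject the request
 * with -EREMOTE when a remote object is locked with bits it cannot take. */
static int check_remote_lock_bits(int obj_is_remote, uint64_t ibits)
{
	if (obj_is_remote &&
	    (ibits & (MDS_INODELOCK_UPDATE | MDS_INODELOCK_PERM)))
		return -EREMOTE;

	return 0;
}
```

With this shape a misbehaving client gets an error code back and the server keeps running; a local object, or a remote object locked with other bits, still returns 0.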

In DNE Phase II rename request will be sent to the MDT where the target file is located. This is different from DNE Phase I. An old client (<= Lustre software version 2.4.0) will still send the request to the MDT where the source parent is, and the source parent will return -EREMOTE to the old client. A 2.4.0 client does not understand -EREMOTE so a patch will be added to 2.4 series to redirect rename request to the MDT where the target file is, if it gets -EREMOTE from the MDT.
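
The redirect behaviour described in the quoted design text can be modelled as a single retry on -EREMOTE. This is a toy simulation under stated assumptions, not Lustre client code: struct rename_req, mdt_handle_rename and old_client_rename are hypothetical names, and MDTs are reduced to integer indices.

```c
#include <errno.h>

/* Hypothetical model of a rename request: which MDT holds the source
 * parent and which MDT holds the target file. */
struct rename_req {
	int src_parent_mdt;
	int target_mdt;
};

/* Stand-in for a DNE Phase II server: only the MDT holding the target
 * handles the rename, any other MDT answers -EREMOTE. */
static int mdt_handle_rename(int mdt, const struct rename_req *req)
{
	return mdt == req->target_mdt ? 0 : -EREMOTE;
}

/* Stand-in for a patched old client: send to the source parent's MDT
 * first, then redirect once to the target's MDT on -EREMOTE. */
static int old_client_rename(const struct rename_req *req, int *served_by)
{
	int mdt = req->src_parent_mdt;
	int rc = mdt_handle_rename(mdt, req);

	if (rc == -EREMOTE) {
		mdt = req->target_mdt;
		rc = mdt_handle_rename(mdt, req);
	}
	if (rc == 0)
		*served_by = mdt;

	return rc;
}
```

In this model a request with src_parent_mdt 0 and target_mdt 1 is rejected by MDT 0 and then served by MDT 1.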

Comment by Gerrit Updater [ 13/Jan/15 ]

Artem Blagodarenko (artem_blagodarenko@xyratex.com) uploaded a new patch: http://review.whamcloud.com/13361
Subject: LU-6071 mdt: Fix mkdir -i 1 from DNE2 client to DNE1 server
Project: fs/lustre-release
Branch: b2_5
Current Patch Set: 1
Commit: 7b8cffa906b6ae0de5c16a92b315ea973cdd920a

Comment by Artem Blagodarenko (Inactive) [ 13/Jan/15 ]

I uploaded a patch for the 2.5 server. Now a 2.6 client gets an error back when it tries lfs mkdir -i 1 against a 2.5 server:

# lustre/utils/lfs mkdir -i 1 /mnt/lustre/mdt1
error on LL_IOC_LMV_SETSTRIPE '/mnt/lustre/mdt1' (3): Object is remote
error: mkdir: create stripe dir '/mnt/lustre/mdt1' failed

Comment by Gerrit Updater [ 22/Jan/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13189/
Subject: LU-6071 client: Fix mkdir -i 1 from DNE2 client to DNE1 server
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 426f9c0365d1e8528b07b89f1d7c0a7f2f80e3a5

Comment by Peter Jones [ 22/Jan/15 ]

Landed for 2.7

Comment by Artem Blagodarenko (Inactive) [ 23/Jan/15 ]

There is also a server-side patch, http://review.whamcloud.com/13361, for the b2_5 branch that needs to be inspected (it has only one +1).

Comment by Gerrit Updater [ 08/Apr/15 ]

Artem Blagodarenko (artem_blagodarenko@xyratex.com) uploaded a new patch: http://review.whamcloud.com/14401
Subject: LU-6071 client: Fix mkdir -i 1 from DNE2 client to DNE1 server
Project: fs/lustre-release
Branch: b2_5
Current Patch Set: 1
Commit: 8c6cd3b49a5cceb4045825651a6a0543ab893b4a

Comment by Artem Blagodarenko (Inactive) [ 06/May/15 ]

http://review.whamcloud.com/14401 is abandoned, but inspection of http://review.whamcloud.com/13361 is still in progress.

Comment by Artem Blagodarenko (Inactive) [ 24/Aug/16 ]

>http://review.whamcloud.com/14401 is abandoned, but http://review.whamcloud.com/13361 inspection is still in progress.
I see this issue is closed. Sorry, are we going to land this server patch on b2_5? Thanks.

Generated at Sat Feb 10 01:56:57 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.