[LU-6582] Wireshark fails to parse LDLM_ENQUEUE RPC - likely for layout lock Created: 07/May/15  Updated: 03/Jun/17  Resolved: 03/Jun/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Minor
Reporter: Andrew Uselton (Inactive) Assignee: Nathaniel Clark
Resolution: Fixed Votes: 0
Labels: wireshark

Issue Links:
Related
is related to LU-6671 Wireshark: LDLM_ENQUEUE reply with un... Resolved
is related to LU-6353 Push Wireshark Support Upstream Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

It looks to me like there is at least one bug in the Wireshark parsing. What I see is an LDLM_ENQUEUE that I think is wanting to do RQF_LDLM_INTENT_GETXATTR in order to establish a layout lock on the MDT. Some of the structures/fields appear to be missing in the parsed output, the LDLM_ENQUEUE request appears in the output with "LustreBUG" in place of "LUSTRE", and the LDLM_ENQUEUE reply says "malformed packet".

A simple experiment that will replicate this behavior is to run the shell command 'touch -c file'. I do it from a client that has not previously visited the file or directory, so I know that nothing is cached. I also have the file precreated from a separate client, so that there really is a file to 'touch'. The '-c' suppresses any file create/file open that otherwise might be attempted, so you just get the simple 'setattr' behaviour. After the resulting MDS_REINT:REINT_SETATTR RPCs are exchanged, and before the subsequent MDS_GETXATTR:trusted.lov that get the actual layout, there is an LDLM_ENQUEUE exchanged that exhibits the problem. It's a good guess that it is trying to get a layout lock. The actual activities appear to be working fine, it is just the Wireshark parsing that appears to be a problem.
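
The '-c' behaviour the experiment relies on can be checked away from Lustre. The following is a minimal sketch and not part of the original report: the touch_existing helper name and the temporary directory are assumptions for illustration. On a real setup the path would sit under the Lustre mount, and the precreate and touch steps would run on separate clients.

```shell
#!/bin/sh
# Sketch: 'touch -c' updates timestamps on an existing file but never
# creates a missing one. Helper name and paths are illustrative only.

touch_existing() {
    # -c: do not create the file if it does not exist
    touch -c "$1"
}

workdir=$(mktemp -d)

# Stand-in for the file precreated from a separate client.
echo > "$workdir/precreated"

touch_existing "$workdir/precreated"   # timestamp update on an existing file
touch_existing "$workdir/missing"      # no-op: nothing is created

[ -e "$workdir/precreated" ] && echo "precreated: exists"
[ -e "$workdir/missing" ] || echo "missing: not created"
```

Because no create or open is attempted, the only metadata operation left on the client is the attribute update, which is what keeps the captured RPC sequence short.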



 Comments   
Comment by Nathaniel Clark [ 07/May/15 ]

Reproduction:
On client 1

echo > /lustre/path/to/file1

Wireshark dump of following transaction:

On client 2

touch -c /lustre/path/to/file2
Comment by Andrew Uselton (Inactive) [ 07/May/15 ]

I would make two modifications:

client1:
dd if=/dev/zero of=/lustre/path/to/touchTest bs=1024 count=1024

That way there is definitely stuff out on the OSTs and you will need to get the layout on client2.

client2:
touch -c /lustre/path/to/touchTest

You want to go after that same file. The path lookup may have the MDS ask for locks back from client1.
Once that is done I see these RPCs:

11:18:41.986718 MDS <- Clien2 Lustre 650 MDS_REINT request [REINT_SETATTR][MINMODE]
11:18:41.986882 MDS -> Clien2 Lustre 586 MDS_REINT reply
11:18:41.987461 MDS <- Clien2 LustreBUG 538 LDLM_ENQUEUE request [Concurrent Read][ intent : ]
11:18:41.987597 MDS -> Clien2 Lustre 530 LDLM_ENQUEUE reply [Concurrent Read][Malformed Packet]
11:18:41.988215 MDS <- Clien2 Lustre 626 MDS_GETXATTR request filename : trusted.lov
11:18:41.988315 MDS -> Clien2 Lustre 634 MDS_GETXATTR reply
11:18:41.989389 OSS <- Clien2 Lustre 602 OST_SETATTR request
11:18:41.989698 OSS -> Clien2 Lustre 562 OST_SETATTR reply

It is the LDLM_ENQUEUE that has the problem. The request has LustreBUG and the reply has [Malformed Packet]. The detailed breakouts of those two messages would help me with my protocol doc.
-A
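
The two problem messages can be picked out of a summary listing like the one above mechanically. This is a minimal sketch, assuming the listing has been saved as plain text; the flag_bad_rpcs name is hypothetical, and the sample lines are copied from the trace above.

```shell
#!/bin/sh
# Sketch: keep only summary lines that the dissector flagged, i.e. those
# showing the "LustreBUG" protocol name or a "Malformed Packet" note.

flag_bad_rpcs() {
    grep -E 'LustreBUG|Malformed Packet'
}

# Sample lines taken from the RPC listing in this comment.
trace='11:18:41.986882 MDS -> Clien2 Lustre 586 MDS_REINT reply
11:18:41.987461 MDS <- Clien2 LustreBUG 538 LDLM_ENQUEUE request [Concurrent Read][ intent : ]
11:18:41.987597 MDS -> Clien2 Lustre 530 LDLM_ENQUEUE reply [Concurrent Read][Malformed Packet]
11:18:41.988215 MDS <- Clien2 Lustre 626 MDS_GETXATTR request filename : trusted.lov'

flagged=$(printf '%s\n' "$trace" | flag_bad_rpcs)
printf '%s\n' "$flagged"
```

For this sample, only the LDLM_ENQUEUE request and reply lines pass the filter.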

Comment by Gerrit Updater [ 08/May/15 ]

Nathaniel Clark (nathaniel.l.clark@intel.com) uploaded a new patch: http://review.whamcloud.com/14732
Subject: LU-6582 wireshark: Add new LDLM intent bits
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1e492ba077a2e8d6ead417928562737488cd9f4e

Comment by Gerrit Updater [ 03/Jun/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/14732/
Subject: LU-6582 wireshark: Add new LDLM intent bits
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 05e3d5243425c02796d83dfd8d1487e73e2c1f1f

Comment by Peter Jones [ 03/Jun/17 ]

Landed for 2.10

Generated at Sat Feb 10 02:01:27 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.