[LU-4705] LustreError: 89827:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -2 Created: 04/Mar/14  Updated: 26/Oct/17  Resolved: 24/Oct/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.1
Fix Version/s: Lustre 2.11.0, Lustre 2.10.2

Type: Bug Priority: Minor
Reporter: Brett Lee (Inactive) Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: None
Environment:

Running tip of Lustre b2_5, 1 MGS, 1 MDS, 2 OSS, 12 clients.


Issue Links:
Related
is related to LU-4973 MDD does not check nlink maximum limi... Resolved
is related to LU-4522 ldlm_cli_enqueue and ll_inode_revali... Closed
Severity: 3
Rank (Obsolete): 12942

 Description   

Unexpected MDC LustreError messages on most clients.

Client 10:
Mar 4 03:27:11 lustre10 kernel: LustreError: 183913:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -2

Client 11:
Mar 4 00:37:25 lustre11 kernel: LustreError: 89827:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -2

Client 12:
Mar 4 00:39:36 lustre12 kernel: LustreError: 11-0: cal-MDT0000-mdc-ffff8807b75c4000: Communicating with 192.168.20.1@tcp1, operation ldlm_enqueue failed with -116.
Mar 4 00:39:36 lustre12 kernel: LustreError: 70225:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -116
Mar 4 00:39:36 lustre12 kernel: LustreError: 70225:0:(vvp_io.c:1227:vvp_io_init()) cal: refresh file layout [0x200001c0b:0x176e:0x0] error -116.
Mar 4 03:09:33 lustre12 kernel: LustreError: 70225:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -2

Client 13:
Mar 4 00:29:54 lustre13 kernel: LustreError: 167294:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -2

Client 14:
Mar 4 01:18:04 lustre14 kernel: LustreError: 11-0: cal-MDT0000-mdc-ffff880787af8400: Communicating with 192.168.20.1@tcp1, operation ldlm_enqueue failed with -116.
Mar 4 01:18:04 lustre14 kernel: LustreError: 11503:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -116
Mar 4 01:18:04 lustre14 kernel: LustreError: 11503:0:(vvp_io.c:1227:vvp_io_init()) cal: refresh file layout [0x200001c12:0xbbe2:0x0] error -116.

Client 16:
Mar 4 01:00:46 lustre16 kernel: LustreError: 141605:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -2

Client 17:
Mar 4 00:13:39 lustre17 kernel: LustreError: 11-0: cal-MDT0000-mdc-ffff8808038aa000: Communicating with 192.168.20.1@tcp1, operation ldlm_enqueue failed with -116.
Mar 4 00:13:39 lustre17 kernel: LustreError: 126770:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -116
Mar 4 00:13:39 lustre17 kernel: LustreError: 126770:0:(vvp_io.c:1227:vvp_io_init()) cal: refresh file layout [0x200001beb:0x1aedf:0x0] error -116.
Mar 4 02:02:43 lustre17 kernel: LustreError: 126770:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -2

Client 18:
Mar 1 05:34:03 lustre18 kernel: LustreError: 146331:0:(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -2



 Comments   
Comment by Keith Mannthey (Inactive) [ 10/Mar/14 ]

I see these same errors with a Lustre 2.5.0 client. They do not seem to impact the usability of the filesystem, but this is logged as an error, so there could be something happening.

Comment by Andreas Dilger [ 10/Mar/14 ]

Is the filesystem re-exported via NFS, or are there possibly concurrent threads that are accessing and unlinking files?

These messages mean that the client was looking up some file, but it was deleted by the time it tried to access it.

-116 = -ESTALE, -2 = -ENOENT.

The errors are not really fatal, and could probably be quieted from the console.
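
As an illustration of the mapping above, here is a minimal sketch that pulls the return code out of an mdc_enqueue() console line and names it. The function name decode_enqueue_error and the table LINUX_ERRNO are hypothetical helpers, not part of Lustre; the numeric values are the Linux errno values given above, and the regular expression assumes the message format quoted in this ticket.

```python
import re

# Linux errno values quoted in this ticket (other platforms may number them differently).
LINUX_ERRNO = {2: "ENOENT", 116: "ESTALE"}

# Matches the tail of a console line such as
# "...(mdc_locks.c:916:mdc_enqueue()) ldlm_cli_enqueue: -2"
ENQUEUE_RE = re.compile(r"mdc_enqueue\(\)\) ldlm_cli_enqueue: (-?\d+)")


def decode_enqueue_error(line):
    """Return (rc, errno_name) for an mdc_enqueue() error line, or None if no match."""
    match = ENQUEUE_RE.search(line)
    if match is None:
        return None
    rc = int(match.group(1))
    # Kernel return codes are negated errno values, so look up -rc.
    return rc, LINUX_ERRNO.get(-rc, "unknown")
```

Run over the client logs in the description, this would separate the lookup races reported as -ENOENT from those reported as -ESTALE; lines in any other format are simply skipped.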

Comment by Keith Mannthey (Inactive) [ 10/Mar/14 ]

I have seen this error with IOR, without NFS. I am not sure whether the errors were generated during single-shared-file or file-per-process runs.

Comment by Brett Lee (Inactive) [ 11/Mar/14 ]

No, there was no re-exporting, but each Lustre client did have four (4) mounts of the file system - each mount appearing active via the stats files in /proc.

Comment by Andreas Dilger [ 13/Mar/14 ]

Brett, what was the workload being run here? Something that is creating and deleting files concurrently (e.g. racer), or possibly multiple threads doing "rm -r" on the same tree? Either this is "normal" and maybe we should quiet the error messages, or it might imply some sort of bug on the MDS with inode lookup or files unexpectedly being deleted. Are there application-visible errors that are unexpected ("No such file or directory")?

Comment by Brett Lee (Inactive) [ 27/Mar/14 ]

Andreas, the workload was a mix of real jobs with varying IO patterns, the most prominent of which was many small reads from large files. There was no artificial creating/deleting of files. As for the application, I am now noticing that a setting disabled printing of "some" error and warning messages during this run; however, each job completed successfully. No unexpected application-visible errors were seen.

Comment by Mike O'Connor [ 30/Jan/16 ]

This is being seen at Gulfstream. In their environment, there doesn't appear to be any operational consequence to it. But, it scared them. It'd be nice if we could mute these errors, as discussed in https://jira.hpdd.intel.com/browse/LU-4705?focusedCommentId=79255&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-79255

Comment by Kurt J. Strosahl (Inactive) [ 18/Feb/16 ]

I just saw an instance of this error in the Lustre file system at TJNAF. It is the only instance I can recall being seen here. We are running a pristine Lustre 2.5.3.

To expand a bit more: I have a test environment that I'm using to benchmark OSS systems. Presently I have three OSTs on a single server running Lustre 2.5.3. I've mounted it on a single client and am running IOR tests with the following parameters:

mpirun -np 12 -bynode -machinefile ./nodelist ./ior -F -e -m -g -i 10 -t 1024k -b 42G -o /testL/benchmark/test

where nodelist contains a single node.

Comment by Gerrit Updater [ 13/Sep/17 ]

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/28978
Subject: LU-4705 mdc: improve mdc_enqueue() error message
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9d8f53da6ac5482262c188ba1e0ca3fb395aedfd

Comment by Gerrit Updater [ 24/Oct/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28978/
Subject: LU-4705 mdc: improve mdc_enqueue() error message
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 471c5303eb29d5ea1ba5a683173bda63095dae78

Comment by Peter Jones [ 24/Oct/17 ]

Landed for 2.11

Comment by Gerrit Updater [ 24/Oct/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/29736
Subject: LU-4705 mdc: improve mdc_enqueue() error message
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: b51accd1a652406afbe41ad764d116d0f361a0fb

Comment by Gerrit Updater [ 26/Oct/17 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/29736/
Subject: LU-4705 mdc: improve mdc_enqueue() error message
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: c27470755cf40ee33056011883a0d0600ce00340

Generated at Sat Feb 10 01:45:07 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.