Details
Bug
Resolution: Cannot Reproduce
Critical
None
Lustre 2.4.1
None
SL6.4, 2.4.1 servers and clients with some patches that landed on b2_4 after the 2.4.1 freeze.
3
10838
Description
One of our users noticed a strange problem during metadata operations; it looks like a memory allocation issue:
[root@XXX ~]# ls -l /mnt/lustre/scratch/people/YYYY/SPE.SPIN/050524/28/temp.a438
ls: cannot access /mnt/lustre/scratch/people/YYYY/SPE.SPIN/050524/28/temp.a438: Cannot allocate memory
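For reference, a quick way to check which syscall actually returns ENOMEM and which OST objects back such a file would be something like the following (the path is just the example above; nothing else here is specific to our setup):

# trace the stat family of calls that ls makes; the failing one should show ENOMEM
strace -f -e trace=stat,lstat,newfstatat ls -l /mnt/lustre/scratch/people/YYYY/SPE.SPIN/050524/28/temp.a438

# map the file to its OST objects (obdidx/objid) to confirm OST0013 is involved
lfs getstripe -v /mnt/lustre/scratch/people/YYYY/SPE.SPIN/050524/28/temp.a438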
The client log says:
Oct 1 16:20:11 zeus kernel: LustreError: 11-0: scratch-OST0013-osc-ffff8804925f1400: Communicating with 172.16.126.4@tcp, operation ldlm_enqueue failed with -12.
Oct 1 16:20:11 zeus kernel: LustreError: 23207:0:(cl_lock.c:1420:cl_unuse_try()) result = -12, this is unlikely!
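If more client-side context around the failed enqueue is useful, a standard lctl debug capture while reproducing the error should provide it (the debug mask below is only a suggestion):

# enable DLM and RPC tracing and clear the debug buffer on the client
lctl set_param debug="+dlmtrace +rpctrace"
lctl clear
# ... reproduce the failing ls ...
# dump the kernel debug buffer covering the failure
lctl dk /tmp/lustre-client-debug.log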
OSS log has:
Oct 1 16:20:11 scratch02 kernel: LustreError: 4630:0:(ldlm_resource.c:1165:ldlm_resource_get()) scratch-OST0013: lvbo_init failed for resource 0x40d9dcf:0x0: rc = -2
Of course, both servers and clients still have plenty of memory available. I've looked for similar issues in Jira, but I wasn't able to find a ticket that matches our problem exactly.
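For what it's worth, -12 is -ENOMEM (matching the "Cannot allocate memory" from ls), while the -2 from lvbo_init is -ENOENT, so this might be a missing object on OST0013 rather than a real allocation failure. A rough sketch of how that could be checked on the OSS, assuming an ldiskfs-backed OST (the object id is taken from the resource in the log; the device path is only a placeholder):

# object id from resource 0x40d9dcf, converted to decimal
objid=$(printf '%d' 0x40d9dcf)
# ldiskfs OSTs normally keep objects under O/0/d(objid % 32)/objid;
# a missing inode here would explain the -ENOENT from lvbo_init
debugfs -c -R "stat O/0/d$((objid % 32))/$objid" /dev/PLACEHOLDER_OST0013_DEVICE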