[LU-3702] Failure on test suite parallel-scale test_iorssf: client out of memory Created: 05/Aug/13  Updated: 13/Feb/14  Resolved: 13/Feb/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Keith Mannthey (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Environment:

server and client: lustre-master build #1592
client is running SLES11 SP2


Issue Links:
Duplicate: duplicates LU-4357 "page allocation failure. mode:0x40 ca..." (Resolved)
Severity: 3
Rank (Obsolete): 9551

Description

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/16f0f346-fd6e-11e2-9fdb-52540035b04c.

The sub-test test_iorssf failed with the following error:

ior failed! 1

test log

ERROR in aiori-POSIX.c (line 362): cannot get status of written file.
ERROR: Cannot allocate memory
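
For context, the failing step is IOR's stat() of the shared file after the write phase. The sketch below is a hypothetical, simplified illustration of that check (the file path is invented and this is not the actual aiori-POSIX.c source); on this client stat() returned -1 with errno == ENOMEM, which IOR reports as above:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    struct stat st;
    /* Hypothetical test-file path for illustration only. */
    const char *path = (argc > 1) ? argv[1] : "/mnt/lustre/iorfile";

    /* On this client stat() itself failed with ENOMEM, presumably
     * because the Lustre client could not allocate memory for the
     * RPC to the OST (see the dmesg below). */
    if (stat(path, &st) != 0) {
        fprintf(stderr, "ERROR: cannot get status of written file: %s\n",
                strerror(errno));   /* strerror(ENOMEM) == "Cannot allocate memory" */
        exit(EXIT_FAILURE);
    }
    printf("file size: %lld bytes\n", (long long)st.st_size);
    return 0;
}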

client dmesg:

[71266.413667] LNetError: 29131:0:(lib-lnet.h:457:lnet_md_alloc()) LNET: out of memory at /var/lib/jenkins/workspace/lustre-master/arch/x86_64/build_type/client/distro/sles11sp2/ib_stack/inkernel/BUILD/BUILD/lustre-2.4.53/lnet/include/lnet/lib-lnet.h:457 (tried to alloc '(md)' = 4208)
[71266.413673] LNetError: 29131:0:(lib-lnet.h:457:lnet_md_alloc()) Skipped 4 previous similar messages
[71266.413676] LNetError: 29131:0:(lib-lnet.h:457:lnet_md_alloc()) LNET: 10747055 total bytes allocated by lnet
[71266.413679] LNetError: 29131:0:(lib-lnet.h:457:lnet_md_alloc()) Skipped 4 previous similar messages
[71266.413684] LustreError: 29131:0:(niobuf.c:376:ptlrpc_register_bulk()) lustre-OST0003-osc-ffff880065e08000: LNetMDAttach failed x1442416780175324/0: rc = -12
[71266.413687] LustreError: 29131:0:(niobuf.c:376:ptlrpc_register_bulk()) Skipped 4 previous similar messages
[71266.466449] The following is only an harmless informational message.
[71266.466452] Unless you get a _continuous_flood_ of these messages it means
[71266.466454] everything is working fine. Allocations from irqs cannot be
[71266.466455] perfectly reliable and the kernel is designed to handle that.
[71266.466457] ptlrpcd_1: page allocation failure: order:1, mode:0x40
[71266.466460] Pid: 29131, comm: ptlrpcd_1 Tainted: G           N  3.0.80-0.7-default #1
[71266.466462] Call Trace:
[71266.466476]  [<ffffffff810048b5>] dump_trace+0x75/0x310
[71266.466483]  [<ffffffff81444163>] dump_stack+0x69/0x6f
[71266.466488]  [<ffffffff810f7882>] warn_alloc_failed+0x102/0x1a0
[71266.466493]  [<ffffffff810f9369>] __alloc_pages_slowpath+0x559/0x7f0
[71266.466496]  [<ffffffff810f97e9>] __alloc_pages_nodemask+0x1e9/0x200
[71266.466501]  [<ffffffff8113ad16>] kmem_getpages+0x56/0x170
[71266.466505]  [<ffffffff8113bb5b>] fallback_alloc+0x19b/0x270
[71266.466508]  [<ffffffff8113c1f4>] __kmalloc+0x284/0x330
[71266.466527]  [<ffffffffa0515063>] LNetMDAttach+0x163/0x5b0 [lnet]
[71266.466593]  [<ffffffffa07f1848>] ptlrpc_register_bulk+0x258/0x9e0 [ptlrpc]
[71266.466654]  [<ffffffffa07f2a73>] ptl_send_rpc+0x173/0xc30 [ptlrpc]
[71266.466704]  [<ffffffffa07e8990>] ptlrpc_send_new_req+0x4a0/0x870 [ptlrpc]
[71266.466751]  [<ffffffffa07eb958>] ptlrpc_check_set+0x408/0x1af0 [ptlrpc]
[71266.466800]  [<ffffffffa081770b>] ptlrpcd_check+0x52b/0x550 [ptlrpc]
[71266.466862]  [<ffffffffa0817ba7>] ptlrpcd+0x197/0x3a0 [ptlrpc]
[71266.466896]  [<ffffffff8107b666>] kthread+0x96/0xa0
[71266.466901]  [<ffffffff8144ff44>] kernel_thread_helper+0x4/0x10
[71266.466904] Mem-Info:
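
Reading the trace bottom-up: a ptlrpcd thread sends a new bulk RPC, ptlrpc_register_bulk() calls LNetMDAttach(), and the 4208-byte MD allocation ("tried to alloc '(md)' = 4208") is larger than one 4 KiB page, so the slab allocator falls back to an order:1 (two contiguous pages) request, which is what fails. The sketch below is a hypothetical illustration of how that failure propagates (the names md_alloc and register_bulk are stand-ins, not the actual LNet source):

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* "tried to alloc '(md)' = 4208": larger than one 4 KiB page, hence
 * the order:1 slab fallback seen in the trace. */
#define MD_ALLOC_SIZE 4208

/* Hypothetical stand-in for lnet_md_alloc(); per the trace, the kernel
 * version allocates with GFP flags that lack __GFP_WAIT. */
static void *md_alloc(void)
{
    return malloc(MD_ALLOC_SIZE);
}

/* Hypothetical stand-in for the LNetMDAttach()/ptlrpc_register_bulk()
 * error path: a NULL allocation surfaces as rc = -ENOMEM (-12). */
static int register_bulk(void **md_out)
{
    void *md = md_alloc();
    if (md == NULL)
        return -ENOMEM;   /* logged as "LNetMDAttach failed ... rc = -12" */
    memset(md, 0, MD_ALLOC_SIZE);
    *md_out = md;
    return 0;
}

int main(void)
{
    void *md = NULL;
    int rc = register_bulk(&md);
    if (rc != 0)
        fprintf(stderr, "LNetMDAttach failed: rc = %d\n", rc);
    free(md);
    return 0;
}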


Comments
Comment by Keith Mannthey (Inactive) [ 05/Aug/13 ]

The client dmesg shows that allocations were failing all over the place. Are you sure the test was configured correctly?

Are we suddenly using more memory, to the point that the system runs out of memory with the same test parameters?

I would think IOR failing with -ENOMEM would be acceptable as long as the system did not panic.

Comment by Andreas Dilger [ 13/Feb/14 ]

The trace shows mode:0x40 == __GFP_IO, but with __GFP_WAIT missing; this is the same signature as LU-4357.
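
For reference, the reported mode can be decoded with the GFP bit values from the 3.0-era kernel in this trace (taken from include/linux/gfp.h of that kernel generation). The small illustrative program below shows that 0x40 sets __GFP_IO but not __GFP_WAIT, so the allocator may start I/O but may not sleep or enter reclaim:

#include <stdio.h>

/* GFP bit values as defined in include/linux/gfp.h of 3.0-era kernels,
 * reproduced here only to decode the reported allocation mode. */
#define __GFP_WAIT 0x10u   /* allocator may sleep and enter direct reclaim */
#define __GFP_IO   0x40u   /* allocator may start physical I/O */

int main(void)
{
    unsigned int mode = 0x40;  /* "page allocation failure ... mode:0x40" */

    printf("__GFP_IO   set: %s\n", (mode & __GFP_IO)   ? "yes" : "no");
    printf("__GFP_WAIT set: %s\n", (mode & __GFP_WAIT) ? "yes" : "no");
    /* Prints IO=yes, WAIT=no: without __GFP_WAIT an order:1 request
     * fails as soon as free memory is fragmented, since no reclaim or
     * compaction is allowed. GFP_NOFS (__GFP_WAIT | __GFP_IO == 0x50)
     * would permit reclaim; LU-4357 tracks this same missing-__GFP_WAIT
     * signature. */
    return 0;
}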
