[LU-3702] Failure on test suite parallel-scale test_iorssf: client out of memory Created: 05/Aug/13 Updated: 13/Feb/14 Resolved: 13/Feb/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Keith Mannthey (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Environment: |
server and client : lustre-master build #1592 |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9551 | ||||||||
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>.
This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/16f0f346-fd6e-11e2-9fdb-52540035b04c.
The sub-test test_iorssf failed with the following error:

test log:
ERROR in aiori-POSIX.c (line 362): cannot get status of written file.
ERROR: Cannot allocate memory

client dmesg:
[71266.413667] LNetError: 29131:0:(lib-lnet.h:457:lnet_md_alloc()) LNET: out of memory at /var/lib/jenkins/workspace/lustre-master/arch/x86_64/build_type/client/distro/sles11sp2/ib_stack/inkernel/BUILD/BUILD/lustre-2.4.53/lnet/include/lnet/lib-lnet.h:457 (tried to alloc '(md)' = 4208)
[71266.413673] LNetError: 29131:0:(lib-lnet.h:457:lnet_md_alloc()) Skipped 4 previous similar messages
[71266.413676] LNetError: 29131:0:(lib-lnet.h:457:lnet_md_alloc()) LNET: 10747055 total bytes allocated by lnet
[71266.413679] LNetError: 29131:0:(lib-lnet.h:457:lnet_md_alloc()) Skipped 4 previous similar messages
[71266.413684] LustreError: 29131:0:(niobuf.c:376:ptlrpc_register_bulk()) lustre-OST0003-osc-ffff880065e08000: LNetMDAttach failed x1442416780175324/0: rc = -12
[71266.413687] LustreError: 29131:0:(niobuf.c:376:ptlrpc_register_bulk()) Skipped 4 previous similar messages
[71266.466449] The following is only an harmless informational message.
[71266.466452] Unless you get a _continuous_flood_ of these messages it means
[71266.466454] everything is working fine. Allocations from irqs cannot be
[71266.466455] perfectly reliable and the kernel is designed to handle that.
[71266.466457] ptlrpcd_1: page allocation failure: order:1, mode:0x40
[71266.466460] Pid: 29131, comm: ptlrpcd_1 Tainted: G N 3.0.80-0.7-default #1
[71266.466462] Call Trace:
[71266.466476] [<ffffffff810048b5>] dump_trace+0x75/0x310
[71266.466483] [<ffffffff81444163>] dump_stack+0x69/0x6f
[71266.466488] [<ffffffff810f7882>] warn_alloc_failed+0x102/0x1a0
[71266.466493] [<ffffffff810f9369>] __alloc_pages_slowpath+0x559/0x7f0
[71266.466496] [<ffffffff810f97e9>] __alloc_pages_nodemask+0x1e9/0x200
[71266.466501] [<ffffffff8113ad16>] kmem_getpages+0x56/0x170
[71266.466505] [<ffffffff8113bb5b>] fallback_alloc+0x19b/0x270
[71266.466508] [<ffffffff8113c1f4>] __kmalloc+0x284/0x330
[71266.466527] [<ffffffffa0515063>] LNetMDAttach+0x163/0x5b0 [lnet]
[71266.466593] [<ffffffffa07f1848>] ptlrpc_register_bulk+0x258/0x9e0 [ptlrpc]
[71266.466654] [<ffffffffa07f2a73>] ptl_send_rpc+0x173/0xc30 [ptlrpc]
[71266.466704] [<ffffffffa07e8990>] ptlrpc_send_new_req+0x4a0/0x870 [ptlrpc]
[71266.466751] [<ffffffffa07eb958>] ptlrpc_check_set+0x408/0x1af0 [ptlrpc]
[71266.466800] [<ffffffffa081770b>] ptlrpcd_check+0x52b/0x550 [ptlrpc]
[71266.466862] [<ffffffffa0817ba7>] ptlrpcd+0x197/0x3a0 [ptlrpc]
[71266.466896] [<ffffffff8107b666>] kthread+0x96/0xa0
[71266.466901] [<ffffffff8144ff44>] kernel_thread_helper+0x4/0x10
[71266.466904] Mem-Info: |
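For readers decoding the `order:1, mode:0x40` fields of the page allocation failure above, here is a minimal sketch. The helper names (`decode_gfp`, `order_bytes`) are hypothetical and not part of Lustre or the kernel. The flag values are assumed to match the low GFP bits in Linux 3.0's include/linux/gfp.h, and the page size is assumed to be 4 KiB (x86_64); other kernel versions use different bit assignments.

```python
# Hypothetical helpers for reading a "page allocation failure" line.
# Assumption: low GFP bits as defined in Linux 3.0 include/linux/gfp.h.
PAGE_SIZE = 4096  # assumption: x86_64 with 4 KiB pages

GFP_FLAGS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}


def decode_gfp(mode):
    """Return the names of the known low GFP bits set in mode."""
    names = [name for bit, name in sorted(GFP_FLAGS.items()) if mode & bit]
    unknown = mode & ~sum(GFP_FLAGS)  # bits outside the table above
    if unknown:
        names.append(hex(unknown))
    return names


def order_bytes(order):
    """Size in bytes of a contiguous allocation of 2**order pages."""
    return PAGE_SIZE << order


if __name__ == "__main__":
    print(decode_gfp(0x40))   # flags set in mode:0x40
    print(order_bytes(1))     # bytes requested by an order:1 allocation
```

Under these assumptions, mode:0x40 has only __GFP_IO set (no __GFP_WAIT), and order:1 is an 8192-byte contiguous request, which is consistent with the 4208-byte '(md)' allocation not fitting in a single 4 KiB page.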
| Comments |
| Comment by Keith Mannthey (Inactive) [ 05/Aug/13 ] |
|
The client dmesg shows that allocations were failing all over the place. Are you sure the test was configured correctly? Or are we suddenly using more memory, to the point that the system runs out of memory with the same test parameters? I would think IOR failing with -ENOMEM would be OK as long as the system did not panic. |
| Comment by Andreas Dilger [ 13/Feb/14 ] |
|
Shows mode:0x40 == __GFP_IO, but missing __GFP_WAIT from |