[LU-1472] Test failure on test suite parallel-scale-nfsv4, subtest test_iorssf Created: 04/Jun/12  Updated: 29/May/17  Resolved: 29/May/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: Lai Siyao
Resolution: Cannot Reproduce
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 10281

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/4d845966-aa9a-11e1-971d-52540035b04c.

The sub-test test_iorssf failed with the following error:

test failed to respond and timed out

OOM on MDS:
11:43:25:Out of memory: Kill process 1342 (hald) score 1 or sacrifice child
11:43:25:Killed process 1343, UID 0, (hald-runner) total-vm:18088kB, anon-rss:0kB, file-rss:4kB
11:43:25:sendmail invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0, oom_score_adj=0
11:43:25:sendmail cpuset=/ mems_allowed=0
11:43:25:Pid: 1530, comm: sendmail Not tainted 2.6.32-220.17.1.el6_lustre.x86_64 #1
11:43:25:Call Trace:
11:43:25: [<ffffffff810c2f21>] ? cpuset_print_task_mems_allowed+0x91/0xb0
11:43:25: [<ffffffff81113c80>] ? dump_header+0x90/0x1b0
11:43:25: [<ffffffff810e144e>] ? __delayacct_freepages_end+0x2e/0x30
11:43:25: [<ffffffff8120dcbc>] ? security_real_capable_noaudit+0x3c/0x70
11:43:25: [<ffffffff8111410a>] ? oom_kill_process+0x8a/0x2c0
11:43:25: [<ffffffff81113ffe>] ? select_bad_process+0x9e/0x120
11:43:25: [<ffffffff81114560>] ? out_of_memory+0x220/0x3c0
11:43:26: [<ffffffff8112427e>] ? __alloc_pages_nodemask+0x89e/0x940
11:43:26: [<ffffffff8115867a>] ? alloc_pages_vma+0x9a/0x150
11:43:26: [<ffffffff8114b832>] ? read_swap_cache_async+0xf2/0x150
11:43:26: [<ffffffff8114c249>] ? valid_swaphandles+0x69/0x150
11:43:26: [<ffffffff8114b917>] ? swapin_readahead+0x87/0xc0
11:43:26: [<ffffffff8113c0db>] ? handle_pte_fault+0x70b/0xb50
11:43:26: [<ffffffff8113c704>] ? handle_mm_fault+0x1e4/0x2b0
11:43:26: [<ffffffff81042c29>] ? __do_page_fault+0x139/0x480
11:43:26: [<ffffffff81168d37>] ? __mem_cgroup_uncharge_common+0x87/0x270
11:43:26: [<ffffffff811697ee>] ? mem_cgroup_uncharge_swapcache+0x2e/0xb0
11:43:26: [<ffffffff81110b6e>] ? find_get_page+0x1e/0xa0
11:43:26: [<ffffffff81111f83>] ? filemap_fault+0xd3/0x500
11:43:26: [<ffffffff81169b17>] ? mem_cgroup_update_file_mapped+0x17/0x90
11:43:26: [<ffffffff814f2e3e>] ? do_page_fault+0x3e/0xa0
11:43:26: [<ffffffff814f01f5>] ? page_fault+0x25/0x30
11:43:26: [<ffffffff812761e6>] ? copy_user_generic_unrolled+0x86/0xb0
11:43:26: [<ffffffff81010b2e>] ? copy_user_generic+0xe/0x20
11:43:26: [<ffffffff8118b4a9>] ? set_fd_set+0x49/0x60
11:43:26: [<ffffffff8118c85c>] ? core_sys_select+0x1bc/0x2c0
11:43:27: [<ffffffff81042cd4>] ? __do_page_fault+0x1e4/0x480
11:43:27: [<ffffffff81038488>] ? pvclock_clocksource_read+0x58/0xd0
11:43:27: [<ffffffff8103758c>] ? kvm_clock_read+0x1c/0x20
11:43:27: [<ffffffff81037599>] ? kvm_clock_get_cycles+0x9/0x10
11:43:27: [<ffffffff8109b949>] ? ktime_get_ts+0xa9/0xe0
11:43:27: [<ffffffff8118cbb7>] ? sys_select+0x47/0x110
11:43:27: [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
11:43:27:Mem-Info:



 Comments   
Comment by Sarah Liu [ 11/Jun/12 ]

Another failure: https://maloo.whamcloud.com/test_sets/81d2c9b8-b1df-11e1-bb61-52540035b04c

Comment by Peter Jones [ 25/Jun/12 ]

Lai

Could you please look into this one?

Thanks

Peter

Comment by Lai Siyao [ 28/Jun/12 ]

This looks to be the same as LU-907, but this time the OOM is on the MDS. Maloo failed to collect the debug log, so I'll try to reproduce it first.

Comment by Lai Siyao [ 31/Jul/12 ]

This failure is in an NFSv4 test, where the open file handle is cached on the NFS server; therefore LU-907 is not the cause.

Comment by Lai Siyao [ 06/Aug/12 ]

Hi Sarah, I can't reproduce it. Do you always see this OOM failure for this test? If possible, could you reserve a test environment for me in which the failure can be reproduced?

Comment by Jodi Levi (Inactive) [ 06/Aug/12 ]

This cannot be reproduced, and Sarah has not seen the problem in quite some time. Reducing the priority from Blocker to Minor.

Comment by Lai Siyao [ 06/Aug/12 ]

Hi Sarah, is this an interop test? I mean, is it the same test as LU-1699?

Comment by Sarah Liu [ 06/Aug/12 ]

No, this is not an interop test.

Comment by Andreas Dilger [ 29/May/17 ]

Close old ticket.
