[LU-10056] sanity test_60a invokes oom-killer in subtest 7f and times out Created: 02/Oct/17 Updated: 27/Jan/18 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Jian Yu | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Console log on MDS:

Lustre: 32199:0:(llog_test.c:1018:llog_test_7_sub()) 7_sub: records are not aligned, written 64071 from 64767
Lustre: 32199:0:(llog_test.c:1124:llog_test_7()) 7f: test llog_changelog_user_rec
sssd_ssh invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
sssd_ssh cpuset=/ mems_allowed=0
CPU: 0 PID: 665 Comm: sssd_ssh Tainted: P OE ------------ 3.10.0-693.1.1.el7_lustre.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
 ffff88003690dee0 000000007415b4cd ffff88007a0b39f0 ffffffff816a3d6d
 ffff88007a0b3a80 ffffffff8169f186 ffff88007a0b3ae8 ffff88007a0b3a40
 ffffffff816b04dc ffffffff81a6ea00 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff816a3d6d>] dump_stack+0x19/0x1b
 [<ffffffff8169f186>] dump_header+0x90/0x229
 [<ffffffff816b04dc>] ? notifier_call_chain+0x4c/0x70
 [<ffffffff810b6ab8>] ? __blocking_notifier_call_chain+0x58/0x70
 [<ffffffff8118653e>] check_panic_on_oom+0x2e/0x60
 [<ffffffff8118695b>] out_of_memory+0x23b/0x4f0
 [<ffffffff8169fc8a>] __alloc_pages_slowpath+0x5d6/0x724
 [<ffffffff8118cd85>] __alloc_pages_nodemask+0x405/0x420
 [<ffffffff811d412f>] alloc_pages_vma+0xaf/0x1f0
 [<ffffffff811c3830>] ? end_swap_bio_write+0x80/0x80
 [<ffffffff811c453d>] read_swap_cache_async+0xed/0x160
 [<ffffffff811c4658>] swapin_readahead+0xa8/0x110
 [<ffffffff811b235b>] handle_mm_fault+0xadb/0xfa0
 [<ffffffff8109ea4c>] ? signal_setup_done+0x3c/0x60
 [<ffffffff816affb4>] __do_page_fault+0x154/0x450
 [<ffffffff816b02e5>] do_page_fault+0x35/0x90
 [<ffffffff816ac508>] page_fault+0x28/0x30

Maloo reports: |
| Comments |
| Comment by Bruno Faccini (Inactive) [ 24/Oct/17 ] |
|
+1 at https://testing.hpdd.intel.com/test_sets/68e44f30-b87d-11e7-9abd-52540065bddc

I have done some debugging on the associated MDS crash-dump caused by the OOM. It looks like, again, the kmalloc-512 kmem_cache's slabs consume almost all of the available memory (>1.2GB vs 1.6GB), like for

Could it be that something in the auto-test VMs/OS/daemons/... configuration has changed? And thus do we need to apply the same fix (dt_sync() calls to flush journal callbacks) to the earlier llog_test sub-tests rather than only to llog_test_10()? |
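For context on the proposed fix, below is a minimal sketch of what "dt_sync() calls to flush journal callbacks" between sub-tests could look like. This is not the actual patch: the helper name llog_test_flush_journal() is invented for illustration, and the assumption that the backing dt_device is reachable via obd->obd_lvfs_ctxt.dt (as in the llog_test_10() change) is exactly that, an assumption.

```c
/* Sketch only, following Lustre in-tree include conventions. */
#include <dt_object.h>
#include <obd_class.h>

/*
 * Hypothetical helper: force the backend journal to commit between
 * llog_test sub-tests. The idea mirrors the llog_test_10() fix: a
 * synchronous commit runs the pending journal commit callbacks, which
 * releases the llog-related allocations that otherwise pile up in the
 * kmalloc-512 slab cache until the VM runs out of memory.
 */
static int llog_test_flush_journal(const struct lu_env *env,
				   struct obd_device *obd)
{
	/* Assumption: the test obd exposes its dt_device here. */
	struct dt_device *dt = obd->obd_lvfs_ctxt.dt;
	int rc;

	/* dt_sync() blocks until the backend transaction commit, so by
	 * the time it returns the commit callbacks have run and freed
	 * their kmalloc-512 allocations. */
	rc = dt_sync(env, dt);
	if (rc)
		CERROR("%s: failed to sync journal between sub-tests: rc = %d\n",
		       obd->obd_name, rc);
	return rc;
}
```

If something like this were called at the end of llog_test_7() and the other early sub-tests, the memory pinned by pending commit callbacks would be released before the next sub-test starts allocating, at the cost of one synchronous commit wait per sub-test.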
| Comment by Andreas Dilger [ 08/Nov/17 ] |
|
This has been hit a few more times in the past 4 weeks. |
| Comment by Bob Glossman (Inactive) [ 27/Jan/18 ] |
|
Another occurrence on b2_10: |