Details
Bug
Resolution: Fixed
Major
Lustre 2.9.0
None
review-dne-part-1
3
Description
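sanity test 60a invokes the oom-killer on the MDS and times out in subtest 10b, which writes 65536 llog records. Because panic_on_oom is enabled, the node then panics. The MDS console log from a recent failure shows: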
02:35:42:Lustre: 6822:0:(llog_test.c:1468:llog_test_10()) 10a: create a catalog log with name: 851d315f
02:35:42:Lustre: 6822:0:(llog_test.c:1487:llog_test_10()) 10b: write 65536 log records
02:35:42:ntpd invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
02:35:42:ntpd cpuset=/ mems_allowed=0
02:35:42:Pid: 10244, comm: ntpd Not tainted 2.6.32-573.18.1.el6_lustre.ge5f28dc.x86_64 #1
02:35:42:Call Trace:
02:35:42: [<ffffffff810d71a1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
02:35:42: [<ffffffff8112a9a0>] ? dump_header+0x90/0x1b0
02:35:42: [<ffffffff8112ab0e>] ? check_panic_on_oom+0x4e/0x80
02:35:42: [<ffffffff8112b1fb>] ? out_of_memory+0x1bb/0x3c0
02:35:42: [<ffffffff81137c3c>] ? __alloc_pages_nodemask+0x93c/0x950
02:35:42: [<ffffffff811709ba>] ? alloc_pages_current+0xaa/0x110
02:35:42: [<ffffffff81127d97>] ? __page_cache_alloc+0x87/0x90
02:35:42: [<ffffffff8112777e>] ? find_get_page+0x1e/0xa0
02:35:42: [<ffffffff81128d37>] ? filemap_fault+0x1a7/0x500
02:35:42: [<ffffffff81152314>] ? __do_fault+0x54/0x530
02:35:42: [<ffffffff811a9f30>] ? pollwake+0x0/0x60
02:35:42: [<ffffffff811528e7>] ? handle_pte_fault+0xf7/0xb20
02:35:42: [<ffffffff8153cd75>] ? _read_unlock_bh+0x15/0x20
02:35:42: [<ffffffff8145b227>] ? sock_i_uid+0x47/0x60
02:35:42: [<ffffffff81012bbe>] ? copy_user_generic+0xe/0x20
02:35:42: [<ffffffff811535a9>] ? handle_mm_fault+0x299/0x3d0
02:35:42: [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500
02:35:42: [<ffffffff81042fc9>] ? kvm_clock_get_cycles+0x9/0x10
02:35:42: [<ffffffff810ad40f>] ? ktime_get_ts+0xbf/0x100
02:35:42: [<ffffffff811a9b78>] ? poll_select_copy_remaining+0xf8/0x150
02:35:42: [<ffffffff8153fd1e>] ? do_page_fault+0x3e/0xa0
02:35:42: [<ffffffff8153d0c5>] ? page_fault+0x25/0x30
02:35:42:Mem-Info:
02:35:42:Node 0 DMA per-cpu:
02:35:42:CPU 0: hi: 0, btch: 1 usd: 0
02:35:42:CPU 1: hi: 0, btch: 1 usd: 0
02:35:42:Node 0 DMA32 per-cpu:
02:35:42:CPU 0: hi: 186, btch: 31 usd: 75
02:35:42:CPU 1: hi: 186, btch: 31 usd: 0
02:35:42:active_anon:3065 inactive_anon:3126 isolated_anon:0
02:35:42: active_file:1149 inactive_file:1114 isolated_file:0
02:35:42: unevictable:0 dirty:0 writeback:3125 unstable:0
02:35:42: free:13241 slab_reclaimable:2692 slab_unreclaimable:420617
02:35:42: mapped:1 shmem:3 pagetables:1219 bounce:0
02:35:42:Node 0 DMA free:8356kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4kB slab_unreclaimable:7360kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1 all_unreclaimable? yes
02:35:42:lowmem_reserve[]: 0 2004 2004 2004
02:35:42:Node 0 DMA32 free:44608kB min:44720kB low:55900kB high:67080kB active_anon:12260kB inactive_anon:12504kB active_file:4596kB inactive_file:4452kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052308kB mlocked:0kB dirty:0kB writeback:3324kB mapped:4kB shmem:12kB slab_reclaimable:10764kB slab_unreclaimable:1674856kB kernel_stack:4592kB pagetables:4876kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:279133 all_unreclaimable? yes
02:35:42:lowmem_reserve[]: 0 0 0 0
02:35:42:Node 0 DMA: 3*4kB 1*8kB 3*16kB 3*32kB 2*64kB 3*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 0*4096kB = 8356kB
02:35:42:Node 0 DMA32: 1325*4kB 753*8kB 501*16kB 426*32kB 90*64kB 2*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 44876kB
02:35:42:5204 total pagecache pages
02:35:42:2638 pages in swap cache
02:35:42:Swap cache stats: add 3431, delete 825, find 6/6
02:35:42:Free swap = 4115064kB
02:35:42:Total swap = 4128764kB
02:35:42:524284 pages RAM
02:35:42:43737 pages reserved
02:35:42:2420 pages shared
02:35:42:437717 pages non-shared
02:35:42:[ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
02:35:42:[ 456] 0 456 2695 4 0 -17 -1000 udevd
02:35:42:[ 761] 0 761 2695 9 0 -17 -1000 udevd
02:35:42:[ 915] 0 915 2694 12 0 -17 -1000 udevd
02:35:42:[ 1175] 0 1175 2280 1 0 0 0 dhclient
02:35:42:[ 1238] 0 1238 6905 13 1 -17 -1000 auditd
02:35:42:[ 1272] 0 1272 63854 276 0 0 0 rsyslogd
02:35:42:[ 1306] 0 1306 4561 1 0 0 0 irqbalance
02:35:42:[ 1324] 32 1324 4744 1 0 0 0 rpcbind
02:35:42:[ 1340] 0 1340 52816 5 0 0 0 sssd
02:35:42:[ 1341] 0 1341 74242 55 1 0 0 sssd_be
02:35:42:[ 1342] 0 1342 54172 6 1 0 0 sssd_nss
02:35:42:[ 1343] 0 1343 50533 2 1 0 0 sssd_pam
02:35:42:[ 1344] 0 1344 50020 26 0 0 0 sssd_ssh
02:35:42:[ 1345] 0 1345 55112 2 1 0 0 sssd_pac
02:35:42:[ 1367] 29 1367 6357 1 1 0 0 rpc.statd
02:35:42:[ 1401] 81 1401 5878 3 0 0 0 dbus-daemon
02:35:42:[ 1422] 0 1422 47234 1 1 0 0 cupsd
02:35:42:[ 1466] 0 1466 1020 0 0 0 0 acpid
02:35:42:[ 1478] 68 1478 10493 1 0 0 0 hald
02:35:42:[ 1479] 0 1479 5099 1 1 0 0 hald-runner
02:35:42:[ 1511] 0 1511 5629 1 0 0 0 hald-addon-inpu
02:35:42:[ 1523] 68 1523 4501 1 0 0 0 hald-addon-acpi
02:35:42:[ 1547] 0 1547 169290 182 1 0 0 automount
02:35:42:[ 1601] 0 1601 26827 0 0 0 0 rpc.rquotad
02:35:42:[ 1606] 0 1606 5417 0 0 0 0 rpc.mountd
02:35:42:[ 1651] 0 1651 5774 1 0 0 0 rpc.idmapd
02:35:42:[ 1688] 496 1688 56786 138 0 0 0 munged
02:35:42:[ 8272] 0 8272 16556 0 1 -17 -1000 sshd
02:35:42:[ 8283] 0 8283 5429 36 1 0 0 xinetd
02:35:42:[ 8373] 0 8373 20737 227 0 0 0 master
02:35:42:[ 8387] 89 8387 20757 219 0 0 0 pickup
02:35:42:[ 8388] 89 8388 20800 221 0 0 0 qmgr
02:35:42:[ 8402] 0 8402 29215 151 0 0 0 crond
02:35:42:[ 8421] 0 8421 5276 46 1 0 0 atd
02:35:42:[ 8453] 0 8453 16112 187 1 0 0 certmonger
02:35:42:[ 8482] 0 8482 1020 23 1 0 0 agetty
02:35:42:[ 8486] 0 8486 1016 21 1 0 0 mingetty
02:35:42:[ 8491] 0 8491 1016 22 1 0 0 mingetty
02:35:42:[ 8494] 0 8494 1016 22 0 0 0 mingetty
02:35:42:[ 8499] 0 8499 1016 21 1 0 0 mingetty
02:35:42:[ 8504] 0 8504 1016 21 1 0 0 mingetty
02:35:42:[ 8507] 0 8507 1016 21 0 0 0 mingetty
02:35:42:[10244] 38 10244 8207 161 1 0 0 ntpd
02:35:42:[30737] 0 30737 4237 52 1 0 0 anacron
02:35:42:[ 6720] 0 6720 15806 178 0 0 0 in.mrshd
02:35:42:[ 6721] 0 6721 26515 66 0 0 0 bash
02:35:42:[ 6746] 0 6746 26515 66 1 0 0 bash
02:35:42:[ 6747] 0 6747 27593 618 0 0 0 sh
02:35:42:[ 6822] 0 6822 28024 73 0 0 0 lctl
02:35:42:Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
02:35:42:
02:35:42:Pid: 10244, comm: ntpd Not tainted 2.6.32-573.18.1.el6_lustre.ge5f28dc.x86_64 #1
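As a rough reading of the Mem-Info section above, the sketch below computes how much of RAM is held in unreclaimable slab. It is only an illustration: the excerpt is copied from the log, the 4 kB page size is an assumption (it matches the "N*4kB" buddy lists on this x86_64 kernel), and the helper names are hypothetical. The result is consistent with kernel-side allocations rather than user processes exhausting memory, since nearly all swap is still free and no process in the task list has a large rss, but it does not identify which slab cache is responsible.

```python
import re

# Excerpt copied verbatim from the Mem-Info section of the console log.
MEM_INFO = """\
02:35:42: free:13241 slab_reclaimable:2692 slab_unreclaimable:420617
02:35:42:524284 pages RAM
02:35:42:43737 pages reserved
"""

# Assumption: 4 kB pages, as suggested by the "N*4kB" buddy lists in the log.
PAGE_KB = 4


def parse_counters(text):
    """Return the 'name:value' page counters (e.g. slab_unreclaimable) as a dict."""
    return {name: int(value) for name, value in re.findall(r"([a-z_]+):(\d+)", text)}


def pages_ram(text):
    """Return the page count from the 'N pages RAM' line."""
    match = re.search(r"(\d+) pages RAM", text)
    if match is None:
        raise ValueError("no 'pages RAM' line found")
    return int(match.group(1))


counters = parse_counters(MEM_INFO)
total_pages = pages_ram(MEM_INFO)

slab_kb = counters["slab_unreclaimable"] * PAGE_KB
share = counters["slab_unreclaimable"] / total_pages

print(f"slab_unreclaimable: {slab_kb} kB ({share:.0%} of RAM)")
# prints: slab_unreclaimable: 1682468 kB (80% of RAM)
```

The page-based figure is close to the per-zone values in the same log (7360kB in DMA plus 1674856kB in DMA32); the small difference is expected because the zone lines and the global counters are not sampled at exactly the same instant.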
Here are recent logs for this type of failure:
2016-03-10
https://testing.hpdd.intel.com/test_sets/2e6d4b9a-e700-11e5-bdde-5254006e85c2
https://testing.hpdd.intel.com/test_sets/6c42ab98-e6d0-11e5-8590-5254006e85c2
2016-03-14
https://testing.hpdd.intel.com/test_sets/8f4538da-ea7d-11e5-8779-5254006e85c2
https://testing.hpdd.intel.com/test_sets/bc22e04c-ea4f-11e5-8606-5254006e85c2
https://testing.hpdd.intel.com/test_sets/7dc47e5e-ea41-11e5-8606-5254006e85c2
https://testing.hpdd.intel.com/test_sets/a332f46a-e9fe-11e5-8186-5254006e85c2