Details
Bug
Resolution: Cannot Reproduce
Major
None
Lustre 2.4.0
3
4115
Description
This issue was created by maloo for Li Wei <liwei@whamcloud.com>
This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/b21b4ae6-049b-11e2-bfd4-52540035b04c.
The OSS went OOM. From the OSS console:
01:36:27:Lustre: DEBUG MARKER: /usr/sbin/lctl mark ===== mdsrate-create-large.sh ### 1 NODE CREATE ###
01:36:27:Lustre: DEBUG MARKER: ===== mdsrate-create-large.sh
01:43:20:Lustre: 3565:0:(client.c:1905:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1348216986/real 0] req@ffff880011e35000 x1413704938685180/t0(0) o400->MGC10.10.4.186@tcp@10.10.4.186@tcp:26/25 lens 224/224 e 0 to 1 dl 1348216993 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
01:43:20:LustreError: 166-1: MGC10.10.4.186@tcp: Connection to MGS (at 10.10.4.186@tcp) was lost; in progress operations using this service will fail
01:43:20:Lustre: 3563:0:(client.c:1905:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1348216993/real 0] req@ffff88007ad55c00 x1413704938685182/t0(0) o250->MGC10.10.4.186@tcp@10.10.4.186@tcp:26/25 lens 400/544 e 0 to 1 dl 1348216999 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
01:43:31:auditd invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=-17, oom_score_adj=-1000
01:43:31:auditd cpuset=/ mems_allowed=0
01:43:31:Pid: 1143, comm: auditd Not tainted 2.6.32-279.5.1.el6_lustre.g7f15218.x86_64 #1
01:43:31:Call Trace:
01:43:31: [<ffffffff810c4aa1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
01:43:31: [<ffffffff81117210>] ? dump_header+0x90/0x1b0
01:43:31: [<ffffffff810e368e>] ? __delayacct_freepages_end+0x2e/0x30
01:43:31: [<ffffffff8121489c>] ? security_real_capable_noaudit+0x3c/0x70
01:43:31: [<ffffffff81117692>] ? oom_kill_process+0x82/0x2a0
01:43:31: [<ffffffff8111758e>] ? select_bad_process+0x9e/0x120
01:43:31: [<ffffffff81117ad0>] ? out_of_memory+0x220/0x3c0
01:43:31: [<ffffffff81136a89>] ? zone_statistics+0x99/0xc0
01:43:31: [<ffffffff811277ee>] ? __alloc_pages_nodemask+0x89e/0x940
01:43:31: [<ffffffff8115c30a>] ? alloc_pages_current+0xaa/0x110
01:43:31: [<ffffffff81114617>] ? __page_cache_alloc+0x87/0x90
01:43:31: [<ffffffff8112a23b>] ? __do_page_cache_readahead+0xdb/0x210
01:43:31: [<ffffffff8112a391>] ? ra_submit+0x21/0x30
01:43:31: [<ffffffff81115943>] ? filemap_fault+0x4c3/0x500
01:43:31: [<ffffffff8113ed44>] ? __do_fault+0x54/0x510
01:43:31: [<ffffffff8113f2f7>] ? handle_pte_fault+0xf7/0xb50
01:43:31: [<ffffffff814fdca0>] ? thread_return+0x4e/0x76e
01:43:31: [<ffffffff81096fa3>] ? __hrtimer_start_range_ns+0x1a3/0x460
01:43:31: [<ffffffff81096481>] ? lock_hrtimer_base+0x31/0x60
01:43:31: [<ffffffff810972df>] ? hrtimer_try_to_cancel+0x3f/0xd0
01:43:31: [<ffffffff8113ff34>] ? handle_mm_fault+0x1e4/0x2b0
01:43:31: [<ffffffff81044479>] ? __do_page_fault+0x139/0x480
01:43:31: [<ffffffff811beed6>] ? ep_poll+0x306/0x330
01:43:31: [<ffffffff81060250>] ? default_wake_function+0x0/0x20
01:43:31: [<ffffffff815036de>] ? do_page_fault+0x3e/0xa0
01:43:31: [<ffffffff81500a95>] ? page_fault+0x25/0x30
01:43:31:Mem-Info:
01:43:31:Node 0 DMA per-cpu:
01:43:31:CPU 0: hi: 0, btch: 1 usd: 0
01:43:31:Node 0 DMA32 per-cpu:
01:43:31:CPU 0: hi: 186, btch: 31 usd: 95
01:43:31:active_anon:0 inactive_anon:1 isolated_anon:0
01:43:31: active_file:3080 inactive_file:2931 isolated_file:0
01:43:31: unevictable:0 dirty:0 writeback:0 unstable:0
01:43:31: free:13267 slab_reclaimable:410440 slab_unreclaimable:11251
01:43:31: mapped:1 shmem:0 pagetables:726 bounce:0
01:43:31:Node 0 DMA free:8352kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:160kB inactive_file:112kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15324kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:6964kB slab_unreclaimable:132kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:123126 all_unreclaimable? yes
01:43:31:lowmem_reserve[]: 0 2003 2003 2003
01:43:31:Node 0 DMA32 free:44716kB min:44720kB low:55900kB high:67080kB active_anon:0kB inactive_anon:4kB active_file:12160kB inactive_file:11612kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052064kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:1634796kB slab_unreclaimable:44872kB kernel_stack:1624kB pagetables:2904kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:663531 all_unreclaimable? yes
01:43:31:lowmem_reserve[]: 0 0 0 0
01:43:31:Node 0 DMA: 34*4kB 35*8kB 20*16kB 16*32kB 9*64kB 7*128kB 2*256kB 2*512kB 2*1024kB 1*2048kB 0*4096kB = 8352kB
01:43:31:Node 0 DMA32: 3943*4kB 1842*8kB 412*16kB 48*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 44716kB
01:43:31:6012 total pagecache pages
01:43:31:1 pages in swap cache
01:43:31:Swap cache stats: add 4103, delete 4102, find 33/55
01:43:31:Free swap = 4112960kB
01:43:31:Total swap = 4128760kB
01:43:31:524284 pages RAM
01:43:31:43628 pages reserved
01:43:31:6039 pages shared
01:43:31:456143 pages non-shared
01:43:31:[ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
01:43:31:[ 469] 0 469 2754 0 0 -17 -1000 udevd
01:43:31:[ 796] 0 796 2664 0 0 -17 -1000 udevd
01:43:31:[ 1143] 0 1143 6914 1 0 -17 -1000 auditd
01:43:31:[ 1159] 0 1159 62270 1 0 0 0 rsyslogd
01:43:31:[ 1201] 32 1201 4742 1 0 0 0 rpcbind
01:43:31:[ 1213] 0 1213 45433 1 0 0 0 sssd
01:43:31:[ 1217] 0 1217 50973 1 0 0 0 sssd_be
01:43:31:[ 1225] 0 1225 42895 1 0 0 0 sssd_nss
01:43:31:[ 1226] 0 1226 42844 1 0 0 0 sssd_pam
01:43:31:[ 1233] 29 1233 6353 1 0 0 0 rpc.statd
01:43:31:[ 1359] 81 1359 5867 1 0 0 0 dbus-daemon
01:43:31:[ 1392] 0 1392 1018 0 0 0 0 acpid
01:43:31:[ 1401] 68 1401 6785 1 0 0 0 hald
01:43:31:[ 1402] 0 1402 4524 1 0 0 0 hald-runner
01:43:31:[ 1430] 0 1430 5053 1 0 0 0 hald-addon-inpu
01:43:31:[ 1440] 68 1440 4449 1 0 0 0 hald-addon-acpi
01:43:31:[ 1461] 0 1461 150867 2 0 0 0 automount
01:43:31:[ 1502] 0 1502 26825 0 0 0 0 rpc.rquotad
01:43:31:[ 1506] 0 1506 5412 0 0 0 0 rpc.mountd
01:43:31:[ 1555] 0 1555 6289 1 0 0 0 rpc.idmapd
01:43:31:[ 1580] 0 1580 16016 0 0 -17 -1000 sshd
01:43:31:[ 1588] 0 1588 5521 1 0 0 0 xinetd
01:43:31:[ 1596] 38 1596 7003 1 0 0 0 ntpd
01:43:31:[ 1612] 0 1612 22182 0 0 0 0 sendmail
01:43:31:[ 1620] 51 1620 19528 0 0 0 0 sendmail
01:43:31:[ 1642] 0 1642 27016 1 0 0 0 abrt-dump-oops
01:43:31:[ 1650] 0 1650 29302 1 0 0 0 crond
01:43:31:[ 1661] 0 1661 5362 0 0 0 0 atd
01:43:31:[ 1686] 0 1686 1017 1 0 0 0 agetty
01:43:31:[ 1688] 0 1688 1014 1 0 0 0 mingetty
01:43:31:[ 1690] 0 1690 1014 1 0 0 0 mingetty
01:43:31:[ 1692] 0 1692 1014 1 0 0 0 mingetty
01:43:31:[ 1694] 0 1694 2753 0 0 -17 -1000 udevd
01:43:31:[ 1695] 0 1695 1014 1 0 0 0 mingetty
01:43:31:[ 1697] 0 1697 1014 1 0 0 0 mingetty
01:43:31:[ 1699] 0 1699 1014 1 0 0 0 mingetty
01:43:31:Out of memory: Kill process 1159 (rsyslogd) score 1 or sacrifice child
01:43:31:Killed process 1159, UID 0, (rsyslogd) total-vm:249080kB, anon-rss:0kB, file-rss:4kB
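As a reading aid for the Mem-Info section above, the page counters can be converted into sizes. The following is a minimal, hypothetical Python helper (not part of Lustre or any kernel tooling); it assumes 4 KiB pages, the x86_64 default, and hard-codes values copied from this log. With those assumptions, slab_reclaimable (410440 pages) comes to about 1.6 GB, roughly 78% of the 524284 pages of RAM, and the DMA32 zone reports free:44716kB against min:44720kB, i.e. just under its min watermark.

```python
import re

PAGE_KIB = 4  # assumption: 4 KiB pages, the x86_64 default

# Counters copied from the console log above (values are in pages).
MEMINFO = ("free:13267 slab_reclaimable:410440 slab_unreclaimable:11251 "
           "active_file:3080 inactive_file:2931")
TOTAL_PAGES = 524284  # from "524284 pages RAM"

# DMA32 zone watermarks from the same log (values are already in kB).
DMA32 = "free:44716kB min:44720kB low:55900kB high:67080kB"


def parse_counters(text: str) -> dict:
    """Map each 'name:value' token to an int; a trailing 'kB' unit is ignored."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", text)}


def share_of_ram(counters: dict, total_pages: int) -> dict:
    """Return (MiB, percent of total RAM) for a few key page counters."""
    result = {}
    for name in ("free", "slab_reclaimable", "slab_unreclaimable"):
        pages = counters[name]
        result[name] = (pages * PAGE_KIB / 1024, 100.0 * pages / total_pages)
    return result


def below_min_watermark(zone: dict) -> bool:
    """Simplified check: is the zone's free memory under its min watermark?

    The kernel's own test (zone_watermark_ok) also accounts for allocation
    order and lowmem_reserve; this sketch compares the raw numbers only.
    """
    return zone["free"] < zone["min"]


if __name__ == "__main__":
    for name, (mib, pct) in share_of_ram(parse_counters(MEMINFO), TOTAL_PAGES).items():
        print(f"{name:20s} {mib:8.1f} MiB {pct:5.1f}% of RAM")
    print("DMA32 below min watermark:", below_min_watermark(parse_counters(DMA32)))
```

The counters are parsed with a single regular expression rather than a full Mem-Info parser, which is enough for the space-separated `name:value` tokens printed by this kernel version.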