[LU-6145] lfsck-performance test_6: out of memory on MDS Created: 21/Jan/15 Updated: 28/Feb/20 Resolved: 28/Feb/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 17157 |
| Description |
|
This issue was created by maloo for nasf <fan.yong@intel.com>

Please provide additional information about the failure here.

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/b91acdba-a103-11e4-87d1-5254006e85c2.

20:46:44:rpm invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
20:46:44:rpm cpuset=/ mems_allowed=0
20:46:44:Pid: 16160, comm: rpm Not tainted 2.6.32-431.29.2.el6_lustre.gffd1fc2.x86_64 #1
20:46:44:Call Trace:
20:46:44: [<ffffffff810d0791>] ? cpuset_print_task_mems_allowed+0x91/0xb0
20:46:44: [<ffffffff81122b60>] ? dump_header+0x90/0x1b0
20:46:44: [<ffffffff81122cce>] ? check_panic_on_oom+0x4e/0x80
20:46:44: [<ffffffff811233bb>] ? out_of_memory+0x1bb/0x3c0
20:46:44: [<ffffffff8112fd3f>] ? __alloc_pages_nodemask+0x89f/0x8d0
20:46:44: [<ffffffff81167cca>] ? alloc_pages_current+0xaa/0x110
20:46:44: [<ffffffff8111ff57>] ? __page_cache_alloc+0x87/0x90
20:46:44: [<ffffffff8111f93e>] ? find_get_page+0x1e/0xa0
20:46:44: [<ffffffff81120ef7>] ? filemap_fault+0x1a7/0x500
20:46:44: [<ffffffff8114a234>] ? __do_fault+0x54/0x530
20:46:44: [<ffffffff8114a807>] ? handle_pte_fault+0xf7/0xb00
20:46:44: [<ffffffff8114b43a>] ? handle_mm_fault+0x22a/0x300
20:46:44: [<ffffffff8104a8d8>] ? __do_page_fault+0x138/0x480
20:46:44: [<ffffffff811ab820>] ? mntput_no_expire+0x30/0x110
20:46:44: [<ffffffff8118aba1>] ? __fput+0x1a1/0x210
20:46:44: [<ffffffff810890a1>] ? do_sigaction+0x91/0x1d0
20:46:44: [<ffffffff8152f23e>] ? do_page_fault+0x3e/0xa0
20:46:44: [<ffffffff8152c5f5>] ? page_fault+0x25/0x30
20:46:44:Mem-Info:
20:46:44:Node 0 DMA per-cpu:
20:46:44:CPU 0: hi: 0, btch: 1 usd: 0
20:46:44:CPU 1: hi: 0, btch: 1 usd: 0
20:46:44:Node 0 DMA32 per-cpu:
20:46:44:CPU 0: hi: 186, btch: 31 usd: 1
20:46:44:CPU 1: hi: 186, btch: 31 usd: 30
20:46:44:active_anon:79 inactive_anon:74 isolated_anon:0
20:46:44: active_file:111 inactive_file:71 isolated_file:32
20:46:44: unevictable:0 dirty:0 writeback:82 unstable:0
20:46:44: free:13242 slab_reclaimable:2265 slab_unreclaimable:438364
20:46:44: mapped:0 shmem:6 pagetables:621 bounce:0
20:46:44:Node 0 DMA free:8340kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:7404kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
20:46:44:lowmem_reserve[]: 0 2004 2004 2004
20:46:44:Node 0 DMA32 free:44628kB min:44720kB low:55900kB high:67080kB active_anon:316kB inactive_anon:296kB active_file:368kB inactive_file:420kB unevictable:0kB isolated(anon):0kB isolated(file):128kB present:2052308kB mlocked:0kB dirty:0kB writeback:328kB mapped:0kB shmem:24kB slab_reclaimable:9060kB slab_unreclaimable:1746052kB kernel_stack:1720kB pagetables:2484kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:9792 all_unreclaimable? no
20:46:44:lowmem_reserve[]: 0 0 0 0
20:46:44:Node 0 DMA: 1*4kB 0*8kB 1*16kB 0*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 2*2048kB 1*4096kB = 8340kB
20:46:44:Node 0 DMA32: 4719*4kB 2275*8kB 288*16kB 56*32kB 12*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 1 1 0 0 hald-addon-inpu
20:46:44:[ 1303] 68 1303 4483 1 1 0 0 hald-addon-acpi
20:46:44:[ 1342] 0 1342 26827 0 0 0 0 rpc.rquotad
20:46:44:[ 1346] 0 1346 5414 0 0 0 0 rpc.mountd
20:46:44:[ 1381] 0 1381 6291 1 0 0 0 rpc.idmapd
20:46:44:[ 1413] 498 1413 57325 1 0 0 0 munged
20:46:44:[ 1428] 0 1428 16656 0 0 -17 -1000 sshd
20:46:44:[ 1436] 0 1436 5545 1 0 0 0 xinetd
20:46:44:[ 1460] 0 1460 22321 0 1 0 0 sendmail
20:46:44:[ 1468] 51 1468 20183 0 0 0 0 sendmail
20:46:44:[ 1490] 0 1490 29324 1 1 0 0 crond
20:46:44:[ 1501] 0 1501 5385 0 0 0 0 atd
20:46:44:[ 1514] 0 1514 1020 1 1 0 0 agetty
20:46:44:[ 1516] 0 1516 1016 1 1 0 0 mingetty
20:46:44:[ 1518] 0 1518 1016 1 1 0 0 mingetty
20:46:44:[ 1520] 0 1520 1016 1 1 0 0 mingetty
20:46:44:[ 1522] 0 1522 1016 1 0 0 0 mingetty
20:46:44:[ 1523] 0 1523 2663 0 1 -17 -1000 udevd
20:46:44:[ 1524] 0 1524 2696 0 0 -17 -1000 udevd
20:46:44:[ 1526] 0 1526 1016 1 0 0 0 mingetty
20:46:44:[ 1528] 0 1528 1016 1 0 0 0 mingetty
20:46:44:[ 2055] 38 2055 7687 1 0 0 0 ntpd
20:46:44:[22506] 0 22506 4346 0 1 0 0 anacron
20:46:44:[16144] 0 16144 14862 1 0 0 0 in.mrshd
20:46:44:[16145] 0 16145 26515 1 1 0 0 bash
20:46:44:[16159] 0 16159 26515 0 1 0 0 bash
20:46:44:[16160] 0 16160 15217 89 1 0 0 rpm |
| Comments |
| Comment by Oleg Drokin [ 21/Jan/15 ] |
|
So the first strange thing: how come rpm is running? It's not part of the test. Some cron job? Something that came over the network by mistake? This should be possible to see in the crash dump. Additionally, how come we have this panic-on-oom set in this run? TEI-2286 - how did this go in unnoticed, I wonder? |
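For reference, the panic-on-OOM behaviour is controlled by the stock vm.panic_on_oom sysctl (0 = kill a task, 1 or 2 = panic the node), which is what the check_panic_on_oom frame in the trace above consults. A minimal sketch of how one might check the value and where it came from on the MDS, assuming shell access to the node (nothing here is Lustre-specific):

# Current value: 0 = kill the offending task, 1/2 = panic the whole node
sysctl vm.panic_on_oom
cat /proc/sys/vm/panic_on_oom

# Look for whatever set it persistently, e.g. a test-framework or site tunable
grep -rn panic_on_oom /etc/sysctl.conf /etc/sysctl.d/ 2>/dev/null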
| Comment by John Hammond [ 22/Jan/15 ] |
crash> ps
...
  1436      1   0  ffff88007b4f2ae0  IN   0.0   22180      4  xinetd
...
 16144   1436   0  ffff88005404eae0  IN   0.0   59448      4  in.mrshd
 16145  16144   1  ffff880063bbeaa0  IN   0.0  106060      4  bash
 16159  16145   1  ffff8800595bd540  IN   0.0  106060      0  bash
> 16160  16159   1  ffff8800628fb500  RU   0.0   60868    356  rpm

Are we sending messages in a tight loop?

crash> ps | grep -v IN
PID PPID CPU TASK ST %MEM VSZ RSS COMM
0 0 0 ffffffff81a8d020 RU 0.0 0 0 [swapper]
0 0 1 ffff88007e509540 RU 0.0 0 0 [swapper]
23 2 1 ffff88007e5d8080 RU 0.0 0 0 [kblockd/1]
> 1009 1 0 ffff880037471540 RU 0.0 249092 4 rsyslogd
2695 2 0 ffff88007cdf5500 RU 0.0 0 0 [socknal_sd00_01]
2699 2 0 ffff88003790c080 RU 0.0 0 0 [ptlrpcd_0]
2700 2 1 ffff880037bd1500 RU 0.0 0 0 [ptlrpcd_1]
> 16160 16159 1 ffff8800628fb500 RU 0.0 60868 356 rpm
crash> bt 2695
PID: 2695 TASK: ffff88007cdf5500 CPU: 0 COMMAND: "socknal_sd00_01"
#0 [ffff88006afbb810] schedule at ffffffff815296a0
#1 [ffff88006afbb8d8] __cond_resched at ffffffff810695fa
#2 [ffff88006afbb8f8] _cond_resched at ffffffff8152a0e0
#3 [ffff88006afbb908] lock_sock_nested at ffffffff8144ca40
#4 [ffff88006afbb968] tcp_recvmsg at ffffffff814a5b48
#5 [ffff88006afbba78] inet_recvmsg at ffffffff814c750a
#6 [ffff88006afbbab8] sock_recvmsg at ffffffff8144b1c3
#7 [ffff88006afbbc78] kernel_recvmsg at ffffffff8144b234
#8 [ffff88006afbbc98] ksocknal_lib_recv_iov at ffffffffa0a2651a [ksocklnd]
#9 [ffff88006afbbd28] ksocknal_process_receive at ffffffffa0a202aa [ksocklnd]
#10 [ffff88006afbbdc8] ksocknal_scheduler at ffffffffa0a229bb [ksocklnd]
#11 [ffff88006afbbee8] kthread at ffffffff8109abf6
#12 [ffff88006afbbf48] kernel_thread at ffffffff8100c20a
crash> bt 2699
PID: 2699 TASK: ffff88003790c080 CPU: 0 COMMAND: "ptlrpcd_0"
#0 [ffff88007a8d3bb0] schedule at ffffffff815296a0
#1 [ffff88007a8d3c78] __cond_resched at ffffffff810695fa
#2 [ffff88007a8d3c98] _cond_resched at ffffffff8152a0e0
#3 [ffff88007a8d3ca8] ptlrpc_check_set at ffffffffa08045a7 [ptlrpc]
#4 [ffff88007a8d3d68] ptlrpcd_check at ffffffffa0831c63 [ptlrpc]
#5 [ffff88007a8d3dc8] ptlrpcd at ffffffffa083228b [ptlrpc]
#6 [ffff88007a8d3ee8] kthread at ffffffff8109abf6
#7 [ffff88007a8d3f48] kernel_thread at ffffffff8100c20a
crash> bt 2700
PID: 2700 TASK: ffff880037bd1500 CPU: 1 COMMAND: "ptlrpcd_1"
#0 [ffff88007a8d5440] schedule at ffffffff815296a0
#1 [ffff88007a8d5508] __cond_resched at ffffffff810695fa
#2 [ffff88007a8d5528] _cond_resched at ffffffff8152a0e0
#3 [ffff88007a8d5538] shrink_active_list at ffffffff81139ccd
#4 [ffff88007a8d55f8] shrink_mem_cgroup_zone at ffffffff8113aa75
#5 [ffff88007a8d56a8] shrink_zone at ffffffff8113ac3a
#6 [ffff88007a8d5728] do_try_to_free_pages at ffffffff8113ae55
#7 [ffff88007a8d57c8] try_to_free_pages at ffffffff8113b522
#8 [ffff88007a8d5868] __alloc_pages_nodemask at ffffffff8112f91e
#9 [ffff88007a8d59a8] kmem_getpages at ffffffff8116e6b2
#10 [ffff88007a8d59d8] fallback_alloc at ffffffff8116f2ca
#11 [ffff88007a8d5a58] ____cache_alloc_node at ffffffff8116f049
#12 [ffff88007a8d5ab8] __kmalloc at ffffffff8116fe19
#13 [ffff88007a8d5b08] null_alloc_repbuf at ffffffffa084c5ba [ptlrpc]
#14 [ffff88007a8d5b38] sptlrpc_cli_alloc_repbuf at ffffffffa083a7a5 [ptlrpc]
#15 [ffff88007a8d5b68] ptl_send_rpc at ffffffffa080cc51 [ptlrpc]
#16 [ffff88007a8d5c38] ptlrpc_send_new_req at ffffffffa0800cd3 [ptlrpc]
#17 [ffff88007a8d5ca8] ptlrpc_check_set at ffffffffa0804e60 [ptlrpc]
#18 [ffff88007a8d5d68] ptlrpcd_check at ffffffffa0831c63 [ptlrpc]
#19 [ffff88007a8d5dc8] ptlrpcd at ffffffffa08321fa [ptlrpc]
#20 [ffff88007a8d5ee8] kthread at ffffffff8109abf6
#21 [ffff88007a8d5f48] kernel_thread at ffffffff8100c20a
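The DMA32 report in the description shows slab_unreclaimable at roughly 1.7GB out of the node's ~2GB of RAM, so the open questions are which slab cache is holding that memory and whether console/log messages really were flooding in. A possible follow-up in the same crash session, using standard crash commands (a sketch only; the dump may of course show something different):

crash> log          # kernel ring buffer from the dump - look for a message flood
crash> kmem -i      # overall memory usage summary
crash> kmem -s      # per-cache slab statistics; look for the cache(s)
                    # accounting for the ~1.7GB of unreclaimable slab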
|
| Comment by Andreas Dilger [ 28/Feb/20 ] |
|
Closing this old bug, which hasn't been seen in a long time.