[LU-7203] sanity-hsm test_110a: MDS OOM Created: 23/Sep/15 Updated: 10/Oct/21 Resolved: 10/Oct/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
client and server: lustre-master build# 3192 RHEL6.7 DNE |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for sarah_lw <wei3.liu@intel.com> This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/3e45127e-60b3-11e5-ba37-5254006e85c2. The sub-test test_110a failed with the following error: test failed to respond and timed out mds console 16:54:35:Lustre: DEBUG MARKER: == sanity-hsm test 110a: Non blocking restore policy (import case) == 16:27:28 (1442766448) 16:54:35:Lustre: lustre-MDT0000: Client lustre-MDT0003-mdtlov_UUID (at 10.1.6.124@tcp) reconnecting, waiting for 7 clients in recovery for 21188480:59 16:54:35:Lustre: Skipped 696 previous similar messages 16:54:35:Lustre: lustre-MDT0000: Connection restored to 6fadab14-5579-715e-6a00-fbf0bb8ebcc8 (at 10.1.6.123@tcp) 16:54:35:Lustre: Skipped 695 previous similar messages 16:54:35:LustreError: 573:0:(tgt_handler.c:502:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff880052f86080 x1512821167356112/t0(0) o601->lustre-MDT0000-lwp-OST0004_UUID@10.1.6.121@tcp:453/0 lens 336/0 e 0 to 0 dl 1442766948 ref 1 fl Interpret:/0/ffffffff rc 0/-1 16:54:35:LustreError: 573:0:(tgt_handler.c:502:tgt_filter_recovery_request()) Skipped 39 previous similar messages 16:54:35:Lustre: lustre-MDT0000: Client lustre-MDT0002-mdtlov_UUID (at 10.1.6.124@tcp) reconnecting, waiting for 7 clients in recovery for 21188470:59 16:54:35:Lustre: Skipped 699 previous similar messages 16:54:35:Lustre: lustre-MDT0000: Connection restored to 6fadab14-5579-715e-6a00-fbf0bb8ebcc8 (at 10.1.6.123@tcp) 16:54:35:Lustre: Skipped 695 previous similar messages 16:54:35:LustreError: 577:0:(tgt_handler.c:502:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff88005adb7050 x1512821167372272/t0(0) o601->lustre-MDT0000-lwp-OST0001_UUID@10.1.6.121@tcp:304/0 lens 336/0 e 0 to 0 dl 1442767554 ref 1 fl Interpret:/0/ffffffff rc 0/-1 16:54:35:LustreError: 577:0:(tgt_handler.c:502:tgt_filter_recovery_request()) Skipped 41 previous similar messages 16:54:35:Lustre: lustre-MDT0000: Client lustre-MDT0001-mdtlov_UUID (at 10.1.6.124@tcp) reconnecting, waiting for 7 clients in recovery for 21188460:58 16:54:35:Lustre: Skipped 695 previous similar messages 16:54:35:Lustre: lustre-MDT0000: Connection restored to 6fadab14-5579-715e-6a00-fbf0bb8ebcc8 (at 10.1.6.123@tcp) 16:54:35:Lustre: Skipped 694 previous similar messages 16:54:35:mdt_out00_002 invoked oom-killer: gfp_mask=0x80d2, order=0, oom_adj=0, oom_score_adj=0 16:54:35:mdt_out00_002 cpuset=/ mems_allowed=0 16:54:35:Pid: 576, comm: mdt_out00_002 Not tainted 2.6.32-573.3.1.el6_lustre.gc3d2503.x86_64 #1 16:54:35:Call Trace: 16:54:35: [<ffffffff810d7201>] ? cpuset_print_task_mems_allowed+0x91/0xb0 16:54:35: [<ffffffff8112aa00>] ? dump_header+0x90/0x1b0 16:54:35: [<ffffffff8112ab6e>] ? check_panic_on_oom+0x4e/0x80 16:54:35: [<ffffffff8112b25b>] ? out_of_memory+0x1bb/0x3c0 16:54:35: [<ffffffff81137c9c>] ? __alloc_pages_nodemask+0x93c/0x950 16:54:35: [<ffffffff81161469>] ? __vmalloc_area_node+0xb9/0x190 16:54:35: [<ffffffffa082ad4e>] ? ptlrpc_grow_req_bufs+0x33e/0x7e0 [ptlrpc] 16:54:35: [<ffffffff8116133d>] ? __vmalloc_node+0xad/0x120 16:54:35: [<ffffffffa082ad4e>] ? ptlrpc_grow_req_bufs+0x33e/0x7e0 [ptlrpc] 16:54:35: [<ffffffff81161759>] ? vzalloc_node+0x29/0x30 16:54:35: [<ffffffffa082ad4e>] ? ptlrpc_grow_req_bufs+0x33e/0x7e0 [ptlrpc] 16:54:35: [<ffffffffa082db55>] ? ptlrpc_main+0x1265/0x1910 [ptlrpc] 16:54:35: [<ffffffffa082c8f0>] ? ptlrpc_main+0x0/0x1910 [ptlrpc] 16:54:35: [<ffffffff810a101e>] ? kthread+0x9e/0xc0 16:54:35: [<ffffffff8100c28a>] ? child_rip+0xa/0x20 16:54:35: [<ffffffff810a0f80>] ? kthread+0x0/0xc0 16:54:35: [<ffffffff8100c280>] ? child_rip+0x0/0x20 16:54:35:Mem-Info: 16:54:35:Node 0 DMA per-cpu: 16:54:35:CPU 0: hi: 0, btch: 1 usd: 0 16:54:35:CPU 1: hi: 0, btch: 1 usd: 0 16:54:35:Node 0 DMA32 per-cpu: 16:54:35:CPU 0: hi: 186, btch: 31 usd: 0 16:54:35:CPU 1: hi: 186, btch: 31 usd: 30 16:54:35:active_anon:317 inactive_anon:359 isolated_anon:0 16:54:35: active_file:127 inactive_file:21 isolated_file:0 16:54:35: unevictable:0 dirty:0 writeback:372 unstable:0 16:54:35: free:13241 slab_reclaimable:1628 slab_unreclaimable:354951 16:54:35: mapped:9 shmem:1 pagetables:579 bounce:0 16:54:35:Node 0 DMA free:8356kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:7168kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes 16:54:35:lowmem_reserve[]: 0 2004 2004 2004 16:54:35:Node 0 DMA32 free:44608kB min:44720kB low:55900kB high:67080kB active_anon:1268kB inactive_anon:1436kB active_file:256kB inactive_file:340kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052308kB mlocked:0kB dirty:0kB writeback:1488kB mapped:36kB shmem:4kB slab_reclaimable:6512kB slab_unreclaimable:1412636kB kernel_stack:3584kB pagetables:2316kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:15712 all_unreclaimable? no 16:54:35:lowmem_reserve[]: 0 0 0 0 16:54:35:Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 2*2048kB 1*4096kB = 8356kB 16:54:35:Node 0 DMA32: 622*4kB 469*8kB 344*16kB 233*32kB 143*64kB 67*128kB 28*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 44608kB 16:54:35:685 total pagecache pages 16:54:35:516 pages in swap cache 16:54:35:Swap cache stats: add 4038, delete 3522, find 145/145 16:54:35:Free swap = 4112648kB 16:54:35:Total swap = 4128764kB 16:54:35:524284 pages RAM 16:54:35:43736 pages reserved 16:54:35:311 pages shared 16:54:35:122697 pages non-shared 16:54:35:[ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name 16:54:35:[ 523] 0 523 2729 0 1 -17 -1000 udevd 16:54:35:[ 892] 0 892 2729 0 0 -17 -1000 udevd 16:54:35:[ 1036] 0 1036 2728 0 1 -17 -1000 udevd 16:54:35:[ 1294] 0 1294 23286 21 0 -17 -1000 auditd 16:54:35:[ 1328] 0 1328 62369 32 1 0 0 rsyslogd 16:54:35:[ 1362] 0 1362 4561 16 0 0 0 irqbalance 16:54:35:[ 1380] 32 1380 4744 15 1 0 0 rpcbind 16:54:35:[ 1402] 29 1402 5837 1 1 0 0 rpc.statd 16:54:35:[ 1436] 81 1436 6418 1 0 0 0 dbus-daemon 16:54:35:[ 1456] 0 1456 53919 3 1 0 0 ypbind 16:54:35:[ 1490] 0 1490 47233 1 0 0 0 cupsd 16:54:35:[ 1547] 0 1547 1020 0 0 0 0 acpid 16:54:35:[ 1559] 68 1559 10517 9 0 0 0 hald 16:54:35:[ 1560] 0 1560 5099 1 0 0 0 hald-runner 16:54:35:[ 1592] 0 1592 5629 1 0 0 0 hald-addon-inpu 16:54:35:[ 1602] 68 1602 4501 1 0 0 0 hald-addon-acpi 16:54:35:[ 1648] 0 1648 26827 0 0 0 0 rpc.rquotad 16:54:35:[ 1653] 0 1653 5417 0 0 0 0 rpc.mountd 16:54:35:[ 1697] 0 1697 6292 1 0 0 0 rpc.idmapd 16:54:35:[ 1734] 498 1734 57325 192 0 0 0 munged 16:54:35:[ 1845] 0 1845 16555 0 0 -17 -1000 sshd 16:54:35:[ 1856] 0 1856 5429 7 1 0 0 xinetd 16:54:35:[ 1888] 0 1888 22211 22 0 0 0 sendmail 16:54:35:[ 1897] 51 1897 20073 0 0 0 0 sendmail 16:54:35:[ 1925] 0 1925 29216 14 0 0 0 crond 16:54:35:[ 1940] 0 1940 5276 0 0 0 0 atd 16:54:35:[ 1967] 0 1967 1020 1 1 0 0 agetty 16:54:35:[ 1969] 0 1969 1016 1 1 0 0 mingetty 16:54:35:[ 1971] 0 1971 1016 1 1 0 0 mingetty 16:54:35:[ 1973] 0 1973 1016 1 1 0 0 mingetty 16:54:35:[ 1975] 0 1975 1016 1 1 0 0 mingetty 16:54:35:[ 1977] 0 1977 1016 1 0 0 0 mingetty 16:54:35:[ 1979] 0 1979 1016 1 0 0 0 mingetty 16:54:35:[ 2731] 38 2731 7690 1 1 0 0 ntpd 16:54:35:Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled 16:54:35: 16:54:35:Pid: 576, comm: mdt_out00_002 Not tainted 2.6.32-573.3.1.el6_lustre.gc3d2503.x86_64 #1 16:54:35:Call Trace: 16:54:35: [<ffffffff815384e4>] ? panic+0xa7/0x16f 16:54:35: [<ffffffff8112aa71>] ? dump_header+0x101/0x1b0 16:54:35: [<ffffffff8112ab9c>] ? check_panic_on_oom+0x7c/0x80 16:54:35: [<ffffffff8112b25b>] ? out_of_memory+0x1bb/0x3c0 16:54:35: [<ffffffff81137c9c>] ? __alloc_pages_nodemask+0x93c/0x950 16:54:35: [<ffffffff81161469>] ? __vmalloc_area_node+0xb9/0x190 16:54:35: [<ffffffffa082ad4e>] ? ptlrpc_grow_req_bufs+0x33e/0x7e0 [ptlrpc] 16:54:35: [<ffffffff8116133d>] ? __vmalloc_node+0xad/0x120 16:54:35: [<ffffffffa082ad4e>] ? ptlrpc_grow_req_bufs+0x33e/0x7e0 [ptlrpc] 16:54:35: [<ffffffff81161759>] ? vzalloc_node+0x29/0x30 16:54:35: [<ffffffffa082ad4e>] ? ptlrpc_grow_req_bufs+0x33e/0x7e0 [ptlrpc] 16:54:35: [<ffffffffa082db55>] ? ptlrpc_main+0x1265/0x1910 [ptlrpc] 16:54:35: [<ffffffffa082c8f0>] ? ptlrpc_main+0x0/0x1910 [ptlrpc] 16:54:35: [<ffffffff810a101e>] ? kthread+0x9e/0xc0 16:54:35: [<ffffffff8100c28a>] ? child_rip+0xa/0x20 16:54:35: [<ffffffff810a0f80>] ? kthread+0x0/0xc0 16:54:35: [<ffffffff8100c280>] ? child_rip+0x0/0x20 |