[LU-7203] sanity-hsm test_110a: MDS OOM Created: 23/Sep/15  Updated: 10/Oct/21  Resolved: 10/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

client and server: lustre-master build# 3192 RHEL6.7 DNE


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/3e45127e-60b3-11e5-ba37-5254006e85c2.

The sub-test test_110a failed with the following error:

test failed to respond and timed out

mds console

16:54:35:Lustre: DEBUG MARKER: == sanity-hsm test 110a: Non blocking restore policy (import case) == 16:27:28 (1442766448)
16:54:35:Lustre: lustre-MDT0000: Client lustre-MDT0003-mdtlov_UUID (at 10.1.6.124@tcp) reconnecting, waiting for 7 clients in recovery for 21188480:59
16:54:35:Lustre: Skipped 696 previous similar messages
16:54:35:Lustre: lustre-MDT0000: Connection restored to 6fadab14-5579-715e-6a00-fbf0bb8ebcc8 (at 10.1.6.123@tcp)
16:54:35:Lustre: Skipped 695 previous similar messages
16:54:35:LustreError: 573:0:(tgt_handler.c:502:tgt_filter_recovery_request()) @@@ not permitted during recovery  req@ffff880052f86080 x1512821167356112/t0(0) o601->lustre-MDT0000-lwp-OST0004_UUID@10.1.6.121@tcp:453/0 lens 336/0 e 0 to 0 dl 1442766948 ref 1 fl Interpret:/0/ffffffff rc 0/-1
16:54:35:LustreError: 573:0:(tgt_handler.c:502:tgt_filter_recovery_request()) Skipped 39 previous similar messages
16:54:35:Lustre: lustre-MDT0000: Client lustre-MDT0002-mdtlov_UUID (at 10.1.6.124@tcp) reconnecting, waiting for 7 clients in recovery for 21188470:59
16:54:35:Lustre: Skipped 699 previous similar messages
16:54:35:Lustre: lustre-MDT0000: Connection restored to 6fadab14-5579-715e-6a00-fbf0bb8ebcc8 (at 10.1.6.123@tcp)
16:54:35:Lustre: Skipped 695 previous similar messages
16:54:35:LustreError: 577:0:(tgt_handler.c:502:tgt_filter_recovery_request()) @@@ not permitted during recovery  req@ffff88005adb7050 x1512821167372272/t0(0) o601->lustre-MDT0000-lwp-OST0001_UUID@10.1.6.121@tcp:304/0 lens 336/0 e 0 to 0 dl 1442767554 ref 1 fl Interpret:/0/ffffffff rc 0/-1
16:54:35:LustreError: 577:0:(tgt_handler.c:502:tgt_filter_recovery_request()) Skipped 41 previous similar messages
16:54:35:Lustre: lustre-MDT0000: Client lustre-MDT0001-mdtlov_UUID (at 10.1.6.124@tcp) reconnecting, waiting for 7 clients in recovery for 21188460:58
16:54:35:Lustre: Skipped 695 previous similar messages
16:54:35:Lustre: lustre-MDT0000: Connection restored to 6fadab14-5579-715e-6a00-fbf0bb8ebcc8 (at 10.1.6.123@tcp)
16:54:35:Lustre: Skipped 694 previous similar messages
16:54:35:mdt_out00_002 invoked oom-killer: gfp_mask=0x80d2, order=0, oom_adj=0, oom_score_adj=0
16:54:35:mdt_out00_002 cpuset=/ mems_allowed=0
16:54:35:Pid: 576, comm: mdt_out00_002 Not tainted 2.6.32-573.3.1.el6_lustre.gc3d2503.x86_64 #1
16:54:35:Call Trace:
16:54:35: [<ffffffff810d7201>] ? cpuset_print_task_mems_allowed+0x91/0xb0
16:54:35: [<ffffffff8112aa00>] ? dump_header+0x90/0x1b0
16:54:35: [<ffffffff8112ab6e>] ? check_panic_on_oom+0x4e/0x80
16:54:35: [<ffffffff8112b25b>] ? out_of_memory+0x1bb/0x3c0
16:54:35: [<ffffffff81137c9c>] ? __alloc_pages_nodemask+0x93c/0x950
16:54:35: [<ffffffff81161469>] ? __vmalloc_area_node+0xb9/0x190
16:54:35: [<ffffffffa082ad4e>] ? ptlrpc_grow_req_bufs+0x33e/0x7e0 [ptlrpc]
16:54:35: [<ffffffff8116133d>] ? __vmalloc_node+0xad/0x120
16:54:35: [<ffffffffa082ad4e>] ? ptlrpc_grow_req_bufs+0x33e/0x7e0 [ptlrpc]
16:54:35: [<ffffffff81161759>] ? vzalloc_node+0x29/0x30
16:54:35: [<ffffffffa082ad4e>] ? ptlrpc_grow_req_bufs+0x33e/0x7e0 [ptlrpc]
16:54:35: [<ffffffffa082db55>] ? ptlrpc_main+0x1265/0x1910 [ptlrpc]
16:54:35: [<ffffffffa082c8f0>] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
16:54:35: [<ffffffff810a101e>] ? kthread+0x9e/0xc0
16:54:35: [<ffffffff8100c28a>] ? child_rip+0xa/0x20
16:54:35: [<ffffffff810a0f80>] ? kthread+0x0/0xc0
16:54:35: [<ffffffff8100c280>] ? child_rip+0x0/0x20
16:54:35:Mem-Info:
16:54:35:Node 0 DMA per-cpu:
16:54:35:CPU    0: hi:    0, btch:   1 usd:   0
16:54:35:CPU    1: hi:    0, btch:   1 usd:   0
16:54:35:Node 0 DMA32 per-cpu:
16:54:35:CPU    0: hi:  186, btch:  31 usd:   0
16:54:35:CPU    1: hi:  186, btch:  31 usd:  30
16:54:35:active_anon:317 inactive_anon:359 isolated_anon:0
16:54:35: active_file:127 inactive_file:21 isolated_file:0
16:54:35: unevictable:0 dirty:0 writeback:372 unstable:0
16:54:35: free:13241 slab_reclaimable:1628 slab_unreclaimable:354951
16:54:35: mapped:9 shmem:1 pagetables:579 bounce:0
16:54:35:Node 0 DMA free:8356kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:7168kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
16:54:35:lowmem_reserve[]: 0 2004 2004 2004
16:54:35:Node 0 DMA32 free:44608kB min:44720kB low:55900kB high:67080kB active_anon:1268kB inactive_anon:1436kB active_file:256kB inactive_file:340kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052308kB mlocked:0kB dirty:0kB writeback:1488kB mapped:36kB shmem:4kB slab_reclaimable:6512kB slab_unreclaimable:1412636kB kernel_stack:3584kB pagetables:2316kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:15712 all_unreclaimable? no
16:54:35:lowmem_reserve[]: 0 0 0 0
16:54:35:Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 2*2048kB 1*4096kB = 8356kB
16:54:35:Node 0 DMA32: 622*4kB 469*8kB 344*16kB 233*32kB 143*64kB 67*128kB 28*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 44608kB
16:54:35:685 total pagecache pages
16:54:35:516 pages in swap cache
16:54:35:Swap cache stats: add 4038, delete 3522, find 145/145
16:54:35:Free swap  = 4112648kB
16:54:35:Total swap = 4128764kB
16:54:35:524284 pages RAM
16:54:35:43736 pages reserved
16:54:35:311 pages shared
16:54:35:122697 pages non-shared
16:54:35:[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
16:54:35:[  523]     0   523     2729        0   1     -17         -1000 udevd
16:54:35:[  892]     0   892     2729        0   0     -17         -1000 udevd
16:54:35:[ 1036]     0  1036     2728        0   1     -17         -1000 udevd
16:54:35:[ 1294]     0  1294    23286       21   0     -17         -1000 auditd
16:54:35:[ 1328]     0  1328    62369       32   1       0             0 rsyslogd
16:54:35:[ 1362]     0  1362     4561       16   0       0             0 irqbalance
16:54:35:[ 1380]    32  1380     4744       15   1       0             0 rpcbind
16:54:35:[ 1402]    29  1402     5837        1   1       0             0 rpc.statd
16:54:35:[ 1436]    81  1436     6418        1   0       0             0 dbus-daemon
16:54:35:[ 1456]     0  1456    53919        3   1       0             0 ypbind
16:54:35:[ 1490]     0  1490    47233        1   0       0             0 cupsd
16:54:35:[ 1547]     0  1547     1020        0   0       0             0 acpid
16:54:35:[ 1559]    68  1559    10517        9   0       0             0 hald
16:54:35:[ 1560]     0  1560     5099        1   0       0             0 hald-runner
16:54:35:[ 1592]     0  1592     5629        1   0       0             0 hald-addon-inpu
16:54:35:[ 1602]    68  1602     4501        1   0       0             0 hald-addon-acpi
16:54:35:[ 1648]     0  1648    26827        0   0       0             0 rpc.rquotad
16:54:35:[ 1653]     0  1653     5417        0   0       0             0 rpc.mountd
16:54:35:[ 1697]     0  1697     6292        1   0       0             0 rpc.idmapd
16:54:35:[ 1734]   498  1734    57325      192   0       0             0 munged
16:54:35:[ 1845]     0  1845    16555        0   0     -17         -1000 sshd
16:54:35:[ 1856]     0  1856     5429        7   1       0             0 xinetd
16:54:35:[ 1888]     0  1888    22211       22   0       0             0 sendmail
16:54:35:[ 1897]    51  1897    20073        0   0       0             0 sendmail
16:54:35:[ 1925]     0  1925    29216       14   0       0             0 crond
16:54:35:[ 1940]     0  1940     5276        0   0       0             0 atd
16:54:35:[ 1967]     0  1967     1020        1   1       0             0 agetty
16:54:35:[ 1969]     0  1969     1016        1   1       0             0 mingetty
16:54:35:[ 1971]     0  1971     1016        1   1       0             0 mingetty
16:54:35:[ 1973]     0  1973     1016        1   1       0             0 mingetty
16:54:35:[ 1975]     0  1975     1016        1   1       0             0 mingetty
16:54:35:[ 1977]     0  1977     1016        1   0       0             0 mingetty
16:54:35:[ 1979]     0  1979     1016        1   0       0             0 mingetty
16:54:35:[ 2731]    38  2731     7690        1   1       0             0 ntpd
16:54:35:Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
16:54:35:
16:54:35:Pid: 576, comm: mdt_out00_002 Not tainted 2.6.32-573.3.1.el6_lustre.gc3d2503.x86_64 #1
16:54:35:Call Trace:
16:54:35: [<ffffffff815384e4>] ? panic+0xa7/0x16f
16:54:35: [<ffffffff8112aa71>] ? dump_header+0x101/0x1b0
16:54:35: [<ffffffff8112ab9c>] ? check_panic_on_oom+0x7c/0x80
16:54:35: [<ffffffff8112b25b>] ? out_of_memory+0x1bb/0x3c0
16:54:35: [<ffffffff81137c9c>] ? __alloc_pages_nodemask+0x93c/0x950
16:54:35: [<ffffffff81161469>] ? __vmalloc_area_node+0xb9/0x190
16:54:35: [<ffffffffa082ad4e>] ? ptlrpc_grow_req_bufs+0x33e/0x7e0 [ptlrpc]
16:54:35: [<ffffffff8116133d>] ? __vmalloc_node+0xad/0x120
16:54:35: [<ffffffffa082ad4e>] ? ptlrpc_grow_req_bufs+0x33e/0x7e0 [ptlrpc]
16:54:35: [<ffffffff81161759>] ? vzalloc_node+0x29/0x30
16:54:35: [<ffffffffa082ad4e>] ? ptlrpc_grow_req_bufs+0x33e/0x7e0 [ptlrpc]
16:54:35: [<ffffffffa082db55>] ? ptlrpc_main+0x1265/0x1910 [ptlrpc]
16:54:35: [<ffffffffa082c8f0>] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
16:54:35: [<ffffffff810a101e>] ? kthread+0x9e/0xc0
16:54:35: [<ffffffff8100c28a>] ? child_rip+0xa/0x20
16:54:35: [<ffffffff810a0f80>] ? kthread+0x0/0xc0
16:54:35: [<ffffffff8100c280>] ? child_rip+0x0/0x20

Generated at Sat Feb 10 02:06:52 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.