Lustre / LU-12727

OSS OOM during failover

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.13.0
    • Environment: lustre-master-ib #286
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      soak-

      [  540.771758] Lustre: soaked-OST000d: Will be in recovery for at least 2:30, or until 28 clients reconnect
      [  557.249150] Lustre: soaked-OST000d: Recovery over after 0:16, of 28 clients 28 recovered and 0 were evicted.
      [  557.377573] Lustre: soaked-OST000d: deleting orphan objects from 0x680000400:163752940 to 0x680000400:163755413
      [  557.449951] Lustre: soaked-OST000d: deleting orphan objects from 0x680000402:140748232 to 0x680000402:140758690
      [  557.508161] Lustre: soaked-OST000d: deleting orphan objects from 0x680000401:211295588 to 0x680000401:211298918
      [  557.519664] Lustre: soaked-OST000d: deleting orphan objects from 0x0:215937270 to 0x0:215938124
      [  568.981469] Lustre: Failing over soaked-OST0009
      [  570.884990] Lustre: server umount soaked-OST0009 complete
      [  585.668199] in:imjournal invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
      [  585.677142] in:imjournal cpuset=/ mems_allowed=0-1
      [  585.682498] CPU: 24 PID: 24262 Comm: in:imjournal Kdump: loaded Tainted: P           OE  ------------   3.10.0-957.21.3.el7_lustre.x86_64 #1
      [  585.696573] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
      [  585.709100] Call Trace:
      [  585.711837]  [<ffffffff83363107>] dump_stack+0x19/0x1b
      [  585.717576]  [<ffffffff8335db2a>] dump_header+0x90/0x229
      [  585.723510]  [<ffffffff82d01292>] ? ktime_get_ts64+0x52/0xf0
      [  585.729836]  [<ffffffff82d584df>] ? delayacct_end+0x8f/0xb0
      [  585.736060]  [<ffffffff82dba834>] oom_kill_process+0x254/0x3d0
      [  585.742576]  [<ffffffff82dba2dd>] ? oom_unkillable_task+0xcd/0x120
      [  585.749478]  [<ffffffff82dba386>] ? find_lock_task_mm+0x56/0xc0
      [  585.756107]  [<ffffffff82dbb076>] out_of_memory+0x4b6/0x4f0
      [  585.762335]  [<ffffffff8335e62e>] __alloc_pages_slowpath+0x5d6/0x724
      [  585.769458]  [<ffffffff82dc1454>] __alloc_pages_nodemask+0x404/0x420
      [  585.776594]  [<ffffffff82e11795>] alloc_pages_vma+0xb5/0x200
      [  585.782921]  [<ffffffff82dff9e5>] __read_swap_cache_async+0x115/0x190
      [  585.790133]  [<ffffffff82dffa86>] read_swap_cache_async+0x26/0x60
      [  585.796946]  [<ffffffff82dffb6c>] swapin_readahead+0xac/0x110
      [  585.803365]  [<ffffffff82de9c62>] handle_pte_fault+0x812/0xd10
      [  585.809881]  [<ffffffff82ce035c>] ? update_curr+0x14c/0x1e0
      [  585.816106]  [<ffffffff82cdccbe>] ? account_entity_dequeue+0xae/0xd0
      [  585.823203]  [<ffffffff82ce084c>] ? dequeue_entity+0x11c/0x5e0
      [  585.829715]  [<ffffffff82dec27d>] handle_mm_fault+0x39d/0x9b0
      [  585.836131]  [<ffffffff82ce112e>] ? dequeue_task_fair+0x41e/0x660
      [  585.842928]  [<ffffffff83370603>] __do_page_fault+0x203/0x4f0
      [  585.849344]  [<ffffffff83370925>] do_page_fault+0x35/0x90
      [  585.855374]  [<ffffffff833680ce>] ? schedule_hrtimeout_range_clock+0xbe/0x150
      [  585.863348]  [<ffffffff8336c768>] page_fault+0x28/0x30
      [  585.869093]  [<ffffffff82e58e0e>] ? do_sys_poll+0x4fe/0x590
      [  585.875320]  [<ffffffff82e58de6>] ? do_sys_poll+0x4d6/0x590
      [  585.881546]  [<ffffffff82dd1a5f>] ? shmem_fault+0xdf/0x1f0
      [  585.887673]  [<ffffffff82e57530>] ? __pollwait+0xf0/0xf0
      [  585.893610]  [<ffffffff82df755c>] ? page_add_file_rmap+0x8c/0xc0
      [  585.900311]  [<ffffffff82db6abb>] ? unlock_page+0x2b/0x30
      [  585.906341]  [<ffffffff82de4e89>] ? do_read_fault.isra.61+0x139/0x1b0
      [  585.913539]  [<ffffffff82de9744>] ? handle_pte_fault+0x2f4/0xd10
      [  585.920248]  [<ffffffff82e54492>] ? user_path_at_empty+0x72/0xc0
      [  585.926957]  [<ffffffff82e3e82a>] ? __check_object_size+0x1ca/0x250
      [  585.933958]  [<ffffffff82f9572d>] ? list_del+0xd/0x30
      [  585.939600]  [<ffffffff82cc2a61>] ? remove_wait_queue+0x31/0x40
      [  585.946211]  [<ffffffff82e8c22f>] ? inotify_read+0x2ef/0x420
      [  585.952532]  [<ffffffff82d01292>] ? ktime_get_ts64+0x52/0xf0
      [  585.958854]  [<ffffffff82e59213>] SyS_ppoll+0x1d3/0x1f0
      [  585.964688]  [<ffffffff83375d15>] ? system_call_after_swapgs+0xa2/0x146
      [  585.972074]  [<ffffffff83375d21>] ? system_call_after_swapgs+0xae/0x146
      [  585.979462]  [<ffffffff83375ddb>] system_call_fastpath+0x22/0x27
      [  585.986171]  [<ffffffff83375d21>] ? system_call_after_swapgs+0xae/0x146
      [  585.993556] Mem-Info:
      [  585.996086] active_anon:271 inactive_anon:404 isolated_anon:0
      [  585.996086]  active_file:149 inactive_file:0 isolated_file:0
      [  585.996086]  unevictable:6763 dirty:0 writeback:0 unstable:0
      [  585.996086]  slab_reclaimable:10724 slab_unreclaimable:174704
      [  585.996086]  mapped:1588 shmem:22 pagetables:1820 bounce:0
      [  585.996086]  free:34101 free_pcp:0 free_cma:0
      [  586.032467] Node 0 DMA free:15324kB min:40kB low:48kB high:60kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15920kB managed:15836kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      [  586.078658] lowmem_reserve[]: 0 2754 15791 15791
      [  586.083859] Node 0 DMA32 free:59904kB min:7780kB low:9724kB high:11668kB active_anon:824kB inactive_anon:1244kB active_file:0kB inactive_file:0kB unevictable:4076kB isolated(anon):0kB isolated(file):0kB present:3051628kB managed:2820172kB mlocked:4076kB dirty:0kB writeback:0kB mapped:408kB shmem:84kB slab_reclaimable:2692kB slab_unreclaimable:71512kB kernel_stack:1792kB pagetables:764kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:49514 all_unreclaimable? yes
      [  586.134218] lowmem_reserve[]: 0 0 13037 13037
      [  586.139149] Node 0 Normal free:27284kB min:36828kB low:46032kB high:55240kB active_anon:0kB inactive_anon:4kB active_file:172kB inactive_file:0kB unevictable:21484kB isolated(anon):0kB isolated(file):128kB present:13631488kB managed:13350636kB mlocked:21484kB dirty:0kB writeback:0kB mapped:4456kB shmem:0kB slab_reclaimable:19596kB slab_unreclaimable:342544kB kernel_stack:22144kB pagetables:3948kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:20401 all_unreclaimable? yes
      [  586.190481] lowmem_reserve[]: 0 0 0 0
      [  586.194625] Node 1 Normal free:34292kB min:45456kB low:56820kB high:68184kB active_anon:0kB inactive_anon:0kB active_file:384kB inactive_file:0kB unevictable:1492kB isolated(anon):0kB isolated(file):0kB present:16777216kB managed:16480320kB mlocked:1492kB dirty:0kB writeback:0kB mapped:1488kB shmem:0kB slab_reclaimable:20608kB slab_unreclaimable:284740kB kernel_stack:15616kB pagetables:2568kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:13718 all_unreclaimable? no
      [  586.245470] lowmem_reserve[]: 0 0 0 0
      [  586.249615] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (U) 0*1024kB 1*2048kB (M) 3*4096kB (M) = 15324kB
      [  586.265227] Node 0 DMA32: 696*4kB (UEM) 639*8kB (UEM) 370*16kB (UEM) 269*32kB (UEM) 90*64kB (UEM) 31*128kB (UM) 24*256kB (UM) 15*512kB (UM) 2*1024kB (UM) 2*2048kB (UM) 2*4096kB (U) = 60312kB
      [  586.284458] Node 0 Normal: 2214*4kB (UEM) 1463*8kB (UEM) 313*16kB (UM) 12*32kB (M) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 25952kB
      [  586.299720] Node 1 Normal: 931*4kB (UEM) 1127*8kB (UEM) 611*16kB (UEM) 100*32kB (UEM) 36*64kB (UM) 8*128kB (UM) 12*256kB (UM) 3*512kB (UM) 0*1024kB 0*2048kB 0*4096kB = 33652kB
      [  586.317475] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
      [  586.327186] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
      [  586.336607] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
      [  586.346321] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
      [  586.355742] 1829 total pagecache pages
      [  586.359930] 0 pages in swap cache
      [  586.363631] Swap cache stats: add 16693, delete 16693, find 46/62
      [  586.370434] Free swap  = 16183796kB
      [  586.374330] Total swap = 16253948kB
      [  586.378225] 8369063 pages RAM
      [  586.381536] 0 pages HighMem/MovableOnly
      [  586.385818] 202322 pages reserved
      [  586.389518] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
      [  586.398294] [ 7134]     0  7134     9769      222      25       83             0 systemd-journal
      [  586.408106] [ 7164]     0  7164    29157      231      27       80             0 lvmetad
      [  586.417142] [ 7195]     0  7195    11230      232      24      212         -1000 systemd-udevd
      [  586.426759] [ 7206]     0  7206  1572958     3672     133        0         -1000 multipathd
      [  586.436129] [23295]     0 23295    15511       88      31      155         -1000 auditd
      [  586.445065] [23324]     0 23324     5475      190      16      144             0 irqbalance
      [  586.454381] [23325]   999 23325   156119      270      64     1900             0 polkitd
      [  586.463418] [23327]    32 23327    18412      164      40      190             0 rpcbind
      [  586.472453] [23330]     0 23330    64337      330      79      333             0 sssd
      [  586.481198] [23335]    81 23335    17628      274      36      171          -900 dbus-daemon
      [  586.490621] [23345]     0 23345    69399      176      48      214             0 gssproxy
      [  586.499753] [23373]     0 23373   136908      304      87     1138             0 NetworkManager
      [  586.509466] [23374]     0 23374    41019      263      44      207             0 zed
      [  586.518113] [23382]     0 23382     1781       30       8       38             0 mcelog
      [  586.527053] [23383]   997 23383    29446      248      30      113             0 chronyd
      [  586.536089] [23385]     0 23385    32230      209      33      271             0 rpc.gssd
      [  586.545220] [23402]     0 23402    98257      333     135      642             0 sssd_be
      [  586.554262] [23438]     0 23438    66241      296      85      235             0 sssd_nss
      [  586.563394] [23439]     0 23439    61158      288      74      229             0 sssd_pam
      [  586.572525] [23440]     0 23440    58985      273      71      213             0 sssd_ssh
      [  586.581660] [23441]     0 23441    69110      279      87      318             0 sssd_pac
      [  586.590794] [23455]     0 23455     6594      230      19       83             0 systemd-logind
      [  586.600507] [23590]     0 23590    26839      264      55      501             0 dhclient
      [  586.609639] [24250]     0 24250   143518      314      98     2832             0 tuned
      [  586.618480] [24253]     0 24253    54103      275      40      617             0 rsyslogd
      [  586.627616] [24254]     0 24254    28189      287      56      258         -1000 sshd
      [  586.636358] [24256]   998 24256    24222      182      22      129             0 munged
      [  586.645289] [24257]    29 24257    12760      174      28      256             0 rpc.statd
      [  586.654517] [24271]     0 24271     6791      150      18       64             0 xinetd
      [  586.663459] [24554]     0 24554    22907      175      44      262             0 master
      [  586.672399] [24560]    89 24560    25474      215      45      255             0 pickup
      [  586.681337] [24561]    89 24561    25491      211      45      256             0 qmgr
      [  586.690082] [24590]     0 24590   157973      273      81      424             0 automount
      [  586.699311] [24593]     0 24593     6476      168      18       52             0 atd
      [  586.707960] [24596]     0 24596    31571      205      20      154             0 crond
      [  586.716804] [24653]     0 24653    27523      167      11       32             0 agetty
      [  586.725743] [24654]     0 24654    27523      161      12       32             0 agetty
      [  586.734851] Out of memory: Kill process 24250 (tuned) score 0 or sacrifice child
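
The Mem-Info counters above can be cross-checked with a little arithmetic. The following is a rough sketch, not part of the original report: it copies the global page counters, the RAM totals, and the per-zone free/min values from the log, sums the counters that describe distinct page pools, and compares that sum with the managed memory. The 4 KiB page size, the choice of which counters to sum (`mapped` and `shmem` are left out because they overlap the LRU counters), and all function names are assumptions made for illustration.

```python
import re

PAGE_KB = 4  # assumption: 4 KiB pages, the usual x86_64 page size

# Global counters copied from the Mem-Info section of the log (in pages).
MEM_INFO = """\
active_anon:271 inactive_anon:404 isolated_anon:0
 active_file:149 inactive_file:0 isolated_file:0
 unevictable:6763 dirty:0 writeback:0 unstable:0
 slab_reclaimable:10724 slab_unreclaimable:174704
 mapped:1588 shmem:22 pagetables:1820 bounce:0
 free:34101 free_pcp:0 free_cma:0
"""
TOTAL_RAM_PAGES = 8369063  # "8369063 pages RAM"
RESERVED_PAGES = 202322    # "202322 pages reserved"

# Per-zone (free kB, min watermark kB), copied from the zone lines.
ZONES = {
    "Node 0 DMA": (15324, 40),
    "Node 0 DMA32": (59904, 7780),
    "Node 0 Normal": (27284, 36828),
    "Node 1 Normal": (34292, 45456),
}

# Counters treated as non-overlapping pools (an illustrative choice).
POOLS = (
    "active_anon", "inactive_anon", "active_file", "inactive_file",
    "unevictable", "slab_reclaimable", "slab_unreclaimable",
    "pagetables", "free",
)


def parse_counters(text):
    """Turn 'name:value' pairs into a dict of integer page counts."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", text)}


def summarize(counters, total_pages, reserved_pages):
    """Compare the summed pools against managed (total - reserved) pages."""
    managed = total_pages - reserved_pages
    accounted = sum(counters[name] for name in POOLS)
    return {
        "managed_pages": managed,
        "accounted_pages": accounted,
        "unaccounted_pages": managed - accounted,
    }


def zones_below_min(zones):
    """Return the zones whose free memory is under the min watermark."""
    return [name for name, (free_kb, min_kb) in zones.items() if free_kb < min_kb]


def pages_to_mib(pages):
    return pages * PAGE_KB / 1024


if __name__ == "__main__":
    counters = parse_counters(MEM_INFO)
    summary = summarize(counters, TOTAL_RAM_PAGES, RESERVED_PAGES)
    for key, pages in summary.items():
        print(f"{key}: {pages} pages ({pages_to_mib(pages):.0f} MiB)")
    print("zones below min watermark:", zones_below_min(ZONES))
```

Under these assumptions the listed pools cover only a small fraction of managed memory, and both Normal zones sit below their min watermark while almost all swap is still free. That pattern would be consistent with memory held by kernel-side allocations that these counters do not itemize, but the log alone does not identify the consumer.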
      

          People

            Assignee: WC Triage (wc-triage)
            Reporter: Sarah Liu (sarah)
            Votes: 0
            Watchers: 3
