Lustre / LU-15442

e2freefrag can't complete because of OOM


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • None
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      The system has a single 300 TB OST, and the filesystem is 85% full.
      e2freefrag cannot finish because the OOM killer terminates it, as shown below.
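The "300TB" and "85%" figures above are nominal. They can be cross-checked against the df -t lustre output that follows:

- The device is about 321 TB (292 TiB).
- GNU coreutils df computes Use% as used / (used + available), rounded up. It therefore reports 85% even though used / size is about 83.7%.

The Python sketch below reproduces that arithmetic. It assumes GNU df semantics, and its constants are copied from the df line for /dev/sda.

```python
# Figures copied from the "df -t lustre" output for /dev/sda (units: 1 KiB blocks).
SIZE_KIB = 313826717284
USED_KIB = 262586960268
AVAIL_KIB = 48064149188


def df_use_percent(used: int, avail: int) -> int:
    """Use% as GNU df reports it: used / (used + avail), rounded up."""
    total = used + avail
    return (used * 100 + total - 1) // total  # integer ceiling division


size_tb = SIZE_KIB * 1024 / 10**12   # decimal terabytes
size_tib = SIZE_KIB / 2**30          # KiB -> TiB (binary)
raw_pct = USED_KIB * 100 / SIZE_KIB  # used / size, no rounding
unaccounted_kib = SIZE_KIB - USED_KIB - AVAIL_KIB  # neither "Used" nor "Available"

print(f"size: {size_tb:.1f} TB = {size_tib:.1f} TiB")
print(f"used/size: {raw_pct:.1f}%, df Use%: {df_use_percent(USED_KIB, AVAIL_KIB)}%")
print(f"size - used - available: {unaccounted_kib} KiB")
```

The gap between size and used + available is only reported here, not explained. This sketch makes no claim about what those blocks are.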

      # rpm -qa | grep e2fsprogs
      e2fsprogs-1.46.2.wc3-0.el7.x86_64
      e2fsprogs-libs-1.46.2.wc3-0.el7.x86_64
      e2fsprogs-devel-1.46.2.wc3-0.el7.x86_64
      
      # df -t lustre
      Filesystem                              1K-blocks         Used   Available Use% Mounted on
      /dev/sda                             313826717284 262586960268 48064149188  85% /lustre/ost0000
      
      # e2freefrag /dev/sda
      
      Jan 12 12:13:52 es7990e1-vm1 kernel: Call Trace:
      Jan 12 12:13:52 es7990e1-vm1 kernel: [<ffffffffacf835a9>] dump_stack+0x19/0x1b
      Jan 12 12:13:52 es7990e1-vm1 kernel: [<ffffffffacf7e648>] dump_header+0x90/0x229
      Jan 12 12:13:52 es7990e1-vm1 kernel: [<ffffffffac906492>] ? ktime_get_ts64+0x52/0xf0
      Jan 12 12:13:52 es7990e1-vm1 kernel: [<ffffffffac95db1f>] ? delayacct_end+0x8f/0xb0
      Jan 12 12:13:52 es7990e1-vm1 kernel: [<ffffffffac9c204d>] oom_kill_process+0x2cd/0x490
      Jan 12 12:13:52 es7990e1-vm1 kernel: [<ffffffffac9c1a3d>] ? oom_unkillable_task+0xcd/0x120
      Jan 12 12:13:52 es7990e1-vm1 kernel: [<ffffffffac9c273a>] out_of_memory+0x31a/0x500
      Jan 12 12:13:52 es7990e1-vm1 kernel: [<ffffffffac9c9354>] __alloc_pages_nodemask+0xad4/0xbe0
      Jan 12 12:13:52 es7990e1-vm1 kernel: [<ffffffffaca1c739>] alloc_pages_vma+0xa9/0x200
      Jan 12 12:13:52 es7990e1-vm1 kernel: [<ffffffffac9f6337>] handle_mm_fault+0xcb7/0xfb0
      Jan 12 12:13:52 es7990e1-vm1 kernel: [<ffffffffacf90653>] __do_page_fault+0x213/0x500
      Jan 12 12:13:52 es7990e1-vm1 kernel: [<ffffffffacf90a26>] trace_do_page_fault+0x56/0x150
      Jan 12 12:13:52 es7990e1-vm1 kernel: [<ffffffffacf8ffa2>] do_async_page_fault+0x22/0xf0
      Jan 12 12:13:52 es7990e1-vm1 kernel: [<ffffffffacf8c7a8>] async_page_fault+0x28/0x30
      Jan 12 12:13:52 es7990e1-vm1 kernel: Mem-Info:
      Jan 12 12:13:52 es7990e1-vm1 kernel: active_anon:32965316 inactive_anon:953248 isolated_anon:0
       active_file:19299 inactive_file:18944 isolated_file:0
       unevictable:0 dirty:0 writeback:2 unstable:0
       slab_reclaimable:122379 slab_unreclaimable:74262
       mapped:8092 shmem:8083 pagetables:69898 bounce:0
       free:2249818 free_pcp:3985 free_cma:0
      Jan 12 12:13:52 es7990e1-vm1 kernel: Node 0 DMA free:15892kB min:868kB low:1084kB high:1300kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      Jan 12 12:13:52 es7990e1-vm1 kernel: lowmem_reserve[]: 0 943 150076 150076
      Jan 12 12:13:52 es7990e1-vm1 kernel: Node 0 DMA32 free:648832kB min:52748kB low:65932kB high:79120kB active_anon:59992kB inactive_anon:47604kB active_file:1524kB inactive_file:1020kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080608kB managed:966464kB mlocked:0kB dirty:0kB writeback:0kB mapped:252kB shmem:252kB slab_reclaimable:6616kB slab_unreclaimable:5092kB kernel_stack:384kB pagetables:236kB unstable:0kB bounce:0kB free_pcp:7136kB local_pcp:264kB free_cma:0kB writeback_tmp:0kB pages_scanned:5844902 all_unreclaimable? yes
      Jan 12 12:13:52 es7990e1-vm1 kernel: lowmem_reserve[]: 0 0 149132 149132
      Jan 12 12:13:52 es7990e1-vm1 kernel: Node 0 Normal free:8334548kB min:8334988kB low:10418732kB high:12502480kB active_anon:131801272kB inactive_anon:3765388kB active_file:75672kB inactive_file:74756kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:155189248kB managed:152711608kB mlocked:0kB dirty:0kB writeback:8kB mapped:32116kB shmem:32080kB slab_reclaimable:482900kB slab_unreclaimable:291940kB kernel_stack:19024kB pagetables:279356kB unstable:0kB bounce:0kB free_pcp:8804kB local_pcp:264kB free_cma:0kB writeback_tmp:0kB pages_scanned:261380 all_unreclaimable? yes
      Jan 12 12:13:52 es7990e1-vm1 kernel: lowmem_reserve[]: 0 0 0 0
      Jan 12 12:13:52 es7990e1-vm1 kernel: Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
      Jan 12 12:13:53 es7990e1-vm1 kernel: Node 0 DMA32: 247*4kB (UE) 268*8kB (UEM) 209*16kB (UE) 161*32kB (UE) 83*64kB (UEM) 10*128kB (UEM) 94*256kB (UEM) 88*512kB (UE) 9*1024kB (EM) 39*2048kB (UEM) 115*4096kB (UEM) = 647468kB
      Jan 12 12:13:53 es7990e1-vm1 kernel: Node 0 Normal: 8198*4kB (UEM) 8367*8kB (UEM) 6746*16kB (UE) 4778*32kB (UEM) 3271*64kB (UEM) 1641*128kB (UE) 694*256kB (UEM) 274*512kB (UEM) 146*1024kB (UEM) 110*2048kB (UE) 1675*4096kB (U) = 8333488kB
      Jan 12 12:13:53 es7990e1-vm1 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
      Jan 12 12:13:53 es7990e1-vm1 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
      Jan 12 12:13:53 es7990e1-vm1 kernel: 46685 total pagecache pages
      Jan 12 12:13:53 es7990e1-vm1 kernel: 227 pages in swap cache
      Jan 12 12:13:53 es7990e1-vm1 kernel: Swap cache stats: add 2739967, delete 2739738, find 67504/71273
      Jan 12 12:13:53 es7990e1-vm1 kernel: Free swap  = 0kB
      Jan 12 12:13:53 es7990e1-vm1 kernel: Total swap = 5472252kB
      Jan 12 12:13:53 es7990e1-vm1 kernel: 39321462 pages RAM
      Jan 12 12:13:53 es7990e1-vm1 kernel: 0 pages HighMem/MovableOnly
      Jan 12 12:13:53 es7990e1-vm1 kernel: 897967 pages reserved
      Jan 12 12:13:53 es7990e1-vm1 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
      Jan 12 12:13:53 es7990e1-vm1 kernel: [  807]     0   807    35383     9157      70       31             0 systemd-journal
      Jan 12 12:13:53 es7990e1-vm1 kernel: [  838]     0   838    11412        5      23      134         -1000 systemd-udevd
      Jan 12 12:13:53 es7990e1-vm1 kernel: [  973]     0   973   152045        0      38      208             0 lvmetad
      Jan 12 12:13:53 es7990e1-vm1 kernel: [ 1737]     0  1737    13883       22      27       89         -1000 auditd
      Jan 12 12:13:53 es7990e1-vm1 kernel: [ 1766]     0  1766     6596       20      18       54             0 systemd-logind
      Jan 12 12:13:53 es7990e1-vm1 kernel: [ 1770]     0  1770     5444       50      16       65             0 irqbalance
      Jan 12 12:13:53 es7990e1-vm1 kernel: [ 1775]     0  1775    22652        0      47      224             0 rngd
      Jan 12 12:13:53 es7990e1-vm1 kernel: [ 1790]   998  1790     2145        7      10       30             0 lsmd
      Jan 12 12:13:53 es7990e1-vm1 kernel: [ 1800]    32  1800    17314       14      38      129             0 rpcbind
      Jan 12 12:13:53 es7990e1-vm1 kernel: [ 1806]    81  1806    15046       51      34       82          -900 dbus-daemon
      Jan 12 12:13:53 es7990e1-vm1 kernel: [ 1919]     0  1919    50357       11      39      123             0 gssproxy
      Jan 12 12:13:53 es7990e1-vm1 kernel: [ 1970]     0  1970    13220        1      32      205             0 smartd
      Jan 12 12:13:53 es7990e1-vm1 kernel: [ 2899]     0  2899    25736        1      48      517             0 dhclient
      Jan 12 12:13:53 es7990e1-vm1 kernel: [ 3184]     0  3184     6261        0      17       58             0 xinetd
      Jan 12 12:13:53 es7990e1-vm1 kernel: [ 3193]     0  3193    76950     5874      80      348             0 rsyslogd
      Jan 12 12:13:53 es7990e1-vm1 kernel: [ 3198]     0  3198    28235        1      57      257         -1000 sshd
      Jan 12 12:13:53 es7990e1-vm1 kernel: [ 3235]     0  3235     6477        0      18       53             0 atd
      Jan 12 12:13:53 es7990e1-vm1 kernel: [ 3247]     0  3247    57127        0      40      188             0 sharpd
      Jan 12 12:13:53 es7990e1-vm1 kernel: [ 3277]     0  3277    27551        1      10       33             0 agetty
      Jan 12 12:13:53 es7990e1-vm1 kernel: [ 3279]     0  3279    27551        1      13       33             0 agetty
      Jan 12 12:13:53 es7990e1-vm1 kernel: [ 3472]     0  3472    31596       20      20      135             0 crond
      Jan 12 12:13:53 es7990e1-vm1 kernel: [ 3473]    38  3473     6954        1      17      150             0 ntpd
      Jan 12 12:13:53 es7990e1-vm1 kernel: [ 6542]   999  6542   153604        0      63      883             0 polkitd
      Jan 12 12:13:53 es7990e1-vm1 kernel: [ 6555]    29  6555    16802        0      38      285             0 rpc.statd
      Jan 12 12:13:53 es7990e1-vm1 kernel: [13484]     0 13484    39204       40      77      303             0 sshd
      Jan 12 12:13:53 es7990e1-vm1 kernel: [13486]     0 13486    29271        1      13      506             0 bash
      Jan 12 12:13:53 es7990e1-vm1 kernel: [13770]     0 13770    32001        1      18      144             0 screen
      Jan 12 12:13:53 es7990e1-vm1 kernel: [13771]     0 13771    29273        1      14      489             0 bash
      Jan 12 12:13:53 es7990e1-vm1 kernel: [13950]     0 13950 35405515 33909205   68922  1355850             0 e2freefrag
      Jan 12 12:13:53 es7990e1-vm1 kernel: [13951]     0 13951    27013        0       9       25             0 tee
      Jan 12 12:13:53 es7990e1-vm1 kernel: [13984]     0 13984    40796      372      36       77             0 top
      Jan 12 12:13:53 es7990e1-vm1 kernel: Out of memory: Kill process 13950 (e2freefrag) score 861 or sacrifice child
      Jan 12 12:13:53 es7990e1-vm1 kernel: Killed process 13950 (e2freefrag), UID 0, total-vm:141622060kB, anon-rss:135636820kB, file-rss:0kB, shmem-rss:0kB
      Jan 12 12:13:53 es7990e1-vm1 kernel: systemd-journal invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
      Jan 12 12:13:53 es7990e1-vm1 kernel: systemd-journal cpuset=/ mems_allowed=0
      Jan 12 12:13:53 es7990e1-vm1 kernel: CPU: 4 PID: 807 Comm: systemd-journal Kdump: loaded Tainted: G           OE  ------------ T 3.10.0-1160.31.1.el7_lustre.ddn15.x86_64 #
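The OOM report is internally consistent. The task table lists total_vm, rss and swapents in pages. Multiplying the e2freefrag row by 4 KiB (the assumed x86_64 page size) reproduces the total-vm and anon-rss values in the kill message exactly.

The sketch below does that conversion and relates it to the "pages RAM" and "Total swap" lines. It shows that the single e2freefrag process held roughly 86% of RAM (about 129 GiB resident) plus nearly all of the swap when it was killed.

```python
# Values copied from the OOM report. Task-table columns are in pages;
# a 4 KiB page size is assumed (x86_64 default).
PAGE_KIB = 4
KIB_PER_GIB = 1024 * 1024

e2freefrag = {"total_vm": 35405515, "rss": 33909205, "swapents": 1355850}
RAM_PAGES = 39321462       # "39321462 pages RAM"
TOTAL_SWAP_KIB = 5472252   # "Total swap = 5472252kB"


def pages_to_kib(pages: int) -> int:
    """Convert a page count from the task table to KiB."""
    return pages * PAGE_KIB


rss_kib = pages_to_kib(e2freefrag["rss"])
vm_kib = pages_to_kib(e2freefrag["total_vm"])
swap_kib = pages_to_kib(e2freefrag["swapents"])
ram_kib = pages_to_kib(RAM_PAGES)

print(f"anon RSS: {rss_kib} kB = {rss_kib / KIB_PER_GIB:.1f} GiB")
print(f"total VM: {vm_kib} kB")
print(f"RAM: {ram_kib / KIB_PER_GIB:.1f} GiB, e2freefrag RSS share {rss_kib / ram_kib:.1%}")
print(f"swapped out: {swap_kib / KIB_PER_GIB:.2f} of {TOTAL_SWAP_KIB / KIB_PER_GIB:.2f} GiB swap")
```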
      

      Setting vm.min_free_kbytes = 8388608 (8 GiB) did not help in this case.
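The per-zone watermarks in the Mem-Info dump are consistent with vm.min_free_kbytes = 8388608. The kernel splits min_free_kbytes across zones in proportion to each zone's managed memory. It then sets low = min + share/4 and high = min + share/2.

The sketch below is modeled on the 3.10-era __setup_per_zone_wmarks() logic. It assumes 4 KiB pages, no highmem zone, and no extra watermark scaling. From the managed: sizes it reproduces the min/low/high values of all three zones, and it shows that Normal free had dropped just below min.

One reading of why the setting did not help: it only enlarges the reserve the kernel tries to keep free. It does not reduce how much memory e2freefrag itself demands.

```python
# Hypothetical re-implementation for illustration, modeled on 3.10-era
# __setup_per_zone_wmarks(); assumes 4 KiB pages and no highmem zone.
PAGE_KIB = 4
MIN_FREE_KBYTES = 8388608

# "managed:" sizes (kB) from the Node 0 DMA / DMA32 / Normal lines.
managed_kib = {"DMA": 15908, "DMA32": 966464, "Normal": 152711608}


def zone_watermarks(min_free_kbytes: int, managed: dict) -> dict:
    """Return {zone: (min_kB, low_kB, high_kB)}."""
    pages_min = min_free_kbytes // PAGE_KIB
    managed_pages = {z: kib // PAGE_KIB for z, kib in managed.items()}
    lowmem_pages = sum(managed_pages.values())
    marks = {}
    for zone, pages in managed_pages.items():
        share = pages_min * pages // lowmem_pages  # proportional share, in pages
        wmin, wlow, whigh = share, share + (share >> 2), share + (share >> 1)
        marks[zone] = tuple(p * PAGE_KIB for p in (wmin, wlow, whigh))
    return marks


marks = zone_watermarks(MIN_FREE_KBYTES, managed_kib)
for zone, (wmin, wlow, whigh) in marks.items():
    print(f"{zone:6s} min:{wmin}kB low:{wlow}kB high:{whigh}kB")

normal_free_kib = 8334548  # "Node 0 Normal free:" at OOM time
print("Normal free below min:", normal_free_kib < marks["Normal"][0])
```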


          People

            Assignee: WC Triage (wc-triage)
            Reporter: Shuichi Ihara (sihara)
            Votes: 0
            Watchers: 3
