Lustre / LU-6257

Interop 2.6.0<->2.7 replay-vbr test_7b: MDS OOM


Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.7.0
    • Labels: None
    • Environment: server: 2.6.0; client: lustre-master build # 2856
    • Severity: 3
    • Rank: 17540

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/1dc4ae10-b7fd-11e4-a8e9-5254006e85c2.

      The sub-test test_7b failed with the following error:

      test failed to respond and timed out
      
      16:18:19:LustreError: 11-0: lustre-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
      16:18:19:LustreError: Skipped 5 previous similar messages
      16:18:19:Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
      16:18:19:Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P1 2>/dev/null
      16:18:20:Lustre: lustre-MDT0000: Denying connection for new client lustre-MDT0000-lwp-OST0001_UUID (at 10.2.4.208@tcp), waiting for all 4 known clients (0 recovered, 3 in progress, and 0 evicted) to recover in 0:56
      16:18:20:Lustre: Skipped 561 previous similar messages
      16:18:20:ntpd invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0, oom_score_adj=0
      16:18:20:ntpd cpuset=/ mems_allowed=0
      16:18:20:Pid: 2199, comm: ntpd Not tainted 2.6.32-431.20.3.el6_lustre.x86_64 #1
      16:18:20:Call Trace:
      16:18:20: [<ffffffff810d03d1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
      16:18:20: [<ffffffff81122780>] ? dump_header+0x90/0x1b0
      16:18:20: [<ffffffff811228ee>] ? check_panic_on_oom+0x4e/0x80
      16:18:20: [<ffffffff81122fdb>] ? out_of_memory+0x1bb/0x3c0
      16:18:20: [<ffffffff8112f95f>] ? __alloc_pages_nodemask+0x89f/0x8d0
      16:18:20: [<ffffffff8116795a>] ? alloc_pages_vma+0x9a/0x150
      16:18:20: [<ffffffff8115b632>] ? read_swap_cache_async+0xf2/0x160
      16:18:20: [<ffffffff8115c159>] ? valid_swaphandles+0x69/0x150
      16:18:20: [<ffffffff8115b727>] ? swapin_readahead+0x87/0xc0
      16:18:20: [<ffffffff8114a9fd>] ? handle_pte_fault+0x6dd/0xb00
      16:18:20: [<ffffffff812272c6>] ? security_task_to_inode+0x16/0x20
      16:18:20: [<ffffffff8114b04a>] ? handle_mm_fault+0x22a/0x300
      16:18:20: [<ffffffff8104a8d8>] ? __do_page_fault+0x138/0x480
      16:18:20: [<ffffffff811a07b0>] ? pollwake+0x0/0x60
      16:18:20: [<ffffffff811a07b0>] ? pollwake+0x0/0x60
      16:18:20: [<ffffffff811a07b0>] ? pollwake+0x0/0x60
      16:18:20: [<ffffffff8152e7ee>] ? do_page_fault+0x3e/0xa0
      16:18:20: [<ffffffff8152bba5>] ? page_fault+0x25/0x30
      16:18:20: [<ffffffff8128e1e6>] ? copy_user_generic_unrolled+0x86/0xb0
      16:18:20: [<ffffffff810129de>] ? copy_user_generic+0xe/0x20
      16:18:20: [<ffffffff811a04c9>] ? set_fd_set+0x49/0x60
      16:18:20: [<ffffffff811a198c>] ? core_sys_select+0x1bc/0x2c0
      16:18:20: [<ffffffff8103ea6c>] ? kvm_clock_read+0x1c/0x20
      16:18:20: [<ffffffff8103ea79>] ? kvm_clock_get_cycles+0x9/0x10
      16:18:20: [<ffffffff8109530f>] ? queue_work+0x1f/0x30
      16:18:20: [<ffffffff8103f9d8>] ? pvclock_clocksource_read+0x58/0xd0
      16:18:20: [<ffffffff8103ea6c>] ? kvm_clock_read+0x1c/0x20
      16:18:20: [<ffffffff8103ea79>] ? kvm_clock_get_cycles+0x9/0x10
      16:18:20: [<ffffffff810a6d21>] ? ktime_get_ts+0xb1/0xf0
      16:18:20: [<ffffffff811a1ce7>] ? sys_select+0x47/0x110
      16:18:20: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
      16:18:20:Mem-Info:
      16:18:20:Node 0 DMA per-cpu:
      16:18:20:CPU    0: hi:    0, btch:   1 usd:   0
      16:18:20:CPU    1: hi:    0, btch:   1 usd:   0
      16:18:20:Node 0 DMA32 per-cpu:
      16:18:20:CPU    0: hi:  186, btch:  31 usd:  86
      16:18:20:CPU    1: hi:  186, btch:  31 usd: 178
      16:18:20:active_anon:0 inactive_anon:0 isolated_anon:0
      16:18:20: active_file:1044 inactive_file:916 isolated_file:0
      16:18:20: unevictable:0 dirty:0 writeback:0 unstable:0
      16:18:20: free:13246 slab_reclaimable:2102 slab_unreclaimable:436918
      16:18:20: mapped:1 shmem:0 pagetables:1010 bounce:0
      16:18:20:Node 0 DMA free:8336kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:7408kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      16:18:20:lowmem_reserve[]: 0 2004 2004 2004
      16:18:20:Node 0 DMA32 free:44648kB min:44720kB low:55900kB high:67080kB active_anon:0kB inactive_anon:0kB active_file:4176kB inactive_file:3664kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052308kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:8408kB slab_unreclaimable:1740264kB kernel_stack:1736kB pagetables:4040kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:376672 all_unreclaimable? yes
      16:18:20:lowmem_reserve[]: 0 0 0 0
      16:18:20:Node 0 DMA: 0*4kB 0*8kB 1*16kB 0*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 2*2048kB 1*4096kB = 8336kB
      16:18:20:Node 0 DMA32: 778*4kB 612*8kB 380*16kB 169*32kB 69*64kB 32*128kB 21*256kB 8*512kB 3*1024kB 0*2048kB 1*4096kB = 44648kB
      16:18:20:116 total pagecache pages
      16:18:20:0 pages in swap cache
      16:18:20:Swap cache stats: add 6362, delete 6362, find 3766/3856
      16:18:20:Free swap  = 4107060kB
      16:18:20:Total swap = 4128760kB
      16:18:20:524284 pages RAM
      16:18:20:43694 pages reserved
      16:18:20:151 pages shared
      16:18:20:462141 pages non-shared
      16:18:20:[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
      16:18:20:[  366]     0   366     2663        0   1     -17         -1000 udevd
      16:18:20:[ 1014]     0  1014     2280        1   0       0             0 dhclient
      16:18:20:[ 1066]     0  1066    23300        1   1     -17         -1000 auditd
      16:18:20:[ 1082]     0  1082    63903        1   1       0             0 rsyslogd
      16:18:20:[ 1111]     0  1111     2705        1   1       0             0 irqbalance
      16:18:20:[ 1125]    32  1125     4744        1   1       0             0 rpcbind
      16:18:20:[ 1137]     0  1137    49913        1   0       0             0 sssd
      16:18:20:[ 1138]     0  1138    64294        1   1       0             0 sssd_be
      16:18:20:[ 1140]     0  1140    50479        1   1       0             0 sssd_nss
      16:18:20:[ 1141]     0  1141    48029        1   0       0             0 sssd_pam
      16:18:20:[ 1142]     0  1142    47518        1   0       0             0 sssd_ssh
      16:18:20:[ 1143]     0  1143    52608        1   0       0             0 sssd_pac
      16:18:20:[ 1160]    29  1160     5837        1   1       0             0 rpc.statd
      16:18:20:[ 1274]    81  1274     5871        1   0       0             0 dbus-daemon
      16:18:20:[ 1312]     0  1312     1020        0   1       0             0 acpid
      16:18:20:[ 1321]    68  1321     9920        1   0       0             0 hald
      16:18:20:[ 1322]     0  1322     5081        1   0       0             0 hald-runner
      16:18:20:[ 1354]     0  1354     5611        1   1       0             0 hald-addon-inpu
      16:18:20:[ 1364]    68  1364     4483        1   1       0             0 hald-addon-acpi
      16:18:20:[ 1384]     0  1384   168326        1   1       0             0 automount
      16:18:20:[ 1430]     0  1430    26827        0   0       0             0 rpc.rquotad
      16:18:20:[ 1434]     0  1434     5414        0   0       0             0 rpc.mountd
      16:18:20:[ 1470]     0  1470     5773        1   0       0             0 rpc.idmapd
      16:18:20:[ 1501]   496  1501    56785        1   0       0             0 munged
      16:18:20:[ 1516]     0  1516    16656        0   0     -17         -1000 sshd
      16:18:20:[ 1524]     0  1524     5545        1   0       0             0 xinetd
      16:18:20:[ 1608]     0  1608    20846        1   1       0             0 master
      16:18:20:[ 1628]    89  1628    20909        1   1       0             0 qmgr
      16:18:20:[ 1631]     0  1631    29325        1   1       0             0 crond
      16:18:20:[ 1642]     0  1642     5385        0   0       0             0 atd
      16:18:20:[ 1656]     0  1656    15585        1   0       0             0 certmonger
      16:18:20:[ 1669]     0  1669     1020        1   1       0             0 agetty
      16:18:20:[ 1671]     0  1671     1016        1   0       0             0 mingetty
      16:18:20:[ 1673]     0  1673     1016        1   1       0             0 mingetty
      16:18:20:[ 1675]     0  1675     1016        1   0       0             0 mingetty
      16:18:20:[ 1677]     0  1677     1016        1   1       0             0 mingetty
      16:18:20:[ 1679]     0  1679     2664        0   1     -17         -1000 udevd
      16:18:20:[ 1680]     0  1680     2662        0   1     -17         -1000 udevd
      16:18:20:[ 1681]     0  1681     1016        1   0       0             0 mingetty
      16:18:20:[ 1683]     0  1683     1016        1   0       0             0 mingetty
      16:18:20:[ 2199]    38  2199     8205        1   0       0             0 ntpd
      16:18:20:[13001]    89 13001    20866        1   1       0             0 pickup
      16:18:20:Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
      16:18:20:
      16:18:20:Pid: 2199, comm: ntpd Not tainted 2.6.32-431.20.3.el6_lustre.x86_64 #1
      16:18:20:Call Trace:
      16:18:20: [<ffffffff8152859c>] ? panic+0xa7/0x16f
      16:18:20: [<ffffffff811227f1>] ? dump_header+0x101/0x1b0
      16:18:20: [<ffffffff8112291c>] ? check_panic_on_oom+0x7c/0x80
      16:18:20: [<ffffffff81122fdb>] ? out_of_memory+0x1bb/0x3c0
      16:18:20: [<ffffffff8112f95f>] ? __alloc_pages_nodemask+0x89f/0x8d0
      16:18:20: [<ffffffff8116795a>] ? alloc_pages_vma+0x9a/0x150
      16:18:20: [<ffffffff8115b632>] ? read_swap_cache_async+0xf2/0x160
      16:18:20: [<ffffffff8115c159>] ? valid_swaphandles+0x69/0x150
      16:18:20: [<ffffffff8115b727>] ? swapin_readahead+0x87/0xc0
      16:18:20: [<ffffffff8114a9fd>] ? handle_pte_fault+0x6dd/0xb00
      16:18:20: [<ffffffff812272c6>] ? security_task_to_inode+0x16/0x20
      16:18:20: [<ffffffff8114b04a>] ? handle_mm_fault+0x22a/0x300
      16:18:20: [<ffffffff8104a8d8>] ? __do_page_fault+0x138/0x480
      16:18:20: [<ffffffff811a07b0>] ? pollwake+0x0/0x60
      16:18:20: [<ffffffff811a07b0>] ? pollwake+0x0/0x60
      16:18:20: [<ffffffff811a07b0>] ? pollwake+0x0/0x60
      16:18:20: [<ffffffff8152e7ee>] ? do_page_fault+0x3e/0xa0
      16:18:20: [<ffffffff8152bba5>] ? page_fault+0x25/0x30
      16:18:20: [<ffffffff8128e1e6>] ? copy_user_generic_unrolled+0x86/0xb0
      16:18:20: [<ffffffff810129de>] ? copy_user_generic+0xe/0x20
      16:18:20: [<ffffffff811a04c9>] ? set_fd_set+0x49/0x60
      16:18:20: [<ffffffff811a198c>] ? core_sys_select+0x1bc/0x2c0
      16:18:20: [<ffffffff8103ea6c>] ? kvm_clock_read+0x1c/0x20
      16:18:20: [<ffffffff8103ea79>] ? kvm_clock_get_cycles+0x9/0x10
      16:18:20: [<ffffffff8109530f>] ? queue_work+0x1f/0x30
      16:18:20: [<ffffffff8103f9d8>] ? pvclock_clocksource_read+0x58/0xd0
      16:18:20: [<ffffffff8103ea6c>] ? kvm_clock_read+0x1c/0x20
      16:18:20: [<ffffffff8103ea79>] ? kvm_clock_get_cycles+0x9/0x10
      16:18:20: [<ffffffff810a6d21>] ? ktime_get_ts+0xb1/0xf0
      16:18:20: [<ffffffff811a1ce7>] ? sys_select+0x47/0x110
      16:18:20: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
      
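The Mem-Info dump above points at the likely cause: almost all of the node's RAM is pinned in unreclaimable kernel slab, so even small user-space allocations (here, `ntpd` faulting a page back in from swap) cannot be satisfied. A quick back-of-the-envelope conversion of the reported page counts (assuming the x86_64 default 4 KiB page size):

```python
PAGE_SIZE = 4096  # bytes per page on x86_64

# Figures taken directly from the "Mem-Info:" section of the log above.
slab_unreclaimable_pages = 436918   # "slab_unreclaimable:436918"
total_ram_pages = 524284            # "524284 pages RAM"

slab_gib = slab_unreclaimable_pages * PAGE_SIZE / 2**30
fraction = slab_unreclaimable_pages / total_ram_pages

print(f"unreclaimable slab: {slab_gib:.2f} GiB ({fraction:.0%} of RAM)")
# → unreclaimable slab: 1.67 GiB (83% of RAM)
```

With roughly 1.7 GiB of a ~2 GiB VM held in unreclaimable slab (consistent with `slab_unreclaimable:1740264kB` in the DMA32 zone report), the OOM killer had essentially nothing user-space to reclaim.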

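The machine panicked rather than simply killing a process because, as the log states, "system-wide panic_on_oom is enabled" on the test node. A minimal sketch of how that knob is inspected on a stock RHEL 6 kernel, using the standard `sysctl` interface (illustrative only; the test rig's actual configuration is not shown in this ticket):

```shell
# Show the current OOM policy:
#   0 = run the OOM killer and continue, 1/2 = panic the machine instead.
sysctl vm.panic_on_oom

# Equivalent raw /proc interface:
cat /proc/sys/vm/panic_on_oom
```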

          People

            Assignee: wc-triage (WC Triage)
            Reporter: maloo (Maloo)
            Votes: 0
            Watchers: 2
