  Lustre / LU-5920

obdfilter-survey test_1c: OST OOM


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.7.0
    • Component/s: None
    • Environment: server: lustre-master build # 2733 RHEL6
      client: SLES11 SP3
    • Severity: 3
    • 16527

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/e6ec8538-6b45-11e4-88ff-5254006e85c2.

      The sub-test test_1c failed with the following error:

      test failed to respond and timed out
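
      For reference, the subtest can usually be re-run through the Lustre test framework's standard ONLY= subtest selector; a minimal sketch (the install path below is illustrative, not taken from this test run):

          # Re-run only subtest 1c of the obdfilter-survey suite.
          # Path is illustrative; adjust to where the lustre-tests scripts are installed.
          cd /usr/lib64/lustre/tests
          ONLY=1c sh obdfilter-survey.sh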
      

      OST console:

      11:04:49:Lustre: DEBUG MARKER: == obdfilter-survey test 1c: Object Storage Targets survey, big batch ================================ 10:50:09 (1415731809)
      11:04:49:Lustre: DEBUG MARKER: lctl dl | grep obdfilter
      11:04:49:Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep tcp | cut -f 1 -d '@'
      11:04:49:Lustre: 28038:0:(client.c:1947:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1415731818/real 0]  req@ffff88002955dc00 x1484448291597232/t0(0) o400->MGC10.2.4.126@tcp@10.2.4.126@tcp:26/25 lens 224/224 e 0 to 1 dl 1415731825 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
      11:04:49:Lustre: 28038:0:(client.c:1947:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
      11:04:49:LustreError: 166-1: MGC10.2.4.126@tcp: Connection to MGS (at 10.2.4.126@tcp) was lost; in progress operations using this service will fail
      11:04:49:Lustre: lustre-MDT0000-lwp-OST0000: Connection to lustre-MDT0000 (at 10.2.4.126@tcp) was lost; in progress operations using this service will wait for recovery to complete
      11:04:49:Lustre: Skipped 6 previous similar messages
      11:04:49:Lustre: lustre-OST0001: Client lustre-MDT0000-mdtlov_UUID (at 10.2.4.126@tcp) reconnecting
      11:04:49:Lustre: lustre-OST0000: deleting orphan objects from 0x0:657118 to 0x0:657185
      11:04:49:Lustre: lustre-OST0002: deleting orphan objects from 0x0:395657 to 0x0:395841
      11:04:49:Lustre: lustre-OST0003: deleting orphan objects from 0x0:401000 to 0x0:401185
      11:04:49:Lustre: lustre-OST0004: deleting orphan objects from 0x0:397673 to 0x0:397985
      11:04:49:Lustre: lustre-OST0005: deleting orphan objects from 0x0:410896 to 0x0:410961
      11:04:49:Lustre: lustre-OST0006: deleting orphan objects from 0x0:411496 to 0x0:411697
      11:04:49:Lustre: lustre-OST0001: deleting orphan objects from 0x0:398463 to 0x0:398529
      11:04:49:Lustre: Evicted from lustre-MDT0000_UUID (at 10.2.4.126@tcp) after server handle changed from 0x66a373f382e9d413 to 0x66a373f382e9d738
      11:04:49:LustreError: 167-0: lustre-MDT0000-lwp-OST0000: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
      11:04:49:LustreError: Skipped 6 previous similar messages
      11:04:49:Lustre: lustre-MDT0000-lwp-OST0000: Connection restored to lustre-MDT0000 (at 10.2.4.126@tcp)
      11:04:49:Lustre: Skipped 4 previous similar messages
      11:04:49:Lustre: Evicted from lustre-MDT0000_UUID (at 10.2.4.126@tcp) after server handle changed from 0x66a373f382e9d578 to 0x66a373f382e9d77e
      11:04:49:Lustre: Skipped 4 previous similar messages
      11:04:49:Lustre: Evicted from MGS (at 10.2.4.126@tcp) after server handle changed from 0x66a373f382e9d3e9 to 0x66a373f382e9d72a
      11:04:49:Lustre: Skipped 1 previous similar message
      11:04:49:LNet: Service thread pid 20233 completed after 56.04s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      11:04:49:LNet: Skipped 2 previous similar messages
      11:04:49:LNet: Service thread pid 20231 completed after 64.11s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      11:04:49:LNet: Service thread pid 20829 completed after 50.39s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      11:04:49:hrtimer: interrupt took 10987 ns
      11:04:49:Lustre: 21522:0:(service.c:1289:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-12s), not sending early reply. Consider increasing at_early_margin (5)?  req@ffff88003e2e6850 x1484449564378832/t0(0) o400->0afe64c6-c2ec-c4bb-669e-4e3b1921e1a7@10.2.4.120@tcp:0/0 lens 224/192 e 0 to 0 dl 1415732327 ref 1 fl Complete:H/0/0 rc 0/0
      11:04:49:lctl invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
      11:04:49:lctl cpuset=/ mems_allowed=0
      11:04:49:Pid: 28917, comm: lctl Not tainted 2.6.32-431.29.2.el6_lustre.x86_64 #1
      11:04:49:Call Trace:
      11:04:49: [<ffffffff810d0791>] ? cpuset_print_task_mems_allowed+0x91/0xb0
      11:04:49: [<ffffffff81122b60>] ? dump_header+0x90/0x1b0
      11:04:49: [<ffffffff81122cce>] ? check_panic_on_oom+0x4e/0x80
      11:04:49: [<ffffffff811233bb>] ? out_of_memory+0x1bb/0x3c0
      11:04:49: [<ffffffff8112fd3f>] ? __alloc_pages_nodemask+0x89f/0x8d0
      11:04:49: [<ffffffff81167dca>] ? alloc_pages_vma+0x9a/0x150
      11:04:49: [<ffffffff8114ae4d>] ? handle_pte_fault+0x73d/0xb00
      11:04:49: [<ffffffff811a6050>] ? iput+0x30/0x70
      11:04:49: [<ffffffff810aee5e>] ? futex_wake+0x10e/0x120
      11:04:49: [<ffffffff8114b43a>] ? handle_mm_fault+0x22a/0x300
      11:04:49: [<ffffffff8104a8d8>] ? __do_page_fault+0x138/0x480
      11:04:49: [<ffffffff8152f23e>] ? do_page_fault+0x3e/0xa0
      11:04:49: [<ffffffff8103f9d8>] ? pvclock_clocksource_read+0x58/0xd0
      11:04:49: [<ffffffff8103ea6c>] ? kvm_clock_read+0x1c/0x20
      11:04:49: [<ffffffff8103ea79>] ? kvm_clock_get_cycles+0x9/0x10
      11:04:49: [<ffffffff810a5e07>] ? getnstimeofday+0x57/0xe0
      11:04:49: [<ffffffff8152f23e>] ? do_page_fault+0x3e/0xa0
      11:04:49: [<ffffffff8152c5f5>] ? page_fault+0x25/0x30
      11:04:49:Mem-Info:
      11:04:49:Node 0 DMA per-cpu:
      11:04:49:CPU    0: hi:    0, btch:   1 usd:   0
      11:04:49:CPU    1: hi:    0, btch:   1 usd:   0
      11:04:49:Node 0 DMA32 per-cpu:
      11:04:49:CPU    0: hi:  186, btch:  31 usd:   0
      11:04:49:CPU    1: hi:  186, btch:  31 usd:  31
      11:04:49:active_anon:7601 inactive_anon:7666 isolated_anon:0
      11:04:49: active_file:129652 inactive_file:210141 isolated_file:2272
      11:04:49: unevictable:0 dirty:32 writeback:0 unstable:0
      11:04:49: free:13257 slab_reclaimable:6977 slab_unreclaimable:27495
      11:04:49: mapped:2403 shmem:120 pagetables:14111 bounce:0
      11:04:49:Node 0 DMA free:8324kB min:332kB low:412kB high:496kB active_anon:8kB inactive_anon:60kB active_file:876kB inactive_file:6032kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15348kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:24kB slab_unreclaimable:420kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:18848 all_unreclaimable? yes
      11:04:49:lowmem_reserve[]: 0 2004 2004 2004
      11:04:49:Node 0 DMA32 free:44988kB min:44720kB low:55900kB high:67080kB active_anon:30396kB inactive_anon:30604kB active_file:517732kB inactive_file:835592kB unevictable:0kB isolated(anon):0kB isolated(file):9600kB present:2052308kB mlocked:0kB dirty:128kB writeback:0kB mapped:9612kB shmem:480kB slab_reclaimable:27884kB slab_unreclaimable:107392kB kernel_stack:9272kB pagetables:56444kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      11:04:49:lowmem_reserve[]: 0 0 0 0
      11:04:49:Node 0 DMA: 19*4kB 17*8kB 9*16kB 3*32kB 1*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 0*4096kB = 8324kB
      11:04:49:Node 0 DMA32: 849*4kB 149*8kB 228*16kB 256*32kB 215*64kB 55*128kB 5*256kB 2*512kB 1*1024kB 0*2048kB 1*4096kB = 44652kB
      11:04:49:223199 total pagecache pages
      11:04:49:780 pages in swap cache
      11:04:49:Swap cache stats: add 78073, delete 77293, find 59149/71101
      11:04:49:Free swap  = 4106500kB
      11:04:49:Total swap = 4128764kB
      11:04:49:524284 pages RAM
      11:04:49:43695 pages reserved
      11:04:49:304823 pages shared
      11:04:49:247073 pages non-shared
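
      The Mem-Info dump shows DMA32 free memory right at its min watermark with a large unreclaimable slab and page-table footprint, and the oom-killer is invoked from an lctl page fault. One illustrative way to watch for this kind of pressure on the OSS while the big-batch survey runs, using only standard tools (interval and fields are arbitrary):

          # Illustrative memory watch for the OSS during obdfilter-survey test 1c.
          while sleep 10; do
              date
              free -m                                                   # RAM and swap headroom
              grep -E '^(Slab|SReclaimable|SUnreclaim)' /proc/meminfo   # slab growth
          done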
      


            People

              Assignee: WC Triage (wc-triage)
              Reporter: Maloo (maloo)
