obdfilter-survey test 1c: oom occurred on OSS

    Description

      While running obdfilter-survey test 1c, an OOM failure occurred on the OSS:

      21:17:56:Lustre: DEBUG MARKER: == obdfilter-survey test 1c: Object Storage Targets survey, big batch == 02:50:56 (1412823056)
      21:17:56:Lustre: DEBUG MARKER: lctl dl | grep obdfilter
      21:17:56:Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep tcp | cut -f 1 -d '@'
      21:17:56:Lustre: Echo OBD driver; http://www.lustre.org/
      21:17:56:hrtimer: interrupt took 7516 ns
      21:17:56:lctl invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0, oom_score_adj=0
      21:17:56:lctl cpuset=/ mems_allowed=0
      21:17:56:Pid: 19467, comm: lctl Not tainted 2.6.32-431.29.2.el6_lustre.x86_64 #1
      21:17:56:Call Trace:
      21:17:56: [<ffffffff810d07b1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
      21:17:56: [<ffffffff81122b80>] ? dump_header+0x90/0x1b0
      21:17:56: [<ffffffff81122cee>] ? check_panic_on_oom+0x4e/0x80
      21:17:56: [<ffffffff811233db>] ? out_of_memory+0x1bb/0x3c0
      21:17:56: [<ffffffff8112fd5f>] ? __alloc_pages_nodemask+0x89f/0x8d0
      21:17:56: [<ffffffff81167dea>] ? alloc_pages_vma+0x9a/0x150
      21:17:56: [<ffffffff811499dd>] ? do_wp_page+0xfd/0x920
      21:17:56: [<ffffffff8133e4f5>] ? misc_open+0x1d5/0x330
      21:17:56: [<ffffffff8114a9fd>] ? handle_pte_fault+0x2cd/0xb00
      21:17:56: [<ffffffff8118d495>] ? chrdev_open+0x125/0x230
      21:17:56: [<ffffffff811ab840>] ? mntput_no_expire+0x30/0x110
      21:17:56: [<ffffffff8118d370>] ? chrdev_open+0x0/0x230
      21:17:56: [<ffffffff811863bf>] ? __dentry_open+0x23f/0x360
      21:17:56: [<ffffffff812284ef>] ? security_inode_permission+0x1f/0x30
      21:17:56: [<ffffffff8114b45a>] ? handle_mm_fault+0x22a/0x300
      21:17:56: [<ffffffff8104a8d8>] ? __do_page_fault+0x138/0x480
      21:17:56: [<ffffffff8152f25e>] ? do_page_fault+0x3e/0xa0
      21:17:56: [<ffffffff8152f25e>] ? do_page_fault+0x3e/0xa0
      21:17:56: [<ffffffff8152c615>] ? page_fault+0x25/0x30
      

      Maloo report: https://testing.hpdd.intel.com/test_sets/973e0216-4fcd-11e4-8e65-5254006e85c2

          Activity

            [LU-5773] obdfilter-survey test 1c: oom occurred on OSS

            gerrit Gerrit Updater added a comment -
            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13078/
            Subject: LU-5773 test: reduce thread count
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: d1a717ed189a1245af1f96ecb701cd869956ef75

            gerrit Gerrit Updater added a comment -
            Niu Yawei (yawei.niu@intel.com) uploaded a new patch: http://review.whamcloud.com/13101
            Subject: LU-5773 test: reduce thread count
            Project: fs/lustre-release
            Branch: b2_5
            Current Patch Set: 1
            Commit: 113e635c1b68afa7698ec3b894647526ce2fef79

            niu Niu Yawei (Inactive) added a comment - http://review.whamcloud.com/13078

            niu Niu Yawei (Inactive) added a comment - Ok, I'm going to cook a patch soon.

            adilger Andreas Dilger added a comment - Niu or Yu Jian, could you please look into a patch to change obdfilter-survey to reduce the thread count when running in a low-memory VM, so it doesn't hit this OOM? We still want to run this test during autotest to make sure that the test script doesn't break, but it just needs to run basic functionality/stress tests, since the performance numbers from a VM are useless.
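
The request above (fewer threads on a low-memory VM) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patch that was merged: the helper name `cap_threads`, the 4 MiB per-thread budget, and the rule of using at most half of RAM are all hypothetical, and `thrhi` is used only as an illustrative name for the upper thread limit handed to the survey script.

```shell
#!/bin/bash
# Hypothetical helper: pick the largest power-of-two thread count per OST
# such that (threads * OST count * per-thread budget) fits in half of RAM.
# Arguments: total memory in KiB, number of OSTs, requested max threads.
cap_threads() {
    local mem_kb=$1 nr_osts=$2 want=$3
    local per_thread_kb=4096            # assumed budget per thread (4 MiB)
    local budget_kb=$((mem_kb / 2))     # assumption: leave half of RAM free
    local max=$((budget_kb / (per_thread_kb * nr_osts)))
    local thr=1
    # Double while the next step still fits under both limits.
    while [ $((thr * 2)) -le "$max" ] && [ $((thr * 2)) -le "$want" ]; do
        thr=$((thr * 2))
    done
    echo "$thr"
}

# Usage: read MemTotal (in KiB) from /proc/meminfo on the OSS, if available.
if [ -r /proc/meminfo ]; then
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    thrhi=$(cap_threads "$mem_kb" 7 128)
    echo "thrhi=$thrhi"
fi
```

With these assumed budgets, a 2 GiB node with 7 OSTs would be capped at 32 threads per OST, while a node with ample memory keeps the requested 128. The right budget depends on what each brw thread actually allocates, which this ticket does not state.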

            niu Niu Yawei (Inactive) added a comment - Do we still have a real machine with more memory in our auto-test system? I presume this failure wouldn't occur on such a system. Perhaps we should reduce the OST/thread count for test_1c to make it runnable on the 2G-memory VMs?

            niu Niu Yawei (Inactive) added a comment - Jodi, it seems to me that patch isn't related to this failure.

            jlevi Jodi Levi (Inactive) added a comment - Do we need to backport http://review.whamcloud.com/#/c/11971/ to other branches?

            yujian Jian Yu added a comment - Another instance on the Lustre b2_5 branch: https://testing.hpdd.intel.com/test_sets/dc5dabda-8074-11e4-a434-5254006e85c2

            niu Niu Yawei (Inactive) added a comment - Looks like a dup of LU-3366. Perhaps the VM (OSS) can't handle this test (7 OSTs, 128 brw threads for each OST)?
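
To put the numbers in that comment in perspective, the arithmetic can be sketched as below. The 2 GiB figure is taken from the "2G" VMs mentioned elsewhere in this ticket; the calculation ignores kernel and page-cache overhead, so the real headroom per thread would be smaller.

```shell
#!/bin/bash
# Figures from the ticket: 7 OSTs, 128 brw threads per OST, 2 GiB of RAM.
nr_osts=7
threads_per_ost=128
mem_kb=$((2 * 1024 * 1024))   # 2 GiB expressed in KiB

total_threads=$((nr_osts * threads_per_ost))
kb_per_thread=$((mem_kb / total_threads))   # integer division

echo "total brw threads: $total_threads"
echo "memory per thread: ${kb_per_thread} KiB"
```

This prints 896 total threads and 2340 KiB (about 2.3 MiB) per thread, even if every byte of RAM were available to the test.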

            People

              niu Niu Yawei (Inactive)
              yujian Jian Yu