[LU-5773] obdfilter-survey test 1c: oom occurred on OSS


    Description

      While running obdfilter-survey test 1c, an OOM failure occurred on the OSS:

      21:17:56:Lustre: DEBUG MARKER: == obdfilter-survey test 1c: Object Storage Targets survey, big batch == 02:50:56 (1412823056)
      21:17:56:Lustre: DEBUG MARKER: lctl dl | grep obdfilter
      21:17:56:Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep tcp | cut -f 1 -d '@'
      21:17:56:Lustre: Echo OBD driver; http://www.lustre.org/
      21:17:56:hrtimer: interrupt took 7516 ns
      21:17:56:lctl invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0, oom_score_adj=0
      21:17:56:lctl cpuset=/ mems_allowed=0
      21:17:56:Pid: 19467, comm: lctl Not tainted 2.6.32-431.29.2.el6_lustre.x86_64 #1
      21:17:56:Call Trace:
      21:17:56: [<ffffffff810d07b1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
      21:17:56: [<ffffffff81122b80>] ? dump_header+0x90/0x1b0
      21:17:56: [<ffffffff81122cee>] ? check_panic_on_oom+0x4e/0x80
      21:17:56: [<ffffffff811233db>] ? out_of_memory+0x1bb/0x3c0
      21:17:56: [<ffffffff8112fd5f>] ? __alloc_pages_nodemask+0x89f/0x8d0
      21:17:56: [<ffffffff81167dea>] ? alloc_pages_vma+0x9a/0x150
      21:17:56: [<ffffffff811499dd>] ? do_wp_page+0xfd/0x920
      21:17:56: [<ffffffff8133e4f5>] ? misc_open+0x1d5/0x330
      21:17:56: [<ffffffff8114a9fd>] ? handle_pte_fault+0x2cd/0xb00
      21:17:56: [<ffffffff8118d495>] ? chrdev_open+0x125/0x230
      21:17:56: [<ffffffff811ab840>] ? mntput_no_expire+0x30/0x110
      21:17:56: [<ffffffff8118d370>] ? chrdev_open+0x0/0x230
      21:17:56: [<ffffffff811863bf>] ? __dentry_open+0x23f/0x360
      21:17:56: [<ffffffff812284ef>] ? security_inode_permission+0x1f/0x30
      21:17:56: [<ffffffff8114b45a>] ? handle_mm_fault+0x22a/0x300
      21:17:56: [<ffffffff8104a8d8>] ? __do_page_fault+0x138/0x480
      21:17:56: [<ffffffff8152f25e>] ? do_page_fault+0x3e/0xa0
      21:17:56: [<ffffffff8152f25e>] ? do_page_fault+0x3e/0xa0
      21:17:56: [<ffffffff8152c615>] ? page_fault+0x25/0x30
      

      Maloo report: https://testing.hpdd.intel.com/test_sets/973e0216-4fcd-11e4-8e65-5254006e85c2

          Activity

            niu Niu Yawei (Inactive) added a comment - Do we still have a real machine with more memory in our auto-test system? I presume such a failure wouldn't occur on that system. Perhaps we should reduce the OST/thread count for test_1c to make it runnable on the 2G mem VMs?

            niu Niu Yawei (Inactive) added a comment - Jodi, it seems to me that patch isn't related to this failure.

            jlevi Jodi Levi (Inactive) added a comment - Do we need to back port http://review.whamcloud.com/#/c/11971/ to other branches?
            yujian Jian Yu added a comment - Another instance on the Lustre b2_5 branch: https://testing.hpdd.intel.com/test_sets/dc5dabda-8074-11e4-a434-5254006e85c2

            niu Niu Yawei (Inactive) added a comment - Looks like a dup of LU-3366. Perhaps the VM (OSS) can't handle such a test (7 OSTs, 128 brw threads for each OST)?
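
To put the numbers in that comment in perspective, here is a rough back-of-the-envelope sketch. It assumes the 2 GB VM size mentioned elsewhere in this thread and treats all of that memory as available to the brw threads, which overstates the real budget because the kernel and other processes also need memory. The function name `brw_thread_budget` is made up for illustration and is not part of any Lustre tool.

```python
def brw_thread_budget(num_osts, threads_per_ost, total_mem_mib):
    """Return (total brw threads, MiB of RAM per thread) for a test run.

    This is plain arithmetic and an upper bound only: it ignores memory
    used by the kernel, page cache, and every other process on the node.
    """
    total_threads = num_osts * threads_per_ost
    return total_threads, total_mem_mib / total_threads


# Figures taken from the ticket: 7 OSTs, 128 brw threads per OST, 2 GB VM.
total_threads, mib_per_thread = brw_thread_budget(7, 128, 2048)
print(f"{total_threads} threads, about {mib_per_thread:.2f} MiB each")
```

With the ticket's figures this comes to 896 threads and a little under 2.3 MiB per thread at best, which is consistent with the suggestion to lower the OST or thread count on small VMs.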

            jamesanunez James Nunez (Inactive) added a comment - Patch http://review.whamcloud.com/#/c/11971/ modified the ost-survey script, not obdfilter. Is there still a connection with this ticket and the 11971 patch?

            adilger Andreas Dilger added a comment - It appears that http://review.whamcloud.com/11971 was changing the obdfilter-survey script, which landed on Oct 5th, and this bug was filed on Oct 20th.

            jlevi Jodi Levi (Inactive) added a comment - Niu, Could you please have a look at this one? Thank you!

            adilger Andreas Dilger added a comment - I found something strange in the OST logs - hundreds of lctl processes are running on the node, like it is a fork bomb:

            11:04:49:[28448]     0 28448     3820      230   0       0             0 lctl
            11:04:49:[28449]     0 28449     3820      229   0       0             0 lctl
            11:04:49:[28450]     0 28450     3820      228   1       0             0 lctl
            11:04:49:[28451]     0 28451     3820      230   0       0             0 lctl
            11:04:49:[28452]     0 28452     3820      229   1       0             0 lctl
            [repeats]
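
One way to check for this kind of process pile-up in a console log is to tally the OOM killer's task dump by command name. The sketch below is only an illustration, not a tool used in this ticket. It assumes the dump columns are `[pid] uid tgid total_vm rss cpu oom_adj oom_score_adj name`, as printed by 2.6.32-era kernels, and that `rss` is counted in pages. The function name `summarize_oom_tasks` is hypothetical.

```python
import re
from collections import Counter

# Assumed column layout of the OOM task dump (2.6.32-era kernels):
#   [pid] uid tgid total_vm rss cpu oom_adj oom_score_adj name
TASK_LINE = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)"
    r"\s+(?P<total_vm>\d+)\s+(?P<rss>\d+)\s+(?P<cpu>\d+)"
    r"\s+(?P<oom_adj>-?\d+)\s+(?P<oom_score_adj>-?\d+)"
    r"\s+(?P<name>\S+)\s*$"
)


def summarize_oom_tasks(log_text):
    """Count tasks and sum rss (in pages) per command name."""
    counts = Counter()
    rss_pages = Counter()
    for line in log_text.splitlines():
        match = TASK_LINE.search(line)
        if match is None:
            continue  # not a task-dump line (e.g. "[repeats]")
        name = match.group("name")
        counts[name] += 1
        rss_pages[name] += int(match.group("rss"))
    return counts, rss_pages


# The five lines quoted in the comment above.
SAMPLE = """\
11:04:49:[28448]     0 28448     3820      230   0       0             0 lctl
11:04:49:[28449]     0 28449     3820      229   0       0             0 lctl
11:04:49:[28450]     0 28450     3820      228   1       0             0 lctl
11:04:49:[28451]     0 28451     3820      230   0       0             0 lctl
11:04:49:[28452]     0 28452     3820      229   1       0             0 lctl
[repeats]
"""

counts, rss_pages = summarize_oom_tasks(SAMPLE)
for name, count in counts.most_common():
    print(f"{name}: {count} tasks, {rss_pages[name]} rss pages")
```

Run against a full console log, a count in the hundreds for a single command such as lctl would confirm the pattern described in the comment.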
            yujian Jian Yu added a comment - Another instance on the master branch: https://testing.hpdd.intel.com/test_sets/19e98088-7e99-11e4-ab67-5254006e85c2
            yujian Jian Yu added a comment (edited) - More instances on Lustre b2_5 branch:
            https://testing.hpdd.intel.com/test_sets/12401f30-7a4e-11e4-b9fd-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/393c5232-7a55-11e4-807e-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/ee460ed2-7980-11e4-aa22-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/f55941c6-6a58-11e4-b203-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/87b3b698-5cdd-11e4-8561-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/bce60eac-4eef-11e4-872e-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/6702eeee-7d55-11e4-943c-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/6502c6d6-7d33-11e4-943c-5254006e85c2

            People

              niu Niu Yawei (Inactive)
              yujian Jian Yu