Lustre / LU-2121

Test failure on test suite lustre-rsync-test, subtest test_1

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.0
    • Components: None
    • 3
    • 5122

    Description

      This issue was created by maloo for Li Wei <liwei@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/04462af8-1206-11e2-a663-52540035b04c.

      The sub-test test_1 failed with the following error:

      test failed to respond and timed out

      Info required for matching: lustre-rsync-test 1
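
      As a point of reference, subtests in this suite are normally rerun through the test framework's ONLY= filter. A minimal sketch, assuming a standard lustre/tests checkout and an already-configured test filesystem (paths and environment here are assumptions, not taken from this report):

      # Rerun only subtest test_1 of lustre-rsync-test (sketch; assumes the usual
      # test-framework environment, e.g. a cfg/local.sh setup, is already in place).
      cd lustre/tests
      ONLY=1 bash lustre-rsync-test.sh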

      Attachments

      Issue Links

      Activity

            [LU-2121] Test failure on test suite lustre-rsync-test, subtest test_1

            Close old bug.

            adilger Andreas Dilger added a comment

            Several of the failures (on both DNE and ZFS) hit this LBUG on the MDS:

            13:43:31:Lustre: DEBUG MARKER: == lustre-rsync-test test 1: Simple Replication == 19:43:23 (1408563803)
            13:43:31:LustreError: 3206:0:(layout.c:2355:req_capsule_extend()) ASSERTION( (fmt)->rf_fields[(i)].d[(j)]->rmf_size >= (old)->rf_fields[(i)].d[(j)]->rmf_size ) failed: 
            13:43:31:LustreError: 3206:0:(layout.c:2355:req_capsule_extend()) LBUG
            13:43:31:Pid: 3206, comm: mdt00_003
            13:43:31:
            13:43:31:Call Trace:
            13:43:31: [<ffffffffa0483895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
            13:43:31: [<ffffffffa0483e97>] lbug_with_loc+0x47/0xb0 [libcfs]
            13:43:31: [<ffffffffa08511cc>] req_capsule_extend+0x1fc/0x200 [ptlrpc]
            13:43:31: [<ffffffffa0eba77a>] mdt_intent_policy+0x38a/0xca0 [mdt]
            13:43:31: [<ffffffffa07e0789>] ldlm_lock_enqueue+0x369/0x970 [ptlrpc]
            13:43:31: [<ffffffffa0809e4a>] ldlm_handle_enqueue0+0x36a/0x1120 [ptlrpc]
            13:43:31: [<ffffffffa088c972>] tgt_enqueue+0x62/0x1d0 [ptlrpc]
            13:43:31: [<ffffffffa088d1fe>] tgt_request_handle+0x71e/0xb10 [ptlrpc]
            13:43:31: [<ffffffffa083c224>] ptlrpc_main+0xe64/0x1990 [ptlrpc]
            13:43:31: [<ffffffffa083b3c0>] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
            13:43:31: [<ffffffff8109abf6>] kthread+0x96/0xa0
            13:43:31: [<ffffffff8100c20a>] child_rip+0xa/0x20
            13:43:31: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
            13:43:31: [<ffffffff8100c200>] ? child_rip+0x0/0x20
            13:43:31:
            

            https://testing.hpdd.intel.com/test_sets/52111d5c-28ba-11e4-901f-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/f5085e68-2e13-11e4-8a0b-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/381780f0-334e-11e4-b04e-5254006e85c2

            utopiabound Nathaniel Clark added a comment
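
            For context, test 1 ("Simple Replication") drives changelog-based replication with lustre_rsync, so the MDS is servicing intent lock enqueues from that workload when it hits the LBUG above. A rough manual equivalent is sketched below; the device name, mount points, target directory and changelog user ID are illustrative assumptions:

            # On the MDS: register a changelog consumer (prints an ID such as cl1).
            lctl --device lustre-MDT0000 changelog_register
            # On a client: create some namespace activity to generate changelog records.
            mkdir -p /mnt/lustre/d1 && touch /mnt/lustre/d1/file1
            # Replay the recorded changes to a plain target directory.
            lustre_rsync --source=/mnt/lustre --target=/tmp/target \
                         --mdt=lustre-MDT0000 --user=cl1 --statuslog=/tmp/replicate.log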

            In more recent reports of this bug (which I suspect is a different problem, though this ticket is so old that it is only useful for recycling), I see:

            14:05:18:IP: [<ffffffffa073c085>] lu_context_exit+0x35/0xa0 [obdclass]
            14:05:18:Oops: 0000 [#1] SMP 
            14:05:18:CPU 0 
            14:05:18:Pid: 18668, comm: lctl Tainted: P 2.6.32-431.3.1.el6_lustre.gc762f0f.x86_64 #1 Red Hat KVM
            14:05:18:RIP: 0010:[<ffffffffa073c085>]  [<ffffffffa073c085>] lu_context_exit+0x35/0xa0 [obdclass]
            14:05:18:Process lctl (pid: 18668, threadinfo ffff88006f718000, task ffff88006e76e080)
            14:05:18:Stack:
            14:05:18:Call Trace:
            14:05:18: [<ffffffffa073d1e6>] lu_env_fini+0x16/0x30 [obdclass]
            14:05:18: [<ffffffffa0ca8881>] mdd_changelog_users_seq_show+0x111/0x290 [mdd]
            14:05:18: [<ffffffff811ade22>] seq_read+0xf2/0x400
            14:05:18: [<ffffffff811f355e>] proc_reg_read+0x7e/0xc0
            14:05:18: [<ffffffff811896b5>] vfs_read+0xb5/0x1a0
            14:05:18: [<ffffffff811897f1>] sys_read+0x51/0x90
            
            adilger Andreas Dilger added a comment
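
            The faulting path above (proc_reg_read -> seq_read -> mdd_changelog_users_seq_show -> lu_env_fini -> lu_context_exit) is reached from user space simply by reading the MDT's changelog_users parameter, which matches the crashing process being lctl. For example (the device name is the usual one for a test setup and is an assumption here):

            # Reading this parameter invokes mdd_changelog_users_seq_show on the MDS.
            lctl get_param mdd.lustre-MDT0000.changelog_users
            # Equivalent direct read of the procfs file on this vintage of Lustre:
            cat /proc/fs/lustre/mdd/lustre-MDT0000/changelog_users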

            Looks like the client is out of memory? Could be related to LU-2139.

            adilger Andreas Dilger added a comment

            15:07:23:cannot allocate a tage (334)
            15:07:23:cannot allocate a tage (334)
            15:07:23:cannot allocate a tage (334)
            15:07:23:cannot allocate a tage (334)
            15:07:23:cannot allocate a tage (334)
            15:07:23:cannot allocate a tage (334)
            15:07:23:cannot allocate a tage (334)

            adilger Andreas Dilger added a comment
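
            The "cannot allocate a tage" messages come from libcfs failing to allocate a trace page for its debug log buffer, which is consistent with the client running out of memory. A quick way to check the node, sketched with generic commands (the debug_mb value is illustrative, and shrinking the buffer is only a mitigation, not a fix):

            # Look for memory pressure / OOM-killer activity on the client.
            free -m
            dmesg | grep -iE 'out of memory|oom'
            # If memory is tight, the Lustre debug buffer can be shrunk (size in MB).
            lctl get_param debug_mb
            lctl set_param debug_mb=64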

            https://maloo.whamcloud.com/test_sets/3be69dba-173a-11e2-afe1-52540035b04c

            CMD: fat-intel-1vm3 dumpe2fs -h lustre-mdt1/mdt1 2>&1 | grep -q large_xattr
            CMD: fat-intel-1vm3 dumpe2fs -h lustre-mdt1/mdt1 2>&1
            
            liwei Li Wei (Inactive) added a comment

      People

        Assignee: wc-triage WC Triage
        Reporter: maloo Maloo
        Votes: 0
        Watchers: 3
