Lustre / LU-6008

sanity test_102b: setstripe failed


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.5.4
    • Severity: 3
    • Rank (Obsolete): 16745

    Description

      This issue was created by maloo for Li Wei <liwei@whamcloud.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/896d09b2-7cbd-11e4-b42a-5254006e85c2.

      The sub-test test_102b failed with the following error:

      setstripe failed
      

      More detailed test output:

      == sanity test 102b: getfattr/setfattr for trusted.lov EAs ============== 06:04:38 (1417759478)
      get/set/list trusted.lov xattr ...
      error on ioctl 0x4008669a for '/mnt/lustre/f102b.sanity' (3): File too large
      error: setstripe: create stripe file '/mnt/lustre/f102b.sanity' failed
       sanity test_102b: @@@@@@ FAIL: setstripe failed 
      
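      The ioctl number 0x4008669a in the error message can be unpacked with the standard Linux _IOC() encoding. This is a plain decoding sketch, independent of Lustre; it assumes the usual asm-generic/ioctl.h bit layout:

```python
def decode_ioctl(request):
    """Split a Linux ioctl request number into its _IOC() fields."""
    direction = (request >> 30) & 0x3   # 0 = none, 1 = write, 2 = read
    size = (request >> 16) & 0x3FFF     # size of the argument structure
    ioc_type = chr((request >> 8) & 0xFF)  # "magic" type character
    ioc_nr = request & 0xFF             # command number within the type
    return direction, size, ioc_type, ioc_nr

# The request number from the failing setstripe above:
print(decode_ioctl(0x4008669A))  # (1, 8, 'f', 154)
```

      Type 'f' with number 154 matches LL_IOC_LOV_SETSTRIPE (_IOW('f', 154, long)) in lustre_user.h, so the "File too large" error is coming back from the setstripe ioctl itself.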

      From the MDS debug log:

      00000004:02000400:1.0:1417759538.570621:0:15379:0:(osp_precreate.c:1021:osp_precreate_timeout_condition()) lustre-OST0001-osc-MDT0000: slow creates, last=[0x100010000:0x1561:0x0], next=[0x100010000:0x1561:0x0], reserved=0, syn_changes=35, syn_rpc_in_progress=1, status=-19
      00000004:00000001:1.0:1417759538.574056:0:15379:0:(osp_precreate.c:1121:osp_precreate_reserve()) Process leaving (rc=18446744073709551506 : -110 : ffffffffffffff92)
      00000004:00000001:1.0:1417759538.574060:0:15379:0:(osp_object.c:219:osp_declare_object_create()) Process leaving (rc=18446744073709551506 : -110 : ffffffffffffff92)
      00020000:00001000:1.0:1417759538.574063:0:15379:0:(lod_qos.c:587:lod_qos_declare_object_on()) can't declare creation on #1: -110
      00000004:00000001:1.0:1417759538.574065:0:15379:0:(osp_object.c:398:osp_object_release()) Process entered
      00000004:00000001:1.0:1417759538.574067:0:15379:0:(osp_object.c:413:osp_object_release()) Process leaving
      00000020:00000001:1.0:1417759538.574069:0:15379:0:(lustre_fid.h:719:fid_flatten32()) Process leaving (rc=4261412608 : 4261412608 : fdffff00)
      00000004:00000010:1.0:1417759538.574073:0:15379:0:(osp_object.c:390:osp_object_free()) slab-freed '(obj)': 200 at ffff880034f0c9e0.
      00020000:00000001:1.0:1417759538.574076:0:15379:0:(lod_qos.c:593:lod_qos_declare_object_on()) Process leaving (rc=18446744073709551506 : -110 : ffffffffffffff92)
      00020000:00001000:1.0:1417759538.574077:0:15379:0:(lod_qos.c:912:lod_alloc_specific()) can't declare new object on #1: -110
      00020000:00020000:1.0:1417759538.574080:0:15379:0:(lod_qos.c:941:lod_alloc_specific()) can't lstripe objid [0x2000013a0:0x176:0x0]: have 1 want 2
      00020000:00000001:1.0:1417759538.575511:0:15379:0:(lod_qos.c:950:lod_alloc_specific()) Process leaving (rc=18446744073709551589 : -27 : ffffffffffffffe5)
      
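      The huge rc values in the trace are small negative errnos printed as unsigned 64-bit integers; a quick way to decode them (plain Python, unrelated to the Lustre code itself):

```python
import ctypes
import errno

def decode_rc(rc_unsigned):
    """Reinterpret a 64-bit unsigned return code from a Lustre debug log
    as a signed errno value; return (signed value, symbolic name)."""
    signed = ctypes.c_int64(rc_unsigned).value
    name = errno.errorcode.get(-signed, "?") if signed < 0 else ""
    return signed, name

# Values copied from the MDS debug log above:
print(decode_rc(18446744073709551506))  # (-110, 'ETIMEDOUT')
print(decode_rc(18446744073709551589))  # (-27, 'EFBIG')
```

      So the precreate reservation times out (-ETIMEDOUT), lod_alloc_specific can only allocate 1 of the 2 requested stripes, and the failure surfaces to userspace as -EFBIG, which strerror renders as "File too large".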

      From the OSS dmesg log (before this test):

      [...]
      INFO: task txg_sync:9775 blocked for more than 120 seconds.
            Tainted: P           ---------------    2.6.32-431.29.2.el6_lustre.g6b22a20.x86_64 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      txg_sync      D 0000000000000000     0  9775      2 0x00000080
       ffff88006b6f7ba0 0000000000000046 00000000ffffffff 00001204a652a9e0
       ffff88006b6f7b10 ffff8800695079f0 0000000000148ca8 ffffffffab7ecaf4
       ffff880073bed098 ffff88006b6f7fd8 000000000000fbc8 ffff880073bed098
      Call Trace:
       [<ffffffff810a6d31>] ? ktime_get_ts+0xb1/0xf0
       [<ffffffff81529ea3>] io_schedule+0x73/0xc0
       [<ffffffffa014341c>] cv_wait_common+0x8c/0x100 [spl]
       [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
       [<ffffffffa01434a8>] __cv_wait_io+0x18/0x20 [spl]
       [<ffffffffa02890ab>] zio_wait+0xfb/0x1b0 [zfs]
       [<ffffffffa021e9e3>] dsl_pool_sync+0xb3/0x3f0 [zfs]
       [<ffffffffa0236e4b>] spa_sync+0x40b/0xa60 [zfs]
       [<ffffffffa0240711>] ? spa_txg_history_set+0xb1/0xd0 [zfs]
       [<ffffffffa0243916>] txg_sync_thread+0x2e6/0x510 [zfs]
       [<ffffffff810591a9>] ? set_user_nice+0xc9/0x130
       [<ffffffffa0243630>] ? txg_sync_thread+0x0/0x510 [zfs]
       [<ffffffffa013ec2f>] thread_generic_wrapper+0x5f/0x70 [spl]
       [<ffffffffa013ebd0>] ? thread_generic_wrapper+0x0/0x70 [spl]
       [<ffffffff8109abf6>] kthread+0x96/0xa0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109ab60>] ? kthread+0x0/0xa0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      Lustre: lustre-OST0001: Client lustre-MDT0000-mdtlov_UUID (at 10.1.5.68@tcp) reconnecting
      Lustre: Skipped 25 previous similar messages
      Lustre: lustre-OST0001: Client lustre-MDT0000-mdtlov_UUID (at 10.1.5.68@tcp) refused reconnection, still busy with 1 active RPCs
      Lustre: Skipped 25 previous similar messages
      Lustre: lustre-OST0001: Client lustre-MDT0000-mdtlov_UUID (at 10.1.5.68@tcp) reconnecting
      Lustre: Skipped 51 previous similar messages
      Lustre: lustre-OST0001: Client lustre-MDT0000-mdtlov_UUID (at 10.1.5.68@tcp) refused reconnection, still busy with 1 active RPCs
      Lustre: Skipped 51 previous similar messages
      [...]
      
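      For timing context, the osp_precreate timeout in the MDS log fires exactly 60 seconds after the test banner timestamp, consistent with the MDS giving up on object precreation while txg_sync was stuck on the OSS (epoch values copied from the logs above):

```python
import datetime

test_start = 1417759478      # from the sanity test 102b banner (06:04:38)
precreate_fail = 1417759538  # from the osp_precreate_timeout_condition() line

# The precreate path gave up one minute into the test.
print(precreate_fail - test_start)  # 60
print(datetime.datetime.fromtimestamp(precreate_fail, datetime.timezone.utc))
# 2014-12-05 06:05:38+00:00
```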

      See also the later portions of the OSS dmesg log, which correspond to subsequent test failures in the same session.

      Info required for matching: sanity 102b

            People

              Assignee: WC Triage
              Reporter: Maloo