Details
- Type: Bug
- Resolution: Duplicate
- Priority: Minor
- Fix Version/s: None
- Affects Version/s: Lustre 2.5.4
- Severity: 3
- Rank (Obsolete): 16745
Description
This issue was created by maloo for Li Wei <liwei@whamcloud.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/896d09b2-7cbd-11e4-b42a-5254006e85c2.
The sub-test test_102b failed with the following error:
setstripe failed
More detailed test output:
== sanity test 102b: getfattr/setfattr for trusted.lov EAs ============== 06:04:38 (1417759478)
get/set/list trusted.lov xattr ...
error on ioctl 0x4008669a for '/mnt/lustre/f102b.sanity' (3): File too large
error: setstripe: create stripe file '/mnt/lustre/f102b.sanity' failed
 sanity test_102b: @@@@@@ FAIL: setstripe failed
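For context, test_102b creates a striped file and then round-trips its layout through the trusted.lov xattr; the failure above happens at the initial create. A minimal standalone sketch of that step (the mount point, stripe size, and stripe count here are assumptions based on the test name and output, not copied from the test script):

  # hypothetical reproduction of the failing step (run as root)
  testfile=/mnt/lustre/f102b.sanity
  lfs setstripe -S 65536 -i 1 -c 2 "$testfile"   # fails here: "File too large"
  # on success the test would read the layout back as an xattr:
  getfattr -n trusted.lov -e hex "$testfile"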
From the MDS debug log:
00000004:02000400:1.0:1417759538.570621:0:15379:0:(osp_precreate.c:1021:osp_precreate_timeout_condition()) lustre-OST0001-osc-MDT0000: slow creates, last=[0x100010000:0x1561:0x0], next=[0x100010000:0x1561:0x0], reserved=0, syn_changes=35, syn_rpc_in_progress=1, status=-19
00000004:00000001:1.0:1417759538.574056:0:15379:0:(osp_precreate.c:1121:osp_precreate_reserve()) Process leaving (rc=18446744073709551506 : -110 : ffffffffffffff92)
00000004:00000001:1.0:1417759538.574060:0:15379:0:(osp_object.c:219:osp_declare_object_create()) Process leaving (rc=18446744073709551506 : -110 : ffffffffffffff92)
00020000:00001000:1.0:1417759538.574063:0:15379:0:(lod_qos.c:587:lod_qos_declare_object_on()) can't declare creation on #1: -110
00000004:00000001:1.0:1417759538.574065:0:15379:0:(osp_object.c:398:osp_object_release()) Process entered
00000004:00000001:1.0:1417759538.574067:0:15379:0:(osp_object.c:413:osp_object_release()) Process leaving
00000020:00000001:1.0:1417759538.574069:0:15379:0:(lustre_fid.h:719:fid_flatten32()) Process leaving (rc=4261412608 : 4261412608 : fdffff00)
00000004:00000010:1.0:1417759538.574073:0:15379:0:(osp_object.c:390:osp_object_free()) slab-freed '(obj)': 200 at ffff880034f0c9e0.
00020000:00000001:1.0:1417759538.574076:0:15379:0:(lod_qos.c:593:lod_qos_declare_object_on()) Process leaving (rc=18446744073709551506 : -110 : ffffffffffffff92)
00020000:00001000:1.0:1417759538.574077:0:15379:0:(lod_qos.c:912:lod_alloc_specific()) can't declare new object on #1: -110
00020000:00020000:1.0:1417759538.574080:0:15379:0:(lod_qos.c:941:lod_alloc_specific()) can't lstripe objid [0x2000013a0:0x176:0x0]: have 1 want 2
00020000:00000001:1.0:1417759538.575511:0:15379:0:(lod_qos.c:950:lod_alloc_specific()) Process leaving (rc=18446744073709551589 : -27 : ffffffffffffffe5)
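The rc values in this trace are plain negated errno codes: -19 is ENODEV (the OST is unavailable to the MDT), -110 is ETIMEDOUT (the precreate reservation timed out), and -27 is EFBIG, which is exactly the "File too large" the client printed; lod_alloc_specific() got only 1 of the 2 requested stripes ("have 1 want 2") and gave up. A quick way to decode such codes (a sketch using Python's errno tables, not any Lustre tooling):

  python3 -c 'import errno,os; [print(-e, errno.errorcode[e], os.strerror(e)) for e in (19,110,27)]'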
From the OSS dmesg log (before this test):
[...]
INFO: task txg_sync:9775 blocked for more than 120 seconds.
      Tainted: P --------------- 2.6.32-431.29.2.el6_lustre.g6b22a20.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
txg_sync      D 0000000000000000     0  9775      2 0x00000080
 ffff88006b6f7ba0 0000000000000046 00000000ffffffff 00001204a652a9e0
 ffff88006b6f7b10 ffff8800695079f0 0000000000148ca8 ffffffffab7ecaf4
 ffff880073bed098 ffff88006b6f7fd8 000000000000fbc8 ffff880073bed098
Call Trace:
 [<ffffffff810a6d31>] ? ktime_get_ts+0xb1/0xf0
 [<ffffffff81529ea3>] io_schedule+0x73/0xc0
 [<ffffffffa014341c>] cv_wait_common+0x8c/0x100 [spl]
 [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa01434a8>] __cv_wait_io+0x18/0x20 [spl]
 [<ffffffffa02890ab>] zio_wait+0xfb/0x1b0 [zfs]
 [<ffffffffa021e9e3>] dsl_pool_sync+0xb3/0x3f0 [zfs]
 [<ffffffffa0236e4b>] spa_sync+0x40b/0xa60 [zfs]
 [<ffffffffa0240711>] ? spa_txg_history_set+0xb1/0xd0 [zfs]
 [<ffffffffa0243916>] txg_sync_thread+0x2e6/0x510 [zfs]
 [<ffffffff810591a9>] ? set_user_nice+0xc9/0x130
 [<ffffffffa0243630>] ? txg_sync_thread+0x0/0x510 [zfs]
 [<ffffffffa013ec2f>] thread_generic_wrapper+0x5f/0x70 [spl]
 [<ffffffffa013ebd0>] ? thread_generic_wrapper+0x0/0x70 [spl]
 [<ffffffff8109abf6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ab60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
Lustre: lustre-OST0001: Client lustre-MDT0000-mdtlov_UUID (at 10.1.5.68@tcp) reconnecting
Lustre: Skipped 25 previous similar messages
Lustre: lustre-OST0001: Client lustre-MDT0000-mdtlov_UUID (at 10.1.5.68@tcp) refused reconnection, still busy with 1 active RPCs
Lustre: Skipped 25 previous similar messages
Lustre: lustre-OST0001: Client lustre-MDT0000-mdtlov_UUID (at 10.1.5.68@tcp) reconnecting
Lustre: Skipped 51 previous similar messages
Lustre: lustre-OST0001: Client lustre-MDT0000-mdtlov_UUID (at 10.1.5.68@tcp) refused reconnection, still busy with 1 active RPCs
Lustre: Skipped 51 previous similar messages
[...]
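The txg_sync trace above was emitted by the kernel's hung-task watchdog, whose 120-second threshold the message itself mentions. For anyone triaging a similar stall on an OSS by hand, a sketch of the usual checks (assumes root and magic sysrq enabled; these commands are not part of the test run):

  cat /proc/sys/kernel/hung_task_timeout_secs   # watchdog threshold (120s here)
  echo w > /proc/sysrq-trigger                  # dump all tasks in uninterruptible (D) sleep
  dmesg | tail -n 200                           # read the resulting stack traces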
See also the portion of the OSS dmesg log for later test failures in the same session.
Info required for matching: sanity 102b
Attachments
Issue Links
- is related to: LU-5242 Test hang sanity test_132, test_133: umount ost (Resolved)