[LU-7550] sanity test_27C: FAIL: Can not find 5 in obdidx 0 1 2 3 4 6 Created: 14/Dec/15 Updated: 05/Jan/16 Resolved: 05/Jan/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Jian Yu | Assignee: | Jian Yu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||||||
| Severity: | 3 | ||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||
| Description |
|
sanity test 27C failed as follows: == sanity test 27C: check full striping across all OSTs == 06:13:56 (1450016036) obdidx 0 1 2 3 4 6 /usr/lib64/lustre/tests/sanity.sh: line 2017: [: obdidx: integer expression expected /usr/lib64/lustre/tests/sanity.sh: line 2017: [: obdidx: integer expression expected /usr/lib64/lustre/tests/sanity.sh: line 2017: [: obdidx: integer expression expected /usr/lib64/lustre/tests/sanity.sh: line 2017: [: obdidx: integer expression expected /usr/lib64/lustre/tests/sanity.sh: line 2017: [: obdidx: integer expression expected /usr/lib64/lustre/tests/sanity.sh: line 2017: [: obdidx: integer expression expected sanity test_27C: @@@@@@ FAIL: Can not find 5 in obdidx 0 1 2 3 4 6 Maloo report: |
| Comments |
| Comment by Jian Yu [ 14/Dec/15 ] |
|
This is affecting patch review testing on master branch: |
| Comment by Peter Jones [ 14/Dec/15 ] |
|
Jian is looking into this |
| Comment by James Nunez (Inactive) [ 15/Dec/15 ] |
|
Many failures on master: |
| Comment by Jian Yu [ 15/Dec/15 ] |
|
Lustre Build: https://build.hpdd.intel.com/job/lustre-master/3274/ sanity test 27C failed: After reverting commit 0585b0fb5895a24f07ca32e830d1fa72b75f4f2b for So, the failure is a regression introduced by patch http://review.whamcloud.com/15731 for |
| Comment by Di Wang [ 15/Dec/15 ] |
|
I checked the failure a little bit. and it seems because of OST0005 is not started yet, because check_seq_oid() in test_27z will restart OSTs. log in test_27C 98ad88ca34f2+331:11157:x1520450549370228:12345-10.2.4.165@tcp:103 Request procesed in 63us (226us total) trans 0 rc 0/0 00000100:00100000:0.0:1450016037.008128:0:28196:0:(nrs_fifo.c:241:nrs_fifo_req_stop()) NRS stop fifo request from 12345-10.2.4.165@tcp, seq: 143 00020000:00020000:1.0:1450016037.008151:0:21232:0:(lod_qos.c:205:lod_statfs_and_check()) lustre-MDT0000-mdtlov: statfs: rc = -19 00000004:00080000:1.0:1450016037.013441:0:21232:0:(osp_internal.h:551:osp_update_last_fid()) Gap in objids: start=[0x100000000:0xf6:0x0], count =76 00000004:00080000:1.0:1450016037.013446:0:21232:0:(osp_object.c:1486:osp_object_create()) Writing gap [0x100000000:0xf6:0x0]+76 in llog 00000004:00080000:1.0:1450016037.013451:0:21232:0:(osp_object.c:1503:osp_object_create()) lustre-OST0000-osc-MDT0000: Wrote last used FID: [0x100000000:0x142:0x0], index 0: 0 00000004:00080000:1.0:1450016037.013453:0:21232:0:(osp_internal.h:551:osp_update_last_fid()) Gap in objids: start=[0x100010000:0x124:0x0], count =30 00000004:00080000:1.0:1450016037.013455:0:21232:0:(osp_object.c:1486:osp_object_create()) Writing gap [0x100010000:0x124:0x0]+30 in llog 00000004:00080000:1.0:1450016037.013456:0:21232:0:(osp_object.c:1503:osp_object_create()) lustre-OST0001-osc-MDT0000: Wrote last used FID: [0x100010000:0x142:0x0], index 1: 0 00000004:00080000:1.0:1450016037.013459:0:21232:0:(osp_object.c:1503:osp_object_create()) lustre-OST0002-osc-MDT0000: Wrote last used FID: [0x100020000:0x143:0x0], index 2: 0 00000004:00080000:1.0:1450016037.013460:0:21232:0:(osp_internal.h:551:osp_update_last_fid()) Gap in objids: start=[0x100030000:0x124:0x0], count =30 00000004:00080000:1.0:1450016037.013460:0:21232:0:(osp_object.c:1486:osp_object_create()) Writing gap [0x100030000:0x124:0x0]+30 in llog 00000004:00080000:1.0:1450016037.013461:0:21232:0:(osp_object.c:1503:osp_object_create()) lustre-OST0003-osc-MDT0000: Wrote last used FID: [0x100030000:0x142:0x0], index 3: 0 00000004:00080000:1.0:1450016037.013462:0:21232:0:(osp_internal.h:551:osp_update_last_fid()) Gap in objids: start=[0x100040000:0x124:0x0], count =30 00000004:00080000:1.0:1450016037.013463:0:21232:0:(osp_object.c:1486:osp_object_create()) Writing gap [0x100040000:0x124:0x0]+30 in llog 00000004:00080000:1.0:1450016037.013464:0:21232:0:(osp_object.c:1503:osp_object_create()) lustre-OST0004-osc-MDT0000: Wrote last used FID: [0x100040000:0x142:0x0], index 4: 0 00000004:00080000:1.0:1450016037.013465:0:21232:0:(osp_internal.h:551:osp_update_last_fid()) Gap in objids: start=[0x100060000:0x124:0x0], count =30 00000004:00080000:1.0:1450016037.013466:0:21232:0:(osp_object.c:1486:osp_object_create()) Writing gap [0x100060000:0x124:0x0]+30 in llog 00000004:00080000:1.0:1450016037.013467:0:21232:0:(osp_object.c:1503:osp_object_create()) lustre-OST0006-osc-MDT0000: Wrote last used FID: [0x100060000:0x142:0x0], index 6: 0 return -19 (NODEV) when stat OST0005. And OST0005 is just restarted in test_27z 00000020:01200004:0.0:1450016031.465300:0:17414:0:(obd_mount.c:1276:lustre_fill_super()) VFS Op: sb ffff880064b67000 00000020:01000004:0.0:1450016031.465318:0:17414:0:(obd_mount.c:843:lmd_print()) mount data: 00000020:01000004:0.0:1450016031.465319:0:17414:0:(obd_mount.c:846:lmd_print()) device: /dev/mapper/lvm--Role_OSS-P6 00000020:01000004:0.0:1450016031.465320:0:17414:0:(obd_mount.c:847:lmd_print()) flags: 0 00000020:01000004:0.0:1450016031.465321:0:17414:0:(obd_mount.c:850:lmd_print()) options: errors=remount-ro 00000020:01000004:0.0:1450016031.465323:0:17414:0:(obd_mount.c:1323:lustre_fill_super()) Mounting server from /dev/mapper/lvm--Role_OSS-P6 00000020:01000004:0.0:1450016031.465326:0:17414:0:(obd_mount_server.c:1686:osd_start()) Attempting to start lustre-OST0005, type=osd-ldiskfs, lsifl=200002, mountfl=0 00000020:01000004:0.0:1450016031.465367:0:17414:0:(obd_mount.c:194:lustre_start_simple()) Starting obd lustre-OST0005-osd (typ=osd-ldiskfs) 00000020:00000080:0.0:1450016031.465370:0:17414:0:(obd_config.c:1145:class_process_config()) processing cmd: cf001 00000020:00000080:0.0:1450016031.465373:0:17414:0:(obd_config.c:359:class_attach()) attach type osd-ldiskfs name: lustre-OST0005-osd uuid: lustre-OST0005-osd_UUID 00000020:00000080:0.0:1450016031.465429:0:17414:0:(genops.c:371:class_newdev()) Adding new device lustre-OST0005-osd (ffff88005f728078) 00000020:00000080:0.0:1450016031.465432:0:17414:0:(obd_config.c:429:class_attach()) OBD: dev 17 attached type osd-ldiskfs with refcount 1 00000020:00000080:0.0:1450016031.465435:0:17414:0:(obd_config.c:1145:class_process_config()) processing cmd: cf003 00000020:00000080:0.0:1450016031.502031:0:17414:0:(obd_config.c:549:class_setup()) finished setup of obd lustre-OST0005-osd (uuid lustre-OST0005-osd_UUID) And it probably related with this change http://review.whamcloud.com/15731 diff --git a/lustre/osp/osp_dev.c b/lustre/osp/osp_dev.c
index e4af6b6..2a9d30f 100644
--- a/lustre/osp/osp_dev.c
+++ b/lustre/osp/osp_dev.c
@@ -752,7 +752,17 @@ static int osp_statfs(const struct lu_env *env, struct dt_device *dev,
LPU64" files, "LPU64" free files\n", d->opd_obd->obd_name,
sfs->os_blocks, sfs->os_bfree, sfs->os_bavail,
sfs->os_files, sfs->os_ffree);
- RETURN(0);
+
+ /* ENOSPC could be for two reasons,
+ * 1) not enough inodes 2) not enough blocks
+ * for 1) lod should use preallocated objects
+ * and for 2) shouldn`t. So, here for ENOSPC
+ * different values is returned to spend preallocated.
+ */
+ if (d->opd_pre_status == -ENOSPC && sfs->os_ffree < 32)
+ RETURN(0);
+
+ RETURN(d->opd_pre_status);
}
The easiest fix might be just wait OSTs up after restart OST in test_27z. |
| Comment by Peter Jones [ 15/Dec/15 ] |
|
|
| Comment by Jian Yu [ 15/Dec/15 ] |
Yes, Di. After making the following change, sanity test 27C passed regularly against master build https://build.hpdd.intel.com/job/lustre-master/3274/ (which failed before): diff --git a/lustre/tests/sanity.sh b/lustre/tests/sanity.sh
index 7a25e4b..19e8521 100755
--- a/lustre/tests/sanity.sh
+++ b/lustre/tests/sanity.sh
@@ -1885,6 +1885,7 @@ check_seq_oid()
$(facet_mntpt ost$ost)/$obj_file)
unmount_fstype ost$ost
start ost$ost $dev $OST_MOUNT_OPTS
+ clients_up
fi
https://testing.hpdd.intel.com/test_sessions/918784fc-a36c-11e5-a3ed-5254006e85c2 |
| Comment by James Nunez (Inactive) [ 18/Dec/15 ] |
|
More failures on master: |
| Comment by Alexander Boyko [ 21/Dec/15 ] |
|
>So, the failure is a regression introduced by patch http://review.whamcloud.com/15731 for |
| Comment by James Nunez (Inactive) [ 21/Dec/15 ] |
|
Reopening because we are seeing some patches fail due to this error with |
| Comment by Gerrit Updater [ 21/Dec/15 ] |
|
Jian Yu (jian.yu@intel.com) uploaded a new patch: http://review.whamcloud.com/17691 |
| Comment by Gerrit Updater [ 05/Jan/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17691/ |
| Comment by Jian Yu [ 05/Jan/16 ] |
|
Patch landed to master branch for Lustre 2.8.0. |