[LU-11396] sanity-sec test 16 fails with 'test_16 returned 1' Created: 18/Sep/18 Updated: 19/Dec/18 Resolved: 26/Sep/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | Jian Yu |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | sles12, suse | ||
| Environment: |
SUSE |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
sanity-sec test_16, 17, 18, 19, 20, 21, and 22 started failing on 2018-09-01, so far, only for SUSE 12.3. All tests have a common error found in the client test_log trevis-3vm10: running as uid/gid/euid/egid 0/0/0/0, groups: 0 trevis-3vm10: [lfs] [quota] [-q] [/mnt/lustre] trevis-3vm10: Unexpected quotactl error: Operation not supported trevis-3vm10: Unexpected quotactl error: Operation not supported /usr/lib64/lustre/tests/sanity-sec.sh: line 1261: [0]: syntax error: operand expected (error token is "[0]") sanity-sec line 1261 is in the do_fops_quota_test() routine
1256 do_fops_quota_test() {
1257 local run_u=$1
1258 # fuzz quota used to account for possible indirect blocks, etc
1259 local quota_fuzz=$(fs_log_size)
1260 local qused_orig=$(nodemap_check_quota "$run_u")
1261 local qused_high=$((qused_orig + quota_fuzz))
1262 local qused_low=$((qused_orig - quota_fuzz))
1263 local testfile=$DIR/$tdir/$tfile
…
There’s nothing interesting in the node console logs and there is a lack of any other logs for these failures. Logs for some of these failures are at |
| Comments |
| Comment by James Nunez (Inactive) [ 18/Sep/18 ] |
|
It looks like sanity-quota is hitting a similar issue. Tests 1, 3, 4a, 6, 7a, 7b, 7c, 7d, 11, 12a, 13, 15, 17, 19, 20, 23, 24, 27b, 27c, 27d, 30, 33, 34, 35, 37, 38, 55, 60, and 61 fail with similar errors. For https://testing.whamcloud.com/test_sets/78bdbbf8-b921-11e8-9df3-52540065bddc, from the suite_log, we see errors like Unexpected quotactl error: Operation not supported /usr/lib64/lustre/tests/sanity-quota.sh: line 821: [: [0]: integer expression expected ost-pools test 23a has the same issue with quotas CMD: trevis-40vm4 /usr/sbin/lctl conf_param lustre.quota.ost=ug
Unexpected quotactl error: Operation not supported
Disk quotas for usr 500 (uid 500):
Filesystem kbytes quota limit grace files quota limit grace
/mnt/lustre/d23a.ost-pools/dir
[0] [0] [0] - [0] [0] [0] -
Total allocated inode limit: 0, total allocated block limit: 0
Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate.
-rw-r--r-- 1 sanityusr sanityusr 0 Sep 15 06:29 /mnt/lustre/d23a.ost-pools/dir/f23a.ost-pools-quota
first dd failed with EDQUOT.
Unexpected quotactl error: Operation not supported
Disk quotas for usr 500 (uid 500):
Filesystem kbytes quota limit grace files quota limit grace
/mnt/lustre/d23a.ost-pools/dir
[0] [0] [0] - [0] [0] [0] -
Total allocated inode limit: 0, total allocated block limit: 0
Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate.
ost-pools test_23a: @@@@@@ FAIL: test_23a failed with 95
|
| Comment by Sarah Liu [ 18/Sep/18 ] |
|
same issue for ost-pools |
| Comment by Peter Jones [ 19/Sep/18 ] |
|
Same as |
| Comment by Jian Yu [ 25/Sep/18 ] |
|
After updating SLES12 SP3 kernel version to 4.4.155-94.50.1, the "Unexpected quotactl error: Operation not supported" issue still occurs: # lfs quota -u sanityusr1 /mnt/lustre/
Unexpected quotactl error: Operation not supported
Disk quotas for usr sanityusr1 (uid 501):
Filesystem kbytes quota limit grace files quota limit grace
/mnt/lustre/ [0] [0] [0] - [0] [0] [0] -
Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate.
Dmesg on client: LustreError: 16593:0:(osc_quota.c:293:osc_quotactl()) ptlrpc_queue_wait failed, rc: -95 Lustre debug log on client: 00000100:00000001:0.0:1537828311.791857:0:16593:0:(client.c:1262:ptlrpc_check_status()) Process entered 00000100:00000040:0.0:1537828311.791859:0:16593:0:(client.c:1280:ptlrpc_check_status()) @@@ status is -95 req@ffff880056fd43c0 x1612529474213632/t0(0) o19->lustre-OST0000-osc-ffff88004fd1d000@10.9.6.139@tcp:28/4 lens 336/304 e 0 to 0 dl 1537828318 ref 2 fl Rpc:RN/0/0 rc 0/-95 00000100:00000001:0.0:1537828311.791862:0:16593:0:(client.c:1286:ptlrpc_check_status()) Process leaving (rc=18446744073709551521 : -95 : ffffffffffffffa1) Syslog on OSS: kernel: [11234.049602] LustreError: 0-0: lustre-OST0000: can't enable quota enforcement since space accounting isn't functional. Please run tunefs.lustre --quota on an unmounted filesystem if not done already Lustre debug log on OSS: 00040000:00000001:1.0:1537828312.078444:0:5504:0:(lquota_lib.c:179:lquotactl_slv()) Process entered 00040000:00000001:1.0:1537828312.078445:0:5504:0:(lquota_lib.c:84:acct_obj_lookup()) Process entered 00000020:00000001:1.0:1537828312.078448:0:5504:0:(lu_object.c:274:lu_object_alloc()) Process entered 00002000:00000001:1.0:1537828312.078449:0:5504:0:(ofd_dev.c:527:ofd_object_alloc()) Process entered 00002000:00000010:1.0:1537828312.078449:0:5504:0:(ofd_dev.c:529:ofd_object_alloc()) slab-alloced 'of': 200 at ffff88007bacd0e0. 00002000:00000001:1.0:1537828312.078450:0:5504:0:(ofd_dev.c:540:ofd_object_alloc()) Process leaving (rc=18446612134389141808 : -131939320409808 : ffff88007bacd130) 00002000:00000001:1.0:1537828312.078452:0:5504:0:(ofd_dev.c:442:ofd_object_init()) Process entered 00002000:00000040:1.0:1537828312.078452:0:5504:0:(ofd_dev.c:445:ofd_object_init()) object init, fid = [0x200000001:0xf:0x0] 00080000:00000010:1.0:1537828312.078454:0:5504:0:(osd_handler.c:345:osd_object_alloc()) kmalloced 'mo': 216 at ffff880061269900. 00002000:00000001:1.0:1537828312.078455:0:5504:0:(ofd_dev.c:454:ofd_object_init()) Process leaving (rc=0 : 0 : 0) 00080000:00000001:1.0:1537828312.078456:0:5504:0:(osd_handler.c:1072:osd_fid_lookup()) Process entered 00080000:00000001:1.0:1537828312.078458:0:5504:0:(osd_oi.c:572:fid_is_on_ost()) Process entered 00080000:00000001:1.0:1537828312.078458:0:5504:0:(osd_oi.c:580:fid_is_on_ost()) Process leaving (rc=0 : 0 : 0) 00000001:00000001:1.0:1537828312.078460:0:5504:0:(osd_quota.c:72:osd_acct_obj_lookup()) Process entered 00000001:00000001:1.0:1537828312.078460:0:5504:0:(osd_quota.c:101:osd_acct_obj_lookup()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe) 00080000:00000001:1.0:1537828312.078462:0:5504:0:(osd_handler.c:1125:osd_fid_lookup()) Process leaving via out (rc=0 : 0 : 0x0) 00000020:00000001:1.0:1537828312.078463:0:5504:0:(lu_object.c:326:lu_object_alloc()) Process leaving (rc=18446612134389141808 : -131939320409808 : ffff88007bacd130) 00080000:00000010:1.0:1537828312.078465:0:5504:0:(osd_handler.c:1577:osd_object_free()) kfreed 'obj': 216 at ffff880061269900. 00002000:00000001:1.0:1537828312.078467:0:5504:0:(ofd_dev.c:470:ofd_object_free()) Process entered 00002000:00000040:1.0:1537828312.078467:0:5504:0:(ofd_dev.c:474:ofd_object_free()) object free, fid = [0x200000001:0xf:0x0] 00002000:00000010:1.0:1537828312.078468:0:5504:0:(ofd_dev.c:478:ofd_object_free()) slab-freed '(of)': 200 at ffff88007bacd0e0. 00002000:00000001:1.0:1537828312.078469:0:5504:0:(ofd_dev.c:479:ofd_object_free()) Process leaving 00040000:00000001:1.0:1537828312.078470:0:5504:0:(lquota_lib.c:95:acct_obj_lookup()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe) 00040000:00000001:1.0:1537828312.078470:0:5504:0:(lquota_lib.c:199:lquotactl_slv()) Process leaving (rc=18446744073709551521 : -95 : ffffffffffffffa1) osd_acct_obj_lookup() returned -2 (-ENOENT) from the following codes: osd_acct_obj_lookup
if (!ldiskfs_valid_inum(sb, id->oii_ino))
RETURN(-ENOENT);
In SLES12 SP3 kernel 4.4.155-94.50.1, ext4_valid_inum() is defined as follows: ext4_valid_inum static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino) { return ino == EXT4_ROOT_INO || (ino >= EXT4_FIRST_INO(sb) && ino <= le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count)); } However, in the old kernel 4.4.132, ext4_valid_inum() was defined as follows: ext4_valid_inum static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino) { return ino == EXT4_ROOT_INO || ino == EXT4_USR_QUOTA_INO || ino == EXT4_GRP_QUOTA_INO || ino == EXT4_BOOT_LOADER_INO || ino == EXT4_JOURNAL_INO || ino == EXT4_RESIZE_INO || (ino >= EXT4_FIRST_INO(sb) && ino <= le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count)); } Special inodes numbers "EXT4_USR_QUOTA_INO" and "EXT4_GRP_QUOTA_INO" were removed from ext4_valid_inum() in the following patch since kernel version 4.4.140: Continue investigating how to fix this issue. |
| Comment by Andreas Dilger [ 25/Sep/18 ] |
|
It looks like a small change to osd_acct_obj_lookup() to special-case allow the reserved quota inodes should be compatible with both old and new kernels:
if (!ldiskfs_valid_inum(sb, id->oii_ino) &&
id->oii_ino != LDISKFS_USR_QUOTA_INO &&
id->oii_ino != LDISKFS_GRP_QUOTA_INO)
RETURN(-ENOENT);
|
| Comment by Jian Yu [ 26/Sep/18 ] |
|
Yes, Andreas. With the above patch, sanity-sec passed on SLES12 SP3: I incorporated this patch to the kernel update patch for SLES12 SP3 kernel 4.4.155-94.50.1 in |
| Comment by Peter Jones [ 26/Sep/18 ] |
|
Great! So let's close this ticket as a duplicate of |