[LU-11396] sanity-sec test 16 fails with 'test_16 returned 1' Created: 18/Sep/18  Updated: 19/Dec/18  Resolved: 26/Sep/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: Jian Yu
Resolution: Duplicate Votes: 0
Labels: sles12, suse
Environment:

SUSE


Issue Links:
Related
is related to LU-11412 kernel update [SLES12 SP3 4.4.155-94.... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity-sec tests 16, 17, 18, 19, 20, 21, and 22 started failing on 2018-09-01, so far only on SUSE 12.3. All of these tests share a common error in the client test_log:

trevis-3vm10: running as uid/gid/euid/egid 0/0/0/0, groups: 0
trevis-3vm10:  [lfs] [quota] [-q] [/mnt/lustre]
trevis-3vm10: Unexpected quotactl error: Operation not supported
trevis-3vm10: Unexpected quotactl error: Operation not supported
/usr/lib64/lustre/tests/sanity-sec.sh: line 1261: [0]: syntax error: operand expected (error token is "[0]")

sanity-sec.sh line 1261 is in the do_fops_quota_test() routine:

1256 do_fops_quota_test() {
1257         local run_u=$1
1258         # fuzz quota used to account for possible indirect blocks, etc
1259         local quota_fuzz=$(fs_log_size)
1260         local qused_orig=$(nodemap_check_quota "$run_u")
1261         local qused_high=$((qused_orig + quota_fuzz))
1262         local qused_low=$((qused_orig - quota_fuzz))
1263         local testfile=$DIR/$tdir/$tfile
…
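
The mechanism behind the syntax error is indirect: because the quotactl RPC is refused, lfs quota has no accurate numbers to report and prints each value wrapped in brackets (the bracketed "[0]" tables appear in the comments below), so nodemap_check_quota() captures the literal string "[0]" rather than a number, and the arithmetic expansion on line 1261 cannot parse "[" as an operand. A minimal C illustration of this bracketed-output convention (a sketch only, not the actual lfs source):

#include <inttypes.h>
#include <stdio.h>

/*
 * Illustrative sketch only, not the actual lfs source: lfs quota wraps
 * values it could not fetch in brackets to mark them as inaccurate.
 */
static void print_quota_value(uint64_t value, int inaccurate)
{
        if (inaccurate)
                printf("[%" PRIu64 "]", value); /* "[0]" after a failed quotactl */
        else
                printf("%" PRIu64, value);
}

int main(void)
{
        /*
         * A quotactl refused with EOPNOTSUPP leaves the value unknown, so
         * "[0]" is printed; a shell expansion like $((qused_orig + quota_fuzz))
         * then receives the literal token "[0]" and fails with
         * "syntax error: operand expected".
         */
        print_quota_value(0, 1);
        printf("\n");
        return 0;
}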

There is nothing interesting in the node console logs, and no other logs are available for these failures.

Logs for some of these failures are at:
https://testing.whamcloud.com/test_sets/78bdbbf8-b921-11e8-9df3-52540065bddc
https://testing.whamcloud.com/test_sets/b3a081de-ae08-11e8-bfed-52540065bddc
https://testing.whamcloud.com/test_sets/a85560e6-ae22-11e8-bfed-52540065bddc
https://testing.whamcloud.com/test_sets/56ff85f6-ae44-11e8-bd05-52540065bddc



 Comments   
Comment by James Nunez (Inactive) [ 18/Sep/18 ]

It looks like sanity-quota is hitting a similar issue. Tests 1, 3, 4a, 6, 7a, 7b, 7c, 7d, 11, 12a, 13, 15, 17, 19, 20, 23, 24, 27b, 27c, 27d, 30, 33, 34, 35, 37, 38, 55, 60, and 61 fail with similar errors.

For https://testing.whamcloud.com/test_sets/78bdbbf8-b921-11e8-9df3-52540065bddc, from the suite_log, we see errors like:

Unexpected quotactl error: Operation not supported
/usr/lib64/lustre/tests/sanity-quota.sh: line 821: [: [0]: integer expression expected

ost-pools test 23a hits the same issue with quotas:

CMD: trevis-40vm4 /usr/sbin/lctl conf_param lustre.quota.ost=ug
Unexpected quotactl error: Operation not supported
Disk quotas for usr 500 (uid 500):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
/mnt/lustre/d23a.ost-pools/dir
                    [0]     [0]     [0]       -     [0]     [0]     [0]       -
Total allocated inode limit: 0, total allocated block limit: 0
Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate.
-rw-r--r-- 1 sanityusr sanityusr 0 Sep 15 06:29 /mnt/lustre/d23a.ost-pools/dir/f23a.ost-pools-quota
first dd failed with EDQUOT.
Unexpected quotactl error: Operation not supported
Disk quotas for usr 500 (uid 500):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
/mnt/lustre/d23a.ost-pools/dir
                    [0]     [0]     [0]       -     [0]     [0]     [0]       -
Total allocated inode limit: 0, total allocated block limit: 0
Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate.
 ost-pools test_23a: @@@@@@ FAIL: test_23a failed with 95 
Comment by Sarah Liu [ 18/Sep/18 ]

Same issue for ost-pools:
https://testing.whamcloud.com/test_sets/e233a962-b935-11e8-9df3-52540065bddc

Comment by Peter Jones [ 19/Sep/18 ]

Same as LU-11395 - this lines up with the kernel change.

Comment by Jian Yu [ 25/Sep/18 ]

After updating the SLES12 SP3 kernel to version 4.4.155-94.50.1, the "Unexpected quotactl error: Operation not supported" issue still occurs:

# lfs quota -u sanityusr1 /mnt/lustre/
Unexpected quotactl error: Operation not supported
Disk quotas for usr sanityusr1 (uid 501):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
   /mnt/lustre/     [0]     [0]     [0]       -     [0]     [0]     [0]       -
Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate.

Dmesg on client:

LustreError: 16593:0:(osc_quota.c:293:osc_quotactl()) ptlrpc_queue_wait failed, rc: -95

Lustre debug log on client:

00000100:00000001:0.0:1537828311.791857:0:16593:0:(client.c:1262:ptlrpc_check_status()) Process entered
00000100:00000040:0.0:1537828311.791859:0:16593:0:(client.c:1280:ptlrpc_check_status()) @@@ status is -95  req@ffff880056fd43c0 x1612529474213632/t0(0) o19->lustre-OST0000-osc-ffff88004fd1d000@10.9.6.139@tcp:28/4 lens 336/304 e 0 to 0 dl 1537828318 ref 2 fl Rpc:RN/0/0 rc 0/-95
00000100:00000001:0.0:1537828311.791862:0:16593:0:(client.c:1286:ptlrpc_check_status()) Process leaving (rc=18446744073709551521 : -95 : ffffffffffffffa1)

Syslog on OSS:

kernel: [11234.049602] LustreError: 0-0: lustre-OST0000: can't enable quota enforcement since space accounting isn't functional. Please run tunefs.lustre --quota on an unmounted filesystem if not done already

Lustre debug log on OSS:

00040000:00000001:1.0:1537828312.078444:0:5504:0:(lquota_lib.c:179:lquotactl_slv()) Process entered
00040000:00000001:1.0:1537828312.078445:0:5504:0:(lquota_lib.c:84:acct_obj_lookup()) Process entered
00000020:00000001:1.0:1537828312.078448:0:5504:0:(lu_object.c:274:lu_object_alloc()) Process entered
00002000:00000001:1.0:1537828312.078449:0:5504:0:(ofd_dev.c:527:ofd_object_alloc()) Process entered
00002000:00000010:1.0:1537828312.078449:0:5504:0:(ofd_dev.c:529:ofd_object_alloc()) slab-alloced 'of': 200 at ffff88007bacd0e0.
00002000:00000001:1.0:1537828312.078450:0:5504:0:(ofd_dev.c:540:ofd_object_alloc()) Process leaving (rc=18446612134389141808 : -131939320409808 : ffff88007bacd130)
00002000:00000001:1.0:1537828312.078452:0:5504:0:(ofd_dev.c:442:ofd_object_init()) Process entered
00002000:00000040:1.0:1537828312.078452:0:5504:0:(ofd_dev.c:445:ofd_object_init()) object init, fid = [0x200000001:0xf:0x0]
00080000:00000010:1.0:1537828312.078454:0:5504:0:(osd_handler.c:345:osd_object_alloc()) kmalloced 'mo': 216 at ffff880061269900.
00002000:00000001:1.0:1537828312.078455:0:5504:0:(ofd_dev.c:454:ofd_object_init()) Process leaving (rc=0 : 0 : 0)
00080000:00000001:1.0:1537828312.078456:0:5504:0:(osd_handler.c:1072:osd_fid_lookup()) Process entered
00080000:00000001:1.0:1537828312.078458:0:5504:0:(osd_oi.c:572:fid_is_on_ost()) Process entered
00080000:00000001:1.0:1537828312.078458:0:5504:0:(osd_oi.c:580:fid_is_on_ost()) Process leaving (rc=0 : 0 : 0)
00000001:00000001:1.0:1537828312.078460:0:5504:0:(osd_quota.c:72:osd_acct_obj_lookup()) Process entered
00000001:00000001:1.0:1537828312.078460:0:5504:0:(osd_quota.c:101:osd_acct_obj_lookup()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)
00080000:00000001:1.0:1537828312.078462:0:5504:0:(osd_handler.c:1125:osd_fid_lookup()) Process leaving via out (rc=0 : 0 : 0x0)
00000020:00000001:1.0:1537828312.078463:0:5504:0:(lu_object.c:326:lu_object_alloc()) Process leaving (rc=18446612134389141808 : -131939320409808 : ffff88007bacd130)
00080000:00000010:1.0:1537828312.078465:0:5504:0:(osd_handler.c:1577:osd_object_free()) kfreed 'obj': 216 at ffff880061269900.
00002000:00000001:1.0:1537828312.078467:0:5504:0:(ofd_dev.c:470:ofd_object_free()) Process entered
00002000:00000040:1.0:1537828312.078467:0:5504:0:(ofd_dev.c:474:ofd_object_free()) object free, fid = [0x200000001:0xf:0x0]
00002000:00000010:1.0:1537828312.078468:0:5504:0:(ofd_dev.c:478:ofd_object_free()) slab-freed '(of)': 200 at ffff88007bacd0e0.
00002000:00000001:1.0:1537828312.078469:0:5504:0:(ofd_dev.c:479:ofd_object_free()) Process leaving
00040000:00000001:1.0:1537828312.078470:0:5504:0:(lquota_lib.c:95:acct_obj_lookup()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)
00040000:00000001:1.0:1537828312.078470:0:5504:0:(lquota_lib.c:199:lquotactl_slv()) Process leaving (rc=18446744073709551521 : -95 : ffffffffffffffa1)

osd_acct_obj_lookup() returned -2 (-ENOENT) from the following code:

        if (!ldiskfs_valid_inum(sb, id->oii_ino))
                RETURN(-ENOENT);
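
The -ENOENT does not reach the client directly: lquotactl_slv() treats the missing accounting object as "space accounting isn't functional" and refuses the request with -EOPNOTSUPP (-95), which osc_quotactl() logs on the client and lfs reports as "Operation not supported". A minimal sketch of that propagation path, with simplified signatures (illustrative only, not the actual Lustre source):

#include <errno.h>
#include <stdio.h>

/* stands in for osd_quota.c:osd_acct_obj_lookup(), which fails because
 * ldiskfs_valid_inum() now rejects the quota accounting inode (see the
 * kernel change below) */
static int osd_acct_obj_lookup(void)
{
        return -ENOENT;                 /* rc=-2 in the OSS debug log */
}

/* stands in for lquota_lib.c:lquotactl_slv(): no accounting object means
 * space accounting is not functional, so the request is refused */
static int lquotactl_slv(void)
{
        if (osd_acct_obj_lookup() < 0)
                return -EOPNOTSUPP;     /* rc=-95 in the OSS debug log */
        return 0;
}

int main(void)
{
        /* the client's osc_quotactl() sees this as "ptlrpc_queue_wait
         * failed, rc: -95" and lfs prints "Operation not supported" */
        printf("rc = %d\n", lquotactl_slv());
        return 0;
}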

In SLES12 SP3 kernel 4.4.155-94.50.1, ext4_valid_inum() is defined as follows:

static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
{
        return ino == EXT4_ROOT_INO ||
                (ino >= EXT4_FIRST_INO(sb) &&
                 ino <= le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count));
}

However, in the old kernel 4.4.132, ext4_valid_inum() was defined as follows:

static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
{
	return ino == EXT4_ROOT_INO ||
		ino == EXT4_USR_QUOTA_INO ||
		ino == EXT4_GRP_QUOTA_INO ||
		ino == EXT4_BOOT_LOADER_INO ||
		ino == EXT4_JOURNAL_INO ||
		ino == EXT4_RESIZE_INO ||
		(ino >= EXT4_FIRST_INO(sb) &&
		 ino <= le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count));
}

The special inode numbers EXT4_USR_QUOTA_INO and EXT4_GRP_QUOTA_INO were removed from ext4_valid_inum() by the following patch, which landed in kernel 4.4.140:
http://patchwork.ozlabs.org/patch/930637/
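
To make the regression concrete: the ldiskfs quota accounting objects are the reserved inodes 3 (EXT4_USR_QUOTA_INO) and 4 (EXT4_GRP_QUOTA_INO), both below EXT4_FIRST_INO(sb) (typically 11), so the new predicate rejects exactly the inodes that osd_acct_obj_lookup() needs. A standalone comparison of the two predicates, with the constants hardcoded to their usual ext4 values for illustration:

#include <stdio.h>

#define ROOT_INO        2       /* EXT4_ROOT_INO */
#define USR_QUOTA_INO   3       /* EXT4_USR_QUOTA_INO */
#define GRP_QUOTA_INO   4       /* EXT4_GRP_QUOTA_INO */
#define FIRST_INO       11      /* typical EXT4_FIRST_INO(sb) */
#define INODES_COUNT    65536   /* arbitrary example s_inodes_count */

/* predicate before the 4.4.140 patch (quota cases only; the real function
 * also listed the boot loader, journal, and resize inodes) */
static int valid_inum_old(unsigned long ino)
{
        return ino == ROOT_INO ||
               ino == USR_QUOTA_INO ||
               ino == GRP_QUOTA_INO ||
               (ino >= FIRST_INO && ino <= INODES_COUNT);
}

/* predicate as of kernel 4.4.140 and later */
static int valid_inum_new(unsigned long ino)
{
        return ino == ROOT_INO ||
               (ino >= FIRST_INO && ino <= INODES_COUNT);
}

int main(void)
{
        unsigned long inos[] = { USR_QUOTA_INO, GRP_QUOTA_INO };

        for (int i = 0; i < 2; i++)
                printf("ino %lu: old=%d new=%d\n", inos[i],
                       valid_inum_old(inos[i]), valid_inum_new(inos[i]));
        /* prints old=1 new=0 for both, hence -ENOENT on the new kernel */
        return 0;
}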

I will continue investigating how to fix this issue.

Comment by Andreas Dilger [ 25/Sep/18 ]

It looks like a small change to osd_acct_obj_lookup(), special-casing the reserved quota inodes, should be compatible with both old and new kernels:

        if (!ldiskfs_valid_inum(sb, id->oii_ino) &&
            id->oii_ino != LDISKFS_USR_QUOTA_INO &&
            id->oii_ino != LDISKFS_GRP_QUOTA_INO)
                RETURN(-ENOENT);
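
This stays compatible because the pre-4.4.140 ldiskfs_valid_inum() already accepts both reserved quota inodes, so the extra comparisons are redundant (but harmless) on older kernels, while on patched kernels they restore the lookup for exactly the two inodes the accounting code needs.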
Comment by Jian Yu [ 26/Sep/18 ]

Yes, Andreas. With the above patch, sanity-sec passed on SLES12 SP3:
https://testing.whamcloud.com/test_sets/72483bc8-c12c-11e8-a9d9-52540065bddc

I incorporated this patch into the kernel update patch for SLES12 SP3 kernel 4.4.155-94.50.1 in LU-11412: https://review.whamcloud.com/33236

Comment by Peter Jones [ 26/Sep/18 ]

Great! So let's close this ticket as a duplicate of LU-11412 then.
