LU-11396: sanity-sec test 16 fails with 'test_16 returned 1'

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.12.0
    • SUSE
    • 3
    • 9223372036854775807

    Description

      sanity-sec tests 16, 17, 18, 19, 20, 21, and 22 started failing on 2018-09-01, so far only for SUSE 12.3. All of these tests share a common error in the client test_log:

      trevis-3vm10: running as uid/gid/euid/egid 0/0/0/0, groups: 0
      trevis-3vm10:  [lfs] [quota] [-q] [/mnt/lustre]
      trevis-3vm10: Unexpected quotactl error: Operation not supported
      trevis-3vm10: Unexpected quotactl error: Operation not supported
      /usr/lib64/lustre/tests/sanity-sec.sh: line 1261: [0]: syntax error: operand expected (error token is "[0]")
      

      Line 1261 of sanity-sec.sh is in the do_fops_quota_test() routine:

      1256 do_fops_quota_test() {
      1257         local run_u=$1
      1258         # fuzz quota used to account for possible indirect blocks, etc
      1259         local quota_fuzz=$(fs_log_size)
      1260         local qused_orig=$(nodemap_check_quota "$run_u")
      1261         local qused_high=$((qused_orig + quota_fuzz))
      1262         local qused_low=$((qused_orig - quota_fuzz))
      1263         local testfile=$DIR/$tdir/$tfile
      …
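      For illustration, here is a minimal sketch (with hypothetical values, not the test code itself) of how the failure above trips line 1261: when the "lfs quota -q" call fails, the value parsed by nodemap_check_quota() is effectively the literal string "[0]" rather than a number, so the arithmetic expansion aborts.

      # Minimal sketch, hypothetical values.
      qused_orig="[0]"   # what the failed quota parse effectively yields
      quota_fuzz=4
      qused_high=$((qused_orig + quota_fuzz))
      # bash: [0]: syntax error: operand expected (error token is "[0]")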
      

      There's nothing interesting in the node console logs, and no other logs are available for these failures.

      Logs for some of these failures are at
      https://testing.whamcloud.com/test_sets/78bdbbf8-b921-11e8-9df3-52540065bddc
      https://testing.whamcloud.com/test_sets/b3a081de-ae08-11e8-bfed-52540065bddc
      https://testing.whamcloud.com/test_sets/a85560e6-ae22-11e8-bfed-52540065bddc
      https://testing.whamcloud.com/test_sets/56ff85f6-ae44-11e8-bd05-52540065bddc

          Activity

            pjones Peter Jones added a comment -

            Great! So let's close this ticket as a duplicate of LU-11412 then.

            yujian Jian Yu added a comment -

            Yes, Andreas. With the above patch, sanity-sec passed on SLES12 SP3:
            https://testing.whamcloud.com/test_sets/72483bc8-c12c-11e8-a9d9-52540065bddc

            I incorporated this patch into the kernel update patch for the SLES12 SP3 kernel 4.4.155-94.50.1 in LU-11412: https://review.whamcloud.com/33236


            adilger Andreas Dilger added a comment -

            It looks like a small change to osd_acct_obj_lookup() that special-cases the reserved quota inodes should be compatible with both old and new kernels:

                    if (!ldiskfs_valid_inum(sb, id->oii_ino) &&
                        id->oii_ino != LDISKFS_USR_QUOTA_INO &&
                        id->oii_ino != LDISKFS_GRP_QUOTA_INO)
                            RETURN(-ENOENT);
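            For reference, a quick way to check such a change (a sketch, assuming a mounted client at /mnt/lustre and a test user such as sanityusr1) would be:

            # Hypothetical verification: with the reserved quota inodes allowed
            # again, this should print usage instead of "Unexpected quotactl
            # error: Operation not supported".
            lfs quota -u sanityusr1 /mnt/lustre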
            
            yujian Jian Yu added a comment -

            After updating SLES12 SP3 kernel version to 4.4.155-94.50.1, the "Unexpected quotactl error: Operation not supported" issue still occurs:

            # lfs quota -u sanityusr1 /mnt/lustre/
            Unexpected quotactl error: Operation not supported
            Disk quotas for usr sanityusr1 (uid 501):
                 Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
               /mnt/lustre/     [0]     [0]     [0]       -     [0]     [0]     [0]       -
            Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate.
            

            Dmesg on client:

            LustreError: 16593:0:(osc_quota.c:293:osc_quotactl()) ptlrpc_queue_wait failed, rc: -95
            

            Lustre debug log on client:

            00000100:00000001:0.0:1537828311.791857:0:16593:0:(client.c:1262:ptlrpc_check_status()) Process entered
            00000100:00000040:0.0:1537828311.791859:0:16593:0:(client.c:1280:ptlrpc_check_status()) @@@ status is -95  req@ffff880056fd43c0 x1612529474213632/t0(0) o19->lustre-OST0000-osc-ffff88004fd1d000@10.9.6.139@tcp:28/4 lens 336/304 e 0 to 0 dl 1537828318 ref 2 fl Rpc:RN/0/0 rc 0/-95
            00000100:00000001:0.0:1537828311.791862:0:16593:0:(client.c:1286:ptlrpc_check_status()) Process leaving (rc=18446744073709551521 : -95 : ffffffffffffffa1)
            

            Syslog on OSS:

            kernel: [11234.049602] LustreError: 0-0: lustre-OST0000: can't enable quota enforcement since space accounting isn't functional. Please run tunefs.lustre --quota on an unmounted filesystem if not done already
            

            Lustre debug log on OSS:

            00040000:00000001:1.0:1537828312.078444:0:5504:0:(lquota_lib.c:179:lquotactl_slv()) Process entered
            00040000:00000001:1.0:1537828312.078445:0:5504:0:(lquota_lib.c:84:acct_obj_lookup()) Process entered
            00000020:00000001:1.0:1537828312.078448:0:5504:0:(lu_object.c:274:lu_object_alloc()) Process entered
            00002000:00000001:1.0:1537828312.078449:0:5504:0:(ofd_dev.c:527:ofd_object_alloc()) Process entered
            00002000:00000010:1.0:1537828312.078449:0:5504:0:(ofd_dev.c:529:ofd_object_alloc()) slab-alloced 'of': 200 at ffff88007bacd0e0.
            00002000:00000001:1.0:1537828312.078450:0:5504:0:(ofd_dev.c:540:ofd_object_alloc()) Process leaving (rc=18446612134389141808 : -131939320409808 : ffff88007bacd130)
            00002000:00000001:1.0:1537828312.078452:0:5504:0:(ofd_dev.c:442:ofd_object_init()) Process entered
            00002000:00000040:1.0:1537828312.078452:0:5504:0:(ofd_dev.c:445:ofd_object_init()) object init, fid = [0x200000001:0xf:0x0]
            00080000:00000010:1.0:1537828312.078454:0:5504:0:(osd_handler.c:345:osd_object_alloc()) kmalloced 'mo': 216 at ffff880061269900.
            00002000:00000001:1.0:1537828312.078455:0:5504:0:(ofd_dev.c:454:ofd_object_init()) Process leaving (rc=0 : 0 : 0)
            00080000:00000001:1.0:1537828312.078456:0:5504:0:(osd_handler.c:1072:osd_fid_lookup()) Process entered
            00080000:00000001:1.0:1537828312.078458:0:5504:0:(osd_oi.c:572:fid_is_on_ost()) Process entered
            00080000:00000001:1.0:1537828312.078458:0:5504:0:(osd_oi.c:580:fid_is_on_ost()) Process leaving (rc=0 : 0 : 0)
            00000001:00000001:1.0:1537828312.078460:0:5504:0:(osd_quota.c:72:osd_acct_obj_lookup()) Process entered
            00000001:00000001:1.0:1537828312.078460:0:5504:0:(osd_quota.c:101:osd_acct_obj_lookup()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)
            00080000:00000001:1.0:1537828312.078462:0:5504:0:(osd_handler.c:1125:osd_fid_lookup()) Process leaving via out (rc=0 : 0 : 0x0)
            00000020:00000001:1.0:1537828312.078463:0:5504:0:(lu_object.c:326:lu_object_alloc()) Process leaving (rc=18446612134389141808 : -131939320409808 : ffff88007bacd130)
            00080000:00000010:1.0:1537828312.078465:0:5504:0:(osd_handler.c:1577:osd_object_free()) kfreed 'obj': 216 at ffff880061269900.
            00002000:00000001:1.0:1537828312.078467:0:5504:0:(ofd_dev.c:470:ofd_object_free()) Process entered
            00002000:00000040:1.0:1537828312.078467:0:5504:0:(ofd_dev.c:474:ofd_object_free()) object free, fid = [0x200000001:0xf:0x0]
            00002000:00000010:1.0:1537828312.078468:0:5504:0:(ofd_dev.c:478:ofd_object_free()) slab-freed '(of)': 200 at ffff88007bacd0e0.
            00002000:00000001:1.0:1537828312.078469:0:5504:0:(ofd_dev.c:479:ofd_object_free()) Process leaving
            00040000:00000001:1.0:1537828312.078470:0:5504:0:(lquota_lib.c:95:acct_obj_lookup()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)
            00040000:00000001:1.0:1537828312.078470:0:5504:0:(lquota_lib.c:199:lquotactl_slv()) Process leaving (rc=18446744073709551521 : -95 : ffffffffffffffa1)
            

            osd_acct_obj_lookup() returned -2 (-ENOENT) from the following code:

            osd_acct_obj_lookup
                    if (!ldiskfs_valid_inum(sb, id->oii_ino))
                            RETURN(-ENOENT);
            

            In SLES12 SP3 kernel 4.4.155-94.50.1, ext4_valid_inum() is defined as follows:

            ext4_valid_inum
            static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
            {
                    return ino == EXT4_ROOT_INO ||
                            (ino >= EXT4_FIRST_INO(sb) &&
                             ino <= le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count));
            }
            

            However, in the old kernel 4.4.132, ext4_valid_inum() was defined as follows:

            ext4_valid_inum
            static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
            {
            	return ino == EXT4_ROOT_INO ||
            		ino == EXT4_USR_QUOTA_INO ||
            		ino == EXT4_GRP_QUOTA_INO ||
            		ino == EXT4_BOOT_LOADER_INO ||
            		ino == EXT4_JOURNAL_INO ||
            		ino == EXT4_RESIZE_INO ||
            		(ino >= EXT4_FIRST_INO(sb) &&
            		 ino <= le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count));
            }
            

            The special inode numbers "EXT4_USR_QUOTA_INO" and "EXT4_GRP_QUOTA_INO" were removed from ext4_valid_inum() by the following patch, starting with kernel version 4.4.140:
            http://patchwork.ozlabs.org/patch/930637/

            Since the reserved quota inodes (3 and 4) fall below EXT4_FIRST_INO(sb), the new check rejects them, so osd_acct_obj_lookup() returns -ENOENT and the client sees the quotactl fail with -EOPNOTSUPP (-95).

            I will continue investigating how to fix this issue.

            pjones Peter Jones added a comment -

            Same as LU-11395 - lines up with kernel change

            sarah Sarah Liu added a comment -

            Same issue for ost-pools: https://testing.whamcloud.com/test_sets/e233a962-b935-11e8-9df3-52540065bddc
            jamesanunez James Nunez (Inactive) added a comment - - edited

            It looks like sanity-quota is hitting a similar issue. Tests 1, 3, 4a, 6, 7a, 7b, 7c, 7d, 11, 12a, 13, 15, 17, 19, 20, 23, 24, 27b, 27c, 27d, 30, 33, 34, 35, 37, 38, 55, 60, and 61 fail with similar errors.

            For https://testing.whamcloud.com/test_sets/78bdbbf8-b921-11e8-9df3-52540065bddc, from the suite_log, we see errors like

            Unexpected quotactl error: Operation not supported
            /usr/lib64/lustre/tests/sanity-quota.sh: line 821: [: [0]: integer expression expected
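            As a minimal sketch (hypothetical values), the bracketed value trips the numeric test here the same way it trips the arithmetic in sanity-sec.sh:

            # Hypothetical reproduction: the quota field parsed from the failed
            # "lfs quota" output is the literal string "[0]", which the integer
            # test operator rejects.
            used="[0]"
            [ "$used" -le 1024 ] && echo "usage within limit"
            # bash: [: [0]: integer expression expected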
            

            ost-pools test 23a has the same issue with quotas:

            CMD: trevis-40vm4 /usr/sbin/lctl conf_param lustre.quota.ost=ug
            Unexpected quotactl error: Operation not supported
            Disk quotas for usr 500 (uid 500):
                 Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
            /mnt/lustre/d23a.ost-pools/dir
                                [0]     [0]     [0]       -     [0]     [0]     [0]       -
            Total allocated inode limit: 0, total allocated block limit: 0
            Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate.
            -rw-r--r-- 1 sanityusr sanityusr 0 Sep 15 06:29 /mnt/lustre/d23a.ost-pools/dir/f23a.ost-pools-quota
            first dd failed with EDQUOT.
            Unexpected quotactl error: Operation not supported
            Disk quotas for usr 500 (uid 500):
                 Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
            /mnt/lustre/d23a.ost-pools/dir
                                [0]     [0]     [0]       -     [0]     [0]     [0]       -
            Total allocated inode limit: 0, total allocated block limit: 0
            Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate.
             ost-pools test_23a: @@@@@@ FAIL: test_23a failed with 95 
            

            People

              yujian Jian Yu
              jamesanunez James Nunez (Inactive)
              Votes: 0
              Watchers: 7
