
Test failure sanity-quota test_34: Used space for user 60000 is 0, expected 2048

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.6.0
    • Affects Version/s: Lustre 2.6.0
    • None
    • 3
    • 12354

    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite runs:
      http://maloo.whamcloud.com/test_sets/0c69b0d8-7e9d-11e3-b6d1-52540035b04c
      https://maloo.whamcloud.com/test_sets/e355e8f2-8148-11e3-81ba-52540035b04c

      The sub-test test_34 failed with the following error:

      Used space for user 60000 is 0, expected 2048

      Info required for matching: sanity-quota 34
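For context, the failure message comes from a usage check of the pattern below. This is a minimal sketch, not the actual sanity-quota.sh code: the real test queries usage via "lfs quota" (through a getquota helper); here that call is stubbed with a canned value so the script is self-contained. TSTID (60000) is the test user from the error message.

```shell
#!/bin/sh
# Sketch of the usage check behind this failure, with the real
# "lfs quota" call stubbed out. In sanity-quota.sh, getquota parses
# the block-usage column of "lfs quota" output; here it just echoes
# a canned value (STUBBED_USAGE) so the sketch runs anywhere.
TSTID=60000

getquota() {
	# Stub for: getquota -u $TSTID global curspace
	echo "$STUBBED_USAGE"
}

check_used() {
	expected=$1
	used=$(getquota -u "$TSTID" global curspace)
	if [ "$used" -ne "$expected" ]; then
		echo "Used space for user $TSTID is $used, expected $expected"
		return 1
	fi
	echo "OK: user $TSTID uses ${used}KB"
}

STUBBED_USAGE=0
check_used 2048 || true   # prints the reported failure message
STUBBED_USAGE=2048
check_used 2048           # the passing case
```

With the quota accounting broken (see the comments below), the stubbed value of 0 is exactly what the real test observed.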

    Activity

            [LU-4515] Test failure sanity-quota test_34: Used space for user 60000 is 0, expected 2048

            adilger Andreas Dilger added a comment - Closing this bug again. I've moved the re-enabling of test_34 into http://review.whamcloud.com/10989 so that this patch is being verified as fixed.

            adilger Andreas Dilger added a comment - We can't enable sanity-quota test_34 until the test is fixed by LU-5296 patch http://review.whamcloud.com/10989

            niu Niu Yawei (Inactive) added a comment - test_34 needs to be re-enabled.
            niu Niu Yawei (Inactive) added a comment - http://review.whamcloud.com/10957

            jlevi Jodi Levi (Inactive) added a comment - Patch landed to Master. Please reopen ticket if more work is needed.

            adilger Andreas Dilger added a comment - Niu, we still need a patch to remove test_34 from the sanity-quota.sh ALWAYS_EXCEPT list. Please just delete that ALWAYS_EXCEPT line, since these exceptions should be added at the top of the function where ALWAYS_EXCEPT is first defined.
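For readers unfamiliar with the test framework: ALWAYS_EXCEPT is a space-separated list of test numbers that run_test() skips. The real logic lives in test-framework.sh; the helper below is a simplified sketch of that membership check, not the framework code itself.

```shell
#!/bin/sh
# Simplified sketch of the Lustre test framework's ALWAYS_EXCEPT
# mechanism: tests whose numbers appear in the space-separated list
# are skipped by run_test(). The real implementation lives in
# test-framework.sh; this only illustrates the membership check.
ALWAYS_EXCEPT="34"   # test_34 disabled while awaiting the fix

is_excepted() {
	testnum=$1
	for e in $ALWAYS_EXCEPT; do
		[ "$e" = "$testnum" ] && return 0
	done
	return 1
}

run_test() {
	testnum=$1; shift
	if is_excepted "$testnum"; then
		echo "SKIP test_$testnum (in ALWAYS_EXCEPT)"
		return 0
	fi
	echo "== test_$testnum =="
	"$@"
}

test_34() { echo "checking quota usage..."; }
run_test 34 test_34   # skipped while 34 is in ALWAYS_EXCEPT
```

Deleting the ALWAYS_EXCEPT entry (as requested above) is what re-enables the test.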

            simmonsja James A Simmons added a comment - The revert helped, but a module loading race condition still exists when both client and server are on the same node. A patch exists to fix that issue at http://review.whamcloud.com/#/c/9038.
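The actual fix for that race lives in the Lustre module code (the patch linked above). As a purely generic illustration of serializing two scripts that might race to load the same module, a portable mutex can be built from mkdir, which is atomic. The lock path and function name here are hypothetical, not part of the real fix.

```shell
#!/bin/sh
# Illustrative only: serialize concurrent "module loads" with a
# mkdir-based mutex (mkdir is atomic, so only one caller can create
# the lock directory). The lock path is hypothetical; this is NOT the
# actual fix for the race, which is in the Lustre module code.
LOCKDIR="${TMPDIR:-/tmp}/modload.$$.lock"

load_module() {
	name=$1
	until mkdir "$LOCKDIR" 2>/dev/null; do
		sleep 1           # another loader holds the lock; wait
	done
	echo "loading $name"  # stand-in for: modprobe "$name"
	rmdir "$LOCKDIR"
}

load_module lnet
load_module obdclass
```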

            I saw this on the MDS log before sanity-quota test_34:

            Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre -o user_xattr,acl  		                   lustre-mdt1/mdt1 /mnt/mds1
            ------------[ cut here ]------------
            WARNING: at fs/proc/generic.c:590 proc_register+0x129/0x220() (Tainted: P        W  ---------------   )
            Hardware name: KVM
            proc_dir_entry 'lustre/osc' already registered
            Modules linked in: osp(U) mdd(U) lfsck(U) lod(U) mdt(U) mgs(U) mgc(U) nodemap(U) osd_zfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic sha256_generic libcfs(U) nfsd exportfs autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
            Pid: 26088, comm: llog_process_th Tainted: P        W  ---------------    2.6.32-358.23.2.el6_lustre.ge80a1ca.x86_64 #1
            Call Trace:
             [<ffffffff8106e3e7>] ? warn_slowpath_common+0x87/0xc0
             [<ffffffff8106e4d6>] ? warn_slowpath_fmt+0x46/0x50
             [<ffffffff811f0299>] ? proc_register+0x129/0x220
             [<ffffffff811f05c2>] ? proc_mkdir_mode+0x42/0x60
             [<ffffffff811f05f6>] ? proc_mkdir+0x16/0x20
             [<ffffffffa0718c00>] ? lprocfs_seq_register+0x20/0x80 [obdclass]
             [<ffffffffa070a203>] ? class_search_type+0x53/0xa0 [obdclass]
             [<ffffffffa1098f0a>] ? osp_lprocfs_init+0x30a/0x530 [osp]
             [<ffffffffa1085c3a>] ? osp_device_alloc+0xfaa/0x29e0 [osp]
             [<ffffffffa0746263>] ? lu_context_init+0xa3/0x240 [obdclass]
             [<ffffffffa072ce2f>] ? obd_setup+0x1bf/0x290 [obdclass]
             [<ffffffffa072d108>] ? class_setup+0x208/0x870 [obdclass]
             [<ffffffffa073567c>] ? class_process_config+0xc6c/0x1ad0 [obdclass]
             [<ffffffffa072e263>] ? lustre_cfg_new+0x2d3/0x6e0 [obdclass]
             [<ffffffffa07375fb>] ? class_config_llog_handler+0xa2b/0x1880 [obdclass]
             [<ffffffff8150faee>] ? mutex_lock+0x1e/0x50
             [<ffffffffa06f3d5b>] ? llog_process_thread+0x8ab/0xd60 [obdclass]
             [<ffffffffa0741e4f>] ? keys_fill+0x6f/0x190 [obdclass]
             [<ffffffffa06f4255>] ? llog_process_thread_daemonize+0x45/0x70 [obdclass]
             [<ffffffffa06f4210>] ? llog_process_thread_daemonize+0x0/0x70 [obdclass]
             [<ffffffff81096a36>] ? kthread+0x96/0xa0
             [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
             [<ffffffff810969a0>] ? kthread+0x0/0xa0
             [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
            ---[ end trace f9bd04d09a084708 ]---
            

            It looks like something is wrong with the osp proc files, which makes wait_delete_completed() stop working.

            This was likely introduced by LU-3319. Given that the patch "LU-3319 procfs: move osp proc handling to seq_files" (which caused this problem) has been reverted, I think we can add back test_34 now.

            niu Niu Yawei (Inactive) added a comment - I saw this on the MDS log before sanity-quota test_34 (full log and analysis above).
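wait_delete_completed() in the test framework polls the server's usage counters until deleted blocks are actually freed, which is why broken osp proc files break the quota checks. Below is a simplified sketch of that polling pattern, not the real helper: the real one reads usage via proc files on the MDS, while here the probe is a stub that drains 1024KB per poll so the script is self-contained.

```shell
#!/bin/sh
# Simplified sketch of the wait_delete_completed() pattern: poll a
# space-usage counter until it drains to zero, with a bounded number
# of polls. The real helper reads osp proc files on the MDS; here
# probe_used is a stub that drains 1024KB per call.
probe_used() {
	USED=$((USED - 1024))
	[ "$USED" -gt 0 ] || USED=0
	CUR=$USED   # report via a global, as a subshell would lose state
}

wait_delete_completed() {
	max_polls=${1:-10}
	i=0
	while [ "$i" -lt "$max_polls" ]; do
		probe_used
		if [ "$CUR" -eq 0 ]; then
			echo "deletes completed"
			return 0
		fi
		i=$((i + 1))
	done
	echo "timeout waiting for deletes (used=${CUR}KB)"
	return 1
}

USED=4096
wait_delete_completed 10   # drains in 4 polls
```

If the probe has nothing to read (as with the missing osp proc files), the helper effectively returns immediately and the test sees stale usage, matching the reported symptom.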

            utopiabound Nathaniel Clark added a comment - Jodi, how has this test been disabled? Could this be related to issues with TEI-1403?

            jlevi Jodi Levi (Inactive) added a comment - Test has been disabled. Will re-enable once fix is in place.

            People

              Assignee: niu Niu Yawei (Inactive)
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 11
