Test failure on test suite recovery-small, subtest test_57

Details

    Description

      This issue was created by maloo for yujian <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/743bea58-af48-11e1-a585-52540035b04c.

      The sub-test test_57 failed with the following error:

      == recovery-small test 57: read procfs entries causes kernel crash =================================== 05:43:48 (1338900228)
      fail_loc=0x80000B00
      Stopping client client-28vm6.lab.whamcloud.com /mnt/lustre (opts

      test failed to respond and timed out

      Info required for matching: recovery-small 57

      Attachments

        1. config.h
          18 kB
        2. config.log
          589 kB

          Activity

            [LU-1484] Test failure on test suite recovery-small, subtest test_57
            pjones Peter Jones added a comment -

            Landed for 1.8.9 and 2.1.5

            utopiabound Nathaniel Clark added a comment - b2_1 patch: http://review.whamcloud.com/5468

            utopiabound Nathaniel Clark added a comment -

            Peter,

            Yes. This patch applies cleanly to b2_1 (all the way through master). It should be applied to anything we want to support RHEL 5 on. Should I submit additional patches?
            yujian Jian Yu added a comment -

            Per http://wiki.whamcloud.com/display/ENG/Lustre+2.1.4+release+testing+tracker, the issue still exists in Lustre 2.1.4, so we need the patch on the current b2_1 branch for Lustre 2.1.5.

            pjones Peter Jones added a comment -

            Nathaniel,

            Is this patch needed for b2_1 also?

            Peter


            utopiabound Nathaniel Clark added a comment -

            Patch to assume proc_dir_entry for rhel kernels: http://review.whamcloud.com/5439
            bobijam Zhenyu Xu added a comment - - edited

            Since recovery-small test_57 is intended to test removing a proc entry while it is being read, the patch (review#5253) cannot prevent the test from hanging with a patchless client built against kernels that hide the proc_dir_entry users field.

            Since later kernels all use proc_dir_entry users, I think we can presume it and define LPROCFS_{ENTRY,END} as empty ops.
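A minimal sketch of what "empty ops" could look like, assuming the hooks are the LPROCFS_ENTRY() and LPROCFS_EXIT() macros used in the snippet elsewhere in this ticket (the comment itself writes LPROCFS_{ENTRY,END}). The read_param() caller is hypothetical and only shows that call sites keep compiling unchanged when the hooks expand to nothing.

```c
#include <assert.h>

/* Assumed no-op definitions: on kernels that track proc_dir_entry users,
 * the kernel itself is presumed to guard removal, so the hooks do nothing. */
#define LPROCFS_ENTRY() do { } while (0)
#define LPROCFS_EXIT()  do { } while (0)

/* Hypothetical caller: brackets the access with the hooks as before. */
static int read_param(const int *value, int *out)
{
        LPROCFS_ENTRY();
        *out = *value;
        LPROCFS_EXIT();
        return 0;
}
```

The `do { } while (0)` form keeps each macro a single statement, so it stays safe inside unbraced `if`/`else` bodies.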

            yujian Jian Yu added a comment -

            Lustre Branch: b1_8
            Lustre Build: http://build.whamcloud.com/job/lustre-b1_8/253
            Distro/Arch: RHEL5.9/x86_64

            The issue still occurred: https://maloo.whamcloud.com/test_sets/583b7710-7009-11e2-a955-52540035b04c


            adilger Andreas Dilger added a comment -

            Patch at http://review.whamcloud.com/5253, let's hope it builds and tests OK.
            adilger Andreas Dilger added a comment - - edited

            I can't find any way to check for proc_dir_entry_aux, so we can't depend on checking it for patchless clients.

            I think the following needs to change here:

            • the code in lprocfs_status.h (1.8) and param_tree.h (master) should be changed to check for HAVE_PROCFS_USERS first, then HAVE_PROCFS_DELETED secondly, so that if both are available it uses the HAVE_PROCFS_USERS method
            • always check for pde_fops == NULL, regardless of whether we detect HAVE_PROCFS_USERS
            • always check for deleted, if HAVE_PROCFS_DELETED is set, even if HAVE_PROCFS_USERS is also present

            At worst this causes some small race where a /proc entry will not be shown when it is just loaded or unloaded, but should be safe against crashing.

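            static inline int LPROCFS_ENTRY_AND_CHECK(struct proc_dir_entry *dp)
            {
                    int deleted = 0;

            #ifdef HAVE_PROCFS_USERS
                    spin_lock(&dp->pde_unload_lock);
            #endif
                    /* proc_fops == NULL means the entry is being removed */
                    if (unlikely(dp->proc_fops == NULL))
                            deleted = 1;
            #ifdef HAVE_PROCFS_USERS
                    spin_unlock(&dp->pde_unload_lock);
            #endif
                    /* bail out before LPROCFS_ENTRY() so that -ENODEV is
                     * never returned with the lock still held */
                    if (deleted)
                            return -ENODEV;

                    LPROCFS_ENTRY();
            #if defined(HAVE_PROCFS_DELETED)
                    /* also honour ->deleted where the kernel provides it */
                    if (unlikely(dp->deleted)) {
                            LPROCFS_EXIT();
                            return -ENODEV;
                    }
            #endif

                    return 0;
            }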
            

            I haven't tested this at all, nor even compiled it yet.
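The return-value contract this helper appears to aim for (0 with LPROCFS_ENTRY() held, -ENODEV with nothing held) can be modelled in userspace. The sketch below is only an illustration under that assumption: struct mock_pde, lprocfs_lock_depth and the mock_* functions are hypothetical stand-ins, not Lustre or kernel APIs, and the have_deleted parameter stands in for the HAVE_PROCFS_DELETED build-time check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct proc_dir_entry. */
struct mock_pde {
        const void *proc_fops;  /* NULL once the entry is being removed */
        int deleted;            /* models ->deleted on kernels that have it */
};

/* Models how many times the LPROCFS_ENTRY() lock is currently held. */
static int lprocfs_lock_depth;

static void mock_lprocfs_entry(void)
{
        lprocfs_lock_depth++;
}

static void mock_lprocfs_exit(void)
{
        assert(lprocfs_lock_depth > 0);
        lprocfs_lock_depth--;
}

/* Returns 0 with the lock held, or -ENODEV with the lock not held. */
static int mock_entry_and_check(const struct mock_pde *dp, int have_deleted)
{
        if (dp->proc_fops == NULL)
                return -ENODEV;         /* lock was never taken */

        mock_lprocfs_entry();
        if (have_deleted && dp->deleted) {
                mock_lprocfs_exit();    /* undo before reporting failure */
                return -ENODEV;
        }
        return 0;
}
```

A caller that sees 0 is expected to call mock_lprocfs_exit() when it is done; a caller that sees -ENODEV must not, which is why both failure paths leave the depth unchanged.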


            People

              utopiabound Nathaniel Clark
              maloo Maloo