[LU-7442] conf-sanity test_41c: @@@@@@ FAIL: unexpected concurent MDT mounts rc=17 rc2=0

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version: Lustre 2.9.0
    • Affects Version: Lustre 2.8.0
    • Labels: None
    • Environment: single node setup
    • Severity: 3

    Description

      modules unloaded.
      error: set_param: /proc/{fs,sys}/{lnet,lustre}/fail_loc: Found no match
      Starting mds1: -o rw,user_xattr /dev/vdb /mnt/mds1
      mount.lustre: set /sys/block/vdb/queue/max_sectors_kb to 2147483647

      error: set_param: /proc/{fs,sys}/{lnet,lustre}/fail_loc: Found no match
      Starting mds1: -o rw,user_xattr /dev/vdb /mnt/mds1
      mount.lustre: set /sys/block/vdb/queue/max_sectors_kb to 2147483647

      mount.lustre: mount /dev/vdb at /mnt/mds1 failed: File exists
      Start of /dev/vdb on mds1 failed 17
      Started lustre-MDT0000
      Stopping /mnt/mds1 (opts:-f) on fre819
      conf-sanity test_41c: @@@@@@ FAIL: unexpected concurent MDT mounts result, rc=17 rc2=0
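      As the comments below note, the "Found no match" errors are the trigger: with the Lustre modules unloaded, the fail_loc tunable does not exist, so the race injection never takes effect and the second MDT mount fails with EEXIST (17) instead of the expected EALREADY (114). Below is a minimal sketch of the injection step that gets skipped; it is not the literal test_41c code, and do_facet, start and mdsdevname are test-framework.sh helpers.

      do_facet mds1 "lctl set_param fail_loc=0x703"    # delay the 1st mount in the kernel
      start mds1 "$(mdsdevname 1)" $MDS_MOUNT_OPTS &   # 1st MDT mount, backgrounded
      do_facet mds1 "lctl set_param fail_loc=0x0"      # clear the injected delay
      start mds1 "$(mdsdevname 1)" $MDS_MOUNT_OPTS     # 2nd mount should fail with EALREADY (114)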

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.9


            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17301/
            Subject: LU-7442 tests: Load modules on MDS/OSS in conf-sanity test_41c
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 3973c51b0ba246fb9904235206e6b9269d670a51


            Thanks Bruno. I have rebased my patch on top of http://review.whamcloud.com/#/c/17427 to resolve the conf-sanity test_41c failure.

            bfaccini Bruno Faccini (Inactive) added a comment - - edited

            hemaharish,
            Sorry to be late on this, but am I right in thinking that you hit this problem consistently (I mean the missing load_modules) when you run conf-sanity/test_41c as a single test rather than as part of the full conf-sanity test suite?
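            (For reference, a single conf-sanity subtest is typically run as below; this is a generic sketch assuming the usual lustre/tests environment, not a command taken from this ticket.)

            # run only subtest 41c of conf-sanity on an already-configured test node
            cd lustre/tests
            ONLY=41c sh conf-sanity.sh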

            hemaharish hemaharish added a comment -

            Yes, the failure was permanent on a single-node setup without the patch.


            The reason for the failure (in fact, for the non-permanent failure!) is still a bit mysterious to me, but the patch's re-loading of the modules after cleanup is harmless and will clear any special cases...
            Just for my information, was this failure permanent during your testing?


            HemaHarish (hema.yarramilli@seagate.com) uploaded a new patch: http://review.whamcloud.com/17301
            Subject: LU-7442 test: Unexpected concurent MDT mounts in conf-sanity 41c
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 3ac88c6a184ca0db1fda8368ec1e4590cd446ffe

            hemaharish hemaharish added a comment -

            Hi,
            We worked on this patch. Adding a call to "load_modules" fixed the issue and the test case now passes; we will land the patch. (A sketch of the change follows the log below.)

            == conf-sanity test 41c: concurrent mounts of MDT/OST should all fail but one == 10:42:24 (1447996344)
            umount lustre on /mnt/lustre.....
            stop ost1 service on centos6.6-Upstream-landing
            stop mds service on centos6.6-Upstream-landing
            modules unloaded.
            Loading modules from /home/hema/xyratex/code/lustre-wc-rel/lustre/tests/..
            detected 1 online CPUs by sysfs
            libcfs will create CPU partition based on online CPUs
            debug=-1
            subsystem_debug=all -lnet -lnd -pinger
            gss/krb5 is not supported
            quota/lquota options: 'hash_lqs_cur_bits=3'
            fail_loc=0x703
            Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/mds1
            fail_loc=0x0
            Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/mds1
            mount.lustre: mount /dev/loop1 at /mnt/mds1 failed: Operation already in progress
            The target service is already running. (/dev/loop1)
            Start of /tmp/lustre-mdt1 on mds1 failed 114
            Started lustre-MDT0000
            1st MDT start succeed
            2nd MDT start failed with EALREADY
            fail_loc=0x703
            Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/ost1
            fail_loc=0x0
            Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/ost1
            mount.lustre: mount /dev/loop2 at /mnt/ost1 failed: Operation already in progress
            The target service is already running. (/dev/loop2)
            Start of /tmp/lustre-ost1 on ost1 failed 114
            Started lustre-OST0000
            1st OST start succeed
            2nd OST start failed with EALREADY
            stop mds service on centos6.6-Upstream-landing
            Stopping /mnt/mds1 (opts:-f) on centos6.6-Upstream-landing
            Stopping /mnt/ost1 (opts:-f) on centos6.6-Upstream-landing
            start mds service on centos6.6-Upstream-landing
            Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/mds1
            Started lustre-MDT0000
            start ost1 service on centos6.6-Upstream-landing
            Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/ost1
            Started lustre-OST0000
            mount lustre on /mnt/lustre.....
            Starting client: centos6.6-Upstream-landing:  -o user_xattr,flock centos6.6-Upstream-landing@tcp:/lustre /mnt/lustre
            setup single mount lustre success
            umount lustre on /mnt/lustre.....
            Stopping client centos6.6-Upstream-landing /mnt/lustre (opts:)
            stop ost1 service on centos6.6-Upstream-landing
            Stopping /mnt/ost1 (opts:-f) on centos6.6-Upstream-landing
            stop mds service on centos6.6-Upstream-landing
            Stopping /mnt/mds1 (opts:-f) on centos6.6-Upstream-landing
            modules unloaded.
            Resetting fail_loc on all nodes...done.
            PASS 41c (78s)
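            (A sketch of the shape of that change is below; the placement is hypothetical, based on this comment and the patch subject "Load modules on MDS/OSS in conf-sanity test_41c", not the exact landed diff.)

            # inside test_41c(), right after the initial cleanup:
            cleanup
            load_modules                                   # ensure the fail_loc tunable exists before the race test
            do_facet mds1 "lctl set_param fail_loc=0x703"  # now succeeds instead of "Found no match"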
            
            

            Looks like conf-sanity/test_41c needs some fixes/cleanup, as for LU-5921, which is already assigned to me.
            In this particular ticket's case, it seems that the module unload prevented the fail_loc from being set for the test ...
            I will cook up a patch soon.
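            (A quick way to confirm that state on an affected node, as a hedged suggestion rather than a command from this ticket:)

            # when the Lustre modules are unloaded, the fail_loc tunable is absent
            # and lctl reports "Found no match", as seen in the failure log
            lsmod | grep -q lustre || echo "lustre modules not loaded"
            lctl get_param fail_loc || echo "fail_loc tunable missing"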


            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: hemaharish hemaharish
              Votes: 0
              Watchers: 6
