sanity-sec: test_31: 'network' mount option cannot be taken into account

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor

    Description

      This issue was created by maloo for Lai Siyao <lai.siyao@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/1bf6a37a-7821-11e9-a028-52540065bddc

      CMD: trevis-38vm8 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests//usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/sbin:/bin:/usr/sbin: NAME=autotest_config bash rpc.sh set_default_debug \"vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck\" \"all\" 4 
      trevis-38vm8: == rpc test complete, duration -o sec ================================================================ 19:32:45 (1558035165)
      trevis-38vm8: trevis-38vm8.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
      CMD: trevis-38vm8 e2label /dev/mapper/ost8_flakey 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
      CMD: trevis-38vm8 e2label /dev/mapper/ost8_flakey 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
      CMD: trevis-38vm8 e2label /dev/mapper/ost8_flakey 2>/dev/null
      Started lustre-OST0007
      CMD: trevis-38vm9 /usr/sbin/lctl list_nids | grep tcp999
      Starting client: trevis-38vm6.trevis.whamcloud.com:  -o user_xattr,flock,network=tcp999 10.9.3.145@tcp999:/lustre /mnt/lustre
      CMD: trevis-38vm6.trevis.whamcloud.com mkdir -p /mnt/lustre
      CMD: trevis-38vm6.trevis.whamcloud.com mount -t lustre -o user_xattr,flock,network=tcp999 10.9.3.145@tcp999:/lustre /mnt/lustre
      mount.lustre: mount 10.9.3.145@tcp999:/lustre at /mnt/lustre failed: Invalid argument
      This may have multiple causes.
      Is 'lustre' the correct filesystem name?
      Are the mount options correct?
      Check the syslog for more info.
      unconfigure:
          - lnet:
                errno: -16
                descr: "LNet unconfigure error: Device or resource busy"
      
      [17996.736209] Lustre: DEBUG MARKER: == sanity-sec test 31: client mount option '-o network' ============================================== 19:30:04 (1558035004)
      [17997.693592] Lustre: DEBUG MARKER: lctl get_param -n *.lustre*.exports.'10.9.5.215@tcp'.uuid 		  2>/dev/null | grep -q -
      [17998.217952] Lustre: DEBUG MARKER: /usr/sbin/lnetctl lnet configure && /usr/sbin/lnetctl net add --if 		  eth0 		  --net tcp999
      [17998.557153] LNet: Added LNI 10.9.3.146@tcp999 [8/256/0/180]
      [18000.237970] LustreError: 11-0: lustre-MDT0000-osp-MDT0001: operation mds_statfs to node 10.9.3.145@tcp failed: rc = -107
      [18000.239925] LustreError: Skipped 9 previous similar messages
      [18000.240888] Lustre: lustre-MDT0000-osp-MDT0001: Connection to lustre-MDT0000 (at 10.9.3.145@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [18000.243842] Lustre: Skipped 18 previous similar messages
      [18007.616779] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
      [18007.922846] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2
      [18009.125483] Lustre: lustre-MDT0001: Not available for connect from 10.9.3.145@tcp (stopping)
      [18009.127096] Lustre: Skipped 42 previous similar messages
      [18011.260546] LustreError: 17495:0:(client.c:1183:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff98fcd0168d80 x1633699486628912/t0(0) o41->lustre-MDT0003-osp-MDT0001@0@lo:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
      [18011.264135] LustreError: 17495:0:(client.c:1183:ptlrpc_import_delay_req()) Skipped 2 previous similar messages
      [18015.357668] Lustre: server umount lustre-MDT0001 complete
      [18015.358716] Lustre: Skipped 1 previous similar message
      [18016.092394] LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
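The logs above report failures as negative kernel-style return codes (`errno: -16`, `rc = -107`) or as the matching message text (`Invalid argument`). As a reading aid, the minimal sketch below maps such codes to their symbolic names using Python's standard `errno` module. The helper name `decode_rc` is hypothetical and is not part of the Lustre test framework, and the numeric values assume a Linux host.

```python
import errno
import os


def decode_rc(rc: int) -> str:
    """Turn a negative kernel-style return code into 'NAME: message'.

    Assumption: the code follows the Linux convention of returning
    -errno on failure, as the Lustre/LNet messages in this ticket do.
    """
    code = abs(rc)
    # errno.errorcode maps numeric values to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # -16 is the lnet unconfigure error, -107 the mds_statfs failure;
    # -22 corresponds to the "Invalid argument" reported by mount.lustre.
    for rc in (-16, -107, -22):
        print(rc, decode_rc(rc))
```

On Linux, -22 decodes to EINVAL, which is what `mount.lustre` prints as "Invalid argument".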
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_24v - Timeout occurred after 343 mins, last suite running was sanity-sec, restarting cluster to continue tests

          Activity

            adilger Andreas Dilger added a comment - The test has been added to the ALWAYS_EXCEPT list, and this ticket marked with the always_except label, so we can't close it until the issue is fixed and the test is removed from the sanity-sec.sh ALWAYS_EXCEPT list.

            simmonsja James A Simmons added a comment - Looks like the revert has been dropped.

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38128/
            Subject: LU-12312 tests: stop running sanity-sec test 31
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: d4b93cb4b79139027992e675582eb8036734d770

            gerrit Gerrit Updater added a comment - James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38128
            Subject: LU-12312 tests: stop running sanity-sec test 31
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 4aed4881178c24ba8985bbb6d7f0737f9762b4da

            adilger Andreas Dilger added a comment - It seems that the initial problem report from this ticket is unrelated to the failures currently being hit, despite the fact that they both cause the same subtest to fail in the same way.

            gerrit Gerrit Updater added a comment - Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38126
            Subject: LU-12312 tests: patch causing sanity-sec test_31 failure
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c3fccbe5e86b6fd90ac98dd21310625e58dd0c8c

            adilger Andreas Dilger added a comment (edited)

            The test log shows:

            mount.lustre: mount 10.9.6.211@tcp999:/lustre at /mnt/lustre failed: Invalid argument
            

            and the client console log shows:

            [19617.682714] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock,network=tcp999 10.9.6.211@tcp999:/lustre /mnt/lustre
            [19617.693290] LustreError: 21537:0:(obd_mount.c:1487:lmd_parse()) LNet Dynamic Peer Discovery is enabled on this node. 'network' mount option cannot be taken into account.
            [19617.695857] LustreError: 21537:0:(obd_mount.c:1586:lmd_parse()) Bad mount options user_xattr,flock,network=tcp999,device=10.9.6.211@tcp999:/lustre
            [19617.698041] LustreError: 21537:0:(obd_mount.c:1681:lustre_fill_super()) Unable to mount  (-22)
            [19618.702375] LNet: Removed LNI 10.9.6.208@tcp999
            

            It may relate to patch https://review.whamcloud.com/36919 "LU-13028 lnet: advertise discovery when toggled", which changed LNet but was submitted with "trivial".
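The console log quoted in this comment can be triaged mechanically. The sketch below is a hypothetical helper, not part of Maloo or the Lustre test suite. It parses `LustreError` lines of the form seen above (timestamp, pid, source file, line, function, message) and flags the Dynamic Peer Discovery rejection of the `network` mount option. The regular expression is an assumption based only on the lines in this ticket and may not cover every console message format.

```python
import re

# Pattern inferred from the log lines quoted in this ticket (an assumption,
# not an official description of the Lustre console format).
LUSTRE_ERROR = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)"
)

DISCOVERY_MSG = "'network' mount option cannot be taken into account"

SAMPLE = """\
[19617.682714] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock,network=tcp999 10.9.6.211@tcp999:/lustre /mnt/lustre
[19617.693290] LustreError: 21537:0:(obd_mount.c:1487:lmd_parse()) LNet Dynamic Peer Discovery is enabled on this node. 'network' mount option cannot be taken into account.
[19617.695857] LustreError: 21537:0:(obd_mount.c:1586:lmd_parse()) Bad mount options user_xattr,flock,network=tcp999,device=10.9.6.211@tcp999:/lustre
[19617.698041] LustreError: 21537:0:(obd_mount.c:1681:lustre_fill_super()) Unable to mount  (-22)
[19618.702375] LNet: Removed LNI 10.9.6.208@tcp999
"""


def parse_lustre_errors(text):
    """Return one dict per LustreError line that matches the pattern."""
    entries = []
    for raw in text.splitlines():
        m = LUSTRE_ERROR.search(raw)
        if m:
            entry = m.groupdict()
            entry["line"] = int(entry["line"])
            entries.append(entry)
    return entries


def has_discovery_conflict(entries):
    """True if any entry reports the discovery/'network' option conflict."""
    return any(DISCOVERY_MSG in e["msg"] for e in entries)


if __name__ == "__main__":
    errors = parse_lustre_errors(SAMPLE)
    for e in errors:
        print(f'{e["file"]}:{e["line"]} {e["func"]}: {e["msg"]}')
    print("discovery conflict:", has_discovery_conflict(errors))
```

Lines that do not carry a `pid:0:(file:line:func())` prefix, such as the `DEBUG MARKER` and `LNet:` lines, are skipped by this pattern.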

            adilger Andreas Dilger added a comment - This is causing about 70% test failures in the past few days. Is this related to some other patch that landed?

            simmonsja James A Simmons added a comment - The bug is due to the sysfs lnet handling of peer creation, which is a bad idea.

            gerrit Gerrit Updater added a comment - James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/34997
            Subject: LU-12312 lnet: debug canf-sanity failure
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: d0b1ed545801bd884ed43a78879dec27f0470e75

            People

              ashehata Amir Shehata (Inactive)
              maloo Maloo