Lustre / LU-12312

sanity-sec: test_31: 'network' mount option cannot be taken into account

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor

    Description

      This issue was created by maloo for Lai Siyao <lai.siyao@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/1bf6a37a-7821-11e9-a028-52540065bddc

      CMD: trevis-38vm8 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests//usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/sbin:/bin:/usr/sbin: NAME=autotest_config bash rpc.sh set_default_debug \"vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck\" \"all\" 4 
      trevis-38vm8: == rpc test complete, duration -o sec ================================================================ 19:32:45 (1558035165)
      trevis-38vm8: trevis-38vm8.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
      CMD: trevis-38vm8 e2label /dev/mapper/ost8_flakey 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
      CMD: trevis-38vm8 e2label /dev/mapper/ost8_flakey 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
      CMD: trevis-38vm8 e2label /dev/mapper/ost8_flakey 2>/dev/null
      Started lustre-OST0007
      CMD: trevis-38vm9 /usr/sbin/lctl list_nids | grep tcp999
      Starting client: trevis-38vm6.trevis.whamcloud.com:  -o user_xattr,flock,network=tcp999 10.9.3.145@tcp999:/lustre /mnt/lustre
      CMD: trevis-38vm6.trevis.whamcloud.com mkdir -p /mnt/lustre
      CMD: trevis-38vm6.trevis.whamcloud.com mount -t lustre -o user_xattr,flock,network=tcp999 10.9.3.145@tcp999:/lustre /mnt/lustre
      mount.lustre: mount 10.9.3.145@tcp999:/lustre at /mnt/lustre failed: Invalid argument
      This may have multiple causes.
      Is 'lustre' the correct filesystem name?
      Are the mount options correct?
      Check the syslog for more info.
      unconfigure:
          - lnet:
                errno: -16
                descr: "LNet unconfigure error: Device or resource busy"
      
      [17996.736209] Lustre: DEBUG MARKER: == sanity-sec test 31: client mount option '-o network' ============================================== 19:30:04 (1558035004)
      [17997.693592] Lustre: DEBUG MARKER: lctl get_param -n *.lustre*.exports.'10.9.5.215@tcp'.uuid 		  2>/dev/null | grep -q -
      [17998.217952] Lustre: DEBUG MARKER: /usr/sbin/lnetctl lnet configure && /usr/sbin/lnetctl net add --if 		  eth0 		  --net tcp999
      [17998.557153] LNet: Added LNI 10.9.3.146@tcp999 [8/256/0/180]
      [18000.237970] LustreError: 11-0: lustre-MDT0000-osp-MDT0001: operation mds_statfs to node 10.9.3.145@tcp failed: rc = -107
      [18000.239925] LustreError: Skipped 9 previous similar messages
      [18000.240888] Lustre: lustre-MDT0000-osp-MDT0001: Connection to lustre-MDT0000 (at 10.9.3.145@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [18000.243842] Lustre: Skipped 18 previous similar messages
      [18007.616779] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds2' ' /proc/mounts || true
      [18007.922846] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds2
      [18009.125483] Lustre: lustre-MDT0001: Not available for connect from 10.9.3.145@tcp (stopping)
      [18009.127096] Lustre: Skipped 42 previous similar messages
      [18011.260546] LustreError: 17495:0:(client.c:1183:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff98fcd0168d80 x1633699486628912/t0(0) o41->lustre-MDT0003-osp-MDT0001@0@lo:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
      [18011.264135] LustreError: 17495:0:(client.c:1183:ptlrpc_import_delay_req()) Skipped 2 previous similar messages
      [18015.357668] Lustre: server umount lustre-MDT0001 complete
      [18015.358716] Lustre: Skipped 1 previous similar message
      [18016.092394] LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
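      The failing sequence above can be retraced by hand, roughly as follows. This is a sketch assembled from the commands visible in this run's log; the interface name (eth0), the alternate network (tcp999), and the MGS NID (10.9.3.145) are specific to this test cluster and will differ elsewhere.

      ```shell
      # On the client: bring up LNet and add a second network (tcp999),
      # as sanity-sec test_31 does in the log above.
      lnetctl lnet configure
      lnetctl net add --if eth0 --net tcp999

      # Confirm the new NID exists before mounting.
      lctl list_nids | grep tcp999

      # Mount restricted to the tcp999 network via the 'network' option.
      # In this run the mount fails with "Invalid argument" (EINVAL).
      mount -t lustre -o user_xattr,flock,network=tcp999 \
          10.9.3.145@tcp999:/lustre /mnt/lustre
      ```

      Note that the subsequent `lnetctl lnet unconfigure` in the log returns -16 (Device or resource busy), which is expected while any net is still referenced by a mount attempt or peer.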
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_24v - Timeout occurred after 343 mins, last suite running was sanity-sec, restarting cluster to continue tests

      Activity


            adilger Andreas Dilger added a comment - Haven't seen this failure in several months, though sanity-sec test_31 is still failing in other ways.
            scherementsev Sergey Cheremencev added a comment - +1 on master https://testing.whamcloud.com/test_sets/1bfa66fa-90ed-431e-8b30-4c4bf4ce2782
            scherementsev Sergey Cheremencev added a comment - +1 on master https://testing.whamcloud.com/test_sets/5b24d1b2-ab42-4509-ad28-6ba724c38368
            adilger Andreas Dilger added a comment - +1 on master https://testing.whamcloud.com/test_sets/2c04ca53-15d2-4ba3-bd71-e9171621fd6f

            adilger Andreas Dilger added a comment - I hit a timeout with sanity-sec test_31 now that this test is running again. https://testing.whamcloud.com/test_sets/34f16be3-752f-4f6a-a775-5899987c93c8

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38229/
            Subject: LU-12312 lnet: handle no discovery flag
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a83c820f89b3bd56dc1951e62431b8e2ba8181f6

            gerrit Gerrit Updater added a comment - Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38229
            Subject: LU-12312 lnet: handle no discovery flag
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: bab554e75fbd08b55326b6b54680e6a0f9d582b1
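            Since the fix above concerns LNet's handling of the no-discovery case, it may help to note how peer discovery is toggled and inspected when reproducing this. A sketch using standard lnetctl commands (run as root on a node with LNet configured); the discovery state for any given autotest run depends on site configuration:

            ```shell
            # Show global LNet settings, including the current discovery flag.
            lnetctl global show

            # Disable peer discovery (the state the patch handles), then restore it.
            lnetctl set discovery 0
            lnetctl set discovery 1
            ```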

            ashehata Amir Shehata (Inactive) added a comment - I'm currently working on a resolution for this problem. I'll push it as a separate patch, as the original patch, "LU-13028 lnet: advertise discovery when toggled", is still needed.
            ashehata Amir Shehata (Inactive) added a comment - edited
            I've been working on debugging this issue. I'd like some time to narrow down the reason for the regression before reverting the patch. At the least, I'd like clarity on why this behaviour breaks the 'network' mount option.

            People

              Assignee: ashehata Amir Shehata (Inactive)
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 9
