Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-12992

sanity test_240: failed to mount client

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.14.0
    • Lustre 2.14.0
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for liuying <emoly.liu@intel.com>

      Failure Rate: 3.08% of most recent 65 runs, 35 skipped (all branches), especially in group review-dne-ssk testing

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/3f7f577a-0b7d-11ea-9487-52540065bddc

      test_240 failed with the following error:

      Starting client: onyx-30vm1.onyx.whamcloud.com:  -o user_xattr,flock onyx-30vm4@tcp:/lustre /mnt/lustre
      CMD: onyx-30vm1.onyx.whamcloud.com mkdir -p /mnt/lustre
      CMD: onyx-30vm1.onyx.whamcloud.com mount -t lustre -o user_xattr,flock onyx-30vm4@tcp:/lustre /mnt/lustre
      mount.lustre: mount onyx-30vm4@tcp:/lustre at /mnt/lustre failed: Operation not permitted
       sanity test_240: @@@@@@ FAIL: failed to mount client 
      

      <<Please provide additional information about the failure here>>

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_240 - failed to mount client

      Attachments

        Activity

          [LU-12992] sanity test_240: failed to mount client

          Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40364
          Subject: LU-12992 gss: retry in case of short computed shared key
          Project: fs/lustre-release
          Branch: b2_12
          Current Patch Set: 1
          Commit: 380fe0c16adcbe2b4e8040f129dc27755f9bab86

          gerrit Gerrit Updater added a comment - Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40364 Subject: LU-12992 gss: retry in case of short computed shared key Project: fs/lustre-release Branch: b2_12 Current Patch Set: 1 Commit: 380fe0c16adcbe2b4e8040f129dc27755f9bab86
          pjones Peter Jones added a comment -

          Landed for 2.14

          pjones Peter Jones added a comment - Landed for 2.14

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37064/
          Subject: LU-12992 gss: retry in case of short computed shared key
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 6724002732c1bedb4ae4217871d432972672747e

          gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37064/ Subject: LU-12992 gss: retry in case of short computed shared key Project: fs/lustre-release Branch: master Current Patch Set: Commit: 6724002732c1bedb4ae4217871d432972672747e

          Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/37065
          Subject: LU-12992 tests: another tentative reproducer
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 600a6e54b9f6963b56d56ed8365e036fda732956

          gerrit Gerrit Updater added a comment - Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/37065 Subject: LU-12992 tests: another tentative reproducer Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 600a6e54b9f6963b56d56ed8365e036fda732956

          Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/37064
          Subject: LU-12992 gss: retry in case of short computed shared key
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 8609331583142d48241eaffd74ccf73aaad96d1d

          gerrit Gerrit Updater added a comment - Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/37064 Subject: LU-12992 gss: retry in case of short computed shared key Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 8609331583142d48241eaffd74ccf73aaad96d1d

          Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/37048
          Subject: LU-12992 tests: tentative reproducer
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 931e36b050e63b7c5c2c350d9fa8a4520a7df2a9

          gerrit Gerrit Updater added a comment - Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/37048 Subject: LU-12992 tests: tentative reproducer Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 931e36b050e63b7c5c2c350d9fa8a4520a7df2a9

          I’m seeing a similar issue for various sanity-sec tests For example, at https://testing.whamcloud.com/test_sets/203d27d0-1cb1-11ea-b1e8-52540065bddc, we see sanity-sec test 25 fail with, from the suite_log,

          trevis-40vm2: mount.lustre: mount trevis-40vm4@tcp:/lustre at /mnt/lustre failed: Operation not permitted
           sanity-sec test_25: @@@@@@ FAIL: unable to mount clients trevis-40vm1.trevis.whamcloud.com,trevis-40vm2 
          

          Since the client (vm2) isn’t able to mount, looking at the console log for that node, we see a couple of Lustre errors

          [23381.792880] LustreError: 18418:0:(gss_keyring.c:1411:gss_kt_update()) negotiation: rpc err 0, gss err 0
          [23381.794690] LustreError: 18418:0:(gss_keyring.c:1411:gss_kt_update()) Skipped 1 previous similar message
          [23381.796257] Lustre: 18418:0:(sec_gss.c:315:cli_ctx_expire()) ctx ffff9dcc80055e00(0->lustre-MDT0000_UUID) get expired: 1576134447(+40s)
          [23381.799069] Lustre: 18418:0:(sec_gss.c:315:cli_ctx_expire()) Skipped 3 previous similar messages
          [23381.829815] Lustre: 18421:0:(sec_gss.c:377:gss_cli_ctx_uptodate()) client refreshed ctx ffff9dcce6b5d300 idx 0xf9e70ab0fc7179a7 (0->lustre-MDT0001_UUID), expiry 1576739197(+604790s)
          [23381.832536] Lustre: 18421:0:(sec_gss.c:377:gss_cli_ctx_uptodate()) Skipped 1 previous similar message
          [23381.834400] Lustre: 18421:0:(gss_svc_upcall.c:804:gss_svc_upcall_install_rvs_ctx()) create reverse svc ctx ffff9dcce7e1ce40 to lustre-MDT0001_UUID: idx 0xa48733bdf9b2ed6f
          [23381.836877] Lustre: 18421:0:(gss_svc_upcall.c:804:gss_svc_upcall_install_rvs_ctx()) Skipped 1 previous similar message
          [23386.829605] Lustre: 18430:0:(sec_gss.c:377:gss_cli_ctx_uptodate()) client refreshed ctx ffff9dccfaadf100 idx 0x6e9f4cddf09a985c (0->lustre-MDT0000_UUID), expiry 1576739202(+604790s)
          [23386.832351] Lustre: 18430:0:(sec_gss.c:377:gss_cli_ctx_uptodate()) Skipped 2 previous similar messages
          [23386.833903] Lustre: 18430:0:(gss_svc_upcall.c:804:gss_svc_upcall_install_rvs_ctx()) create reverse svc ctx ffff9dcc80169240 to lustre-MDT0000_UUID: idx 0xa48733bdf9b2ed72
          [23386.836387] Lustre: 18430:0:(gss_svc_upcall.c:804:gss_svc_upcall_install_rvs_ctx()) Skipped 2 previous similar messages
          [23386.845207] LustreError: 18400:0:(lmv_obd.c:1261:lmv_statfs()) lustre-MDT0000-mdc-ffff9dcce7c96800: can't stat MDS #0: rc = -1
          [23386.846471] Lustre: 14703:0:(sec_gss.c:1228:gss_cli_ctx_fini_common()) gss.keyring@ffff9dccf9e40c00: destroy ctx ffff9dcc80055e00(0->lustre-MDT0000_UUID)
          [23386.846473] Lustre: 14703:0:(sec_gss.c:1228:gss_cli_ctx_fini_common()) Skipped 11 previous similar messages
          [23386.861940] Lustre: 17601:0:(gss_cli_upcall.c:398:gss_do_ctx_fini_rpc()) client finishing forward ctx ffff9dccfaadf100 idx 0x6e9f4cddf09a985c (0->lustre-MDT0000_UUID)
          [23386.864376] Lustre: 17601:0:(gss_cli_upcall.c:398:gss_do_ctx_fini_rpc()) Skipped 11 previous similar messages
          [23386.893798] LustreError: 18400:0:(lov_obd.c:827:lov_cleanup()) lustre-clilov-ffff9dcce7c96800: lov tgt 0 not cleaned! deathrow=0, lovrc=1
          [23386.902000] Lustre: Unmounted lustre-client
          [23386.903744] LustreError: 18400:0:(obd_mount.c:1672:lustre_fill_super()) Unable to mount  (-1)
          [23387.123495] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-sec test_25: @@@@@@ FAIL: unable to mount clients trevis-40vm1.trevis.whamcloud.com,trevis-40vm2 
          

          In the console logs for the MDS1/3 (vm4), we see

          [ 5283.481582] Lustre: MGS: Connection restored to decbd92d-dc4c-4 (at 10.9.5.228@tcp)
          [ 5283.482930] Lustre: Skipped 1 previous similar message
          [ 5283.604715] Lustre: 6471:0:(sec_gss.c:2066:gss_svc_handle_init()) create svc ctx ffff9117dbd56640: user from 10.9.5.228@tcp authenticated as root
          [ 5283.606945] Lustre: 6471:0:(sec_gss.c:2066:gss_svc_handle_init()) Skipped 1 previous similar message
          [ 5283.638705] LustreError: 2455:0:(gss_svc_upcall.c:969:gss_svc_upcall_handle_init()) authentication failed
          [ 5283.640512] LustreError: 2455:0:(gss_svc_upcall.c:969:gss_svc_upcall_handle_init()) Skipped 3 previous similar messages
          [ 5283.642785] Lustre: 2455:0:(sec_gss.c:370:gss_cli_ctx_uptodate()) server installed reverse ctx ffff9117dc872100 idx 0xf84af856cc294b8, expiry 1576739209(+604800s)
          [ 5283.645641] Lustre: 2455:0:(sec_gss.c:370:gss_cli_ctx_uptodate()) Skipped 1 previous similar message
          [ 5288.660041] Lustre: 2454:0:(sec_gss.c:2066:gss_svc_handle_init()) create svc ctx ffff9117b645ae40: user from 10.9.5.229@tcp authenticated as root
          [ 5288.662394] Lustre: 2454:0:(sec_gss.c:2066:gss_svc_handle_init()) Skipped 2 previous similar messages
          [ 5288.688635] Lustre: 2454:0:(sec_gss.c:370:gss_cli_ctx_uptodate()) server installed reverse ctx ffff9117dca36600 idx 0xa48733bdf9b2ed72, expiry 1576739214(+604800s)
          [ 5288.691011] Lustre: 2454:0:(sec_gss.c:370:gss_cli_ctx_uptodate()) Skipped 2 previous similar messages
          [ 5288.692523] Lustre: lustre-MDT0000: Connection restored to 15155991-9df2-4 (at 10.9.5.229@tcp)
          [ 5288.693925] Lustre: Skipped 4 previous similar messages
          [ 5288.716302] Lustre: 2454:0:(sec_gss.c:2326:gss_svc_handle_destroy()) destroy svc ctx ffff9117b645ae40 idx 0x6e9f4cddf09a985c (0->10.9.5.229@tcp)
          [ 5288.718414] Lustre: 2454:0:(sec_gss.c:2326:gss_svc_handle_destroy()) Skipped 3 previous similar messages
          [ 5288.980369] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-sec test_25: @@@@@@ FAIL: unable to mount clients trevis-40vm1.trevis.whamcloud.com,trevis-40vm2 
          

          Similar failure for sanity-sec test 27a at https://testing.whamcloud.com/test_sets/74c35394-1bab-11ea-adca-52540065bddc . In the client (vm6) console log we see that gss_kt_update() rpc error is -85 and not 0 as above

          [23759.672542] LustreError: 3900:0:(gss_keyring.c:1411:gss_kt_update()) negotiation: rpc err -85, gss err 0
          
          jamesanunez James Nunez (Inactive) added a comment - I’m seeing a similar issue for various sanity-sec tests For example, at https://testing.whamcloud.com/test_sets/203d27d0-1cb1-11ea-b1e8-52540065bddc , we see sanity-sec test 25 fail with, from the suite_log, trevis-40vm2: mount.lustre: mount trevis-40vm4@tcp:/lustre at /mnt/lustre failed: Operation not permitted sanity-sec test_25: @@@@@@ FAIL: unable to mount clients trevis-40vm1.trevis.whamcloud.com,trevis-40vm2 Since the client (vm2) isn’t able to mount, looking at the console log for that node, we see a couple of Lustre errors [23381.792880] LustreError: 18418:0:(gss_keyring.c:1411:gss_kt_update()) negotiation: rpc err 0, gss err 0 [23381.794690] LustreError: 18418:0:(gss_keyring.c:1411:gss_kt_update()) Skipped 1 previous similar message [23381.796257] Lustre: 18418:0:(sec_gss.c:315:cli_ctx_expire()) ctx ffff9dcc80055e00(0->lustre-MDT0000_UUID) get expired: 1576134447(+40s) [23381.799069] Lustre: 18418:0:(sec_gss.c:315:cli_ctx_expire()) Skipped 3 previous similar messages [23381.829815] Lustre: 18421:0:(sec_gss.c:377:gss_cli_ctx_uptodate()) client refreshed ctx ffff9dcce6b5d300 idx 0xf9e70ab0fc7179a7 (0->lustre-MDT0001_UUID), expiry 1576739197(+604790s) [23381.832536] Lustre: 18421:0:(sec_gss.c:377:gss_cli_ctx_uptodate()) Skipped 1 previous similar message [23381.834400] Lustre: 18421:0:(gss_svc_upcall.c:804:gss_svc_upcall_install_rvs_ctx()) create reverse svc ctx ffff9dcce7e1ce40 to lustre-MDT0001_UUID: idx 0xa48733bdf9b2ed6f [23381.836877] Lustre: 18421:0:(gss_svc_upcall.c:804:gss_svc_upcall_install_rvs_ctx()) Skipped 1 previous similar message [23386.829605] Lustre: 18430:0:(sec_gss.c:377:gss_cli_ctx_uptodate()) client refreshed ctx ffff9dccfaadf100 idx 0x6e9f4cddf09a985c (0->lustre-MDT0000_UUID), expiry 1576739202(+604790s) [23386.832351] Lustre: 18430:0:(sec_gss.c:377:gss_cli_ctx_uptodate()) Skipped 2 previous similar messages [23386.833903] Lustre: 18430:0:(gss_svc_upcall.c:804:gss_svc_upcall_install_rvs_ctx()) create reverse svc ctx ffff9dcc80169240 to lustre-MDT0000_UUID: idx 0xa48733bdf9b2ed72 [23386.836387] Lustre: 18430:0:(gss_svc_upcall.c:804:gss_svc_upcall_install_rvs_ctx()) Skipped 2 previous similar messages [23386.845207] LustreError: 18400:0:(lmv_obd.c:1261:lmv_statfs()) lustre-MDT0000-mdc-ffff9dcce7c96800: can't stat MDS #0: rc = -1 [23386.846471] Lustre: 14703:0:(sec_gss.c:1228:gss_cli_ctx_fini_common()) gss.keyring@ffff9dccf9e40c00: destroy ctx ffff9dcc80055e00(0->lustre-MDT0000_UUID) [23386.846473] Lustre: 14703:0:(sec_gss.c:1228:gss_cli_ctx_fini_common()) Skipped 11 previous similar messages [23386.861940] Lustre: 17601:0:(gss_cli_upcall.c:398:gss_do_ctx_fini_rpc()) client finishing forward ctx ffff9dccfaadf100 idx 0x6e9f4cddf09a985c (0->lustre-MDT0000_UUID) [23386.864376] Lustre: 17601:0:(gss_cli_upcall.c:398:gss_do_ctx_fini_rpc()) Skipped 11 previous similar messages [23386.893798] LustreError: 18400:0:(lov_obd.c:827:lov_cleanup()) lustre-clilov-ffff9dcce7c96800: lov tgt 0 not cleaned! deathrow=0, lovrc=1 [23386.902000] Lustre: Unmounted lustre-client [23386.903744] LustreError: 18400:0:(obd_mount.c:1672:lustre_fill_super()) Unable to mount (-1) [23387.123495] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity-sec test_25: @@@@@@ FAIL: unable to mount clients trevis-40vm1.trevis.whamcloud.com,trevis-40vm2 In the console logs for the MDS1/3 (vm4), we see [ 5283.481582] Lustre: MGS: Connection restored to decbd92d-dc4c-4 (at 10.9.5.228@tcp) [ 5283.482930] Lustre: Skipped 1 previous similar message [ 5283.604715] Lustre: 6471:0:(sec_gss.c:2066:gss_svc_handle_init()) create svc ctx ffff9117dbd56640: user from 10.9.5.228@tcp authenticated as root [ 5283.606945] Lustre: 6471:0:(sec_gss.c:2066:gss_svc_handle_init()) Skipped 1 previous similar message [ 5283.638705] LustreError: 2455:0:(gss_svc_upcall.c:969:gss_svc_upcall_handle_init()) authentication failed [ 5283.640512] LustreError: 2455:0:(gss_svc_upcall.c:969:gss_svc_upcall_handle_init()) Skipped 3 previous similar messages [ 5283.642785] Lustre: 2455:0:(sec_gss.c:370:gss_cli_ctx_uptodate()) server installed reverse ctx ffff9117dc872100 idx 0xf84af856cc294b8, expiry 1576739209(+604800s) [ 5283.645641] Lustre: 2455:0:(sec_gss.c:370:gss_cli_ctx_uptodate()) Skipped 1 previous similar message [ 5288.660041] Lustre: 2454:0:(sec_gss.c:2066:gss_svc_handle_init()) create svc ctx ffff9117b645ae40: user from 10.9.5.229@tcp authenticated as root [ 5288.662394] Lustre: 2454:0:(sec_gss.c:2066:gss_svc_handle_init()) Skipped 2 previous similar messages [ 5288.688635] Lustre: 2454:0:(sec_gss.c:370:gss_cli_ctx_uptodate()) server installed reverse ctx ffff9117dca36600 idx 0xa48733bdf9b2ed72, expiry 1576739214(+604800s) [ 5288.691011] Lustre: 2454:0:(sec_gss.c:370:gss_cli_ctx_uptodate()) Skipped 2 previous similar messages [ 5288.692523] Lustre: lustre-MDT0000: Connection restored to 15155991-9df2-4 (at 10.9.5.229@tcp) [ 5288.693925] Lustre: Skipped 4 previous similar messages [ 5288.716302] Lustre: 2454:0:(sec_gss.c:2326:gss_svc_handle_destroy()) destroy svc ctx ffff9117b645ae40 idx 0x6e9f4cddf09a985c (0->10.9.5.229@tcp) [ 5288.718414] Lustre: 2454:0:(sec_gss.c:2326:gss_svc_handle_destroy()) Skipped 3 previous similar messages [ 5288.980369] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity-sec test_25: @@@@@@ FAIL: unable to mount clients trevis-40vm1.trevis.whamcloud.com,trevis-40vm2 Similar failure for sanity-sec test 27a at https://testing.whamcloud.com/test_sets/74c35394-1bab-11ea-adca-52540065bddc . In the client (vm6) console log we see that gss_kt_update() rpc error is -85 and not 0 as above [23759.672542] LustreError: 3900:0:(gss_keyring.c:1411:gss_kt_update()) negotiation: rpc err -85, gss err 0

          People

            sebastien Sebastien Buisson
            maloo Maloo
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: