LU-19372

ost-pools test_17: LNetError: Refusing connection from 10.240.31.228 for 10.240.31.239@tcp999: No matching NI

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Medium

    Description

      This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

      This issue relates to the following test suite run:
      https://testing.whamcloud.com/test_sets/422547aa-0b5c-476f-98b2-98238f154fa6

      test_17 failed with the following error:

      /mnt/lustre/d17.ost-pools/dir not created in expected pool 'testpool'
      

The mds1 console logs show connections from some external NIDs that are not part of the test cluster (see the check sketched after the log excerpt):
      https://testing.whamcloud.com/test_logs/b079d819-2f1a-4c24-b38e-598358d66388/show_text

      [ 9765.290252] LNetError: Refusing connection from 10.240.31.228 for 10.240.31.239@tcp999: No matching NI
      [ 9766.305861] LNetError: Refusing connection from 10.240.31.228 for 10.240.31.239@tcp999: No matching NI
      [ 9766.307442] LNetError: Skipped 1 previous similar message
      [ 9768.350650] LNetError: Refusing connection from 10.240.31.227 for 10.240.31.239@tcp999: No matching NI
      [ 9768.352244] LNetError: Skipped 1 previous similar message
      [ 9771.425810] LNetError: Refusing connection from 10.240.31.228 for 10.240.31.239@tcp999: No matching NI
      [ 9771.427411] LNetError: Skipped 1 previous similar message
      [ 9776.436509] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n debug
      [ 9776.765547] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param -n debug=0
      [ 9777.659023] LNetError: Refusing connection from 10.240.31.229 for 10.240.31.239@tcp999: No matching NI
      :
      :
      [ 9668.513828] Lustre: lustre-MDT0000: Received LWP connection from 10.240.31.228@tcp, removing former export from 0@lo
      [ 9668.515971] Lustre: Skipped 1 previous similar message
      [ 9668.525463] LustreError: lustre-MDT0000-osp-MDT0002: operation ldlm_enqueue to node 0@lo failed: rc = -107
      [ 9668.527085] LustreError: Skipped 22 previous similar messages
      [ 9668.528024] Lustre: lustre-MDT0000-osp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
      [ 9668.530256] Lustre: Skipped 12 previous similar messages
      [ 9668.531413] LustreError: lustre-MDT0000-osp-MDT0002: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
      [ 9668.535694] Lustre: lustre-MDT0000-osp-MDT0002: Connection restored to 0@lo (at 0@lo)
      [ 9668.536908] Lustre: Skipped 16 previous similar messages
      [ 9668.755111] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  ost-pools test_17: @@@@@@ FAIL: \/mnt\/lustre\/d17.ost-pools\/dir not created in expected pool \'testpool\' 
      [ 9668.940458] Lustre: DEBUG MARKER: ost-pools test_17: @@@@@@ FAIL: /mnt/lustre/d17.ost-pools/dir not created in expected pool 'testpool'
      [ 9669.178029] Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /autotest/autotest-2/2025-09-16/lustre-reviews_review-dne-zfs-part-6_116574_40_afa9fe80-d7cb-4989-a7d8-0cb7ad36b09f//ost-pools.test_17.debug_log.$(hostname -s).1758014696.log;
      [ 9669.178029] 		dmesg > /autotest/autotest-2/2025-09-16/lustre-reviews_re
      [ 9669.626962] Lustre: lustre-MDT0000: Received MDS connection from 10.240.31.229@tcp, removing former export from 10.240.31.250@tcp
      [ 9669.628832] Lustre: Skipped 6 previous similar messages
      [ 9671.575396] Lustre: lustre-MDT0000: already connected client lustre-MDT0000-lwp-OST0000_UUID (at 10.240.31.238@tcp) with handle 0xeb3e5f0b0f134e8b. Rejecting client with the same UUID trying to reconnect with handle 0x36d75f4a182cc9ad
      [ 9671.579077] Lustre: Skipped 5 previous similar messages
      [ 9673.639664] Lustre: lustre-MDT0000: Received MDS connection from 10.240.31.228@tcp, removing former export from 0@lo
      [ 9673.641312] Lustre: Skipped 6 previous similar messages
      [ 9673.736046] LustreError: lustre-MDT0000-osp-MDT0002: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
      [ 9676.700429] Lustre: lustre-MDT0000: already connected client lustre-MDT0000-lwp-OST0000_UUID (at 10.240.31.238@tcp) with handle 0xeb3e5f0b0f134e8b. Rejecting client with the same UUID trying to reconnect with handle 0x36d75f4a182cc9ad
      [ 9676.704070] Lustre: Skipped 7 previous similar messages
      [ 9679.873357] Lustre: lustre-MDT0000: Received LWP connection from 10.240.31.229@tcp, removing former export from 10.240.31.250@tcp
      
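      A quick way to check whether the rejected NID is actually configured on the
      node (a hedged sketch; run on mds1, assuming shell access to it):

      # NIDs this node actually serves; 10.240.31.239@tcp999 would only appear
      # here if an NI on network tcp999 had been configured locally.
      lctl list_nids

      # Full LNet NI configuration, including any non-default networks.
      lnetctl net show

      The "No matching NI" refusals mean the incoming connections were addressed
      to 10.240.31.239@tcp999, a NID that matches no NI configured on this node.
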

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/116574 - 4.18.0-553.71.1.el8_10.x86_64
      servers: https://build.whamcloud.com/job/lustre-reviews/116574 - 4.18.0-553.71.1.el8_lustre.x86_64

It looks like some other test cluster is using NIDs that conflict with this session, causing these testing problems.
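
      For context, this is roughly how an extra LNet network such as tcp999 could
      end up configured on a node (a hypothetical sketch; the interface name eth0
      is an assumption, tcp999 is taken from the refusal messages above):

      # Hypothetical: add a second LNet network on an existing interface. A node
      # configured this way advertises NIDs like <addr>@tcp999 to its peers.
      lnetctl net add --net tcp999 --if eth0

      Peers that learn such a NID then open TCP connections asking for
      <addr>@tcp999, and a node without a tcp999 NI refuses them with
      "No matching NI", as seen in the mds1 console log above.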

I don't know whether some test cases modify NIDs to create "fake" NIDs (as in the sketch above), but in some cases these appear to conflict with actively running test sessions. Using a different filesystem name for each test session, instead of "lustre" for every session, would avoid the issue of bad external hosts hijacking a test session (sketched below).
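
      A minimal sketch of the filesystem-name suggestion (the name "ts116574" and
      the exact invocations are illustrative assumptions; note that mkfs.lustre
      limits --fsname to 8 characters):

      # Format targets with a per-session filesystem name instead of "lustre",
      # e.g. derived from the build number.
      mkfs.lustre --fsname=ts116574 --mgs --mdt --index=0 /dev/sdb

      # Clients mount that filesystem by name, so a stray host from another
      # session formatted as "lustre" could not attach to it by accident.
      mount -t lustre mgsnode@tcp:/ts116574 /mnt/lustre

      # In the test framework the name comes from the FSNAME variable (default
      # "lustre"); e.g. (assumed invocation):
      FSNAME=ts116574 ./auster -r ost-pools --only 17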

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      ost-pools test_17 - /mnt/lustre/d17.ost-pools/dir not created in expected pool 'testpool'

    People

      Assignee: WC Triage (wc-triage)
      Reporter: Maloo (maloo)
