Details
- Bug
- Resolution: Unresolved
- Medium
Description
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>
This issue relates to the following test suite run:
https://testing.whamcloud.com/test_sets/422547aa-0b5c-476f-98b2-98238f154fa6
test_17 failed with the following error:
/mnt/lustre/d17.ost-pools/dir not created in expected pool 'testpool'
The mds1 console logs show connections from some external NIDs that are not part of the test cluster:
https://testing.whamcloud.com/test_logs/b079d819-2f1a-4c24-b38e-598358d66388/show_text
[ 9765.290252] LNetError: Refusing connection from 10.240.31.228 for 10.240.31.239@tcp999: No matching NI
[ 9766.305861] LNetError: Refusing connection from 10.240.31.228 for 10.240.31.239@tcp999: No matching NI
[ 9766.307442] LNetError: Skipped 1 previous similar message
[ 9768.350650] LNetError: Refusing connection from 10.240.31.227 for 10.240.31.239@tcp999: No matching NI
[ 9768.352244] LNetError: Skipped 1 previous similar message
[ 9771.425810] LNetError: Refusing connection from 10.240.31.228 for 10.240.31.239@tcp999: No matching NI
[ 9771.427411] LNetError: Skipped 1 previous similar message
[ 9776.436509] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n debug
[ 9776.765547] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param -n debug=0
[ 9777.659023] LNetError: Refusing connection from 10.240.31.229 for 10.240.31.239@tcp999: No matching NI
:
:
[ 9668.513828] Lustre: lustre-MDT0000: Received LWP connection from 10.240.31.228@tcp, removing former export from 0@lo
[ 9668.515971] Lustre: Skipped 1 previous similar message
[ 9668.525463] LustreError: lustre-MDT0000-osp-MDT0002: operation ldlm_enqueue to node 0@lo failed: rc = -107
[ 9668.527085] LustreError: Skipped 22 previous similar messages
[ 9668.528024] Lustre: lustre-MDT0000-osp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
[ 9668.530256] Lustre: Skipped 12 previous similar messages
[ 9668.531413] LustreError: lustre-MDT0000-osp-MDT0002: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
[ 9668.535694] Lustre: lustre-MDT0000-osp-MDT0002: Connection restored to 0@lo (at 0@lo)
[ 9668.536908] Lustre: Skipped 16 previous similar messages
[ 9668.755111] Lustre: DEBUG MARKER: /usr/sbin/lctl mark ost-pools test_17: @@@@@@ FAIL: \/mnt\/lustre\/d17.ost-pools\/dir not created in expected pool \'testpool\'
[ 9668.940458] Lustre: DEBUG MARKER: ost-pools test_17: @@@@@@ FAIL: /mnt/lustre/d17.ost-pools/dir not created in expected pool 'testpool'
[ 9669.178029] Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /autotest/autotest-2/2025-09-16/lustre-reviews_review-dne-zfs-part-6_116574_40_afa9fe80-d7cb-4989-a7d8-0cb7ad36b09f//ost-pools.test_17.debug_log.$(hostname -s).1758014696.log;
[ 9669.178029] dmesg > /autotest/autotest-2/2025-09-16/lustre-reviews_re
[ 9669.626962] Lustre: lustre-MDT0000: Received MDS connection from 10.240.31.229@tcp, removing former export from 10.240.31.250@tcp
[ 9669.628832] Lustre: Skipped 6 previous similar messages
[ 9671.575396] Lustre: lustre-MDT0000: already connected client lustre-MDT0000-lwp-OST0000_UUID (at 10.240.31.238@tcp) with handle 0xeb3e5f0b0f134e8b. Rejecting client with the same UUID trying to reconnect with handle 0x36d75f4a182cc9ad
[ 9671.579077] Lustre: Skipped 5 previous similar messages
[ 9673.639664] Lustre: lustre-MDT0000: Received MDS connection from 10.240.31.228@tcp, removing former export from 0@lo
[ 9673.641312] Lustre: Skipped 6 previous similar messages
[ 9673.736046] LustreError: lustre-MDT0000-osp-MDT0002: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
[ 9676.700429] Lustre: lustre-MDT0000: already connected client lustre-MDT0000-lwp-OST0000_UUID (at 10.240.31.238@tcp) with handle 0xeb3e5f0b0f134e8b. Rejecting client with the same UUID trying to reconnect with handle 0x36d75f4a182cc9ad
[ 9676.704070] Lustre: Skipped 7 previous similar messages
[ 9679.873357] Lustre: lustre-MDT0000: Received LWP connection from 10.240.31.229@tcp, removing former export from 10.240.31.250@tcp
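To see which hosts are behind the refused connections, the console log can be summarized by source address. The following is only a rough sketch: the regular expression is written against the "Refusing connection ... No matching NI" lines quoted above, and `session_ips` is a hypothetical list of hosts belonging to the test session (the real node list for this session is not part of this ticket).

```python
import re
from collections import Counter

# Written against console lines of the form quoted in this ticket:
# [ 9765.290252] LNetError: Refusing connection from 10.240.31.228 for 10.240.31.239@tcp999: No matching NI
REFUSED_RE = re.compile(
    r"LNetError: Refusing connection from (?P<src>\d+\.\d+\.\d+\.\d+) "
    r"for (?P<dst>\S+): No matching NI"
)


def refused_sources(log_text):
    """Count refused connection attempts per (source IP, target NID)."""
    counts = Counter()
    for match in REFUSED_RE.finditer(log_text):
        counts[(match.group("src"), match.group("dst"))] += 1
    return counts


def unexpected_sources(log_text, session_ips):
    """Return refused source IPs that are not in the session's host list.

    session_ips is an assumption: the caller must supply the addresses
    that really belong to the test session.
    """
    seen = {src for (src, _dst) in refused_sources(log_text)}
    return sorted(seen - set(session_ips))


# Three lines copied from the mds1 console log above.
sample = """\
[ 9765.290252] LNetError: Refusing connection from 10.240.31.228 for 10.240.31.239@tcp999: No matching NI
[ 9766.305861] LNetError: Refusing connection from 10.240.31.228 for 10.240.31.239@tcp999: No matching NI
[ 9768.350650] LNetError: Refusing connection from 10.240.31.227 for 10.240.31.239@tcp999: No matching NI
"""

if __name__ == "__main__":
    for (src, dst), n in sorted(refused_sources(sample).items()):
        print(src, "->", dst, "refused", n, "time(s)")
```

Note that "Skipped N previous similar message" lines are not counted, so the totals are a lower bound on the number of refused attempts.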
Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/116574 - 4.18.0-553.71.1.el8_10.x86_64
servers: https://build.whamcloud.com/job/lustre-reviews/116574 - 4.18.0-553.71.1.el8_lustre.x86_64
It looks like some other test cluster is using NIDs that conflict with this session, and this is causing test failures?
I don't know whether some test cases modify their NIDs to create "fake" NIDs that, in some cases, conflict with those of test sessions that are actually running. Using a different filesystem name for each test session, instead of "lustre" every time, would avoid the issue of bad external hosts hijacking a test session.
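One way a per-session filesystem name could be derived is sketched below. This is an illustration only, not how autotest works today: `session_fsname` and the "lt" prefix are hypothetical, and the sketch assumes the usual Lustre limit of 8 characters for a filesystem name and that the result would be passed to the test framework in place of the default "lustre".

```python
import hashlib

MAX_FSNAME_LEN = 8  # assumed Lustre limit on filesystem name length


def session_fsname(session_id, prefix="lt"):
    """Derive a short, stable filesystem name from a test session ID.

    The name is the prefix followed by leading hex digits of a SHA-256
    digest, truncated to MAX_FSNAME_LEN characters. Distinct sessions can
    still collide, but far less often than when every session uses "lustre".
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()
    return (prefix + digest)[:MAX_FSNAME_LEN]


if __name__ == "__main__":
    # Test set ID taken from the Maloo URL in this ticket, used as sample input.
    print(session_fsname("422547aa-0b5c-476f-98b2-98238f154fa6"))
```

With distinct names, a server from another session that reaches this cluster by mistake would present target names that do not match, instead of "lustre-MDT0000" on both sides.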
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
ost-pools test_17 - /mnt/lustre/d17.ost-pools/dir not created in expected pool 'testpool'
Issue Links
- is related to LU-4966 MGS target registration should use proper UUID (Open)