[LU-15112] attempt to register an OST with duplicated index should fail but it does not Created: 15/Oct/21  Updated: 07/Dec/23  Resolved: 06/Jan/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Major
Reporter: Alexander Zarochentsev Assignee: Alexander Zarochentsev
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-15435 __req_capsule_get(): ASSERTION( msg !... Resolved
is related to LU-4966 handle server registration errors gra... Open
Severity: 3

Description

The problem is illustrated by the following conf-sanity test:

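test_129()
{
        start_mds || error "MDS start failed"
        # Reformat ost1, then try to start it; registration is expected to
        # fail because its index is already in use.
        format_ost 1
        start ost1 $(ostdevname 1) $OST_MOUNT_OPTS &&
                error "start ost1 should fail" || true
        # A retry with no other change must fail as well (this is the step
        # that unexpectedly succeeds).
        start ost1 $(ostdevname 1) $OST_MOUNT_OPTS &&
                error "second start ost1 should fail" || true
        # After --writeconf the OST is expected to start.
        do_facet ost1 "$TUNEFS --writeconf $(ostdevname 1)"
        start ost1 $(ostdevname 1) $OST_MOUNT_OPTS ||
                error "start ost1 failed"
        stop ost1
        stop_mds
}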
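
The "cmd && error ... || true" lines in the test are an expect-failure idiom. The sketch below shows how that idiom behaves outside the Lustre test framework. Everything in it is a hypothetical stand-in: error, mount_fails, mount_succeeds and expect_failure are not the real test-framework.sh helpers, and error is assumed to terminate the test, which is consistent with the FAIL trace in the log below.

```shell
#!/bin/bash
# Hypothetical stand-ins for illustration only; these are NOT the real
# test-framework.sh helpers.
error() { echo "FAIL: $*" >&2; exit 1; }   # assumed to terminate the test
mount_fails() { return 98; }               # models a start that fails with 98
mount_succeeds() { return 0; }             # models a start that wrongly succeeds

# Expect-failure idiom from test_129, wrapped in a subshell so that the
# exit inside error() ends only this check and not the calling shell.
expect_failure() {
        ( "$@" && error "$1 should fail" || true )
}

expect_failure mount_fails
rc_fail=$?      # command failed as required: "|| true" makes the check pass

expect_failure mount_succeeds 2>/dev/null
rc_ok=$?        # command succeeded: error() ran and the check fails

echo "check result when the command fails:    $rc_fail"
echo "check result when the command succeeds: $rc_ok"
```

In this model a failing start passes the check and a successful start trips error, which is what happens on the second start of ost1 in the log below.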

OST0000 is reformatted and then registration is attempted. The first attempt fails, as expected, with an "Address already in use" error, but the second attempt succeeds:

== conf-sanity test 129: attempt to connect an OST with the same index should fail ========================================================== 17:59:44 (1634309984)
start mds service on devvm1
Starting mds1: -o localrecov  /dev/mapper/mds1_flakey /mnt/lustre-mds1
Started lustre-MDT0000
start mds service on devvm1
Starting mds2: -o localrecov  /dev/mapper/mds2_flakey /mnt/lustre-mds2
Started lustre-MDT0001
devvm1: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
devvm1: executing wait_import_state_mount FULL mdc.lustre-MDT0001-mdc-*.mds_server_uuid
Format ost1: /dev/mapper/ost1_flakey
Starting ost1: -o localrecov  /dev/mapper/ost1_flakey /mnt/lustre-ost1
mount.lustre: mount /dev/mapper/ost1_flakey at /mnt/lustre-ost1 failed: Address already in use
The target service's index is already in use. (/dev/mapper/ost1_flakey)
Start of /dev/mapper/ost1_flakey on ost1 failed 98
Starting ost1: -o localrecov  /dev/mapper/ost1_flakey /mnt/lustre-ost1
Commit the device label on /dev/mapper/ost1_flakey
Started lustre-OST0000
 conf-sanity test_129: @@@@@@ FAIL: second start ost1 should fail 
  Trace dump:
  = ./../tests/test-framework.sh:6320:error()
  = conf-sanity.sh:9306:test_129()
  = ./../tests/test-framework.sh:6624:run_one()
  = ./../tests/test-framework.sh:6671:run_one_logged()
  = ./../tests/test-framework.sh:6497:run_test()
  = conf-sanity.sh:9313:main()
Dumping lctl log to /tmp/test_logs/1634309863/conf-sanity.test_129.*.1634310000.log
Dumping logs only on local client.
FAIL 129 (26s)


Comments
Comment by Gerrit Updater [ 15/Oct/21 ]

"Alexander Zarochentsev <alexander.zarochentsev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/45259
Subject: LU-15112 tests: same index ost add should fail
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 18f9c246c8ec613d8aa14d0d1ca9c2d38549321f

Comment by Gerrit Updater [ 16/Dec/21 ]

"Alexander Zarochentsev <alexander.zarochentsev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/45871
Subject: LU-15112 ptlrpc: make rq_replied flag always correct
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6e4fd9229a46fed2fbb615b040bb8017845b83bd

Comment by Gerrit Updater [ 06/Jan/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45871/
Subject: LU-15112 ptlrpc: make rq_replied flag always correct
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 94f3f1b511609fa190cee64c7e8244f21ef70792

Comment by Gerrit Updater [ 06/Jan/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45259/
Subject: LU-15112 mgc: do not ignore target registration failure
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: cefabee52586f443bfd5163f6ac0b5e1b56a9db7

Comment by Cory Spitz [ 06/Jan/22 ]

Fixed for 2.15.0.
