[LU-13686] lctl pool_add returns error randomly Created: 17/Jun/20  Updated: 13/Aug/20  Resolved: 13/Aug/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: Bug
Priority: Minor
Reporter: Sergey Cheremencev
Assignee: Sergey Cheremencev
Resolution: Fixed
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

lctl pool_add returns an error code that depends on the order of the OSTs rather than on whether any of them failed. For example:

[root@dhcppc3 lustre-wc-rel]# lctl pool_add lustre.qpool1 OST0000
OST lustre-OST0000_UUID added to pool lustre.qpool1
[root@dhcppc3 lustre-wc-rel]# lctl pool_add lustre.qpool1 OST[0-1]
OST lustre-OST0000_UUID is already in pool lustre.qpool1
OST lustre-OST0001_UUID added to pool lustre.qpool1
[root@dhcppc3 lustre-wc-rel]# echo $?
0
[root@dhcppc3 lustre-wc-rel]# lctl pool_remove lustre.qpool1 OST[0-1]
OST lustre-OST0000_UUID removed from pool lustre.qpool1
OST lustre-OST0001_UUID removed from pool lustre.qpool1
[root@dhcppc3 lustre-wc-rel]# lctl pool_add lustre.qpool1 OST0001
OST lustre-OST0001_UUID added to pool lustre.qpool1
[root@dhcppc3 lustre-wc-rel]# lctl pool_add lustre.qpool1 OST[0-1]
OST lustre-OST0001_UUID is already in pool lustre.qpool1
OST lustre-OST0000_UUID added to pool lustre.qpool1
pool_add: File exists
[root@dhcppc3 lustre-wc-rel]# echo $?
17 

The problem is caused by jt_pool_cmd, which returns the result of the last operation and ignores any earlier errors.
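
The pattern described above can be sketched in isolation. This is a minimal illustration, not the actual jt_pool_cmd code: the names run_all_last_result, run_all_first_error and fake_pool_add are hypothetical, and "keep the first error" is only one possible order-independent policy, which may differ from what the patch does.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one per-OST pool operation:
 * returns 0 on success or a negative errno value. */
typedef int (*ost_op_fn)(const char *ost);

/* Order-dependent pattern: rc is overwritten on every iteration,
 * so only the result for the last OST in the list is returned. */
static int run_all_last_result(ost_op_fn op, const char *const *osts, size_t n)
{
	int rc = 0;

	for (size_t i = 0; i < n; i++)
		rc = op(osts[i]);
	return rc;
}

/* Order-independent pattern (an assumption, see above): every OST is
 * still processed, but the first non-zero result is remembered. */
static int run_all_first_error(ost_op_fn op, const char *const *osts, size_t n)
{
	int rc = 0;

	for (size_t i = 0; i < n; i++) {
		int rc2 = op(osts[i]);

		if (rc2 != 0 && rc == 0)
			rc = rc2;
	}
	return rc;
}

/* Hypothetical operation that pretends OST0001 is already in the pool. */
static int fake_pool_add(const char *ost)
{
	return strcmp(ost, "OST0001") == 0 ? -EEXIST : 0;
}
```

With fake_pool_add, run_all_last_result reports -EEXIST only when OST0001 is the last entry in the list, whereas run_all_first_error reports it regardless of position.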



 Comments   
Comment by Gerrit Updater [ 17/Jun/20 ]

Sergey Cheremencev (sergey.cheremencev@hpe.com) uploaded a new patch: https://review.whamcloud.com/38960
Subject: LU-13686 utils: pool_add/remove error code fix
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4c30498826c4ee7618ed61f88d1aa2f22666d670

Comment by James Nunez (Inactive) [ 23/Jun/20 ]

I was going to open a ticket for this, but you've already done the work. I'll just add what I see when using 'lctl pool_add' on a 2.12.x release:

# lctl pool_add testfs.onepool OST[8-13]
OST testfs-OST0010_UUID is not part of the 'testfs' fs.
OST testfs-OST0011_UUID is not part of the 'testfs' fs.
OST testfs-OST0012_UUID is not part of the 'testfs' fs.
OST testfs-OST0013_UUID is not part of the 'testfs' fs.
OST testfs-OST0008_UUID added to pool testfs.onepool
OST testfs-OST0009_UUID added to pool testfs.onepool
OST testfs-OST000a_UUID added to pool testfs.onepool
OST testfs-OST000b_UUID added to pool testfs.onepool
OST testfs-OST000c_UUID added to pool testfs.onepool
OST testfs-OST000d_UUID added to pool testfs.onepool
OST testfs-OST000e_UUID added to pool testfs.onepool
OST testfs-OST000f_UUID added to pool testfs.onepool
pool_add: No such file or directory

Comment by Gerrit Updater [ 13/Aug/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38960/
Subject: LU-13686 utils: pool_add/remove error code fix
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 77cf3c5d7bf3384a0c0035c5f66b539817f04ff7

Comment by Peter Jones [ 13/Aug/20 ]

Landed for 2.14

Generated at Sat Feb 10 03:03:21 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.