[LU-15055] spurious qmt_pool_add_rem() message when adding OST to pool: -EEXIST Created: 01/Oct/21 Updated: 27/Jun/22 Resolved: 27/Jun/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Andreas Dilger | Assignee: | Sergey Cheremencev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
| Comments |
| Comment by Peter Jones [ 26/Nov/21 ] |
|
Alex Any suggestions on this one? Peter |
| Comment by Andreas Dilger [ 26/Nov/21 ] |
|
sergey, I suspect that this message is actually related to a pool quota change? The other possibility is that there is something bad/wrong happening with how OST pools are configured, and the same OST is being added to a pool multiple times. Also related, the formatting of the error message is broken:
CERROR("%s: can't %s %s pool %s: rc = %d\n",
add ? "add to" : "remove", obd->obd_name,
slavename, poolname, rc);
resulting in: qmt_pool_add_rem()) add to: can't scratch-QMT0000 scratch-OST0000_UUID pool pool1: rc = -17 but it should really be something like: qmt_pool_add_rem()) scratch-QMT0000: can't add scratch-OST0000_UUID to pool 'pool1': rc = -17 The simple solution would be to quiet this console error message for the -EEXIST, but it would be better to understand why this message suddenly started being seen, and fix the root cause (e.g. duplicate config records for OSTs, or an issue in the test script?). |
| Comment by Andreas Dilger [ 26/Nov/21 ] |
|
The only patches that are even tangentially related to configuration that landed in the target window are: 16321de596 LU-14661 obdclass: Add peer/peer NI when processing llog 301d76a711 LU-14876 out: don't connect to busy MDS-MDS export and no patches even come close to the OST pools or quota code or tests in this window. Possibly something in [11762.531609] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity-quota test 1b: Quota pools: Block hard limit \(normal use and out of quota\) === 14:55:59 \(1637679359\) (1637673981) [11780.265188] Lustre: DEBUG MARKER: lctl pool_new lustre.qpool1 [11788.955306] Lustre: DEBUG MARKER: lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.qpool1 2>/dev/null || echo foo [11790.247590] Lustre: DEBUG MARKER: lctl get_param -n lod.lustre-MDT0002-mdtlov.pools.qpool1 2>/dev/null || echo foo [11791.573141] Lustre: DEBUG MARKER: /usr/sbin/lctl pool_add lustre.qpool1 lustre-OST[0000-0007/1] [11802.006568] LustreError: 624610:0:(qmt_pool.c:1390:qmt_pool_add_rem()) add to: can't lustre-QMT0000 lustre-OST0000_UUID pool qpool1: rc = -17 [11802.008583] LustreError: 624610:0:(qmt_pool.c:1390:qmt_pool_add_rem()) Skipped 1 previous similar message : [11907.682379] Lustre: DEBUG MARKER: /usr/sbin/lctl pool_list lustre [11908.326534] Lustre: DEBUG MARKER: /usr/sbin/lctl pool_list lustre.qpool1 [11908.975981] Lustre: DEBUG MARKER: lctl pool_remove lustre.qpool1 lustre-OST0000_UUID [11916.181171] LustreError: 632050:0:(qmt_pool.c:1390:qmt_pool_add_rem()) remove: can't lustre-QMT0000 lustre-OST0000_UUID pool qpool1: rc = -22 [11916.183136] LustreError: 632050:0:(qmt_pool.c:1390:qmt_pool_add_rem()) Skipped 7 previous similar messages [11917.638271] Lustre: DEBUG MARKER: lctl pool_remove lustre.qpool1 lustre-OST0001_UUID [11926.306964] Lustre: DEBUG MARKER: lctl pool_remove lustre.qpool1 lustre-OST0002_UUID [11935.124558] LustreError: 632331:0:(qmt_pool.c:1390:qmt_pool_add_rem()) remove: can't lustre-QMT0000 lustre-OST0002_UUID pool qpool1: rc = -22 [11935.126564] LustreError: 632331:0:(qmt_pool.c:1390:qmt_pool_add_rem()) Skipped 1 previous similar message [11936.973878] Lustre: DEBUG MARKER: lctl pool_remove lustre.qpool1 lustre-OST0003_UUID [11945.649745] Lustre: DEBUG MARKER: lctl pool_remove lustre.qpool1 lustre-OST0004_UUID so it is also removing the same OST from the pool twice and lu_tgt_pool_remove() is returning -EINVAL? |
| Comment by Etienne Aujames [ 08/Apr/22 ] |
|
Hello, I was debugging the https://review.whamcloud.com/#/c/46955 (" I think the root cause is the following code: static int lod_pool_add_q(struct obd_device *obd, char *poolname, char *ostname) { int err = lod_pool_add(obd, poolname, ostname); if (!err) { obd = obd_find_qmt0(obd->obd_name); <------ if (obd) obd_pool_add(obd, poolname, ostname); } return err; } If the mds0 has 2 MDT, the MGS will add 1 pool_add record in CONFIGS/lustre-MDT0000 and CONFIGS/lustre-MDT0001. So obd_pool_add/lod_pool_add_q will be call 2 times (1 for each MDT), QMT0000 will be found 2 times (on mds0) and finally qmt_pool_add/qmt_pool_add_rem will try to add the same OST on QMT0000. So maybe we can add/remove the pool to QMT0000 only for the MDT0000: static int lod_pool_add_q(struct obd_device *obd, char *poolname, char *ostname) { int err = lod_pool_add(obd, poolname, ostname); if (!err && strstr(obd->obd_name, "-MDT0000")) { obd = obd_find_qmt0(obd->obd_name); if (obd) obd_pool_add(obd, poolname, ostname); } return err; } |
| Comment by Gerrit Updater [ 13/Apr/22 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/47059 |
| Comment by Gerrit Updater [ 27/Jun/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47059/ |
| Comment by Peter Jones [ 27/Jun/22 ] |
|
Landed for 2.16 |