[LU-14277] any create blocked due to any OST fail Created: 23/Dec/20 Updated: 12/Jun/23 Resolved: 26/Jan/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Alexey Lyashkov | Assignee: | Alexey Lyashkov |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
multiple MDT threads blocked by one declare_create call due to an OST failure. |
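For context, a minimal userspace sketch of the pre-fix behaviour (an assumption based on the lod_qos_statfs_update diff quoted later in this ticket, not code from any patch): the statfs refresh takes the QoS rw-semaphore for write and then waits on the failed OST, so every other create thread that needs the semaphore queues up behind it. The names statfs_refresh and declare_create are illustrative only.

/* Hypothetical pthread model of the reported hang; lq_rw_sem stands in
 * for lod_qos.lq_rw_sem, the 30s sleep for the statfs wait on a dead OST.
 * Build: cc -O2 -pthread blocked_creates.c
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t lq_rw_sem = PTHREAD_RWLOCK_INITIALIZER;

static void *statfs_refresh(void *arg)
{
        (void)arg;
        pthread_rwlock_wrlock(&lq_rw_sem);   /* old down_write(&...lq_rw_sem) */
        sleep(30);                           /* stuck waiting on the failed OST */
        pthread_rwlock_unlock(&lq_rw_sem);
        return NULL;
}

static void *declare_create(void *arg)
{
        pthread_rwlock_rdlock(&lq_rw_sem);   /* QoS allocation needs the sem */
        /* ... choose OST targets for the new object ... */
        pthread_rwlock_unlock(&lq_rw_sem);
        printf("create %ld done\n", (long)(intptr_t)arg);
        return NULL;
}

int main(void)
{
        pthread_t s, c[8];
        long i;

        pthread_create(&s, NULL, statfs_refresh, NULL);
        sleep(1);                            /* let the writer take the sem */
        for (i = 0; i < 8; i++)              /* all of these now wait ~30s */
                pthread_create(&c[i], NULL, declare_create, (void *)(intptr_t)i);
        for (i = 0; i < 8; i++)
                pthread_join(c[i], NULL);
        pthread_join(s, NULL);
        return 0;
}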
| Comments |
| Comment by Alex Zhuravlev [ 23/Dec/20 ] |
|
I think this is a dup of |
| Comment by Alexey Lyashkov [ 23/Dec/20 ] |
|
Probably. But my fix is much simpler - just revert some code landed with the LOV -> LOD transition. |
| Comment by Alexey Lyashkov [ 23/Dec/20 ] |
|
one more bug introduced by the recent changes replacing LQ_DIRTY with a bitfield:

c1d0a355a6a (Lai Siyao 2019-08-05 02:08:02 +0800 327) up_write(&ltd->ltd_qos.lq_rw_sem);
c7f2e70a27e (Alex Zhuravlev 2012-09-20 15:56:31 +0400 328)
c7f2e70a27e (Alex Zhuravlev 2012-09-20 15:56:31 +0400 329) if (placed != real_count) {
c7f2e70a27e (Alex Zhuravlev 2012-09-20 15:56:31 +0400 330) /* This should never happen */
c1d0a355a6a (Lai Siyao 2019-08-05 02:08:02 +0800 331) LCONSOLE_ERROR_MSG(0x14e, "Failed to place all tgts in the "
c7f2e70a27e (Alex Zhuravlev 2012-09-20 15:56:31 +0400 332) "round-robin list (%d of %d).\n",
c7f2e70a27e (Alex Zhuravlev 2012-09-20 15:56:31 +0400 333) placed, real_count);
c7f2e70a27e (Alex Zhuravlev 2012-09-20 15:56:31 +0400 334) for (i = 0; i < lqr->lqr_pool.op_count; i++) {
c1d0a355a6a (Lai Siyao 2019-08-05 02:08:02 +0800 335) LCONSOLE(D_WARNING, "rr #%d tgt idx=%d\n", i,
c7f2e70a27e (Alex Zhuravlev 2012-09-20 15:56:31 +0400 336) lqr->lqr_pool.op_array[i]);
c7f2e70a27e (Alex Zhuravlev 2012-09-20 15:56:31 +0400 337) }
c7f2e70a27e (Alex Zhuravlev 2012-09-20 15:56:31 +0400 338) lqr->lqr_dirty = 1;
c7f2e70a27e (Alex Zhuravlev 2012-09-20 15:56:31 +0400 339) RETURN(-EAGAIN);
lqr_dirty was set without the lock held. |
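To make the concern concrete, here is a minimal userspace sketch (an assumption for illustration, not Lustre code) of why an unlocked write to a one-bit bitfield is unsafe: the two dirty bits share one word, so each plain assignment is a read-modify-write of the whole word, and a concurrent writer of the neighbouring bit can silently undo it. This is exactly what the lq_flags/set_bit() style avoids, since set_bit() is an atomic RMW.

/* Hypothetical demo; "dirty" and "dirty_rr" stand in for lq_dirty and
 * lqr_dirty. Build: cc -O2 -pthread bitfield_race.c
 * The lost-update count is timing dependent and may need several runs.
 */
#include <pthread.h>
#include <stdio.h>

static volatile struct {
        unsigned long dirty:1;     /* stands in for lq_dirty */
        unsigned long dirty_rr:1;  /* stands in for lqr_dirty */
} q;

static void *toggle_dirty(void *arg)
{
        int i;

        (void)arg;
        for (i = 0; i < 100000; i++) {
                q.dirty = 1;       /* unlocked RMW of the shared word */
                q.dirty = 0;
        }
        return NULL;
}

int main(void)
{
        int round, lost = 0;

        for (round = 0; round < 1000; round++) {
                pthread_t t;

                q.dirty = 0;
                q.dirty_rr = 0;
                pthread_create(&t, NULL, toggle_dirty, NULL);
                q.dirty_rr = 1;            /* races with toggle_dirty() */
                pthread_join(t, NULL);
                if (!q.dirty_rr)
                        lost++;            /* our bit was wiped by the other writer */
        }
        printf("dirty_rr set lost in %d of 1000 rounds\n", lost);
        return 0;
}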
| Comment by Andreas Dilger [ 23/Dec/20 ] |
Why not push that patch under
That lqr_dirty = 1 is in a "should never happen" code path with a very clear error message that is not being seen here, so it doesn't seem to be related to this problem. |
| Comment by Andreas Dilger [ 23/Dec/20 ] |
I don't see any patch like that... There is no reference to "LQ_DIRTY" in any of the patches in Git. The very first patch introducing lqr_dirty (replacing lq_dirty_rr) uses a bitfield, and it has been a bitfield since that time, even though the code was moved around to different files a few times:

commit 665e36b780faa2144cecccd29a0d8a8196a76903
Author: Nathan Rutman <nathan@clusterfs.com>
AuthorDate: Mon Sep 15 23:25:34 2008 +0000
b=14836
i=nathan
i=adilger
OST pools on HEAD, comprehensive patch including 17054:19007;
16935:18918,19012,19089,19128; 16978:18872
:
:
+/* Round-robin allocator data */
+struct lov_qos_rr {
+ __u32 lqr_start_idx; /* start index of new inode */
+ __u32 lqr_offset_idx; /* aliasing for start_idx */
+ int lqr_start_count; /* reseed counter */
+ struct ost_pool lqr_pool; /* round-robin optimized list */
+ unsigned long lqr_dirty:1; /* recalc round-robin list */
+};
+
+/* Stripe placement optimization */
struct lov_qos {
struct list_head lq_oss_list; /* list of OSSs that targets use */
struct rw_semaphore lq_rw_sem;
__u32 lq_active_oss_count;
- __u32 *lq_rr_array; /* round-robin optimized list */
- unsigned int lq_rr_size; /* rr array size */
unsigned int lq_prio_free; /* priority for free space */
+ struct lov_qos_rr lq_rr; /* round robin qos data */
unsigned long lq_dirty:1, /* recalc qos data */
- lq_dirty_rr:1, /* recalc round-robin list */
lq_same_space:1,/* the ost's all have approx.
the same space avail */
lq_reset:1; /* zero current penalties */
The patch that introduced lq_dirty_rr was also using a bitfield:

commit 113303973ec9f8484eb2355a1a6ef3c4c7fd6a56
Author: Nathan Rutman <nathan@clusterfs.com>
AuthorDate: Sat Feb 10 06:33:41 2007 +0000
land b1_5 onto HEAD
:
:
+struct lov_qos {
+ struct list_head lq_oss_list; /* list of OSSs that targets use */
+ struct rw_semaphore lq_rw_sem;
+ __u32 lq_active_oss_count;
+ __u32 *lq_rr_array; /* round-robin optimized list */
+ unsigned int lq_rr_size; /* rr array size */
+ unsigned int lq_prio_free; /* priority for free space */
+ unsigned int lq_dirty:1, /* recalc qos data */
+ lq_dirty_rr:1, /* recalc round-robin list */
+ lq_same_space:1,/* the ost's all have approx.
+ the same space avail */
+ lq_reset:1; /* zero current penalties */
+};
I'm happy to see a simpler fix and look forward to seeing your patch. |
| Comment by Alexey Lyashkov [ 24/Dec/20 ] |
|
Andreas, let's open the Lustre 2.0..2.4 code, before LOD was introduced.

[root@lustre lov]# grep _DIRTY *
lov_obd.c:      cfs_set_bit(LQ_DIRTY, &lov->lov_qos.lq_flags);
lov_qos.c:      cfs_set_bit(LQ_DIRTY, &lov->lov_qos.lq_flags);
lov_qos.c:      cfs_set_bit(LQ_DIRTY, &lov->lov_qos.lq_flags);
lov_qos.c:      if (!cfs_test_bit(LQ_DIRTY, &lov->lov_qos.lq_flags))
lov_qos.c:      cfs_clear_bit(LQ_DIRTY, &lov->lov_qos.lq_flags);
lov_qos.c:      if (!cfs_test_bit(LQ_DIRTY, &lov->lov_qos.lq_flags) &&
lov_qos.c:      if (cfs_test_bit(LQ_DIRTY, &lov->lov_qos.lq_flags))
lov_qos.c:      cfs_set_bit(LQ_DIRTY, &lov->lov_qos.lq_flags);
lproc_lov.c:    cfs_set_bit(LQ_DIRTY, &lov->lov_qos.lq_flags);
lproc_lov.c:    cfs_set_bit(LQ_DIRTY, &lov->lov_qos.lq_flags);

But I am talking about the LQ_SF_PROGRESS bit.

void qos_statfs_update(struct obd_device *obd, __u64 max_age, int wait)
{
struct lov_obd *lov = &obd->u.lov;
struct obd_info *oinfo;
int rc = 0;
struct ptlrpc_request_set *set = NULL;
ENTRY;
if (cfs_time_beforeq_64(max_age, obd->obd_osfs_age))
/* statfs data are quite recent, don't need to refresh it */
RETURN_EXIT;
/* Check if statfs already in progress */
if (cfs_test_and_set_bit(LQ_SF_PROGRESS, &lov->lov_qos.lq_flags))
GOTO(out, rc = 0);
LQ_SF_PROGRESS was dropped in the LOV -> LOD transition, but LQ_DIRTY still exists in 2.7..2.12. For the 2.12 code this change is safe:

--- a/lustre/lod/lod_internal.h
+++ b/lustre/lod/lod_internal.h
@@ -83,7 +83,7 @@ enum lq_flag {
        LQ_SAME_SPACE = 1,      /* the ost's all have approx. the same space avail */
        LQ_RESET = 2,           /* zero current penalties */
-
+       LQ_SF_PROGRESS = 3      /* statfs op in progress */
 };

 struct lod_qos {
diff --git a/lustre/lod/lod_qos.c b/lustre/lod/lod_qos.c
index 1c377422b1..be0c10f0e8 100644
--- a/lustre/lod/lod_qos.c
+++ b/lustre/lod/lod_qos.c
@@ -291,11 +291,11 @@ void lod_qos_statfs_update(const struct lu_env *env, struct lod_device *lod)
                /* statfs data are quite recent, don't need to refresh it */
                RETURN_EXIT;

-       down_write(&lod->lod_qos.lq_rw_sem);
-
-       if (obd->obd_osfs_age > max_age)
-               goto out;
+       /* Check if statfs already in progress */
+       if (test_and_set_bit(LQ_SF_PROGRESS, &lod->lod_qos.lq_flags))
+               RETURN_EXIT;
+       down_read(&osts->op_rw_sem);
        for (i = 0; i < osts->op_count; i++) {
                idx = osts->op_array[i];
                avail = OST_TGT(lod,idx)->ltd_statfs.os_bavail;
@@ -306,10 +306,12 @@ void lod_qos_statfs_update(const struct lu_env *env, struct lod_device *lod)
                        /* recalculate weigths */
                        set_bit(LQ_DIRTY, &lod->lod_qos.lq_flags);
        }
+       up_read(&osts->op_rw_sem);
+
        obd->obd_osfs_age = ktime_get_seconds();
-out:
-       up_write(&lod->lod_qos.lq_rw_sem);
+       clear_bit(LQ_SF_PROGRESS, &lod->lod_qos.lq_flags);
+
        EXIT;
 }

The read semaphore is good protection against modification of the global pool, and setting the dirty bit is a good atomic operation. |
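For readers following along, here is a minimal userspace model of the guard pattern in the diff above (an assumption for illustration, not the actual patch): the write semaphore is replaced by a single LQ_SF_PROGRESS bit taken with an atomic test-and-set, so only one thread performs the slow refresh while everyone else returns immediately, and the read lock on the pool only excludes pool resizing.

/* Hypothetical C11/pthread sketch; test_and_set_bit()/clear_bit() below
 * are local stand-ins for the kernel bitops, sleep(1) for the slow OST
 * statfs. Build: cc -O2 -pthread statfs_guard.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

#define LQ_DIRTY        0
#define LQ_SF_PROGRESS  3

static atomic_ulong lq_flags;                    /* models lod_qos.lq_flags */
static pthread_rwlock_t op_rw_sem = PTHREAD_RWLOCK_INITIALIZER;

static int test_and_set_bit(int nr, atomic_ulong *addr)
{
        return (atomic_fetch_or(addr, 1UL << nr) >> nr) & 1;
}

static void clear_bit(int nr, atomic_ulong *addr)
{
        atomic_fetch_and(addr, ~(1UL << nr));
}

static void *statfs_update(void *arg)
{
        (void)arg;
        /* only one updater at a time; everyone else returns immediately */
        if (test_and_set_bit(LQ_SF_PROGRESS, &lq_flags)) {
                printf("refresh already in progress, not blocking\n");
                return NULL;
        }

        pthread_rwlock_rdlock(&op_rw_sem);       /* pool must not be resized */
        sleep(1);                                /* slow statfs of all OSTs */
        atomic_fetch_or(&lq_flags, 1UL << LQ_DIRTY); /* set_bit(LQ_DIRTY, ...) */
        pthread_rwlock_unlock(&op_rw_sem);

        clear_bit(LQ_SF_PROGRESS, &lq_flags);
        printf("refresh done, lq_flags = %#lx\n", atomic_load(&lq_flags));
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        /* two concurrent "create" threads: one pays the cost, one skips */
        pthread_create(&a, NULL, statfs_update, NULL);
        pthread_create(&b, NULL, statfs_update, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
}

The trade-off in this pattern is that a caller which finds the bit already set proceeds with slightly stale statfs data instead of waiting, which is what keeps creates from piling up behind a dead OST.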
| Comment by Alexey Lyashkov [ 24/Dec/20 ] |
|
Hm.. it looks like this patch just wasn't accepted by WC in the past :-/ |
| Comment by Alexander Zarochentsev [ 24/Dec/20 ] |
|
there was an attempt to convert those bitfields to atomic bits : |
| Comment by Andreas Dilger [ 24/Dec/20 ] |
The original version of I don't think that equals "don't accepted by WC". Everyone has to follow the same process for landing patches. I have lots of patches that I would like landed, but I know the work is mine to update them and to get them past testing before they can land. |
| Comment by Alex Zhuravlev [ 24/Dec/20 ] |
|
I think the rule is very simple: patch accepted - you're lucky, grab a cake; not yet accepted - press rebase. |
| Comment by Alexey Lyashkov [ 24/Dec/20 ] |
|
Andreas, this patch is needed because of the large number of points where an unprotected set is used. Alex, sorry about the reverting notice. |
| Comment by Alex Zhuravlev [ 24/Dec/20 ] |
|
shadow, I'm fine, I'm just trying to say that the list of patches (ready to land!) can be very long and sometimes it takes months to get through - because of priorities, luck, etc. I do have patches hanging around with +2 for months, but basically master can't accept too many in a short time; it's just a way to keep it more-or-less stable. The good thing is that the rebase button simplifies the effort to keep a patch up to date (not guaranteed, of course). And a good test demonstrating the issue helps, obviously. |
| Comment by Alexey Lyashkov [ 24/Dec/20 ] |
|
Ok. I will pick up an |
| Comment by Cory Spitz [ 21/Dec/21 ] |
| Comment by Gerrit Updater [ 26/Jan/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/41497/ |
| Comment by Peter Jones [ 26/Jan/22 ] |
|
Landed for 2.15 |