[LU-6465] OSD ID mapping cache is not safe to use.
Created: 14/Apr/15  Updated: 12/May/16  Resolved: 05/May/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: Lustre 2.8.0

Type: Bug
Priority: Critical
Reporter: Di Wang
Assignee: nasf (Inactive)
Resolution: Fixed
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

It seems osd_id_map is not safe to use right now. The [FID, OID] pair is added to the per-thread cache after a lookup, but if another thread deletes the object, it only invalidates the cache in its own thread info:

int osd_oi_delete(struct osd_thread_info *info,
                  struct osd_device *osd, const struct lu_fid *fid,
                  handle_t *th, enum oi_check_flags flags)
{
        struct lu_fid *oi_fid = &info->oti_fid2;

        /* clear idmap cache */
        if (lu_fid_eq(fid, &info->oti_cache.oic_fid))
                fid_zero(&info->oti_cache.oic_fid);
      ..............

Other threads can still get the OID from their own caches, and if the inode has since been reused by another object, we will see a bunch of messages like:

Lustre: lustre-MDT0003-osd: FID [0x2c0000404:0x1f34:0x0] != self_fid [0x2c0000404:0x281c:0x0]

Unfortunately, this also triggers osd-scrub. This is what I saw in the DNE2 failover test.
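
The race described above can be reproduced in a small standalone model. This is only a sketch under stated assumptions: `struct fid`, `struct idmap_cache`, `obj_create`, `oi_lookup`, `cached_lookup`, `checked_lookup`, `obj_delete` and `run_scenario` are hypothetical stand-ins, not Lustre code, and the two "threads" are simulated by two cache structs used one after the other. The re-validation in `checked_lookup` is one possible mitigation for illustration and is not claimed to be what the merged patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the real structures. */
struct fid { uint64_t seq; uint32_t oid; uint32_t ver; };

struct idmap_cache {            /* models one thread's oti_cache */
        struct fid oic_fid;     /* all-zero means "empty" */
        int        oic_ino;     /* inode number cached for oic_fid */
};

#define NR_INODES 4
static struct fid self_fid[NR_INODES];  /* FID stored in each inode */
static bool       in_use[NR_INODES];

static bool fid_eq(const struct fid *a, const struct fid *b)
{
        return a->seq == b->seq && a->oid == b->oid && a->ver == b->ver;
}

/* Allocate the first free inode, so a freed inode is reused at once. */
static int obj_create(const struct fid *f)
{
        for (int i = 0; i < NR_INODES; i++) {
                if (!in_use[i]) {
                        in_use[i] = true;
                        self_fid[i] = *f;
                        return i;
                }
        }
        return -1;
}

/* Authoritative lookup; stands in for the on-disk OI table. */
static int oi_lookup(const struct fid *f)
{
        for (int i = 0; i < NR_INODES; i++)
                if (in_use[i] && fid_eq(&self_fid[i], f))
                        return i;
        return -1;
}

/* Trusts the per-thread cache blindly, as in the reported behaviour. */
static int cached_lookup(struct idmap_cache *c, const struct fid *f)
{
        if (fid_eq(f, &c->oic_fid))
                return c->oic_ino;
        int ino = oi_lookup(f);
        if (ino >= 0) {
                c->oic_fid = *f;
                c->oic_ino = ino;
        }
        return ino;
}

/* Illustrative mitigation: re-check the inode's self FID on a hit. */
static int checked_lookup(struct idmap_cache *c, const struct fid *f)
{
        if (fid_eq(f, &c->oic_fid)) {
                int ino = c->oic_ino;
                if (in_use[ino] && fid_eq(&self_fid[ino], f))
                        return ino;
                memset(&c->oic_fid, 0, sizeof(c->oic_fid)); /* drop stale entry */
        }
        return cached_lookup(c, f);
}

/* Like the quoted osd_oi_delete(): clears only the caller's cache. */
static void obj_delete(struct idmap_cache *c, const struct fid *f)
{
        int ino = oi_lookup(f);
        if (fid_eq(f, &c->oic_fid))
                memset(&c->oic_fid, 0, sizeof(c->oic_fid));
        if (ino >= 0)
                in_use[ino] = false;
}

/* Returns the inode "thread A" resolves for f1 after B deleted it. */
int run_scenario(bool verify)
{
        struct idmap_cache a = {0}, b = {0};
        struct fid f1 = { 0x2c0000404ULL, 0x1f34, 0 };
        struct fid f2 = { 0x2c0000404ULL, 0x281c, 0 };

        memset(self_fid, 0, sizeof(self_fid));
        memset(in_use, 0, sizeof(in_use));

        int ino1 = obj_create(&f1);
        cached_lookup(&a, &f1);         /* A caches f1 -> ino1 */
        cached_lookup(&b, &f1);         /* B caches it too */
        obj_delete(&b, &f1);            /* B clears only its own cache */
        int ino2 = obj_create(&f2);     /* freed inode is reused for f2 */
        assert(ino1 == ino2);

        return verify ? checked_lookup(&a, &f1) : cached_lookup(&a, &f1);
}
```

In this model, `run_scenario(false)` hands thread A the reused inode whose self FID is now `f2`, which corresponds to the "FID != self_fid" message, while `run_scenario(true)` drops the stale entry and reports the object as not found.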



 Comments   
Comment by Di Wang [ 14/Apr/15 ]

Disabling the oic_cache does make the problem go away.

[root@testnode lustre-release_new]# git diff
diff --git a/lustre/osd-ldiskfs/osd_handler.c b/lustre/osd-ldiskfs/osd_handler.c
index 7ab83f3..cce5dfa 100644
--- a/lustre/osd-ldiskfs/osd_handler.c
+++ b/lustre/osd-ldiskfs/osd_handler.c
@@ -607,13 +607,15 @@ static int osd_fid_lookup(const struct lu_env *env, struct osd_object *obj,
        if (conf != NULL && conf->loc_flags & LOC_F_NEW)
                GOTO(out, result = 0);
 
+#if 0
+       /* Disable OIC cache until LU-6465 is resolved */
        /* Search order: 1. per-thread cache. */
        if (lu_fid_eq(fid, &oic->oic_fid) &&
            likely(oic->oic_dev == dev)) {
                id = &oic->oic_lid;
                goto iget;
        }
-
+#endif
        id = &info->oti_id;
        if (!list_empty(&scrub->os_inconsistent_items)) {
                /* Search order: 2. OI scrub pending list. */

Comment by Gerrit Updater [ 20/Apr/15 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/14499
Subject: LU-6465 osd: NO OI scrub because of cached invalid OI mapping
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 06534d926e58b3908302d389aac4a39d73e6d2b8

Comment by Gerrit Updater [ 05/May/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14499/
Subject: LU-6465 osd: NO OI scrub because of cached invalid OI mapping
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: dac584c6946d15e1ca9e6feeb26b164768041c40

Generated at Sat Feb 10 02:00:28 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.