Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • None
    • None
    • http://github.com/chaos/lustre, version 2.1.1-11chaos
    • 3
    • 6399

    Description

      I've been getting widespread reports that users on 2.1 clients are seeing intermittent ENOENT errors on open() (and possibly on stat()?).

      Sometimes the file is written, closed, and reopened on the same client node. But the open will report that the file does not exist. A few minutes later the file is definitely there, so the problem is transitory.

      We have also had instances of this where the ENOENT occurs on a node other than where the file was created. One node will create, write, and close the file, and then another will attempt to open it only to get ENOENT.

      Here is an example failure from a simul test:

      09:04:12: Set iteration 4
      09:04:12: Running test #0(iter 0): open, shared mode.
      09:04:12:       Beginning setup
      09:04:12:       Finished setup          (0.001 sec)
      09:04:12:       Beginning test
      09:04:12: Process 177(hype338): FAILED in simul_open, open failed: No such file or directory
      

      There tend not to be any obvious messages in the console logs associated with these events.

      Attachments

        1. hype336-lu1397-1337981358181.llog.gz
          5.60 MB
          Prakash Surya
        2. ior-lustre_debug.diff
          1 kB
          Lai Siyao
        3. open.stp
          0.9 kB
          Prakash Surya
        4. open-v2.stp
          2 kB
          Prakash Surya

        Issue Links

          Activity

            [LU-1397] ENOENT on open()
            pjones Peter Jones added a comment -

            Thanks Prakash. We will track landing this code under LU-506, so I am closing this ticket as a duplicate of that. As to whether this fix will also address the instances you may have observed prior to applying the initially flawed LU-1234 patch, it may well do, because the cache mechanism has been altered by this change.


            prakash Prakash Surya (Inactive) added a comment -

            Thanks for the fix, Lai. We haven't seen any ENOENT failures in the past few days of testing with patch set 4 applied. This can be marked resolved.

            prakash Prakash Surya (Inactive) added a comment -

            OK, thanks Lai. We should start testing with the new patch set in place today.
            laisiyao Lai Siyao added a comment -

            Prakash, yes, the ENOENT failure found this time was introduced by the earlier patch for LU-1234. It's best to enable the debug-log trigger as well in your verification test, thanks.


            prakash Prakash Surya (Inactive) added a comment -

            Lai, was this issue introduced by one of the earlier versions of that patch? I ask because we have a vague recollection that we saw ENOENT issues prior to applying it, albeit less frequently than we currently do. So, if this specific case was introduced (and now fixed?) by http://review.whamcloud.com/2400, there may still be other issues lurking.

            Either way, I am going to try to get the new revision of the patch applied and tested today.

            And thanks for the detailed explanation, Fan Yong! It's very helpful.
            laisiyao Lai Siyao added a comment -

            Patch is updated: http://review.whamcloud.com/#change,2400.

            Please revert the previous patch in your code base and apply this one instead.

            laisiyao Lai Siyao added a comment -

            Yong, you're right; I think it's better to remove d_rehash(). This looks to be a bug inherited from the old kernel (from d_splice_alias(); the line has been removed in the latest code).

            I'll commit a patch later.

            yong.fan nasf (Inactive) added a comment - - edited

            After more investigation, I think the operation sequence on the client when the failure occurred was as follows:

            1) The thread "54770" was an IOR thread trying to parse "/p/lcrater2/swltest/SWL/Hype2/IO/37988/test_dir/miranda_io.out.10022". Before processing the last component "miranda_io.out.10022", it first needed to look up "test_dir".

            2) The dentry "ffff8804302af900" for "test_dir" existed in memory at that time, but was marked as "DCACHE_LUSTRE_INVALID". So "ll_dcompare()" against that dentry returned unmatched for the thread "54770". The thread "54770" therefore created a new negative dentry "ffff8804245c3800" for "test_dir" (inferred from the result) and triggered a real lookup:

            00000080:00002000:6.0:1337981358.181442:0:54770:0:(dcache.c:103:ll_dcompare()) found name test_dir(ffff8804302af900) - flags 0/4000000 - refc 25
            00000080:00200000:6.0:1337981358.181444:0:54770:0:(namei.c:454:ll_lookup_it()) VFS Op:name=test_dir,dir=144237863481960082/33582994(ffff8803b184d838),intent=0
            ...
            00800000:00000002:6.0:1337981358.181449:0:54770:0:(lmv_intent.c:405:lmv_intent_lookup()) LOOKUP_INTENT with fid1=[0x2006f9298:0xe692:0x0], fid2=[0x0:0x0:0x0], name='test_dir' -> mds #0
            00000002:00010000:6.0:1337981358.181450:0:54770:0:(mdc_locks.c:917:mdc_intent_lock()) (name: test_dir,[0x0:0x0:0x0]) in obj [0x2006f9298:0xe692:0x0], intent: lookup flags 00
            

            3) The thread "54770" succeeded in re-obtaining the related ibits lock from the MDS, and tried to update the inode and dentry for "test_dir" on the client:

            00000002:00002000:6.0:1337981358.181799:0:54770:0:(mdc_locks.c:844:mdc_finish_intent_lock()) D_IT dentry test_dir intent: lookup status 0 disp b rc 0
            ...
            00000080:00200000:6.0:1337981358.181809:0:54770:0:(namei.c:151:ll_iget()) got inode: ffff8803b184d3f8 for [0x2006f9298:0xe693:0x0]
            

            4) When the thread "54770" was in "ll_splice_alias()" processing "test_dir", another thread "54766" (the failing thread described above) began to look up "test_dir" while parsing "/p/lcrater2/swltest/SWL/Hype2/IO/37988/test_dir/miranda_io.out.10018".

            5) There was a race condition between the threads "54770" and "54766" on the "test_dir" dentry:

            struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de)
            {
                    struct dentry *new;
            
                    if (inode) {
                           new = ll_find_alias(inode, de);
                           if (new) {
                                  ll_dops_init(new, 1, 1);
            (5.1)===>                      d_rehash(de);
            (5.3)===>                      d_move(new, de);
                                  iput(inode);
                                  CDEBUG(D_DENTRY,
                                         "Reuse dentry %p inode %p refc %d flags %#x\n",
                                         new, new->d_inode, d_refcount(new), new->d_flags);
                                  return new;
                           }
                    }
            ...
            }
            
            
            int ll_dcompare(struct dentry *parent, struct qstr *d_name, struct qstr *name)
            {
            ...
            (5.2)===>        CDEBUG(D_DENTRY,"found name %.*s(%p) - flags %d/%x - refc %d\n",
                           name->len, name->name, dchild,
                           d_mountpoint(dchild), dchild->d_flags & DCACHE_LUSTRE_INVALID,
                           atomic_read(&dchild->d_count));
            ...
            }
            

            5.1) The thread "54770" found the former dentry "ffff8804302af900" and tried to reuse it. But first it rehashed the negative dentry "ffff8804245c3800", before establishing the inode alias.

            5.2) The thread "54766" found the negative dentry "ffff8804245c3800" in "__d_lookup()", which had just been rehashed by the thread "54770". Since the negative dentry "ffff8804245c3800" was not marked as "DCACHE_LUSTRE_INVALID", the thread "54766" regarded it as a valid entry for "test_dir".

            5.3) The thread "54770" called "d_move()" to establish the related inode alias. After that, the exported dentry for "test_dir" became the valid "ffff8804302af900".

            00000080:00010000:6.0:1337981358.181811:0:54770:0:(namei.c:410:ll_lookup_it_finish()) setting l_data to inode ffff8803b184d3f8 (144237863481960083/33582994)
            ...
            00000080:00002000:2.0:1337981358.181812:0:54766:0:(dcache.c:103:ll_dcompare()) found name 37988(ffff8804302af540) - flags 0/0 - refc 6
            00000080:00200000:2.0:1337981358.181812:0:54766:0:(dcache.c:359:ll_revalidate_it()) VFS Op:name=37988,intent=0
            00000080:00200000:2.0:1337981358.181813:0:54766:0:(file.c:2465:ll_inode_permission()) VFS Op:inode=144237863481960082/33582994(ffff8803b184d838), inode mode 41c0 mask 1
            00000080:00002000:2.0:1337981358.181814:0:54766:0:(dcache.c:103:ll_dcompare()) found name test_dir(ffff8804245c3800) - flags 0/0 - refc 1
            00000080:00002000:6.0:1337981358.181815:0:54770:0:(namei.c:378:ll_splice_alias()) Reuse dentry ffff8804302af900 inode ffff8803b184d3f8 refc 26 flags 0x4000000
            00000080:00010000:6.0:1337981358.181817:0:54770:0:(dcache.c:324:ll_lookup_finish_locks()) setting l_data to inode ffff8803b184d3f8 (144237863481960083/33582994)
            00000080:00010000:6.0:1337981358.181818:0:54770:0:(dcache.c:233:ll_intent_drop_lock()) releasing lock with cookie 0x829fa7d835b2db86 from it ffff880431f49b98
            

            So here we should not make the negative dentry "ffff8804245c3800" visible to others in "ll_splice_alias()": either remove the rehash or mark the dentry as "DCACHE_LUSTRE_INVALID". More suitable handling is needed for that.

            yong.fan nasf (Inactive) added a comment - - edited

            Lai, there are some clues in the logs from Prakash:

            =============
            00000080:00200000:2.0:1337981358.181813:0:54766:0:(file.c:2465:ll_inode_permission()) VFS Op:inode=144237863481960082/33582994(ffff8803b184d838), inode mode 41c0 mask 1
            00000080:00002000:2.0:1337981358.181814:0:54766:0:(dcache.c:103:ll_dcompare()) found name test_dir(ffff8804245c3800) - flags 0/0 - refc 1
            =============

            The thread "54766" is the thread looking up "/p/lcrater2/swltest/SWL/Hype2/IO/37988/test_dir/miranda_io.out.10018". But when it tried to find the parent "test_dir", it got an invalid dentry "ffff8804245c3800", even though that dentry was not marked as "DCACHE_LUSTRE_INVALID" (the valid "test_dir" dentry should be "ffff8804302af900"). So the "d_inode" of the invalid dentry "ffff8804245c3800" should be NULL, and the VFS path parse then failed at:

            static int __link_path_walk(const char *name, struct nameidata *nd)
            {
            ...
                            /* This does the actual lookups.. */
                            err = do_lookup(nd, &this, &next);
                            if (err)
                                    break;
            
                            err = -ENOENT;
                            inode = next.dentry->d_inode;
                            if (!inode)
                                    goto out_dput;
            ...
            }
            

            That may be why Lustre did not report "ENOENT" but the VFS did. We can follow this clue in further investigation.


            prakash Prakash Surya (Inactive) added a comment -

            I talked to the user, and the job that triggered the error was on its first iteration, in which it was opening and creating new files to write to. 'ls'-ing the directory shows that the file is indeed missing. My current guess is that an error in the pathname lookup is the root cause.

            People

              laisiyao Lai Siyao
              morrone Christopher Morrone (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: