  1. Lustre
  2. LU-3140

FID sequence may be reused for different objects.

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • None
    • Lustre 2.1.0, Lustre 2.2.0, Lustre 2.3.0, Lustre 2.4.0
    • None
    • single node setup.
    • 3
    • 7624

    Description

      While investigating a Cray issue, we created a debug patch to verify that the correct mode is assigned to the md object after revalidation, but the result is very confusing.

      The same FID is assigned to different files:

      00000004:00000002:2.0:1365510202.512393:0:19372:0:(mdt_reint.c:281:mdt_md_create()) @@@ Create (bdir->[0x200000bd0:0x1:0x0]) in [0x200000400:0x7:0x0] req@ffff880036324928 x1431841008517781/t0(17179869191) o36->1e287f29-a399-fb10-7bd0-d9d83ebc1ce1@0@lo:0/0 lens 456/536 e 0 to 0 dl 1365510223 ref 1 fl Complete:/4/0 rc 0/0

      00000004:00000002:6.0:1365510375.912867:0:23947:0:(mdt_reint.c:281:mdt_md_create()) @@@ Create (f9-2->[0x200000bd0:0x1:0x0]) in [0x61ab:0x54d71902:0x0] req@ffff88002a88c008 x1431841008518517/t0(0) o36->e35c9b0d-1221-a3ae-0357-637f30903536@0@lo:0/0 lens 456/536 e 0 to 0 dl 1365510395 ref 1 fl Interpret:/0/0 rc 0/0

      As you can see, bdir and f9-2 have the same FID, [0x200000bd0:0x1:0x0].
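
      When the debug log is dumped instead of relying on the assertion, a check like the one above can be automated. The following is a minimal sketch, assuming the log contains mdt_md_create() lines in exactly the format quoted above; the function name dup_fids is hypothetical and is not part of Lustre. It counts each (FID, name) pair once, so a replayed create of the same name is not reported. A FID printed here is only a candidate for the problem and still has to be checked by hand.

```shell
# Sketch only: dup_fids is a hypothetical helper, not a Lustre tool.
# Reads a Lustre debug log on stdin and prints every FID that
# mdt_md_create() logged for more than one distinct name.
dup_fids() {
    # Turn "... mdt_md_create()) @@@ Create (NAME->[FID]) in ..." into "[FID] NAME".
    sed -n 's/.*mdt_md_create()) @@@ Create (\(.*\)->\(\[[^]]*]\)) in .*/\2 \1/p' |
    awk '{
        fid = $1
        name = substr($0, length($1) + 2)   # everything after the FID
        if (!((fid, name) in seen)) {       # count each (FID, name) pair once
            seen[fid, name] = 1
            count[fid]++
            names[fid] = names[fid] " " name
        }
    }
    END {
        for (fid in count)
            if (count[fid] > 1)
                print fid ":" names[fid]
    }'
}
```

      For example, `dup_fids < debug.log` (debug.log being whatever file the log was dumped to) prints one line per suspicious FID followed by the names created with it.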

      Reproducing this is very easy:
      1) Apply a patch (to hit the assertion and panic), or dump the log after the test loop.

      bash-3.2$ git diff
      diff --git a/lustre/llite/llite_lib.c b/lustre/llite/llite_lib.c
      index 7dad8f9..c40cfd2 100644
      --- a/lustre/llite/llite_lib.c
      +++ b/lustre/llite/llite_lib.c
      @@ -1654,6 +1654,20 @@ void ll_update_inode(struct inode *inode, struct lustre_md *md)
               struct ll_sb_info *sbi = ll_i2sbi(inode);
       
               LASSERT ((lsm != NULL) == ((body->valid & OBD_MD_FLEASIZE) != 0));
      +        if (body->valid & OBD_MD_FLTYPE) {
      +               if ((!S_ISDIR(inode->i_mode) && S_ISDIR(body->mode)) ||
      +                   (S_ISDIR(inode->i_mode) && !S_ISDIR(body->mode))) {
      +//             if ((inode->i_mode & S_IFMT) != (body->mode & S_IFMT)) {
      +                       CERROR("ino %p->%lu type is %s but server reply has %s "
      +                              "type (%o and %o). FID "DFID"\n", inode, inode->i_ino,
      +                              S_ISDIR(inode->i_mode) ? "directory" :"file",
      +                              S_ISDIR(body->mode) ? "directory" : "file",
      +                              inode->i_mode & S_IFMT, body->mode & S_IFMT,
      +                              PFID(&lli->lli_fid));
      +                       LBUG();
      +               }
      +       }
      +
               if (lsm != NULL) {
                       LASSERT(S_ISREG(inode->i_mode));
      

      2) Run PTLDEBUG=-1 SUBSYSTEM=-1 DEBUG_SIZE=400 llmount.sh to set up a single-node Lustre cluster.

      3) Run replay-dual with a simple script:
      for ((i=0; i!=1000; i++)); do
          for ((j=1; j!=10; j++)); do
              echo "Iteration $i"
              SLOW=YES ONLY=$j DIR=/mnt/lustre sh ./replay-dual.sh
          done
      done

      In 90% of runs you hit the bug after the first loop.
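
      For the "dump the log" alternative from step 1, the loop can save the kernel debug buffer after each pass instead of waiting for a panic. This is a rough sketch under stated assumptions: lctl is in PATH and `lctl dk FILE` writes the debug buffer to FILE; the function names run_pass, dump_log and repeat_and_dump are hypothetical, and dumping after every pass rather than once at the end is a choice made here, not something the report specifies.

```shell
# Sketch only: run_pass, dump_log and repeat_and_dump are hypothetical names.
# No "local" is used, so the loop counters are plain global shell variables.

# One pass over replay-dual tests 1..9, matching the inner loop of the reproducer.
run_pass() {
    j=1
    while [ "$j" -lt 10 ]; do
        SLOW=YES ONLY=$j DIR=/mnt/lustre sh ./replay-dual.sh
        j=$((j + 1))
    done
}

# Save the kernel debug buffer to the file given as $1.
dump_log() {
    lctl dk "$1"
}

# repeat_and_dump COUNT LOGDIR: run COUNT passes, saving one log file per pass.
repeat_and_dump() {
    count=$1
    logdir=$2
    mkdir -p "$logdir" || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "Iteration $i"
        run_pass
        dump_log "$logdir/debug.$i.log" || return 1
        i=$((i + 1))
    done
}
```

      A call such as `repeat_and_dump 1000 /tmp/lu-3140-logs` (the directory is an arbitrary example) leaves one debug.N.log file per pass, which can then be searched for mdt_md_create() lines that share a FID.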

          People

            tappro Mikhail Pershin
            shadow Alexey Lyashkov