
mkdir: cannot create directory ‘test5’: Object is remote

Details

    • Bug
    • Resolution: Fixed
    • Major
    • None
    • Lustre 2.14.0, Lustre 2.15.2
    • server (garter):
      toss 4.5-6rc6
      4.18.0-425.19.2.1toss.t4.x86_64
      2.14.0_19.llnl

      client (mutt):
      toss 4.5-6
      4.18.0-425.19.2.1toss.t4.x86_64
      2.15.2_5.llnl
    • 3

    Description

      Attempts to create directories are intermittently failing when using "mkdir".

      So far, "lfs mkdir" succeeds every time, as does file creation using "touch".
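A minimal sketch of how the intermittent failure could be counted from a client, assuming the error surfaces as errno EREMOTE (whose Linux message is "Object is remote"). The helper name `try_mkdirs` and its `mkdir` parameter are hypothetical and exist only for illustration; nothing here comes from the patch or from Lustre itself.

```python
import errno
import os


def try_mkdirs(parent, names, mkdir=os.mkdir):
    """Try to create each directory under parent.

    Returns (created, remote_failures). Only EREMOTE ("Object is remote")
    is treated as the failure being tracked; any other error is re-raised.
    """
    created, remote_failures = [], []
    for name in names:
        path = os.path.join(parent, name)
        try:
            mkdir(path)
            created.append(name)
        except OSError as exc:
            if exc.errno != errno.EREMOTE:
                raise
            remote_failures.append(name)
    return created, remote_failures
```

Running this against a directory on the affected mount with a batch of fresh names would show how often plain mkdir fails; the `mkdir` argument lets the same loop be exercised with a stand-in on a machine without Lustre.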

       


        Activity

          [LU-16806] mkdir: cannot create directory ‘test5’: Object is remote

          defazio Gian-Carlo Defazio added a comment -

          The patch fixed the issue on 2.14.

          I'm a bit confused by what the patch is doing.

          Is the idea that (lds->lds_dir_def_striping_set == 0) means the parent directory has no default striping set for its sub-directories, so a newly created sub-directory should instead inherit from the root of the whole filesystem?
          laisiyao Lai Siyao added a comment -

          Yes, upgrading to 2.15.x can get around this issue since related code has been removed.

          pjones Peter Jones added a comment -

          So LLNL's exposure to this issue will disappear once they upgrade to 2.15.x?

          laisiyao Lai Siyao added a comment -

          This is caused by a bug in 2.14 code, but it's not an issue in the following release because related code has been removed.


          gerrit Gerrit Updater added a comment -

          "Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51045
          Subject: LU-16806 lod: access DMV when it's valid
          Project: fs/lustre-release
          Branch: b2_14
          Current Patch Set: 1
          Commit: 65887f6728d68d8b2e10969df25e5303603a4797

          defazio Gian-Carlo Defazio added a comment -

          I've added mkdir-attempt-client-and-mds.tar.gz, which has debug logs for the client and the MDSs with debug=-1.
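For readers who want to gather comparable logs, here is a hedged sketch of one way to drive the collection from Python. It assumes the standard `lctl set_param debug=-1`, `lctl clear`, and `lctl dk <file>` commands, and it assumes the `dk.<host>.<epoch seconds>` naming seen in the log names in this ticket; that naming is inferred, not confirmed by the reporter. The function names and the injectable `run` parameter are hypothetical.

```python
import socket
import subprocess
import time


def debug_log_name(host, timestamp):
    """Build a name such as dk.mutt12.1683589208 (assumed convention)."""
    return f"dk.{host}.{int(timestamp)}"


def enable_full_debug(run=subprocess.run):
    """Turn on all debug flags and clear the kernel debug buffer."""
    run(["lctl", "set_param", "debug=-1"], check=True)
    run(["lctl", "clear"], check=True)


def dump_debug_log(run=subprocess.run, host=None, now=None):
    """Dump the kernel debug buffer to a file and return the file name."""
    host = host or socket.gethostname().split(".")[0]
    now = time.time() if now is None else now
    path = debug_log_name(host, now)
    run(["lctl", "dk", path], check=True)
    return path
```

Typical use would be to call `enable_full_debug()` on the client and on each MDS, reproduce the failing mkdir, and then call `dump_debug_log()` on every node. Both functions need root privileges and a node with Lustre installed.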
          laisiyao Lai Siyao added a comment -

          Could you collect MDS debug logs upon failure? I'm afraid some patch for backward compatibility is missing, maybe on the server side.


          ofaaland Olaf Faaland added a comment -

          Hi Serguei or Peter,

          Can you ask an appropriate Whamcloud-er to advise? The two test clusters involved will be taken away from us for other testing on Tuesday or Wednesday of next week.

          Thanks

          ssmirnov Serguei Smirnov added a comment -

          It looks like this one needs the attention of experts in higher-level Lustre.

          For example, is LUDOC-289 relevant here?
          pjones Peter Jones added a comment -

          Serguei

          Can you please advise?

          Thanks

          Peter


          defazio Gian-Carlo Defazio added a comment -

          I've uploaded debug logs for the attempted creation of the directory junk8.

          The debug log that contains the failed attempt for junk8 is
          dk.mutt12.1683589208

          It then succeeded (after failing about 10 times) in
          dk.mutt12.1683589684

          People

            laisiyao Lai Siyao
            defazio Gian-Carlo Defazio