Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.9.0
    • Lustre 2.4.3
    • None
    • 3
    • 15976

    Description

      lfs getstripe on FIFO files hangs in the 'open' system call. This worked in the 2.1.x versions.

      mhanafi@pfe23:/nobackupp8/mhanafi> rm testfifo
      mhanafi@pfe23:/nobackupp8/mhanafi> mkfifo testfifo
      mhanafi@pfe23:/nobackupp8/mhanafi> strace lfs getstripe testfifo
      execve("/usr/bin/lfs", ["lfs", "getstripe", "testfifo"], [/* 35 vars */]) = 0
      brk(0)                                  = 0x6c1000
      mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fffedb02000
      access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
      open("/etc/ld.so.cache", O_RDONLY)      = 3
      fstat(3, {st_mode=S_IFREG|0644, st_size=270183, ...}) = 0
      mmap(NULL, 270183, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fffedac0000
      close(3)                                = 0
      open("/lib64/libpthread.so.0", O_RDONLY) = 3
      read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\200Z\0\0\0\0\0\0"..., 832) = 832
      fstat(3, {st_mode=S_IFREG|0755, st_size=135764, ...}) = 0
      mmap(NULL, 2212784, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fffed6c9000
      fadvise64(3, 0, 2212784, POSIX_FADV_WILLNEED) = 0
      mprotect(0x7fffed6e0000, 2097152, PROT_NONE) = 0
      mmap(0x7fffed8e0000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17000) = 0x7fffed8e0000
      mmap(0x7fffed8e2000, 13232, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fffed8e2000
      close(3)                                = 0
      open("/lib64/libc.so.6", O_RDONLY)      = 3
      read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\355\1\0\0\0\0\0"..., 832) = 832
      fstat(3, {st_mode=S_IFREG|0755, st_size=1775524, ...}) = 0
      mmap(NULL, 3639480, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fffed350000
      fadvise64(3, 0, 3639480, POSIX_FADV_WILLNEED) = 0
      mprotect(0x7fffed4c0000, 2093056, PROT_NONE) = 0
      mmap(0x7fffed6bf000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16f000) = 0x7fffed6bf000
      mmap(0x7fffed6c4000, 18616, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fffed6c4000
      close(3)                                = 0
      mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fffedabf000
      mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fffedabe000
      mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fffedabd000
      arch_prctl(ARCH_SET_FS, 0x7fffedabe700) = 0
      mprotect(0x7fffed6bf000, 16384, PROT_READ) = 0
      mprotect(0x7fffed8e0000, 4096, PROT_READ) = 0
      mprotect(0x66a000, 4096, PROT_READ)     = 0
      mprotect(0x7fffedb04000, 4096, PROT_READ) = 0
      munmap(0x7fffedac0000, 270183)          = 0
      set_tid_address(0x7fffedabe9d0)         = 61077
      set_robust_list(0x7fffedabe9e0, 0x18)   = 0
      futex(0x7fffffffe7dc, FUTEX_WAKE_PRIVATE, 1) = 0
      futex(0x7fffffffe7dc, 0x189 /* FUTEX_??? */, 1, NULL, 7fffedabe700) = -1 EAGAIN (Resource temporarily unavailable)
      rt_sigaction(SIGRTMIN, {0x7fffed6ce8f0, [], SA_RESTORER|SA_SIGINFO, 0x7fffed6d8810}, NULL, 8) = 0
      rt_sigaction(SIGRT_1, {0x7fffed6ce980, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x7fffed6d8810}, NULL, 8) = 0
      rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
      getrlimit(RLIMIT_STACK, {rlim_cur=300000*1024, rlim_max=7340032*1024}) = 0
      shmget(IPC_PRIVATE, 65680, 0600)        = 438992901
      shmat(438992901, 0, 0)                  = ?
      shmctl(438992901, IPC_RMID, 0)          = 0
      brk(0)                                  = 0x6c1000
      brk(0x6e2000)                           = 0x6e2000
      open("testfifo", O_RDONLY^
      


        Activity

          [LU-5704] lfs getstripe hangs on fifo files
          pjones Peter Jones added a comment -

          Landed for 2.9


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19039/
          Subject: LU-5704 utils: stop open hangs on fifo files
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: fab6073165dfbd107f54057e33363c77978af6cb

          jaylan Jay Lan (Inactive) added a comment -

          Thanks for the clarification, Bob.

          bogl Bob Glossman (Inactive) added a comment -

          Jay,
          At this point I believe my worry was unfounded. That comment was from before I created and submitted my patch. It has since undergone a complete set of review tests as well as passed inspection by reviewers.

          I think it's safe.

          jaylan Jay Lan (Inactive) added a comment -

          Bob, if you are worried that this patch "will cause bad side effects," we are not comfortable applying this patch to our production systems.

          gerrit Gerrit Updater added a comment -

          Bob Glossman (bob.glossman@intel.com) uploaded a new patch: http://review.whamcloud.com/19039
          Subject: LU-5704 utils: stop open hangs on fifo files
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: bbb6139de7761d43a5b86e1ee14550d5f7ba2c02

          bogl Bob Glossman (Inactive) added a comment - - edited

          'lfs getstripe' doesn't call llapi_file_get_stripe() at all. It does, however, use a different call sequence that has an open() call with only O_RDONLY in the flags.

          I can work up a patch to add O_NONBLOCK, but this particular call sequence is used for so many other things that I'm worried doing so will cause bad side effects.
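For readers following the side-effect concern above: one common way to keep O_NONBLOCK from affecting later I/O on the descriptor is to clear it with fcntl() right after open() returns. The sketch below is only an illustration of that general POSIX technique. The helper name open_rdonly_then_block() is hypothetical, and nothing here is taken from the patch under review.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not a Lustre function.
 * Open read-only with O_NONBLOCK so the open() itself cannot block on a
 * FIFO that has no writer, then clear O_NONBLOCK so that subsequent
 * operations on the descriptor behave as they did before.
 * Returns the file descriptor, or -1 with errno set.
 */
int open_rdonly_then_block(const char *path)
{
	int fd, flags, saved;

	fd = open(path, O_RDONLY | O_NONBLOCK);
	if (fd < 0)
		return -1;

	flags = fcntl(fd, F_GETFL);
	if (flags < 0 || fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) < 0) {
		saved = errno;	/* keep the fcntl() error across close() */
		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

Whether this is appropriate depends on what the callers do with the descriptor afterwards. A later read() on an empty FIFO would block again once the flag is cleared.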

          adilger Andreas Dilger added a comment -

          Those other tools do not hang when opening a FIFO because they pass open flags that prevent this.

          It appears that our open() call in llapi_file_get_stripe() is only passing O_RDONLY and it should also include O_NONBLOCK to avoid exactly this problem.
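A minimal sketch of the flag change described above, assuming plain POSIX behavior: opening a FIFO with O_RDONLY | O_NONBLOCK returns immediately even when no writer has the FIFO open, whereas O_RDONLY alone blocks until a writer appears. The helper name open_rdonly_nonblock() is hypothetical and is not part of liblustreapi.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not a Lustre function.
 * Open a path read-only without blocking in open() when the path turns
 * out to be a FIFO with no writer. For regular files and directories
 * the extra flag does not change how open() completes.
 * Returns the file descriptor, or -1 with errno set.
 */
int open_rdonly_nonblock(const char *path)
{
	return open(path, O_RDONLY | O_NONBLOCK);
}
```

With the reproducer from the description (mkfifo testfifo), a call such as open_rdonly_nonblock("testfifo") would be expected to return a valid descriptor instead of hanging.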

          bogl Bob Glossman (Inactive) added a comment -

          The fact that getfattr/getfacl/stat on a fifo don't hang isn't relevant IMHO. You can do all those on a fifo in Lustre too.

          lfs is a Lustre-specific tool. It does an open on its target(s) prior to doing anything else. I strongly disagree that it's a bug for it to hang in certain unlikely situations, like for instance trying to do Lustre-related stripe operations on files where that concept doesn't apply. A read of a fifo that nobody ever writes to will hang in precisely the same way.

          kolano Paul Kolano (Inactive) added a comment -

          Regardless of whether this ever worked, it's still a bug. This isn't "normal behavior" of any command that gets file attributes. I can do a stat on a fifo without hanging. I can do a getfattr on a fifo without hanging. I can do a getfacl on a fifo without hanging. Getting stripe info is no different than these. The implementation is flawed if it hangs and could be easily fixed with a fifo check that immediately prints the exact message shown after writing to the pipe. I can't imagine the fix would take more than a few minutes?!?
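The "fifo check" suggested above could look roughly like the following sketch, which uses stat() and S_ISFIFO() to detect a FIFO before any open() is attempted. The helper name path_is_fifo() is hypothetical, and this is not how the merged patch is known to work. It only illustrates the idea from the comment.

```c
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not a Lustre function.
 * Returns 1 if path names a FIFO, 0 if it names something else,
 * and -1 (with errno set by stat()) if the path cannot be examined.
 * stat() follows symbolic links, matching what a later open() would see.
 */
int path_is_fifo(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -1;

	return S_ISFIFO(st.st_mode) ? 1 : 0;
}
```

A caller could skip the open() and report an error when this returns 1. Note that a check followed by a separate open() leaves a small window in which the path can be replaced, so it is not a complete substitute for a non-blocking open.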

          People

            bogl Bob Glossman (Inactive)
            mhanafi Mahmoud Hanafi
            Votes: 0
            Watchers: 10
