
Lock revocation process fails consistently

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Critical
    • None
    • None
    • 3
    • 12530

    Description

      Some users have reported to us that the "rm" command is taking a long time. Some investigation revealed that at least the first "rm" in a directory takes just over 100 seconds, which of course sounds like OBD_TIMEOUT_DEFAULT.

      This isn't necessarily the simplest reproducer, but the following reproducer is completely consistent:

      1. set directory striping default count to 48
      2. touch a file on client A
      3. rm file on client B
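The three steps above can be sketched as concrete commands. This is only an illustration under stated assumptions: the directory path and file name are hypothetical, the report does not say which client sets the striping, and `lfs setstripe -c` is used here as the usual way to set a directory's default stripe count.

```python
import shlex


def reproducer_commands(directory, filename="testfile", stripe_count=48):
    """Return (where, argv) pairs for the three reproducer steps.

    `directory` and `filename` are hypothetical placeholders; the "where"
    label for step 1 is an assumption, since the report does not specify it.
    """
    path = f"{directory.rstrip('/')}/{filename}"
    return [
        # Step 1: set the directory's default stripe count.
        ("any client", ["lfs", "setstripe", "-c", str(stripe_count), directory]),
        # Step 2: create the file on client A.
        ("client A", ["touch", path]),
        # Step 3: remove the file from client B (the slow operation).
        ("client B", ["rm", path]),
    ]


if __name__ == "__main__":
    for where, argv in reproducer_commands("/mnt/lustre/testdir"):
        print(f"[{where}] {shlex.join(argv)}")
```

The script only prints the commands; each one still has to be run on the indicated client by hand or through whatever remote-execution tool the site uses.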

      The clients are running 2.4.0-19chaos; the servers are at 2.4.0-21chaos and use ZFS as the backend.

      I have some Lustre logs that I will share and discuss in additional posts to this ticket. Essentially, it looks like the server always times out on an AST to client A, which explains the 100 second delay. It is not yet clear to me why that happens, because client A appears to be completely responsive. My current suspicion is that the MDT is to blame.

      Attachments

        1. 172.16.66.4@tcp.log.bz2
          40 kB
        2. 172.16.66.5@tcp.log.bz2
          53 kB
        3. 172.20.20.201@o2ib500.log.bz2
          8.52 MB
        4. client_log_20140206.txt
          375 kB
        5. inflames.log
          2.40 MB

        Issue Links

          Activity

            [LU-4584] Lock revocation process fails consistently
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-5686 [ LU-5686 ]
            pjones Peter Jones made changes -
            Link Original: This issue is duplicated by HP-163 [ HP-163 ]
            pjones Peter Jones made changes -
            Link New: This issue is duplicated by HP-163 [ HP-163 ]
            adilger Andreas Dilger made changes -
            Resolution New: Cannot Reproduce [ 5 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]

            adilger Andreas Dilger added a comment - No updates in ticket since sites have upgraded to newer releases.
            pjones Peter Jones made changes -
            End date New: 02/Nov/15
            Start date New: 05/Feb/14
            pjones Peter Jones made changes -
            Link New: This issue is related to NTAP-82 [ NTAP-82 ]
            pjones Peter Jones made changes -
            Link Original: This issue is related to LDEV-38 [ LDEV-38 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to JFC-12 [ JFC-12 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to LDEV-38 [ LDEV-38 ]

            People

              bfaccini Bruno Faccini (Inactive)
              morrone Christopher Morrone (Inactive)
              Votes: 1
              Watchers: 29

              Dates

                Created:
                Updated:
                Resolved: