[LU-11857] repeated "could not delete orphan [0x200060151:0x38a8:0x0]: rc = -2" messages

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor

    Description

      After an upgrade from 2.9 to 2.12, the MDS syslog fills with repeated lfs-MDD0000 "could not delete orphan" error messages. On the active MDS the orph_lfs-MDD00 thread is pegged at 100% CPU.

      [Sat Jan 12 23:21:18 2019] LustreError: 14125:0:(mdd_orphans.c:327:mdd_orphan_destroy()) lfs-MDD0000: could not delete orphan [0x200060151:0x38a8:0x0]: rc = -2
      [Sat Jan 12 23:21:18 2019] LustreError: 14125:0:(mdd_orphans.c:327:mdd_orphan_destroy()) Skipped 8067628 previous similar messages
      [Sat Jan 12 23:31:18 2019] LustreError: 14125:0:(mdd_orphans.c:327:mdd_orphan_destroy()) lfs-MDD0000: could not delete orphan [0x200060151:0x38a8:0x0]: rc = -2
      [Sat Jan 12 23:31:18 2019] LustreError: 14125:0:(mdd_orphans.c:327:mdd_orphan_destroy()) Skipped 7958773 previous similar messages
      [Sat Jan 12 23:41:18 2019] LustreError: 14125:0:(mdd_orphans.c:327:mdd_orphan_destroy()) lfs-MDD0000: could not delete orphan [0x200060151:0x38a8:0x0]: rc = -2
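Note that rc = -2 is -ENOENT, i.e. the object being destroyed no longer exists. Because the kernel rate-limits this message and only prints a "Skipped N previous similar messages" summary, the true failure count has to be reconstructed from those summaries. A minimal sketch of that tally (the sample lines are taken from the log excerpt above; `count_failures` is an illustrative helper, not part of Lustre):

```python
import re

# Sample dmesg/syslog excerpt from this report.
LOG = """\
LustreError: 14125:0:(mdd_orphans.c:327:mdd_orphan_destroy()) lfs-MDD0000: could not delete orphan [0x200060151:0x38a8:0x0]: rc = -2
LustreError: 14125:0:(mdd_orphans.c:327:mdd_orphan_destroy()) Skipped 8067628 previous similar messages
LustreError: 14125:0:(mdd_orphans.c:327:mdd_orphan_destroy()) Skipped 7958773 previous similar messages
"""

def count_failures(log: str) -> int:
    # Each printed error line counts once; each "Skipped N" summary adds
    # the N messages the kernel suppressed since the last printed line.
    printed = len(re.findall(r"could not delete orphan", log))
    skipped = sum(int(n) for n in
                  re.findall(r"Skipped (\d+) previous similar messages", log))
    return printed + skipped

print(count_failures(LOG))  # 1 + 8067628 + 7958773 = 16026402
```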
      

          Activity

            pjones Peter Jones added a comment -

            No objections it seems

            pjones Peter Jones added a comment -

            So it seems this is believed to be a duplicate of the recently landed LU-11418 fix. Jeff, if this issue is not currently causing MSU any heartburn, and explicitly trying to prove or disprove the theory would be disruptive, would it be enough to close this ticket as a duplicate of LU-11418 and reopen it if it is seen on a release including the fix (2.13, or an upcoming 2.12.x maintenance release)?


            adilger Andreas Dilger added a comment -

            The patch has passed review and testing and is scheduled to land in 2.13 shortly. This should avoid the repeated attempts to destroy the same object.

            aeonjeffj Jeff Johnson (Inactive) added a comment -

            I can. This is a production LFS. Given that there is data in place, should I? Not arguing, just applying caution and respect for the end user's data.
            bzzz Alex Zhuravlev added a comment -

            aeonjeffj can you please try https://review.whamcloud.com/#/c/33661/ ?
            bzzz Alex Zhuravlev added a comment - edited

            I think https://review.whamcloud.com/#/c/33661/ should help

            aeonjeffj Jeff Johnson (Inactive) added a comment -

            After seven hours the orph_lfs-MDD00 thread is still pegged at 100%.

            Attaching orph_100pct_201901142137.txt.bz2

            aeonjeffj Jeff Johnson (Inactive) added a comment -

            In 13 minutes the mdd_orphans.c:327:mdd_orphan_destroy counter incremented from 1979094 to 8056841 (6077747 syslog message repeats in 13 minutes).
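For scale, the counter jump reported in that comment works out to roughly 7,800 failed destroy attempts per second (a quick back-of-the-envelope check using only the figures given above):

```python
# Counter values reported in the comment: mdd_orphan_destroy() messages
# went from 1,979,094 to 8,056,841 over a 13-minute window.
start, end = 1_979_094, 8_056_841
minutes = 13

repeats = end - start               # suppressed syslog repeats in the window
per_second = repeats / (minutes * 60)

print(repeats, round(per_second))   # 6077747 7792
```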
            aeonjeffj Jeff Johnson (Inactive) added a comment - edited

            Update: Performed a full shutdown of the file system and all server systems. Rebooted and performed an orderly start of the file system.

            The reported orph_lfs-MDD00 thread at 100% CPU and syslog entries of

            LustreError: 109872:0:(mdd_orphans.c:327:mdd_orphan_destroy()) ls15-MDD0000: could not delete orphan [0x200060151:0x38a8:0x0]

            continue to occur. After 15 minutes of wall time with the file system targets mounted, the errors persist.

            Let me know if you want me to upload another dump of the debug buffer.
            pjones Peter Jones added a comment -

            Alex

            Could you please investigate?

            Thanks

            Peter


            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: aeonjeffj Jeff Johnson (Inactive)
