Details

    • 3
    • 11583

    Description

      The OSS crashed with a protection fault at lqe64_hash_keycmp.

      See the attached files.
      It looks like we are crashing here:

      libcfs/libcfs/hash.c:

          static cfs_hlist_node_t *
          cfs_hash_bd_lookup_intent(cfs_hash_t *hs, cfs_hash_bd_t *bd,
          ...
                  cfs_hlist_for_each(ehnode, hhead) {    /* <<<<<<<< CRASH */
                          if (!cfs_hash_keycmp(hs, key, ehnode))
                                  continue;

      cfs_hlist_for_each is defined as hlist_for_each, but hlist_for_each doesn't seem to be defined anywhere.

      See the attached backtrace info.

      Attachments

        1. debug.tgz (344 kB)
        2. service188.nov13.2013.tgz (11 kB)
        3. syslog.tgz (54 kB)


          Activity

            [LU-4249] exception RIP: lqe64_hash_keycmp+12

            Do we need to undo #9833 (the debug patch) and apply #11019?
            Please advise.

            Yes, that debug patch is a little heavier than the former one (though it provides more detailed information); it wasn't meant to be carried in a production system all the time. Thanks.

            niu Niu Yawei (Inactive) added the comment above.
            pjones Peter Jones added a comment -

            Well, even without this happening, it would still be quite possible that the release you baseline on lags behind the tip of the maintenance branch. If a situation arises where multiple "floating" patches affect the same area, we can always assist in resolving the conflict. I would think that introducing patch dependencies should make this easier, though. It's probably simpler to talk about this on the next call than to go back and forth in a ticket.


            But wouldn't that leave your engineers who work on future patches with a different code base from ours, which has accumulated b2_4/b2_5 patches?

            jaylan Jay Lan (Inactive) added the comment above.
            pjones Peter Jones added a comment -

            I will leave it to Niu to comment definitively on whether the debug patch needs to be removed, but my understanding is that #11019 is intended to be the fix for b2_4. The fixes for maintenance branches will be landed when a release is scheduled for those branches.


            Do we need to undo #9833 (the debug patch) and apply #11019?
            Please advise.

            Also please land #11019 and #11020.

            jaylan Jay Lan (Inactive) added the comment above.
            pjones Peter Jones added a comment -

            Landed for 2.7

            niu Niu Yawei (Inactive) added a comment (edited):
            b2_4: http://review.whamcloud.com/11019
            b2_5: http://review.whamcloud.com/11020
            niu Niu Yawei (Inactive) added a comment - http://review.whamcloud.com/10988

            Hi Ihara, the ENOLCK error message problem is addressed in another ticket; please see LU-4920.

            This debug patch collects information on the lqe refcount when the system crashes with "exception RIP: ...". Please keep the debug patch applied until the crash is reproduced. Thanks.

            niu Niu Yawei (Inactive) added the comment above.

            Niu, we applied the patches, and the same error messages have been showing up since. Attached are the recent debug log and syslog messages from after the patches were applied.

            ihara Shuichi Ihara (Inactive) added the comment above.

            People

              niu Niu Yawei (Inactive)
              mhanafi Mahmoud Hanafi
              Votes: 0
              Watchers: 16
