[LU-5071] statahead.c:1704:do_statahead_enter()) ASSERTION( lli->u.d.d_sai == ((void *)0) ) failed:

Details

    • Bug
    • Resolution: Fixed
    • Major
    • None
    • Lustre 2.5.1
    • None
    • 2
    • 14002

    Description

      Hello,

      We are seeing the following error messages on Lustre 2.5.1 clients, and they make the system unresponsive. Multiple clients were affected by this issue.

      System Details: Lustre 2.5.1 / RHEL 6.5

      Here are the node names, timestamps, and the corresponding message for each:
      May 4 11:03:28 uc1n055 kernel: LustreError: 1979:0:(statahead.c:1704:do_statahead_enter()) ASSERTION( lli->u.d.d_sai == ((void *)0) ) failed:
      May 4 18:15:43 uc1n468 kernel: LustreError: 42888:0:(statahead.c:1704:do_statahead_enter()) ASSERTION( lli->u.d.d_sai == ((void *)0) ) failed:
      May 4 18:54:19 uc1n059 kernel: LustreError: 111650:0:(lovsub_lock.c:103:lovsub_lock_state()) ASSERTION( cl_lock_is_mutexed(slice->cls_lock) ) failed:
      May 9 09:21:08 uc1n129 kernel: LustreError: 93767:0:(statahead.c:1704:do_statahead_enter()) LBUG
      May 10 09:28:14 uc1n996 kernel: LustreError: 7387:0:(osc_lock.c:1224:osc_lock_wait()) LBUG
      May 15 07:50:57 uc1n198 kernel: LustreError: 25007:0:(statahead.c:1704:do_statahead_enter()) ASSERTION( lli->u.d.d_sai == ((void *)0) ) failed:
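      The six messages above come from only three distinct source locations, and the later comments treat each location as a separate problem. Below is a minimal C sketch that groups console lines by their file.c:line location. It assumes the "LustreError: <pid>:0:(<file>:<line>:<function>())" layout seen in these messages. lbug_site(), tally_sites() and struct site_count are hypothetical helpers for illustration, not part of Lustre.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SITES 16
#define SITE_LEN 64

/* One distinct "file.c:line" location and how many log lines hit it. */
struct site_count {
    char site[SITE_LEN];
    int count;
};

/* Copy the "file.c:line" part of a LustreError console line into out.
 * Assumes the layout seen in this ticket:
 *   ... LustreError: <pid>:0:(<file>:<line>:<function>()) <message>
 * Returns 0 on success, -1 if the line does not match or out is too small. */
static int lbug_site(const char *line, char *out, size_t outlen)
{
    assert(line != NULL && out != NULL && outlen > 0);

    const char *tag = strstr(line, "LustreError:");
    if (tag == NULL)
        return -1;
    const char *start = strchr(tag, '(');      /* opens the location */
    if (start == NULL)
        return -1;
    start++;
    const char *c1 = strchr(start, ':');       /* ends the file name */
    if (c1 == NULL)
        return -1;
    const char *c2 = strchr(c1 + 1, ':');      /* ends the line number */
    if (c2 == NULL)
        return -1;

    size_t len = (size_t)(c2 - start);
    if (len >= outlen)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}

/* Count log lines per source location, in order of first appearance.
 * Lines that do not parse, or that would overflow the table, are skipped.
 * Returns the number of distinct locations written to out. */
static size_t tally_sites(const char *const *lines, size_t nlines,
                          struct site_count *out, size_t maxout)
{
    size_t nsites = 0;

    for (size_t i = 0; i < nlines; i++) {
        char site[SITE_LEN];
        if (lbug_site(lines[i], site, sizeof(site)) != 0)
            continue;

        size_t j = 0;
        while (j < nsites && strcmp(out[j].site, site) != 0)
            j++;
        if (j == nsites) {                     /* first time we see it */
            if (nsites == maxout)
                continue;
            strcpy(out[j].site, site);         /* fits: len < SITE_LEN */
            out[j].count = 0;
            nsites++;
        }
        out[j].count++;
    }
    return nsites;
}
```

      Fed the six lines above, it reports statahead.c:1704 four times (the three ASSERTION lines plus the LBUG on uc1n129), and lovsub_lock.c:103 and osc_lock.c:1224 once each.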

      Attachments

        1. ddn_lustre_showall-uc1n996_2014-05-15_192030.tar.bz2 (206 kB)
        2. messages_uc1n055 (22 kB)
        3. messages_uc1n059 (39 kB)
        4. messages_uc1n129 (11 kB)
        5. messages_uc1n198 (174 kB)
        6. messages_uc1n468 (31 kB)


          Activity

            haasken Ryan Haasken added a comment -

            Not a lot of information here to go on. The assertion which was triggered looks like the same one as in LU-1356, but that bug was fixed way back in 2.3.0 and 2.1.4.


            rganesan@ddn.com Rajeshwaran Ganesan added a comment -

            Regarding the above comment: is this issue fixed in 2.5.2, or is it still an LBUG? Our customer saw the message once in the log.

            May 10 09:28:14 uc1n996 kernel: LustreError:7387:0:(osc_lock.c:1224:osc_lock_wait()) LBUG
            May 10 09:28:14 uc1n996 kernel: Pid: 7387, comm: less

            haasken Ryan Haasken added a comment -

            Zhenyu has identified two of the LBUGs as LU-3498 and LU-4558, and both of those bugs are fixed in b2_5 and master. Since the LBUG which is in the summary of this ticket has been fixed, should this bug be resolved?

            I suppose there is still this LBUG:

            May 10 09:28:14 uc1n996 kernel: LustreError: 7387:0:(osc_lock.c:1224:osc_lock_wait()) LBUG 
            

            But without any information other than the location of the LBUG, I think this bug isn't helpful. There is no information about that LBUG in any of the attachments either, as far as I can tell. If the bug will be kept open for the osc_lock_wait() LBUG, would it be possible to update the summary and description so that it doesn't look like LU-3498?

            pjones Peter Jones added a comment -

            Rajesh

            These are included by default. For example, http://review.whamcloud.com/#/c/10363/ has a link to the build on the Jenkins server http://build.whamcloud.com/job/lustre-reviews/23961/ Selecting the desired distro version allows you to drill into specific build artifacts - http://build.whamcloud.com/job/lustre-reviews/23961/arch=i686,build_type=server,distro=el6,ib_stack=inkernel/artifact/artifacts/ , say.

            Peter
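            Peter's example URL suggests that each Jenkins matrix build exposes its artifacts under a path built from the build number and the arch, build_type, distro and ib_stack axes. Below is a small C sketch that assembles such a URL. It assumes the pattern of this single example holds for other builds and axis values; artifact_url() is a hypothetical helper, and the build server may no longer be reachable.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the artifact directory URL for one axis combination of a
 * lustre-reviews Jenkins matrix build.  The layout is copied from the
 * single example URL in the comment; other builds are ASSUMED to match.
 * Returns 0 on success, -1 if buf is too small (output is then truncated). */
static int artifact_url(char *buf, size_t len, int build,
                        const char *arch, const char *build_type,
                        const char *distro, const char *ib_stack)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len,
                     "http://build.whamcloud.com/job/lustre-reviews/%d/"
                     "arch=%s,build_type=%s,distro=%s,ib_stack=%s"
                     "/artifact/artifacts/",
                     build, arch, build_type, distro, ib_stack);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

            Called with build 23961 and the axis values i686, server, el6 and inkernel, it reproduces the artifact link quoted above.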


            rganesan@ddn.com Rajeshwaran Ganesan added a comment -

            Could you please provide a source RPM with the patches?

            bobijam Zhenyu Xu added a comment -

            The lovsub_lock_state() LBUG was fixed in the b2_5 branch; the patch is at http://review.whamcloud.com/9881

            bobijam Zhenyu Xu added a comment -

            The do_statahead_enter() LBUG can be fixed by this backport patch: http://review.whamcloud.com/10363


            rganesan@ddn.com Rajeshwaran Ganesan added a comment -

            Servers are on 2.4.3.
            Clients are on 2.5.1.

            pjones Peter Jones added a comment -

            Rajesh

            Could you please confirm that it is vanilla 2.5.1 on both servers and clients for this cluster? Are any other Lustre versions or patches involved?

            Bobijam

            Does this seem related to the existing tickets LU-4797, LU-4693, or LU-4558?

            Thanks

            Peter


            People

              bobijam Zhenyu Xu
              rganesan@ddn.com Rajeshwaran Ganesan
              Votes: 0
              Watchers: 6
