LU-4555

Patched (LU-2779) 2.4.1 Lustre Clients still crashing with LBUG

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • None
    • Affects Version/s: Lustre 2.4.1
    • None
    • 2
    • 12443

    Description

      Customer has applied patch from LU-2779 to 2.4.1 clients.

      We've verified the patch has been applied:

      diff -ru /usr/src/lustre-2.4.1/lustre/osc/osc_cache.c ./lustre/osc/osc_cache.c
      --- /usr/src/lustre-2.4.1/lustre/osc/osc_cache.c 2013-09-19 11:06:59.000000000 -0700
      +++ ./lustre/osc/osc_cache.c 2013-12-18 06:52:09.000000000 -0800
      @@ -896,7 +896,7 @@
       "%s: wait ext to %d timedout, recovery in progress?\n",
       osc_export(obj)->exp_obd->obd_name, state);

      -lwi = LWI_INTR(LWI_ON_SIGNAL_NOOP, NULL);
      +lwi = LWI_INTR(NULL, NULL);
       rc = l_wait_event(ext->oe_waitq, extent_wait_cb(ext, state),
       &lwi);
       }
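The check above can also be scripted so it is repeatable on every build host. The following is only a minimal sketch: the function name `patch_state`, the anchor string, and the 10-line window are illustrative assumptions, and it relies on exact substring matches taken from the diff in this ticket. It inspects source text only, so it says nothing about which module is actually loaded on a client.

```python
# Strings taken from the diff in this ticket; exact-match only (assumption:
# the source uses the same spacing as shown in the diff).
ORIGINAL_CALL = "LWI_INTR(LWI_ON_SIGNAL_NOOP, NULL)"
PATCHED_CALL = "LWI_INTR(NULL, NULL)"
ANCHOR = "wait ext to %d timedout"


def patch_state(source_text: str, window: int = 10) -> str:
    """Return 'patched', 'unpatched', or 'unknown' for osc_cache.c text.

    Looks for the timeout message from the diff hunk, then checks which
    LWI_INTR call appears within the next `window` lines.
    """
    lines = source_text.splitlines()
    for index, line in enumerate(lines):
        if ANCHOR in line:
            nearby = " ".join(lines[index:index + window])
            if PATCHED_CALL in nearby:
                return "patched"
            if ORIGINAL_CALL in nearby:
                return "unpatched"
    return "unknown"


if __name__ == "__main__":
    # Path copied from the diff header above; adjust for the local tree.
    with open("./lustre/osc/osc_cache.c") as handle:
        print(patch_state(handle.read()))
```

Running it against both the pristine tree and the rebuilt tree should report `unpatched` and `patched` respectively if the change landed as intended.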

      The client RPMs have also been rebuilt and installed. However, the Lustre clients are still failing with the following error:

      LustreError: 17450:0:(cl_lock.c:1964:discard_cb()) ASSERTION( (!(page->cp_type == CPT_CACHEABLE) || (!PageWriteback(cl_page_vmpage(env, page)))) ) failed:
      LustreError: 17450:0:(cl_lock.c:1964:discard_cb()) LBUG Kernel panic - not syncing: LBUG
      Pid: 17450, comm: tar Tainted: GF
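For anyone triaging several client logs, the message above can be picked apart mechanically. The sketch below is an illustration only: the names `LustreErrorLine`, `parse_lustre_error`, and `discard_invariant_holds` are hypothetical, and the regular expression is fitted to the single log line quoted in this ticket rather than to any documented log format. The second function restates the failed assertion as plain boolean logic: a cacheable page must not be under writeback when `discard_cb()` sees it.

```python
import re
from typing import NamedTuple, Optional


class LustreErrorLine(NamedTuple):
    pid: int
    source_file: str
    line: int
    function: str
    message: str


# Fitted to the line quoted in this ticket (an assumption, not a spec):
# LustreError: <pid>:<n>:(<file>:<line>:<func>()) <message>
_PATTERN = re.compile(
    r"LustreError:\s+(?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s*"
    r"(?P<msg>.*)"
)


def parse_lustre_error(text: str) -> Optional[LustreErrorLine]:
    """Extract pid, source location, and message from a LustreError line."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    return LustreErrorLine(
        pid=int(match.group("pid")),
        source_file=match.group("file"),
        line=int(match.group("line")),
        function=match.group("func"),
        message=match.group("msg").strip(),
    )


def discard_invariant_holds(cacheable: bool, under_writeback: bool) -> bool:
    """Boolean form of the assertion: !(cacheable) || !(under_writeback)."""
    return (not cacheable) or (not under_writeback)
```

The only combination that violates the invariant is a cacheable page that is still under writeback, which is the state the crashing clients hit.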

      The patched RPM is attached. Please advise.

          Activity

            jfc John Fuchs-Chesney (Inactive) added a comment - edited

            We believe that the 2.4.3 patch has fixed this problem.
            ~ jfc.

            bfaccini Bruno Faccini (Inactive) added a comment - edited

            Here are some updates for this ticket following the conference call:
            _ It has been agreed that further updates on this problem at AWE will go in this ticket and no longer in LU-4581, which is for LLNL.
            _ There are no known recent occurrences of this problem at AWE.
            _ As already requested in LU-4581, the customer is running with D_CACHE traces enabled on their clients.
            _ The debug buffer size has also been increased (exact value to be provided).
            _ A crash dump will be taken and made available for debugging upon the next occurrence.
            _ It would also be of interest to have the exact Lustre version being run on the clients/servers, along with the list/details of any additional patches applied.

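For reference, the debug settings mentioned in the update above can be captured as a small helper so the same commands are applied on every client. This is a sketch under assumptions: it assumes the D_CACHE flag corresponds to the `cache` name in the `debug` mask, the helper names are hypothetical, and the buffer size is a caller-supplied placeholder because the exact value used at the site has not been provided. The functions only build argument lists; nothing is executed unless they are passed to `subprocess.run`.

```python
from typing import List


def debug_setup_commands(debug_mb: int) -> List[List[str]]:
    """Build lctl commands to add cache tracing and resize the debug buffer.

    debug_mb is a placeholder: the site's actual value is not in the ticket.
    """
    if debug_mb <= 0:
        raise ValueError("debug_mb must be a positive number of megabytes")
    return [
        # Assumption: D_CACHE is exposed as "cache" in the debug mask.
        ["lctl", "set_param", "debug=+cache"],
        ["lctl", "set_param", f"debug_mb={debug_mb}"],
    ]


def debug_dump_command(output_path: str) -> List[str]:
    """Build the lctl command that dumps the kernel debug buffer to a file."""
    return ["lctl", "dk", output_path]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in debug_setup_commands(debug_mb=512):
        print(" ".join(argv))
    print(" ".join(debug_dump_command("/tmp/lustre-debug.log")))
```

Keeping the commands as data makes it easy to review them before running anything as root on a production client.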
            pjones Peter Jones added a comment -

            This is still occurring at the DDN site, so it does not appear to be a duplicate of LU-4581. Can we please get a level set on where we are with this ticket? Oz, how frequently does this issue occur at the customer site? Are there any logs/stacks/crash dumps associated with the crashes?

            jay Jinshan Xiong (Inactive) added a comment -

            Duplicate of LU-4581. I closed this one because LU-4581 has a stack trace.

            bfaccini Bruno Faccini (Inactive) added a comment - edited

            Hello Oz,
            Is there a crash dump available for this issue and, if so, can you provide it along with the necessary vmlinux and Lustre modules?
            BTW, LU-4581 also seems to report the same issue, though we still need to clarify whether it occurs when running with patch/change #5419 from LU-2779.

            orentas Oz Rentas (Inactive) added a comment -

            Any updates on this one? Please advise.
            Thanks

            pjones Peter Jones added a comment -

            Bobijam

            Could you please assist with this ticket?

            Thanks

            Peter


            People

              Assignee: bobijam Zhenyu Xu
              Reporter: orentas Oz Rentas (Inactive)