Lustre / LU-12322

negative grant and tgt_grant.c:561:tgt_grant_incoming() LBUG

Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • None
    • Lustre 2.10.7
    • None
    • CentOS 7.6
    • 3
    • 9223372036854775807

    Description

       

      New LBUG tonight with 2.10.7, likely a duplicate of LU-12120.

      [219417.266382] LustreError: 281433:0:(tgt_grant.c:559:tgt_grant_incoming()) oak-OST0053: cli 0eb5afeb-9924-327f-d61d-428dac6cb441/ffff883c7ef04800 dirty 0 pend 0 grant -29360128
      [219417.283969] LustreError: 281433:0:(tgt_grant.c:561:tgt_grant_incoming()) LBUG
      [219417.292035] Pid: 281433, comm: ll_ost00_045 3.10.0-693.2.2.el7_lustre.pl3.x86_64 #1 SMP Thu Mar 15 13:06:45 PDT 2018
      [219417.303881] Call Trace:
      [219417.306716] [<ffffffff8103a212>] save_stack_trace_tsk+0x22/0x40
      [219417.313560] [<ffffffffc08087cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [219417.320982] [<ffffffffc080887c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [219417.328013] [<ffffffffc0bf81b0>] tgt_grant_prepare_read+0x0/0x3b0 [ptlrpc]
      [219417.335979] [<ffffffffc0bf82bb>] tgt_grant_prepare_read+0x10b/0x3b0 [ptlrpc]
      [219417.344121] [<ffffffffc119df6d>] ofd_set_info_hdl+0x23d/0x4a0 [ofd]
      [219417.351343] [<ffffffffc0bda115>] tgt_request_handle+0x925/0x1370 [ptlrpc]
      [219417.359202] [<ffffffffc0b82dd6>] ptlrpc_server_handle_request+0x236/0xa90 [ptlrpc]
      [219417.367930] [<ffffffffc0b86512>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [219417.375101] [<ffffffff810b098f>] kthread+0xcf/0xe0
      [219417.380671] [<ffffffff816b4f58>] ret_from_fork+0x58/0x90
      [219417.386823] [<ffffffffffffffff>] 0xffffffffffffffff
      [219417.392529] Kernel panic - not syncing: LBUG
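
      For anyone checking other OSS logs for the same symptom, here is a minimal sketch that scans dmesg-style text for tgt_grant_incoming() messages reporting a negative grant. The regular expression is modeled only on the single message quoted above, so the field layout is an assumption; the helper name find_negative_grants is hypothetical and not part of Lustre.

```python
import re

# Pattern modeled on the one tgt_grant_incoming() message quoted in this
# ticket; other Lustre versions may format the line differently (assumption).
GRANT_RE = re.compile(
    r"\((?P<src>tgt_grant\.c:\d+):tgt_grant_incoming\(\)\) "
    r"(?P<target>\S+): cli (?P<client>[0-9a-f-]+)/(?P<export>[0-9a-f]+) "
    r"dirty (?P<dirty>-?\d+) pend (?P<pend>-?\d+) grant (?P<grant>-?\d+)"
)


def find_negative_grants(lines):
    """Return one dict per log line whose reported grant is below zero."""
    hits = []
    for line in lines:
        m = GRANT_RE.search(line)
        if m is None:
            continue
        grant = int(m.group("grant"))
        if grant < 0:
            hits.append({
                "target": m.group("target"),
                "client": m.group("client"),
                "grant_bytes": grant,
                # Convenience conversion of the byte count to MiB.
                "grant_mib": grant / (1024 * 1024),
            })
    return hits


# The message from this ticket, used as sample input.
sample = (
    "[219417.266382] LustreError: 281433:0:"
    "(tgt_grant.c:559:tgt_grant_incoming()) oak-OST0053: cli "
    "0eb5afeb-9924-327f-d61d-428dac6cb441/ffff883c7ef04800 "
    "dirty 0 pend 0 grant -29360128"
)

for hit in find_negative_grants([sample]):
    print(hit)
```

      In practice the input would be the lines of a saved dmesg or vmcore-dmesg file rather than a single string.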
      

       Prior to this, there were many evictions and network errors:

      [218920.410034] LustreError: 257942:0:(events.c:449:server_bulk_callback()) event type 5, status -125, desc ffff881d0b310a00
      [218923.029069] LustreError: 257941:0:(events.c:449:server_bulk_callback()) event type 5, status -125, desc ffff882566547000
      [218928.063611] LustreError: 257941:0:(events.c:449:server_bulk_callback()) event type 5, status -125, desc ffff880050289600
      [218957.709743] Lustre: oak-OST004d: haven't heard from client ed27e8aa-82e0-d7bd-37eb-95d83ba476b8 (at 10.8.7.32@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff883c83a64000, cur 1558412848 expire 1558412698 last 1558412621 
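
      The timestamps in the eviction message can be cross-checked with a few lines of code. The sketch below parses the message and confirms that cur minus last equals the reported silent interval. The pattern is derived only from the one message quoted above, and the function name parse_eviction is hypothetical.

```python
import re

# Pattern modeled on the single eviction message quoted in this ticket;
# treat the field layout as an assumption for other Lustre versions.
EVICT_RE = re.compile(
    r"(?P<target>\S+): haven't heard from client (?P<client>[0-9a-f-]+) "
    r"\(at (?P<nid>\S+)\) in (?P<silent>\d+) seconds\..*?"
    r"cur (?P<cur>\d+) expire (?P<expire>\d+) last (?P<last>\d+)"
)


def parse_eviction(line):
    """Extract target, client, NID and timing fields, or return None."""
    m = EVICT_RE.search(line)
    if m is None:
        return None
    cur, expire, last = (int(m.group(k)) for k in ("cur", "expire", "last"))
    return {
        "target": m.group("target"),
        "client": m.group("client"),
        "nid": m.group("nid"),
        "reported_silent_s": int(m.group("silent")),
        # Seconds between the last contact and the time of the message.
        "computed_silent_s": cur - last,
        # Seconds between the last contact and the expiry deadline.
        "expire_after_last_s": expire - last,
    }


# The message from this ticket, used as sample input.
sample = (
    "[218957.709743] Lustre: oak-OST004d: haven't heard from client "
    "ed27e8aa-82e0-d7bd-37eb-95d83ba476b8 (at 10.8.7.32@o2ib6) in 227 "
    "seconds. I think it's dead, and I am evicting it. exp "
    "ffff883c83a64000, cur 1558412848 expire 1558412698 last 1558412621"
)

print(parse_eviction(sample))
```

      For the message above, the computed interval (1558412848 - 1558412621) is 227 seconds, matching the value printed in the log.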

       

      Attaching vmcore-dmesg, output of basic crash commands and foreach bt.

      Please note, just in case, that this OSS did not have the patch for LU-12018; it was running stock 2.10.7. The OSS has since been updated.

      Attachments

        1. oak-io2-s2-crash-cmd-20190520.txt
          8 kB
          Stephane Thiell
        2. oak-io2-s2-foreach-bt-20190520.txt
          1.58 MB
          Stephane Thiell
        3. oak-io2-s2-vmcore-dmesg-20190520.txt
          855 kB
          Stephane Thiell

            People

              wc-triage WC Triage
              sthiell Stephane Thiell