Lustre / LU-12531

LustreError: 3438:0:(pers.c:49:ptlrpc_fill_bulk_md()) ASSERTION( mdidx < desc->bd_md_max_brw ) failed

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.12.5
    • None
    • 3

    Description

      I was wondering if you would be interested in the following crashes we see:

      2019-06-11T14:12:48+08:00 nanny1351 kernel: Lustre: pn-OST0012-osc-ffff881f9cca3000: Connection restored to 172.16.0.16@tcp (at 172.16.0.16@tcp)
      2019-06-11T14:12:50+08:00 nanny1351 kernel: LustreError: 3438:0:(pers.c:49:ptlrpc_fill_bulk_md()) ASSERTION( mdidx < desc->bd_md_max_brw ) failed:
      2019-06-11T14:12:50+08:00 nanny1351 kernel: LustreError: 3438:0:(pers.c:49:ptlrpc_fill_bulk_md()) LBUG
      2019-06-11T14:12:50+08:00 nanny1351 kernel: Pid: 3438, comm: ptlrpcd_00_55
      2019-06-11T14:12:50+08:00 nanny1351 kernel: #012Call Trace:
      2019-06-11T14:12:50+08:00 nanny1351 kernel: [<ffffffffc03d67ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
      2019-06-11T14:12:50+08:00 nanny1351 kernel: [<ffffffffc03d683c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      2019-06-11T14:12:50+08:00 nanny1351 kernel: [<ffffffffc0ac6cee>] ptlrpc_fill_bulk_md+0xde/0x150 [ptlrpc]
      2019-06-11T14:12:50+08:00 nanny1351 kernel: [<ffffffffc0a9d35f>] ptlrpc_register_bulk+0x2ff/0x9d0 [ptlrpc]
      2019-06-11T14:12:50+08:00 nanny1351 kernel: [<ffffffffc0a9e446>] ptl_send_rpc+0x256/0xe50 [ptlrpc]
      2019-06-11T14:12:50+08:00 nanny1351 kernel: [<ffffffffc0ad2923>] ? sptlrpc_req_refresh_ctx+0x153/0x910 [ptlrpc]
      2019-06-11T14:12:50+08:00 nanny1351 kernel: [<ffffffff810ca2ae>] ? account_entity_dequeue+0xae/0xd0
      2019-06-11T14:12:50+08:00 nanny1351 kernel: [<ffffffffc0610529>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
      2019-06-11T14:12:50+08:00 nanny1351 kernel: [<ffffffffc0a93508>] ptlrpc_send_new_req+0x468/0xa60 [ptlrpc]
      2019-06-11T14:12:50+08:00 nanny1351 kernel: [<ffffffff810c818e>] ? vtime_account_idle+0xe/0x50
      2019-06-11T14:12:50+08:00 nanny1351 kernel: [<ffffffffc0a96738>] ptlrpc_check_set.part.23+0x878/0x1d90 [ptlrpc]
      2019-06-11T14:12:50+08:00 nanny1351 kernel: [<ffffffffc0a97cab>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
      2019-06-11T14:12:50+08:00 nanny1351 kernel: [<ffffffffc0ac4a4b>] ptlrpcd_check+0x4db/0x5c0 [ptlrpc]
      2019-06-11T14:12:50+08:00 nanny1351 kernel: [<ffffffffc0ac4deb>] ptlrpcd+0x2bb/0x560 [ptlrpc]
      2019-06-11T14:12:50+08:00 nanny1351 kernel: [<ffffffff810c4820>] ? default_wake_function+0x0/0x20
      2019-06-11T14:12:50+08:00 nanny1351 kernel: [<ffffffffc0ac4b30>] ? ptlrpcd+0x0/0x560 [ptlrpc]
      2019-06-11T14:12:50+08:00 nanny1351 kernel: [<ffffffff810b099f>] kthread+0xcf/0xe0
      2019-06-11T14:12:50+08:00 nanny1351 kernel: [<ffffffff810c818e>] ? vtime_account_idle+0xe/0x50
      2019-06-11T14:12:50+08:00 nanny1351 kernel: [<ffffffff810b08d0>] ? kthread+0x0/0xe0
      2019-06-11T14:12:50+08:00 nanny1351 kernel: [<ffffffff816b4fd8>] ret_from_fork+0x58/0x90
      2019-06-11T14:12:50+08:00 nanny1351 kernel: [<ffffffff810b08d0>] ? kthread+0x0/0xe0
      2019-06-11T14:12:50+08:00 nanny1351 kernel:
      2019-06-11T14:12:50+08:00 nanny1351 kernel: Kernel panic - not syncing: LBUG
      

      Interestingly, many nodes suffered this crash within about 15 minutes of each other (or perhaps these were two separate events?):

      2019-06-11T14:03:38+08:00 nanny1343 kernel: LustreError: 3386:0:(pers.c:49:ptlrpc_fill_bulk_md()) LBUG
      2019-06-11T14:03:38+08:00 nanny1349 kernel: LustreError: 3708:0:(pers.c:49:ptlrpc_fill_bulk_md()) LBUG
      2019-06-11T14:03:38+08:00 nanny1347 kernel: LustreError: 3591:0:(pers.c:49:ptlrpc_fill_bulk_md()) LBUG
      2019-06-11T14:12:50+08:00 nanny1351 kernel: LustreError: 3438:0:(pers.c:49:ptlrpc_fill_bulk_md()) LBUG
      2019-06-11T14:15:30+08:00 nanny1331 kernel: LustreError: 3566:0:(pers.c:49:ptlrpc_fill_bulk_md()) LBUG
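
To help judge whether these LBUGs were one event or several, the log lines above can be grouped by time. The following is only a rough sketch: the helper names `parse_lbug` and `cluster_events` are hypothetical, and the 60-second gap used to separate bursts is an arbitrary assumption, not something derived from Lustre behaviour.

```python
import re
from datetime import datetime

# LBUG lines copied verbatim from the report above.
LOG_LINES = [
    "2019-06-11T14:03:38+08:00 nanny1343 kernel: LustreError: 3386:0:(pers.c:49:ptlrpc_fill_bulk_md()) LBUG",
    "2019-06-11T14:03:38+08:00 nanny1349 kernel: LustreError: 3708:0:(pers.c:49:ptlrpc_fill_bulk_md()) LBUG",
    "2019-06-11T14:03:38+08:00 nanny1347 kernel: LustreError: 3591:0:(pers.c:49:ptlrpc_fill_bulk_md()) LBUG",
    "2019-06-11T14:12:50+08:00 nanny1351 kernel: LustreError: 3438:0:(pers.c:49:ptlrpc_fill_bulk_md()) LBUG",
    "2019-06-11T14:15:30+08:00 nanny1331 kernel: LustreError: 3566:0:(pers.c:49:ptlrpc_fill_bulk_md()) LBUG",
]

# Matches "<timestamp> <host> kernel: LustreError: <pid>:<n>:(<file>:<line>:<func>()) LBUG".
LBUG_RE = re.compile(
    r"^(?P<ts>\S+) (?P<host>\S+) kernel: LustreError: "
    r"(?P<pid>\d+):\d+:\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\) LBUG$"
)


def parse_lbug(line):
    """Return (timestamp, host, pid) for an LBUG line, or None if it does not match."""
    m = LBUG_RE.match(line.strip())
    if m is None:
        return None
    return datetime.fromisoformat(m["ts"]), m["host"], int(m["pid"])


def cluster_events(events, gap_seconds=60):
    """Group events into bursts; a new burst starts when the gap exceeds gap_seconds."""
    clusters = []
    for ev in sorted(events):
        if clusters and (ev[0] - clusters[-1][-1][0]).total_seconds() <= gap_seconds:
            clusters[-1].append(ev)
        else:
            clusters.append([ev])
    return clusters


if __name__ == "__main__":
    events = [ev for ev in map(parse_lbug, LOG_LINES) if ev is not None]
    for burst in cluster_events(events):
        hosts = ", ".join(host for _, host, _ in burst)
        print(f"{burst[0][0].isoformat()}  {len(burst)} node(s): {hosts}")
    span = max(e[0] for e in events) - min(e[0] for e in events)
    print(f"total span: {int(span.total_seconds())} s")
```

With the assumed 60-second gap, the five lines fall into three bursts (three nodes at 14:03:38, then one node each at 14:12:50 and 14:15:30), spanning 712 seconds in total. A larger gap would merge them, so the threshold should be chosen to suit the site.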
      

      This is Lustre 2.10.1 running on Linux nanny347 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux.
      The CPU is an Intel(R) Xeon Phi(TM) CPU 7250 @ 1.40GHz (272 hardware threads: 68 cores x 4 threads per core).

          People

            wc-triage WC Triage
            Tomaka Jacek Tomaka (Inactive)