Lustre / LU-12004

Crash in do_csum


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Affects Version/s: Lustre 2.13.0
    • Severity: 3

    Description

      I still see this semi-frequently on master even with LU-11697 in place, so the cause must be something else.

      It typically shows up only in racer. The full crash looks like this:

      [ 8628.366285] Lustre: DEBUG MARKER: == racer test 1: racer on clients: centos-70.localnet DURATION=2700 ================================== 05:27:21 (1549708041)
      [ 8629.054425] Lustre: lfs: using old ioctl(LL_IOC_LOV_GETSTRIPE) on [0x200000402:0x4:0x0], use llapi_layout_get_by_path()
      [ 8630.549219] Lustre: DEBUG MARKER: racer test_1: @@@@@@ FAIL: generate lss conf (mds1)
      [ 8634.303466] LustreError: 14083:0:(mdt_lvb.c:430:mdt_lvbo_fill()) lustre-MDT0000: small buffer size 472 for EA 496 (max_mdsize 496): rc = -34
      [ 8779.449264] BUG: unable to handle kernel paging request at ffff8800aa2dc000
      [ 8779.449670] IP: [<ffffffff813ee500>] do_csum+0x70/0x180
      [ 8779.449670] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [ 8779.449670] CPU: 9 PID: 15375 Comm: ll_ost_io04_000  3.10.0-7.6-debug #1
      [ 8779.449670] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [ 8779.509742] Call Trace:
      [ 8779.509742]  [<ffffffff813ee61e>] ip_compute_csum+0xe/0x30
      [ 8779.509742]  [<ffffffffa035e62e>] obd_dif_ip_fn+0xe/0x10 [obdclass]
      [ 8779.523520]  [<ffffffffa035e6f9>] obd_page_dif_generate_buffer+0xc9/0x190 [obdclass]
      [ 8779.523520]  [<ffffffffa05e18db>] tgt_checksum_niobuf_rw+0x28b/0xea0 [ptlrpc]
      [ 8779.541604]  [<ffffffffa05e7e8d>] tgt_brw_read+0xc2d/0x1e60 [ptlrpc]
      [ 8779.541604]  [<ffffffffa05e62a5>] tgt_request_handle+0x915/0x1610 [ptlrpc]
      [ 8779.541604]  [<ffffffffa058b3d9>] ptlrpc_server_handle_request+0x259/0xad0 [ptlrpc]
      [ 8779.541604]  [<ffffffffa058f3bc>] ptlrpc_main+0xb7c/0x22c0 [ptlrpc]
      [ 8779.541604]  [<ffffffff810b4ed4>] kthread+0xe4/0xf0
      [ 8779.541604]  [<ffffffff817c4c77>] ret_from_fork_nospec_begin+0x21/0x21
      

      Note that I saw this even before t10dif landed, just with a slightly different trace.

      In all cases, only tgt_brw_read seems to hit this.
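      For context on the checksum path in the trace: obd_dif_ip_fn() calls ip_compute_csum(), which computes the RFC 1071 Internet checksum using do_csum() for the inner sum. Below is a minimal user-space C sketch, not Lustre code. inet_csum(), gen_guards(), PAGE_SZ and the 512-byte sector size are hypothetical names and values chosen for illustration, and the per-sector loop is only an assumption about what obd_page_dif_generate_buffer() does. The sketch shows that the checksum reads every byte of [offset, offset + length). If that range ever extended past the end of the mapped page, the read would land on the next page. The faulting address ffff8800aa2dc000 is 4 KiB-aligned, and DEBUG_PAGEALLOC unmaps freed pages, so an overrun like that, or a page freed while still being read, would be consistent with this oops. This is a hypothesis, not a diagnosis.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u   /* assumption: 4 KiB pages, as on x86_64 */

/* RFC 1071 Internet checksum over big-endian 16-bit words: one's-complement
 * sum, carries folded back in, result complemented. This is the algorithm
 * behind ip_compute_csum(); the kernel works on native-endian words, so its
 * raw value may be byte-swapped relative to this one. */
static uint16_t inet_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
	if (len & 1)                     /* odd trailing byte, zero-padded */
		sum += (uint32_t)buf[len - 1] << 8;
	while (sum >> 16)                /* fold carries back into 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Hypothetical helper: one 16-bit guard per sector of [offset, offset + length)
 * within a single page. Returns the number of guards written, or -1 if the
 * range would run past the end of the page or more than max_guards guards
 * would be needed. The explicit bounds check is the point: without it, the
 * loop would read whatever follows the page. */
static int gen_guards(const uint8_t *page, size_t offset, size_t length,
		      size_t sector, uint16_t *guards, int max_guards)
{
	size_t i = offset, end = offset + length;
	int used = 0;

	assert(sector > 0);              /* sector size must be non-zero */
	if (offset > PAGE_SZ || length > PAGE_SZ - offset)
		return -1;
	while (i < end) {
		/* stop at the next sector boundary or at the end of the range */
		size_t next = (i / sector + 1) * sector;
		size_t chunk = (next < end ? next : end) - i;

		if (used >= max_guards)
			return -1;
		guards[used++] = inet_csum(page + i, chunk);
		i += chunk;
	}
	return used;
}
```

      For example, with 512-byte sectors, gen_guards(page, 100, 1000, 512, guards, 8) produces three guards covering 412, 512 and 76 bytes. gen_guards(page, 4000, 200, 512, guards, 8) is rejected because 4000 + 200 exceeds the 4096-byte page.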


          People

            WC Triage (wc-triage)
            Oleg Drokin (green)
            Votes: 0
            Watchers: 4
