Lustre / LU-15487

crash after mdd_dir_page_build() error

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.12.7
    • None
    • 3
    • 9223372036854775807

    Description

      Seeing crashes in random locations, with apparent memory corruption, shortly after mdd_dir_page_build() reports an error:
      https://testing.whamcloud.com/test_sets/293f9d80-1e10-4042-86b5-7816504cc1ae
      https://testing.whamcloud.com/test_sets/a6c5e9e1-dbdd-418e-8e51-f417bdee3be7

       LNetError: 11244:0:(o2iblnd_cb.c:3371:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
       LNetError: 11244:0:(o2iblnd_cb.c:3446:kiblnd_check_conns()) Timed out RDMA with 172.168.202.16@o2ib (105): c: 7, oc: 0, rc: 8
       Lustre: 11433:0:(mdd_object.c:3460:mdd_dir_page_build()) build page failed: -22!
       LustreError: 11251:0:(events.c:496:ptlrpc_master_callback()) ASSERTION( callback == request_out_callback || callback == reply_in_callback || callback == client_bulk_callback || callback == request_in_callback || callback == reply_out_callback || callback == server_bulk_callback ) failed: 
       LustreError: 11251:0:(events.c:496:ptlrpc_master_callback()) LBUG
       Pid: 11251, comm: kiblnd_sd_01_02 3.10.0-1062.1.1.el7_lustre.x86_64 #1 SMP Wed Mar 25 16:04:09 PDT 2020
       Call Trace:
       libcfs_call_trace+0x8c/0xc0 [libcfs]
       lbug_with_loc+0x4c/0xa0 [libcfs]
       ptlrpc_master_callback+0xbd/0xc0 [ptlrpc]
       lnet_eq_enqueue_event+0x2e/0x140 [lnet]
       lnet_finalize+0x24c/0xd40 [lnet]
       kiblnd_recv+0x1cd/0x7c0 [ko2iblnd]
       lnet_ni_recv+0xc8/0x330 [lnet]
       lnet_recv_put+0x85/0xb0 [lnet]
       lnet_parse_local+0x5ae/0xd40 [lnet]
       lnet_parse+0x99a/0x11e0 [lnet]
       kiblnd_handle_rx+0x213/0x6b0 [ko2iblnd]
       kiblnd_scheduler+0xf42/0x1190 [ko2iblnd]
       kthread+0xd1/0xe0
      

      Not yet sure of cause and effect here, but filing this ticket to track the issue and to submit a debug patch.

       LDISKFS-fs warning (device dm-18): ldiskfs_dx_add_entry:2629: Large directory feature is not enabled on this filesystem
       Lustre: 52007:0:(mdd_object.c:3460:mdd_dir_page_build()) build page failed: -22!
       WARNING: CPU: 80 PID: 74750 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
       list_del corruption. prev->next should be ffffa1d80d4a7000, but was           (null)
       LustreError: 56942:0:(events.c:496:ptlrpc_master_callback()) ASSERTION( callback == request_out_callback || callback == reply_in_callback || callback == client_bulk_callback || callback == request_in_callback || callback == reply_out_callback || callback == server_bulk_callback ) failed: 
      CPU: 80 PID: 74750 Comm: mdt_rdpg01_086 Kdump: loaded 3.10.0-1062.1.1.el7_lustre.x86_64 #1
       __list_del_entry+0xa1/0xd0
       list_del+0xd/0x30
       ptlrpc_server_drop_request+0xe5/0x6d0 [ptlrpc]
       ptlrpc_server_finish_active_request+0x92/0x140 [ptlrpc]
       ptlrpc_server_handle_request+0x401/0xab0 [ptlrpc]
       ptlrpc_main+0xb34/0x1470 [ptlrpc]
       kthread+0xd1/0xe0
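
      Since cause and effect are still unclear, it may help to check mechanically how often the "build page failed" message is followed by a corruption symptom in a console log. The following is a minimal triage sketch, not part of any Lustre tooling: the function name scan_console, the look-ahead window of 20 lines, and the list of symptom strings are assumptions made for illustration, taken only from the messages quoted in this ticket. It also decodes the error code (-22 corresponds to EINVAL).

```python
import errno
import re

# Matches console lines such as:
#   Lustre: 11433:0:(mdd_object.c:3460:mdd_dir_page_build()) build page failed: -22!
BUILD_FAIL = re.compile(r"mdd_dir_page_build\(\)\) build page failed: (-?\d+)")

# Symptom strings quoted in this ticket (an assumed, non-exhaustive list).
SYMPTOMS = ("ASSERTION(", "LBUG", "list_del corruption")


def scan_console(lines, window=20):
    """Return (line_no, errno_name, symptom_lines) per build-page failure.

    `window` is a hypothetical look-ahead, counted in log lines, within
    which a symptom is treated as "shortly after" the failure.
    """
    lines = list(lines)
    hits = []
    for i, line in enumerate(lines):
        m = BUILD_FAIL.search(line)
        if not m:
            continue
        # Kernel return codes are negative errno values, e.g. -22 -> EINVAL.
        code = abs(int(m.group(1)))
        name = errno.errorcode.get(code, "unknown")
        following = [
            later.strip()
            for later in lines[i + 1:i + 1 + window]
            if any(s in later for s in SYMPTOMS)
        ]
        hits.append((i + 1, name, following))
    return hits
```

      Calling scan_console(open(path)) on a saved console log (path being whatever file the log was captured to) returns one tuple per failure. A failure with an empty symptom list would suggest the error does not always precede a crash, though proximity in the log alone does not establish causation.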
      

          People

            Assignee: WC Triage (wc-triage)
            Reporter: Andreas Dilger (adilger)