Lustre / LU-15681

crash in lnet_process_id_hash()


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.15.0
    • Labels: None
    • Environment: CentOS Linux release 8.5.2111
      Kernel: 4.18.0-348.7.1.el8_5.x86_64
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      Dear Devs,

      Under heavy workload we are experiencing a kernel crash caused by a page fault in lnet_process_id_hash():

      <pre>

      [  520.767199] BUG: unable to handle kernel paging request at 00000000deadbf1f
      [  520.775831] PGD 0 P4D 0 
      [  520.779875] Oops: 0000 [#1] SMP NOPTI
      [  520.785037] CPU: 10 PID: 492691 Comm: ll_ost00_016 Kdump: loaded Tainted: P           OE    --------- -  - 4.18.0-348.7.1.el8_5.x86_64 #1
      [  520.800422] Hardware name: HPE ProLiant DL325 Gen10 Plus/ProLiant DL325 Gen10 Plus, BIOS A43 12/03/2021
      [  520.812168] RIP: 0010:lnet_process_id_hash+0x5/0x50 [ptlrpc]
      [  520.820123] Code: 7e 28 39 7a 0c 75 d4 8b 7e 2c 39 7a 10 75 cc 8b 46 30 39 42 14 0f 94 c0 0f b6 c0 8d 44 40 fd c3 0f 1f 44 00 00 0f 1f 44 00 00 <33> 57 14 be ff ff ff ff 69 ca 47 86 c8 61 48 85 ff 74 18 0f b6 47
      [  520.842104] RSP: 0018:ffffaa79b40c3be0 EFLAGS: 00010202
      [  520.848959] RAX: ffffffffc1a36690 RBX: 5a5a5a5a5a5a5a5a RCX: 00000000deadbeef
      [  520.857883] RDX: 000000000cdd1d51 RSI: 0000000000000001 RDI: 00000000deadbf0b
      [  520.866635] RBP: ffffaa79b40c3c70 R08: ffff8ac3fecaabf8 R09: 00000000000003e8
      [  520.875303] R10: 0000000000000000 R11: ffff8ac3feca8ec4 R12: ffff8a4e59d5c000
      [  520.884015] R13: ffffffffc1baa580 R14: fffffffffffffff0 R15: 00000000deadbeef
      [  520.893240] FS:  0000000000000000(0000) GS:ffff8ac3fec80000(0000) knlGS:0000000000000000
      [  520.903294] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  520.911071] CR2: 00000000deadbf1f CR3: 0000000c407c6000 CR4: 0000000000350ee0
      [  520.920020] Call Trace:
      [  520.924182]  ptlrpc_connection_get+0x27f/0x920 [ptlrpc]
      [  520.931034]  target_handle_connect+0x6de/0x29d0 [ptlrpc]
      [  520.937816]  ? internal_add_timer+0x42/0x60
      [  520.943593]  tgt_request_handle+0x565/0x1a40 [ptlrpc]
      [  520.950382]  ? ptlrpc_nrs_req_get_nolock0+0xfb/0x1f0 [ptlrpc]
      [  520.957780]  ptlrpc_server_handle_request+0x323/0xbd0 [ptlrpc]
      [  520.965373]  ptlrpc_main+0xc06/0x1560 [ptlrpc]
      [  520.971430]  ? __schedule+0x2c5/0x760
      [  520.976758]  ? ptlrpc_wait_event+0x590/0x590 [ptlrpc]
      [  520.983264]  kthread+0x116/0x130
      [  520.987811]  ? kthread_flush_work_fn+0x10/0x10
      [  520.993636]  ret_from_fork+0x22/0x40

      </pre>
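
The register dump above can be cross-checked with a little arithmetic. The sketch below is a reading aid, not a diagnosis: it decodes the faulting instruction bytes (`<33> 57 14` in the Code: line) and confirms that the faulting address in CR2 equals RDI plus the instruction's displacement. The variable names are illustrative. Treating 0xdeadbeef (seen in RCX and R15) as a placeholder or poison pattern is an assumption; the report itself does not say where that value comes from.

```python
# Values copied from the Oops above.
CR2 = 0x00000000DEADBF1F  # faulting address
RDI = 0x00000000DEADBF0B  # first integer argument register (x86-64 System V ABI)
RCX = 0x00000000DEADBEEF  # same value also appears in R15

# Faulting bytes "<33> 57 14": opcode 0x33 is XOR r32, r/m32,
# followed by a ModRM byte (0x57) and an 8-bit displacement (0x14).
modrm, disp8 = 0x57, 0x14
mod = modrm >> 6         # 1 -> memory operand with 8-bit displacement
reg = (modrm >> 3) & 7   # 2 -> edx
rm = modrm & 7           # 7 -> rdi, so the instruction is: xor edx, [rdi + 0x14]

fault_addr = RDI + disp8           # address the instruction tried to read
offset_from_pattern = RDI - RCX    # distance of RDI from the 0xdeadbeef pattern

print(hex(fault_addr), hex(offset_from_pattern))  # 0xdeadbf1f 0x1c
```

The computed address matches CR2, so the fault is the first memory read through the pointer passed in RDI, and that pointer sits 0x1c bytes past 0xdeadbeef. One possible reading, under the assumption stated above, is that the caller derived the argument from an uninitialized or already-released structure.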

       

          People

            Assignee: WC Triage (wc-triage)
            Reporter: Lukasz Flis (lflis)