LU-6799: getxattr failed: -2 triggers a kernel BUG on the MDS

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.5.3
    • Environment:
      MDS/OSS:
       - RHEL 6.6 w/ Bull kernel 2.6.32-504.16.2.el6.Bull.74.x86_64
       - Lustre 2.5.3.90
       - OFED 3.12
      Routers/Clients:
       - RHEL 7.1
       - Lustre 2.7.0
       - MLNX_OFED 2.3
    • Severity: 3

    Description

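      The message "LustreError: (mdt_xattr.c:131:mdt_getxattr_one()) getxattr failed: -2" (-2 is -ENOENT) triggers the following kernel BUG on the MDS of one of our filesystems. Both messages come from the same MDT service thread (pid 20168, mdt03_049).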

      2015-06-30 13:40:01
      2015-06-30 13:45:01 Lustre: DEBUG MARKER: Tue Jun 30 13:45:01 2015
      2015-06-30 13:45:01
      2015-06-30 13:50:01 Lustre: DEBUG MARKER: Tue Jun 30 13:50:01 2015
      2015-06-30 13:50:01
      2015-06-30 13:50:32 LustreError: 20168:0:(mdt_xattr.c:131:mdt_getxattr_one()) getxattr failed: -2
      2015-06-30 13:50:32 BUG: unable to handle kernel NULL pointer dereference at (null)
      2015-06-30 13:50:32 IP: [<ffffffff8129c452>] sg_next+0x2/0x30
      2015-06-30 13:50:32 PGD 0
      2015-06-30 13:50:32 Oops: 0000 [#1] SMP
      2015-06-30 13:50:32 last sysfs file: /sys/devices/pci0000:80/0000:80:05.0/0000:85:00.0/host8/rport-8:0-0/target8:0:0/8:0:0:0/state
      2015-06-30 13:50:32 CPU 23
      2015-06-30 13:50:32 Modules linked in: osp(U) mdd(U) lfsck(U) lod(U) mdt(U) mgs(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) ldiskfs(U) lustre(U) lov(U) osc(U) mdc(U) lquota(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic crc32c_intel libcfs(U) nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf bonding 8021q garp stp llc rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ipv6 ib_uverbs(U) ib_umad(U) mlx4_ib(U) ib_sa(U) ib_mad(U) ib_core(U) mlx4_core(U) dm_round_robin scsi_dh_rdac dm_multipath mic(U) uinput ipmi_devintf ipmi_si ipmi_msghandler sg lpc_ich mfd_core ioatdma lpfc scsi_transport_fc scsi_tgt igb dca i2c_algo_bit i2c_core ptp pps_core compat(U) ext4 jbd2 mbcache sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod megaraid_sas [last unloaded: scsi_wait_scan]
      2015-06-30 13:50:32
      2015-06-30 13:50:32 Pid: 20168, comm: mdt03_049 Not tainted 2.6.32-504.16.2.el6.Bull.74.x86_64 #1 BULL bullx super-node
      2015-06-30 13:50:32 RIP: 0010:[<ffffffff8129c452>]  [<ffffffff8129c452>] sg_next+0x2/0x30
      2015-06-30 13:50:32 RSP: 0018:ffff880847f518c8  EFLAGS: 00010246
      2015-06-30 13:50:32 RAX: 0000000000000000 RBX: ffff88046e524000 RCX: 0000000000000000
      2015-06-30 13:50:32 RDX: 0000000000000101 RSI: ffffc900175fb240 RDI: 0000000000000000
      2015-06-30 13:50:32 RBP: ffff880847f51940 R08: ffffea003958cb98 R09: 0000000000000301
      2015-06-30 13:50:32 R10: 0000000000001000 R11: 0000000000000000 R12: ffff880c78a28940
      2015-06-30 13:50:32 R13: ffff88046e536000 R14: ffffc900175fb240 R15: ffff880c7980c090
      2015-06-30 13:50:32 FS:  0000000000000000(0000) GS:ffff880c8e540000(0000) knlGS:0000000000000000
      2015-06-30 13:50:32 CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      2015-06-30 13:50:32 CR2: 0000000000000000 CR3: 0000000001a85000 CR4: 00000000000007e0
      2015-06-30 13:50:32 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      2015-06-30 13:50:32 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      2015-06-30 13:50:32 Process mdt03_049 (pid: 20168, threadinfo ffff880847f50000, task ffff88084585aab0)
      2015-06-30 13:50:32 Stack:
      2015-06-30 13:50:32  ffffffffa09cb46e ffff880847f51fd8 ffff88084585aab0 ffffffff00000101
      2015-06-30 13:50:32 <d> ffffffff81a9ac80 ffff880c78a289c0 0000030100000001 ffff880847f51920
      2015-06-30 13:50:32 <d> ffffffff8105e100 ffff880876a8d558 ffff880c796ffb80 ffffc900175fb240
      2015-06-30 13:50:32 Call Trace:
      2015-06-30 13:50:32  [<ffffffffa09cb46e>] ? kiblnd_map_tx+0x19e/0x540 [ko2iblnd]
      2015-06-30 13:50:32  [<ffffffff8105e100>] ? __dequeue_entity+0x30/0x50
      2015-06-30 13:50:32  [<ffffffffa09cbe0a>] kiblnd_setup_rd_iov+0x13a/0x2b0 [ko2iblnd]
      2015-06-30 13:50:32  [<ffffffffa09d151a>] kiblnd_send+0x5da/0x9b0 [ko2iblnd]
      2015-06-30 13:50:32  [<ffffffffa05e6d6b>] lnet_ni_send+0x4b/0xf0 [lnet]
      2015-06-30 13:50:32  [<ffffffffa05eafa5>] lnet_send+0x655/0xb80 [lnet]
      2015-06-30 13:50:32  [<ffffffffa05ec00a>] LNetPut+0x31a/0x860 [lnet]
      2015-06-30 13:50:32  [<ffffffffa07fbc40>] ptl_send_buf+0x1e0/0x550 [ptlrpc]
      2015-06-30 13:50:32  [<ffffffffa081b8b8>] ? at_measured+0x108/0x380 [ptlrpc]
      2015-06-30 13:50:32  [<ffffffffa083c445>] ? null_authorize+0x75/0x100 [ptlrpc]
      2015-06-30 13:50:32  [<ffffffffa07fc22b>] ptlrpc_send_reply+0x27b/0x7f0 [ptlrpc]
      2015-06-30 13:50:32  [<ffffffffa07c7054>] target_send_reply_msg+0x54/0x190 [ptlrpc]
      2015-06-30 13:50:32  [<ffffffffa07c7576>] target_send_reply+0x3e6/0x720 [ptlrpc]
      2015-06-30 13:50:32  [<ffffffffa0ea2df9>] mdt_handle_common+0x5d9/0x1470 [mdt]
      2015-06-30 13:50:32  [<ffffffffa0edf645>] mds_regular_handle+0x15/0x20 [mdt]
      2015-06-30 13:50:32  [<ffffffffa0811ee5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
      2015-06-30 13:50:32  [<ffffffffa053e4ce>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      2015-06-30 13:50:32  [<ffffffffa054f7d5>] ? lc_watchdog_touch+0x65/0x170 [libcfs]
      2015-06-30 13:50:32  [<ffffffffa080a919>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
      2015-06-30 13:50:32  [<ffffffff81057819>] ? __wake_up_common+0x59/0x90
      2015-06-30 13:50:32  [<ffffffffa081466d>] ptlrpc_main+0xaed/0x1770 [ptlrpc]
      2015-06-30 13:50:32  [<ffffffffa0813b80>] ? ptlrpc_main+0x0/0x1770 [ptlrpc]
      2015-06-30 13:50:32  [<ffffffff8109e71e>] kthread+0x9e/0xc0
      2015-06-30 13:50:32  [<ffffffff8100c20a>] child_rip+0xa/0x20
      2015-06-30 13:50:32  [<ffffffff8109e680>] ? kthread+0x0/0xc0
      2015-06-30 13:50:32  [<ffffffff8100c200>] ? child_rip+0x0/0x20
      2015-06-30 13:50:32 Code: 5c 41 5d 41 5e 41 5f c9 c3 55 48 c7 c2 30 cc 29 81 be 80 00 00 00 48 89 e5 e8 6b ff ff ff c9 c3 66 0f 1f 84 00 00 00 00 00 31 c0 <f6> 07 02 55 48 89 e5 75 0d 48 8b 57 20 48 8d 47 20 f6 c2 01 75
      2015-06-30 13:50:32 RIP  [<ffffffff8129c452>] sg_next+0x2/0x30
      2015-06-30 13:50:32  RSP <ffff880847f518c8>
      2015-06-30 13:50:32 CR2: 0000000000000000
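
      In the Oops above, RIP is sg_next+0x2, RDI (sg_next's only argument) is 0 and CR2 (the faulting address) is 0. The Code: bytes around the <f6> marker decode to xor %eax,%eax followed by testb $0x2,(%rdi), i.e. the end-of-list test on the entry passed in, so sg_next() was called with a NULL scatterlist pointer. The following user-space model sketches the scatterlist semantics involved. It is a simplified illustration based on the 2.6.32 kernel helpers, not kernel code; the model_* names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of the 2.6.32-era scatterlist helpers.
 * Bit 0x01 of page_link marks a chain entry, bit 0x02 marks the last entry. */
#define MODEL_SG_CHAIN 0x01UL
#define MODEL_SG_END   0x02UL

struct model_sg {
    unsigned long page_link;   /* page pointer | flag bits */
    unsigned int  offset;
    unsigned int  length;
    unsigned long dma_address;
    unsigned int  dma_length;  /* 32 bytes per entry on x86_64, the 0x20 stride
                                  visible in the sg_next Code: bytes */
};

/* Like sg_init_table(): clear nents entries (nents > 0) and flag the last
 * one as the end of the list. */
static void model_sg_init_table(struct model_sg *sgl, unsigned int nents)
{
    memset(sgl, 0, nents * sizeof(*sgl));
    sgl[nents - 1].page_link |= MODEL_SG_END;
}

/* Like sg_chain(): turn the last slot of prv into a link to sgl. */
static void model_sg_chain(struct model_sg *prv, unsigned int prv_nents,
                           struct model_sg *sgl)
{
    prv[prv_nents - 1].page_link =
        ((unsigned long)sgl | MODEL_SG_CHAIN) & ~MODEL_SG_END;
}

/* Like sg_next(). Its first memory access is the end-bit test on *sg,
 * the "testb $0x2,(%rdi)" at sg_next+0x2; with sg == NULL that read faults. */
static struct model_sg *model_sg_next(struct model_sg *sg)
{
    if (sg->page_link & MODEL_SG_END)      /* sg_is_last() */
        return NULL;
    sg++;
    if (sg->page_link & MODEL_SG_CHAIN)    /* sg_is_chain(): follow the link */
        sg = (struct model_sg *)(sg->page_link & ~(MODEL_SG_CHAIN | MODEL_SG_END));
    return sg;
}
```

      With a table set up by model_sg_init_table(), model_sg_next() on the last entry returns NULL. A caller that keeps walking passes that NULL straight back in, which in the kernel faults exactly where this Oops did.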
      

      We have hit this issue only on our new filesystem, which is dedicated to our Lustre 2.7 clients/routers. We have already had 4 occurrences since the beginning of this week.

      The same LustreError also appears on the console of the MDS of a second filesystem, but without any kernel BUG. The main difference is that the clients/routers of this second filesystem run the same software stack as the servers (RHEL 6.6 / Lustre 2.5.3.90 / OFED 3.12).

      Here are some traces from the crash dump (bt, dis -rl and bt -f):

            KERNEL: /usr/lib/debug/lib/modules/2.6.32-504.16.2.el6.Bull.74.x86_64/vmlinux
          DUMPFILE: vmcore  [PARTIAL DUMP]
              CPUS: 32
              DATE: Tue Jun 30 13:50:31 2015
            UPTIME: 19:57:11
      LOAD AVERAGE: 0.10, 0.06, 0.10
             TASKS: 1685
          NODENAME: mds2
           RELEASE: 2.6.32-504.16.2.el6.Bull.74.x86_64
           VERSION: #1 SMP Tue Apr 28 01:43:42 CEST 2015
           MACHINE: x86_64  (2266 Mhz)
            MEMORY: 64 GB
             PANIC: "Oops: 0000 [#1] SMP " (check log for details)
               PID: 20168
           COMMAND: "mdt03_049"
              TASK: ffff88084585aab0  [THREAD_INFO: ffff880847f50000]
               CPU: 23
             STATE: TASK_RUNNING (PANIC)
      
      crash> bt
      PID: 20168  TASK: ffff88084585aab0  CPU: 23  COMMAND: "mdt03_049"
       #0 [ffff880847f514b0] machine_kexec at ffffffff8103b71b
       #1 [ffff880847f51510] crash_kexec at ffffffff810c9942
       #2 [ffff880847f515e0] oops_end at ffffffff8152f070
       #3 [ffff880847f51610] no_context at ffffffff8104c80b
       #4 [ffff880847f51660] __bad_area_nosemaphore at ffffffff8104ca95
       #5 [ffff880847f516b0] bad_area_nosemaphore at ffffffff8104cb63
       #6 [ffff880847f516c0] __do_page_fault at ffffffff8104d25c
       #7 [ffff880847f517e0] do_page_fault at ffffffff81530fbe
       #8 [ffff880847f51810] page_fault at ffffffff8152e375
          [exception RIP: sg_next+2]
          RIP: ffffffff8129c452  RSP: ffff880847f518c8  RFLAGS: 00010246
          RAX: 0000000000000000  RBX: ffff88046e524000  RCX: 0000000000000000
          RDX: 0000000000000101  RSI: ffffc900175fb240  RDI: 0000000000000000
          RBP: ffff880847f51940   R8: ffffea003958cb98   R9: 0000000000000301
          R10: 0000000000001000  R11: 0000000000000000  R12: ffff880c78a28940
          R13: ffff88046e536000  R14: ffffc900175fb240  R15: ffff880c7980c090
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #9 [ffff880847f518c8] kiblnd_map_tx at ffffffffa09cb46e [ko2iblnd]
      #10 [ffff880847f51948] kiblnd_setup_rd_iov at ffffffffa09cbe0a [ko2iblnd]
      #11 [ffff880847f519a8] kiblnd_send at ffffffffa09d151a [ko2iblnd]
      #12 [ffff880847f51a48] lnet_ni_send at ffffffffa05e6d6b [lnet]
      #13 [ffff880847f51a68] lnet_send at ffffffffa05eafa5 [lnet]
      #14 [ffff880847f51ad8] LNetPut at ffffffffa05ec00a [lnet]
      #15 [ffff880847f51b38] ptl_send_buf at ffffffffa07fbc40 [ptlrpc]
      #16 [ffff880847f51be8] ptlrpc_send_reply at ffffffffa07fc22b [ptlrpc]
      #17 [ffff880847f51c68] target_send_reply_msg at ffffffffa07c7054 [ptlrpc]
      #18 [ffff880847f51c98] target_send_reply at ffffffffa07c7576 [ptlrpc]
      #19 [ffff880847f51d08] mdt_handle_common at ffffffffa0ea2df9 [mdt]
      #20 [ffff880847f51d58] mds_regular_handle at ffffffffa0edf645 [mdt]
      #21 [ffff880847f51d68] ptlrpc_server_handle_request at ffffffffa0811ee5 [ptlrpc]
      #22 [ffff880847f51e48] ptlrpc_main at ffffffffa081466d [ptlrpc]
      #23 [ffff880847f51ee8] kthread at ffffffff8109e71e
      #24 [ffff880847f51f48] kernel_thread at ffffffff8100c20a
      
      crash> dis -rl ffffffffa09cb46e
      0xffffffffa09cb2d0 <kiblnd_map_tx>:     push   %rbp
      0xffffffffa09cb2d1 <kiblnd_map_tx+1>:   mov    %rsp,%rbp
      0xffffffffa09cb2d4 <kiblnd_map_tx+4>:   push   %r15     
      0xffffffffa09cb2d6 <kiblnd_map_tx+6>:   push   %r14     
      0xffffffffa09cb2d8 <kiblnd_map_tx+8>:   push   %r13     
      0xffffffffa09cb2da <kiblnd_map_tx+10>:  push   %r12     
      0xffffffffa09cb2dc <kiblnd_map_tx+12>:  push   %rbx     
      0xffffffffa09cb2dd <kiblnd_map_tx+13>:  sub    $0x48,%rsp
      0xffffffffa09cb2e1 <kiblnd_map_tx+17>:  nopl   0x0(%rax,%rax,1)
      0xffffffffa09cb2e6 <kiblnd_map_tx+22>:  mov    %ecx,-0x44(%rbp)
      0xffffffffa09cb2e9 <kiblnd_map_tx+25>:  mov    0x10(%rsi),%rax 
      0xffffffffa09cb2ed <kiblnd_map_tx+29>:  mov    %rdx,%rbx       
      0xffffffffa09cb2f0 <kiblnd_map_tx+32>:  mov    0x50(%rdi),%rdi 
      0xffffffffa09cb2f4 <kiblnd_map_tx+36>:  mov    %rsi,%r14       
      0xffffffffa09cb2f7 <kiblnd_map_tx+39>:  mov    0x40(%rax),%r12 
      0xffffffffa09cb2fb <kiblnd_map_tx+43>:  mov    %rdi,-0x50(%rbp)
      0xffffffffa09cb2ff <kiblnd_map_tx+47>:  cmp    %rdx,0x80(%rsi) 
      0xffffffffa09cb306 <kiblnd_map_tx+54>:  setne  %al             
      0xffffffffa09cb309 <kiblnd_map_tx+57>:  movzbl %al,%edx        
      0xffffffffa09cb30c <kiblnd_map_tx+60>:  add    $0x1,%edx       
      0xffffffffa09cb30f <kiblnd_map_tx+63>:  mov    %edx,-0x48(%rbp)
      0xffffffffa09cb312 <kiblnd_map_tx+66>:  mov    %edx,0xb0(%r14) 
      0xffffffffa09cb319 <kiblnd_map_tx+73>:  mov    %ecx,0x88(%rsi) 
      0xffffffffa09cb31f <kiblnd_map_tx+79>:  mov    0x8(%r12),%rdi  
      0xffffffffa09cb324 <kiblnd_map_tx+84>:  mov    0x90(%rsi),%r13 
      0xffffffffa09cb32b <kiblnd_map_tx+91>:  mov    0x2b8(%rdi),%rax
      0xffffffffa09cb332 <kiblnd_map_tx+98>:  test   %rax,%rax       
      0xffffffffa09cb335 <kiblnd_map_tx+101>: je     0xffffffffa09cb430 <kiblnd_map_tx+352>
      0xffffffffa09cb33b <kiblnd_map_tx+107>: mov    %edx,%ecx
      0xffffffffa09cb33d <kiblnd_map_tx+109>: mov    %r13,%rsi
      0xffffffffa09cb340 <kiblnd_map_tx+112>: mov    -0x44(%rbp),%edx
      0xffffffffa09cb343 <kiblnd_map_tx+115>: callq  *0x28(%rax)
      0xffffffffa09cb346 <kiblnd_map_tx+118>: xor    %edx,%edx
      0xffffffffa09cb348 <kiblnd_map_tx+120>: xor    %r15d,%r15d
      0xffffffffa09cb34b <kiblnd_map_tx+123>: test   %eax,%eax
      0xffffffffa09cb34d <kiblnd_map_tx+125>: mov    %eax,0x4(%rbx)
      0xffffffffa09cb350 <kiblnd_map_tx+128>: jne    0xffffffffa09cb3b9 <kiblnd_map_tx+233>
      0xffffffffa09cb352 <kiblnd_map_tx+130>: jmpq   0xffffffffa09cb3f0 <kiblnd_map_tx+288>
      0xffffffffa09cb357 <kiblnd_map_tx+135>: nopw   0x0(%rax,%rax,1)
      0xffffffffa09cb360 <kiblnd_map_tx+144>: mov    %edx,-0x60(%rbp)
      0xffffffffa09cb363 <kiblnd_map_tx+147>: mov    %rcx,-0x68(%rbp)
      0xffffffffa09cb367 <kiblnd_map_tx+151>: callq  *0x40(%rax)
      0xffffffffa09cb36a <kiblnd_map_tx+154>: mov    -0x60(%rbp),%edx
      0xffffffffa09cb36d <kiblnd_map_tx+157>: mov    -0x68(%rbp),%rcx
      0xffffffffa09cb371 <kiblnd_map_tx+161>: lea    0x0(%r13,%r13,2),%rsi
      0xffffffffa09cb376 <kiblnd_map_tx+166>: mov    %eax,0x8(%rbx,%rsi,4)
      0xffffffffa09cb37a <kiblnd_map_tx+170>: mov    0x8(%r12),%rdi
      0xffffffffa09cb37f <kiblnd_map_tx+175>: mov    %rcx,%rsi
      0xffffffffa09cb382 <kiblnd_map_tx+178>: add    0x90(%r14),%rsi
      0xffffffffa09cb389 <kiblnd_map_tx+185>: mov    0x2b8(%rdi),%rax
      0xffffffffa09cb390 <kiblnd_map_tx+192>: test   %rax,%rax
      0xffffffffa09cb393 <kiblnd_map_tx+195>: je     0xffffffffa09cb3e8 <kiblnd_map_tx+280>
      0xffffffffa09cb395 <kiblnd_map_tx+197>: mov    %edx,-0x60(%rbp)
      0xffffffffa09cb398 <kiblnd_map_tx+200>: callq  *0x38(%rax)
      0xffffffffa09cb39b <kiblnd_map_tx+203>: mov    -0x60(%rbp),%edx
      0xffffffffa09cb39e <kiblnd_map_tx+206>: lea    0x0(%r13,%r13,2),%rcx
      0xffffffffa09cb3a3 <kiblnd_map_tx+211>: add    $0x1,%edx
      0xffffffffa09cb3a6 <kiblnd_map_tx+214>: shl    $0x2,%rcx
      0xffffffffa09cb3aa <kiblnd_map_tx+218>: mov    %rax,0xc(%rcx,%rbx,1)
      0xffffffffa09cb3af <kiblnd_map_tx+223>: add    0x8(%rbx,%rcx,1),%r15d
      0xffffffffa09cb3b4 <kiblnd_map_tx+228>: cmp    %edx,0x4(%rbx)
      0xffffffffa09cb3b7 <kiblnd_map_tx+231>: jbe    0xffffffffa09cb3f0 <kiblnd_map_tx+288>
      0xffffffffa09cb3b9 <kiblnd_map_tx+233>: mov    0x8(%r12),%rdi
      0xffffffffa09cb3be <kiblnd_map_tx+238>: movslq %edx,%r13
      0xffffffffa09cb3c1 <kiblnd_map_tx+241>: mov    %r13,%rcx
      0xffffffffa09cb3c4 <kiblnd_map_tx+244>: shl    $0x5,%rcx
      0xffffffffa09cb3c8 <kiblnd_map_tx+248>: mov    0x2b8(%rdi),%rax
      0xffffffffa09cb3cf <kiblnd_map_tx+255>: mov    %rcx,%rsi
      0xffffffffa09cb3d2 <kiblnd_map_tx+258>: add    0x90(%r14),%rsi
      0xffffffffa09cb3d9 <kiblnd_map_tx+265>: test   %rax,%rax
      0xffffffffa09cb3dc <kiblnd_map_tx+268>: jne    0xffffffffa09cb360 <kiblnd_map_tx+144>
      0xffffffffa09cb3de <kiblnd_map_tx+270>: mov    0x18(%rsi),%eax
      0xffffffffa09cb3e1 <kiblnd_map_tx+273>: jmp    0xffffffffa09cb371 <kiblnd_map_tx+161>
      0xffffffffa09cb3e3 <kiblnd_map_tx+275>: nopl   0x0(%rax,%rax,1)
      0xffffffffa09cb3e8 <kiblnd_map_tx+280>: mov    0x10(%rsi),%rax
      0xffffffffa09cb3ec <kiblnd_map_tx+284>: jmp    0xffffffffa09cb39e <kiblnd_map_tx+206>
      0xffffffffa09cb3ee <kiblnd_map_tx+286>: xchg   %ax,%ax
      0xffffffffa09cb3f0 <kiblnd_map_tx+288>: mov    %rbx,%rsi
      0xffffffffa09cb3f3 <kiblnd_map_tx+291>: mov    %r12,%rdi
      0xffffffffa09cb3f6 <kiblnd_map_tx+294>: callq  0xffffffffa09be470 <kiblnd_find_rd_dma_mr>
      0xffffffffa09cb3fb <kiblnd_map_tx+299>: test   %rax,%rax
      0xffffffffa09cb3fe <kiblnd_map_tx+302>: je     0xffffffffa09cb492 <kiblnd_map_tx+450>
      0xffffffffa09cb404 <kiblnd_map_tx+308>: cmp    %rbx,0x80(%r14)
      0xffffffffa09cb40b <kiblnd_map_tx+315>: je     0xffffffffa09cb58e <kiblnd_map_tx+702>
      0xffffffffa09cb411 <kiblnd_map_tx+321>: mov    0x1c(%rax),%eax
      0xffffffffa09cb414 <kiblnd_map_tx+324>: mov    %eax,(%rbx)
      0xffffffffa09cb416 <kiblnd_map_tx+326>: xor    %r8d,%r8d
      0xffffffffa09cb419 <kiblnd_map_tx+329>: add    $0x48,%rsp
      0xffffffffa09cb41d <kiblnd_map_tx+333>: mov    %r8d,%eax
      0xffffffffa09cb420 <kiblnd_map_tx+336>: pop    %rbx
      0xffffffffa09cb421 <kiblnd_map_tx+337>: pop    %r12
      0xffffffffa09cb423 <kiblnd_map_tx+339>: pop    %r13
      0xffffffffa09cb425 <kiblnd_map_tx+341>: pop    %r14
      0xffffffffa09cb427 <kiblnd_map_tx+343>: pop    %r15
      0xffffffffa09cb429 <kiblnd_map_tx+345>: leaveq
      0xffffffffa09cb42a <kiblnd_map_tx+346>: retq
      0xffffffffa09cb42b <kiblnd_map_tx+347>: nopl   0x0(%rax,%rax,1)
      0xffffffffa09cb430 <kiblnd_map_tx+352>: mov    (%rdi),%r15
      0xffffffffa09cb433 <kiblnd_map_tx+355>: test   %r15,%r15
      0xffffffffa09cb436 <kiblnd_map_tx+358>: je     0xffffffffa09cb596 <kiblnd_map_tx+710>
      0xffffffffa09cb43c <kiblnd_map_tx+364>: mov    0x1c0(%r15),%rcx
      0xffffffffa09cb443 <kiblnd_map_tx+371>: test   %rcx,%rcx
      0xffffffffa09cb446 <kiblnd_map_tx+374>: mov    %rcx,-0x58(%rbp)
      0xffffffffa09cb44a <kiblnd_map_tx+378>: je     0xffffffffa09cb596 <kiblnd_map_tx+710>
      0xffffffffa09cb450 <kiblnd_map_tx+384>: mov    -0x44(%rbp),%r9d
      0xffffffffa09cb454 <kiblnd_map_tx+388>: test   %r9d,%r9d
      0xffffffffa09cb457 <kiblnd_map_tx+391>: jle    0xffffffffa09cb476 <kiblnd_map_tx+422>
      0xffffffffa09cb459 <kiblnd_map_tx+393>: mov    %r13,%rax
      0xffffffffa09cb45c <kiblnd_map_tx+396>: xor    %edx,%edx
      0xffffffffa09cb45e <kiblnd_map_tx+398>: xchg   %ax,%ax
      0xffffffffa09cb460 <kiblnd_map_tx+400>: add    $0x1,%edx
      0xffffffffa09cb463 <kiblnd_map_tx+403>: mov    %rax,%rdi
      0xffffffffa09cb466 <kiblnd_map_tx+406>: mov    %edx,-0x60(%rbp)
      0xffffffffa09cb469 <kiblnd_map_tx+409>: callq  0xffffffff8129c450 <sg_next>
      0xffffffffa09cb46e <kiblnd_map_tx+414>: mov    -0x60(%rbp),%edx
      
      crash> bt -f                                                                             
      PID: 20168  TASK: ffff88084585aab0  CPU: 23  COMMAND: "mdt03_049"                        
       #0 [ffff880847f514b0] machine_kexec at ffffffff8103b71b
          ffff880847f514b8: 00000000030a1000 ffff8800030a1000
          ffff880847f514c8: 00000000030a0000 ffff880847f51818
          ffff880847f514d8: 8800000000000000 000000000000ffff
          ffff880847f514e8: ffff880847f51818 ffff880847f51518
          ffff880847f514f8: 0000000000000009 ffff88084585aab0
          ffff880847f51508: ffff880847f515d8 ffffffff810c9942                                  
       #1 [ffff880847f51510] crash_kexec at ffffffff810c9942
          ffff880847f51518: ffff880c7980c090 ffffc900175fb240
          ffff880847f51528: ffff88046e536000 ffff880c78a28940
          ffff880847f51538: ffff880847f51940 ffff88046e524000
          ffff880847f51548: 0000000000000000 0000000000001000
          ffff880847f51558: 0000000000000301 ffffea003958cb98
          ffff880847f51568: 0000000000000000 0000000000000000
          ffff880847f51578: 0000000000000101 ffffc900175fb240
          ffff880847f51588: 0000000000000000 ffffffffffffffff                                  
          ffff880847f51598: ffffffff8129c452 0000000000000010
          ffff880847f515a8: 0000000000010246 ffff880847f518c8
          ffff880847f515b8: 0000000000000018 ffff880847f51618
          ffff880847f515c8: 0000000000000246 ffff880847f51818
          ffff880847f515d8: ffff880847f51608 ffffffff8152f070                                  
       #2 [ffff880847f515e0] oops_end at ffffffff8152f070                                      
          ffff880847f515e8: 0000000000000000 ffff880847f51818
          ffff880847f515f8: 0000000000000000 0000000000000009
          ffff880847f51608: ffff880847f51658 ffffffff8104c80b                                  
       #3 [ffff880847f51610] no_context at ffffffff8104c80b                                    
          ffff880847f51618: ffffffff81531036 000000000000000a
          ffff880847f51628: ffff88047fe9be00 0000000000000000
          ffff880847f51638: 0000000000000000 ffff880847f51818
          ffff880847f51648: ffff88084585aab0 0000000000030001
          ffff880847f51658: ffff880847f516a8 ffffffff8104ca95                                  
       #4 [ffff880847f51660] __bad_area_nosemaphore at ffffffff8104ca95
          ffff880847f51668: ffffffff8134382e ffff88047fe9be00
          ffff880847f51678: 0000000000000000 0000000000000028
          ffff880847f51688: 0000000000000000 0000000000000000
          ffff880847f51698: ffffc900175fb240 ffff88084585aab0
          ffff880847f516a8: ffff880847f516b8 ffffffff8104cb63                                  
       #5 [ffff880847f516b0] bad_area_nosemaphore at ffffffff8104cb63
          ffff880847f516b8: ffff880847f517d8 ffffffff8104d25c                                  
       #6 [ffff880847f516c0] __do_page_fault at ffffffff8104d25c
          ffff880847f516c8: 000000000001da70 000000000000004e
          ffff880847f516d8: ffff880847f51818 0000000000000000
          ffff880847f516e8: 0000000000000068 0000000000000000
          ffff880847f516f8: ffffffff8100bb8e ffff880847f51810
          ffff880847f51708: 0000000000000000 0000000000000001
          ffff880847f51718: ffffffff816490a0 0000000000000000
          ffff880847f51728: 0000000000010500 0000000000001ba9
          ffff880847f51738: ffff880c8e540000 0000000000000046
          ffff880847f51748: 0000000000000246 ffffffffffffff10                                  
          ffff880847f51758: ffffffff81075d81 0000000000000010
          ffff880847f51768: 0000000000000246 ffff880847f51780
          ffff880847f51778: 0000000000000018 ffffc90017a2d032
          ffff880847f51788: 0000000000000246 ffffffff00000017
          ffff880847f51798: ffffffff8115c6dd 0000000000000010
          ffff880847f517a8: 0000000000000286 ffff880847f51818
          ffff880847f517b8: 0000000000000000 0000000000000000
          ffff880847f517c8: ffffc900175fb240 ffff880c7980c090
          ffff880847f517d8: ffff880847f51808 ffffffff81530fbe
       #7 [ffff880847f517e0] do_page_fault at ffffffff81530fbe
          ffff880847f517e8: 0000000000000001 ffff880c78a28940
          ffff880847f517f8: ffff88046e536000 ffffc900175fb240
          ffff880847f51808: ffff880847f51940 ffffffff8152e375
       #8 [ffff880847f51810] page_fault at ffffffff8152e375
          [exception RIP: sg_next+2]
          RIP: ffffffff8129c452  RSP: ffff880847f518c8  RFLAGS: 00010246
          RAX: 0000000000000000  RBX: ffff88046e524000  RCX: 0000000000000000
          RDX: 0000000000000101  RSI: ffffc900175fb240  RDI: 0000000000000000
          RBP: ffff880847f51940   R8: ffffea003958cb98   R9: 0000000000000301
          R10: 0000000000001000  R11: 0000000000000000  R12: ffff880c78a28940
          R13: ffff88046e536000  R14: ffffc900175fb240  R15: ffff880c7980c090
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018                                       
          ffff880847f51818: ffff880c7980c090 ffffc900175fb240
          ffff880847f51828: ffff88046e536000 ffff880c78a28940
          ffff880847f51838: ffff880847f51940 ffff88046e524000
          ffff880847f51848: 0000000000000000 0000000000001000
          ffff880847f51858: 0000000000000301 ffffea003958cb98
          ffff880847f51868: 0000000000000000 0000000000000000
          ffff880847f51878: 0000000000000101 ffffc900175fb240
          ffff880847f51888: 0000000000000000 ffffffffffffffff
          ffff880847f51898: ffffffff8129c452 0000000000000010
          ffff880847f518a8: 0000000000010246 ffff880847f518c8
          ffff880847f518b8: 0000000000000018 ffff880847f51940
          ffff880847f518c8: ffffffffa09cb46e
       #9 [ffff880847f518c8] kiblnd_map_tx at ffffffffa09cb46e [ko2iblnd]
          ffff880847f518d0: ffff880847f51fd8 ffff88084585aab0
          ffff880847f518e0: ffffffff00000101 ffffffff81a9ac80
          ffff880847f518f0: ffff880c78a289c0 0000030100000001
          ffff880847f51900: ffff880847f51920 ffffffff8105e100
          ffff880847f51910: ffff880876a8d558 ffff880c796ffb80                                  <=== HERE rbx: ffff880c796ffb80
          ffff880847f51920: ffffc900175fb240 ffff88046e524000
          ffff880847f51930: 0000000000000000 ffff8810639ee640
          ffff880847f51940: ffff880847f519a0 ffffffffa09cbe0a                                  
      #10 [ffff880847f51948] kiblnd_setup_rd_iov at ffffffffa09cbe0a [ko2iblnd]                
          ffff880847f51950: ffff880800000378 ffff880c002ffef8                                  
          ffff880847f51960: ffff88046e53c000 ffffc90048a88000                                  
          ffff880847f51970: 0005000221056e78 ffff88105fff6200
          ffff880847f51980: 0000000000300270 ffffc900175fb240
          ffff880847f51990: 0005000221056e78 0000000000000001
          ffff880847f519a0: ffff880847f51a40 ffffffffa09d151a
      #11 [ffff880847f519a8] kiblnd_send at ffffffffa09d151a [ko2iblnd]
          ffff880847f519b0: 0000000000300270 ffff8810638aa000
          ffff880847f519c0: ffffc90048788488 0000000000000000
          ffff880847f519d0: ffff881000000001 000000006714f9c0                                  
          ffff880847f519e0: ffff8810639ee630 ffff880c796ffb80                                  
          ffff880847f519f0: 000000000000ec08 ffff88105fff6200                                  
          ffff880847f51a00: 0005000221056e78 0000000000003039                                  
          ffff880847f51a10: ffff880847f51a60 ffff880c796ffb80                                  
          ffff880847f51a20: ffff88105fff6200 ffff88105fff6200                                  <=== HERE r12: ffff88105fff6200
          ffff880847f51a30: 0000000000000000 0000000000050003                                  
          ffff880847f51a40: ffff880847f51a60 ffffffffa05e6d6b                                  
      #12 [ffff880847f51a48] lnet_ni_send at ffffffffa05e6d6b [lnet]                           
          ffff880847f51a50: ffff880c796ffb80 0000000000000000                                  
          ffff880847f51a60: ffff880847f51ad0 ffffffffa05eafa5                                  
      #13 [ffff880847f51a68] lnet_send at ffffffffa05eafa5 [lnet]                              
          ffff880847f51a70: ffff880847f51ac0 0000000000000000                                  
          ffff880847f51a80: 000500030a64b972 0005000221056e03                                  
          ffff880847f51a90: ffff880847f51ad0 ffff8804780fd180                                  
          ffff880847f51aa0: 000500030a64b972 ffff88105fff6200                                  
          ffff880847f51ab0: ffff8810639ee5c0 0000000000000003                                  
          ffff880847f51ac0: 00000000002c3f4b 0000000000000001                                  
          ffff880847f51ad0: ffff880847f51b30 ffffffffa05ec00a                                  
      #14 [ffff880847f51ad8] LNetPut at ffffffffa05ec00a [lnet]                                
          ffff880847f51ae0: 0000000147f51b30 0005000221056e03                                  
          ffff880847f51af0: 000500030a64b972 0000000000003039                                  
          ffff880847f51b00: ffff880847f51bd0 ffff880459fb2240                                  
          ffff880847f51b10: 0000000000000000 ffffc90048788108                                  
          ffff880847f51b20: 0000000000000023 0000000000000001                                  
          ffff880847f51b30: ffff880847f51be0 ffffffffa07fbc40                                  
      #15 [ffff880847f51b38] ptl_send_buf at ffffffffa07fbc40 [ptlrpc]                         
          ffff880847f51b40: 00055926b072ae94 00000001000000c0                                  
          ffff880847f51b50: 0000000000000000 ffffc90048788000                                  
          ffff880847f51b60: 0000000000000023 ffffffffa081b8b8                                  
          ffff880847f51b70: 0030027047f51b88 ffffc90048788070                                  
          ffff880847f51b80: ffffc90048788108 0000000100300270
          ffff880847f51b90: 0000000048788108 ffffc90048788000
          ffff880847f51ba0: 0000000000000023 ffffffffa083c445
          ffff880847f51bb0: ffff88105ac1ec00 ffff88105ac1ec00
          ffff880847f51bc0: ffffc90048788000 ffff880459fb2240
          ffff880847f51bd0: 0000000000000000 0000000000000001
          ffff880847f51be0: ffff880847f51c60 ffffffffa07fc22b
      #16 [ffff880847f51be8] ptlrpc_send_reply at ffffffffa07fc22b [ptlrpc]
          ffff880847f51bf0: ffff88100000000a 00055926b072ae94
          ffff880847f51c00: ffff8808000000c0 ffff8804573e5e40
          ffff880847f51c10: ffff8804573e5e58 ffff8804573e5e70
          ffff880847f51c20: ffff880847f51c70 ffff880850ce0a80
          ffff880847f51c30: ffff880847f51c30 ffff88105ac1ec00
          ffff880847f51c40: 0000000000000000 ffffffffa0f18580
          ffff880847f51c50: ffff88106aa76000 000000000000030c
          ffff880847f51c60: ffff880847f51c90 ffffffffa07c7054
      #17 [ffff880847f51c68] target_send_reply_msg at ffffffffa07c7054 [ptlrpc]
          ffff880847f51c70: ffff880847f51ce0 ffffc90048788000
          ffff880847f51c80: ffff88105ac1ec00 ffffffffa0f18580
          ffff880847f51c90: ffff880847f51d00 ffffffffa07c7576
      #18 [ffff880847f51c98] target_send_reply at ffffffffa07c7576 [ptlrpc]
          ffff880847f51ca0: ffff8810653a0880 0000000000001001
          ffff880847f51cb0: ffffc90048788108 0000000d0012ef0c
          ffff880847f51cc0: ffff880847f51ce0 00000000a0802b6c
          ffff880847f51cd0: ffffc9004027b9e8 ffff88105ac1ec00
          ffff880847f51ce0: ffff881062860000 ffffffffa0f18580
          ffff880847f51cf0: ffff88105ac1efa0 0000000000000000
          ffff880847f51d00: ffff880847f51d50 ffffffffa0ea2df9
      #19 [ffff880847f51d08] mdt_handle_common at ffffffffa0ea2df9 [mdt]
          ffff880847f51d10: ffff880847f51d30 ffffffff00000002
          ffff880847f51d20: ffff880847f51d60 ffff88105ac1ec00
          ffff880847f51d30: ffff880850ce0a80 ffff88106aa76000
          ffff880847f51d40: ffff881067ecd940 ffff88105ac1ef40
          ffff880847f51d50: ffff880847f51d60 ffffffffa0edf645
      #20 [ffff880847f51d58] mds_regular_handle at ffffffffa0edf645 [mdt]
          ffff880847f51d60: ffff880847f51e40 ffffffffa0811ee5
      #21 [ffff880847f51d68] ptlrpc_server_handle_request at ffffffffa0811ee5 [ptlrpc]
          ffff880847f51d70: ffff880847f51d80 ffffffffa053e4ce
          ffff880847f51d80: ffff880847f51da0 ffffffffa054f7d5
          ffff880847f51d90: ffff881067ecd940 ffff88106aa76000
          ffff880847f51da0: ffff880847f51e40 ffffffffa080a919
          ffff880847f51db0: ffff880847f51e00 ffffffff81057819
          ffff880847f51dc0: ffff88106af7bdc0 0000000300000000
          ffff880847f51dd0: ffff88106af7bdc0 ffff88106aa76080
          ffff880847f51de0: 0000000000000282 0000000000000014
          ffff880847f51df0: 0000000000000001 0000000000000282
          ffff880847f51e00: 0000000055928287 00000000000884c1
          ffff880847f51e10: ffff880847f51e40 ffff881067ecd940
          ffff880847f51e20: ffff88106aa76000 0000000000000040
          ffff880847f51e30: 0000000000000005 ffff880850ce0a80
          ffff880847f51e40: ffff880847f51ee0 ffffffffa081466d
      #22 [ffff880847f51e48] ptlrpc_main at ffffffffa081466d [ptlrpc]
          ffff880847f51e50: ffff880847f51e60 ffff88106aa76204
          ffff880847f51e60: ffff88106aa76080 0000000047f51fd8
          ffff880847f51e70: ffff880850ce0a80 ffff8810670b4800
          ffff880847f51e80: 000000004585aab0 ffff88106aa76068
          ffff880847f51e90: ffff88106aa76048 ffff881067ecd978
          ffff880847f51ea0: ffff8810670b4800 ffff88106aa76030
          ffff880847f51eb0: ffff880847f51ee0 ffff881062805780
          ffff880847f51ec0: ffff880847f51ef8 ffffffffa0813b80
          ffff880847f51ed0: ffff881067ecd940 ffff880479c40ab0
          ffff880847f51ee0: ffff880847f51f40 ffffffff8109e71e
      #23 [ffff880847f51ee8] kthread at ffffffff8109e71e
          ffff880847f51ef0: ffffffff00000000 5a5a5a5a00000000
          ffff880847f51f00: 5a5a5a5a00000000 ffff880847f51f08
          ffff880847f51f10: ffff880847f51f08 0000000000000000
          ffff880847f51f20: ffff881062805780 ffffffff81ebb4c8
          ffff880847f51f30: ffff880479c40ab0 0000000000000000
          ffff880847f51f40: ffff880479c47f40 ffffffff8100c20a
      #24 [ffff880847f51f48] kernel_thread at ffffffff8100c20a
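
      Cross-referencing the bt -f frame of kiblnd_map_tx (RBP ffff880847f51940) with the dis -rl listing gives two useful values. The fourth argument (%ecx, apparently the fragment count) is saved at -0x44(%rbp) at +22 and reloaded into %r9d at +384; the loop counter is incremented at +400 and saved at -0x60(%rbp) at +406, just before the call to sg_next at +409. The frame holds 0x301 (769) at -0x44 and 0x101 (257) at -0x60, which match R9 and RDX in the Oops. One possible reading, not confirmed anywhere in this ticket, is that the reply was split into 769 fragments while the tx_frags table only has 256 entries (assumed here to be LNET_MAX_IOV in this release), so sg_next returned NULL after the last entry and was handed that NULL on step 257. The sketch below models that walk; the model_* helpers are hypothetical and only the end bit of page_link is modelled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: only the end bit (0x02) of page_link is kept. */
#define MODEL_SG_END 0x02UL

struct model_sg {
    unsigned long page_link;
};

/* Like sg_init_table(): clear nents entries (nents > 0) and mark the last one. */
static void model_sg_init_table(struct model_sg *sgl, int nents)
{
    memset(sgl, 0, (size_t)nents * sizeof(*sgl));
    sgl[nents - 1].page_link |= MODEL_SG_END;
}

/* Like sg_next() for an unchained table: NULL after the marked entry. */
static struct model_sg *model_sg_next(struct model_sg *sg)
{
    return (sg->page_link & MODEL_SG_END) ? NULL : sg + 1;
}

/* Walks nents entries the way the for_each_sg()-style loop at
 * kiblnd_map_tx+393..+414 appears to: bump the counter (%edx), then call
 * sg_next(). Returns the counter value at which sg_next() would receive
 * NULL (where the kernel faults), or 0 if the walk completes. */
static int model_walk(struct model_sg *sgl, int nents)
{
    struct model_sg *sg = sgl;
    int i;

    for (i = 1; i <= nents; i++) {
        if (sg == NULL)
            return i;   /* the kernel dereferences NULL here instead */
        sg = model_sg_next(sg);
    }
    return 0;
}
```

      For example, with a 256-entry table, model_walk(frags, 256) returns 0, while model_walk(frags, 769) returns 257, the same 0x301/0x101 pair seen in R9 and RDX.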
      

      Could you help me troubleshoot this issue? Is there anything I should look at in the crash dump?

    People

      Assignee: Lai Siyao
      Reporter: Bruno Travouillon (Inactive)
      Votes: 0
      Watchers: 5