Lustre / LU-5169

Lustre client panic during MDS failover

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • None
    • Affects Version/s: Lustre 2.5.1
    • None
    • Environment: Lustre servers 2.4.3; Lustre clients 2.5.1
    • 2
    • 14248

    Description

      The setup is as follows:

      There are two filesystems: pfs2dat2 and pfs2wor2.

      Clients:
      uc1n996
      uc1n997

      For pfs2dat2:
      MDS: pfs2n12/13
      OSS: pfs2n14/15

      For pfs2wor2:
      MDS: pfs2n16/17
      OSS: pfs2n18/19/20/21

      The two MDSes involved in the failover were pfs2n12 and pfs2n13 (the pfs2dat2 MDS pair). Client uc1n996 panicked with the following stack trace:
      last sysfs file: /sys/devices/system/cpu/online
      CPU 5
      Modules linked in: iptable_filter ip_tables nfs lockd fscache auth_rpcgss
      nfs_acl sunrpc lmv(U) fld(U) mgc(U) lustre(U) lov(U) osc(U) mdc(U) fid(U)
      ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic
      sha256_generic crc32c_intel libcfs(U) ib_ipoib rdma_ucm ib_ucm ib_uverbs
      ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 dm_multipath vhost_net macvtap
      macvlan tun kvm_intel kvm uinput microcode iTCO_wdt iTCO_vendor_support
      acpi_pad power_meter dcdbas sg mlx4_ib ib_sa ib_mad ib_core mlx4_en
      mlx4_core sb_edac edac_core lpc_ich mfd_core shpchp igb i2c_algo_bit
      i2c_core ixgbe dca ptp pps_core mdio xfs exportfs sd_mod crc_t10dif wmi
      ahci megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]

      Pid: 2895, comm: ptlrpcd_rcv Not tainted 2.6.32-431.11.2.el6.x86_64 #1 Dell Inc. PowerEdge R620/0PXXHP
      RIP: 0010:[<ffffffffa0708bde>]  [<ffffffffa0708bde>] lustre_msg_get_opc+0xe/0x110 [ptlrpc]
      RSP: 0018:ffff88082b5ddc80  EFLAGS: 00010282
      RAX: ffff8800a585e208 RBX: 0000000000000000 RCX: ffff8801a22893a0
      RDX: 0000000000000002 RSI: 0000000000000000 RDI: 3237323033093932
      RBP: ffff88082b5ddc90 R08: 0000000000000000 R09: 00000000fffffffc
      R10: 0000000000000002 R11: 0000000000000004 R12: ffff8809421d7000
      R13: ffff8800a585e208 R14: 00000032a434f11a R15: ffff8801a22890c8
      FS:  0000000000000000(0000) GS:ffff88085c440000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      CR2: 000000346b2727d0 CR3: 000000102a8e5000 CR4: 00000000000407e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process ptlrpcd_rcv (pid: 2895, threadinfo ffff88082b5dc000, task ffff8808314deaa0)
      Stack:
       ffff88082b5ddc90 0000000000000000 ffff88082b5ddcd0 ffffffffa08b6c2d
      <d> ffff880563411000 ffff8801a2289000 ffff8801a2289000 ffff88102d915800
      <d> ffff8801a22892e0 00000032a434f11a ffff88082b5ddd00 ffffffffa06fd312
      Call Trace:
       [<ffffffffa08b6c2d>] mdc_replay_open+0xad/0x420 [mdc]
       [<ffffffffa06fd312>] ptlrpc_replay_interpret+0x142/0x740 [ptlrpc]
       [<ffffffffa06fe994>] ptlrpc_check_set+0x2c4/0x1b40 [ptlrpc]
       [<ffffffffa0729ebb>] ptlrpcd_check+0x53b/0x560 [ptlrpc]
       [<ffffffffa072a3db>] ptlrpcd+0x20b/0x370 [ptlrpc]
       [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
       [<ffffffffa072a1d0>] ? ptlrpcd+0x0/0x370 [ptlrpc]
       [<ffffffff8109aee6>] kthread+0x96/0xa0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109ae50>] ? kthread+0x0/0xa0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      Code: 24 48 48 83 c4 68 4c 89 e0 5b 41 5c 41 5d 41 5e 41 5f c9 c3 45 31 e4 e9 26 ff ff ff 90 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 <81> 7f 08 d3 0b d0 0b 48 89 fb 74 76 c7 05 fc 7e 0a 00 00 01 00
      RIP  [<ffffffffa0708bde>] lustre_msg_get_opc+0xe/0x110 [ptlrpc]
       RSP <ffff88082b5ddc80>
      --[ end trace ee65cdcf6a61aa8a ]--
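      Two values in this oops can be decoded further. The Python sketch below takes RDI (on x86-64 the first argument register, so presumably the message pointer passed to lustre_msg_get_opc()) and the bytes at the <81> marker on the Code: line. It assumes 48-bit virtual addresses and that LUSTRE_MSG_MAGIC_V2 is 0x0BD00BD3 as in the Lustre 2.x headers. is_canonical() and decode_cmp_imm32() are hypothetical helpers that handle only the single instruction form seen here; this is not a general disassembler.

```python
import struct

# Values copied verbatim from the oops above.
RDI = 0x3237323033093932                        # register value at the fault
FAULT_BYTES = bytes.fromhex("817f08d30bd00b")   # bytes starting at the <81> marker

# Assumption: magic value as defined in the Lustre 2.x headers.
LUSTRE_MSG_MAGIC_V2 = 0x0BD00BD3


def is_canonical(addr: int) -> bool:
    """Hypothetical helper: with 48-bit virtual addresses, bits 63..47
    must be all zeros or all ones for the address to be canonical."""
    top = addr >> 47
    return top in (0, (1 << 17) - 1)


def decode_cmp_imm32(code: bytes) -> tuple[int, int, int]:
    """Hypothetical helper: decode only 'cmp dword [reg+disp8], imm32'
    (opcode 0x81 with ModRM reg=/7 and mod=01, no REX prefix)."""
    opcode, modrm, disp8 = code[0], code[1], code[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x81 or mod != 1 or reg != 7:
        raise ValueError("not cmp r/m32, imm32 with an 8-bit displacement")
    (imm32,) = struct.unpack("<I", code[3:7])
    return rm, disp8, imm32


if __name__ == "__main__":
    rm, disp, imm = decode_cmp_imm32(FAULT_BYTES)
    print(f"rm={rm} (7 = rdi), disp=0x{disp:x}, imm=0x{imm:08x}")
    print("imm == LUSTRE_MSG_MAGIC_V2:", imm == LUSTRE_MSG_MAGIC_V2)
    print("RDI canonical:", is_canonical(RDI))
    print("RDI as little-endian bytes:", RDI.to_bytes(8, "little"))
```

      Run as-is, it shows that the faulting instruction is cmp dword [rdi+0x8], 0x0bd00bd3. In other words, lustre_msg_get_opc() was checking the message magic at offset 8 of the buffer RDI points to. It also shows that RDI is not a canonical address, so the access faults before any memory is read, and that RDI's bytes read as the ASCII text "29\t30272". This suggests, though it does not prove, that the request message pointer reached from mdc_replay_open() held stray text rather than a valid struct lustre_msg when the open was replayed.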

            People

              Hongchao Zhang (hongchao.zhang)
              Swapnil Pimpale (spimpale, inactive)
              Votes: 0
              Watchers: 10