Lustre / LU-5751

misconfiguration crashes cfs_cpt_set_node

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Lustre 2.7.0
    • Lustre 2.7.0
    • None
    • 3
    • 16147

    Description

      We are experimenting with various Lustre configurations on a UV system, and thought we'd try confining everything Lustre to the NUMA node adjacent to the InfiniBand (IB) card, using a single CPU partition on node 3.

      <1>[96791.728121] BUG: unable to handle kernel NULL pointer dereference at (null)
      <1>[96791.736878] IP: [<ffffffff81260485>] memcpy+0x5/0x120
      <4>[96791.742555] PGD 13ced2dc067 PUD 13ced2db067 PMD 0
      <0>[96791.747929] Oops: 0000 [#1] SMP

      Stack traceback for pid 33047
      0xffff88bde41ec380 33047 33046 1 20 R 0xffff88bde41ec9f0 *modprobe
      [<ffffffff81260485>] memcpy+0x5/0x120
      [<ffffffffa0eb5929>] cfs_cpt_set_node+0xf9/0x120 [libcfs]
      [<ffffffffa0eb796e>] cfs_cpt_table_create_pattern+0x19e/0x6a0 [libcfs]
      [<ffffffffa0eb88f5>] cfs_cpu_init+0x175/0x4c0 [libcfs]
      [<ffffffffa0ec0aeb>] init_libcfs_module+0x9b/0x3b0 [libcfs]
      [<ffffffff810001cb>] do_one_initcall+0x3b/0x180
      [<ffffffff810a126f>] sys_init_module+0xcf/0x240
      [<ffffffff8146a012>] system_call_fastpath+0x16/0x1b
      [<00007ffff7b413aa>] 0x7ffff7b413aa
      r15 = 0xffff893cc628e459 r14 = 0xffff893cbddec7c0
      r13 = 0x0000000000000000 r12 = 0xffff893cbddec7c0
      bp = 0x000000000000001e bx = 0xffff893de526ea00
      r11 = 0x0000000000000000 r10 = 0x0000000000000025
      r9 = 0x000000000000000a r8 = 0x000000000000000a
      ax = 0xffff893de526ea00 cx = 0x0000000000000018
      dx = 0x0000000000000018 si = 0x0000000000000000
      di = 0xffff893de526ea00 orig_ax = 0xffffffffffffffff
      ip = 0xffffffff81260485 cs = 0x0000000000000010
      flags = 0x0000000000010206 sp = 0xffff88bcd95f1e80
      ss = 0x0000000000000018 &regs = 0xffff88bcd95f1de8

      The trigger:
      /etc/modprobe.d/Lustre.conf:
      options lnet accept_port=50 networks=o2ib0(ib0)
      options ptlrpc ptlrpcd_bind_policy=4
      options libcfs cpu_pattern="N 0[30-39]"

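      This system has only 10 NUMA nodes, so "N 0[30-39]" is a misconfiguration; for the effect I wanted it should be "N 0[3]". Still, the error handling could be improved: libcfs should reject out-of-range node IDs instead of crashing in cfs_cpt_set_node while the module loads.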

          People

            liang Liang Zhen (Inactive)
            schamp Stephen Champion