  1. Lustre
  2. LU-7448

sleeping under spinlock somewhere in nrs/tbf code


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor

      I got this telltale complaint in sanityn test 77e:

      .781807] BUG: spinlock wrong CPU on CPU#1, lctl/29799 (Not tainted)
      .782638]  lock: ffff880055277d08, .magic: dead4ead, .owner: lctl/29799, .owner_c
      .784053] Pid: 29799, comm: lctl Not tainted 2.6.32-rhe6.7-debug #1
      .784828] Call Trace:
      .785635]  [<ffffffff812a06fa>] ? spin_bug+0xaa/0x100
      .786402]  [<ffffffff812a07c6>] ? _raw_spin_unlock+0x76/0xa0
      .787184]  [<ffffffff81530afe>] ? _spin_unlock+0xe/0x10
      .787989]  [<ffffffffa153faa4>] ? nrs_policy_ctl+0xd4/0x2e0 [ptlrpc]
      .788835]  [<ffffffffa15414f2>] ? ptlrpc_nrs_policy_control+0xe2/0x2a0 [ptlrpc]
      .790282]  [<ffffffffa1522876>] ? ptlrpc_lprocfs_nrs_seq_write+0x3e6/0x600 [ptlrp
      .791762]  [<ffffffffa1522490>] ? ptlrpc_lprocfs_nrs_seq_write+0x0/0x600 [ptlrpc]
      .794101]  [<ffffffff811ff945>] ? proc_reg_write+0x85/0xc0
      .794872]  [<ffffffff81192f48>] ? vfs_write+0xb8/0x1a0
      .795616]  [<ffffffff811943f6>] ? fget_light_pos+0x16/0x50
      .796356]  [<ffffffff81193881>] ? sys_write+0x51/0xb0
      .797114]  [<ffffffff815312ee>] ? do_device_not_available+0xe/0x10
      .797878]  [<ffffffff8100b112>] ? system_call_fastpath+0x16/0x1b
      

      What this means is that something in nrs_policy_ctl() slept while holding nrs->nrs_lock, long enough that when it woke up it had been rescheduled on a different CPU.
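      To illustrate why a sleep under a spinlock shows up as a "wrong CPU" complaint, here is a minimal userspace sketch. It assumes the debug spinlock records the owning CPU at lock time and compares it at unlock time. All names in it (dbg_lock, current_cpu, simulated_sleep_and_migrate, critical_section) are hypothetical stand-ins; this is a model of the check, not kernel code and not the actual path inside nrs_policy_ctl().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace model of the owner bookkeeping a debug
 * spinlock keeps. Not the kernel implementation. */
struct dbg_lock {
    int locked;
    int owner_cpu;              /* CPU that took the lock, -1 when unlocked */
};

/* Stand-in for "the CPU the current task is running on". */
static int current_cpu = 0;

static void dbg_lock_init(struct dbg_lock *l)
{
    l->locked = 0;
    l->owner_cpu = -1;
}

static void dbg_lock_acquire(struct dbg_lock *l)
{
    assert(!l->locked);         /* single-threaded model: no contention */
    l->locked = 1;
    l->owner_cpu = current_cpu; /* remember where the lock was taken */
}

/* Returns 0 on a clean unlock, 1 if the "wrong CPU" check fired. */
static int dbg_lock_release(struct dbg_lock *l)
{
    int wrong_cpu = (l->owner_cpu != current_cpu);

    if (wrong_cpu)
        fprintf(stderr, "BUG: spinlock wrong CPU on CPU#%d (owner CPU %d)\n",
                current_cpu, l->owner_cpu);
    l->locked = 0;
    l->owner_cpu = -1;
    return wrong_cpu;
}

/* Models a blocking call: the task sleeps and is resumed on new_cpu. */
static void simulated_sleep_and_migrate(int new_cpu)
{
    current_cpu = new_cpu;
}

/* Runs one critical section; if sleeps is nonzero, the task blocks
 * (and migrates from CPU 0 to CPU 1) while still holding the lock. */
static int critical_section(int sleeps)
{
    struct dbg_lock l;

    dbg_lock_init(&l);
    current_cpu = 0;
    dbg_lock_acquire(&l);
    if (sleeps)
        simulated_sleep_and_migrate(1);
    return dbg_lock_release(&l);
}
```

      In this model, critical_section(0) unlocks cleanly, while critical_section(1) trips the check. Note that the check only fires when the task actually wakes up on a different CPU, so a sleep that resumes on the same CPU would go unnoticed by it.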

      I am only running RHEL6 at the moment, so this is the best I have, and I do not see any obvious culprit right away.
      This probably needs a rerun on RHEL7, where the kernel catches such offenders much better.

            emoly.liu Emoly Liu
            green Oleg Drokin