Lustre / LU-17887

Client crash in RCU when unmounting all FSs and unloading modules in a row

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Lustre 2.16.0
    • Labels: None
    • Severity: 3

    Description

      On our fat NUMA nodes, the crash occurs randomly but quite often, and the stack trace always looks like the following:

      [246065.156962] LustreError: 789506:0:(class_obd.c:839:obdclass_exit()) obd_memory max: 16059942, leaked: 40
      ................
      [246065.484593] Call trace:
      [246065.487180]  percpu_counter_add_batch+0x2c/0x188
      [246065.491987]  0xffffc0a0cf97869c
      [246065.495282]  rcu_do_batch+0x184/0x450
      [246065.499120]  rcu_core+0x174/0x3e8
      [246065.502593]  rcu_core_si+0x18/0x38
      [246065.506156]  __do_softirq+0x134/0x460
      [246065.509988]  ____do_softirq+0x18/0x38
      [246065.513818]  call_on_irq_stack+0x24/0x30
      [246065.517914]  do_softirq_own_stack+0x24/0x48
      [246065.522275]  irq_exit_rcu+0xa0/0xf8
      [246065.525930]  el1_interrupt+0x4c/0xd0
      [246065.529674]  el1h_64_irq_handler+0x18/0x38
      [246065.533947]  el1h_64_irq+0x7c/0x80
      [246065.537508]  cpuidle_enter_state+0xd4/0x828
      [246065.541870]  cpuidle_enter+0x40/0x70
      [246065.545615]  cpuidle_idle_call+0x150/0x208
      [246065.549892]  do_idle+0xa8/0x128
      [246065.553188]  cpu_startup_entry+0x3c/0x50
      [246065.557282]  secondary_start_kernel+0xf0/0x150
      [246065.561911]  __secondary_switched+0xb8/0xc0
      ..................
      [246065.600422] Kernel panic - not syncing: Oops: Fatal exception in interrupt
      [246065.607542] SMP: stopping secondary CPUs

      Interestingly, the crash always seems to be preceded by the same message reporting that some OBD memory has been leaked, and the leaked amount is apparently always a multiple of 40 bytes.
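
The multiple-of-40 observation can be checked across several captured console logs with a small script. The following is only a sketch: the regular expression is derived from the single obdclass_exit() message quoted above, and the helper names (`leaked_sizes`, `all_multiples_of`) are hypothetical, not part of Lustre or its tooling.

```python
import re

# Pattern based on the obdclass_exit() message quoted in this ticket.
# Assumption: other occurrences of the message use the same wording.
LEAK_RE = re.compile(r"obdclass_exit\(\)\) obd_memory max: (\d+), leaked: (\d+)")

# The line from the description, used here as sample input.
SAMPLE = (
    "[246065.156962] LustreError: 789506:0:(class_obd.c:839:obdclass_exit()) "
    "obd_memory max: 16059942, leaked: 40\n"
)


def leaked_sizes(log_text):
    """Return the 'leaked' byte count of every obdclass_exit() report found."""
    return [int(m.group(2)) for m in LEAK_RE.finditer(log_text)]


def all_multiples_of(sizes, unit=40):
    """True if at least one size was found and each is a multiple of `unit`."""
    return bool(sizes) and all(size % unit == 0 for size in sizes)


if __name__ == "__main__":
    sizes = leaked_sizes(SAMPLE)
    print(sizes, all_multiples_of(sizes))  # prints: [40] True
```

Feeding it the concatenated console output of several crashes would show whether the 40-byte granularity holds in every case, which could help narrow the search to allocations of that size.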

          People

            bfaccini-nvda Bruno Faccini