Details

    • Type: Task
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version: Lustre 2.16.0

    Description

      Remove the lu_ref infrastructure for good. This debugging infrastructure is often broken and doesn't correspond to the actual reference counting used to manage object lifetimes. Hence, when a real bug is encountered (e.g. some thread isn't releasing a reference), this code (assuming it happens to be working) can't actually help debug the issue.

      Recently, I was debugging an issue with ld_ref counting. Naturally, I turned to the debugging code already available in Lustre. I was dismayed to find that it was more broken than the code I was debugging. Rather than debug the debugging code, I think it's better to cast it away.

      Most compellingly, the builds used by Maloo and Gerrit Janitor don't enable this feature, so it can stay broken for long periods of time without anyone noticing.

      For reference, this code just crashes on master:

      [ 5057.262697] LustreError: 112475:0:(lu_ref.c:94:lu_ref_print()) lu_ref: 000000000e5de419 1 0 class_newdev:428
      [ 5057.264613] LustreError: 112475:0:(lu_ref.c:96:lu_ref_print())      link: newdev 00000000341977c6
      [ 5057.269695] LustreError: 112475:0:(lu_ref.c:137:lu_ref_fini()) ASSERTION( 0 ) failed: 
      [ 5057.271227] LustreError: 112475:0:(lu_ref.c:137:lu_ref_fini()) LBUG
      ...
      [ 5057.275307] Call Trace:
      [ 5057.275783]  <TASK>
      [ 5057.276194]  dump_stack_lvl+0x55/0x7e
      [ 5057.276895]  lbug_with_loc+0x30/0x80 [libcfs]
      [ 5057.277728]  lu_ref_fini+0xc0/0x110 [obdclass]
      [ 5057.278629]  class_free_dev+0x32f/0x5f0 [obdclass]
      [ 5057.279562]  class_decref+0xdb/0x180 [obdclass]
      [ 5057.280451]  class_detach+0x28c/0x2d0 [obdclass]
      [ 5057.281349]  class_process_config+0xb7a/0x4560 [obdclass]
      [ 5057.282393]  ? ttwu_queue+0x38/0x150
      [ 5057.283072]  ? lov_tgts_putref+0x805/0x950 [lov]
      [ 5057.283950]  ? __kmalloc+0x19c/0x2e0
      [ 5057.284647]  ? class_manual_cleanup+0x2f0/0x870 [obdclass]
      [ 5057.285704]  class_manual_cleanup+0x518/0x870 [obdclass]
      [ 5057.286729]  lov_tgts_putref+0x816/0x950 [lov]
      [ 5057.287570]  lov_disconnect+0x21f/0x280 [lov]
      [ 5057.288399]  obd_disconnect+0xe6/0x2c0 [lustre]
      [ 5057.289283]  ll_put_super+0x3df/0xda0 [lustre]
      [ 5057.290139]  ? fsnotify_grab_connector+0x55/0x70
      [ 5057.291016]  ? fsnotify_destroy_marks+0x11/0x1f0
      [ 5057.291881]  ? __cond_resched+0x16/0x40
      [ 5057.292609]  ? __cond_resched+0x16/0x40
      [ 5057.293336]  ? evict_inodes+0x1b4/0x210
      [ 5057.294059]  generic_shutdown_super+0x74/0x120
      [ 5057.294918]  kill_anon_super+0x13/0x30
      [ 5057.295632]  deactivate_locked_super+0x47/0x90
      [ 5057.296469]  cleanup_mnt+0x11a/0x170
      [ 5057.297155]  task_work_run+0x6f/0xb0
      [ 5057.297834]  exit_to_user_mode_loop+0x115/0x150
      [ 5057.298694]  exit_to_user_mode_prepare+0x53/0xd0
      [ 5057.299562]  syscall_exit_to_user_mode+0x22/0x60
      [ 5057.300427]  do_syscall_64+0x52/0x90
      [ 5057.301109]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
      ...
      [ 5057.314700] Kernel panic - not syncing: LBUG
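
      To illustrate the point about the tracker not corresponding to the real reference count, here is a minimal, hypothetical model in plain C. It is not Lustre code: struct object, struct ref_tracker and every function name are made up for this sketch, and it does not claim to reproduce the actual cause of the LBUG above. It only shows how a debug-only tracker drifts once a single release path updates the real count but not the tracker, leaving a stale link at teardown even though the object's lifetime was handled correctly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINKS 8

/* Hypothetical debug-only tracker: one named link per tracked reference. */
struct ref_tracker {
	const char *links[MAX_LINKS];
	int nr_links;
};

/* Hypothetical object: 'refcount' decides lifetime, 'tracker' only observes. */
struct object {
	int refcount;
	struct ref_tracker tracker;
};

static void tracker_add(struct ref_tracker *t, const char *scope)
{
	assert(t->nr_links < MAX_LINKS);
	t->links[t->nr_links++] = scope;
}

static void tracker_del(struct ref_tracker *t, const char *scope)
{
	for (int i = 0; i < t->nr_links; i++) {
		if (strcmp(t->links[i], scope) == 0) {
			/* Fill the hole with the last entry and shrink the array. */
			t->links[i] = t->links[--t->nr_links];
			return;
		}
	}
}

/* Take a reference and record it in the tracker. */
static void object_get(struct object *o, const char *scope)
{
	o->refcount++;
	tracker_add(&o->tracker, scope);
}

/* Release path that keeps the real count and the tracker in step. */
static void object_put(struct object *o, const char *scope)
{
	o->refcount--;
	tracker_del(&o->tracker, scope);
}

/* Release path that updates only the real count (the assumed mistake). */
static void object_put_untracked(struct object *o)
{
	o->refcount--;
}

/* Take and drop one reference; return how many links the tracker still holds. */
static int stale_links_at_teardown(bool use_untracked_put)
{
	struct object o = { 0 };

	object_get(&o, "newdev");
	if (use_untracked_put)
		object_put_untracked(&o);
	else
		object_put(&o, "newdev");

	/* The real count is zero on both paths, so freeing the object is legitimate. */
	assert(o.refcount == 0);

	for (int i = 0; i < o.tracker.nr_links; i++)
		printf("stale link: %s\n", o.tracker.links[i]);
	return o.tracker.nr_links;
}
```

      Calling stale_links_at_teardown(false) leaves the tracker empty, while stale_links_at_teardown(true) leaves one "newdev" link behind. A teardown check that asserts on leftover links would fire in the second case although nothing was leaked, which is why such a report says little about the real reference count.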

          People

            timday Tim Day