Details
Task
Resolution: Fixed
Minor
Description
Remove the lu_ref infrastructure for good. This debugging infrastructure is often broken and doesn't correspond to the actual reference counting used to manage object lifetimes. Hence, when a real bug is encountered (e.g. some thread isn't releasing a reference), this code, even assuming it happens to be working, can't actually help debug the issue.
Recently, I was debugging an issue with ld_ref counting. Naturally, I turned to the debugging code available already in Lustre. I was dismayed to find that it was more broken than the code I was already debugging. Rather than debug the debugging code, I think it's better to cast it away.
Most compellingly, the builds used by Maloo and Gerrit Janitor don't enable this feature, so it can be broken for long periods of time without anyone noticing.
For reference, this code just crashes on master:
[ 5057.262697] LustreError: 112475:0:(lu_ref.c:94:lu_ref_print()) lu_ref: 000000000e5de419 1 0 class_newdev:428
[ 5057.264613] LustreError: 112475:0:(lu_ref.c:96:lu_ref_print()) link: newdev 00000000341977c6
[ 5057.269695] LustreError: 112475:0:(lu_ref.c:137:lu_ref_fini()) ASSERTION( 0 ) failed:
[ 5057.271227] LustreError: 112475:0:(lu_ref.c:137:lu_ref_fini()) LBUG
...
[ 5057.275307] Call Trace:
[ 5057.275783] <TASK>
[ 5057.276194] dump_stack_lvl+0x55/0x7e
[ 5057.276895] lbug_with_loc+0x30/0x80 [libcfs]
[ 5057.277728] lu_ref_fini+0xc0/0x110 [obdclass]
[ 5057.278629] class_free_dev+0x32f/0x5f0 [obdclass]
[ 5057.279562] class_decref+0xdb/0x180 [obdclass]
[ 5057.280451] class_detach+0x28c/0x2d0 [obdclass]
[ 5057.281349] class_process_config+0xb7a/0x4560 [obdclass]
[ 5057.282393] ? ttwu_queue+0x38/0x150
[ 5057.283072] ? lov_tgts_putref+0x805/0x950 [lov]
[ 5057.283950] ? __kmalloc+0x19c/0x2e0
[ 5057.284647] ? class_manual_cleanup+0x2f0/0x870 [obdclass]
[ 5057.285704] class_manual_cleanup+0x518/0x870 [obdclass]
[ 5057.286729] lov_tgts_putref+0x816/0x950 [lov]
[ 5057.287570] lov_disconnect+0x21f/0x280 [lov]
[ 5057.288399] obd_disconnect+0xe6/0x2c0 [lustre]
[ 5057.289283] ll_put_super+0x3df/0xda0 [lustre]
[ 5057.290139] ? fsnotify_grab_connector+0x55/0x70
[ 5057.291016] ? fsnotify_destroy_marks+0x11/0x1f0
[ 5057.291881] ? __cond_resched+0x16/0x40
[ 5057.292609] ? __cond_resched+0x16/0x40
[ 5057.293336] ? evict_inodes+0x1b4/0x210
[ 5057.294059] generic_shutdown_super+0x74/0x120
[ 5057.294918] kill_anon_super+0x13/0x30
[ 5057.295632] deactivate_locked_super+0x47/0x90
[ 5057.296469] cleanup_mnt+0x11a/0x170
[ 5057.297155] task_work_run+0x6f/0xb0
[ 5057.297834] exit_to_user_mode_loop+0x115/0x150
[ 5057.298694] exit_to_user_mode_prepare+0x53/0xd0
[ 5057.299562] syscall_exit_to_user_mode+0x22/0x60
[ 5057.300427] do_syscall_64+0x52/0x90
[ 5057.301109] entry_SYSCALL_64_after_hwframe+0x61/0xcb
...
[ 5057.314700] Kernel panic - not syncing: LBUG
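In the log above, lu_ref_fini() prints one remaining link named newdev and then asserts. The sketch below is not Lustre's lu_ref code. It is a minimal user-space illustration of the general failure mode described in this ticket: a shadow tracker kept alongside the real reference count can drift from it and report a leak even when the real count is balanced. All names here (struct ref_tracker, tracker_add, tracker_del, tracker_fini, demo_drift) are hypothetical, and whether this is what actually happens on master is an assumption, not something the trace proves.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINKS 8

/* Hypothetical shadow tracker: one named link per reference taken. */
struct ref_tracker {
	const char *links[MAX_LINKS];
	int nr_links;
};

struct object {
	int refcount;               /* the real count that governs lifetime */
	struct ref_tracker tracker; /* debug-only bookkeeping kept alongside */
};

static void tracker_add(struct ref_tracker *t, const char *scope)
{
	if (t->nr_links < MAX_LINKS)
		t->links[t->nr_links++] = scope;
}

/* Remove one link with a matching scope; returns -1 if none is recorded. */
static int tracker_del(struct ref_tracker *t, const char *scope)
{
	for (int i = 0; i < t->nr_links; i++) {
		if (strcmp(t->links[i], scope) == 0) {
			t->links[i] = t->links[--t->nr_links];
			return 0;
		}
	}
	return -1;
}

/* Report leftover links and return how many there are.
 * A kernel-side tracker might assert here instead of returning. */
static int tracker_fini(const struct ref_tracker *t)
{
	for (int i = 0; i < t->nr_links; i++)
		fprintf(stderr, "leaked link: %s\n", t->links[i]);
	return t->nr_links;
}

/* The real count is balanced, but the release path never calls
 * tracker_del(), so the tracker still holds the "newdev" link. */
int demo_drift(void)
{
	struct object obj = { 0 };

	obj.refcount++;
	tracker_add(&obj.tracker, "newdev");

	obj.refcount--; /* real reference dropped, tracker not updated */

	if (obj.refcount != 0)
		return -1;
	return tracker_fini(&obj.tracker);
}

/* Same sequence with the tracker kept in sync, for comparison. */
int demo_balanced(void)
{
	struct object obj = { 0 };

	obj.refcount++;
	tracker_add(&obj.tracker, "newdev");

	obj.refcount--;
	tracker_del(&obj.tracker, "newdev");

	return tracker_fini(&obj.tracker);
}
```

Because the tracker is a second, independent source of truth, every get/put site has to update both it and the real count. If builds that enable the tracker are not exercised regularly, nothing catches a call site that updates only one of them.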