Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 2.4.0
-
None
-
Single-node test configuration (dual-core x86_64, 1 MDT, 3 OST)
-
3
-
5271
Description
I recently hit this problem while running sanity-scrub.sh:
LustreError: 140-5: Server testfs-MDT0000 requested index 0, but that index is already in use. Use --writeconf to force
mgs_write_log_target()) Can't get index (-98)
mgs_handle_target_reg()) Failed to write testfs-MDT0000 log (-98)
erver_register_target()) Cannot talk to the MGS: -98, not fatal
LustreError: 32638:0:(osd_scrub.c:1122:osd_scrub_cleanup()) ASSERTION( dev->od_otable_it == ((void *)0) ) failed
Pid: 32638, comm: umount
Call Trace:
[<ffffffffa08fb905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa08fbf17>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0fc096f>] osd_scrub_cleanup+0xdf/0xe0 [osd_ldiskfs]
[<ffffffffa0f9d323>] osd_shutdown+0x33/0x110 [osd_ldiskfs]
[<ffffffffa0fa9ff5>] osd_process_config+0x165/0x1b0 [osd_ldiskfs]
[<ffffffffa0d97611>] lod_process_config+0x451/0xa70 [lod]
[<ffffffffa0ed9ac0>] mdd_process_config+0x210/0x7e0 [mdd]
[<ffffffffa1027272>] mdt_stack_fini+0x172/0xbf0 [mdt]
[<ffffffffa1027fb7>] mdt_device_fini+0x2c7/0x510 [mdt]
[<ffffffffa0a8d4c7>] class_cleanup+0x577/0xdc0 [obdclass]
[<ffffffffa0a8edb5>] class_process_config+0x10a5/0x1ca0 [obdclass]
[<ffffffffa0a8fb29>] class_manual_cleanup+0x179/0x6f0 [obdclass]
[<ffffffffa0a9d0ac>] server_put_super+0x61c/0x1300 [obdclass]
[<ffffffff8117d34b>] generic_shutdown_super+0x5b/0xe0
[<ffffffff8117d436>] kill_anon_super+0x16/0x60
[<ffffffffa0a919a6>] lustre_kill_super+0x36/0x60 [obdclass]
[<ffffffff8117e4b0>] deactivate_super+0x70/0x90
[<ffffffff8119a4ff>] mntput_no_expire+0xbf/0x110
[<ffffffff8119af9b>] sys_umount+0x7b/0x3a0
Alex's patch at http://review.whamcloud.com/4217, which is still to be landed, was created for LU-2033, but since that bug was closed and was actually about a separate issue, I'd rather file a new bug than re-open that one. The patch works around the duplicate index==0 issue by resetting the filesystem label after formatting (to clear the "VIRGIN" flag), though my preference would be for the MDT itself to detect that it had been restored from backup and reset the label internally. The proposed solution does at least work for older versions of Lustre too, so a single restore procedure can be documented, and I'm not dead-set against this part of the patch.
The osd_scrub_cleanup() assertion is also addressed by Alex's patch, but Fan Yong rightly objected to that fix: it still implies that the scrub thread is running when the MDT is being stopped, so some other cleanup or serialization is needed.
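To make the ordering concern concrete, here is a rough user-space sketch of "stop and join the worker first, then assert the iterator is gone". It is not the osd_scrub code and not the fix that landed: it uses pthreads in place of kernel threads, and every name in it (demo_dev, demo_scrub_start, demo_scrub_stop, demo_scrub_cleanup, the iter field standing in for od_otable_it) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device state; NOT the real osd_device. */
struct demo_dev {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    pthread_t       thread;
    bool            running;  /* worker was started and not yet joined */
    bool            stop;     /* set by shutdown to ask the worker to exit */
    void           *iter;     /* plays the role of od_otable_it */
};

static int demo_iter_token;   /* dummy object the worker "iterates" over */

/* Worker: owns the iterator for as long as it runs, drops it on exit. */
static void *demo_scrub_main(void *arg)
{
    struct demo_dev *dev = arg;

    pthread_mutex_lock(&dev->lock);
    dev->iter = &demo_iter_token;
    while (!dev->stop)
        pthread_cond_wait(&dev->cond, &dev->lock);
    dev->iter = NULL;
    pthread_mutex_unlock(&dev->lock);
    return NULL;
}

static int demo_scrub_start(struct demo_dev *dev)
{
    int rc;

    pthread_mutex_init(&dev->lock, NULL);
    pthread_cond_init(&dev->cond, NULL);
    dev->stop = false;
    dev->iter = NULL;
    rc = pthread_create(&dev->thread, NULL, demo_scrub_main, dev);
    dev->running = (rc == 0);
    return rc;
}

/* Ask the worker to exit and wait until it really has. */
static void demo_scrub_stop(struct demo_dev *dev)
{
    if (!dev->running)
        return;
    pthread_mutex_lock(&dev->lock);
    dev->stop = true;
    pthread_cond_signal(&dev->cond);
    pthread_mutex_unlock(&dev->lock);
    pthread_join(dev->thread, NULL);
    dev->running = false;
}

/* Cleanup serializes against the worker before checking the invariant. */
static void demo_scrub_cleanup(struct demo_dev *dev)
{
    demo_scrub_stop(dev);
    assert(dev->iter == NULL);  /* holds because the worker has been joined */
    pthread_cond_destroy(&dev->cond);
    pthread_mutex_destroy(&dev->lock);
}
```

Calling demo_scrub_start() followed by demo_scrub_cleanup() from a small main() should pass the assertion regardless of how far the worker got, because the join happens before the check. Dropping the demo_scrub_stop() call from cleanup recreates the kind of race that the assertion in the trace above points to.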