[LU-2213] sanity-scrub.sh test_10b: osd_scrub_cleanup()) ASSERTION( dev->od_otable_it == ((void *)0) ) failed Created: 20/Oct/12  Updated: 19/Apr/13  Resolved: 29/Oct/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Blocker
Reporter: Andreas Dilger Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

Single-node test configuration (dual-core x86_64, 1 MDT, 3 OST)


Severity: 3
Rank (Obsolete): 5271

 Description   

I recently hit this problem while running sanity-scrub.sh:

LustreError: 140-5: Server testfs-MDT0000 requested index 0, but that index is already in use. Use --writeconf to force
mgs_write_log_target()) Can't get index (-98)
mgs_handle_target_reg()) Failed to write testfs-MDT0000 log (-98)
erver_register_target()) Cannot talk to the MGS: -98, not fatal
LustreError: 32638:0:(osd_scrub.c:1122:osd_scrub_cleanup()) ASSERTION( dev->od_otable_it == ((void *)0) ) failed

Pid: 32638, comm: umount
Call Trace:
[<ffffffffa08fb905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa08fbf17>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0fc096f>] osd_scrub_cleanup+0xdf/0xe0 [osd_ldiskfs]
[<ffffffffa0f9d323>] osd_shutdown+0x33/0x110 [osd_ldiskfs]
[<ffffffffa0fa9ff5>] osd_process_config+0x165/0x1b0 [osd_ldiskfs]
[<ffffffffa0d97611>] lod_process_config+0x451/0xa70 [lod]
[<ffffffffa0ed9ac0>] mdd_process_config+0x210/0x7e0 [mdd]
[<ffffffffa1027272>] mdt_stack_fini+0x172/0xbf0 [mdt]
[<ffffffffa1027fb7>] mdt_device_fini+0x2c7/0x510 [mdt]
[<ffffffffa0a8d4c7>] class_cleanup+0x577/0xdc0 [obdclass]
[<ffffffffa0a8edb5>] class_process_config+0x10a5/0x1ca0 [obdclass]
[<ffffffffa0a8fb29>] class_manual_cleanup+0x179/0x6f0 [obdclass]
[<ffffffffa0a9d0ac>] server_put_super+0x61c/0x1300 [obdclass]
[<ffffffff8117d34b>] generic_shutdown_super+0x5b/0xe0
[<ffffffff8117d436>] kill_anon_super+0x16/0x60
[<ffffffffa0a919a6>] lustre_kill_super+0x36/0x60 [obdclass]
[<ffffffff8117e4b0>] deactivate_super+0x70/0x90
[<ffffffff8119a4ff>] mntput_no_expire+0xbf/0x110
[<ffffffff8119af9b>] sys_umount+0x7b/0x3a0

Alex's patch at http://review.whamcloud.com/4217, which is due to land, was created for LU-2033. Since that bug has been closed and was actually about a separate issue, I'd rather file a new bug than re-open it. The patch works around the duplicate index==0 issue by resetting the filesystem label after formatting (to clear the "VIRGIN" flag), though my preference would be for the MDT itself to detect that it had been restored from backup and reset the label internally. The proposed solution will at least also work for older versions of Lustre, so a single restore procedure can be documented, and I'm not dead-set against this part of the patch.

The osd_scrub_cleanup() assertion is also addressed by Alex's patch, but Fan Yong rightly objected to that fix because it still implies that the scrub thread is running when the MDT is being stopped, so some other cleanup/serialization is needed.



 Comments   
Comment by Alex Zhuravlev [ 21/Oct/12 ]

right, so the problem should be fixed by a correct sequence of ->ldo_process_config(LCFG_CLEANUP) in MDD and OSD.

Comment by nasf (Inactive) [ 22/Oct/12 ]

The root cause is that LFSCK should be stopped before osd_shutdown(), but currently it is not.

This is the patch:
http://review.whamcloud.com/#change,4217,set3
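
To illustrate the ordering problem described above, here is a minimal toy model in C. It is not the Lustre code and not the content of the patch: every name (toy_dev, toy_iter, toy_lfsck_start, toy_lfsck_stop, toy_osd_cleanup, toy_shutdown) is hypothetical, and the cleanup step returns -EBUSY where the real osd_scrub_cleanup() would trip the assertion on dev->od_otable_it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model only: none of these names are real Lustre symbols. */
struct toy_iter {
    int running;                /* 1 while the scan "thread" is active */
};

struct toy_dev {
    struct toy_iter *otable_it; /* stands in for dev->od_otable_it */
};

/* Starting the scan attaches an iterator to the device. */
static void toy_lfsck_start(struct toy_dev *dev, struct toy_iter *it)
{
    assert(dev->otable_it == NULL);
    it->running = 1;
    dev->otable_it = it;
}

/* Stopping the scan detaches the iterator; a no-op if nothing is attached. */
static void toy_lfsck_stop(struct toy_dev *dev)
{
    if (dev->otable_it != NULL) {
        dev->otable_it->running = 0;
        dev->otable_it = NULL;
    }
}

/* Cleanup requires that no iterator is attached.
 * Returns -EBUSY in the case where the real code would assert. */
static int toy_osd_cleanup(const struct toy_dev *dev)
{
    return dev->otable_it == NULL ? 0 : -EBUSY;
}

/* Shutdown in the order asked for above: stop the scan, then clean up. */
static int toy_shutdown(struct toy_dev *dev)
{
    toy_lfsck_stop(dev);
    return toy_osd_cleanup(dev);
}
```

Calling toy_osd_cleanup() while an iterator is still attached returns the error, which corresponds to the failure in the trace. Going through toy_shutdown() stops the scan first, so cleanup succeeds.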

Comment by Alex Zhuravlev [ 29/Oct/12 ]

landed
