[LU-7184] (lod_dev.c:1493:lod_device_free()) ASSERTION( atomic_read(&lu->ld_ref) == 0 ) failed: lu is ffff88010cf8a000 Created: 18/Sep/15 Updated: 01/Jun/16 Resolved: 14/Oct/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Jeremy Filizetti | Assignee: | John Hammond |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | SSK, kerberos |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Setting the security flavor, similar to below, causes an LBUG when the MDT is mounted again:
/usr/lib64/lustre/tests/llmount.sh
<4>Lustre: server umount lustre-MDT0000 complete |
| Comments |
| Comment by Jeremy Filizetti [ 18/Sep/15 ] |
|
Forgot to mention counter for ld_ref is 2:
crash> struct lu_device 0xffff88010cf8a000
struct lu_device {
ld_ref = {
counter = 2
},
ld_type = 0xffffffffa0f12440,
ld_ops = 0xffffffffa0f08e40,
ld_site = 0xffff880118546098,
ld_proc_entry = 0x0,
ld_obd = 0xffff88011a186038,
ld_reference = {<No data fields>},
ld_linkage = {
next = 0xffff88010cf8a030,
prev = 0xffff88010cf8a030
}
}
|
| Comment by Oleg Drokin [ 18/Sep/15 ] |
|
So it looks like an error path, either for the key or in osp_obd_connect or somewhere along the call chain, forgets to release a reference to the lu device. Somebody needs to go through there, find the place, and add the missing decref. |
| Comment by Joseph Gmitter (Inactive) [ 18/Sep/15 ] |
|
Hi John, |
| Comment by John Hammond [ 21/Sep/15 ] |
|
In progress. The references are from opd_last_used_oid_file and opd_last_used_seq_file. |
| Comment by John Hammond [ 22/Sep/15 ] |
|
Di, during MDT mount, if osp_init() succeeds but lod_add_device() fails before adding the OSP device to the LOD, then we hit this, since the OSP device still holds references to two objects from the MDT site (opd_last_used_oid_file and opd_last_used_seq_file). Can the finding/creation of these two objects be moved out of osp_init0() and into some function later in the setup path? |
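The failure sequence described above can be modelled with a small standalone C sketch. This is not Lustre code: `toy_dev`, `toy_osp`, `toy_osp_init0()` and `toy_failed_add_leak()` are hypothetical names, and the assumption that each held object pins the parent device exactly once is a simplification chosen to match the `counter = 2` seen in the crash dump.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device with a reference count (like ld_ref). */
struct toy_dev {
	int ref;
};

/* Hypothetical stand-in for the OSP device and the two objects it holds. */
struct toy_osp {
	struct toy_dev *parent;
	int has_oid_file;	/* models opd_last_used_oid_file */
	int has_seq_file;	/* models opd_last_used_seq_file */
};

/* Models init: looking up the two objects pins the parent device twice. */
void toy_osp_init0(struct toy_osp *osp, struct toy_dev *parent)
{
	assert(osp != NULL && parent != NULL);
	osp->parent = parent;
	parent->ref++;
	osp->has_oid_file = 1;
	parent->ref++;
	osp->has_seq_file = 1;
}

/*
 * Models a mount where init succeeds but the later "add" step fails and
 * returns without unwinding. Returns the reference count left behind.
 */
int toy_failed_add_leak(struct toy_dev *parent)
{
	struct toy_osp osp = { 0 };

	toy_osp_init0(&osp, parent);
	/* The add step fails here; nothing releases the two references. */
	return parent->ref;
}
```

In this model, freeing the parent device after the failed add would trip an assertion equivalent to the one in the ticket title, because the count is still nonzero.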
| Comment by Di Wang [ 22/Sep/15 ] |
|
It looks like osp_shutdown() is not being called in this case, since the OSP is not added successfully. So it is not just osp_last_used_fini(); osp_sync_fini() and osp_precreate_fini() are not executed either. How about calling ldo_process_config(env, next, CLEANUP); in the lod_add_device() error handling path? |
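The suggestion above amounts to running the full teardown from the error path of the add step. A minimal standalone sketch of that pattern follows. It is not the merged patch and not Lustre code: every `toy_*` name is hypothetical, and the three `*_ready` flags merely stand in for the state that osp_last_used_fini(), osp_sync_fini() and osp_precreate_fini() would undo.

```c
#include <assert.h>

/* Hypothetical device with a reference count (like ld_ref). */
struct toy_dev {
	int ref;
};

/* Hypothetical OSP state; each flag models one sub-initialization. */
struct toy_osp {
	struct toy_dev *parent;
	int last_used_ready;
	int sync_ready;
	int precreate_ready;
};

void toy_osp_init(struct toy_osp *osp, struct toy_dev *parent)
{
	osp->parent = parent;
	parent->ref += 2;	/* two pinned objects, as in the ticket */
	osp->last_used_ready = 1;
	osp->sync_ready = 1;
	osp->precreate_ready = 1;
}

/* Stands in for processing a CLEANUP config: undo in reverse order. */
void toy_osp_cleanup(struct toy_osp *osp)
{
	osp->precreate_ready = 0;
	osp->sync_ready = 0;
	if (osp->last_used_ready) {
		osp->parent->ref -= 2;	/* drop both object references */
		osp->last_used_ready = 0;
	}
}

/* Returns 0 on success, -1 if the attach step fails. */
int toy_lod_add_device(struct toy_osp *osp, struct toy_dev *parent,
		       int fail_attach)
{
	int rc;

	toy_osp_init(osp, parent);
	rc = fail_attach ? -1 : 0;
	if (rc != 0)
		goto out_cleanup;
	return 0;

out_cleanup:
	/* Error path: tear down everything init set up. */
	toy_osp_cleanup(osp);
	return rc;
}

/* Models the check from the ticket title. */
void toy_dev_free(struct toy_dev *dev)
{
	assert(dev->ref == 0);
}
```

With the cleanup call in place, a failed `toy_lod_add_device()` leaves the parent's count at zero, so `toy_dev_free()` passes its assertion. The guard on `last_used_ready` keeps the cleanup safe to call more than once.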
| Comment by Gerrit Updater [ 24/Sep/15 ] |
|
John L. Hammond (john.hammond@intel.com) uploaded a new patch: http://review.whamcloud.com/16635 |
| Comment by Gerrit Updater [ 14/Oct/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16635/ |
| Comment by Joseph Gmitter (Inactive) [ 14/Oct/15 ] |
|
Landed for 2.8 |