[LU-7546] conf-sanity conf-sanity: lod_device_free()) ASSERTION( atomic_read(&lu->ld_ref) == 0 ) Created: 11/Dec/15 Updated: 27/Oct/17 Resolved: 18/Dec/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Lai Siyao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for wangdi <di.wang@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/b123def0-a00b-11e5-ae0a-5254006e85c2.

The sub-test conf-sanity failed with the following error:

03:27:23:Lustre: DEBUG MARKER: umount -d -f /mnt/mds1
03:27:23:LustreError: 10651:0:(client.c:1130:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff88006ed629c0 x1520238940666796/t0(0) o13->lustre-OST0001-osc-MDT0000@10.1.4.239@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
03:27:23:LustreError: 10651:0:(client.c:1130:ptlrpc_import_delay_req()) Skipped 7 previous similar messages
03:27:23:LustreError: 30453:0:(osp_object.c:587:osp_attr_get()) lustre-MDT0001-osp-MDT0000:osp_attr_get update error [0x240000402:0x1:0x0]: rc = -5
03:27:23:LustreError: 30453:0:(osp_object.c:587:osp_attr_get()) Skipped 5 previous similar messages
03:27:23:Lustre: lustre-MDT0000: Not available for connect from 10.1.4.239@tcp (stopping)
03:27:23:Lustre: Skipped 7 previous similar messages
03:27:23:LustreError: 10651:0:(client.c:1130:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff88006ed629c0 x1520238940666828/t0(0) o13->lustre-OST0005-osc-MDT0000@10.1.4.239@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
03:27:23:LustreError: 10651:0:(client.c:1130:ptlrpc_import_delay_req()) Skipped 4 previous similar messages
03:27:23:Lustre: lustre-MDT0000: Not available for connect from 10.1.4.244@tcp (stopping)
03:27:23:Lustre: Skipped 2 previous similar messages
03:27:23:LustreError: 10652:0:(client.c:1130:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff880044a39680 x1520238940666836/t0(0) o13->lustre-OST0007-osc-MDT0000@10.1.4.239@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
03:27:23:LustreError: 10636:0:(lod_dev.c:1578:lod_device_free()) ASSERTION( atomic_read(&lu->ld_ref) == 0 ) failed: lu is ffff880070784000
03:27:23:LustreError: 10636:0:(lod_dev.c:1578:lod_device_free()) LBUG

Please provide additional information about the failure here.

Info required for matching: conf-sanity conf-sanity |
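For reference, the check that trips is the reference-count assertion in lod_device_free(). The sketch below is a minimal approximation of the upstream lod_dev.c code (names follow the log and my recollection of the source, so details may differ): the LOD device may only be freed once every reference on its embedded lu_device has been dropped, so a leaked object reference leaves ld_ref non-zero and produces exactly this LBUG at unmount.

static struct lu_device *lod_device_free(const struct lu_env *env,
                                         struct lu_device *lu)
{
        struct lod_device *lod  = lu2lod_dev(lu);
        struct lu_device  *next = &lod->lod_child->dd_lu_dev;

        /* fires as "ASSERTION( atomic_read(&lu->ld_ref) == 0 ) failed:
         * lu is ffff880070784000" in the console log above */
        LASSERTF(atomic_read(&lu->ld_ref) == 0, "lu is %p\n", lu);

        dt_device_fini(&lod->lod_dt_dev);
        OBD_FREE_PTR(lod);
        return next;
}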
| Comments |
| Comment by Di Wang [ 14/Dec/15 ] |
|
Hmm, this assertion happened again in the test of the COS patch: https://testing.hpdd.intel.com/test_sets/7f0570ee-a242-11e5-afd0-5254006e85c2 So this failure might be related to the COS patch. |
| Comment by Lai Siyao [ 16/Dec/15 ] |
|
https://testing.hpdd.intel.com/test_sets/7f0570ee-a242-11e5-afd0-5254006e85c2 shows it's a single-MDS test, so I doubt this is CoS related (because on such a system CoS is not enabled/active). |
| Comment by James Nunez (Inactive) [ 16/Dec/15 ] |
|
More instances on master for |
| Comment by James Nunez (Inactive) [ 17/Dec/15 ] |
|
Some of these failures have part of the error message cut off. Only seen for

09:01:15:LustreError: 10624:0:(client.c:1130:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff88005b5300c0 x1520711352535976/t0(0) o13->lustre-OST0007-osc-MDT0000@10.1.4.239@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
09:01:15:LustreError: 10624:0:(client.c:1130:ptlrpc_import_delay_req()) Skipped 1 previous similar message
09:01:15:LustreError: 10609:0:(lod_dev.c:1578:lod_device_free()) ASSERTIONInitializing cgroup subsys cpuset
09:01:15:Initializing cgroup subsys cpu

2015-12-16 14:42:37 - sanity-quota test_7c - https://testing.hpdd.intel.com/test_sets/07a4d22e-a427-11e5-8701-5254006e85c2 |
| Comment by Di Wang [ 17/Dec/15 ] |
|
It seems a few mdt_reint_xxx changes did not release the object in the error handler path, which causes this assertion. For example:

@@ -1926,20 +2038,20 @@ static int mdt_reint_rename_internal(struct mdt_thread_info *info,
lh_newp = &info->mti_lh[MDT_LH_NEW];
mdt_lock_reg_init(lh_newp, LCK_EX);
- rc = mdt_object_lock(info, mnew, lh_newp,
- MDS_INODELOCK_LOOKUP |
- MDS_INODELOCK_UPDATE);
+ rc = mdt_reint_object_lock(info, mnew, lh_newp,
+ MDS_INODELOCK_LOOKUP |
+ MDS_INODELOCK_UPDATE,
+ cos_incompat);
if (rc != 0)
GOTO(out_unlock_old, rc);
/* get and save version after locking */
mdt_version_get_save(info, mnew, 3);
- } else if (rc != -EREMOTE && rc != -ENOENT) {
- GOTO(out_put_old, rc);
+ } else if (rc2 != -EREMOTE && rc2 != -ENOENT) {
+ GOTO(out_unlock_parents, rc = rc2); --> this should be out_put_old, instead of out_unlock_parents
} else {
lh_oldp = &info->mti_lh[MDT_LH_OLD];
mdt_lock_reg_init(lh_oldp, LCK_EX);
-
lock_ibits = MDS_INODELOCK_LOOKUP | MDS_INODELOCK_XATTR;
if (mdt_object_remote(msrcdir)) {
/* Enqueue lookup lock from the parent MDT */
I will update the patch soon.
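To make the failure mode concrete, here is a hypothetical, heavily simplified sketch of the leak pattern described above (this is not the actual patch; fid_new, out_put_new and the unlock labels are illustrative names, only mdt_reint_object_lock() and its arguments come from the diff): every reference taken with mdt_object_find() must be dropped with mdt_object_put() on every exit path, otherwise the leftover reference keeps ld_ref above zero and lod_device_free() hits the assertion at unmount.

        mnew = mdt_object_find(info->mti_env, info->mti_mdt, fid_new);
        if (IS_ERR(mnew))
                GOTO(out_unlock_parents, rc = PTR_ERR(mnew));

        rc = mdt_reint_object_lock(info, mnew, lh_newp,
                                   MDS_INODELOCK_LOOKUP |
                                   MDS_INODELOCK_UPDATE,
                                   cos_incompat);
        if (rc != 0)
                /* must unwind through out_put_new; jumping straight to
                 * out_unlock_parents would leak the mnew reference */
                GOTO(out_put_new, rc);
        ...
out_put_new:
        mdt_object_put(info->mti_env, mnew);    /* drops the reference taken by mdt_object_find() */
out_unlock_parents:
        mdt_object_unlock(info, msrcdir, lh_srcdirp, rc);   /* unlock the parent dir(s) locked earlier */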
| Comment by Lai Siyao [ 18/Dec/15 ] |
|
Di, thanks, I'll update the patch later. |
| Comment by Di Wang [ 18/Dec/15 ] |
|
Since the reason has been found, let's close this ticket for now, and the fix will be included in |