[LU-4272] lu_device_fini()) ASSERTION( cfs_atomic_read(&d->ld_ref) == 0 ) failed from lovsub_device_free Created: 19/Nov/13  Updated: 29/Aug/15  Resolved: 29/Aug/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Critical
Reporter: Oleg Drokin Assignee: Niu Yawei (Inactive)
Resolution: Duplicate Votes: 0
Labels: nfs

Issue Links:
Related
is related to LU-6794 memory leak in Lustre NFS support cod... Resolved
is related to LU-2613 opening and closing file can generate... Resolved
Severity: 3
Rank (Obsolete): 11737

 Description   

Apparently we have a problem with inode cleanup on unmount, exposed at least by NFS.

The easiest reproducer I have right now is this:

sh llmount.sh
echo 0 >/proc/sys/lnet/panic_on_lbug
service nfs start
mount localhost:/mnt/lustre /mnt/nfs -t nfs
touch /mnt/nfs/file
ls -l /mnt/lustre
cp -f /etc/passwd /mnt/nfs/file

umount /mnt/nfs
service nfs stop
sh llmountcleanup.sh

Reading the logs, it appears that after the write the file's dentry is deleted, but because we hold locks on the inode, the inode stays in place. Without NFS, the inode also goes away after sync.
Then on unmount the file's inode is found, but it is considered dirty and is not cleaned, so the layout lock and objects remain in place.
When we get to kill_super, it blows up trying to clean up lov.
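To make the failure mode concrete, here is a hypothetical, heavily simplified C model of the invariant checked by the assertion in the title. All names (`model_device`, `model_object_get`, `model_object_put`, `model_device_fini_ok`, `model_unmount`) are made up for illustration. This is not Lustre's real lu_device or lovsub code. It only assumes, as described above, that an object left on a cached inode keeps a reference on the device until that inode is cleaned.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of the refcount invariant.
 * This is NOT Lustre's real lu_device/lovsub code. */
struct model_device {
	int ref; /* stands in for ld_ref: one reference per live object */
};

/* An object attached to a cached inode pins the device. */
static void model_object_get(struct model_device *d)
{
	d->ref++;
}

static void model_object_put(struct model_device *d)
{
	assert(d->ref > 0); /* guard against dropping a reference twice */
	d->ref--;
}

/* Same invariant as ASSERTION( cfs_atomic_read(&d->ld_ref) == 0 ):
 * returns false where the real code would LBUG instead. */
static bool model_device_fini_ok(const struct model_device *d)
{
	return d->ref == 0;
}

/* Simulate one unmount. If the inode is skipped as "dirty", its
 * object reference is never dropped before the device is finalized. */
static bool model_unmount(bool inode_cleaned)
{
	struct model_device dev = { .ref = 0 };

	model_object_get(&dev);         /* file written through NFS */
	if (inode_cleaned)
		model_object_put(&dev); /* normal path: inode evicted */
	return model_device_fini_ok(&dev);
}
```

In this model, `model_unmount(true)` returns true. `model_unmount(false)`, the path where the dirty inode is skipped, returns false. That corresponds to the point where the real code would hit the assertion.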



 Comments   
Comment by Peter Jones [ 28/Nov/13 ]

Niu

Is this related to your patch for LU-2613?

Peter

Comment by Andreas Dilger [ 28/Nov/13 ]

This might be a side effect of the LU-2613 patch landing.

Comment by Niu Yawei (Inactive) [ 29/Nov/13 ]

I can't reproduce it with current master. Oleg, did you test it with clean master?

Comment by Jodi Levi (Inactive) [ 12/Mar/14 ]

Has this been reproduced? Does the problem still exist or should this ticket be closed?

Comment by Oleg Drokin [ 25/Mar/14 ]

The problem still exists. I just tried all the steps on current master and it still fails in exactly the same way.

Comment by Niu Yawei (Inactive) [ 27/Mar/14 ]

Strange. I tried again, but I still can't reproduce it in my test environment. Oleg, could you post the log here for analysis? Thank you.

Comment by Jodi Levi (Inactive) [ 23/May/14 ]

Oleg,
Is this problem still occurring? If so, would you be able to provide the logs for Niu?
Thank you!

Comment by Andreas Dilger [ 25/Nov/14 ]

Oleg, if you are no longer able to reproduce this, please close it with "Cannot Reproduce" to get it off the tracking list.

Comment by Joseph Gmitter (Inactive) [ 16/Jul/15 ]

Oleg is testing patch from LU-6794 to see if it resolves this issue. He will close this as a duplicate if it is resolved.

Comment by Oleg Drokin [ 29/Aug/15 ]

It was fixed by LU-6794, which is essentially the same patch.

Generated at Sat Feb 10 01:41:14 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.