[LU-5708] Cannot get rid of orphaned objects Created: 06/Oct/14  Updated: 06/Jun/15  Resolved: 06/Jun/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.3
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Oliver Mangold Assignee: Dmitry Eremin (Inactive)
Resolution: Incomplete Votes: 0
Labels: None

Attachments: File lfsck.log    
Issue Links:
Related
is related to LU-6414 du and df disagree for used space Resolved
Severity: 3
Rank (Obsolete): 15997

 Description   

After some testing and benchmarking of a fresh filesystem, we found 1.4TB worth of orphaned objects. What we did was mostly several runs of tar (kernel tree extraction) and IOR benchmark.

After deleting all temporary files from these tests, we end up with a Lustre filesystem containing just 108K of file data:

[root@n0101 ~]# du -sh /mnt/lustre/lnec/
108K /mnt/lustre/lnec/

but according to df, 1.4TB is in use:

[root@n0101 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
...
10.4.0.102@o2ib:10.4.0.101@o2ib:/lnec 175T 1.4T 174T 1% /mnt/lustre/lnec

Mounting one of the OSTs as ldiskfs and checking the contents, we find that it contains 228GB worth of objects:

oss01:ost0# du -sh --total O/*
85G O/0
136K O/1
136K O/10
144G O/2
136K O/200000003
228G total

We have 6 OSTs in total, and all of them are in a comparable state, so the objects add up to 1.4TB.
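
For reference, each OST was inspected by mounting its backing device as ldiskfs, roughly like this (the device path and the read-only option are placeholders, not copied from the actual session):

oss01:~# mount -t ldiskfs -o ro /dev/<ost0-device> /mnt/lustre/ost0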

Trying 'lctl lfsck_start' on the MDT or OSTs doesn't change that.
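
The attempted invocation was roughly of this form (the target name is a placeholder based on the fsname 'lnec'; the same was tried on the OST targets, and exact options may differ on 2.5):

mds02:~# lctl lfsck_start -M lnec-MDT0000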



 Comments   
Comment by Peter Jones [ 06/Oct/14 ]

Dmitry

Could you please help with this issue?

Thanks

Peter

Comment by Oliver Mangold [ 06/Oct/14 ]

Maybe I should mention that we did several runs of obdfilter-survey. Could that be the reason for the orphaned objects?
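
For context, obdfilter-survey exercises the OST devices directly through the echo client and creates and destroys objects on them; a typical invocation looks something like this (the parameters shown are illustrative values from the manual, not our exact settings):

oss01:~# nobjhi=2 thrhi=2 size=1024 case=disk sh obdfilter-survey

If such a run is aborted partway through, the objects it created may be left behind on the OSTs.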

Comment by Dmitry Eremin (Inactive) [ 06/Oct/14 ]

The used space is consumed by the journal and by directory sizes after many files have been created on the OSTs. For example, after formatting a new Lustre file system you can see the following:

# ls -la /tmp/lustre-*
-rw------- 1 root root 204800000 Oct  6 20:00 /tmp/lustre-mdt1
-rw------- 1 root root 204800000 Oct  6 20:02 /tmp/lustre-ost1
-rw------- 1 root root 204800000 Oct  6 20:00 /tmp/lustre-ost2
# df -h
Filesystem        Size  Used Avail Use% Mounted on
/dev/loop1        147M   18M  120M  13% /mnt/mds1
/dev/loop2        184M   26M  148M  15% /mnt/ost1
/dev/loop3        184M   26M  148M  15% /mnt/ost2
vbox@tcp:/lustre  367M   51M  296M  15% /mnt/lustre

# mount -t ldiskfs -o loop /tmp/lustre-ost1 /mnt/ost1
# du -sh --total O/*
136K	O/0
136K	O/1
136K	O/10
136K	O/200000003
544K	total

You can also check a directory's size after creating and removing many files:

# mkdir test
# du -sh test
4.0K    test
# for i in $(seq 1 1000); do touch test/$i; done
# du -sh test
20K     test
# rm -rf test/*
# du -sh test
20K     test
Comment by Oliver Mangold [ 07/Oct/14 ]

@Dmitry: I don't understand. What's your point? That journals and stuff take a few MB, even on an empty filesystem?

I lost 1.4TB, and it is definitely from files on the OSTs, apparently all with sizes that are a multiple of 384MB:

oss01:~# ls -lh /mnt/lustre/ost0/O/0/d*/*
-rw-rw-rw- 1 500 nagiocmd 768M Oct 1 23:31 /mnt/lustre/ost0/O/0/d0/4364448
-rw-rw-rw- 1 500 nagiocmd 384M Oct 1 23:41 /mnt/lustre/ost0/O/0/d0/4364480
-rw-rw-rw- 1 500 nagiocmd 768M Oct 2 00:26 /mnt/lustre/ost0/O/0/d0/4364864
-rwSrwSrw- 1 root root 0 Jan 1 1970 /mnt/lustre/ost0/O/0/d0/4366816
-rwSrwSrw- 1 root root 0 Jan 1 1970 /mnt/lustre/ost0/O/0/d0/4367200
-rwSrwSrw- 1 root root 0 Jan 1 1970 /mnt/lustre/ost0/O/0/d0/4367232
-rw-rw-rw- 1 500 nagiocmd 3.0G Oct 1 23:21 /mnt/lustre/ost0/O/0/d10/4364426
-rw-rw-rw- 1 500 nagiocmd 384M Oct 1 23:41 /mnt/lustre/ost0/O/0/d10/4364458
-rwSrwSrw- 1 root root 0 Jan 1 1970 /mnt/lustre/ost0/O/0/d10/4366794
-rwSrwSrw- 1 root root 0 Jan 1 1970 /mnt/lustre/ost0/O/0/d10/4367210
-rwSrwSrw- 1 root root 0 Jan 1 1970 /mnt/lustre/ost0/O/0/d1/1
-rw-rw-rw- 1 500 nagiocmd 3.0G Oct 1 23:45 /mnt/lustre/ost0/O/0/d11/4364491
-rw-rw-rw- 1 500 nagiocmd 768M Oct 2 00:05 /mnt/lustre/ost0/O/0/d11/4364779
-rw-rw-rw- 1 500 nagiocmd 768M Oct 2 00:21 /mnt/lustre/ost0/O/0/d11/4364843
-rwSrwSrw- 1 root root 0 Jan 1 1970 /mnt/lustre/ost0/O/0/d11/4366795
-rwSrwSrw- 1 root root 0 Jan 1 1970 /mnt/lustre/ost0/O/0/d11/4367211
-rw-rw-rw- 1 500 nagiocmd 3.0G Oct 1 23:21 /mnt/lustre/ost0/O/0/d12/4364428
-rw-rw-rw- 1 500 nagiocmd 384M Oct 1 23:41 /mnt/lustre/ost0/O/0/d12/4364460
-rw-rw-rw- 1 500 nagiocmd 3.0G Oct 1 23:45 /mnt/lustre/ost0/O/0/d12/4364492
-rw-rw-rw- 1 500 nagiocmd 768M Oct 2 00:05 /mnt/lustre/ost0/O/0/d12/4364780
-rw-rw-rw- 1 500 nagiocmd 768M Oct 2 00:21 /mnt/lustre/ost0/O/0/d12/4364844
-rwSrwSrw- 1 root root 0 Jan 1 1970 /mnt/lustre/ost0/O/0/d12/4366796
-rwSrwSrw- 1 root root 0 Jan 1 1970 /mnt/lustre/ost0/O/0/d12/4367212
-rw-rw-rw- 1 500 nagiocmd 3.0G Oct 1 23:45 /mnt/lustre/ost0/O/0/d13/4364493
-rw-rw-rw- 1 500 nagiocmd 768M Oct 2 00:05 /mnt/lustre/ost0/O/0/d13/4364781
-rw-rw-rw- 1 500 nagiocmd 768M Oct 2 00:21 /mnt/lustre/ost0/O/0/d13/4364845
-rwSrwSrw- 1 root root 0 Jan 1 1970 /mnt/lustre/ost0/O/0/d13/4366797
-rwSrwSrw- 1 root root 0 Jan 1 1970 /mnt/lustre/ost0/O/0/d13/4367213
-rw-rw-rw- 1 500 nagiocmd 768M Oct 1 23:58 /mnt/lustre/ost0/O/0/d1/4364513
-rw-rw-rw- 1 500 nagiocmd 768M Oct 2 00:26 /mnt/lustre/ost0/O/0/d1/4364865
-rwSrwSrw- 1 root root 0 Jan 1 1970 /mnt/lustre/ost0/O/0/d1/4366785
-rwSrwSrw- 1 root root 0 Jan 1 1970 /mnt/lustre/ost0/O/0/d1/4366817
-rwSrwSrw- 1 root root 0 Jan 1 1970 /mnt/lustre/ost0/O/0/d1/4367201
-rwSrwSrw- 1 root root 0 Jan 1 1970 /mnt/lustre/ost0/O/0/d1/4367233
... more of the same ...

Comment by Erich Focht [ 07/Oct/14 ]

Is there a simple way (from user space) to find, e.g., the FID a particular object (file) belongs to? What is the object ID of something like
/mnt/lustre/ost0/O/0/d1/4364513?

Comment by Dmitry Eremin (Inactive) [ 07/Oct/14 ]

OK. Can you run the following sequence of commands and attach the results?

# e2fsck -v -f -n --mdsdb /tmp/mdsdb <MDS-device>
# e2fsck -v -f -n --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-0 <OST-device-0>
# e2fsck -v -f -n --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-1 <OST-device-1>
# e2fsck -v -f -n --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-2 <OST-device-2>
# e2fsck -v -f -n --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-3 <OST-device-3>
# e2fsck -v -f -n --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-4 <OST-device-4>
# e2fsck -v -f -n --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-5 <OST-device-5>
# lfsck -v -d --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-0 /tmp/ostdb-1 /tmp/ostdb-2 /tmp/ostdb-3 /tmp/ostdb-4 /tmp/ostdb-5 /mnt/lustre
Comment by Oliver Mangold [ 07/Oct/14 ]

Okay, lfsck seems to run through and claims to remove several objects (see the attached log), but the files still seem to be there:

mds02:~# df -h
Filesystem Size Used Avail Use% Mounted on
10.4.0.102@o2ib0:10.4.0.101@o2ib0:/lnec 175T 1.4T 174T 1% /rw/mnt/lustre/lnec

Comment by Dmitry Eremin (Inactive) [ 07/Oct/14 ]

I suppose the space is still reserved by the removed data and will be reused later. Why is it so critical for you to see a small number of used blocks in the df output? By the way, what was the number for a freshly formatted FS?

Comment by Oleg Drokin [ 07/Oct/14 ]

So your objects on the OSTs might still be referenced by something on the MDS, be it real files or not.

You can use the ll_decode_filter_fid tool from the Lustre utilities to see what the supposed parent object FID on the MDS is for an ldiskfs object, and then look it up there.
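
For example, on the ldiskfs-mounted OST it would look something like this (the output format is approximate and the FID value is purely illustrative):

oss01:~# ll_decode_filter_fid /mnt/lustre/ost0/O/0/d1/4364513
/mnt/lustre/ost0/O/0/d1/4364513: parent=[0x200000400:0x2345:0x0] stripe=0

and the parent FID can then be resolved from a client:

[root@n0101 ~]# lfs fid2path /mnt/lustre/lnec [0x200000400:0x2345:0x0]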

Comment by Oliver Mangold [ 08/Oct/14 ]

I tried to resolve all objects with ll_decode_filter_fid. What I got was:

1. lots of empty object files which apparently do not have a FID; for these, ll_decode_filter_fid returns 'error reading fid: No data available'
2. a bunch of non-empty files returning FIDs. These are the ones that seem to use up the disk space. I tried to resolve the FIDs with 'lfs fid2path' and got:
2a. most of these objects cannot be resolved; 'lfs fid2path' returns 'error on FID xxx: Invalid argument'
2b. a few which return a path to an actually existing file

So how do I clean this up? Can I delete all files for cases (1) and (2a)?
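
For reference, a rough two-step sketch of how such a classification could be scripted (the sed pattern is an assumption based on the 'parent=[...]' output format; nothing here deletes anything). First, dump the parent FIDs on the OSS; objects from case (1) only produce the 'No data available' error and so have no parent= line:

oss01:~# ll_decode_filter_fid /mnt/lustre/ost0/O/0/d*/* > /tmp/ost0-fids.txt 2>&1

Then resolve each parent FID from a client; unresolved FIDs correspond to case (2a):

[root@n0101 ~]# sed -n 's/.*parent=\(\[[^]]*\]\).*/\1/p' /tmp/ost0-fids.txt | while read fid; do lfs fid2path /mnt/lustre/lnec "$fid" 2>/dev/null || echo "unresolved: $fid"; done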

Comment by John Fuchs-Chesney (Inactive) [ 19/May/15 ]

Hi Oliver,

Can you tell us the name of the end customer on this ticket please?

Is this still a relevant issue for you?

Many thanks,
~ jfc.

Comment by Oliver Mangold [ 19/May/15 ]

This was a problem we encountered on our own benchmark system. It is nothing urgent, but we thought we'd report it anyway, so you know that there is an issue.

Comment by John Fuchs-Chesney (Inactive) [ 06/Jun/15 ]

Thanks Oliver,

I'm marking it as resolved/incomplete. It will remain visible to all and can still be searched, if needed.
~ jfc.
