
opening and closing file can generate 'unreclaimable slab' space

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.1
    • Affects Version/s: Lustre 2.1.3, Lustre 2.1.4
    • 3
    • 6116

    Description

      We have a lot of nodes with a large amount of unreclaimable memory (over 4GB). Whatever we try (manually shrinking the caches, clearing the LRU locks, ...), the memory can't be recovered. The only way to get the memory back is to umount the Lustre filesystem.
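The symptom described above can be observed on any Linux client; the sketch below shows the meminfo counter to watch, with the recovery attempts from the description (which did not help here) left as comments since they need root and a Lustre mount:

```shell
# Unreclaimable slab, in kB; on the affected clients this kept growing
grep '^SUnreclaim:' /proc/meminfo
# Recovery attempts that did NOT release the memory in this bug:
#   echo 3 > /proc/sys/vm/drop_caches                  # shrink caches
#   lctl set_param ldlm.namespaces.*.lru_size=clear    # clear LRU locks
```

Only unmounting the filesystem released the memory, which is what pointed at requests being pinned rather than at cached data.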

      After some troubleshooting, I was able to write a small reproducer where I just open(2) then close(2) files with O_RDWR (my reproducer opens thousands of files to emphasize the issue).

      Two programs are attached:

      • gentree.c (cc -o gentree gentree.c -lpthread) to generate a tree of known files (no need to use readdir in reproducer.c)
      • reproducer.c (cc -o reproducer reproducer.c -lpthread) to reproduce the issue.
        The macro BASE_DIR has to be adjusted according to the local cluster configuration (you should provide the name of a directory located on a Lustre filesystem).

      There is no link between the 2 phases, as rebooting the client between gentree & reproducer doesn't avoid the problem. Running gentree (which opens as many files as reproducer) doesn't show the issue.

      Attachments

        1. gentree.c
          3 kB
        2. logs_01.tar.gz
          7 kB
        3. reproducer.c
          2 kB

        Issue Links

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.5.1 and 2.6

            bogl Bob Glossman (Inactive) added a comment - backport to b2_4 http://review.whamcloud.com/8277
            pjones Peter Jones added a comment -

            Pushing to 2.5.1 because it seems that the patch needs more work


            niu Niu Yawei (Inactive) added a comment -

            Seems the server code has to be changed. Anyway, I introduced a new DISP bit (DISP_OPEN_STRIPE) to identify an open that creates stripes; this way, the server/protocol changes are smaller than in the former patch (server returning the on-disk transno). Mike, could you take a look at the patch? Thanks

            niu Niu Yawei (Inactive) added a comment -

            Mike, I realized that not only an open that creates an object (with DISP_OPEN_CREATE) needs to be replayed; an open that creates stripe data needs to be replayed as well (see mdt_create_data()), and I don't see how to identify such an open on the client. Any good idea?

            tappro Mikhail Pershin added a comment -

            Niu, I am not so sure it will be easy to implement; this is just a possible way to go, but if it works, that would be good.

            niu Niu Yawei (Inactive) added a comment -

            Mike, your solution looks fine to me, I'll update the patch in this way soon. Thanks.

            tappro Mikhail Pershin added a comment -

            Niu, in fact we don't need to wait for commit in the case of a closed open (no create), and exactly that case causes this bug with unreclaimable space. And I don't see why server help is needed here - the client knows there was a close and knows this is a non-create open - that is enough to decide to drop the request from the replay queue. I am not sure though how easy it is to distinguish the non-create case from OPEN-CREATE; at first sight we need to check the disposition flag for the DISP_OPEN_CREATE bit. So a possible solution can be:
            1) after the open reply, check the disposition for the DISP_OPEN_CREATE bit and save that information in md_open_data, OR just take the disposition from the already saved mod_open_req during mdc_close()
            2) in mdc_close(), mod->mod_open_req->rq_replay is already set to 0; we also set mod_open_req->rq_commit_nowait or some other new flag for a non-create open.
            3) in ptlrpc_free_committed(), check that rq_commit_nowait flag and free such a request immediately, no matter what transno it has.

            Will that work? Am I missing something?

            bzzz Alex Zhuravlev added a comment -

            Yes, I also remember we discussed a way to implement the openhandle as an LDLM lock and let LDLM re-enqueue locks at recovery.

            niu Niu Yawei (Inactive) added a comment -

            Niu, exactly, and I propose to make that 'existing code' able to drop a closed open regardless of its transno, because it doesn't make sense after close. The current solution is still based on hacking the server side in various ways. In fact this can be solved at the client side, just by letting closed OPENs be dropped despite their transno.

            Mike, I think there is no way to achieve this without server-side changes. I can think of two ways so far:

            1. The server treats open/close as committed transactions and returns the client both the last committed transno & the last real transno (on-disk transno); the client drops committed open & close requests immediately after close. That's what I did in my patch.

            2. The server assigns no transno for open/close, and the client open-replay mechanism must be adapted to this change (like Siyao mentioned in the review comment: track the open handle in the fs layer, rebuild the request when replaying the open, and some other changes to the open, close, and open-lock code could be required).

            The second solution looks cleaner to me, but it requires more code changes, and it'll be a little tricky to handle open-create & open differently on the client side.

            People

              hongchao.zhang Hongchao Zhang
              louveta Alexandre Louvet (Inactive)
              Votes: 1
              Watchers: 31

              Dates

                Created:
                Updated:
                Resolved: