Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.8.0
    • Affects Version/s: Lustre 2.6.0, Lustre 2.7.0, Lustre 2.5.3, Lustre 2.5.4
    • Environment: OpenSFS Cluster running Lustre master build #1914
      Combined MGS/MDS, one OSS with two OSTs and one client.
    • Severity: 3
    • 12884

    Description

      I’ve created a brand new file system on a freshly installed system on the OpenSFS cluster. I run dd a few times and everything looks fine. On the OSS, I run

      # lctl set_param fail_loc=0x1610
      fail_loc=0x1610
      # lctl get_param fail_loc
      fail_loc=5648
      

      fail_loc=0x1610 (OBD_FAIL_LFSCK_DANGLING) is supposed to create files with dangling references; note that get_param prints the value in decimal, and 5648 == 0x1610, so the fault is set as intended. Then I run dd

      # dd if=/dev/urandom of=/lustre/scratch/a_3 count=1 bs=64k
      1+0 records in
      1+0 records out
      65536 bytes (66 kB) copied, 0.0157425 s, 4.2 MB/s
      

      and get no errors for the first 50 or so files written. After that, every dd command produces the following error

      # dd if=/dev/urandom of=/lustre/scratch/m_502 count=1 bs=64k
      dd: writing `/lustre/scratch/m_502': Cannot allocate memory
      1+0 records in
      0+0 records out
      0 bytes (0 B) copied, 0.292437 s, 0.0 kB/s
      

      I run LFSCK on the MDS

      # lctl lfsck_start -M scratch-MDT0000 -A --reset --type layout
      Started LFSCK on the device scratch-MDT0000: layout.
      # lctl get_param -n mdd.scratch-MDT0000.lfsck_layout
      

      and see that some number of dangling references were repaired. Up to this point, all of this is expected behavior.

      The problem happens when I try to turn dangling references off. On the OST, I run "lctl set_param fail_loc=0" and "fail_loc=0" is returned. I then run dd on the client, get the same memory-allocation error as above, and running LFSCK again finds and corrects more dangling references. I'm told that files could still be created with dangling references due to preallocation, but that after 32 or so files this should stop.

      After writing about 30 files, the dd command on the client froze, the OST crashed, and on the OST console I see

      Message from syslogd@c11-ib at Feb 27 20:42:26 ...
       kernel:LustreError: 2082:0:(ldlm_lib.c:1311:target_destroy_export()) ASSERTION( atomic_read(&exp->exp_cb_count) == 0 ) failed: value: 1
      
      Message from syslogd@c11-ib at Feb 27 20:42:26 ...
       kernel:LustreError: 2082:0:(ldlm_lib.c:1311:target_destroy_export()) LBUG
      

      The OST came back on-line after a few minutes. I’ve repeated this twice on two different clean file systems.

      Attachments

        Issue Links

          Activity

            [LU-4688] target_destroy_export() LBUG
            pjones Peter Jones added a comment -

            Landed for 2.8


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13706/
            Subject: LU-4688 mdt: remove export_put() from mdt_export_evict()
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 76a8694321b07c360b238b73676928401ce837f8

            tappro Mikhail Pershin added a comment -

            I hit this issue during one of the Maloo runs for an unrelated patch: https://testing.hpdd.intel.com/test_sessions/a37ccdb6-b03d-11e4-a3a2-5254006e85c2 and did some investigation. I think the cause is the extra class_export_put() call in mdt_export_evict(). I am not sure whether it was wrong from the moment that code was added, or whether something changed later while this code stayed unfixed. Everywhere else that class_fail_export() is called, it is followed by class_export_put(), but only because the export reference was taken in the same function for use by class_fail_export(). mdt_export_evict(), however, drops the export reference taken by its caller, so the caller drops it again and causes this issue.

            gerrit Gerrit Updater added a comment -

            Mike Pershin (mike.pershin@intel.com) uploaded a new patch: http://review.whamcloud.com/13706
            Subject: LU-4688 mdt: remove export_put() from mdt_export_evict()
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 9b52a1ae7aad617bdf95d88bf1d2ae9648733392
            yujian Jian Yu added a comment - Another instance on Lustre b2_5 branch: https://testing.hpdd.intel.com/test_sets/5651325e-805c-11e4-a434-5254006e85c2
            yujian Jian Yu added a comment (edited) - More instances on Lustre b2_5 branch: https://testing.hpdd.intel.com/test_sets/ab3fa49a-79df-11e4-807e-5254006e85c2 https://testing.hpdd.intel.com/test_sets/52ae6fde-7ac8-11e4-925f-5254006e85c2 https://testing.hpdd.intel.com/test_sets/82c46bfa-7de4-11e4-aa98-5254006e85c2
            yujian Jian Yu added a comment -

            Lustre Build: https://build.hpdd.intel.com/job/lustre-b2_5/100/
            Distro/Arch: RHEL6.5/x86_64

            replay-dual test 24 hit the same failure again:
            https://testing.hpdd.intel.com/test_sets/a6c1b3de-68c5-11e4-a63a-5254006e85c2

            green Oleg Drokin added a comment -

            I noticed in my logs an interesting crash of this nature where a second racing thread is also unhappy.
            Hopefully this is enough to identify the problematic code:

            <0>[60318.480420] LustreError: 31766:0:(ldlm_lib.c:1298:target_destroy_export()) ASSERTION( atomic_read(&exp->exp_cb_count) == 0 ) failed: value: 1
            <0>[60318.480900] LustreError: 13152:0:(genops.c:831:class_export_put()) ASSERTION( __v > 0 && __v < ((int)0x5a5a5a5a5a5a5a5a) ) failed: value: 0
            <0>[60318.480903] LustreError: 13152:0:(genops.c:831:class_export_put()) LBUG
            <4>[60318.480904] Pid: 13152, comm: jbd2/loop0-8
            <4>[60318.480905] 
            <4>[60318.480906] Call Trace:
            <4>[60318.480919]  [<ffffffffa08348a5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
            <4>[60318.480929]  [<ffffffffa0834ea7>] lbug_with_loc+0x47/0xb0 [libcfs]
            <4>[60318.480958]  [<ffffffffa094a251>] class_export_put+0x251/0x2c0 [obdclass]
            <4>[60318.481001]  [<ffffffffa11fc424>] tgt_cb_new_client+0x154/0x2a0 [ptlrpc]
            <4>[60318.481010]  [<ffffffffa071d9b4>] osd_trans_commit_cb+0xb4/0x350 [osd_ldiskfs]
            <4>[60318.481022]  [<ffffffffa06caae9>] ldiskfs_journal_commit_callback+0x89/0xc0 [ldiskfs]
            <4>[60318.481026]  [<ffffffffa0368b77>] jbd2_journal_commit_transaction+0x1377/0x1690 [jbd2]
            <4>[60318.481030]  [<ffffffff8108292b>] ? try_to_del_timer_sync+0x7b/0xe0
            <4>[60318.481035]  [<ffffffffa036df67>] kjournald2+0xb7/0x210 [jbd2]
            <4>[60318.481038]  [<ffffffff81098f90>] ? autoremove_wake_function+0x0/0x40
            <4>[60318.481042]  [<ffffffffa036deb0>] ? kjournald2+0x0/0x210 [jbd2]
            <4>[60318.481044]  [<ffffffff81098c06>] kthread+0x96/0xa0
            <4>[60318.481047]  [<ffffffff8100c24a>] child_rip+0xa/0x20
            <4>[60318.481049]  [<ffffffff81098b70>] ? kthread+0x0/0xa0
            <4>[60318.481050]  [<ffffffff8100c240>] ? child_rip+0x0/0x20
            <4>[60318.481051] 
            <0>[60318.488119] Kernel panic - not syncing: LBUG
            <4>[60318.488121] Pid: 13152, comm: jbd2/loop0-8 Not tainted 2.6.32-rhe6.5-debug #4
            <4>[60318.488123] Call Trace:
            <4>[60318.488130]  [<ffffffff81512d12>] ? panic+0xa7/0x16f
            <4>[60318.488142]  [<ffffffffa0834efb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
            <4>[60318.488168]  [<ffffffffa094a251>] ? class_export_put+0x251/0x2c0 [obdclass]
            <4>[60318.488211]  [<ffffffffa11fc424>] ? tgt_cb_new_client+0x154/0x2a0 [ptlrpc]
            <4>[60318.488223]  [<ffffffffa071d9b4>] ? osd_trans_commit_cb+0xb4/0x350 [osd_ldiskfs]
            <4>[60318.488233]  [<ffffffffa06caae9>] ? ldiskfs_journal_commit_callback+0x89/0xc0 [ldiskfs]
            <4>[60318.488238]  [<ffffffffa0368b77>] ? jbd2_journal_commit_transaction+0x1377/0x1690 [jbd2]
            <4>[60318.488243]  [<ffffffff8108292b>] ? try_to_del_timer_sync+0x7b/0xe0
            <4>[60318.488248]  [<ffffffffa036df67>] ? kjournald2+0xb7/0x210 [jbd2]
            <4>[60318.488250]  [<ffffffff81098f90>] ? autoremove_wake_function+0x0/0x40
            <4>[60318.488254]  [<ffffffffa036deb0>] ? kjournald2+0x0/0x210 [jbd2]
            <4>[60318.488257]  [<ffffffff81098c06>] ? kthread+0x96/0xa0
            <4>[60318.488260]  [<ffffffff8100c24a>] ? child_rip+0xa/0x20
            <4>[60318.488262]  [<ffffffff81098b70>] ? kthread+0x0/0xa0
            <4>[60318.488264]  [<ffffffff8100c240>] ? child_rip+0x0/0x20
            
            yujian Jian Yu added a comment -

            I found that replay-dual was not in any review* test groups. Should we add it back?

            sarah Sarah Liu added a comment - Hit this on master branch build #2671: https://testing.hpdd.intel.com/test_sets/86f40836-472d-11e4-a9ec-5254006e85c2

            People

              tappro Mikhail Pershin
              jamesanunez James Nunez (Inactive)
              Votes: 0
              Watchers: 12

              Dates

                Created:
                Updated:
                Resolved: