
"ls" hangs on a particular directory on production system

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version/s: Lustre 2.5.3, Lustre 2.8.0
    • Environment: OLCF Atlas production system: clients running 2.8.0+ (with patches), servers running 2.5.5+ (with patches)
    • Severity: 3

    Description

      On the atlas2 file system, we have a particular directory where any operation such as "ls" or "stat" completely hangs the process. No OS error or Lustre error is reported on the client side. On the server side, we did observe OI scrub messages a few times, which may suggest some MDS data inconsistency that the scrub is "trying" to fix, but to no avail. We can't correlate the two yet.

      The ops team collected traces on the client side as follows:

      mount -t lustre 10.36.226.77@o2ib:/atlas2 /lustre/atlas2 -o rw,flock,nosuid,nodev
      lctl set_param osc/*/checksums 0
      echo "all" > /proc/sys/lnet/debug
      echo "1024" > /proc/sys/lnet/debug_mb

      Step 1: lctl dk > /dev/null
      Step 2: cd /lustre/atlas2/path/to/offending_directory/
      Step 3: ls
      Step 4: Wait 30 seconds
      Step 5: lctl dk > atlas2-mds3_ls_for_fprof.out

      The resulting log is attached.
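
      For reference, a consolidated sketch of the same capture in step order; the only addition is running "ls" in the background so the shell can continue past the hang (host, mount options, and paths are the ones given above, including the placeholder directory path):

      # Sketch: client-side Lustre debug capture around the hanging "ls"
      mount -t lustre 10.36.226.77@o2ib:/atlas2 /lustre/atlas2 -o rw,flock,nosuid,nodev
      lctl set_param osc/*/checksums 0          # disable data checksums, as above
      echo "all" > /proc/sys/lnet/debug         # enable all debug flags
      echo "1024" > /proc/sys/lnet/debug_mb     # 1024 MB debug buffer
      lctl dk > /dev/null                       # flush old debug messages
      cd /lustre/atlas2/path/to/offending_directory/
      ls &                                      # backgrounded because it hangs
      sleep 30                                  # give the hang time to show in the log
      lctl dk > atlas2-mds3_ls_for_fprof.out    # dump the client debug trace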

          Activity

            simmonsja James A Simmons added a comment - lfsck run seems to have fixed this.

            simmonsja James A Simmons added a comment - Since this is the case we will leave this open until the lfsck run.
            yong.fan nasf (Inactive) added a comment (edited) - Sorry, that was misleading: "lctl set_param fail_loc=0x1505" is used to bypass the FID-in-dirent when it is broken. So if your system hung before but works well with "lctl set_param fail_loc=0x1505", it is quite possible that some FID-in-dirent is broken. In that case, you need to run the namespace LFSCK (with "lctl set_param fail_loc=0") to repair the FID-in-dirent. Otherwise, bypassing the FID-in-dirent will slow down lookup() performance.
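
            A minimal sketch of that repair sequence, run on the MDS; the MDT name atlas2-MDT0000 is taken from the oi_scrub output further down in this ticket, and the commands assume a server new enough to support namespace LFSCK (e.g. the 2.8 servers discussed elsewhere here):

            lctl set_param fail_loc=0                              # stop bypassing FID-in-dirent
            lctl lfsck_start -M atlas2-MDT0000 -t namespace        # start the namespace LFSCK
            lctl get_param -n mdd.atlas2-MDT0000.lfsck_namespace   # watch progress / final status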

            simmonsja James A Simmons added a comment - The command "lctl set_param fail_loc=0x1505" was run on the MDS and it fixed the problem. Thanks nasf

            dustb100 Dustin Leverman added a comment - Sorry for the delays, Nasf! We will be having an outage on Feb. 07 to test lustre-2.8 servers and will hopefully leave it in production. After this outage we can run an online lfsck to see if that resolves the problem.

            yong.fan nasf (Inactive) added a comment - Ping.

            yong.fan nasf (Inactive) added a comment - Dustin, do you have more logs or any feedback about trying "lctl set_param fail_loc=0x1505" on the MDS? Thanks!

            yong.fan nasf (Inactive) added a comment - Another possible reason is that the FID-in-dirent is corrupted, which would explain why the OI scrub was triggered but no inconsistent OI mapping was found. It can be verified via "lctl set_param fail_loc=0x1505" on the MDS and trying "ls" again after the setting. If it still hangs, then this is NOT the case; otherwise, we have found the reason.
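
            A sketch of that check; fail_loc is a node-wide setting on the MDS, so it should be cleared again afterwards (the directory path is the placeholder used in the description):

            # On the MDS: bypass FID-in-dirent lookups
            lctl set_param fail_loc=0x1505

            # On a client: retry the operation that previously hung
            ls /lustre/atlas2/path/to/offending_directory/

            # On the MDS: restore normal behaviour once the test is done
            lctl set_param fail_loc=0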

            yong.fan nasf (Inactive) added a comment - It is strange that the OI scrub has not found any inconsistency. There may be some OI scrub issue. Do you have the MDS-side -1 level Lustre kernel debug logs from when the "ls" hung? On the other hand, would you please use "debugfs" to dump the directory and its sub-items that cause the hang on "ls"? Thanks!
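
            A sketch of gathering both items on the MDS; the MDT block device path is a placeholder, and the debugfs path assumes the MDT's ldiskfs namespace is rooted at /ROOT with the same placeholder directory path as above:

            # Full (-1 level) Lustre kernel debug log around the hang
            lctl set_param debug=-1
            lctl dk > /dev/null                        # clear old messages
            # ... reproduce the "ls" hang from a client here ...
            lctl dk > mds_debug_during_ls_hang.log

            # Read-only debugfs dump of the offending directory (device path is a placeholder)
            debugfs -c -R 'ls -l /ROOT/path/to/offending_directory' /dev/mapper/mdt0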

            dustb100 Dustin Leverman added a comment -

            Nasf,
            Per Intel's recommendation, we ran an e2fsck during our last test shot to see if the problem would be fixed (despite the OI scrubber messages that we were seeing in the logs). We did find some non-critical issues, but we are still seeing the same hanging behavior with this directory. We will have to take a downtime to temporarily upgrade to lustre-2.8 in order to use a functional LFSCK. I'm not 100% sure when we will get this opportunity, but I will keep it on our radar. For your reference, this is the OI scrub lctl get_param info you were wanting:

            [root@atlas2-mds1 mdt]# lctl get_param -n osd-ldiskfs.atlas2-MDT0000.oi_scrub
            name: OI_scrub
            magic: 0x4c5fd252
            oi_files: 64
            status: completed
            flags:
            param:
            time_since_last_completed: 559 seconds
            time_since_latest_start: 5295 seconds
            time_since_last_checkpoint: 559 seconds
            latest_start_position: 12
            last_checkpoint_position: 1073741825
            first_failure_position: N/A
            checked: 406401957
            updated: 0
            failed: 0
            prior_updated: 0
            noscrub: 192023
            igif: 158502
            success_count: 1140
            run_time: 4736 seconds
            average_speed: 85811 objects/sec
            real-time_speed: N/A
            current_position: N/A
            lf_scanned: 0
            lf_reparied: 0
            lf_failed: 0

            Thanks,
            Dustin


            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: fwang2 Feiyi Wang
              Votes: 0
              Watchers: 10
