
LU-9735: Sles12Sp2 and 2.9 getcwd() sometimes fails


    Description

      This is a duplicate of LU-9208. Opening this case to track it for NASA. We started to see this once we updated the clients to SLES12 SP2 and Lustre 2.9.

      Using the test code provided in LU-9208 (miranda), I was able to reproduce the bug on a single node.

       

      Iteration =    868, Run Time =     0.9614 sec., Transfer Rate =   120.7790 10e+06 Bytes/sec/proc
      Iteration =    869, Run Time =     1.5308 sec., Transfer Rate =    75.8561 10e+06 Bytes/sec/proc
      forrtl: severe (121): Cannot access current working directory for unit 10012, file "Unknown"
      Image              PC                Routine            Line        Source             
      miranda            0000000000409F29  Unknown               Unknown  Unknown
      miranda            00000000004169D2  Unknown               Unknown  Unknown
      miranda            0000000000404045  Unknown               Unknown  Unknown
      miranda            0000000000402FDE  Unknown               Unknown  Unknown
      libc.so.6          00002AAAAB5B96E5  Unknown               Unknown  Unknown
      miranda            0000000000402EE9  Unknown               Unknown  Unknown
      MPT ERROR: MPI_COMM_WORLD rank 12 has terminated without calling MPI_Finalize()
      	aborting job
      
      

      I was able to capture some debug logs, which I have attached to the case. I was unable to reproduce it with "+trace" enabled, but will continue to try.
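
      (Editor's note: the failing call is getcwd(2) itself, which transiently returns NULL with errno set to ENOENT, as the workaround logs further down show. For anyone without the miranda source, which is Fortran, a minimal hypothetical reproducer along the same lines would simply hammer getcwd() from a working directory on Lustre. The race likely needs concurrent metadata activity on the directory, which the miranda run provides, so this alone may not trigger it.)

      /* Hypothetical minimal reproducer (not the miranda test code): call
       * getcwd() in a tight loop from a working directory on a Lustre mount
       * and report any transient failures. */
      #include <errno.h>
      #include <limits.h>
      #include <stdio.h>
      #include <string.h>
      #include <unistd.h>

      int main(void)
      {
          char buf[PATH_MAX];
          unsigned long i, failures = 0;

          for (i = 0; i < 100000000UL; i++) {
              if (getcwd(buf, sizeof(buf)) == NULL) {
                  failures++;
                  fprintf(stderr, "iter %lu: getcwd failed: %s\n",
                          i, strerror(errno));
              }
          }
          fprintf(stderr, "done: %lu transient failures\n", failures);
          return failures != 0;
      }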

      Attachments

        1. getcwdHack.c
          6 kB
        2. miranda.debug.1499341246.gz
          84.13 MB
        3. miranda.dis
          9.19 MB
        4. r481i7n17.dump1.log.gz
          13.86 MB
        5. unoptimize-atomic_open-of-negative-dentry.patch
          2 kB


          Activity


            simmonsja James A Simmons added a comment -

            I already have another ticket for this.
            pjones Peter Jones added a comment -

            OK, so given that the initial fix seems to satisfy NASA (the original reporter), we can close the ticket. simmonsja, can you track any remaining work under a new ticket?


            mhanafi Mahmoud Hanafi added a comment -

            We can close this case.

            simmonsja James A Simmons added a comment -

            So it appears that the patch for LU-9868, while fixing this bug, has exposed another potential bug in Lustre. If you run sanity test 233 you see:

            [37212.956888] VFS: Lookup of '[0x200000007:0x1:0x0]' in lustre lustre would have caused loop

            [37217.817624] Lustre: DEBUG MARKER: sanity test_233a: @@@@@@ FAIL: cannot access /lustre/lustre using its FID '[0x200000007:0x1:0x0]'

            [37236.855424] Lustre: DEBUG MARKER: == sanity test 233b: checking that OBF of the FS .lustre succeeds ==================================== 03:34:33 (1538379273)

            [37238.362201] VFS: Lookup of '[0x200000002:0x1:0x0]' in lustre lustre would have caused loop

            [37243.442480] Lustre: DEBUG MARKER: sanity test_233b: @@@@@@ FAIL: cannot access /lustre/lustre/.lustre using its FID '[0x200000002:0x1:0x0]'

            Somehow the parent-child relationship got inverted. Will investigate.
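
            (Editor's note: sanity test 233 exercises Lustre's open-by-FID interface, where objects are reachable through the special .lustre/fid directory on the client mount. A minimal sketch of the access pattern the test checks, assuming a hypothetical mount point /mnt/lustre; a real FID can be obtained with "lfs path2fid".)

            /* Sketch of the open-by-FID access pattern that sanity test 233
             * exercises. The mount point and FID are hypothetical examples. */
            #include <stdio.h>
            #include <sys/stat.h>

            int main(void)
            {
                /* The filesystem root's FID, reached through the .lustre/fid
                 * virtual directory on the client mount. */
                const char *fid_path = "/mnt/lustre/.lustre/fid/[0x200000007:0x1:0x0]";
                struct stat st;

                if (stat(fid_path, &st) != 0) {
                    perror("stat by FID"); /* test 233 fails when this is unreachable */
                    return 1;
                }
                printf("FID lookup OK: inode %llu\n", (unsigned long long)st.st_ino);
                return 0;
            }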


            simmonsja James A Simmons added a comment -

            As reported, the earlier patch for this bug didn't completely solve the problem. The work from LU-9868, which is now linked to this ticket, has been reported as solving it.
            simmonsja James A Simmons added a comment -

            Can you give https://review.whamcloud.com/#/c/28486 a try?

            simmonsja James A Simmons added a comment -

            I wonder if the fixes from LU-9868 would fix this? Note the posted patch has a bug in it. I have a fix but haven't pushed it yet.
            m.magrys Marek Magrys added a comment -

            It looks like we hit the same issue with a Lustre 2.10.5 client and a CentOS 7.5 kernel. Should the fix come from Lustre or from the kernel? I'm confused by the previous discussion.


            spiechurski Sebastien Piechurski added a comment -

            Hi All,

            I did not see any reaction to Mahmoud Hanafi's comment from 26th October 2017, citing Neil Brown:

            "This is a bug in lustre (it shouldn't call d_move())"

            We have installed 2.10.4 at a customer's site which encountered this problem (with RHEL 7 clients), and even though this considerably decreased the number of occurrences, we still see the "small window when dentry is unhashed", making the job fail.

            Is there something that can be done in ll_splice_alias() to close this race?

            I understand this is fixed by SuSE in their latest kernels, but this is not the case for earlier kernels, nor for RHEL kernels.

            Or should we push Red Hat to apply the same kind of patch SuSE did (which does not seem really fair to me)?


            srcc Stanford Research Computing Center added a comment -

            Hi!

            As an additional datapoint, we'd like to report that we've been seeing this exact same behavior with the latest Maintenance Release (2.10.3) and the latest available CentOS 7.4 kernel:

            # uname -a
            Linux sh-104-39.int 3.10.0-693.21.1.el7.x86_64 #1 SMP Wed Mar 7 19:03:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
            
            # cat /sys/fs/lustre/version
            2.10.3 

             

            The symptoms show the same stack as initially reported, and occurred while running VASP jobs:

            "forrtl: severe (121): Cannot access current working directory for unit 7, file "Unknown"
            Image              PC                Routine            Line        Source
            vasp_gam           000000000140B496  Unknown               Unknown  Unknown
            vasp_gam           000000000142511E  Unknown               Unknown  Unknown
            vasp_gam           000000000091665F  Unknown               Unknown  Unknown
            vasp_gam           0000000000CFE655  Unknown               Unknown  Unknown
            vasp_gam           00000000012AF330  Unknown               Unknown  Unknown
            vasp_gam           0000000000408D1E  Unknown               Unknown  Unknown
            libc-2.17.so       00007F839E16FC05  __libc_start_main     Unknown  Unknown
            vasp_gam           0000000000408C29  Unknown               Unknown  Unknown

             

            The "try-again" workaround provided by @Nathan works great and we're recommending our users to use it for now. With the libgetcwdHack.so library LD_PRELOADed, the application generates this kind of log:

            NF: getcwd: mpi rank -1, host sh-104-39.int: [190701]: failed: size 4096,
            buf 0x7fffd6c0959b, ret (nil): No such file or directory
            NF: getcwd: mpi rank -1, host sh-104-39.int: [190701]: succeeded at try 2
            of 10: size 4096, buf 0x7fffd6c0959b, ret 0x7fffd6c0959b, path
            /scratch/users/freitas/chemical_reactions/vasp_simulations/C_fixed_V/07_restart_3
            NF: getcwd: mpi rank -1, host sh-104-39.int: [190695]: failed: size 4096,
            buf 0x7ffe51196e9b, ret (nil): No such file or directory
            NF: getcwd: mpi rank -1, host sh-104-39.int: [190695]: succeeded at try 2
            of 10: size 4096, buf 0x7ffe51196e9b, ret 0x7ffe51196e9b, path
            /scratch/users/freitas/chemical_reactions/vasp_simulations/C_fixed_V/07_restart_3
            NF: getcwd: mpi rank -1, host sh-104-39.int: [190699]: failed: size 4096,
            buf 0x7ffdb3f1f31b, ret (nil): No such file or directory
            NF: getcwd: mpi rank -1, host sh-104-39.int: [190699]: succeeded at try 2
            of 10: size 4096, buf 0x7ffdb3f1f31b, ret 0x7ffdb3f1f31b, path
            /scratch/users/freitas/chemical_reactions/vasp_simulations/C_fixed_V/07_restart_3
            NF: getcwd: mpi rank -1, host sh-104-39.int: [190697]: failed: size 4096,
            buf 0x7ffe9713df9b, ret (nil): No such file or directory
            NF: getcwd: mpi rank -1, host sh-104-39.int: [190697]: succeeded at try 2
            of 10: size 4096, buf 0x7ffe9713df9b, ret 0x7ffe9713df9b, path
            /scratch/users/freitas/chemical_reactions/vasp_simulations/C_fixed_V/07_restart_3
            NF: getcwd: mpi rank -1, host sh-104-39.int: [190695]: failed: size 4096,
            buf 0x7ffe51196e9b, ret (nil): No such file or directory
            NF: getcwd: mpi rank -1, host sh-104-39.int: [190695]: succeeded at try 2
            of 10: size 4096, buf 0x7ffe51196e9b, ret 0x7ffe51196e9b, path
            /scratch/users/freitas/chemical_reactions/vasp_simulations/C_fixed_V/07_restart_3
            

            which seems very indicative of the same error.
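
            (Editor's note: for anyone curious how the workaround operates, below is a minimal sketch of such a getcwd() retry interposer, in the spirit of the attached getcwdHack.c; the actual attachment may differ. Build with "gcc -shared -fPIC -o libgetcwdHack.so getcwdhack.c -ldl" and set LD_PRELOAD to point at it.)

            /* Sketch of a getcwd() retry interposer (hypothetical; the real
             * getcwdHack.c attachment may differ).
             * Use:  LD_PRELOAD=./libgetcwdHack.so ./app */
            #define _GNU_SOURCE
            #include <dlfcn.h>
            #include <errno.h>
            #include <unistd.h>

            char *getcwd(char *buf, size_t size)
            {
                static char *(*real_getcwd)(char *, size_t);
                char *ret = NULL;
                int attempt;

                if (!real_getcwd)
                    real_getcwd = (char *(*)(char *, size_t))dlsym(RTLD_NEXT, "getcwd");

                for (attempt = 1; attempt <= 10; attempt++) {
                    ret = real_getcwd(buf, size);
                    if (ret != NULL || errno != ENOENT)
                        break;        /* only retry the transient ENOENT */
                    usleep(1000);     /* brief pause before retrying */
                }
                return ret;
            }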

            We're looking forward to the fix in 2.10.4.

            Cheers,
            -- 
            Kilian


            People

              Assignee:
              simmonsja James A Simmons
              Reporter:
              mhanafi Mahmoud Hanafi
              Votes:
              1
              Watchers:
              24
