Loading...

XML

Word

Printable

Details

Type: Bug
Resolution: Cannot Reproduce
Priority: Major
Fix Version/s: None
Affects Version/s: Lustre 2.3.0
Labels:
- ptr
Environment:
Lustre server 2.1.4 centos 6.3
Lustre clients 2.3.0 sles11sp1

Severity:
2
Rank (Obsolete):
7461

Description

After we upgraded our clients from 2.1.3 to 2.3.0, some users (the crowd is increasing) started seeing their application to fail, to hang, or even crash. The servers run 2.1.4. In all cases, same application ran OK with 2.1.3.

Since we do not have reproducer on the hang and the crash cases, we here attach a reproducer that can cause application to fail. The test were executed with stripe count of 1, 2, 4, 8, 16. The higher number the stripe count the more likely application fails.

The 'reproducer1.scr' is a PBS script to start 1024 mpi tests.
'reproducer1.scr.o1000145' is PBS output of the execution.
'1000145.pbspl1.0.log.txt' is an output of one of our tools to collect /var/log/messages from servers and clients related to the specified job.

The PBS specific argument lines start with "#PBS " string and are ignored if executed without PBS. The script use SGI MPT, but can be converted to openmpi or intel mpi.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

1000145.pbspl1.0.log.txt
29/Mar/13 12:54 AM
227 kB
Jay Lan
1000145.pbspl1.0.log.txt.-pbs
01/Apr/13 6:40 PM
26 kB
Jay Lan
lu-3062-reproducer-logs.tgz
21/Jun/13 8:03 PM
0.2 kB
Bobbie Lind
nbp2-server-logs.LU-3062
29/Mar/13 7:11 PM
5 kB
Jay Lan
reproducer_debug_r311i1n10_log
01/Apr/13 9:08 PM
56 kB
James Karellas
reproducer_debug_r311i1n9_log
01/Apr/13 9:08 PM
56 kB
James Karellas
reproducer_full_debug_log
01/Apr/13 10:23 PM
2.01 MB
James Karellas
reproducer_full_debug_xaa.bz2
02/Apr/13 6:30 PM
0.2 kB
James Karellas
reproducer_full_debug_xab.bz2
02/Apr/13 6:30 PM
5.00 MB
James Karellas
reproducer_full_debug_xac.bz2
02/Apr/13 6:30 PM
0.2 kB
James Karellas
reproducer_full_debug_xad.bz2
02/Apr/13 6:30 PM
0.2 kB
James Karellas
reproducer_full_debug_xae.bz2
02/Apr/13 6:30 PM
3.02 MB
James Karellas
reproducer1.scr
29/Mar/13 12:54 AM
0.8 kB
Jay Lan
reproducer1.scr.o1000145
29/Mar/13 12:54 AM
4 kB
Jay Lan
reproducer2.scr
01/Apr/13 8:02 PM
0.8 kB
James Karellas

Activity

People

Assignee:: Oleg Drokin

Reporter:: Jay Lan (Inactive)

Votes:: 0 Vote for this issue

Watchers:: 9 Start watching this issue

Dates

Created:: 29/Mar/13 12:54 AM

Updated:: 08/Sep/16 9:33 PM

Resolved:: 08/Sep/16 9:33 PM