Details
- Type: Bug
- Resolution: Fixed
- Priority: Minor
- Affects Version: Lustre 2.11.0
Description
To reproduce this hang, run the following:
# ./llmount.sh
# ./auster -v -k sanity --only "64d 65k 66"
The above commands will set up Lustre with one combined MGS/MDS and two OSTs backed by loopback devices on a single node. The auster command will run sanity tests 64d, 65k, and 66 only.
The output from the run is:
== sanity test 64d: check grant limit exceed ========================================================= 19:10:43 (1517512243)
debug=super ioctl neterror warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck
dd: error writing '/mnt/lustre/f64d.sanity': No space left on device
278+0 records in
277+0 records out
290684928 bytes (291 MB) copied, 13.1767 s, 22.1 MB/s
/usr/lib64/lustre/tests/sanity.sh: line 6121: kill: (25805) - No such process
debug=trace inode super ext2 malloc cache info ioctl neterror net warning buffs other dentry nettrace page dlmtrace error emerg ha rpctrace vfstrace reada mmap config console quota sec lfsck hsm snapshot layout
Resetting fail_loc on all nodes...done.
PASS 64d (15s)

== sanity test 65k: validate manual striping works properly with deactivated OSCs ==================== 19:10:58 (1517512258)
Check OST status:
lustre-OST0000-osc-MDT0000 is active
lustre-OST0001-osc-MDT0000 is active
total: 1000 open/close in 1.49 seconds: 672.84 ops/second
Deactivate: lustre-OST0000-osc-MDT0000
/usr/bin/lfs setstripe -i 0 -c 1 /mnt/lustre/d65k.sanity/0
/usr/bin/lfs setstripe -i 1 -c 1 /mnt/lustre/d65k.sanity/1
- unlinked 0 (time 1517512260 ; total 0 ; last 0)
total: 1000 unlinks in 2 seconds: 500.000000 unlinks/second
lustre-OST0000-osc-MDT0000 is Activate
trevis-58vm6.trevis.hpdd.intel.com: executing wait_import_state FULL osc.*OST*-osc-MDT0000.ost_server_uuid 40
osc.*OST*-osc-MDT0000.ost_server_uuid in FULL state after 0 sec
total: 1000 open/close in 1.62 seconds: 615.67 ops/second
Deactivate: lustre-OST0001-osc-MDT0000
/usr/bin/lfs setstripe -i 0 -c 1 /mnt/lustre/d65k.sanity/0
lfs setstripe: error on ioctl 0x4008669a for '/mnt/lustre/d65k.sanity/0' (3): No space left on device
sanity test_65k: @@@@@@ FAIL: setstripe 0 should succeed
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:5130:error()
= /usr/lib64/lustre/tests/sanity.sh:6310:test_65k()
= /usr/lib64/lustre/tests/test-framework.sh:5406:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:5445:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:5244:run_test()
= /usr/lib64/lustre/tests/sanity.sh:6326:main()
Dumping lctl log to /tmp/test_logs/2018-02-01/191040/sanity.test_65k.*.1517512264.log
Dumping logs only on local client.
Resetting fail_loc on all nodes...done.
FAIL 65k (6s)

== sanity test 66: update inode blocks count on client =============================================== 19:11:04 (1517512264)
Looking at the test output, we see that test 64d fills an OST and does not remove the file. Test 65k then deactivates one OST at a time; when it has deactivated the empty OST and tries to use the full one, the 'lfs setstripe' command fails. Test 65k exits when the 'lfs setstripe' command fails, leaving one OST deactivated.
The last thing I see in dmesg from test 66 is:
[797721.192593] Lustre: DEBUG MARKER: == sanity test 66: update inode blocks count on client =============================================== 23:33:24 (1517614404)
[797773.923382] Lustre: lustre-OST0001: haven't heard from client lustre-MDT0000-mdtlov_UUID (at 0@lo) in 53 seconds. I think it's dead, and I am evicting it. exp ffff88000c752c00, cur 1517614457 expire 1517614427 last 1517614404
We need to clean up the file from test 64d, and in test 65k we need to reactivate the OST and clean up the test files on failure.
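As a rough illustration of that cleanup (this is a hypothetical sketch, not the actual patch: the function names are invented, and `do_facet`/`$LCTL`/`$DIR` are assumed to behave like the corresponding test-framework.sh helpers and variables):

```shell
#!/bin/sh
# Hypothetical cleanup sketch (names are assumptions, not the real patch).

# Test 64d: remove the file that filled the OST, so later tests
# start with free space on both OSTs.
cleanup_64d() {
	rm -f "$DIR/f64d.sanity"
}

# Test 65k: run this even when the test fails, so the deactivated
# OSC is brought back and the test files are removed.
cleanup_65k() {
	osc=$1	# e.g. lustre-OST0000-osc-MDT0000
	do_facet mds1 "$LCTL --device %$osc activate"
	rm -rf "$DIR/d65k.sanity"
}
```

The key point is that the 65k cleanup must be registered to run on the test's failure path, not only on success, so an 'lfs setstripe' error no longer leaves an OSC deactivated for the tests that follow.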
I’ll upload a patch for this.