Lustre / LU-10851

parallel-scale-nfsv4 hangs on unmount

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Affects Version/s: Lustre 2.11.0, Lustre 2.12.0, Lustre 2.10.3, Lustre 2.10.4, Lustre 2.10.5, Lustre 2.13.0, Lustre 2.10.6, Lustre 2.10.7, Lustre 2.12.1, Lustre 2.12.3, Lustre 2.14.0, Lustre 2.12.5, Lustre 2.12.6, Lustre 2.15.0

    Description

      parallel-scale-nfsv4 hangs on unmount after all tests have run. In the suite_log, the last thing we see is

      == parallel-scale-nfsv4 test complete, duration 2088 sec ============================================= 22:07:24 (1521868044)
       
      Unmounting NFS clients...
      CMD: trevis-8vm1,trevis-8vm2 umount -f /mnt/lustre
       
      Unexporting Lustre filesystem...
      CMD: trevis-8vm1,trevis-8vm2 chkconfig --list rpcidmapd 2>/dev/null |
                                          grep -q rpcidmapd && service rpcidmapd stop ||
                                          true
      CMD: trevis-8vm4 { [[ -e /etc/SuSE-release ]] &&
                                             service nfsserver stop; } ||
                                             service nfs stop
      CMD: trevis-8vm4 sed -i '/^lustre/d' /etc/exports
      CMD: trevis-8vm4 exportfs -v
      CMD: trevis-8vm4 grep -c /mnt/lustre' ' /proc/mounts
      Stopping client trevis-8vm4 /mnt/lustre (opts:-f)
      CMD: trevis-8vm4 lsof -t /mnt/lustre
      CMD: trevis-8vm4 umount -f /mnt/lustre 2>&1 
      

      Looking at the console logs for vm4, MDS1 and 3, we see

      [ 2216.385890] Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test complete, duration 2088 sec ============================================= 22:07:24 (1521868044)
      [ 2216.698201] Lustre: DEBUG MARKER: { [[ -e /etc/SuSE-release ]] &&
      [ 2216.698201]                                service nfsserver stop; } ||
      [ 2216.698201]                                service nfs stop
      [ 2216.805093] nfsd: last server has exited, flushing export cache
      [ 2216.819487] Lustre: DEBUG MARKER: sed -i '/^lustre/d' /etc/exports
      [ 2216.885266] Lustre: DEBUG MARKER: exportfs -v
      [ 2216.945098] Lustre: DEBUG MARKER: grep -c /mnt/lustre' ' /proc/mounts
      [ 2216.982526] Lustre: DEBUG MARKER: lsof -t /mnt/lustre
      [ 2217.170422] Lustre: DEBUG MARKER: umount -f /mnt/lustre 2>&1
      [ 2217.192827] Lustre: setting import lustre-MDT0000_UUID INACTIVE by administrator request
      [ 2217.193476] LustreError: 410:0:(file.c:205:ll_close_inode_openhandle()) lustre-clilmv-ffff880060b4e800: inode [0x200000406:0x3c1b:0x0] mdc close failed: rc = -108
      [ 2217.218709] Lustre: 4066:0:(llite_lib.c:2676:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.9.4.84@tcp:/lustre/fid: [0x200000406:0x3e42:0x0]/ may get corrupted (rc -108)
      [ 2217.218732] Lustre: 4066:0:(llite_lib.c:2676:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.9.4.84@tcp:/lustre/fid: [0x200000406:0x3e7b:0x0]/ may get corrupted (rc -108)
      …
      [ 5541.474664]
      [ 5541.474667] umount          D 0000000000000000     0   410    409 0x00000000
      [ 5541.474669]  ffff88004365fda8 ffff88004365fde0 ffff880048e5ce00 ffff880043660000
      [ 5541.474670]  ffff88004365fde0 000000010013feb9 ffff88007fc10840 0000000000000000
      [ 5541.474671]  ffff88004365fdc0 ffffffff81612a95 ffff88007fc10840 ffff88004365fe68
      [ 5541.474672] Call Trace:
      [ 5541.474674]  [<ffffffff81612a95>] schedule+0x35/0x80
      [ 5541.474677]  [<ffffffff81615851>] schedule_timeout+0x161/0x2d0
      [ 5541.474689]  [<ffffffffa1457cc7>] ll_kill_super+0x77/0x150 [lustre]
      [ 5541.474723]  [<ffffffffa09f3a94>] lustre_kill_super+0x34/0x40 [obdclass]
      [ 5541.474734]  [<ffffffff8120cf5f>] deactivate_locked_super+0x3f/0x70
      [ 5541.474742]  [<ffffffff812283fb>] cleanup_mnt+0x3b/0x80
      [ 5541.474745]  [<ffffffff8109d198>] task_work_run+0x78/0x90
      [ 5541.474748]  [<ffffffff8107b5cf>] exit_to_usermode_loop+0x91/0xc2
      [ 5541.474760]  [<ffffffff81003ae5>] syscall_return_slowpath+0x85/0xa0
      [ 5541.474768]  [<ffffffff81616ca7>] int_ret_from_sys_call+0x25/0x9f
      [ 5541.476903] DWARF2 unwinder stuck at int_ret_from_sys_call+0x25/0x9f
      [ 5541.476904] 
      

      We see this problem with unmount on the master and b2_10 branches for SLES12 SP2 and SP3 testing only.

      Logs for the test suite failures are at

      https://testing.whamcloud.com/test_sets/4bce5a66-2f2f-11e8-9e0e-52540065bddc 

      https://testing.whamcloud.com/test_sets/103f280e-2fac-11e8-b3c6-52540065bddc

      https://testing.whamcloud.com/test_sets/044a75f0-2eba-11e8-b6a0-52540065bddc

      Attachments

      Issue Links

      Activity

            [LU-10851] parallel-scale-nfsv4 hangs on unmount
            Deiter Alex Deiter added a comment - - edited

            Hello adilger,

            Alex, can you please test if removing the "-f" allows this testing to pass more reliably?

            Done, but result is the same: https://testing.whamcloud.com/test_sessions/related?job=lustre-reviews&build=91221#redirect

            [ 1737.450139] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test complete, duration 1480 sec ========================================================== 23:08:24 (1671145704)
            [ 1738.826052] Lustre: DEBUG MARKER: systemctl stop nfs-server
            [ 1739.068162] nfsd: last server has exited, flushing export cache
            [ 1739.311751] Lustre: DEBUG MARKER: sed -i '\|^/mnt/lustre|d' /etc/exports
            [ 1739.702372] Lustre: DEBUG MARKER: exportfs -v
            [ 1740.089608] Lustre: DEBUG MARKER: grep -c /mnt/lustre' ' /proc/mounts
            [ 1740.448974] Lustre: DEBUG MARKER: lsof -t /mnt/lustre
            [ 1740.993566] Lustre: DEBUG MARKER: umount  /mnt/lustre 2>&1
            [ 5093.425686] sysrq: SysRq : Changing Loglevel
            ...
            [ 5096.722623] task:umount          state:D stack:    0 pid:56280 ppid: 56279 flags:0x00004080
            [ 5096.724108] Call Trace:
            [ 5096.724610]  __schedule+0x2bd/0x760
            [ 5096.725284]  schedule+0x37/0xa0
            [ 5096.725903]  schedule_timeout+0x197/0x300
            [ 5096.726669]  ? __next_timer_interrupt+0xf0/0xf0
            [ 5096.727535]  ? __radix_tree_delete+0x92/0xa0
            [ 5096.728360]  ll_kill_super+0x63/0x130 [lustre]
            [ 5096.729239]  lustre_kill_super+0x28/0x40 [lustre]
            [ 5096.730133]  deactivate_locked_super+0x34/0x70
            [ 5096.730970]  cleanup_mnt+0x3b/0x70
            [ 5096.731645]  task_work_run+0x8a/0xb0
            [ 5096.732344]  exit_to_usermode_loop+0xeb/0xf0
            [ 5096.733163]  do_syscall_64+0x198/0x1a0
            [ 5096.733888]  entry_SYSCALL_64_after_hwframe+0x65/0xca
            

            Details: umount hangs only for NFSv3 and may be caused by tests using locks:

            [ 1231.270056] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == parallel-scale-nfsv3 test connectathon: connectathon == 22:58:47 \(1671145127\)
            [ 1231.633442] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test connectathon: connectathon == 22:58:47 (1671145127)
            [ 1232.072645] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sh .\/runtests -N 2 -b -f \/mnt\/lustre\/d0.parallel-scale-nfs\/d0.connectathon
            [ 1232.437529] Lustre: DEBUG MARKER: sh ./runtests -N 2 -b -f /mnt/lustre/d0.parallel-scale-nfs/d0.connectathon
            [ 1234.227996] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sh .\/runtests -N 2 -g -f \/mnt\/lustre\/d0.parallel-scale-nfs\/d0.connectathon
            [ 1234.581110] Lustre: DEBUG MARKER: sh ./runtests -N 2 -g -f /mnt/lustre/d0.parallel-scale-nfs/d0.connectathon
            [ 1241.533359] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sh .\/runtests -N 2 -s -f \/mnt\/lustre\/d0.parallel-scale-nfs\/d0.connectathon
            [ 1241.914629] Lustre: DEBUG MARKER: sh ./runtests -N 2 -s -f /mnt/lustre/d0.parallel-scale-nfs/d0.connectathon
            [ 1250.386938] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sh .\/runtests -N 2 -l -f \/mnt\/lustre\/d0.parallel-scale-nfs\/d0.connectathon
            [ 1250.784748] Lustre: DEBUG MARKER: sh ./runtests -N 2 -l -f /mnt/lustre/d0.parallel-scale-nfs/d0.connectathon
            [ 1285.778542] LustreError: 18862:0:(file.c:4836:ll_file_flock()) unknown fcntl lock command: 1029
            [ 1325.200970] LustreError: 18862:0:(file.c:4836:ll_file_flock()) unknown fcntl lock command: 1029
            [ 1355.406386] LustreError: 18862:0:(file.c:4836:ll_file_flock()) unknown fcntl lock command: 1029
            [ 1395.339987] LustreError: 18862:0:(file.c:4836:ll_file_flock()) unknown fcntl lock command: 1029
            [ 1434.762512] LustreError: 18862:0:(file.c:4836:ll_file_flock()) unknown fcntl lock command: 1029
            [ 1464.967844] LustreError: 18862:0:(file.c:4836:ll_file_flock()) unknown fcntl lock command: 1029
            [ 1473.104321] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == parallel-scale-nfsv3 test iorssf: iorssf ============== 23:02:49 \(1671145369\)
            [ 1473.474325] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test iorssf: iorssf ============== 23:02:49 (1671145369)
            [ 1473.951047] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  parallel-scale-nfsv3 test_iorssf: @@@@@@ FAIL: ior failed! 1 
            [ 1474.334915] Lustre: DEBUG MARKER: parallel-scale-nfsv3 test_iorssf: @@@@@@ FAIL: ior failed! 1
            [ 1474.729606] Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /autotest/autotest-2/2022-12-15/lustre-reviews_custom_91221_102_231e7dd2-7190-4e2c-8d24-93aa05c646b7//parallel-scale-nfsv3.test_iorssf.debug_log.$(hostname -s).1671145371.log;
            		dmesg > /autotest/autotest-2/2022-12-15/lustre-reviews_cu
            [ 1476.359538] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == parallel-scale-nfsv3 test iorfpp: iorfpp ============== 23:02:52 \(1671145372\)
            [ 1476.726679] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test iorfpp: iorfpp ============== 23:02:52 (1671145372)
            [ 1477.201082] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  parallel-scale-nfsv3 test_iorfpp: @@@@@@ FAIL: ior failed! 1 
            [ 1477.578628] Lustre: DEBUG MARKER: parallel-scale-nfsv3 test_iorfpp: @@@@@@ FAIL: ior failed! 1
            [ 1477.984261] Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /autotest/autotest-2/2022-12-15/lustre-reviews_custom_91221_102_231e7dd2-7190-4e2c-8d24-93aa05c646b7//parallel-scale-nfsv3.test_iorfpp.debug_log.$(hostname -s).1671145374.log;
            		dmesg > /autotest/autotest-2/2022-12-15/lustre-reviews_cu
            [ 1479.340152] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ========================================================== 23:02:55 \(1671145375\)
            [ 1479.702251] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ========================================================== 23:02:55 (1671145375)
            [ 1787.824195] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == parallel-scale-nfsv3 test complete, duration 1466 sec ========================================================== 23:08:04 \(1671145684\)
            [ 1788.199820] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test complete, duration 1466 sec ========================================================== 23:08:04 (1671145684)
            [ 1789.526631] Lustre: DEBUG MARKER: systemctl stop nfs-server
            [ 1789.944393] nfsd: last server has exited, flushing export cache
            [ 1790.227067] Lustre: DEBUG MARKER: sed -i '\|^/mnt/lustre|d' /etc/exports
            [ 1790.872269] Lustre: DEBUG MARKER: exportfs -v
            [ 1791.514326] Lustre: DEBUG MARKER: grep -c /mnt/lustre' ' /proc/mounts
            [ 1792.126821] Lustre: DEBUG MARKER: lsof -t /mnt/lustre
            [ 1792.941836] Lustre: DEBUG MARKER: umount  /mnt/lustre 2>&1
            [ 5153.799694] sysrq: SysRq : Changing Loglevel
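            As a side note (an assumption for context, not stated in the ticket): lock command 1029 in the ll_file_flock() messages above appears to be F_CANCELLK (F_LINUX_SPECIFIC_BASE + 5 = 1024 + 5), which lockd/nfsd use to cancel blocked NLM lock requests on the exported filesystem. A quick check of the constant:

            /* check_f_cancellk.c - print the fcntl lock command constants */
            #include <stdio.h>
            #include <linux/fcntl.h>        /* F_LINUX_SPECIFIC_BASE, F_CANCELLK */

            int main(void)
            {
                    printf("F_LINUX_SPECIFIC_BASE = %d\n", F_LINUX_SPECIFIC_BASE); /* 1024 */
                    printf("F_CANCELLK            = %d\n", F_CANCELLK);            /* 1029 */
                    return 0;
            }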
            

            We can disable NLM locks on the server side and repeat the same tests to confirm the guess. What do you think?
            Reference: man nfs

                  lock / nolock  Selects whether to use the NLM sideband protocol to lock  files
                                  on  the  server.  If neither option is specified (or if lock is
                                  specified), NLM locking is used for  this  mount  point.   When
                                  using  the nolock option, applications can lock files, but such
                                  locks provide exclusion only against other applications running
                                  on  the  same  client.  Remote applications are not affected by
                                  these locks.
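
            A minimal sketch of that experiment (hostnames and export path assumed from the original description above; the real change would go through the test framework's NFS mount options rather than manual commands):

            # on the NFS clients (trevis-8vm1 / trevis-8vm2 in the original description)
            umount /mnt/lustre
            mount -t nfs -o vers=3,nolock trevis-8vm4:/mnt/lustre /mnt/lustre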
            

            Thank you!

            Deiter Alex Deiter added a comment - - edited

            Hello adilger,

            Thank you very much for the detailed explanation!
            Please let me test umount without force.

            Thank you!


            adilger Andreas Dilger added a comment -

            Alex, can you please test if removing the "-f" allows this testing to pass more reliably?

            Yingjin, I think you are most familiar with statahead these days. I'm thinking that the patch to fix this may be quite different before/after your statahead patches.

            For master, I would prefer that we prioritize landing the statahead patch series, so if your series doesn't already fix this, the fix should go in a patch as early in the series as possible to avoid conflicts with the later patches. Then a separate patch for backporting to earlier branches could keep the statahead threads on a list/waitqueue on the superblock so they can be woken up, check whether the filesystem is being unmounted, and exit immediately.

            adilger Andreas Dilger added a comment - - edited

            Looking at the messages here, it seems the client still has cached files at the time that it is being unmounted, but the connection to the MDS is stopped shortly thereafter and that gives the client problems.

            [ 2217.170422] Lustre: DEBUG MARKER: umount -f /mnt/lustre 2>&1
            [ 2217.192827] Lustre: setting import lustre-MDT0000_UUID INACTIVE by administrator request
            

            The "-f" flag means "force unmount and don't wait gracefully" and should be able to clean up regardless of whether the MDS connection is available or not, but I wonder if it is part of the issue here? If this was just "umount /mnt/lustre" the client would flush the dirty cache (avoiding the "dirty page discard" errors) and probably work more reliably.

            That said, "umount -f" should also work reliably, but that likely needs some changes to the code on the client. According to the stack traces, the unmount is stuck on:

            void ll_kill_super(struct super_block *sb)
            {
                    struct ll_sb_info *sbi = ll_s2sbi(sb);
                    /* ... */
                    /* wait running statahead threads to quit */
                    while (atomic_read(&sbi->ll_sa_running) > 0)
                            schedule_timeout_uninterruptible(
                                    cfs_time_seconds(1) >> 3);
            }
            

            so something is preventing the statahead threads from exiting.

            Likely the statahead threads need to be woken up here during the unmount process so that they can exit. It isn't even clear whether there is a list of statahead threads on the superblock that can be signaled to wake up/exit; it looks like there is only a count/limit of statahead threads. I recall some changes in Yingjin's recent statahead patches so that these threads will exit by themselves after a short time, but I'm not sure whether that is enough to fix this problem. Definitely having a direct notification, with the threads checking that the filesystem is being unmounted, is better than a timeout.
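
            A minimal sketch of that direct-notification idea (illustration only, assuming a wait queue head ll_sa_waitq and an unmount flag ll_sa_umounting are added to struct ll_sb_info; these names and the ll_sa_thread_exit() helper are hypothetical, not actual Lustre code):

            /* unmount path: wake the statahead threads and sleep until they exit,
             * instead of polling ll_sa_running every 1/8 second */
            void ll_kill_super(struct super_block *sb)
            {
                    struct ll_sb_info *sbi = ll_s2sbi(sb);

                    sbi->ll_sa_umounting = 1;               /* hypothetical flag */
                    wake_up_all(&sbi->ll_sa_waitq);         /* hypothetical waitqueue */
                    wait_event(sbi->ll_sa_waitq,
                               atomic_read(&sbi->ll_sa_running) == 0);
            }

            /* statahead thread exit path: drop the running count and notify
             * any waiter sleeping in ll_kill_super() */
            static void ll_sa_thread_exit(struct ll_sb_info *sbi)
            {
                    if (atomic_dec_and_test(&sbi->ll_sa_running))
                            wake_up_all(&sbi->ll_sa_waitq);
            }

            For this to help, the statahead thread's own wait loop would also need to check the unmount flag so that the wake-up actually causes it to stop and exit.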


            adilger Andreas Dilger added a comment -

            Colin, the llite* messages would definitely be from the client, regardless of where it is mounted.

            Alex, the reason for the NFS server to be mounted on the MDS is twofold:

            • fewer test nodes needed
            • faster performance because client is local to MDS, where many of the RPCs are sent, for lower latency metadata operations

            I don't think that would be a contributor to the unmount problem being seen, but hard to say for sure.

            Deiter Alex Deiter added a comment - - edited

            Hello cfaber,

            Is this dmesg from the MDS or a client?

            For some reason the Lustre test suite uses the MDS host as a Lustre client. The test flow is:

            • Setup Lustre targets: host4 - MGS/MDS, host3 - OSS:
              host4# mount -t lustre
              /dev/mapper/mds1_flakey on /mnt/lustre-mds1 type lustre (rw,svname=lustre-MDT0000,mgs,osd=osd-ldiskfs,user_xattr,errors=remount-ro)
              
              host3# mount -t lustre
              /dev/mapper/ost1_flakey on /mnt/lustre-ost1 type lustre (rw,svname=lustre-OST0000,mgsnode=10.240.43.3@tcp,osd=osd-ldiskfs,errors=remount-ro)
              /dev/mapper/ost2_flakey on /mnt/lustre-ost2 type lustre (rw,svname=lustre-OST0001,mgsnode=10.240.43.3@tcp,osd=osd-ldiskfs,errors=remount-ro)
              /dev/mapper/ost3_flakey on /mnt/lustre-ost3 type lustre (rw,svname=lustre-OST0002,mgsnode=10.240.43.3@tcp,osd=osd-ldiskfs,errors=remount-ro)
              /dev/mapper/ost4_flakey on /mnt/lustre-ost4 type lustre (rw,svname=lustre-OST0003,mgsnode=10.240.43.3@tcp,osd=osd-ldiskfs,errors=remount-ro)
              /dev/mapper/ost5_flakey on /mnt/lustre-ost5 type lustre (rw,svname=lustre-OST0004,mgsnode=10.240.43.3@tcp,osd=osd-ldiskfs,errors=remount-ro)
              /dev/mapper/ost6_flakey on /mnt/lustre-ost6 type lustre (rw,svname=lustre-OST0005,mgsnode=10.240.43.3@tcp,osd=osd-ldiskfs,errors=remount-ro)
              /dev/mapper/ost7_flakey on /mnt/lustre-ost7 type lustre (rw,svname=lustre-OST0006,mgsnode=10.240.43.3@tcp,osd=osd-ldiskfs,errors=remount-ro)
              
            • Mount the Lustre filesystem on host4 (i.e. as a Lustre client):
              host4# mount -t lustre
              10.240.43.3@tcp:/lustre on /mnt/lustre type lustre (rw,checksum,flock,user_xattr,lruresize,lazystatfs,32bitapi,nouser_fid2path,verbose,encrypt)
              

            Note - 10.240.43.3 is a local address on the same host.

            • Export /mnt/lustre via NFS
              host4# showmount -e
              Export list for trevis-84vm4.trevis.whamcloud.com:
              /mnt/lustre *
              
            • Mount /mnt/lustre on host1 and host2 NFS clients
            • Start workload(s) on host1 and host2 NFS clients on top of mounted NFS share
            • Umount NFS share /mnt/lustre on host1 and host2
            • Stop NFS server on host4
            • Try to umount Lustre client mountpoint /mnt/lustre on host4:
              host4# umount -f /mnt/lustre
              ==> hang here
              
            cfaber Colin Faber added a comment -

            Is this dmesg from the MDS or a client?

            Deiter Alex Deiter added a comment -

            We hit the same issue on the master branch:

            https://testing.whamcloud.com/test_sessions/1442a73f-2793-4e28-9124-a81b4ae65262
            https://testing.whamcloud.com/test_sessions/f0561029-d1d1-4d61-9eea-7f2130ad9784

            All test sessions hang on:

            CMD: trevis-70vm4 umount -f /mnt/lustre 2>&1
            
            [ 5154.892582] task:umount          state:D stack:    0 pid:37125 ppid: 37124 flags:0x00004080
            [ 5154.894509] Call Trace:
            [ 5154.895153]  __schedule+0x2bd/0x760
            [ 5154.896010]  schedule+0x37/0xa0
            [ 5154.896681]  schedule_timeout+0x197/0x300
            [ 5154.897504]  ? __next_timer_interrupt+0xf0/0xf0
            [ 5154.898422]  ? __radix_tree_delete+0x92/0xa0
            [ 5154.899286]  ll_kill_super+0x63/0x130 [lustre]
            [ 5154.900268]  lustre_kill_super+0x28/0x40 [lustre]
            [ 5154.901221]  deactivate_locked_super+0x34/0x70
            [ 5154.902133]  cleanup_mnt+0x3b/0x70
            [ 5154.902869]  task_work_run+0x8a/0xb0
            [ 5154.903640]  exit_to_usermode_loop+0xeb/0xf0
            [ 5154.904525]  do_syscall_64+0x198/0x1a0
            [ 5154.905288]  entry_SYSCALL_64_after_hwframe+0x65/0xca
            

            dmesg:

            [ 1796.367975] Lustre: setting import lustre-MDT0000_UUID INACTIVE by administrator request
            [ 1796.370695] LustreError: 37125:0:(file.c:242:ll_close_inode_openhandle()) lustre-clilmv-ffff95a92ffe3800: inode [0x200000403:0x153d:0x0] mdc close failed: rc = -108
            [ 1796.411489] Lustre: 5617:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x28cf:0x0]/ may get corrupted (rc -108)
            [ 1796.411662] Lustre: 5618:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x29ec:0x0]/ may get corrupted (rc -108)
            [ 1796.414759] Lustre: 5617:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x293d:0x0]/ may get corrupted (rc -108)
            [ 1796.421120] Lustre: 5617:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x2a1b:0x0]/ may get corrupted (rc -108)
            [ 1796.436808] Lustre: 5617:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x29ed:0x0]/ may get corrupted (rc -108)
            [ 1796.442761] Lustre: 5617:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x2a05:0x0]/ may get corrupted (rc -108)
            [ 1796.443103] Lustre: 5618:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x28f9:0x0]/ may get corrupted (rc -108)
            [ 1796.445919] Lustre: 5617:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x2940:0x0]/ may get corrupted (rc -108)
            [ 1796.449051] Lustre: 5618:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x28f0:0x0]/ may get corrupted (rc -108)
            [ 1796.452119] Lustre: 5617:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x29b1:0x0]/ may get corrupted (rc -108)
            [ 1796.458583] Lustre: 5618:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x298d:0x0]/ may get corrupted (rc -108)
            [ 1796.463828] Lustre: 5618:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x29bc:0x0]/ may get corrupted (rc -108)
            [ 1796.466124] Lustre: 5617:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x2955:0x0]/ may get corrupted (rc -108)
            [ 1796.466961] Lustre: 5618:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x298e:0x0]/ may get corrupted (rc -108)
            [ 1796.472725] Lustre: 5617:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x2a0f:0x0]/ may get corrupted (rc -108)
            [ 1796.473142] Lustre: 5618:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x294a:0x0]/ may get corrupted (rc -108)
            [ 1796.480757] Lustre: 5618:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x29e1:0x0]/ may get corrupted (rc -108)
            [ 1796.481037] Lustre: 5617:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x28f3:0x0]/ may get corrupted (rc -108)
            [ 1796.483887] Lustre: 5618:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x2a10:0x0]/ may get corrupted (rc -108)
            [ 1796.494909] Lustre: 5618:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x2901:0x0]/ may get corrupted (rc -108)
            [ 1796.495139] Lustre: 5617:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x2945:0x0]/ may get corrupted (rc -108)
            [ 1796.498049] Lustre: 5618:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x28d7:0x0]/ may get corrupted (rc -108)
            [ 1796.501156] Lustre: 5617:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x2a19:0x0]/ may get corrupted (rc -108)
            [ 1796.504452] Lustre: 5618:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x29d2:0x0]/ may get corrupted (rc -108)
            [ 1796.519318] Lustre: 5617:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x296b:0x0]/ may get corrupted (rc -108)
            [ 1796.525781] Lustre: 5618:0:(llite_lib.c:3674:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.41.235@tcp:/lustre/fid: [0x200000403:0x2a02:0x0]/ may get corrupted (rc -108)
            [ 1826.914042] Lustre: lustre-MDT0000: haven't heard from client 309ad38c-8e48-430d-ae7a-a65d0196c74a (at 0@lo) in 31 seconds. I think it's dead, and I am evicting it. exp 00000000f489d0d3, cur 1671076587 expire 1671076557 last 1671076556
            
            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment -

            +1 on master: https://testing.whamcloud.com/test_sets/bc47e7ed-ce27-4fab-ad48-8ab438dc70b8
            sarah Sarah Liu added a comment -

            Similar error on master; it failed to stop the NFS service after all tests finished.
            https://testing.whamcloud.com/test_sets/e370a684-05f9-41cc-9e78-0b061119344f

            onyx-71vm13: Redirecting to /bin/systemctl stop nfs.service
            onyx-71vm13: Failed to stop nfs.service: Unit nfs.service not loaded.
            onyx-71vm13: Redirecting to /bin/systemctl stop nfs-server.service
            

            People

              Assignee: qian_wc Qian Yingjin
              Reporter: jamesanunez James Nunez (Inactive)
              Votes: 0
              Watchers: 10
