[LU-7615] replay-dual test_25 failed: client umount hangs: INFO: task umount:4134 blocked for more than 120 seconds. Created: 29/Dec/15  Updated: 11/Sep/20  Resolved: 11/Sep/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: parinay v kondekar (Inactive) Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

Configuration: 4 nodes, 1 MDS, 1 OSS, 2 clients
Release:
2.6.32_431.29.2.el6_lustremaster_9267_2_g959f8f7
2.6.32_431.29.2.el6.x86_64_g959f8f7

Server: 2.7.64
Client: 2.7.64


Attachments: 25.console.fre1304.log, 25.lctl.tgz
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
stdout.log
== replay-dual test 25: replay|resend == 01:50:51 (1451181051)
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.00229757 s, 223 kB/s
fail_loc=0x304
fail_loc=0x304
fail_loc=0x80000325
Failing ost1 on fre1302
Stopping /mnt/ost1 (opts:) on fre1302
pdsh@fre1303: fre1302: ssh exited with exit code 1
reboot facets: ost1
Failover ost1 to fre1302
01:51:03 (1451181063) waiting for fre1302 network 900 secs ...
01:51:03 (1451181063) network interface is UP
mount facets: ost1
Starting ost1: -o user_xattr  /dev/vdb /mnt/ost1
pdsh@fre1303: fre1302: ssh exited with exit code 1
pdsh@fre1303: fre1302: ssh exited with exit code 1
Started lustre-OST0000
fre1303: osc.lustre-OST0000-osc-*.ost_server_uuid in FULL state after 4 sec
fre1304: osc.lustre-OST0000-osc-*.ost_server_uuid in FULL state after 4 sec
/usr/lib64/lustre/tests/test-framework.sh: line 4376:  3139 Terminated              LUSTRE="/usr/lib64/lustre" sh -c "multiop /mnt/lustre2/f25.replay-dual Ow512"
fail_loc=0
fail_loc=0
Resetting fail_loc on all nodes...done.
PASS 25 (19s)
Stopping clients: fre1303,fre1304 /mnt/lustre2 (opts:)


client fre1303 console :
INFO: task umount:4134 blocked for more than 120 seconds.

      Not tainted 2.6.32-431.29.2.el6.x86_64 #1

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

umount        D 0000000000000000     0  4134   4127 0x00000080

 ffff88010a083a68 0000000000000082 0000000000000000 000000ac00000000
 0000000000000028 ffff88010a083ab8 ffff88010a083a78 0000000000017dbe
 ffff88010c8fc5f8 ffff88010a083fd8 000000000000fbc8 ffff88010c8fc5f8
Call Trace:
 [<ffffffff8152a5be>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff8152a45b>] mutex_lock+0x2b/0x50
 [<ffffffffa08a82ed>] mgc_process_config+0x1dd/0x1210 [mgc]
 [<ffffffffa02b36c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa041452d>] obd_process_config.clone.0+0x8d/0x2e0 [obdclass]
 [<ffffffffa02b36c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa0418752>] lustre_end_log+0x262/0x6a0 [obdclass]
 [<ffffffffa0908ec2>] ll_put_super+0x82/0x1230 [lustre]
 [<ffffffffa02b36c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa0934273>] ? ll_destroy_inode+0xc3/0x100 [lustre]
 [<ffffffff811a620f>] ? destroy_inode+0x2f/0x60
 [<ffffffff811a66dc>] ? dispose_list+0xfc/0x120
 [<ffffffff811a6ad6>] ? invalidate_inodes+0xf6/0x190
 [<ffffffff8118b1eb>] generic_shutdown_super+0x5b/0xe0
 [<ffffffff8118b2d6>] kill_anon_super+0x16/0x60
 [<ffffffffa041020a>] lustre_kill_super+0x4a/0x60 [obdclass]
 [<ffffffff8118ba77>] deactivate_super+0x57/0x80
 [<ffffffff811ab41f>] mntput_no_expire+0xbf/0x110
 [<ffffffff811abf6b>] sys_umount+0x7b/0x3a0
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b


 Comments   
Comment by parinay v kondekar (Inactive) [ 18/Jan/16 ]

I re-ran the test with an increased timeout (1200), and the test passed.
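For reference, a re-run along these lines would typically be driven through the test-framework environment variables; `ONLY` and `TIMEOUT` are standard Lustre test-framework variables, but the exact values and the invocation below are assumptions for illustration.

```shell
# Sketch: raise the framework timeout before invoking the suite
# (values assumed; run on a configured Lustre test cluster).
export TIMEOUT=1200
export ONLY=25
echo "TIMEOUT=$TIMEOUT ONLY=$ONLY"
# sh ./replay-dual.sh
```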

Generated at Sat Feb 10 02:10:26 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.