[LU-7994] statahead loop in umount Created: 07/Apr/16  Updated: 03/May/17  Resolved: 09/Mar/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Minor
Reporter: Oleg Drokin Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Ever since the first batch of landings in 2.8, I have been seeing racer clients lock up on unmount.

Typical trace:

[86663.285011] umount        D 0000000000000001 11016 14155  14148 0x00000000
[86663.285011]  ffff88008a16fd88 0000000000000086 ffff88008a16fd50 ffff88008a16fd4c
[86663.285011]  ffff88008a16fcf8 ffff8800bcc23280 00004ed2e0f1ebbc ffff880006315c00
[86663.285011]  0000000000000000 0000000101497763 ffff880081c0e6b8 ffff88008a16ffd8
[86663.285011] Call Trace:
[86663.285011]  [<ffffffff8152d911>] schedule_timeout+0x191/0x2e0
[86663.285011]  [<ffffffff81088290>] ? process_timeout+0x0/0x10
[86663.285011]  [<ffffffffa0fa4961>] ll_kill_super+0x91/0x180 [lustre]
[86663.285011]  [<ffffffffa05d94c2>] lustre_kill_super+0x42/0x60 [obdclass]
[86663.285011]  [<ffffffff81195ed7>] deactivate_super+0x57/0x80
[86663.285011]  [<ffffffff811b5e2f>] mntput_no_expire+0xbf/0x110
[86663.285011]  [<ffffffff811b699b>] sys_umount+0x7b/0x3a0
[86663.285011]  [<ffffffff8108e64d>] ? sigprocmask+0x8d/0x110
[86663.285011]  [<ffffffff8100b112>] system_call_fastpath+0x16/0x1b
(gdb) l *(ll_kill_super+0x91)
0x24991 is in ll_kill_super (/home/green/bk/linux-2.6.32-573.3.1.el6-debug/arch/x86/include/asm/atomic_64.h:23).
18	 *
19	 * Atomically reads the value of @v.
20	 */
21	static inline int atomic_read(const atomic_t *v)
22	{
23		return v->counter;
24	}
25	
26	/**
27	 * atomic_set - set atomic variable
(gdb) l *(ll_kill_super+0x8f)
0x2498f is in ll_kill_super (/home/green/git/lustre-release/lustre/llite/llite_lib.c:787).
782			sbi->ll_umounting = 1;
783	
784			/* wait running statahead threads to quit */
785			while (atomic_read(&sbi->ll_sa_running) > 0) {
786				set_current_state(TASK_UNINTERRUPTIBLE);
787				schedule_timeout(msecs_to_jiffies(MSEC_PER_SEC >> 3));
788			}
789		}
790	
791		EXIT;

Initially it looked like the cause was this patch, but I am not sure: http://review.whamcloud.com/18475

Whatever the trigger is, it seems to be exposing some sort of reference leak in the statahead code: umount waits for sbi->ll_sa_running to drop to zero, and it never does.
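The loop in ll_kill_super() only exits once every statahead thread has dropped its count, so a single unmatched increment is enough to hang umount. A minimal userspace sketch (C11 atomics, not Lustre code) of that handshake, assuming each statahead thread increments the counter when it starts, backs out if umount has already begun, and decrements it when it exits. The struct sb_model and the functions sa_thread_start(), sa_thread_exit() and umount_wait() are hypothetical stand-ins for sbi->ll_sa_running, sbi->ll_umounting and the kernel wait loop; the wait is bounded so a leak shows up as a return value instead of a hang.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the unmount wait in ll_kill_super().
 * sa_running stands in for sbi->ll_sa_running and umounting for
 * sbi->ll_umounting. This is an illustration, not Lustre code.
 */
struct sb_model {
	atomic_int  sa_running;
	atomic_bool umounting;
};

static void sb_model_init(struct sb_model *sb)
{
	atomic_init(&sb->sa_running, 0);
	atomic_init(&sb->umounting, false);
}

/* Assumed thread start path: take a reference, back out if umount began. */
static bool sa_thread_start(struct sb_model *sb)
{
	atomic_fetch_add(&sb->sa_running, 1);
	if (atomic_load(&sb->umounting)) {
		atomic_fetch_sub(&sb->sa_running, 1);
		return false;
	}
	return true;
}

/* Assumed thread exit path: drop the reference taken at start. */
static void sa_thread_exit(struct sb_model *sb)
{
	int prev = atomic_fetch_sub(&sb->sa_running, 1);

	assert(prev > 0);	/* an extra put would be a bug as well */
	(void)prev;
}

/*
 * Bounded version of the umount loop. The kernel loop has no bound and
 * sleeps MSEC_PER_SEC >> 3 (125 ms) per pass; here we simply poll
 * max_polls times and report whether the count drained to zero.
 */
static bool umount_wait(struct sb_model *sb, int max_polls)
{
	atomic_store(&sb->umounting, true);
	for (int i = 0; i < max_polls; i++) {
		if (atomic_load(&sb->sa_running) == 0)
			return true;
	}
	return false;
}
```

With a balanced start/exit the wait drains on the first poll. With a start that is never matched by an exit, the count stays at 1 no matter how long umount polls, which is what the unbounded kernel loop in the trace would look like.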



 Comments   
Comment by Joseph Gmitter (Inactive) [ 08/Apr/16 ]

Hi Lai,

Can you please have a look into this issue?

Thanks.
Joe

Comment by Gerrit Updater [ 10/Oct/16 ]

Lai Siyao (lai.siyao@intel.com) uploaded a new patch: http://review.whamcloud.com/23040
Subject: LU-7994 statahead: add smp_mb() to serialize ops
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a939946e936f3ada6c8c6d39dcd25aaf60e366d0

Comment by Gerrit Updater [ 09/Mar/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23040/
Subject: LU-7994 statahead: add smp_mb() to serialize ops
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c5c8d0623a60ee54bd11588a391b3dcd43c1abcf
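Only the patch subject is recorded here; the diff itself is not reproduced in this ticket. As a hedged illustration of why a full barrier can matter for a flag-plus-counter handshake like ll_umounting/ll_sa_running, consider the classic store-buffering pattern: one side increments a running count and then checks an "umounting" flag, while the other sets the flag and then reads the count. Without a full barrier between each side's store and its following load, both sides can read the old values (even x86 may reorder a store with a later load), so each misses the other. The sketch uses C11 relaxed atomics plus atomic_thread_fence(memory_order_seq_cst) as a userspace stand-in for the kernel's smp_mb(); all names are hypothetical, and whether the Lustre patch places its barriers this way is an assumption.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not Lustre's variables. */
static atomic_int sa_running;	/* plays the role of sbi->ll_sa_running */
static atomic_int umounting;	/* plays the role of sbi->ll_umounting */
static int thread_saw_umount;	/* what the statahead side observed */
static int umount_saw_count;	/* what the umount side observed */

/* Statahead-start side: publish "I am running", then check the flag. */
static void *sa_start_side(void *arg)
{
	(void)arg;
	atomic_fetch_add_explicit(&sa_running, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() analogue */
	thread_saw_umount = atomic_load_explicit(&umounting, memory_order_relaxed);
	return NULL;
}

/* Umount side: publish "umounting", then check the count. */
static void *umount_side(void *arg)
{
	(void)arg;
	atomic_store_explicit(&umounting, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() analogue */
	umount_saw_count = atomic_load_explicit(&sa_running, memory_order_relaxed);
	return NULL;
}

/* Run one race; true if both sides missed each other (unsafe outcome). */
static bool race_once(void)
{
	pthread_t a, b;
	int rc;

	atomic_store(&sa_running, 0);
	atomic_store(&umounting, 0);
	rc = pthread_create(&a, NULL, sa_start_side, NULL);
	assert(rc == 0);
	rc = pthread_create(&b, NULL, umount_side, NULL);
	assert(rc == 0);
	(void)rc;
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return thread_saw_umount == 0 && umount_saw_count == 0;
}

/* Repeat the race; with both fences in place this must return 0. */
static int count_unsafe(int iterations)
{
	int unsafe = 0;

	for (int i = 0; i < iterations; i++)
		if (race_once())
			unsafe++;
	return unsafe;
}
```

With both seq_cst fences, C11 forbids the outcome where both loads return 0: whichever fence comes first in the global order, the other side's load observes its store. Dropping either fence makes that outcome legal, though a given run may still never hit it, so a stress loop like this can expose a missing barrier but cannot prove one is unnecessary.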

Comment by Peter Jones [ 09/Mar/17 ]

Landed for 2.10

Generated at Sat Feb 10 02:13:43 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.