[LU-7994] statahead loop in umount Created: 07/Apr/16 Updated: 03/May/17 Resolved: 09/Mar/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Oleg Drokin | Assignee: | Lai Siyao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Ever since the first batch of landings in 2.8, I have been getting racer clients that lock up on unmount. Typical trace:

[86663.285011] umount D 0000000000000001 11016 14155 14148 0x00000000
[86663.285011] ffff88008a16fd88 0000000000000086 ffff88008a16fd50 ffff88008a16fd4c
[86663.285011] ffff88008a16fcf8 ffff8800bcc23280 00004ed2e0f1ebbc ffff880006315c00
[86663.285011] 0000000000000000 0000000101497763 ffff880081c0e6b8 ffff88008a16ffd8
[86663.285011] Call Trace:
[86663.285011] [<ffffffff8152d911>] schedule_timeout+0x191/0x2e0
[86663.285011] [<ffffffff81088290>] ? process_timeout+0x0/0x10
[86663.285011] [<ffffffffa0fa4961>] ll_kill_super+0x91/0x180 [lustre]
[86663.285011] [<ffffffffa05d94c2>] lustre_kill_super+0x42/0x60 [obdclass]
[86663.285011] [<ffffffff81195ed7>] deactivate_super+0x57/0x80
[86663.285011] [<ffffffff811b5e2f>] mntput_no_expire+0xbf/0x110
[86663.285011] [<ffffffff811b699b>] sys_umount+0x7b/0x3a0
[86663.285011] [<ffffffff8108e64d>] ? sigprocmask+0x8d/0x110
[86663.285011] [<ffffffff8100b112>] system_call_fastpath+0x16/0x1b

(gdb) l *(ll_kill_super+0x91)
0x24991 is in ll_kill_super (/home/green/bk/linux-2.6.32-573.3.1.el6-debug/arch/x86/include/asm/atomic_64.h:23).
18       *
19       * Atomically reads the value of @v.
20       */
21      static inline int atomic_read(const atomic_t *v)
22      {
23              return v->counter;
24      }
25
26      /**
27       * atomic_set - set atomic variable

(gdb) l *(ll_kill_super+0x8f)
0x2498f is in ll_kill_super (/home/green/git/lustre-release/lustre/llite/llite_lib.c:787).
782             sbi->ll_umounting = 1;
783
784             /* wait running statahead threads to quit */
785             while (atomic_read(&sbi->ll_sa_running) > 0) {
786                     set_current_state(TASK_UNINTERRUPTIBLE);
787                     schedule_timeout(msecs_to_jiffies(MSEC_PER_SEC >> 3));
788             }
789     }
790
791     EXIT;

Initially it looked like the cause was this patch: http://review.whamcloud.com/18475
But I am not sure. Whatever the trigger is, it seems to be exposing some sort of reference leak in the statahead code: umount waits in the loop above for ll_sa_running to drop to zero, and it never does. |
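The wait loop in ll_kill_super() has no exit other than ll_sa_running reaching zero, so a single leaked reference leaves umount stuck in uninterruptible sleep. The following is a minimal userspace sketch of that pattern with a bounded number of polls added, so that a leak is reported instead of hanging. It is an illustration only and not the change made in patch 23040: struct sa_state, wait_sa_quit() and the max_ticks and tick_ms parameters are hypothetical names, and C11 atomics plus nanosleep() stand in for the kernel's atomic_read() and schedule_timeout().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the two ll_sb_info fields used by the loop. */
struct sa_state {
	atomic_int sa_running;	/* models sbi->ll_sa_running */
	int umounting;		/* models sbi->ll_umounting */
};

/* Sleep for tick_ms milliseconds; the original loop sleeps about 125 ms. */
static void sleep_tick(long tick_ms)
{
	struct timespec ts = {
		.tv_sec = tick_ms / 1000,
		.tv_nsec = (tick_ms % 1000) * 1000000L,
	};

	nanosleep(&ts, NULL);
}

/*
 * Poll until every statahead thread has dropped its reference, giving up
 * after max_ticks sleeps. Returns true if the counter drained to zero and
 * false if it is still positive, which is what a leaked reference looks like.
 */
static bool wait_sa_quit(struct sa_state *st, int max_ticks, long tick_ms)
{
	assert(max_ticks >= 0 && tick_ms >= 0);

	st->umounting = 1;
	for (int i = 0; i < max_ticks; i++) {
		if (atomic_load(&st->sa_running) <= 0)
			return true;
		sleep_tick(tick_ms);
	}
	return atomic_load(&st->sa_running) <= 0;
}
```

With a counter that starts at 1 and is never decremented, wait_sa_quit() returns false after max_ticks polls, whereas the unbounded loop in the listing would never return. Whether a real unmount path should give up, warn, or keep waiting is a separate design question that this sketch does not settle.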
| Comments |
| Comment by Joseph Gmitter (Inactive) [ 08/Apr/16 ] |
|
Hi Lai, Can you please have a look into this issue? Thanks. |
| Comment by Gerrit Updater [ 10/Oct/16 ] |
|
Lai Siyao (lai.siyao@intel.com) uploaded a new patch: http://review.whamcloud.com/23040 |
| Comment by Gerrit Updater [ 09/Mar/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23040/ |
| Comment by Peter Jones [ 09/Mar/17 ] |
|
Landed for 2.10 |