Details
Bug
Resolution: Fixed
Minor
Description
Ever since the first batch of landings in 2.8, I have been seeing racer clients lock up on unmount.
Typical trace:
[86663.285011] umount D 0000000000000001 11016 14155 14148 0x00000000
[86663.285011] ffff88008a16fd88 0000000000000086 ffff88008a16fd50 ffff88008a16fd4c
[86663.285011] ffff88008a16fcf8 ffff8800bcc23280 00004ed2e0f1ebbc ffff880006315c00
[86663.285011] 0000000000000000 0000000101497763 ffff880081c0e6b8 ffff88008a16ffd8
[86663.285011] Call Trace:
[86663.285011] [<ffffffff8152d911>] schedule_timeout+0x191/0x2e0
[86663.285011] [<ffffffff81088290>] ? process_timeout+0x0/0x10
[86663.285011] [<ffffffffa0fa4961>] ll_kill_super+0x91/0x180 [lustre]
[86663.285011] [<ffffffffa05d94c2>] lustre_kill_super+0x42/0x60 [obdclass]
[86663.285011] [<ffffffff81195ed7>] deactivate_super+0x57/0x80
[86663.285011] [<ffffffff811b5e2f>] mntput_no_expire+0xbf/0x110
[86663.285011] [<ffffffff811b699b>] sys_umount+0x7b/0x3a0
[86663.285011] [<ffffffff8108e64d>] ? sigprocmask+0x8d/0x110
[86663.285011] [<ffffffff8100b112>] system_call_fastpath+0x16/0x1b
(gdb) l *(ll_kill_super+0x91)
0x24991 is in ll_kill_super (/home/green/bk/linux-2.6.32-573.3.1.el6-debug/arch/x86/include/asm/atomic_64.h:23).
18       *
19       * Atomically reads the value of @v.
20       */
21      static inline int atomic_read(const atomic_t *v)
22      {
23              return v->counter;
24      }
25
26      /**
27       * atomic_set - set atomic variable
(gdb) l *(ll_kill_super+0x8f)
0x2498f is in ll_kill_super (/home/green/git/lustre-release/lustre/llite/llite_lib.c:787).
782                     sbi->ll_umounting = 1;
783
784                     /* wait running statahead threads to quit */
785                     while (atomic_read(&sbi->ll_sa_running) > 0) {
786                             set_current_state(TASK_UNINTERRUPTIBLE);
787                             schedule_timeout(msecs_to_jiffies(MSEC_PER_SEC >> 3));
788                     }
789             }
790
791             EXIT;
Initially it looked like the cause was this patch: http://review.whamcloud.com/18475
But I am not sure.
Anyway, whatever it is, it just seems to be exposing some sort of reference leak in the statahead code: sbi->ll_sa_running never drops to zero, so the wait loop in ll_kill_super never exits.
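To illustrate why a leaked statahead reference would turn into this hang, here is a minimal userspace sketch of the wait loop. It is not the Lustre code: struct sa_model, sa_model_init(), sa_thread_start(), sa_thread_exit() and wait_sa_drain() are hypothetical names, C11 atomics stand in for the kernel's atomic_t, and the number of polls is bounded so the model returns instead of blocking forever.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the sbi->ll_sa_running counter. */
struct sa_model {
	atomic_int sa_running;
};

static inline void sa_model_init(struct sa_model *m)
{
	atomic_init(&m->sa_running, 0);
}

/* A statahead thread takes a reference when it starts ... */
static inline void sa_thread_start(struct sa_model *m)
{
	atomic_fetch_add(&m->sa_running, 1);
}

/* ... and must drop it on every exit path, or the count leaks. */
static inline void sa_thread_exit(struct sa_model *m)
{
	atomic_fetch_sub(&m->sa_running, 1);
}

/*
 * Bounded model of the umount wait loop. The real loop has no bound
 * and sleeps about 125 ms per iteration; here we only poll max_polls
 * times. Returns true if the counter reached zero, false if it is
 * still positive (the case where the real umount would stay stuck).
 */
static inline bool wait_sa_drain(struct sa_model *m, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (atomic_load(&m->sa_running) <= 0)
			return true;
	}
	return atomic_load(&m->sa_running) <= 0;
}
```

With balanced sa_thread_start()/sa_thread_exit() calls, wait_sa_drain() returns true right away. With one start that is never matched by an exit, it returns false no matter how many polls are allowed, which matches umount sitting in schedule_timeout() indefinitely.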