  Lustre / LU-7994

statahead loop in umount

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.10.0
    • Affects Version/s: None
    • Labels: None
    • Severity: 3

    Description

      Ever since the first batch of landings in 2.8, I have been seeing racer clients lock up on unmount.

      Typical trace:

      [86663.285011] umount        D 0000000000000001 11016 14155  14148 0x00000000
      [86663.285011]  ffff88008a16fd88 0000000000000086 ffff88008a16fd50 ffff88008a16fd4c
      [86663.285011]  ffff88008a16fcf8 ffff8800bcc23280 00004ed2e0f1ebbc ffff880006315c00
      [86663.285011]  0000000000000000 0000000101497763 ffff880081c0e6b8 ffff88008a16ffd8
      [86663.285011] Call Trace:
      [86663.285011]  [<ffffffff8152d911>] schedule_timeout+0x191/0x2e0
      [86663.285011]  [<ffffffff81088290>] ? process_timeout+0x0/0x10
      [86663.285011]  [<ffffffffa0fa4961>] ll_kill_super+0x91/0x180 [lustre]
      [86663.285011]  [<ffffffffa05d94c2>] lustre_kill_super+0x42/0x60 [obdclass]
      [86663.285011]  [<ffffffff81195ed7>] deactivate_super+0x57/0x80
      [86663.285011]  [<ffffffff811b5e2f>] mntput_no_expire+0xbf/0x110
      [86663.285011]  [<ffffffff811b699b>] sys_umount+0x7b/0x3a0
      [86663.285011]  [<ffffffff8108e64d>] ? sigprocmask+0x8d/0x110
      [86663.285011]  [<ffffffff8100b112>] system_call_fastpath+0x16/0x1b
      
      (gdb) l *(ll_kill_super+0x91)
      0x24991 is in ll_kill_super (/home/green/bk/linux-2.6.32-573.3.1.el6-debug/arch/x86/include/asm/atomic_64.h:23).
      18	 *
      19	 * Atomically reads the value of @v.
      20	 */
      21	static inline int atomic_read(const atomic_t *v)
      22	{
      23		return v->counter;
      24	}
      25	
      26	/**
      27	 * atomic_set - set atomic variable
      (gdb) l *(ll_kill_super+0x8f)
      0x2498f is in ll_kill_super (/home/green/git/lustre-release/lustre/llite/llite_lib.c:787).
      782			sbi->ll_umounting = 1;
      783	
      784			/* wait running statahead threads to quit */
      785			while (atomic_read(&sbi->ll_sa_running) > 0) {
      786				set_current_state(TASK_UNINTERRUPTIBLE);
      787				schedule_timeout(msecs_to_jiffies(MSEC_PER_SEC >> 3));
      788			}
      789		}
      790	
      791		EXIT;
      

      Initially it looked like the cause was this patch: http://review.whamcloud.com/18475

      But I am not sure.

      Anyway, whatever it is, it seems to be exposing some sort of reference leak in the statahead code: ll_sa_running never drops back to zero, so the wait loop in ll_kill_super() never exits.
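
      To illustrate the suspected failure mode, here is a minimal user-space sketch, not Lustre code. It assumes the counter is incremented when a statahead thread starts and decremented when it quits, and that some exit path skips the decrement. All names (fake_sbi, sa_start, sa_finish, wait_sa_drained, sa_leak_demo) are hypothetical, and the wait is bounded so the example terminates instead of hanging the way umount does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ll_sa_running counter; not the Lustre struct. */
struct fake_sbi {
	atomic_int sa_running;
};

/* Starting a statahead "thread" takes a reference on the counter. */
static void sa_start(struct fake_sbi *sbi)
{
	atomic_fetch_add(&sbi->sa_running, 1);
}

/* A normal exit drops the reference; the simulated bug returns without it. */
static void sa_finish(struct fake_sbi *sbi, bool leak)
{
	if (leak)
		return;	/* assumed bug: early return skips the decrement */
	atomic_fetch_sub(&sbi->sa_running, 1);
}

/*
 * Bounded version of the umount wait loop. The kernel loop sleeps about
 * 125 ms per iteration (MSEC_PER_SEC >> 3) and has no upper bound; here
 * we give up after max_polls checks and report whether the counter drained.
 */
static bool wait_sa_drained(struct fake_sbi *sbi, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (atomic_load(&sbi->sa_running) <= 0)
			return true;
	}
	return atomic_load(&sbi->sa_running) <= 0;
}

/* Runs both cases: a balanced start/finish and a leaked reference. */
static void sa_leak_demo(void)
{
	struct fake_sbi ok, bad;

	atomic_init(&ok.sa_running, 0);
	sa_start(&ok);
	sa_finish(&ok, false);
	assert(wait_sa_drained(&ok, 8));	/* counter back at 0: umount proceeds */

	atomic_init(&bad.sa_running, 0);
	sa_start(&bad);
	sa_finish(&bad, true);
	assert(!wait_sa_drained(&bad, 8));	/* counter stuck at 1: umount would spin */
}
```

      Calling sa_leak_demo() from main() exercises both paths. In the leaked case the counter stays at 1 no matter how long the loop polls, which matches umount sitting in schedule_timeout() in the trace above.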

          People

            Assignee: laisiyao Lai Siyao
            Reporter: green Oleg Drokin