Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Critical
    • None
    • Lustre 2.4.0
    • Environment: Sequoia, 2.3.54-2chaos, github.com/chaos/lustre
    • Severity: 3
    • 5558

    Description

      We have another instance of clients getting stuck at mount time. Out of 768 clients, 766 mounted successfully and 2 appear to be stuck; they have been stuck for several minutes so far. Both had the same soft lockup messages on the console:

      LustreError: 20578:0:(mgc_request.c:248:do_config_log_add()) failed processing sptlrpc log: -2
      BUG: soft lockup - CPU#33 stuck for 68s! [ll_ping:3244]
      Modules linked in: lmv(U) mgc(U) lustre(U) mdc(U) fid(U) fld(U) lov(U) osc(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) bgvrnic bgmudm
      NIP: c00000000042e190 LR: 8000000003a9ff2c CTR: c00000000042e160
      REGS: c0000003ca60fba0 TRAP: 0901   Not tainted  (2.6.32-220.23.3.bgq.13llnl.V1R1M2.bgq62_16.ppc64)
      MSR: 0000000080029000 <EE,ME,CE>  CR: 84228424  XER: 20000000
      TASK = c0000003ca51e600[3244] 'll_ping' THREAD: c0000003ca60c000 CPU: 33
      GPR00: 0000000080000021 c0000003ca60fe20 c0000000006de510 c0000003c00c0a78
      GPR04: 2222222222222222 0000000000000000 c0000003ca60fd38 0000000000000000
      GPR08: 0000000000000000 0000000080000021 0000000000000000 c00000000042e160
      GPR12: 8000000003aebda8 c00000000075d200
      NIP [c00000000042e190] ._spin_lock+0x30/0x44
      LR [8000000003a9ff2c] .ptlrpc_pinger_main+0x19c/0xcc0 [ptlrpc]
      Call Trace:
      [c0000003ca60fe20] [8000000003a9fea0] .ptlrpc_pinger_main+0x110/0xcc0 [ptlrpc] (unreliable)
      [c0000003ca60ff90] [c00000000001a9e0] .kernel_thread+0x54/0x70
      Instruction dump:
      38000000 980d0c94 812d0000 7c001829 2c000000 40c20010 7d20192d 40c2fff0
      4c00012c 2fa00000 4dfe0020 7c210b78 <80030000> 2fa00000 40defff4 7c421378
      BUG: soft lockup - CPU#49 stuck for 68s! [ptlrpcd_rcv:3175]
      Modules linked in: lmv(U) mgc(U) lustre(U) mdc(U) fid(U) fld(U) lov(U) osc(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) bgvrnic bgmudm
      NIP: c00000000042e198 LR: 8000000003a66af4 CTR: c00000000042e160
      REGS: c0000003e9bf3900 TRAP: 0901   Not tainted  (2.6.32-220.23.3.bgq.13llnl.V1R1M2.bgq62_16.ppc64)
      MSR: 0000000080029000 <EE,ME,CE>  CR: 84228444  XER: 20000000
      TASK = c0000003ea1f0740[3175] 'ptlrpcd_rcv' THREAD: c0000003e9bf0000 CPU: 49
      GPR00: 0000000080000021 c0000003e9bf3b80 c0000000006de510 c0000003c00c0a78
      GPR04: 0000000000000001 0000000000000000 0000000000000000 0000000000000000
      GPR08: c0000003c00c0a60 0000000080000031 c00000036a060900 c00000000042e160
      GPR12: 8000000003aebda8 c00000000076a200
      NIP [c00000000042e198] ._spin_lock+0x38/0x44
      LR [8000000003a66af4] .ptlrpc_check_set+0x4f4/0x4e80 [ptlrpc]
      Call Trace:
      [c0000003e9bf3b80] [8000000003a66964] .ptlrpc_check_set+0x364/0x4e80 [ptlrpc] (unreliable)
      [c0000003e9bf3d20] [8000000003abd1cc] .ptlrpcd_check+0x66c/0x8a0 [ptlrpc]
      [c0000003e9bf3e40] [8000000003abd6b8] .ptlrpcd+0x2b8/0x510 [ptlrpc]
      [c0000003e9bf3f90] [c00000000001a9e0] .kernel_thread+0x54/0x70
      Instruction dump:
      812d0000 7c001829 2c000000 40c20010 7d20192d 40c2fff0 4c00012c 2fa00000
      4dfe0020 7c210b78 80030000 2fa00000 <40defff4> 7c421378 4bffffc8 7c0802a6
      

      Both nodes are pretty much idle. I'll see what other information I can gather.

      Attachments

        1. lustre-log.1357681275.3261.txt
          3.02 MB
        2. seqio162_console.txt
          1.58 MB
        3. seqio162_lustre_log.txt
          3.06 MB
        4. seqio542_console_sysrq.log
          1.66 MB
        5. seqio542_lustre.log
          7.99 MB
        6. seqio652_lustre.log
          8.00 MB
        7. seqio652_lustre2.log
          4.02 MB

        Issue Links

          Activity

            [LU-2327] Clients stuck in mount

            This is still a problem. Unfortunately, we never seem to hit it while CONFIG_DEBUG_SPINLOCK is enabled, but it hits fairly frequently with that option disabled.

            In the latest occurrence, which I investigated yesterday, I see the following stacks:

            CPU1:

            ptlrpcd_check_set
            ptlrpcd_check
            ptlrpcd
            

            It appears to be stuck spinning on the "while (spin_trylock(&imp->imp_lock))" line.

            CPU5:

            ptlrpc_pinger_process_import (inlined)
            ptlrpc_pinger_main
            

            It is spinning on "spin_lock(&imp->imp_lock)" in ptlrpc_pinger_process_import() while holding the pinger mutex.

            Not spinning, but stuck sleeping on the pinger mutex is the mount.lustre process:

            schedule
            __mutex_lock_slowpath
            mutex_lock
            ptlrpc_pinger_add_import
            client_connect_import
            lov_connect_obd
            lov_connect
            ll_fill_super
            lustre_fill_super
            get_sb_nodev
            lustre_get_sb
            vfs_kern_mount
            do_kern_mount
            do_mount
            sys_mount
            syscall_exit
            

            But mount.lustre waiting on the mutex is expected, since the other thread holds it. We still need to understand who is holding the import spin lock.
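
            To summarize the interaction as pseudocode (a sketch based only on the stacks and quotes above; the exact loop shape and symbol names such as pinger_mutex are assumptions, not the actual Lustre source):

            /* CPU5: pinger thread (ptlrpc_pinger_main -> ptlrpc_pinger_process_import) */
            mutex_lock(&pinger_mutex);     /* held across the import scan */
            spin_lock(&imp->imp_lock);     /* spins here: whoever holds the lock never released it */

            /* CPU1: ptlrpcd thread (stack above) */
            while (!spin_trylock(&imp->imp_lock))
                    ;                      /* busy-waits on the same imp_lock; this is where
                                            * the "spinning too long" debug message (below) fires */

            /* mount.lustre (ptlrpc_pinger_add_import) */
            mutex_lock(&pinger_mutex);     /* sleeps: CPU5 already holds the pinger mutex */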

            Looking at our debugging statement ("spinning too long"), I see that the value of the spin lock is 2147483652. In hex, that is:

            0x80000004

            Looking at the Linux kernel PowerPC spin lock code, I see that when the lock is held, the least significant byte is set to the holder's CPU number. So the lock value looks correct, and apparently the lock was taken on a CPU that is not currently in use by any processes.
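
            As a quick check of that arithmetic (a standalone snippet, assuming the "least significant byte holds the CPU number" encoding just described):

            unsigned int slock = 2147483652u;  /* value from the "spinning too long" message */
            /* slock == 0x80000004: high bit set, so the lock is held */
            unsigned int cpu = slock & 0xff;   /* cpu == 4, the apparent holder */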

            So that just seems to further support the assumption that Lustre is failing to unlock the imp_lock somewhere.

            morrone Christopher Morrone (Inactive) added a comment

            We seem to have the problem that enabling kernel spin lock debugging makes the bug go away. I think that explains the earlier period when the problem went away as well.

            So I would say the ticket should remain open, because there seems to be a Lustre race here somewhere. But its priority can be dropped in favor of other, more pressing things.

            morrone Christopher Morrone (Inactive) added a comment

            Have you been able to reproduce this issue? Or should we close this ticket and reopen if it occurs again?

            jlevi Jodi Levi (Inactive) added a comment

            I've tried reproducing the hang with CONFIG_DEBUG_SPINLOCK enabled, but haven't been able to trigger it.

            prakash Prakash Surya (Inactive) added a comment

            Alex, we've re-enabled CONFIG_DEBUG_SPINLOCK on the ION kernels. I've added a patch to print out the void *owner, int magic, and int owner_cpu fields of the spinlock_t when the "spinning" message is printed. Is there any other information I should print when this hits? Or any other debugging info that would be useful?
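
            (For reference, the added print presumably looks something like this; a sketch only, assuming a spinlock_t *lock is in scope at the "spinning" message site and the 2.6.32-era CONFIG_DEBUG_SPINLOCK layout of spinlock_t.)

            #ifdef CONFIG_DEBUG_SPINLOCK
                    /* magic, owner and owner_cpu only exist in spinlock_t
                     * when CONFIG_DEBUG_SPINLOCK is enabled */
                    printk(KERN_ERR "spinning on %p: magic %08x owner %p owner_cpu %u\n",
                           lock, lock->magic, lock->owner, lock->owner_cpu);
            #endif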

            prakash Prakash Surya (Inactive) added a comment

            Well, I've read through the code a few times now, and Liang did as well. Still no idea how we get into this, sorry. I was hoping to get a hint from the kernel.

            bzzz Alex Zhuravlev added a comment

            I'm not sure yet; I need to get a test machine to play around with the kernel and determine what the underlying issue is with that option set. That work is still pending. Is that the best option available to us? I'd imagine knowing what the "owner" field is would tell us a lot.

            prakash Prakash Surya (Inactive) added a comment

            Any chance of getting it back?

            bzzz Alex Zhuravlev added a comment

            Yes, when we went to the RHEL6.3 kernel on Sequoia last Thursday, we lost CONFIG_DEBUG_SPINLOCK. Apparently the kernel would no longer operate with that enabled, as we've seen with other debug settings.

            morrone Christopher Morrone (Inactive) added a comment

            Unfortunately, CONFIG_DEBUG_SPINLOCK was dropped when we upgraded the kernel a couple of weeks ago. I need to talk to the admins about why they dropped it and whether it can be enabled again. I'm also interested in finding out who the owner is.

            prakash Prakash Surya (Inactive) added a comment

            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: morrone Christopher Morrone (Inactive)
              Votes: 0
              Watchers: 8
