Lustre / LU-5126

Kernel crashed after debug_daemon started

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.3
    • Affects Version/s: Lustre 2.6.0, Lustre 2.4.2
    • Environment: Lustre-2.4.2
    • Severity: 3
    • Rank: 14143

    Description

      The following crash happened after debug_daemon was started:

      2014-05-30 16:30:19 LNetError: 2775:0:(/root/rpmbuild/BUILD/lustre-2.4.2/libcfs/libcfs/tracefile.c:1035:tracefiled()) ASSERTION(cfs_page_count(tage->page) > 0) failed
      2014-05-30 16:30:19 Kernel panic - not syncing: Lustre debug assertion failure
      2014-05-30 16:30:19
      2014-05-30 16:30:19 Pid: 2775, comm: ktracefiled Not tainted 2.6.32-358.18.1.el6_lustre.x86_64 #1
      2014-05-30 16:30:19 Call Trace:
      2014-05-30 16:30:19 [<ffffffff8150de58>] ? panic+0xa7/0x16f
      2014-05-30 16:30:19 [<ffffffff810a15fa>] ? do_gettimeofday+0x1a/0x50
      2014-05-30 16:30:19 [<ffffffffa0410c14>] ? cfs_trace_assertion_failed+0x74/0x80 [libcfs]
      2014-05-30 16:30:19 [<ffffffffa0412b60>] ? tracefiled+0x400/0x530 [libcfs]
      2014-05-30 16:30:19 [<ffffffffa0412760>] ? tracefiled+0x0/0x530 [libcfs]
      2014-05-30 16:30:19 [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
      2014-05-30 16:30:19 [<ffffffffa0412760>] ? tracefiled+0x0/0x530 [libcfs]
      2014-05-30 16:30:19 [<ffffffffa0412760>] ? tracefiled+0x0/0x530 [libcfs]
      2014-05-30 16:30:19 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      2014-05-30 16:34:33 Initializing cgroup subsys cpuset
      2014-05-30 16:34:33 Initializing cgroup subsys cpu

      The same crash was reported a long time ago in LU-1311: https://jira.hpdd.intel.com/browse/LU-1311?focusedCommentId=35871&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-35871

      That report included an important log message:
      Apr 30 08:21:41 barry-oss4 kernel: [247839.213070] wanted to write 1008 but wrote -5


        Activity


          bougetq Quentin Bouget (Inactive) added a comment -

          My bad, the HSM patch (https://review.whamcloud.com/25008) was meant for LU-5216, not LU-5126, sorry.

          gerrit Gerrit Updater added a comment -

          Quentin Bouget (quentin.bouget@cea.fr) uploaded a new patch: https://review.whamcloud.com/25008
          Subject: LU-5126 hsm: cancel HSM actions when CT unregisters
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 0d7632ccfd776087258ae79482d24ba0e5c232fb

          yujian Jian Yu added a comment -

          Here is the back-ported patch for the Lustre b2_5 branch: http://review.whamcloud.com/11454

          emoly.liu Emoly Liu added a comment -

          The patch has landed for Lustre 2.6.

          pjones Peter Jones added a comment -

          Thanks Li Xi!

          Emoly

          Could you please help with this patch?

          Thanks

          Peter


          lixi Li Xi (Inactive) added a comment -

          Here is a patch which tries to fix this problem.
          http://review.whamcloud.com/10524

          People

            Assignee: emoly.liu Emoly Liu
            Reporter: lixi Li Xi (Inactive)
            Votes: 0
            Watchers: 6
