[LU-2962] MGT umount locks up when taking down the file system. Created: 14/Mar/13  Updated: 18/Jul/13  Resolved: 18/Jul/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug
Priority: Critical
Reporter: James A Simmons
Assignee: Bruno Faccini (Inactive)
Resolution: Cannot Reproduce
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 7225

 Comments   
Comment by Bruno Faccini (Inactive) [ 14/Mar/13 ]

Hello James,
I am afraid the vmcore alone is not enough; we will also need the vmlinux (likely from kernel-debuginfo) and the lustre-modules you are running.

Comment by James A Simmons [ 15/Mar/13 ]

I also pushed all the debug rpms into uploads/LU-2962/. These are the ones used for the test shot. Let me know if you need anything else.

Comment by Bruno Faccini (Inactive) [ 15/Mar/13 ]

Thanks James, I will get this new data and let you know ASAP.

Comment by Bruno Faccini (Inactive) [ 18/Mar/13 ]

James,
In the crash dump you uploaded, there is no MGS mount and no Lustre modules are loaded.
BTW, it was taken on Jan 16th, after only 224s of uptime:

      KERNEL: usr/lib/debug/lib/modules/2.6.32-279.14.1.el6.head.x86_64/vmlinux
    DUMPFILE: ./vmcore-mgs  [PARTIAL DUMP]
        CPUS: 8
        DATE: Wed Jan 16 14:21:18 2013
      UPTIME: 00:03:44
LOAD AVERAGE: 2.20, 1.44, 0.60
       TASKS: 285
    NODENAME: widow-mgs2
     RELEASE: 2.6.32-279.14.1.el6.head.x86_64
     VERSION: #1 SMP Wed Nov 21 13:02:32 EST 2012
     MACHINE: x86_64  (2327 Mhz)
      MEMORY: 16 GB
       PANIC: "[  224.469800] Oops: 0002 [#1] SMP " (check log for details)

I don't think this can be the crash dump taken from the situation you described in this ticket.
Can you double-check?
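
For reference, the 224s figure can be cross-checked against the UPTIME and PANIC fields of the crash header quoted above. This is only a minimal sketch: the helper name `uptime_to_seconds` is hypothetical (it is not part of crash), and it assumes the plain HH:MM:SS form shown here, not uptimes that include a day count.

```python
def uptime_to_seconds(uptime: str) -> int:
    """Convert an HH:MM:SS uptime string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in uptime.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Values copied from the crash header quoted above.
uptime_field = "00:03:44"
panic_timestamp = 224.469800  # seconds since boot, from the PANIC line

uptime_seconds = uptime_to_seconds(uptime_field)
print(uptime_seconds)

# The Oops timestamp should fall within the same second as the reported uptime.
print(int(panic_timestamp) == uptime_seconds)
```

Both fields agree, which supports the reading that the node crashed shortly after boot rather than during an unmount.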

Comment by James A Simmons [ 19/Mar/13 ]

Sure, as soon as the admin comes in tomorrow I will get the proper core dump. Sorry about that.

Comment by Bruno Faccini (Inactive) [ 19/Mar/13 ]

No problem James.

Comment by James A Simmons [ 21/Mar/13 ]

I have bad news. The admin never got a crash dump. We are going to have to reproduce it at a smaller scale.

Comment by Bruno Faccini (Inactive) [ 08/Apr/13 ]

James, do you agree that we can reduce the ticket's priority to Major?

Comment by James A Simmons [ 08/Apr/13 ]

I figure the reason for this is that it only occurs at very large scale, so it doesn't impact everyone. Still, it would be nice to see this fixed before the release. I have a test shot coming up this weekend, so we can see if the problem still exists and, if it does, get a crash dump this time.

Comment by James A Simmons [ 23/Apr/13 ]

In the last test shot we did not get a lockup, so that is the good news. It did take 5 minutes to unmount the MGT, but that is not a showstopper for us.

Comment by Bruno Faccini (Inactive) [ 17/Jul/13 ]

James, what about this rather old ticket? Can we close it as never reproduced?

Comment by James A Simmons [ 18/Jul/13 ]

Yes, please close the ticket.

Comment by Bruno Faccini (Inactive) [ 18/Jul/13 ]

Thanks for your feedback James.
