[LU-736] LBUG and kernel panic on client unmount Created: 04/Oct/11 Updated: 22/Jan/16 Resolved: 22/Jan/16 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Christopher Morrone | Assignee: | Christopher Morrone |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | llnl | ||
| Environment: |
1.8.5.0-5chaos. https://github.com/chaos/lustre/tree/1.8.5.0-5chaos |
| Attachments: | sierra32_console.txt |
| Severity: | 3 | ||||
| Bugzilla ID: | 23861 | ||||
| Rank (Obsolete): | 9743 | ||||
| Description |
|
We recently had a few hundred clients all hit an LBUG and then kernel panic on unmount of a Lustre filesystem. All the ones that I checked have the same backtrace. See the attached sierra32_console.txt. It looks like others have hit this in earlier 1.8 versions. See bugzilla.lustre.org bug 23861. |
| Comments |
| Comment by Christopher Morrone [ 04/Oct/11 ] |
|
To make this issue more searchable, the LBUG is here: 2011-09-29 07:51:30 LustreError: 19065:0:(ldlm_lock.c:1568:ldlm_lock_cancel()) ### lock still has references ns: lsa-MDT0000-mdc-ffff810332040400 lock: ffff810263a92e00/0xb23761f5d085be87 lrc: 4/0,1 mode: PW/PW res: 578792285/4020328757 rrc: 2 type: FLK pid: 21451 [0->9223372036854775807] flags: 0x22002890 remote: 0x1f055096a089059 expref: -99 pid: 21451 timeout: 0 |
| Comment by Peter Jones [ 04/Oct/11 ] |
|
Hongchao, could you please look into this one? Thanks. Peter |
| Comment by Peter Jones [ 13/Oct/11 ] |
|
Hongchao, could you please provide a status update? Thanks. Peter |
| Comment by Hongchao Zhang [ 18/Oct/11 ] |
|
The readers/writers count of a flock's LDLM lock doesn't drop to zero until the lock is canceled by an unlock request, which is the only case where such a reference is still held at cancel time. The lock's flags here are "0x22002890", which include LDLM_FL_FAILED but not LDLM_FL_LOCAL_ONLY. So if flock LDLM locks are still held during umount and obd->obd_force isn't set, this issue will be triggered. Hi Chris, |
| Comment by Hongchao Zhang [ 18/Oct/11 ] |
|
Hi Chris, Thanks |
| Comment by Christopher Morrone [ 20/Oct/11 ] |
|
I will find out what the admins did to umount Lustre. It is going to be rather difficult to track down whether any of the various applications are using flock, and how. Most of our users won't know the answer to that, even if their application IS using flock. Perhaps it is relevant that we are mounting with the "flock" option enabled. |
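For reference, the client mount option mentioned above would look something like the following sketch; the MGS nodename and mount point are placeholders (only the filesystem name "lsa" appears in this ticket's lock dump), not values taken from the actual site configuration.

```shell
# Client mount with advisory flock support enabled.
# "mgsnode@tcp0" and "/mnt/lustre" are placeholder values.
mount -t lustre -o flock mgsnode@tcp0:/lsa /mnt/lustre
```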
| Comment by Christopher Morrone [ 20/Oct/11 ] |
|
As far as they can recall, they did not use the umount -f option. |
| Comment by Hongchao Zhang [ 13/Apr/12 ] |
|
The initial patch is tracked at http://review.whamcloud.com/#change,2535 |
| Comment by Christopher Morrone [ 13/Apr/12 ] |
|
Thanks. FYI unless this is also a problem for 2.1, this ticket is very low priority compared to our many 2.1 bugs. We do not plan to fix any 1.8 bugs in production. |
| Comment by Hongchao Zhang [ 21/Jan/16 ] |
|
Hi Chris, |
| Comment by Christopher Morrone [ 21/Jan/16 ] |
|
This is so old that I think you can close it with resolution "Won't Fix". |
| Comment by Hongchao Zhang [ 22/Jan/16 ] |
|
Chris, Thanks! |