[LU-9428] ASSERTION( de->d_op == &ll_d_ops) Created: 02/May/17 Updated: 29/Jun/17 Resolved: 29/Jun/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Frederik Ferner (Inactive) | Assignee: | Lai Siyao |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Environment: | RHEL6 server |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
We have recently seen frequent occurrences of the LBUG below. The affected machines are all exporting our Lustre file system via NFS to other Linux machines.

May 2 06:59:03 i05-storage1 kernel: LustreError: 3023:0:(dcache.c:236:ll_d_init()) ASSERTION( de->d_op == &ll_d_ops ) failed:
May 2 06:59:03 i05-storage1 kernel: LustreError: 3023:0:(dcache.c:236:ll_d_init()) LBUG
May 2 06:59:03 i05-storage1 kernel: Pid: 3023, comm: nfsd
May 2 06:59:03 i05-storage1 kernel:
May 2 06:59:03 i05-storage1 kernel: Call Trace:
May 2 06:59:03 i05-storage1 kernel: [<ffffffffa0383895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
May 2 06:59:03 i05-storage1 kernel: [<ffffffffa0383e97>] lbug_with_loc+0x47/0xb0 [libcfs]
May 2 06:59:03 i05-storage1 kernel: [<ffffffffa097e69f>] ll_d_init+0x2ff/0x540 [lustre]
May 2 06:59:03 i05-storage1 kernel: [<ffffffffa09c1b5b>] ll_iget_for_nfs+0x20b/0x300 [lustre]
May 2 06:59:03 i05-storage1 kernel: [<ffffffffa09c1d89>] ll_fh_to_dentry+0x99/0xa0 [lustre]
May 2 06:59:03 i05-storage1 kernel: [<ffffffffa0b3871c>] exportfs_decode_fh+0x5c/0x2bc [exportfs]
May 2 06:59:03 i05-storage1 kernel: [<ffffffffa0bcc8e0>] ? nfsd_acceptable+0x0/0x120 [nfsd]
May 2 06:59:03 i05-storage1 kernel: [<ffffffffa0b56da0>] ? cache_check+0x60/0x370 [sunrpc]
May 2 06:59:03 i05-storage1 kernel: [<ffffffff8117f76b>] ? cache_alloc_refill+0x15b/0x240
May 2 06:59:03 i05-storage1 kernel: [<ffffffffa0bccdda>] fh_verify+0x32a/0x640 [nfsd]
May 2 06:59:03 i05-storage1 kernel: [<ffffffffa0bcfda1>] nfsd_open+0x31/0x240 [nfsd]
May 2 06:59:03 i05-storage1 kernel: [<ffffffffa0bd022b>] nfsd_commit+0x3b/0xa0 [nfsd]
May 2 06:59:03 i05-storage1 kernel: [<ffffffff810aff24>] ? groups_free+0x54/0x60
May 2 06:59:03 i05-storage1 kernel: [<ffffffffa0bd769d>] nfsd3_proc_commit+0x9d/0x100 [nfsd]
May 2 06:59:03 i05-storage1 kernel: [<ffffffffa0bc9405>] nfsd_dispatch+0xe5/0x230 [nfsd]
May 2 06:59:03 i05-storage1 kernel: [<ffffffffa0b4ccf4>] svc_process_common+0x344/0x640 [sunrpc]
May 2 06:59:03 i05-storage1 kernel: [<ffffffff8106c500>] ? default_wake_function+0x0/0x20
May 2 06:59:03 i05-storage1 kernel: [<ffffffffa0b4d390>] svc_process+0x110/0x160 [sunrpc]
May 2 06:59:03 i05-storage1 kernel: [<ffffffffa0bc9c82>] nfsd+0xc2/0x160 [nfsd]
May 2 06:59:03 i05-storage1 kernel: [<ffffffffa0bc9bc0>] ? nfsd+0x0/0x160 [nfsd]
May 2 06:59:03 i05-storage1 kernel: [<ffffffff810a640e>] kthread+0x9e/0xc0
May 2 06:59:03 i05-storage1 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
May 2 06:59:03 i05-storage1 kernel: [<ffffffff810a6370>] ? kthread+0x0/0xc0
May 2 06:59:03 i05-storage1 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20

This looks similar to
We are still investigating the events leading to the crash, hoping for a reproducer. |
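For context on how this trace fails, here is a minimal userspace model of the assertion, not the actual Lustre source: in the client, ll_d_init() in llite/dcache.c asserts that a dentry already carries Lustre's dentry operations (ll_d_ops). A dentry produced by the normal VFS lookup path has them set, but one materialized by the NFS export path (ll_fh_to_dentry() -> ll_iget_for_nfs(), as in the trace above) can arrive without them, so the LASSERT fires. The struct layouts and helper names below are simplifications for illustration only.

```c
/*
 * Standalone model of the failing assertion. Names mirror the kernel
 * ones, but the types are toy definitions, not the real VFS structures.
 */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct dentry_operations { const char *name; };
struct dentry { const struct dentry_operations *d_op; };

/* Stand-in for Lustre's ll_d_ops. */
static const struct dentry_operations ll_d_ops = { "ll_d_ops" };

/* Normal lookup path: the dentry inherits the filesystem's d_op. */
static struct dentry *lookup_dentry(void)
{
	struct dentry *de = calloc(1, sizeof(*de));
	de->d_op = &ll_d_ops;
	return de;
}

/* NFS export path: the dentry can arrive with d_op unset (NULL here). */
static struct dentry *nfs_obtain_alias(void)
{
	return calloc(1, sizeof(struct dentry));
}

static void ll_d_init(struct dentry *de)
{
	/* Mirrors ASSERTION( de->d_op == &ll_d_ops ) from the log above. */
	assert(de->d_op == &ll_d_ops);
}

int main(void)
{
	ll_d_init(lookup_dentry());     /* fine */
	puts("lookup-path dentry: ok");
	ll_d_init(nfs_obtain_alias());  /* aborts, mirroring the LBUG */
	puts("nfs-path dentry: ok");    /* never reached */
	return 0;
}
```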
| Comments |
| Comment by Peter Jones [ 02/May/17 ] |
|
Lai, can you please assist with this issue? Thanks, Peter |
| Comment by Frederik Ferner (Inactive) [ 11/May/17 ] |
|
Any updates on this? We are still seeing this frequently; unfortunately we haven't been able to detect a pattern or develop a reproducer yet, and it is definitely affecting our users. Thanks, Frederik |
| Comment by Frederik Ferner (Inactive) [ 18/May/17 ] |
|
I noticed that the patch in
Can we get feedback on whether it would be safe to cherry-pick this commit and test on our clients? Thanks, Frederik |
| Comment by Lai Siyao [ 18/May/17 ] |
|
Yes, it's safe to cherry-pick to 2.7. It's a trivial fix to client code. |
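The comment does not spell out what the patch changes, so as a hedged sketch of the general shape such a trivial client-side fix could take, reusing the toy types from the model in the description: instead of asserting, a tolerant ll_d_init() could install the expected operations on a dentry that arrives without them and only reject a dentry carrying genuinely foreign operations. The function name and error convention here are invented for illustration, not taken from the landed patch.

```c
/*
 * Hypothetical illustration only -- NOT the actual patch referenced above.
 * Builds on the toy struct dentry / ll_d_ops definitions shown earlier.
 */
static int ll_d_init_tolerant(struct dentry *de)
{
	if (de->d_op == NULL)
		de->d_op = &ll_d_ops;   /* late initialization, no LBUG */
	else if (de->d_op != &ll_d_ops)
		return -1;              /* foreign ops: fail the lookup */
	return 0;
}
```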
| Comment by Frederik Ferner (Inactive) [ 19/May/17 ] |
|
Thanks for confirming. We have rebuilt our client with this patch applied and have started testing. As we don't have a known reproducer and it is quite unpredictable when the problem occurs, it will take a while before we can be confident that this has fixed our problem. We'll report back. Frederik |
| Comment by Peter Jones [ 21/Jun/17 ] |
|
Frederik, has this been long enough to ascertain whether the fix has helped? Peter |
| Comment by Frederik Ferner (Inactive) [ 29/Jun/17 ] |
|
Peter, all, apologies for the delay; I've been away. Without a clear reproducer it is always going to be hard to be absolutely sure, and the problem seems to come and go in waves. However, we have so far not seen this problem on an NFS server running the patched version. So I feel confident in saying it's looking good so far; it certainly seems to have helped. Thanks, Frederik |
| Comment by Peter Jones [ 29/Jun/17 ] |
|
Thanks, Frederik. Let's close out this ticket for now, then, and open a new one if you ever see a recurrence. |