[LU-2958] LBUG triggered in seq_client_alloc_fid() - ASSERTION( seq != ((void *)0) ) failed
Created: 13/Mar/13  Updated: 20/Mar/13  Resolved: 13/Mar/13

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug
Priority: Blocker
Reporter: Prakash Surya (Inactive)
Assignee: WC Triage
Resolution: Duplicate
Votes: 0
Labels: sequoia, topsequoia

Issue Links:
Duplicate
duplicates LU-2911 After upgrading from 1.8.9 to master,... Resolved
Severity: 3
Rank (Obsolete): 7213

 Description   

We're continually hitting this LBUG when running 2.3.62-2chaos.

2013-03-13 10:28:27.644618 {DefaultControlEventListener} [mmcs]{8}.5.1: LustreError: 12943:0:(fid_request.c:329:seq_client_alloc_fid()) ASSERTION( seq != ((void *)0) ) failed: 
2013-03-13 10:28:27.645378 {DefaultControlEventListener} [mmcs]{8}.5.1: LustreError: 12943:0:(fid_request.c:329:seq_client_alloc_fid()) LBUG
2013-03-13 10:28:27.645734 {DefaultControlEventListener} [mmcs]{8}.5.1: Call Trace:
2013-03-13 10:28:27.646114 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b3b0] [c000000000008160] .show_stack+0x7c/0x184 (unreliable)
2013-03-13 10:28:27.646480 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b460] [8000000000420cb8] .libcfs_debug_dumpstack+0xd8/0x150 [libcfs]
2013-03-13 10:28:27.646853 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b510] [8000000000421480] .lbug_with_loc+0x50/0xc0 [libcfs]
2013-03-13 10:28:27.647213 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b5a0] [8000000001958b80] .seq_client_alloc_fid+0x4f0/0x740 [fid]
2013-03-13 10:28:27.647576 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b6c0] [8000000001a4e310] .mdc_fid_alloc+0x140/0x1e0 [mdc]
2013-03-13 10:28:27.647944 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b770] [8000000001a65818] .mdc_intent_lock+0x508/0x838 [mdc]
2013-03-13 10:28:27.648345 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b8f0] [8000000001ce72e0] .ll_lookup_it+0x450/0xfb0 [lustre]
2013-03-13 10:28:27.648774 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268ba70] [8000000001ce7f10] .ll_lookup_nd+0xd0/0x580 [lustre]
2013-03-13 10:28:27.649207 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268bb30] [c0000000000df10c] .__lookup_hash+0x180/0x1c8
2013-03-13 10:28:27.649699 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268bbd0] [c0000000000e3690] .do_filp_open+0x260/0xadc
2013-03-13 10:28:27.650185 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268bd80] [c0000000000d1ca8] .do_sys_open+0x8c/0x18c
2013-03-13 10:28:27.650632 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268be30] [c000000000000580] syscall_exit+0x0/0x2c
2013-03-13 10:28:27.651100 {DefaultControlEventListener} [mmcs]{8}.5.1: Kernel panic - not syncing: LBUG
2013-03-13 10:28:27.651554 {DefaultControlEventListener} [mmcs]{8}.5.1: Call Trace:
2013-03-13 10:28:27.652039 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b3d0] [c000000000008160] .show_stack+0x7c/0x184 (unreliable)
2013-03-13 10:28:27.652453 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b480] [c0000000004557cc] .panic+0xb8/0x1e0
2013-03-13 10:28:27.652880 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b510] [80000000004214e0] .lbug_with_loc+0xb0/0xc0 [libcfs]
2013-03-13 10:28:27.653312 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b5a0] [8000000001958b80] .seq_client_alloc_fid+0x4f0/0x740 [fid]
2013-03-13 10:28:27.653734 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b6c0] [8000000001a4e310] .mdc_fid_alloc+0x140/0x1e0 [mdc]
2013-03-13 10:28:27.654166 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b770] [8000000001a65818] .mdc_intent_lock+0x508/0x838 [mdc]
2013-03-13 10:28:27.654595 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268b8f0] [8000000001ce72e0] .ll_lookup_it+0x450/0xfb0 [lustre]
2013-03-13 10:28:27.655016 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268ba70] [8000000001ce7f10] .ll_lookup_nd+0xd0/0x580 [lustre]
2013-03-13 10:28:27.655440 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268bb30] [c0000000000df10c] .__lookup_hash+0x180/0x1c8
2013-03-13 10:28:27.655861 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268bbd0] [c0000000000e3690] .do_filp_open+0x260/0xadc
2013-03-13 10:28:27.656281 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268bd80] [c0000000000d1ca8] .do_sys_open+0x8c/0x18c
2013-03-13 10:28:27.656711 {DefaultControlEventListener} [mmcs]{8}.5.1: [c0000003c268be30] [c000000000000580] syscall_exit+0x0/0x2c
2013-03-13 10:28:27.657108 {DefaultControlEventListener} [mmcs]{8}.5.1: LustreError: dumping log to /tmp/lustre-log.1363195707.12943


 Comments   
Comment by Andreas Dilger [ 13/Mar/13 ]

Prakash, any info on what kind of workload/operation is triggering this problem?

Comment by Prakash Surya (Inactive) [ 13/Mar/13 ]

This is on one of our production machines, so I'm unsure what kind of workload the users are putting on the system. We just recently upgraded the clients to run the new 2.3.62 tag (from 2.3.58), which I'm sure is why we're seeing it now. Also, I just noticed that the clients are mounting a file system running 2.1.2-3chaos, so this isn't exactly a supported configuration (2.3.62 clients using 2.1.2 servers).

Comment by Di Wang [ 13/Mar/13 ]

Duplicate of LU-2911.

Comment by Andreas Dilger [ 16/Mar/13 ]

Prakash, it is my expectation that 2.4 clients will work with 2.1 servers. I thought the LU-2911 problem was related to upgrade/downgrade for 1.8.9->2.4->1.8.9, but in fact it is an interop problem between 1.8.9 or 2.1.2 clients and master servers.

Comment by Ned Bass [ 20/Mar/13 ]

I think this is in fact the same problem as LU-2911. As I understand that bug, it relates to master clients talking to servers upgraded from 1.8. While investigating LU-2986, I discovered that the server in question was upgraded from 1.8 to 2.1. Consequently, the client doesn't load the LMV layer. However, LU-1445 (http://review.whamcloud.com/4787) removed the obd_fid_init() calls from llite, on the assumption that LMV would take care of them. Since we don't have LMV, cl_seq is never initialized and we fail the assertion.
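
To make the failure mode described above concrete, here is a minimal user-space sketch in C. It is not the Lustre source: the struct layouts and the model_* function names are hypothetical stand-ins, and the only detail taken from this ticket is that cl_seq stays NULL when the LMV path is skipped, so the allocator hits its assertion. The sketch returns an error code where the real code would LBUG.

```c
/* Simplified model of the bug; all names are illustrative, not Lustre's. */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the client sequence state. */
struct seq_client {
    unsigned long long next_oid;
};

/* Hypothetical stand-in for the client device holding cl_seq. */
struct client_obd {
    struct seq_client *cl_seq;     /* NULL until the fid-init step runs */
    struct seq_client seq_storage; /* backing storage used by the model */
};

/* Models the initialization that the ticket attributes to obd_fid_init(). */
void model_fid_init(struct client_obd *cli)
{
    cli->seq_storage.next_oid = 1;
    cli->cl_seq = &cli->seq_storage;
}

/* Models mount: the init step is reached only through the LMV path. */
void model_mount(struct client_obd *cli, int have_lmv)
{
    cli->cl_seq = NULL;
    if (have_lmv)
        model_fid_init(cli);
}

/* Returns 0 on success, -1 where the real code would fail its assertion. */
int model_alloc_fid(struct client_obd *cli, unsigned long long *oid)
{
    if (cli->cl_seq == NULL) {
        fprintf(stderr, "seq is NULL: real code would LBUG here\n");
        return -1;
    }
    *oid = cli->cl_seq->next_oid++;
    return 0;
}
```

Calling model_mount(&cli, 0) followed by model_alloc_fid() reproduces the NULL case in this model, while model_mount(&cli, 1) lets the allocation succeed. Under these assumptions, a fix would need to run the init step on the non-LMV path as well.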
