[LU-1378] ASSERTION(range_is_sane(&seq->lcs_space)) failed Created: 04/May/12 Updated: 22/Apr/13 Resolved: 11/Jun/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.3.0, Lustre 2.1.3 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Christopher Morrone | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
PPC32 clients running 1.8.5-5chaos, x86_64 servers running https://github.com/chaos/lustre/commits/2.1.1-4chaos |
||
| Severity: | 2 |
| Rank (Obsolete): | 4556 |
| Description |
|
Our BG/P I/O Nodes (IONs, ppc32), which are Lustre clients, are hitting the following assertions now that they are talking to 2.1 servers:

mdc_fid.c:110:seq_client_rpc()) cli-srv-ls2-MDT0000-mdc-c1298800: Invalid range received from server:[0xffe959f603000000 - 0x00ea59f603000000)
mdc_fid.c:154:seq_client_alloc_seq()) cli-srv-ls2-MDT0000-mdc-c1298800: Can't allocate new meta-sequence, rc -22
mdc_fid.c:193:seq_client_alloc_seq()) cli-srv-ls2-MDT0000-mdc-c1298800: Can't allocate new sequence, rc -22
mdc_locks.c:614:mdc_enqueue()) fid allocation result: -22
mdc_fid.c:148:seq_client_alloc_seq()) ASSERTION(range_is_sane(&seq->lcs_space)) failed

I had to transpose that by hand, so please forgive any typos. But I tried to get the range on the first line correct, because it looks fishy at first glance.

This is high priority, because a major user of our BG/P system is dead in the water until this is fixed. |
| Comments |
| Comment by Andreas Dilger [ 04/May/12 ] |
|
This looks like it could be a swabbing error in the code: [0xffe959f603000000 - 0x00ea59f603000000)
(gdb) p 0x306f959eff
though to be honest I have no idea what this code is doing. This is the sequence server, and I'm not familiar enough with the code to jump immediately to where this should be swabbed. Of further interest is the actual FID value here. The starting FID_SEQ_NORMAL is 0x200000400. It would appear Alternately, it is possible that the MDS is not tracking its own super-sequence correctly, is getting a new 1B-wide super-sequence on every mount, and has mounted 0x28 = 40 times? While this is not an immediate or fatal problem, the code is currently designed to have a relatively small number of super-sequences in use at one time, so if one is allocated for each mount, it may cause problems down the line with DNE. |
| Comment by Christopher Morrone [ 04/May/12 ] |
|
Yeah, I was suspicious of swabbing too. FIDs are still new to me. I'll try to find the super-sequence. We probably have fewer than 10,000 clients total. The BG/P machine only mounts from I/O nodes, which keeps our Lustre client count lower. The BG/P clients number fewer than 1000, but they reboot more frequently (and do not cleanly unmount when they do). Even so, I don't think we have gotten anywhere near 100B client mounts, unless a client reconnect can trigger a new sequence number. We certainly have had a tremendous number of client reconnects to servers. It is possible that the MDS has rebooted 40 times in the past few months... we have certainly hit quite a few assertions, and sometimes servers get into a state where they reboot and assert again a few times before getting a clean boot. 40 sounds a LITTLE high, but at least it's more in the ballpark than the number of client mounts. |
| Comment by Di Wang [ 04/May/12 ] |
|
Yeah, unfortunately, 1.8 does not swab the lu_seq after retrieving it from the reply here. Fortunately, we do swab it in 2.x. I will cook up a patch. |
| Comment by Christopher Morrone [ 04/May/12 ] |
|
Ah ok. We basically consider 1.8 end-of-life at LLNL. BG/P is the final hold out because cross compiling there is a huge pain. But Ned is going to start on the task of getting 2.1 built for the BG/P systems. But we've not tested 2.1 much on ppc32 (it had some testing on ppc64 on the BG/Q systems), so it may take us a few weeks to work through 2.1 issues on that platform, install it on the open-side machines to gain confidence, and then finally on the large close-side machine. So a 1.8 patch to allow us to survive until 2.1 is ready for BG/P may be required. |
| Comment by Di Wang [ 04/May/12 ] |
|
The patch is here: http://review.whamcloud.com/2655 |
| Comment by Andreas Dilger [ 04/May/12 ] |
|
Di, However, once we get to DNE with multiple MDTs and OSTs consuming super-sequence ranges, this could explode the number of FLDB entries and cause unnecessary overhead on the servers and all of the clients. |
| Comment by Di Wang [ 05/May/12 ] |
|
I do not have a clue either. A server reboot should not consume FIDs at all, and a normal client reconnect will not consume FIDs either. But if the import has been refreshed during the reconnect (either evicted, or deactivated/activated manually), it will consume FIDs, i.e. the client will throw away its current sequence and request a new one. Chris, do you know whether those reconnects (mentioned in your comments) are "normal" or follow an eviction?

Btw: for DNE, we might need to put FIDs into last_rcvd, so that if the MDTs or OSTs reboot during super-sequence allocation, the resent request will not consume a new sequence. It might also be useful to expose the FLDB under /proc, or to have an FLDB reader like llog_read, so we can see how the FID sequence space is being used online/offline. |
| Comment by Christopher Morrone [ 07/May/12 ] |
I think that they are mostly "normal". |
| Comment by Christopher Morrone [ 07/May/12 ] |
|
I see seq and fld proc directories, but I don't know what these things mean. Is there anything useful in /proc that I should share with you? |
| Comment by Di Wang [ 07/May/12 ] |
|
Could you please post /proc/fs/lustre/seq/srv-XXX-MDT0000/space and /proc/fs/lustre/seq/ctl-XXX-MDT0000/space here? |
| Comment by Christopher Morrone [ 07/May/12 ] |
|
/proc/fs/lustre/seq/srv-ls2-MDT0000/space is: [0x3fdf8a1cf-0x400000400]:0:0
/proc/fs/lustre/seq/ctl-ls2-MDT0000/space is: [0x400000400-0xffffffffffffffff]:0:0 |
| Comment by Di Wang [ 07/May/12 ] |
|
Hmm, 0x306f959eff > 0x400000400, which is clearly outside the currently allocated space. So it has only allocated 8 billion sequences, not 200 billion. That is still quite a lot. I am investigating the code now. It is very strange that we got this gigantic range here. Could you provide a level -1 debug log of touching a file from a ppc client (with the patch http://review.whamcloud.com/2655)? Thanks. |
| Comment by Christopher Morrone [ 07/May/12 ] |
|
We cannot get that patch into production very quickly. Also, turning on extra debugging on our production filesystems has killed the filesystem before, so I am very reluctant to do that. We would need to schedule a filesystem down time to do that. I would prefer that we pursue other methods first. |
| Comment by Di Wang [ 07/May/12 ] |
|
Ah, OK. Btw: are there any suspicious error messages on the MDT during that time? |
| Comment by Christopher Morrone [ 07/May/12 ] |
|
Oh, wait, did you mean you want a debug level of -1 on the client or on the MDS? I was thinking the MDS, since that's where I was just looking, but now I realize you might have wanted it on the client. I can probably do a higher debug level on a client on the open side. |
| Comment by Di Wang [ 07/May/12 ] |
|
Ah, byte-swapping [0xffe959f603000000 - 0x00ea59f603000000) gives [0x3f659e9ff - 0x3f659ea00), not [0x306f959eff - 0x306f95ae00), so we are good once the range has been swabbed. We just need to understand why the MDT has requested 8 billion FID sequences here. |
| Comment by Di Wang [ 07/May/12 ] |
|
Oh, no need for a debug log now. The reason I wanted the debug log was to understand why the requested sequence was outside of the available FID space. But since the sequence range [0x3f659e9ff - 0x3f659ea00) lies within [0x3fdf8a1cf-0x400000400], we are good here. |
| Comment by Christopher Morrone [ 07/May/12 ] |
|
Ah good. And I don't see anything unusual on the MDT log during the times these things happen. |
| Comment by Di Wang [ 09/May/12 ] |
|
http://review.whamcloud.com/#change,2701 This patch adds a console message for FIDs whenever a new super-sequence is allocated. Since super-sequences are huge (1 billion sequences), these messages will be very rare. |
| Comment by Christopher Morrone [ 14/May/12 ] |
|
I've pulled http://review.whamcloud.com/#change,2701 into our branch. It will be in 2.1.1-12chaos. |
| Comment by Christopher Morrone [ 04/Jun/12 ] |
|
FYI, so far I have not seen this assertion since the patched 1.8 went on the large BG/P system, dawn. |
| Comment by Peter Jones [ 11/Jun/12 ] |
|
Landed for 2.3 |
| Comment by Di Wang [ 22/Jun/12 ] |
|
Added another patch, http://review.whamcloud.com/#change,3175, to export the FLDB under /proc. |
| Comment by Di Wang [ 22/Apr/13 ] |