[LU-3347] (local_storage.c:872:local_oid_storage_init()) ASSERTION( (*los)->los_last_oid >= first_oid ) failed: 0 < 1 Created: 15/May/13  Updated: 21/Oct/13  Resolved: 16/Sep/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.5.0

Type: Bug Priority: Critical
Reporter: Keith Mannthey (Inactive) Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: None
Environment:

LBUG encountered during normal review testing


Issue Links:
Related
is related to LU-2059 mgc to backup configuration on osd-ba... Resolved
is related to LU-3316 ASSERTION(list_empty(&ls->ls_los_list... Resolved
Severity: 3
Rank (Obsolete): 8276

 Description   

This is from conf-sanity test test_32a. There is a lot of other badness going on in conf-sanity, and I am not sure how often THIS particular error occurs.

It may be related to LU-2200, "Test failure on test suite conf-sanity, subtest test_32a":

The test run: https://maloo.whamcloud.com/test_sets/32b2ffc4-bd3c-11e2-9324-52540035b04c

The highlighted LBUG:

21:39:35:Lustre: DEBUG MARKER: mount -t lustre -o loop,mgsnode=10.10.4.198@tcp /tmp/t32/ost /tmp/t32/mnt/ost
21:39:35:LDISKFS-fs (loop1): mounted filesystem with ordered data mode. quota=off. Opts: 
21:39:35:LustreError: 23362:0:(local_storage.c:872:local_oid_storage_init()) ASSERTION( (*los)->los_last_oid >= first_oid ) failed: 0 < 1
21:39:35:LustreError: 23362:0:(local_storage.c:872:local_oid_storage_init()) LBUG
21:39:35:Pid: 23362, comm: mount.lustre
21:39:35:
21:39:35:Call Trace:
21:39:35: [<ffffffffa0478895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
21:39:35: [<ffffffffa0478e97>] lbug_with_loc+0x47/0xb0 [libcfs]
21:39:35: [<ffffffffa05ca646>] local_oid_storage_init+0x426/0xe50 [obdclass]
21:39:35: [<ffffffffa05a3660>] llog_osd_setup+0xc0/0x360 [obdclass]
21:39:35: [<ffffffffa05a0162>] llog_setup+0x352/0x920 [obdclass]
21:39:35: [<ffffffffa0d3508b>] mgc_set_info_async+0x12eb/0x1970 [mgc]
21:39:35: [<ffffffffa04892c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
21:39:35: [<ffffffffa0607f70>] server_mgc_set_fs+0x120/0x520 [obdclass]
21:39:35: [<ffffffffa060e9a5>] server_start_targets+0x85/0x19c0 [obdclass]
21:39:35: [<ffffffffa0483d88>] ? libcfs_log_return+0x28/0x40 [libcfs]
21:39:35: [<ffffffffa05dfc40>] ? lustre_start_mgc+0x4e0/0x1ee0 [obdclass]
21:39:35: [<ffffffffa04892c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
21:39:35: [<ffffffffa0610e8c>] server_fill_super+0xbac/0x1660 [obdclass]
21:39:35: [<ffffffffa05e1818>] lustre_fill_super+0x1d8/0x530 [obdclass]
21:39:35: [<ffffffffa05e1640>] ? lustre_fill_super+0x0/0x530 [obdclass]
21:39:35: [<ffffffff811842bf>] get_sb_nodev+0x5f/0xa0
21:39:35: [<ffffffffa05d91b5>] lustre_get_sb+0x25/0x30 [obdclass]
21:39:35: [<ffffffff811838fb>] vfs_kern_mount+0x7b/0x1b0
21:39:35: [<ffffffff81183aa2>] do_kern_mount+0x52/0x130
21:39:35: [<ffffffff811a3cf2>] do_mount+0x2d2/0x8d0
21:39:35: [<ffffffff811a4380>] sys_mount+0x90/0xe0
21:39:35: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b


 Comments   
Comment by Jodi Levi (Inactive) [ 15/May/13 ]

Mike,
Could you please comment on this one?
Thank you!

Comment by Andreas Dilger [ 15/May/13 ]

Keith, if this is repeatable, could you please submit a quick patch to change the LASSERT() to LASSERTF() and print out the actual values in this condition? That would make debugging this much easier.

This LASSERT() was just recently added in LU-2886 patch http://review.whamcloud.com/6199, so it might represent a regression that was just introduced by that patch.
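As a minimal sketch of that suggestion (the surrounding context in lustre/obdclass/local_storage.c is assumed here, not quoted from the tree), the change would swap the bare assertion for an LASSERTF(), which takes a printk-style format after the condition, so both OIDs get logged before the LBUG fires. For what it's worth, the "0 < 1" in the console output above already hints that los_last_oid was 0 while first_oid was 1:

    /* before: condition only, no values in the console message */
    LASSERT((*los)->los_last_oid >= first_oid);

    /* after: report both values when the check trips */
    LASSERTF((*los)->los_last_oid >= first_oid,
             "los_last_oid %llu < first_oid %llu\n",
             (unsigned long long)(*los)->los_last_oid,
             (unsigned long long)first_oid);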

Comment by Andreas Dilger [ 15/May/13 ]

Keith, also, if you file a bug related to a failure in Maloo, please "Associate" the bug with the failed test, then search all of the other recent failures of the same test (e.g. in the past two weeks) and associate the same bug with those as well. This is easily done in Maloo with Results->Search->Name=conf-sanity, Status=TIMEOUT, ResultsWithin=2weeks, then looking to see which runs failed in test_32a and verifying that those show the same ASSERT failure in the MDS console log.

Comment by Keith Mannthey (Inactive) [ 15/May/13 ]

I am not sure if it will reproduce or not. It was a pretty large patch that triggered it, but there are a lot of timeout errors with this conf-sanity run and this test_32a.

http://review.whamcloud.com/5512 is the patch set. The patch seems like it could have caused it, but with so many timeouts in this test I opened the LU to track the issue.

Maloo tells me that in the last 4 weeks (master review ldiskfs) there have been 3 occurrences in the last 24 hours and none before that...

Two were review-dne and one was this run. So far it has been a one-shot issue with a large patch set on ldiskfs/master.

I can submit the LASSERT change if you want it in master.

Comment by Keith Mannthey (Inactive) [ 16/May/13 ]

I "associated" the single issue. With review-dne I didn't see a way but the error messages where the same for the 2 I see.

Comment by Mikhail Pershin [ 16/May/13 ]

Keith, please rebase that patch again; I've fixed this issue in http://review.whamcloud.com/5049, which is at the top of the patch set.

Comment by Keith Mannthey (Inactive) [ 20/May/13 ]

I did not hit the issue again when I did a retest.
