
(local_storage.c:872:local_oid_storage_init()) ASSERTION( (*los)->los_last_oid >= first_oid ) failed: 0 < 1

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.5.0
    • Affects Version/s: None
    • Component/s: None
    • Environment: lbug encountered during normal review testing
    • Severity: 3
    • Rank: 8276

    Description

      This is from conf-sanity test_32a. There is a lot of other badness going on in conf-sanity, so I am not sure how often this particular error occurs.

      It may be related to LU-2200, a test failure in conf-sanity, subtest test_32a.

      The test run: https://maloo.whamcloud.com/test_sets/32b2ffc4-bd3c-11e2-9324-52540035b04c

      LBUG highlight:

      21:39:35:Lustre: DEBUG MARKER: mount -t lustre -o loop,mgsnode=10.10.4.198@tcp /tmp/t32/ost /tmp/t32/mnt/ost
      21:39:35:LDISKFS-fs (loop1): mounted filesystem with ordered data mode. quota=off. Opts: 
      21:39:35:LustreError: 23362:0:(local_storage.c:872:local_oid_storage_init()) ASSERTION( (*los)->los_last_oid >= first_oid ) failed: 0 < 1
      21:39:35:LustreError: 23362:0:(local_storage.c:872:local_oid_storage_init()) LBUG
      21:39:35:Pid: 23362, comm: mount.lustre
      21:39:35:
      21:39:35:Call Trace:
      21:39:35: [<ffffffffa0478895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      21:39:35: [<ffffffffa0478e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      21:39:35: [<ffffffffa05ca646>] local_oid_storage_init+0x426/0xe50 [obdclass]
      21:39:35: [<ffffffffa05a3660>] llog_osd_setup+0xc0/0x360 [obdclass]
      21:39:35: [<ffffffffa05a0162>] llog_setup+0x352/0x920 [obdclass]
      21:39:35: [<ffffffffa0d3508b>] mgc_set_info_async+0x12eb/0x1970 [mgc]
      21:39:35: [<ffffffffa04892c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      21:39:35: [<ffffffffa0607f70>] server_mgc_set_fs+0x120/0x520 [obdclass]
      21:39:35: [<ffffffffa060e9a5>] server_start_targets+0x85/0x19c0 [obdclass]
      21:39:35: [<ffffffffa0483d88>] ? libcfs_log_return+0x28/0x40 [libcfs]
      21:39:35: [<ffffffffa05dfc40>] ? lustre_start_mgc+0x4e0/0x1ee0 [obdclass]
      21:39:35: [<ffffffffa04892c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      21:39:35: [<ffffffffa0610e8c>] server_fill_super+0xbac/0x1660 [obdclass]
      21:39:35: [<ffffffffa05e1818>] lustre_fill_super+0x1d8/0x530 [obdclass]
      21:39:35: [<ffffffffa05e1640>] ? lustre_fill_super+0x0/0x530 [obdclass]
      21:39:35: [<ffffffff811842bf>] get_sb_nodev+0x5f/0xa0
      21:39:35: [<ffffffffa05d91b5>] lustre_get_sb+0x25/0x30 [obdclass]
      21:39:35: [<ffffffff811838fb>] vfs_kern_mount+0x7b/0x1b0
      21:39:35: [<ffffffff81183aa2>] do_kern_mount+0x52/0x130
      21:39:35: [<ffffffff811a3cf2>] do_mount+0x2d2/0x8d0
      21:39:35: [<ffffffff811a4380>] sys_mount+0x90/0xe0
      21:39:35: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
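
      For context, a minimal sketch of what the failed check means, paraphrased from the assertion text above (the structure layout shown is an assumption for illustration, not verbatim Lustre source):

      struct local_oid_storage {
              __u64   los_last_oid;  /* highest OID handed out so far,
                                      * loaded from storage during setup */
              /* ... */
      };

      /* local_oid_storage_init() requires the stored last OID to be at
       * least first_oid, the first OID of the sequence being set up.
       * Here los_last_oid came back as 0 while first_oid was 1 (hence
       * "failed: 0 < 1"), i.e. the on-disk counter looked uninitialized
       * or had been reset. */
      LASSERT((*los)->los_last_oid >= first_oid);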
      

    Attachments

    Issue Links

      This issue is related to LU-2059

    Activity

            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-2059 [ LU-2059 ]
            jlevi Jodi Levi (Inactive) made changes -
            Fix Version/s New: Lustre 2.5.0 [ 10295 ]
            tappro Mikhail Pershin made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: In Progress [ 3 ] New: Resolved [ 5 ]

            keith Keith Mannthey (Inactive) added a comment -

            I did not hit the issue again when I did a retest.
            tappro Mikhail Pershin made changes -
            Status Original: Open [ 1 ] New: In Progress [ 3 ]

            tappro Mikhail Pershin added a comment -

            Keith, please rebase that patch again; I've fixed this issue in http://review.whamcloud.com/5049, which is at the top of the patch set.
            keith Keith Mannthey (Inactive) added a comment - edited

            I "associated" the single issue. With review-dne I didn't see a way to do so, but the error messages were the same for the two failures I saw.

            keith Keith Mannthey (Inactive) added a comment -

            I am not sure if it will reproduce or not. It is a pretty large patch that triggered it, but there are a lot of timeout errors with this conf-sanity run and with test_32a.

            http://review.whamcloud.com/5512 is the patch set. The patch seems like it could have caused it, but with so many timeouts in this test I opened the LU to track the issue.

            Maloo tells me that in the last 4 weeks (master review ldiskfs) there have been 3 occurrences, all in the last 24 hours, and none before that.

            Two were in review-dne and the third is this one. So far it has been a one-shot issue with a large patch set on ldiskfs/master.

            I can submit the assert change if you want it in master.

            adilger Andreas Dilger added a comment -

            Keith, also, if you file a bug related to a failure in Maloo, please "Associate" the bug with the failed test, and search all of the other recent failures of the same test (e.g. in the past 2 weeks) and Associate the same bug with those as well. This is easily done in Maloo with Results->Search->Name=conf-sanity, Status=TIMEOUT, ResultsWithin=2weeks, and then looking to see which ones failed in test_32a and verifying that those have the same ASSERT failure in the MDS console log.

            adilger Andreas Dilger added a comment -

            Keith, if this is repeatable, could you please submit a quick patch to change the LASSERT() to LASSERTF() and print out the actual values in this condition? That would make debugging this much easier.

            This LASSERT() was just recently added in the LU-2886 patch http://review.whamcloud.com/6199, so it might represent a regression that was just introduced by that patch.
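
            A minimal sketch of the suggested LASSERT() -> LASSERTF() change, assuming the condition shown in the console log above; the exact source context in lustre/obdclass/local_storage.c may differ by branch:

            /* Before: the bare assertion prints no values when it fires. */
            LASSERT((*los)->los_last_oid >= first_oid);

            /* After: LASSERTF() logs the actual values, producing a console
             * message of the form "failed: 0 < 1" as seen in this report. */
            LASSERTF((*los)->los_last_oid >= first_oid,
                     "%llu < %llu\n",
                     (unsigned long long)(*los)->los_last_oid,
                     (unsigned long long)first_oid);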

            People

              Assignee: tappro Mikhail Pershin
              Reporter: keith Keith Mannthey (Inactive)
              Votes: 0
              Watchers: 4
