
lod_lov.c:824:lod_load_striping()) ASSERTION( lo->ldo_stripenr == 0 ) failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.1
    • Affects Version/s: Lustre 2.5.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 10975

    Description

      I saw this crash when I was running racer.

      LustreError: 17343:0:(lod_lov.c:824:lod_load_striping()) ASSERTION( lo->ldo_stripenr == 0 ) failed:
      LustreError: 17343:0:(lod_lov.c:824:lod_load_striping()) LBUG
      Pid: 17343, comm: mdt03_007

      Call Trace:
      [<ffffffffa04c2895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [<ffffffffa04c2e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      [<ffffffffa0efded3>] lod_load_striping+0x383/0x4b0 [lod]
      [<ffffffffa0f08bab>] lod_declare_object_destroy+0x16b/0x390 [lod]
      [<ffffffffa0c972a0>] mdd_declare_finish_unlink+0x90/0x170 [mdd]
      [<ffffffffa0ca0579>] mdd_rename+0x1eb9/0x2390 [mdd]
      [<ffffffffa0e10143>] mdt_reint_rename+0x1383/0x1bf0 [mdt]
      [<ffffffffa066ad60>] ? lu_ucred+0x20/0x30 [obdclass]
      [<ffffffffa0e0aea1>] mdt_reint_rec+0x41/0xe0 [mdt]
      [<ffffffffa0df2c93>] mdt_reint_internal+0x4c3/0x780 [mdt]
      [<ffffffffa0df2f94>] mdt_reint+0x44/0xe0 [mdt]
      [<ffffffffa0df5a8a>] mdt_handle_common+0x52a/0x1470 [mdt]
      [<ffffffffa0e2fc45>] mds_regular_handle+0x15/0x20 [mdt]
      [<ffffffffa07d9e25>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
      [<ffffffffa04d427f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      [<ffffffffa07d14c9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
      [<ffffffffa07db18d>] ptlrpc_main+0xaed/0x1740 [ptlrpc]
      [<ffffffffa07da6a0>] ? ptlrpc_main+0x0/0x1740 [ptlrpc]
      [<ffffffff81096a36>] kthread+0x96/0xa0
      [<ffffffff8100c0ca>] child_rip+0xa/0x20
      [<ffffffff810969a0>] ? kthread+0x0/0xa0
      [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

      LustreError: dumping log to /tmp/lustre-log.1381356940.17343

          Activity

            [LU-4083] lod_lov.c:824:lod_load_striping()) ASSERTION( lo->ldo_stripenr == 0 ) failed

            sdm900 Stuart Midgley (Inactive) added a comment -

            FWIW, we have hit this on a production Lustre system running 2.5.0.

            Will apply the patch and the LU-2789 fix and move on.

            paf Patrick Farrell (Inactive) added a comment -

            Is this a duplicate of https://jira.hpdd.intel.com/browse/LU-2789 ?

            From the fixes, they tentatively appear to take the same lock, but around different operations. Is it the same race condition or a different one?

            jamesanunez James Nunez (Inactive) added a comment -

            Patch landed to master.

            green Oleg Drokin added a comment -

            I just want to add that I also hit this pretty frequently and it disrupts my testing. As such I am increasing priority to critical.

            jamesanunez James Nunez (Inactive) added a comment - Patch at: http://review.whamcloud.com/7919
            pjones Peter Jones added a comment -

            James

            Could you please upload this patch into gerrit on behalf of Jinshan?

            Peter

            jay Jinshan Xiong (Inactive) added a comment -

            After applying this patch, the issue went away:

            diff --git a/lustre/lod/lod_qos.c b/lustre/lod/lod_qos.c
            index e7b1de0..49575b7 100644
            --- a/lustre/lod/lod_qos.c
            +++ b/lustre/lod/lod_qos.c
            @@ -813,6 +813,7 @@ repeat_find:
                            rc = 0;
                    } else {
                            /* nobody provided us with a single object */
            +               lo->ldo_stripenr = 0;
                            rc = -ENOSPC;
                    }
             
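
            To illustrate why the one-line fix matters, here is a minimal user-space sketch of the invariant involved. It is not Lustre code: the struct and both functions are hypothetical stand-ins, and only the field name ldo_stripenr and the -ENOSPC return come from this ticket. The sketch assumes the requested stripe count is recorded on the object before allocation is attempted, so a failed allocation can leave a stale non-zero count behind, which is the condition the assertion in lod_load_striping() rejects.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the LOD object; only ldo_stripenr is
 * taken from the ticket. */
struct lod_object_sketch {
	int ldo_stripenr;	/* number of stripes recorded on the object */
};

/*
 * Hypothetical model of a stripe allocation attempt.
 * wanted: stripe count requested by the caller (assumed to be stored
 *         on the object before allocation starts).
 * got:    number of objects the targets actually provided.
 * reset_on_failure: non-zero models the patched behaviour.
 */
static int alloc_stripes_sketch(struct lod_object_sketch *lo,
				int wanted, int got, int reset_on_failure)
{
	lo->ldo_stripenr = wanted;

	if (got > 0) {
		/* at least one stripe was allocated */
		lo->ldo_stripenr = got;
		return 0;
	}

	/* nobody provided us with a single object */
	if (reset_on_failure)
		lo->ldo_stripenr = 0;
	return -ENOSPC;
}

/* Returns 1 if a later load of the striping would satisfy the
 * asserted precondition (ldo_stripenr == 0), 0 if it would trip it. */
static int load_precondition_holds(const struct lod_object_sketch *lo)
{
	return lo->ldo_stripenr == 0;
}
```

            In this model, a failed allocation without the reset leaves ldo_stripenr at the requested value, so a later destroy or rename path that reloads the striping would hit the assertion; with the reset, the object is back in its "no striping loaded" state. Whether this is the only path that can trip the assertion is not established by the ticket.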

            People

              Assignee: jamesanunez James Nunez (Inactive)
              Reporter: jay Jinshan Xiong (Inactive)
              Votes: 0
              Watchers: 9