Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.4.1, Lustre 2.5.0
    • Affects Version/s: Lustre 2.4.0, Lustre 2.5.0
    • 3
    • 8703

    Description

      Hi,

      We have been testing v2.4 and have hit this LBUG, which we never experienced in v1.8.x for similar workloads. It looks like it is related to doing an rm/unlink on certain files. I had to abort recovery and stop the ongoing file deletion in order to keep the MDS from repeatedly crashing with the same LBUG. We can supply more debug info should you need it.

      Cheers,

      Daire

      <0>LustreError: 6274:0:(linkea.c:169:linkea_links_find()) ASSERTION( ldata->ld_leh != ((void *)0) ) failed:
      <0>LustreError: 6274:0:(linkea.c:169:linkea_links_find()) LBUG
      <4>Pid: 6274, comm: mdt01_004
      <4>
      <4>Call Trace:
      <4> [<ffffffffa044b895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4> [<ffffffffa044be97>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4> [<ffffffffa05b47d6>] linkea_links_find+0x186/0x190 [obdclass]
      <4> [<ffffffffa0b87206>] ? mdo_xattr_get+0x26/0x30 [mdd]
      <4> [<ffffffffa0b8a645>] mdd_linkea_prepare+0x95/0x430 [mdd]
      <4> [<ffffffffa0b8ab01>] mdd_links_rename+0x121/0x540 [mdd]
      <4> [<ffffffffa0b8eae6>] mdd_unlink+0xb86/0xe30 [mdd]
      <4> [<ffffffffa0e0db98>] mdo_unlink+0x18/0x50 [mdt]
      <4> [<ffffffffa0e10f40>] mdt_reint_unlink+0x820/0x1010 [mdt]
      <4> [<ffffffffa0e0d891>] mdt_reint_rec+0x41/0xe0 [mdt]
      <4> [<ffffffffa0df2b03>] mdt_reint_internal+0x4c3/0x780 [mdt]
      <4> [<ffffffffa0df2e04>] mdt_reint+0x44/0xe0 [mdt]
      <4> [<ffffffffa0df7ab8>] mdt_handle_common+0x648/0x1660 [mdt]
      <4> [<ffffffffa0e31165>] mds_regular_handle+0x15/0x20 [mdt]
      <4> [<ffffffffa0730388>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      <4> [<ffffffffa044c5de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      <4> [<ffffffffa045dd8f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      <4> [<ffffffffa07276e9>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      <4> [<ffffffff81055ab3>] ? __wake_up+0x53/0x70
      <4> [<ffffffffa073171e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
      <4> [<ffffffffa0730c50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffff8100c0ca>] child_rip+0xa/0x20
      <4> [<ffffffffa0730c50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffffa0730c50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      <4>
      <0>Kernel panic - not syncing: LBUG
      <4>Pid: 6274, comm: mdt01_004 Tainted: G --------------- T 2.6.32-358.6.2.el6_lustre.g230b174.x86_64 #1
      <4>Call Trace:
      <4> [<ffffffff8150d878>] ? panic+0xa7/0x16f
      <4> [<ffffffffa044beeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
      <4> [<ffffffffa05b47d6>] ? linkea_links_find+0x186/0x190 [obdclass]
      <4> [<ffffffffa0b87206>] ? mdo_xattr_get+0x26/0x30 [mdd]
      <4> [<ffffffffa0b8a645>] ? mdd_linkea_prepare+0x95/0x430 [mdd]
      <4> [<ffffffffa0b8ab01>] ? mdd_links_rename+0x121/0x540 [mdd]
      <4> [<ffffffffa0b8eae6>] ? mdd_unlink+0xb86/0xe30 [mdd]
      <4> [<ffffffffa0e0db98>] ? mdo_unlink+0x18/0x50 [mdt]
      <4> [<ffffffffa0e10f40>] ? mdt_reint_unlink+0x820/0x1010 [mdt]
      <4> [<ffffffffa0e0d891>] ? mdt_reint_rec+0x41/0xe0 [mdt]
      <4> [<ffffffffa0df2b03>] ? mdt_reint_internal+0x4c3/0x780 [mdt]
      <4> [<ffffffffa0df2e04>] ? mdt_reint+0x44/0xe0 [mdt]
      <4> [<ffffffffa0df7ab8>] ? mdt_handle_common+0x648/0x1660 [mdt]
      <4> [<ffffffffa0e31165>] ? mds_regular_handle+0x15/0x20 [mdt]
      <4> [<ffffffffa0730388>] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      <4> [<ffffffffa044c5de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      <4> [<ffffffffa045dd8f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      <4> [<ffffffffa07276e9>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
      <4> [<ffffffff81055ab3>] ? __wake_up+0x53/0x70
      <4> [<ffffffffa073171e>] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
      <4> [<ffffffffa0730c50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
      <4> [<ffffffffa0730c50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffffa0730c50>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      <4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

      Attachments

        Issue Links

          Activity

            [LU-3474] MDS LBUG on unlink?
            spitzcor Cory Spitz added a comment -

            Cray testing on change #6676 patch set 1 and change #6772 shows that the changes resolve our problems with the LBUG.


            bfaccini Bruno Faccini (Inactive) added a comment -

            Just pushed new version/patch-set #2 of change http://review.whamcloud.com/6676. It adds a few ENODATA error-handling fixes to the original fix, to avoid unnecessary messages and also to prevent an early return.

            And http://review.whamcloud.com/6772 is a cosmetic patch for similar linkea_init() error handling in the mdt layer.

            prakash Prakash Surya (Inactive) added a comment -

            And we have a 2.1-formatted FS upgraded to Lustre 2.4 RPMs.

            amk Ann Koehler (Inactive) added a comment -

            Cray sees the bug on a file system formatted with 2.4.

            daire Daire Byrne (Inactive) added a comment -

            The filesystem was formatted using the latest v2.3 release, so many of the hardlinks would have been created under that version.

            bfaccini Bruno Faccini (Inactive) added a comment -

            Just for my understanding, and about the reproducer scenario: is it possible that the hard links being removed/unlinked and causing the LBUGs/messages were created when running with some early 2.x version (i.e., with more limits in place while populating the link_ea)?

            Patch to be out soon.

            bfaccini Bruno Faccini (Inactive) added a comment -

            Di,
            Since ENOENT/ENODATA indicates either that the link reference was not found in the link_ea, or that there is no reference at all in the link_ea, it means the current looked-up entry was never added because the link_ea space was exhausted. So yes, it seems the error-return-value test is wrong (in fact I think in the original version it should have been "(check == 1)" as the first condition!) and that we need to ignore ENOENT/ENODATA and simply continue. BTW, the error will actually only generate the messages and end up being discarded in mdd_unlink(), with the only side effect being that the new entry is not added upon rename!

            Daire,
            Thanks for the details on how this happens at your site. The fact that you never experienced this with 1.8 is only because there is no link_ea there! And as per my previous comment, even if annoying, the error messages you now see when running with the patch do not indicate that anything really bad has occurred.

            Prakash, Ann,
            I don't think mdt_links_read() requires the same change as mdd_links_read(), because its only caller, mdt_path_current(), does not use the ldata->ld_leh reference. But I will anyway try to create a reproducer for the situation (no link_ea entry due to overflow/limit?) that triggers the original assert during unlink(), and see if it can also cause a problem upon a fid2path() lookup.
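            To make the intended error handling concrete, here is a minimal sketch of the pattern described above. The helper linkea_entry_del() and the "check" flag are hypothetical stand-ins, not code from the Lustre tree or the actual patch; only the ordering of the conditions and the ENOENT/ENODATA treatment are taken from the discussion.

            #include <errno.h>

            /* Hypothetical stand-ins; names are illustrative only. */
            struct linkea_data;
            struct lu_name;
            struct lu_fid;

            int linkea_entry_del(struct linkea_data *ldata,
                                 const struct lu_name *lname,
                                 const struct lu_fid *pfid);

            /* Test "check" first, then treat -ENOENT/-ENODATA as harmless: they
             * only mean the entry was never recorded in the link_ea (for example
             * because its space was exhausted). */
            int linkea_del_sketch(struct linkea_data *ldata,
                                  const struct lu_name *lname,
                                  const struct lu_fid *pfid, int check)
            {
                    int rc = linkea_entry_del(ldata, lname, pfid);

                    if (rc != 0) {
                            if (check == 1 || (rc != -ENOENT && rc != -ENODATA))
                                    return rc;   /* strict check, or a real error */
                            rc = 0;              /* missing entry: ignore, continue */
                    }
                    return rc;
            }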

            prakash Prakash Surya (Inactive) added a comment -

            Should mdt_links_read() as well as mdd_links_read() be changed to return the rc from linkea_init()? I'm just noting the symmetry in the code.

            At first glance it looks like it should. And if it shouldn't, a comment explaining the difference is needed.

            amk Ann Koehler (Inactive) added a comment -

            re: http://review.whamcloud.com/6676

            Should mdt_links_read() as well as mdd_links_read() be changed to return the rc from linkea_init()? I'm just noting the symmetry in the code.
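            For readers following the symmetry question, a hedged sketch of the pattern being asked about is shown below. The structure and the links_xattr_get() helper are simplified assumptions, not the real mdd/mdt code; the point is only that the return code of linkea_init() is propagated instead of being swallowed.

            #include <errno.h>

            struct linkea_data {
                    void *ld_buf;   /* raw link xattr buffer */
                    void *ld_leh;   /* parsed header; unset until linkea_init() succeeds */
            };

            /* Hypothetical helpers standing in for the real xattr read and parse. */
            int links_xattr_get(void *obj, struct linkea_data *ldata);
            int linkea_init(struct linkea_data *ldata);

            /* Return linkea_init()'s rc to the caller, so a missing or malformed
             * link_ea surfaces as an error rather than leaving ldata->ld_leh unset
             * for a later assertion (as in linkea_links_find()) to trip over. */
            int links_read_sketch(void *obj, struct linkea_data *ldata)
            {
                    int rc = links_xattr_get(obj, ldata);

                    if (rc < 0)
                            return rc;

                    return linkea_init(ldata);
            }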
            spitzcor Cory Spitz added a comment -

            Cray has been seeing this bug a lot. We'll try out the patch and report back. Speak up if you need any of our debug info.


            daire Daire Byrne (Inactive) added a comment -

            So, just to give you an idea of what we are doing here: we are essentially using "rsync --link-dest" to do backups of servers (hence the yum DB files). If a file is unchanged, it is simply hard-linked to the previous backup copy. So to trigger these messages we are simply deleting old backups, which in many cases just decrements the hard-link count by one. This workload has been found to be a good test of metadata and IO. We have never seen this issue in Lustre v1.8 in the 2 years this workload has been running on it.

            In terms of the messages, maybe 20/s? It isn't constant, so I guess not all files trigger it. And there can be half an hour between an influx of these messages.
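            As background for why this workload exercises the MDS unlink path, here is a small POSIX C illustration of how deleting one backup of a hard-linked file only decrements its link count; the file names "file.prev" and "file.cur" are made up for the example and are not from the ticket.

            #include <stdio.h>
            #include <sys/stat.h>
            #include <unistd.h>

            int main(void)
            {
                    struct stat st;
                    FILE *f = fopen("file.prev", "w");   /* stand-in for an older backup */

                    if (f == NULL)
                            return 1;
                    fclose(f);
                    unlink("file.cur");                  /* clean up any previous run */

                    /* rsync --link-dest effect: an unchanged file in the new backup is
                     * a hard link to the copy in the previous backup. */
                    if (link("file.prev", "file.cur") != 0)
                            return 1;
                    stat("file.cur", &st);
                    printf("link count after backup:  %ld\n", (long)st.st_nlink);  /* 2 */

                    /* Deleting the old backup removes one name and decrements the
                     * count; on Lustre this is the unlink path that updates the link_ea. */
                    if (unlink("file.prev") != 0)
                            return 1;
                    stat("file.cur", &st);
                    printf("link count after cleanup: %ld\n", (long)st.st_nlink);  /* 1 */

                    return 0;
            }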

    People

      Assignee: bfaccini Bruno Faccini (Inactive)
      Reporter: daire Daire Byrne (Inactive)
      Votes: 0
      Watchers: 16

              Dates

                Created:
                Updated:
                Resolved: