Lustre / LU-5393

LBUG: (ost_handler.c:882:ost_brw_read()) ASSERTION( local_nb[i].rc == 0 ) failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version: Lustre 2.8.0
    • Affects Version: Lustre 2.4.3
    • Environment: RHEL 6 w/ patched kernel for Lustre
    • 3
    • 15009

    Description

      We have been hitting this LBUG on one of our production systems that was recently updated to Lustre 2.4.3 (on June 4th).

      LustreError: 11020:0:(ost_handler.c:882:ost_brw_read()) ASSERTION( local_nb[i].rc == 0 ) failed:
      LustreError: 11020:0:(ost_handler.c:882:ost_brw_read()) LBUG
      Pid: 11020, comm: ll_ost_io03_084
      
      Call Trace:
       [<ffffffffa07e6895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa07e6e97>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa04a0d44>] ost_brw_read+0x12d4/0x1340 [ost]
       [<ffffffff81282c09>] ? cpumask_next_and+0x29/0x50
       [<ffffffff8105bf64>] ? find_busiest_group+0x244/0x9f0
       [<ffffffffa0abbf0c>] ? lustre_msg_get_version+0x8c/0x100 [ptlrpc]
       [<ffffffffa0abc068>] ? lustre_msg_check_version+0xe8/0x100 [ptlrpc]
       [<ffffffffa04a8038>] ost_handle+0x2ac8/0x48e0 [ost]
       [<ffffffffa0ac2c4b>] ? ptlrpc_update_export_timer+0x4b/0x560 [ptlrpc]
       [<ffffffffa0acb428>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
       [<ffffffffa07e75de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
       [<ffffffffa07f8d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
       [<ffffffffa0ac2789>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
       [<ffffffff81058bd3>] ? __wake_up+0x53/0x70
       [<ffffffffa0acc7be>] ptlrpc_main+0xace/0x1700 [ptlrpc]
       [<ffffffffa0acbcf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffffa0acbcf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
       [<ffffffffa0acbcf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      
      Kernel panic - not syncing: LBUG
      

      The assertion is here:

       753 static int ost_brw_read(struct ptlrpc_request *req, struct obd_trans_info *oti)
       754 {
      ...
       879                 if (page_rc != local_nb[i].len) { /* short read */
       880                         /* All subsequent pages should be 0 */
       881                         while(++i < npages)
       882                                 LASSERT(local_nb[i].rc == 0);
       883                         break;
       884                 }
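
      For reference, here is a minimal user-space sketch of the invariant this assertion encodes, using a simplified stand-in for niobuf_local (not the kernel definition): once one niobuf returns fewer bytes than requested (a short read), every subsequent niobuf in the same bulk is expected to have rc == 0.

       #include <stdio.h>

       /* Simplified stand-in for niobuf_local: only the two fields the
        * assertion looks at. */
       struct nb_stub {
               int len;        /* bytes requested for this page */
               int rc;         /* bytes actually read into this page */
       };

       /* Returns 0 when the short-read invariant holds, -1 when a later
        * page still carries data (the case that triggers the LBUG). */
       static int check_short_read(const struct nb_stub *nb, int npages)
       {
               for (int i = 0; i < npages; i++) {
                       if (nb[i].rc != nb[i].len) {    /* short read */
                               /* All subsequent pages should be 0 */
                               while (++i < npages)
                                       if (nb[i].rc != 0)
                                               return -1;
                               break;
                       }
               }
               return 0;
       }

       int main(void)
       {
               /* Mirrors the crash dump below: a 512-byte short read
                * followed by a full 4096-byte page. */
               struct nb_stub bulk[] = { {4096, 4096}, {4096, 512}, {4096, 4096} };

               printf("invariant %s\n",
                      check_short_read(bulk, 3) ? "violated" : "holds");
               return 0;
       }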
      

      I was able to get the content of local_nb from the crash dump.

      crash> struct ost_thread_local_cache 0xffff88105da00000
      struct ost_thread_local_cache {
        local = {{
            lnb_file_offset = 6010437632,
            lnb_page_offset = 0,
            len = 4096,
            flags = 0,
            page = 0xffffea0037309c10,
            dentry = 0xffff8808048c9a80,
            lnb_grant_used = 0,
            rc = 4096
          }, {
      ...
          }, {
            lnb_file_offset = 6010757120,
            lnb_page_offset = 0,
            len = 4096,
            flags = 0,
            page = 0xffffea00372ccf80,
            dentry = 0xffff8808048c9a80,
            lnb_grant_used = 0,
            rc = 512   <======== local_nb[i].rc != local_nb[i].len /* short read */
          }, {
            lnb_file_offset = 6010761216,
            lnb_page_offset = 0,
            len = 4096,
            flags = 0,
            page = 0xffffea0037176e98,
            dentry = 0xffff8808048c9a80,
            lnb_grant_used = 0,
            rc = 4096
          }, {
            lnb_file_offset = 1411710976,
            lnb_page_offset = 0,
            len = 4096,
            flags = 1120,
            page = 0x0,
            dentry = 0xffff8803690c2780,
            lnb_grant_used = 0,
            rc = 0
          }, {
            lnb_file_offset = 1411715072,
            lnb_page_offset = 0,
            len = 4096,
            flags = 1120,
            page = 0x0,
            dentry = 0xffff8803690c2780,
            lnb_grant_used = 0,
            rc = 0
          }, {
      ...
      

      This LBUG has occurred 5 times since 06/06/14. Each time, there is a short read followed by some non-empty pages (rc != 0). You will find attached some output for the first 3 occurrences.

      FYI, we have Lustre routers between the servers and clients as they are on a different LNET. Clients, routers and servers are running Lustre 2.4.3.

      Attachments

        1. 1407594033.bf26.debug.log
          8 kB
        2. 1407598675.bf27.debug.log
          23 kB
        3. 1407800513.bf33.debug.log
          7 kB
        4. MTS1351.tgz
          204 kB

        Issue Links

          Activity

            [LU-5393] LBUG: (ost_handler.c:882:ost_brw_read()) ASSERTION( local_nb[i].rc == 0 ) failed

            Yes, you are right, and thanks for adding this since I had forgotten to mention it. Hopefully this is the method we can now use to find the in-memory/crash inode for new occurrences of the problem!

            BTW, as already discussed, I have also pushed a debug patch (http://review.whamcloud.com/13448) that may help better highlight the possible race during the short-read condition, which we strongly suspect causes the assert. Let's wait for it to be exposed to our auto-tests and see whether it is not too intrusive. I have submitted it on top of the current b2_5, but it should apply easily to 2.5.3.

            I will also try to work in-house on a reproducer, based on our latest findings and thoughts from our joint on-site debugging and analysis.

            bfaccini Bruno Faccini (Inactive) added a comment

            Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/13448
            Subject: LU-5393 osd-ldiskfs: debug short-read vs possible race
            Project: fs/lustre-release
            Branch: b2_5
            Current Patch Set: 1
            Commit: f8805056590b6e0ea5cacf32179c0cf995811b3a

            gerrit Gerrit Updater added a comment

            I found another way to get the inode pointer involved in the LBUG.
            In the niobuf_local structure, the dentry field is actually a pointer
            to an osd_object structure, and the oo_inode field of the osd_object
            is THE inode pointer.
            Cf. lustre/osd-ldiskfs/osd_io.c:472: lnb->dentry = (void *) obj;

            apercher Antoine Percher added a comment
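
            For illustration, a trimmed C sketch of the pointer chain described above; the struct layouts are stand-ins carrying only the fields named in the comment, not the real Lustre headers:

             struct inode;                           /* opaque for this sketch */

             struct osd_object {
                     /* ... */
                     struct inode *oo_inode;         /* the inode pointer we want */
             };

             struct niobuf_local {
                     /* ... */
                     void *dentry;                   /* actually stores an osd_object *,
                                                      * cf. lnb->dentry = (void *)obj */
             };

             /* Recover the inode behind one niobuf_local entry. */
             struct inode *lnb_to_inode(const struct niobuf_local *lnb)
             {
                     const struct osd_object *obj = lnb->dentry;

                     return obj->oo_inode;
             }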

            After the on-site visit, it appears that the method I thought would allow easy inode retrieval from the crash dump was irrelevant, because the OSTs are configured with read_cache disabled and thus none of the pages concerned by the IO/niobufs are cached nor mapped to the inode.
            Thus, to find the inode in memory we had to do a mass search of its ino/number over all the inode lists, and we found it on the default backing_device dirty-list. BTW, the inode size is quite far from the current pages range.
            Since we still suspect a race to be the cause of the problem, I am currently working on a debug patch to help investigate more closely.

            bfaccini Bruno Faccini (Inactive) added a comment

            I tried to, but I was not able to find the inode in memory. The customer site will be closed for the next two weeks. I will give it another try with Antoine next year.

            > I also wonder how you identified the inode you point to at the beginning of each log?

            We were able to find the OST and the oid of the inode in the stack. We then used debugfs on the OST to stat O/0/d(oid%32)/oid to get the inode.

            bruno.travouillon Bruno Travouillon (Inactive) added a comment
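
            As a rough sketch of that path computation (assuming the O/0/d(oid%32)/oid layout described above; the oid value below is only an example):

             #include <stdio.h>
             #include <inttypes.h>

             /* Build the object path passed to "debugfs -R 'stat <path>'":
              * OST objects hash into 32 "d*" subdirectories under O/0. */
             static void print_ost_object_path(uint64_t oid)
             {
                     printf("O/0/d%" PRIu64 "/%" PRIu64 "\n", oid % 32, oid);
             }

             int main(void)
             {
                     print_ost_object_path(1234567);  /* example oid only */
                     return 0;
             }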

            Bruno, can you check whether you can resolve/find the concerned inode in the crash-dump memory using the method I have described above?

            bfaccini Bruno Faccini (Inactive) added a comment

            We hit the same LBUG today, on the same filesystem, but with 2.5.3 this time. I can confirm that the debug mask was set as requested for the already provided debug logs.

            The lnb[i].rc is not always 512. We had 396 in a previous crash. This time, rc = 1152.

            I should be able to look deeper tomorrow.

            bruno.travouillon Bruno Travouillon (Inactive) added a comment

            As a short-cut method to find the inode in memory, one could also verify that all previous lnb[0..(i-1)].page entries are linked to the same mapping/address_space, which will hopefully belong to the same ldiskfs inode.

            Also, I forgot to ask: is the unexpected non-null lnb[i].rc value always 512 in all crash dumps?

            bfaccini Bruno Faccini (Inactive) added a comment
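
            (Illustration only: a user-space sketch of that shortcut, with simplified stand-in types rather than the kernel's struct page/address_space.)

             #include <stdio.h>

             /* Stand-ins: the real check compares page->mapping (an
              * address_space *) for each lnb[0..(i-1)].page. */
             struct page_stub { const void *mapping; };
             struct lnb_stub  { struct page_stub *page; };

             /* Returns 1 if all pages before index i share one mapping,
              * which should then lead back to a single ldiskfs inode. */
             static int same_mapping(struct lnb_stub *lnb, int i)
             {
                     const void *m = lnb[0].page->mapping;

                     for (int k = 1; k < i; k++)
                             if (lnb[k].page->mapping != m)
                                     return 0;
                     return 1;
             }

             int main(void)
             {
                     struct page_stub p0 = { (void *)0x1 }, p1 = { (void *)0x1 };
                     struct lnb_stub lnb[] = { { &p0 }, { &p1 } };

                     printf("%s\n", same_mapping(lnb, 2) ? "same mapping" : "mixed");
                     return 0;
             }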

            Hello Bruno,
            Sorry to be late on this.
            I am afraid that the latest 3 logs you provided do not help to identify the problem's root cause.
            BTW, I wonder if you had been able to set the minimum "inode super ext2 neterror page warning error emerg vfstrace console" debug mask I had requested before?
            I also wonder how you identified the inode you point to at the beginning of each log?
            OTOH, I would like the concerned inode to be extracted from a crash dump, so is the 1st one still available? Or maybe one of the latest?
            This looks a bit tricky to do, because the inode has been retrieved in previous routines/stages starting from the obd_preprw()/ofd_preprw() call in ost_brw_read(), and no reference has been kept since then ... We may luckily retrieve it by browsing the whole LBUG/thread stack and trying to resolve the stacked pointers/values as an ldiskfs inode, but if unsuccessful we will need to mimic the whole algorithm, including the hashing of the lu_object!

            bfaccini Bruno Faccini (Inactive) added a comment

            Hi Bruno,

            Yes it is.

            bruno.travouillon Bruno Travouillon (Inactive) added a comment

            Hello Bruno,
            At the very beginning of each of the 3 debug logs you have added debugfs info about a specific inode, and I wonder if it is the one you have identified as being used by the process that triggers the LBUG?

            bfaccini Bruno Faccini (Inactive) added a comment

            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: bruno.travouillon Bruno Travouillon (Inactive)
              Votes: 0
              Watchers: 10
