LU-5393 - LBUG: (ost_handler.c:882:ost_brw_read()) ASSERTION( local_nb[i].rc == 0 ) failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version: Lustre 2.8.0
    • Affects Version: Lustre 2.4.3
    • Environment: RHEL 6 w/ patched kernel for Lustre
    • Severity: 3
    • 15009

    Description

      We are hitting this LBUG on one of our production systems, which was recently updated to Lustre 2.4.3 (on June 4th).

      LustreError: 11020:0:(ost_handler.c:882:ost_brw_read()) ASSERTION( local_nb[i].rc == 0 ) failed:
      LustreError: 11020:0:(ost_handler.c:882:ost_brw_read()) LBUG
      Pid: 11020, comm: ll_ost_io03_084
      
      Call Trace:
       [<ffffffffa07e6895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa07e6e97>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa04a0d44>] ost_brw_read+0x12d4/0x1340 [ost]
       [<ffffffff81282c09>] ? cpumask_next_and+0x29/0x50
       [<ffffffff8105bf64>] ? find_busiest_group+0x244/0x9f0
       [<ffffffffa0abbf0c>] ? lustre_msg_get_version+0x8c/0x100 [ptlrpc]
       [<ffffffffa0abc068>] ? lustre_msg_check_version+0xe8/0x100 [ptlrpc]
       [<ffffffffa04a8038>] ost_handle+0x2ac8/0x48e0 [ost]
       [<ffffffffa0ac2c4b>] ? ptlrpc_update_export_timer+0x4b/0x560 [ptlrpc]
       [<ffffffffa0acb428>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
       [<ffffffffa07e75de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
       [<ffffffffa07f8d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
       [<ffffffffa0ac2789>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
       [<ffffffff81058bd3>] ? __wake_up+0x53/0x70
       [<ffffffffa0acc7be>] ptlrpc_main+0xace/0x1700 [ptlrpc]
       [<ffffffffa0acbcf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffffa0acbcf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
       [<ffffffffa0acbcf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      
      Kernel panic - not syncing: LBUG
      

      The assertion is here:

       753 static int ost_brw_read(struct ptlrpc_request *req, struct obd_trans_info *oti)
       754 {
      ...
       879                 if (page_rc != local_nb[i].len) { /* short read */
       880                         /* All subsequent pages should be 0 */
       881                         while(++i < npages)
       882                                 LASSERT(local_nb[i].rc == 0);
       883                         break;
       884                 }
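
      (For context: local_nb[i].rc is the byte count the backend actually read into page i and local_nb[i].len is the requested length, so a short read marks EOF inside page i and every later niobuf is expected to be empty. A purely hypothetical, defensive variant of this loop — not the shipped code and not the eventual fix — would log the inconsistency via CERROR() instead of asserting:)

       /* Hypothetical illustration only; local_nb[], npages, len and rc are
        * the names from the snippet above, CERROR() is the libcfs error macro. */
       for (i = 0; i < npages; i++) {
               int page_rc = local_nb[i].rc;

               if (page_rc < 0)                        /* backend read error */
                       break;
               if (page_rc != local_nb[i].len) {       /* short read => EOF inside page i */
                       while (++i < npages)
                               if (local_nb[i].rc != 0)        /* violates the EOF assumption */
                                       CERROR("niobuf %d has rc=%d after a short read\n",
                                              i, local_nb[i].rc);
                       break;
               }
       }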
      

      I was able to get the content of local_nb from the crash dump.

      crash> struct ost_thread_local_cache 0xffff88105da00000
      struct ost_thread_local_cache {
        local = {{
            lnb_file_offset = 6010437632,
            lnb_page_offset = 0,
            len = 4096,
            flags = 0,
            page = 0xffffea0037309c10,
            dentry = 0xffff8808048c9a80,
            lnb_grant_used = 0,
            rc = 4096
          }, {
      ...
          }, {
            lnb_file_offset = 6010757120,
            lnb_page_offset = 0,
            len = 4096,
            flags = 0,
            page = 0xffffea00372ccf80,
            dentry = 0xffff8808048c9a80,
            lnb_grant_used = 0,
            rc = 512   <======== local_nb[i].rc != local_nb[i].len /* short read */
          }, {
            lnb_file_offset = 6010761216,
            lnb_page_offset = 0,
            len = 4096,
            flags = 0,
            page = 0xffffea0037176e98,
            dentry = 0xffff8808048c9a80,
            lnb_grant_used = 0,
            rc = 4096
          }, {
            lnb_file_offset = 1411710976,
            lnb_page_offset = 0,
            len = 4096,
            flags = 1120,
            page = 0x0,
            dentry = 0xffff8803690c2780,
            lnb_grant_used = 0,
            rc = 0
          }, {
            lnb_file_offset = 1411715072,
            lnb_page_offset = 0,
            len = 4096,
            flags = 1120,
            page = 0x0,
            dentry = 0xffff8803690c2780,
            lnb_grant_used = 0,
            rc = 0
          }, {
      ...
      

      This LBUG has occurred 5 times since 06/06/14. Each time, we have a short read followed by some non-empty pages (rc != 0). You will find attached some output for the first 3 occurrences.

      FYI, we have Lustre routers between the servers and clients as they are on a different LNET. Clients, routers and servers are running Lustre 2.4.3.

      Attachments

        1. 1407594033.bf26.debug.log
          8 kB
        2. 1407598675.bf27.debug.log
          23 kB
        3. 1407800513.bf33.debug.log
          7 kB
        4. MTS1351.tgz
          204 kB


          Activity


bfaccini Bruno Faccini (Inactive) added a comment -

Did you have some time to give my debug patch a try? On my side, it has passed auto-tests and does not introduce any regression nor cause message flooding.

OTOH, my reproducer attempts, based on our findings and thoughts, are still unsuccessful.

bfaccini Bruno Faccini (Inactive) added a comment -

Yes, you are right, and thanks for adding this, since I had forgotten to mention it. Hopefully this is the way we can now use to find the in-memory/crash inode for new occurrences of the problem!

BTW, as already discussed, I have also pushed a debug patch (http://review.whamcloud.com/13448) that may help better highlight the possible race during the short-read condition which we strongly suspect to cause the assert. Let's wait for it to be exposed to our auto-tests and see if it is not too intrusive. I have submitted it on top of the current b2_5, but it should apply easily on 2.5.3.

I will also try to work in-house on a reproducer, based on our latest findings/thoughts during our joint on-site debug and analysis.

gerrit Gerrit Updater added a comment -

Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/13448
Subject: LU-5393 osd-ldiskfs: debug short-read vs possible race
Project: fs/lustre-release
Branch: b2_5
Current Patch Set: 1
Commit: f8805056590b6e0ea5cacf32179c0cf995811b3a

apercher Antoine Percher added a comment -

I found another way to get the pointer to the inode that triggers the LBUG. In the niobuf_local structure, the dentry field is in fact a pointer to an osd_object structure, and the oo_inode field of that osd_object is THE inode pointer.
Cf. lustre/osd-ldiskfs/osd_io.c:472: lnb->dentry = (void *) obj;
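
(For illustration: assuming the 2.4/2.5 structure layout referenced in this comment, and using the dentry value from the crash dump in the description, a crash session could follow that chain roughly as below. The osd_ldiskfs module debuginfo has to be loaded first; the oo_inode value shown is a placeholder.)

    crash> mod -S                                           # load module debuginfo
    crash> struct osd_object.oo_inode 0xffff8808048c9a80    # the "dentry" value is really an osd_object *
    crash> struct inode.i_ino <oo_inode value>              # inode number, to cross-check against the oid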

bfaccini Bruno Faccini (Inactive) added a comment -

After the on-site session, it appears that the method I thought would allow easy inode retrieval from the crash dump was irrelevant, because the OSTs are configured with read_cache disabled and thus none of the pages concerned by the IO/niobufs are cached or mapped to the inode.
Thus, to find the inode in memory we had to do a mass search for its ino/number over all the inode lists, and we found it on the default backing_device dirty-list. And BTW, the inode size is quite far from the current pages range.
Since we still suspect a race to cause the problem, I am currently working on a debug patch to help in debugging more closely.

bruno.travouillon Bruno Travouillon (Inactive) added a comment -

I tried to, but I was not able to find the inode in memory. The customer site will be closed for the next two weeks. I will give it a new try with Antoine next year.

> I also wonder how you identified the inode you point to at the beginning of each log?

We were able to find the OST and the oid of the inode in the stack. We then used debugfs on the OST to stat O/0/d(oid%32)/oid to get the inode.
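
(For illustration only: with a hypothetical oid of 1234567 (1234567 % 32 = 7) and an OST backed by /dev/sdb, that lookup would be along the lines of:)

    # read-only/catastrophic mode; the object lives under O/0/d<oid % 32>/<oid>
    debugfs -c -R 'stat O/0/d7/1234567' /dev/sdb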

bfaccini Bruno Faccini (Inactive) added a comment -

Bruno, can you check whether you can resolve/find the concerned inode in crash-dump memory using the method I have described above?

bruno.travouillon Bruno Travouillon (Inactive) added a comment -

We hit the same LBUG today, on the same filesystem, but with 2.5.3 this time. I can confirm that the debug mask was set as requested for the already-provided debug logs.

The lnb[i].rc is not always 512. We had 396 in a previous crash. This time, rc = 1152.

I should be able to look deeper tomorrow.

bfaccini Bruno Faccini (Inactive) added a comment -

As a short-cut method to find the inode in memory, one could also verify that all the previous lnb[0..(i-1)].page entries are linked to the same mapping/address_space, which will hopefully belong to the same ldiskfs inode.

Also, I forgot to ask whether the unexpected non-null lnb[i].rc value is always 512 in all the crash dumps?
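
(For illustration, using the page addresses from the crash dump in the description, the check described above could look roughly like this in crash; page.mapping and address_space.host are standard RHEL6 kernel members. As noted further up in this ticket, it later turned out that read_cache is disabled on these OSTs, so the pages were not mapped.)

    crash> struct page.mapping 0xffffea0037309c10      # mapping of lnb[0].page
    crash> struct page.mapping 0xffffea00372ccf80      # mapping of the short-read page; should match
    crash> struct address_space.host <mapping value>   # the owning struct inode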

bfaccini Bruno Faccini (Inactive) added a comment -

Hello Bruno,
Sorry to be late on this.
I am afraid that the latest 3 logs you provided do not help to identify the problem's root cause.
BTW, I wonder if you had been able to set the minimum "inode super ext2 neterror page warning error emerg vfstrace console" debug mask I had requested before?
I also wonder how you identified the inode you point to at the beginning of each log?
OTOH, I would like the concerned inode to be extracted from a crash dump, so is the 1st one still available? Or maybe one of the latest?
This looks a bit tricky to do because the inode has been retrieved in previous routines/stages, starting from the obd_preprw()/ofd_preprw() call in ost_brw_read(), but no reference has been kept since ... We may luckily retrieve it by browsing the whole LBUG'd thread stack and trying to resolve the stacked pointers/values as an ldiskfs inode, but if unsuccessful we will need to mimic the whole algorithm, including the hashing of the lu_object!!
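
(For reference, a sketch of setting that debug mask with lctl on the OSS; how it is made persistent across reboots depends on the site configuration.)

    lctl set_param debug="inode super ext2 neterror page warning error emerg vfstrace console"
    lctl get_param debug          # verify the active mask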

bruno.travouillon Bruno Travouillon (Inactive) added a comment -

Hi Bruno,

Yes, it is.

People

    Assignee: bfaccini Bruno Faccini (Inactive)
    Reporter: bruno.travouillon Bruno Travouillon (Inactive)
    Votes: 0
    Watchers: 10

Dates

    Created:
    Updated:
    Resolved: