[LU-4183] MDS crash when running fsx with NFS export Created: 29/Oct/13  Updated: 14/Aug/14  Resolved: 14/Aug/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.1, Lustre 2.5.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None
Environment:

Lustre 2.5.0-RC1, RHEL 6, OpenSFS cluster with one combined MDS/MGS, two OSSs with two OSTs each, one node mounting Lustre and exporting it as an NFS server, and three client nodes NFS-mounting the exported file system


Issue Links:
Related
is related to LU-4528 osd_trans_exec_op()) ASSERTION( oti->... Resolved
is related to LU-4901 Remove triple buffer update from llog... Closed
Severity: 3
Rank (Obsolete): 11320

 Description   

While three instances of fsx were writing to an NFS-mounted Lustre file system through a single NFS server, the MDS crashed and rebooted. From the MDS console:

Message from syslogd@c03 at Oct 29 13:04:27 ...
 kernel:LustreError: 16448:0:(osd_internal.h:963:osd_trans_exec_op()) ASSERTION( rb < OSD_OT_MAX ) failed: rb = 12

Message from syslogd@c03 at Oct 29 13:04:27 ...
 kernel:LustreError: 16448:0:(osd_internal.h:963:osd_trans_exec_op()) LBUG

From the crash dump:

crash> bt
PID: 16448  TASK: ffff880738300040  CPU: 4   COMMAND: "mdt00_005"
 #0 [ffff8806565c1628] machine_kexec at ffffffff81035d6b
 #1 [ffff8806565c1688] crash_kexec at ffffffff810c0e22
 #2 [ffff8806565c1758] panic at ffffffff8150de5f
 #3 [ffff8806565c17d8] lbug_with_loc at ffffffffa04f0eeb [libcfs]
 #4 [ffff8806565c17f8] osd_punch at ffffffffa0d12243 [osd_ldiskfs]
 #5 [ffff8806565c1848] llog_osd_write_blob at ffffffffa0639050 [obdclass]
 #6 [ffff8806565c18b8] llog_osd_write_rec at ffffffffa063d04e [obdclass]
 #7 [ffff8806565c1968] llog_write_rec at ffffffffa0610428 [obdclass]
 #8 [ffff8806565c19c8] llog_cat_add_rec at ffffffffa0619299 [obdclass]
 #9 [ffff8806565c1a38] llog_add at ffffffffa0610221 [obdclass]
#10 [ffff8806565c1a88] mdd_changelog_store at ffffffffa0f54463 [mdd]
#11 [ffff8806565c1af8] mdd_changelog_data_store at ffffffffa0f465c6 [mdd]
#12 [ffff8806565c1b48] mdd_attr_set at ffffffffa0f4c9b1 [mdd]
#13 [ffff8806565c1bc8] mdt_attr_set at ffffffffa0e1cc28 [mdt]
#14 [ffff8806565c1c18] mdt_reint_setattr at ffffffffa0e1d4dd [mdt]
#15 [ffff8806565c1c88] mdt_reint_rec at ffffffffa0e16eb1 [mdt]
#16 [ffff8806565c1ca8] mdt_reint_internal at ffffffffa0dfec93 [mdt]
#17 [ffff8806565c1ce8] mdt_reint at ffffffffa0dfef94 [mdt]
#18 [ffff8806565c1d08] mdt_handle_common at ffffffffa0e01a8a [mdt]
#19 [ffff8806565c1d58] mds_regular_handle at ffffffffa0e3bc55 [mdt]
#20 [ffff8806565c1d68] ptlrpc_server_handle_request at ffffffffa07f2e25 [ptlrpc]
#21 [ffff8806565c1e48] ptlrpc_main at ffffffffa07f418d [ptlrpc]
#22 [ffff8806565c1ee8] kthread at ffffffff81096a36
#23 [ffff8806565c1f48] kernel_thread at ffffffff8100c0ca

On the OSTs, osd-ldiskfs.track_declares_assert=1 was set (this parameter turns declare-accounting mismatches into LASSERTs rather than console warnings).

From the crash dump, here is the system information:

crash> sys
      KERNEL: /home/build/kernel/rpmbuild/BUILD/kernel-2.6.32.358.18.1.el6_lustre/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2013-10-29-13:05:16/vmcore  [PARTIAL DUMP]
        CPUS: 24
        DATE: Tue Oct 29 13:04:30 2013
      UPTIME: 3 days, 22:31:58
LOAD AVERAGE: 0.17, 0.06, 0.02
       TASKS: 362
    NODENAME: c03
     RELEASE: 2.6.32-358.18.1.el6_lustre.x86_64
     VERSION: #1 SMP Fri Oct 11 16:41:53 PDT 2013
     MACHINE: x86_64  (2399 Mhz)
      MEMORY: 32 GB
       PANIC: "Kernel panic - not syncing: LBUG"

Two of the three fsx commands were:
./fsx -c 5 /misc/export/nfs_x_1
./fsx /misc/export/nfs_x_2



 Comments   
Comment by James Nunez (Inactive) [ 29/Oct/13 ]

Looking at the log in the crash dump:

crash> log
...
LustreError: 16448:0:(osd_io.c:1065:osd_ldiskfs_write_record()) sda3: error reading offset 1249280 (block 305): rc = -28
LustreError: 16448:0:(llog_osd.c:156:llog_osd_write_blob()) scratch-MDT0000-osd: error writing log record: rc = -28
LustreError: 16448:0:(osd_io.c:1065:osd_ldiskfs_write_record()) sda3: error reading offset 1249280 (block 305): rc = -28
LustreError: 16448:0:(llog_osd.c:156:llog_osd_write_blob()) scratch-MDT0000-osd: error writing log record: rc = -28
LustreError: 16448:0:(osd_internal.h:963:osd_trans_exec_op()) ASSERTION( rb < OSD_OT_MAX ) failed: rb = 12
LustreError: 16448:0:(osd_internal.h:963:osd_trans_exec_op()) LBUG
Pid: 16448, comm: mdt00_005

Call Trace:
 [<ffffffffa04f0895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa04f0e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0d12243>] osd_punch+0x613/0x620 [osd_ldiskfs]
 [<ffffffffa0639050>] llog_osd_write_blob+0x1e0/0x850 [obdclass]
 [<ffffffffa063d04e>] llog_osd_write_rec+0xb5e/0x1370 [obdclass]
 [<ffffffffa0610428>] llog_write_rec+0xc8/0x290 [obdclass]
 [<ffffffffa0619299>] llog_cat_add_rec+0x2b9/0x480 [obdclass]
 [<ffffffffa0610221>] llog_add+0x91/0x1d0 [obdclass]
 [<ffffffffa0f54463>] mdd_changelog_store+0xf3/0x260 [mdd]
 [<ffffffffa0f465c6>] mdd_changelog_data_store+0x1a6/0x280 [mdd]
 [<ffffffffa0f4c9b1>] mdd_attr_set+0xeb1/0x1470 [mdd]
 [<ffffffffa0e1cc28>] mdt_attr_set+0x268/0x560 [mdt]
 [<ffffffffa0e1d4dd>] mdt_reint_setattr+0x5bd/0xcf0 [mdt]
 [<ffffffffa07e2f9e>] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
 [<ffffffffa0e16eb1>] mdt_reint_rec+0x41/0xe0 [mdt]
 [<ffffffffa0dfec93>] mdt_reint_internal+0x4c3/0x780 [mdt]
 [<ffffffffa0dfef94>] mdt_reint+0x44/0xe0 [mdt]
 [<ffffffffa0e01a8a>] mdt_handle_common+0x52a/0x1470 [mdt]
 [<ffffffffa0e3bc55>] mds_regular_handle+0x15/0x20 [mdt]
 [<ffffffffa07f2e25>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
 [<ffffffffa050227f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
 [<ffffffffa07ea4c9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
 [<ffffffffa07f418d>] ptlrpc_main+0xaed/0x1740 [ptlrpc]
 [<ffffffffa07f36a0>] ? ptlrpc_main+0x0/0x1740 [ptlrpc]
 [<ffffffff81096a36>] kthread+0x96/0xa0
 [<ffffffff8100c0ca>] child_rip+0xa/0x20
 [<ffffffff810969a0>] ? kthread+0x0/0xa0
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

Kernel panic - not syncing: LBUG
Pid: 16448, comm: mdt00_005 Not tainted 2.6.32-358.18.1.el6_lustre.x86_64 #1
Call Trace:
 [<ffffffff8150de58>] ? panic+0xa7/0x16f
 [<ffffffffa04f0eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
 [<ffffffffa0d12243>] ? osd_punch+0x613/0x620 [osd_ldiskfs]
 [<ffffffffa0639050>] ? llog_osd_write_blob+0x1e0/0x850 [obdclass]
 [<ffffffffa063d04e>] ? llog_osd_write_rec+0xb5e/0x1370 [obdclass]
 [<ffffffffa0610428>] ? llog_write_rec+0xc8/0x290 [obdclass]
 [<ffffffffa0619299>] ? llog_cat_add_rec+0x2b9/0x480 [obdclass]
 [<ffffffffa0610221>] ? llog_add+0x91/0x1d0 [obdclass]
 [<ffffffffa0f54463>] ? mdd_changelog_store+0xf3/0x260 [mdd]
 [<ffffffffa0f465c6>] ? mdd_changelog_data_store+0x1a6/0x280 [mdd]
 [<ffffffffa0f4c9b1>] ? mdd_attr_set+0xeb1/0x1470 [mdd]
 [<ffffffffa0e1cc28>] ? mdt_attr_set+0x268/0x560 [mdt]
 [<ffffffffa0e1d4dd>] ? mdt_reint_setattr+0x5bd/0xcf0 [mdt]
 [<ffffffffa07e2f9e>] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
 [<ffffffffa0e16eb1>] ? mdt_reint_rec+0x41/0xe0 [mdt]
 [<ffffffffa0dfec93>] ? mdt_reint_internal+0x4c3/0x780 [mdt]
 [<ffffffffa0dfef94>] ? mdt_reint+0x44/0xe0 [mdt]
 [<ffffffffa0e01a8a>] ? mdt_handle_common+0x52a/0x1470 [mdt]
 [<ffffffffa0e3bc55>] ? mds_regular_handle+0x15/0x20 [mdt]
 [<ffffffffa07f2e25>] ? ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
 [<ffffffffa050227f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
 [<ffffffffa07ea4c9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
 [<ffffffffa07f418d>] ? ptlrpc_main+0xaed/0x1740 [ptlrpc]
 [<ffffffffa07f36a0>] ? ptlrpc_main+0x0/0x1740 [ptlrpc]
 [<ffffffff81096a36>] ? kthread+0x96/0xa0
 [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
 [<ffffffff810969a0>] ? kthread+0x0/0xa0
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Comment by Andreas Dilger [ 29/Oct/13 ]

It looks like this is a bug in llog_osd_write_blob() calling dt_punch() when a write fails:

        rc = dt_record_write(env, o, &lgi->lgi_buf, off, th);
        if (rc)
                CERROR("%s: error writing log tail: rc = %d\n",
                       o->do_lu.lo_dev->ld_obd->obd_name, rc);

out_unlock:
        dt_write_unlock(env, o);
out:
        /* cleanup the content written above */
        if (rc) {
                dt_punch(env, o, lgi->lgi_attr.la_size, OBD_OBJECT_EOF, th,
                         BYPASS_CAPA);
                dt_attr_set(env, o, &lgi->lgi_attr, th, BYPASS_CAPA);
        }

There are two separate classes of problems that I can see here: the declare/rollback accounting in the OSD, and the error handling in llog_osd_write_blob() itself (a standalone sketch of the first class follows this list):

  • Because OSD_OT_PUNCH is not declared, osd_trans_exec_op() considers this a rollback operation. The osd_trans_declare_op2rb[] mapping has the rollback operation [OSD_OT_WRITE] = OSD_OT_WRITE instead of OSD_OT_PUNCH.
  • The rollback mapping is [OSD_OT_PUNCH] = OSD_OT_MAX, which means that truncates can never be rolled back; that is what triggers this LASSERT().
  • llog_osd_write_blob() saves the llog size in lgi_attr before the lock is held. That means racing llog writes could invalidate this size before the write is actually done.
  • In llog_osd_write_blob(), dt_write_lock() is dropped before the error handling is done. This means that another thread could start writing to the llog, and the truncate would then erase llog operations that are unrelated to the failure. In this case the failure is due to -ENOSPC on the "non-buf" write, so there is no need to do the truncate at all. The truncate is only needed in case the "buf" is partially written, and should be done while holding dt_write_lock() on the object.
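
To make the first class concrete, below is a minimal standalone sketch of the accounting that trips ASSERTION( rb < OSD_OT_MAX ). The enum values, the table contents, and the credit bookkeeping are illustrative assumptions, not code from osd_internal.h:

/*
 * op2rb_sketch.c - standalone illustration of the LBUG mechanics.
 * Build and run with: cc -o op2rb_sketch op2rb_sketch.c && ./op2rb_sketch
 * All enum values and table entries below are assumptions for
 * illustration, not copied from the real osd_internal.h.
 */
#include <assert.h>
#include <stdio.h>

enum osd_op_type {
        OSD_OT_ATTR_SET,
        OSD_OT_WRITE,
        OSD_OT_PUNCH,
        OSD_OT_MAX,             /* in the report, rb = 12 == OSD_OT_MAX */
};

/* op -> rollback-op mapping as described in the bullets above: an
 * undeclared WRITE maps back to WRITE (arguably it should map to
 * PUNCH), and PUNCH has no rollback entry at all. */
static const enum osd_op_type op2rb[OSD_OT_MAX] = {
        [OSD_OT_WRITE] = OSD_OT_WRITE,
        [OSD_OT_PUNCH] = OSD_OT_MAX,    /* "can never be rolled back" */
};

static int declared[OSD_OT_MAX];        /* credits from dt_declare_*() */

static void trans_exec_op(enum osd_op_type op)
{
        if (declared[op] > 0) {
                declared[op]--;         /* declared op: consume a credit */
                return;
        }
        /* Undeclared op: assume it is a rollback of an earlier op. */
        enum osd_op_type rb = op2rb[op];
        printf("undeclared op %d -> rollback op rb = %d\n", op, rb);
        assert(rb < OSD_OT_MAX);        /* the ASSERTION that LBUGs */
}

int main(void)
{
        declared[OSD_OT_WRITE] = 1;     /* llog record write was declared */
        trans_exec_op(OSD_OT_WRITE);    /* the write consumes the credit */
        /* The -ENOSPC cleanup path calls dt_punch() with no matching
         * declaration, so the punch arrives undeclared: */
        trans_exec_op(OSD_OT_PUNCH);    /* rb == OSD_OT_MAX: assert fires */
        return 0;
}

As the last bullet says, the cleaner fix is to avoid the undeclared punch in the first place: truncate only when the record body was partially written, and do it while still holding dt_write_lock().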
Comment by Oleg Drokin [ 25/Jan/14 ]

I hit this too today, also due to ENOSPC, apparently. NFS was not involved (mainline kernel Lustre client).
It was a sanity.sh test run:

<4>[ 5963.714103] Lustre: DEBUG MARKER: == sanity test 103: acl test ========================================================================= 00:14:02 (1390626842)
<0>[ 5964.573079] LustreError: 21998:0:(osd_internal.h:959:osd_trans_exec_op()) ASSERTION( rb < OSD_OT_MAX ) failed: rb = 12
<0>[ 5964.573451] LustreError: 21998:0:(osd_internal.h:959:osd_trans_exec_op()) LBUG
<4>[ 5964.573755] Pid: 21998, comm: mdt00_003
<4>[ 5964.573910] 
<4>[ 5964.573910] Call Trace:
<4>[ 5964.574184]  [<ffffffffa041a8a5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4>[ 5964.574386]  [<ffffffffa041aea7>] lbug_with_loc+0x47/0xb0 [libcfs]
<4>[ 5964.574580]  [<ffffffffa0978ad3>] osd_punch+0x613/0x620 [osd_ldiskfs]
<4>[ 5964.574787]  [<ffffffffa0577d80>] llog_osd_write_blob+0x1e0/0x850 [obdclass]
<4>[ 5964.574993]  [<ffffffffa057bdad>] llog_osd_write_rec+0xadd/0x13a0 [obdclass]
<4>[ 5964.575191]  [<ffffffffa0365412>] ? do_get_write_access+0x3d2/0x550 [jbd2]
<4>[ 5964.575407]  [<ffffffffa056a4f8>] llog_write_rec+0xc8/0x290 [obdclass]
<4>[ 5964.575602]  [<ffffffffa05729e9>] llog_cat_add_rec+0x2b9/0x480 [obdclass]
<4>[ 5964.575800]  [<ffffffffa056a2f1>] llog_add+0x91/0x1d0 [obdclass]
<4>[ 5964.575984]  [<ffffffffa0e8c946>] osp_sync_add_rec+0x246/0xa10 [osp]
<4>[ 5964.576173]  [<ffffffffa0cdb8e8>] ? __ldiskfs_journal_stop+0x68/0xa0 [ldiskfs]
<4>[ 5964.576482]  [<ffffffffa0e8d1bb>] osp_sync_add+0x7b/0x80 [osp]
<4>[ 5964.576679]  [<ffffffffa0e4fe20>] ? lod_xattr_set+0xc0/0x7e0 [lod]
<4>[ 5964.576876]  [<ffffffffa0e7d493>] osp_attr_set+0x193/0x1d0 [osp]
<4>[ 5964.577057]  [<ffffffffa0e4c837>] lod_attr_set+0x247/0x450 [lod]
<4>[ 5964.577241]  [<ffffffffa0d410d8>] mdd_acl_set+0x668/0x8d0 [mdd]
<4>[ 5964.577850]  [<ffffffffa0d28372>] mdd_xattr_set+0x192/0xbd0 [mdd]
<4>[ 5964.578059]  [<ffffffffa07f0304>] ? lustre_msg_get_versions+0xa4/0x120 [ptlrpc]
<4>[ 5964.578372]  [<ffffffffa0d88dca>] ? mdt_version_save+0x8a/0x1a0 [mdt]
<4>[ 5964.578561]  [<ffffffffa0d8ffaf>] mdt_reint_setxattr+0x73f/0x1850 [mdt]
<4>[ 5964.578763]  [<ffffffffa05dbc20>] ? lu_ucred+0x20/0x30 [obdclass]
<4>[ 5964.578947]  [<ffffffffa0d6a405>] ? mdt_ucred+0x15/0x20 [mdt]
<4>[ 5964.579127]  [<ffffffffa0d8457c>] ? mdt_root_squash+0x2c/0x410 [mdt]
<4>[ 5964.579317]  [<ffffffffa0d883b1>] mdt_reint_rec+0x41/0xe0 [mdt]
<4>[ 5964.579499]  [<ffffffffa0d6ed33>] mdt_reint_internal+0x4c3/0x780 [mdt]
<4>[ 5964.579687]  [<ffffffffa0d6f57b>] mdt_reint+0x6b/0x120 [mdt]
<4>[ 5964.579883]  [<ffffffffa084ebec>] tgt_request_handle+0x23c/0xac0 [ptlrpc]
<4>[ 5964.580089]  [<ffffffffa07ffb18>] ptlrpc_main+0xcc8/0x1940 [ptlrpc]
<4>[ 5964.580288]  [<ffffffffa07fee50>] ? ptlrpc_main+0x0/0x1940 [ptlrpc]
<4>[ 5964.580476]  [<ffffffff81094726>] kthread+0x96/0xa0
<4>[ 5964.580657]  [<ffffffff8100c10a>] child_rip+0xa/0x20
<4>[ 5964.580837]  [<ffffffff81094690>] ? kthread+0x0/0xa0
<4>[ 5964.581002]  [<ffffffff8100c100>] ? child_rip+0x0/0x20
<4>[ 5964.581168] 
<0>[ 5964.581889] Kernel panic - not syncing: LBUG
Comment by Andreas Dilger [ 17/Apr/14 ]

This will likely be fixed with LU-4901.

Comment by Andreas Dilger [ 22/Apr/14 ]

The patch is actually being developed under LU-4528 instead of LU-4901.

Comment by Jodi Levi (Inactive) [ 03/Jun/14 ]

Reopening to remove the fix version, as this is a duplicate.
