[LU-7391] (osp_md_object.c:1155:osp_md_write()) ASSERTION( obj->opo_ooa->ooa_attr.la_valid & LA_SIZE ) failed Created: 05/Nov/15  Updated: 05/Nov/15  Resolved: 05/Nov/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Blocker
Reporter: Frank Heckes (Inactive) Assignee: Di Wang
Resolution: Duplicate Votes: 0
Labels: soak
Environment:

lola
build: 2.7.62-28-g0754bc8, 0754bc8f2623bea184111af216f7567608db35b6; soakbuild '20151104.1'


Attachments: File console-lola-9.log.bz2     File lustre-log.1446686181.6493.bz2     File messages-lola-9.log.bz2     File soak.log.bz2    
Issue Links:
Duplicate
duplicates LU-7039 llog_osd.c:778:llog_osd_next_block())... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Error occurred during soak testing of build '20151104.1' on cluster lola (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20151104.1). MDTs are formatted with ldiskfs and OSTs with zfs as the storage backend. DNE is enabled. MDSes are configured in an HA failover configuration:

  • lola-8,9
    • mdt0/mgs, mdt1 primary node lola-8
    • mdt2,mdt3 primary node lola-9
  • lola-10,11
    • mdt4/mdt5 primary node lola-10
    • mdt6,mdt7 primary node lola-11
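
The failover layout above can be written down as a small lookup table, which may help when mapping a hostname in a log line back to the MDTs that node normally serves. This is only a sketch: the names PRIMARY_MDTS, FAILOVER_PAIRS, primary_node_for and failover_partner are hypothetical and not part of any Lustre tooling; the data is copied from the list above.

```python
# Primary-node assignments as listed in the ticket.
# All names here are illustrative, not Lustre tooling.
PRIMARY_MDTS = {
    "lola-8": ["mdt0", "mdt1"],    # mdt0 is co-located with the MGS
    "lola-9": ["mdt2", "mdt3"],
    "lola-10": ["mdt4", "mdt5"],
    "lola-11": ["mdt6", "mdt7"],
}

# HA failover pairs as listed in the ticket.
FAILOVER_PAIRS = [("lola-8", "lola-9"), ("lola-10", "lola-11")]


def primary_node_for(mdt):
    """Return the primary node for an MDT name such as 'mdt3'."""
    for node, mdts in PRIMARY_MDTS.items():
        if mdt in mdts:
            return node
    raise KeyError(mdt)


def failover_partner(node):
    """Return the other node of the failover pair that contains `node`."""
    for first, second in FAILOVER_PAIRS:
        if node == first:
            return second
        if node == second:
            return first
    raise KeyError(node)
```

For example, primary_node_for("mdt7") gives the node that was restarted in the event sequence, and failover_partner("lola-9") gives the other node of that pair.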

Event sequence:

  • 2015-11-04 17:15:28 restart of lola-11 finished
  • The following error occurred on lola-11:
    lola-11.log:Nov  4 17:16:17 lola-11 kernel: LustreError: 5104:0:(llog.c:581:llog_process_thread()) soaked-MDT0007-osp-MDT0006 retry remote llog process
    

    INTL-156

    lola-11.log:Nov  4 17:16:18 lola-11 kernel: LustreError: 4984:0:(ldlm_lib.c:1883:check_for_next_transno()) soaked-MDT0007: waking for gap in transno, VBR is OFF (skip: 558345901253, ql: 5, comp: 11, conn: 16, next: 558345901263, next_update 0 last_committed: 558345900230)
    
  • lola-9 hit LBUG
    Error message reads as:
    Nov  4 17:16:21 lola-9 kernel: LustreError: 6493:0:(osp_md_object.c:1155:osp_md_write()) ASSERTION( obj->opo_ooa->ooa_attr.la_valid & LA_SIZE ) failed:
    Nov  4 17:16:21 lola-9 kernel: LustreError: 6493:0:(osp_md_object.c:1155:osp_md_write()) LBUG
    Nov  4 17:16:21 lola-9 kernel: Pid: 6493, comm: mdt00_005
    Nov  4 17:16:21 lola-9 kernel:
    Nov  4 17:16:21 lola-9 kernel: Call Trace:
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa07d2875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa07d2e77>] lbug_with_loc+0x47/0xb0 [libcfs]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa14518cd>] osp_md_write+0x42d/0x4e0 [osp]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa092509d>] dt_record_write+0x3d/0x130 [obdclass]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa08e3398>] llog_osd_write_rec+0x768/0x1c50 [obdclass]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa08d1416>] llog_write_rec+0xb6/0x270 [obdclass]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa08da7e3>] llog_cat_add_rec+0x1c3/0x7b0 [obdclass]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa08d1229>] llog_add+0x89/0x1c0 [obdclass]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa0bca124>] sub_updates_write+0x1b4/0x14a0 [ptlrpc]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa0bcbe24>] top_trans_stop+0xa14/0xe30 [ptlrpc]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa136042e>] ? lod_attr_set+0x12e/0xaa0 [lod]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa0940800>] ? lu_ucred+0x20/0x30 [obdclass]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa134391c>] lod_trans_stop+0x2bc/0x330 [lod]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa13edb8a>] mdd_trans_stop+0x1a/0xac [mdd]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa13dcf3a>] mdd_create+0x12ea/0x1600 [mdd]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa128c7a4>] ? mdt_version_save+0x84/0x1a0 [mdt]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa128ef66>] mdt_reint_create+0xbb6/0xcc0 [mdt]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa0940800>] ? lu_ucred+0x20/0x30 [obdclass]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa126e675>] ? mdt_ucred+0x15/0x20 [mdt]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa128785c>] ? mdt_root_squash+0x2c/0x3f0 [mdt]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa0b73cf2>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa128b99d>] mdt_reint_rec+0x5d/0x200 [mdt]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa127777b>] mdt_reint_internal+0x62b/0xb80 [mdt]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa127816b>] mdt_reint+0x6b/0x120 [mdt]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa0bb60ec>] tgt_request_handle+0x8bc/0x12e0 [ptlrpc]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa0b5d9e1>] ptlrpc_main+0xe41/0x1910 [ptlrpc]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffff8152a39e>] ? thread_return+0x4e/0x7d0
    Nov  4 17:16:21 lola-9 kernel: [<ffffffffa0b5cba0>] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
    Nov  4 17:16:21 lola-9 kernel: [<ffffffff8109e78e>] kthread+0x9e/0xc0
    Nov  4 17:16:21 lola-9 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
    Nov  4 17:16:21 lola-9 kernel: [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
    Nov  4 17:16:21 lola-9 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
    Nov  4 17:16:21 lola-9 kernel:
    Nov  4 17:16:21 lola-9 kernel: LustreError: dumping log to /tmp/lustre-log.1446686181.6493
    

    Attached soak.log, debug log file, messages and console log of node lola-9.
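
When triaging a trace like the one above, it can help to reduce it to the chain of functions and modules. A rough Python sketch follows, assuming the syslog frame format shown in this ticket; FRAME_RE and parse_frames are illustrative names, and the sample lines are copied from the trace above.

```python
import re

# Matches kernel stack frames such as:
#   [<ffffffffa14518cd>] osp_md_write+0x42d/0x4e0 [osp]
# A leading "?" marks a frame the kernel could not verify as part of the
# call chain. Frames without a trailing [module] belong to the core kernel.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<uncertain>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(lines, include_uncertain=False):
    """Return (function, module) pairs in the order they appear."""
    frames = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match is None:
            continue  # not a stack frame line
        if match.group("uncertain") and not include_uncertain:
            continue  # skip "?" frames unless explicitly requested
        frames.append((match.group("func"), match.group("module")))
    return frames


sample = [
    "Nov  4 17:16:21 lola-9 kernel: Call Trace:",
    "Nov  4 17:16:21 lola-9 kernel: [<ffffffffa07d2e77>] lbug_with_loc+0x47/0xb0 [libcfs]",
    "Nov  4 17:16:21 lola-9 kernel: [<ffffffffa14518cd>] osp_md_write+0x42d/0x4e0 [osp]",
    "Nov  4 17:16:21 lola-9 kernel: [<ffffffffa136042e>] ? lod_attr_set+0x12e/0xaa0 [lod]",
    "Nov  4 17:16:21 lola-9 kernel: [<ffffffff8109e78e>] kthread+0x9e/0xc0",
]

for func, module in parse_frames(sample):
    print(func, module or "kernel")
```

Dropping the "?" frames by default keeps only the frames the kernel reported as reliable, which for this ticket leaves the path from mdt_reint through mdd_create and llog_add down to osp_md_write.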



 Comments   
Comment by Frank Heckes (Inactive) [ 05/Nov/15 ]

Problem might be related to LU-7039, due to the most recent patches applied to solve that issue.

Comment by Di Wang [ 05/Nov/15 ]

Yes, this was brought in by the patch for LU-7039, and I will fix it in LU-7039, so this can be marked as a duplicate of LU-7039.

Comment by Joseph Gmitter (Inactive) [ 05/Nov/15 ]

This was just discussed again on the triage call, and Oleg prefers the fix to occur in the new ticket, since the patch for LU-7039 has already landed.

Comment by Joseph Gmitter (Inactive) [ 05/Nov/15 ]

In discussing with Di and Oleg, this bug should be closed as a dup of LU-7039. Di reports that the bug is brought in by LU-703 in a large patch yet to be landed. The fix will occur there.
http://review.whamcloud.com/#/c/16969/

Generated at Sat Feb 10 02:08:30 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.