[LU-7332] LustreError: 201113:0:(osd_internal.h:1101:osd_trans_exec_check()) LBUG Created: 23/Oct/15  Updated: 18/May/16  Resolved: 26/Oct/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug
Priority: Critical
Reporter: Vinayak (Inactive)
Assignee: WC Triage
Resolution: Duplicate
Votes: 0
Labels: None

Attachments: sanity51eLBUG.tar
Issue Links:
Related
is related to LU-5770 wrong tx credit calculations in mdd_d... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We are hitting this issue very frequently while running sanity test 51e.

I am attaching the logs to this ticket.
Logs from node 51e.windu03.log.windu00:

LustreError: 201113:0:(osd_internal.h:1101:osd_trans_exec_check()) LBUG
 Pid: 201113, comm: mdt03_000
 
 Call Trace:
  libcfs_debug_dumpstack+0x55/0x80 [libcfs]
  lbug_with_loc+0x47/0xb0 [libcfs]
  osd_xattr_set+0x5d8/0x6c0 [osd_ldiskfs]
  ? ldiskfs_xattr_inode_get+0xdb/0xf0 [ldiskfs]
  lod_sub_object_xattr_set+0x223/0x460 [lod]
  lod_xattr_set_internal+0x126/0x2b0 [lod]
  lod_xattr_set+0x101/0x430 [lod]
  ? mdd_env_info+0x25/0x70 [mdd]
  mdd_links_write+0x235/0x2e0 [mdd]
  mdd_links_rename+0x312/0x620 [mdd]
  mdd_link+0x104c/0x10f0 [mdd]
  mdt_reint_link+0x9b1/0xb40 [mdt]
  ? mdt_root_squash+0x2c/0x3f0 [mdt]
  ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
  mdt_reint_rec+0x5d/0x200 [mdt]
  mdt_reint_internal+0x62b/0xb80 [mdt]
  mdt_reint+0x6b/0x120 [mdt]
  tgt_request_handle+0x8bc/0x12e0 [ptlrpc]
  ptlrpc_main+0xe41/0x1910 [ptlrpc]
  ? ptlrpc_main+0x0/0x1910 [ptlrpc]
  kthread+0x96/0xa0
  child_rip+0xa/0x20
  ? kthread+0x0/0xa0
  ? child_rip+0x0/0x20


 Comments   
Comment by Vinayak (Inactive) [ 23/Oct/15 ]

Hello Andreas,

Initially we thought this issue was closely related to LU-6969, but we are still hitting it even after the LU-6969 patch was merged.

We found this issue on the latest Intel master. Please also help me correct the Affects Version field.

Comment by Peter Jones [ 23/Oct/15 ]

I am assuming that by "Latest Intel master" you mean the tip of the community tree master.

Comment by Vinayak (Inactive) [ 23/Oct/15 ]

Yes Peter. I meant the same.

Thanks,

Comment by Alex Zhuravlev [ 23/Oct/15 ]

This is because of the huge LINKEA. Please try http://review.whamcloud.com/#/c/12412/

Comment by Vinayak (Inactive) [ 23/Oct/15 ]

Thanks for pointing me to the solution, Alex. I will try it and let you know.

Comment by Andreas Dilger [ 23/Oct/15 ]

Please reply if that patch fixes your problem, so that we can prioritize landing it.

Comment by Vinayak (Inactive) [ 24/Oct/15 ]

Hello Andreas, Alex,

We have tried the patch, and sanity test_51e passes in the initial run on a 4-node setup (2 clients, 1 MDS, 1 OSS). We have submitted the test for a multi-run on the same setup and have also asked our testing team to verify the fix in the environment (a 10+ node production environment) where the issue is reproducible. I will update you as soon as I hear back from our testing team.

Thanks,

Comment by Vinayak (Inactive) [ 26/Oct/15 ]

The multi-run passed all 100 test instances.

Comment by Peter Jones [ 26/Oct/15 ]

Thanks! We'll close this out as a duplicate of LU-5770, then.

Comment by Vinayak (Inactive) [ 27/Oct/15 ]

Sure, Peter. I will keep you posted on any updates.

Thanks,
