[LU-12808] jobid_get_from_cache softlockup. Created: 25/Sep/19  Updated: 20/Apr/20  Resolved: 20/Apr/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.2
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Mahmoud Hanafi Assignee: Wang Shilong (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-12225 jobid_get_from_cache stalled on spin_... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

2.12.2 clients are starting to see soft lockups. Is this a duplicate of LU-12225?

 

[446307.776312] CPU: 14 PID: 32407 Comm: ggfs Tainted: G           OEL     4.12.14-95.19.1.20190617-nasa #1 SLE12-SP4 (unreleased)
[446307.788116] Hardware name: SGI.COM ICE-XIP113/X9DRT-Dakota, BIOS DA0E2016 02/01/2016
[446307.796279] task: ffff91f515800540 task.stack: ffff9d53c6e7c000
[446307.802625] RIP: 0010:native_queued_spin_lock_slowpath+0xda/0x1d0
[446307.809150] RSP: 0018:ffff9d53c6e7f710 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[446307.817142] RAX: 0000000000000000 RBX: ffff91f59c9acf00 RCX: 00000000003c0000
[446307.824699] RDX: ffff91f59f123a00 RSI: ffffffff89541a20 RDI: ffff91f59c9acf18
[446307.832256] RBP: ffff91f59c9acf18 R08: 0000000000000044 R09: 0000000000000000
[446307.839814] R10: 00003ffffffff000 R11: ffff91f591996708 R12: 0000000000000000
[446307.847373] R13: 0000000000000020 R14: ffff91ee5f54dcd0 R15: ffff91f51bb9f480
[446307.854939] FS:  00002aaaaab08000(0000) GS:ffff91f59f100000(0000) knlGS:0000000000000000
[446307.863459] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[446307.869637] CR2: 0000000015789018 CR3: 0000000adbd36006 CR4: 00000000001606e0
[446307.877196] Call Trace:
[446307.880082]  queued_spin_lock_slowpath+0x7/0xa
[446307.884980]  jobid_get_from_cache+0x1dc/0x610 [obdclass]
[446307.890739]  lustre_get_jobid+0x227/0x280 [obdclass]
[446307.896155]  ptlrpc_set_add_req+0x7e/0x100 [ptlrpc]
[446307.901482]  ptlrpc_queue_wait+0x70/0x210 [ptlrpc]
[446307.906726]  ldlm_cli_enqueue+0x3a1/0x8c0 [ptlrpc]
[446307.911969]  ? ldlm_expired_completion_wait+0x200/0x200 [ptlrpc]
[446307.918416]  ? ll_md_need_convert+0x160/0x160 [lustre]
[446307.923991]  ? mdc_changelog_cdev_finish+0x4e0/0x4e0 [mdc]
[446307.929911]  mdc_enqueue_base+0x33d/0x19f0 [mdc]
[446307.934962]  ? __switch_to_asm+0x40/0x70
[446307.939324]  mdc_intent_lock+0x126/0x4c0 [mdc]
[446307.944207]  ? ll_md_need_convert+0x160/0x160 [lustre]
[446307.949790]  ? ldlm_expired_completion_wait+0x200/0x200 [ptlrpc]
[446307.956225]  ? mdc_changelog_cdev_finish+0x4e0/0x4e0 [mdc]
[446307.962143]  lmv_intent_lock+0x3be/0x970 [lmv]
[446307.967021]  ? kmem_cache_alloc_trace+0x19e/0x1c0
[446307.972169]  ? ll_i2suppgid+0x11/0x30 [lustre]
[446307.977056]  ? ll_md_need_convert+0x160/0x160 [lustre]
[446307.982628]  ? ll_prep_md_op_data+0x1d0/0x4d0 [lustre]
[446307.988201]  ll_lookup_it+0x24e/0xa80 [lustre]
[446307.993081]  ll_lookup_nd+0x90/0x160 [lustre]
[446307.997865]  lookup_slow+0x91/0x130
[446308.001783]  walk_component+0x18f/0x210
[446308.006056]  path_lookupat+0x69/0x200
[446308.010154]  filename_lookup+0x9c/0x150
[446308.014420]  ? page_add_new_anon_rmap+0x72/0xc0
[446308.019384]  ? getname_flags+0x6c/0x1f0
[446308.023650]  vfs_statx+0x64/0xc0
[446308.027317]  SYSC_newstat+0x26/0x40
[446308.031243]  ? __do_page_fault+0x213/0x4c0
[446308.035775]  do_syscall_64+0x74/0x160
[446308.039876]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
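One quick way to approach the duplicate question is to find the first reliable frame above the spinlock helpers in the call trace. Here that frame is `jobid_get_from_cache`, the same function named in the LU-12225 title. The Python below is a minimal sketch of that triage step. It assumes the trace has been saved as plain text lines. `first_non_lock_frame` and the `LOCK_FRAMES` set are hypothetical helpers, not part of any Lustre or kernel tooling. A matching function is only a first-pass hint that the report is a duplicate, not proof of the same root cause.

```python
import re

# Matches a call-trace frame such as
# "[446307.884980]  jobid_get_from_cache+0x1dc/0x610 [obdclass]".
# A leading "? " marks frames the unwinder is unsure about.
FRAME_RE = re.compile(
    r"^\[\s*[\d.]+\]\s+(?P<unsure>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)

# Hypothetical list of frames that belong to the spinlock implementation
# itself rather than to the code that is waiting on the lock.
LOCK_FRAMES = {"native_queued_spin_lock_slowpath", "queued_spin_lock_slowpath"}


def first_non_lock_frame(trace_lines):
    """Return (function, module) for the first reliable frame that is not a
    spinlock helper, or None if no such frame is found."""
    for line in trace_lines:
        m = FRAME_RE.match(line.strip())
        if m is None or m.group("unsure"):
            continue  # not a frame line, or an unreliable "?" frame
        func = m.group("func")
        if func in LOCK_FRAMES:
            continue  # skip the lock helpers to reach their caller
        return func, m.group("mod")  # module is None for core-kernel frames
    return None
```

Fed the `Call Trace:` lines above, this skips `queued_spin_lock_slowpath` and returns `('jobid_get_from_cache', 'obdclass')`. Register dump lines such as `RIP:` and `RSP:` do not match the frame pattern, so the whole log can be passed in unchanged.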

 


 Comments   
Comment by Peter Jones [ 25/Sep/19 ]

Shilong

Do you think that this is a duplicate of LU-12225?

Peter

Comment by Wang Shilong (Inactive) [ 26/Sep/19 ]

This looks like a duplicate of LU-12225. The patch was ported to the b2_12 branch after the 2.12.2 release, but no 2.12.3 release has been tagged yet.
I'd suggest double-checking whether the repo your clients are built from includes this fix. If it does not, you will need to upgrade to the latest b2_12 branch to pick it up.

Thanks,
Shilong
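The check suggested above, whether the client source tree already carries the LU-12225 backport, can be scripted. This is a minimal sketch. It assumes a local git clone of the Lustre source with `git` on PATH. It also assumes the backport keeps the ticket ID in its commit subject, which is the usual Lustre convention but should be verified against the actual commit. The helper names are hypothetical.

```python
import re
import subprocess


def subjects_mentioning(subjects, ticket):
    """Return the commit subjects that mention `ticket` as a whole token,
    so that e.g. "LU-122250" is not mistaken for "LU-12225"."""
    pattern = re.compile(r"\b" + re.escape(ticket) + r"\b")
    return [s for s in subjects if pattern.search(s)]


def branch_has_fix(repo_dir, ticket, ref="HEAD"):
    """Return True if any commit reachable from `ref` in `repo_dir` has a
    subject line mentioning `ticket`. Requires `git` on PATH."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "--format=%s", ref],
        capture_output=True, text=True, check=True,
    )
    return bool(subjects_mentioning(result.stdout.splitlines(), ticket))
```

For example, `branch_has_fix("/path/to/lustre-release", "LU-12225", "origin/b2_12")` checks the b2_12 branch of a clone at a hypothetical path. Filtering in Python with a word-boundary match, rather than relying on a plain substring search, avoids false positives from longer ticket numbers that share the same prefix.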

Comment by Wang Shilong (Inactive) [ 20/Apr/20 ]

Feel free to reopen if the problem persists.

Generated at Sat Feb 10 02:55:49 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.