Details
- Bug
- Resolution: Unresolved
- Minor
Description
[ 5302.557213] Lustre: DEBUG MARKER: == sanity test 205g: stress test for job_stats procfile == 00:21:32 (1735777292)
[ 5393.581798] LustreError: 303135:0:(lprocfs_jobstats.c:133:job_putref()) ASSERTION( kref_read(&job->js_refcount) > 0 ) failed:
[ 5393.581997] LustreError: 303135:0:(lprocfs_jobstats.c:133:job_putref()) LBUG
[ 5393.582044] CPU: 1 PID: 303135 Comm: lctl Tainted: G W O --------- - - 4.18.0 #11
[ 5393.582084] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014
[ 5393.582124] Call Trace:
[ 5393.582161] dump_stack+0x6e/0xa0
[ 5393.582189] lbug_with_loc.cold.4+0x5/0x63 [libcfs]
[ 5393.582221] job_putref+0xa6/0xe0 [obdclass]
[ 5393.582297] lprocfs_jobstats_seq_show+0x2d1/0x520 [obdclass]
[ 5393.582374] seq_read+0x2c8/0x3e0
[ 5393.582398] proc_reg_read+0x31/0x50
[ 5393.582421] vfs_read+0xa1/0x150
[ 5393.582441] ksys_read+0x3d/0xa0
[ 5393.582462] do_syscall_64+0x4b/0x1b0
[ 5393.582483] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 5393.582511] RIP: 0033:0x7fe718b459b2
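In lprocfs_job_cleanup(), the reference on an expired job can be put, but the job is dropped from the LRU at a separate point. Until that happens, the same job can be treated as expired again and put a second time, which trips the js_refcount assertion in job_putref().
A job should be expired only once. However, it is desirable to avoid spinlocks in this tight loop.
Instead, add a status flag that marks the job as expired, so that its reference is put only once and the double put is avoided.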
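The idea can be illustrated with a minimal userspace sketch, which is not the actual Lustre patch. It assumes C11 atomics in place of the kernel's kref and bit operations, and every name in it (struct job_sketch, job_sketch_put(), job_sketch_expire(), the expired flag) is hypothetical. The flag is set with an atomic exchange, so only the first caller that expires a job drops the reference, and no spinlock is needed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a job stats entry; not the real Lustre struct. */
struct job_sketch {
	atomic_int  refcount;  /* plays the role of js_refcount */
	atomic_bool expired;   /* hypothetical "already expired" status flag */
	int         freed;     /* set to 1 when the last reference is dropped */
};

static void job_sketch_init(struct job_sketch *job)
{
	atomic_init(&job->refcount, 1);   /* the reference held on behalf of the LRU */
	atomic_init(&job->expired, false);
	job->freed = 0;
}

/* Drop one reference; mirrors the "refcount > 0" assertion from the trace. */
static void job_sketch_put(struct job_sketch *job)
{
	int old = atomic_fetch_sub(&job->refcount, 1);

	assert(old > 0);
	if (old == 1)
		job->freed = 1;   /* a real implementation would free the job here */
}

/*
 * Expire a job at most once. atomic_exchange() returns the previous flag
 * value, so exactly one caller observes "false" and performs the put.
 * Returns true if this call expired the job, false if it already was.
 */
static bool job_sketch_expire(struct job_sketch *job)
{
	if (atomic_exchange(&job->expired, true))
		return false;     /* someone else already expired it: skip the put */

	job_sketch_put(job);
	return true;
}
```

Calling job_sketch_expire() twice on the same job drops the reference only on the first call. Without the flag, the second call would reach job_sketch_put() with a zero refcount and hit the assertion, which is the failure mode shown in the trace above.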
Issue Links
- is related to LU-18351 Job stats scaling (Resolved)