[LU-16251] Fill jobid in an atomic way Created: 19/Oct/22 Updated: 05/Jan/23 Resolved: 08/Nov/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Feng Lei | Assignee: | Feng Lei |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
As per DDN-3114: jobid_interpret_string() is not an atomic operation. It is possible for two processes to hit the same file+offset (and therefore the same request) but with different jobid schemes. As a result, jobid_interpret_string() is re-entered and generates a "combined" jobid. |
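The ticket title describes the direction of the fix: build the jobid atomically rather than formatting it piecewise into the shared destination. As a rough userspace sketch of that idea only (not the actual kernel patch, which is in the Gerrit change linked below), the jobid can be assembled in a private buffer and published with a single copy; fill_jobid_atomic() and LUSTRE_JOBID_SIZE are hypothetical names invented for this example.

/* Sketch: format the whole jobid locally, then publish it in one step. */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define LUSTRE_JOBID_SIZE 32	/* hypothetical buffer size for the example */

static void fill_jobid_atomic(char *shared_jobid, const char *name, uid_t uid)
{
	char tmp[LUSTRE_JOBID_SIZE] = "";

	/* Build the complete "name.uid" string in a private buffer ... */
	snprintf(tmp, sizeof(tmp), "%s.%u", name, uid);

	/* ... then publish it with a single copy instead of piecewise writes,
	 * so a concurrent caller never sees a half-written jobid. */
	memcpy(shared_jobid, tmp, sizeof(tmp));
}

int main(void)
{
	char jobid[LUSTRE_JOBID_SIZE];

	fill_jobid_atomic(jobid, "dd", getuid());
	printf("jobid: %s\n", jobid);
	return 0;
}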
| Comments |
| Comment by Gerrit Updater [ 19/Oct/22 ] |
|
"Feng Lei <flei@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48915 |
| Comment by Feng Lei [ 27/Oct/22 ] |
|
The bug is reproduced after inserting an msleep(1) in jobid_interpret_string(). Here is the patch to reproduce the bug:

diff --git a/lustre/obdclass/jobid.c b/lustre/obdclass/jobid.c
index 1b4bdd3168..b9b33044f8 100644
--- a/lustre/obdclass/jobid.c
+++ b/lustre/obdclass/jobid.c
@@ -32,6 +32,7 @@
*/
#define DEBUG_SUBSYSTEM S_RPC
+#include <linux/delay.h>
#include <linux/user_namespace.h>
#include <linux/uidgid.h>
#include <linux/utsname.h>
@@ -696,6 +697,7 @@ static int jobid_interpret_string(const char *jobfmt, char *jobid,
l = snprintf(jobid, joblen, "%u", current->pid);
break;
case 'u': /* user ID */
+ msleep(1);
l = snprintf(jobid, joblen, "%u",
from_kuid(&init_user_ns, current_fsuid()));
break;
diff --git a/lustre/tests/sanity.sh b/lustre/tests/sanity.sh
index 437d86aa96..bf47268373 100755
--- a/lustre/tests/sanity.sh
+++ b/lustre/tests/sanity.sh
@@ -28908,6 +28908,27 @@ test_906() {
}
run_test 906 "Simple test for io_uring I/O engine via fio"
+
+test_1001() {
+ local file=$DIR/$tdir/$tfile
+
+ test_mkdir $DIR/$tdir
+
+ for i in {1..2}; do
+ echo "start worker $i..."
+ dd if=/dev/zero of=$file bs=1M count=100 &
+ done
+
+ echo "wait for all workers done..."
+ wait
+
+ $LCTL get_param *.*.job_stats | grep job_id | grep 'dd\.$'
+ if (( $? == 0)); then
+ error "found corrupted jobid"
+ fi
+}
+run_test 1001 "test corrupted jobid"
+
complete $SECONDS
[ -f $EXT2_DEV ] && rm $EXT2_DEV || true
check_and_cleanup_lustre
The test result:

== sanity test 1001: test corrupted jobid ================ 09:20:20 (1666833620)
start worker 1...
start worker 2...
wait for all workers done...
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 1.63228 s, 64.2 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 1.65471 s, 63.4 MB/s
- job_id: dd.
 sanity test_1001: @@@@@@ FAIL: found corrupted jobid
  Trace dump:
  = ./../tests/test-framework.sh:6524:error()
  = sanity.sh:29016:test_1001()
  = ./../tests/test-framework.sh:6860:run_one()
  = ./../tests/test-framework.sh:6910:run_one_logged()
  = ./../tests/test-framework.sh:6732:run_test()
  = sanity.sh:29019:main()
Dumping lctl log to /tmp/test_logs/1666833512/sanity.test_1001.*.1666833622.log
FAIL 1001 (3s)

In this case we generate a corrupted jobid "dd.", without the uid, because there is a delay before the uid is appended to the jobid. It should also be possible to generate other kinds of corruption by inserting the delay at other points and triggering it with other settings. |
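The same interleaving can be illustrated without a Lustre build tree by an ordinary userspace analogue: two threads format a jobid piecewise into one shared buffer, with a short sleep standing in for the injected msleep(1). This is only a sketch of the race, not Lustre code; every name in it is invented for the example.

/* Two threads racing on a shared, piecewise-filled jobid buffer. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static char shared_jobid[32];		/* shared destination, as in the bug */

struct job { const char *name; unsigned int uid; };

static void *fill_jobid(void *arg)
{
	struct job *j = arg;
	int len;

	/* First piece: the command name, written straight into the shared buffer. */
	len = snprintf(shared_jobid, sizeof(shared_jobid), "%s.", j->name);

	usleep(1000);			/* stand-in for the injected msleep(1) */

	/* Second piece: the uid, appended only after the delay. */
	snprintf(shared_jobid + len, sizeof(shared_jobid) - len, "%u", j->uid);
	return NULL;
}

int main(void)
{
	struct job a = { "dd", 1000 }, b = { "tar", 2000 };
	pthread_t t1, t2;

	pthread_create(&t1, NULL, fill_jobid, &a);
	pthread_create(&t2, NULL, fill_jobid, &b);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);

	/* Typically prints a mixed value such as "tar.1000" instead of either
	 * valid jobid, mirroring the corrupted "dd." seen above. */
	printf("resulting jobid: %s\n", shared_jobid);
	return 0;
}

Built with cc -pthread, a few runs are usually enough to observe a combined or truncated jobid, matching the behaviour of the msleep(1) reproducer.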
| Comment by Gerrit Updater [ 08/Nov/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48915/ |
| Comment by Peter Jones [ 08/Nov/22 ] |
|
Landed for 2.16 |
| Comment by James Beal [ 10/Nov/22 ] |
|
I have just had a quick look at this patch, given that I will be talking about this in a few weeks' time.
Given:
- job_id: kworker/u518:6eamtrynka
snapshot_time: 1668009798
read_bytes: { samples: 1, unit: bytes, min: 663552, max: 663552, sum: 663552 }
write_bytes: { samples: 0, unit: bytes, min: 0, max: 0, sum: 0 }
- job_id: kworker/u130:0
snapshot_time: 1668009796
read_bytes: { samples: 21459, unit: bytes, min: 4096, max: 2297856, sum: 160104448 }
write_bytes: { samples: 0, unit: bytes, min: 0, max: 0, sum: 0 }
- job_id: thread-pool-14113912
snapshot_time: 1668009775
read_bytes: { samples: 2, unit: bytes, min: 36864, max: 983040, sum: 1019904 }
write_bytes: { samples: 0, unit: bytes, min: 0, max: 0, sum: 0 }
[root@lus23-oss1 ~]# lctl get_param obdfilter.lus23-OST0000.job_stats | grep job_id | grep thread-pool | wc -l
126
[root@lus23-oss1 ~]# lctl get_param obdfilter.lus23-OST0000.job_stats | grep job_id | grep kworker | wc -l
155
[root@lus23-oss1 ~]# lctl get_param obdfilter.lus23-OST0000.job_stats | grep job_id | wc -l
330
And https://review.whamcloud.com/c/fs/lustre-release/+/48915/4/lustre/obdclass/jobid.c#478
should thread-pool be on the internal list? (I don't know the answer; it's an honest question.)
(And for the user community: is this a server-side fix or a client-side fix?) |