[LU-16251] Fill jobid in an atomic way Created: 19/Oct/22  Updated: 05/Jan/23  Resolved: 08/Nov/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Feng Lei Assignee: Feng Lei
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-16228 create lljobstats command Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

As per DDN-3114:

jobid_interpret_string() is not an atomic operation. Two processes can hit the same file+offset (and therefore the same request) with different jobid schemes, so jobid_interpret_string() is re-entered and produces a "combined" jobid.
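
To make the interleaving concrete, here is a minimal userspace sketch (not Lustre code; the thread setup, buffer size, and usleep() stand-in for the delay are all made up for illustration) of two writers formatting a jobid field by field into one shared buffer, which is enough to produce a "combined" result:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static char shared_jobid[64];		/* one buffer shared by both writers */

static void *fill_jobid(void *arg)
{
	const char *name = arg;		/* stands in for the "%e" (command name) field */
	int l;

	/* Field 1: command name, written straight into the shared buffer. */
	l = snprintf(shared_jobid, sizeof(shared_jobid), "%s.", name);

	usleep(1000);			/* stands in for the msleep(1) window */

	/* Field 2: uid, appended after the delay, racing with the peer thread. */
	snprintf(shared_jobid + l, sizeof(shared_jobid) - l, "%u", getuid());
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, fill_jobid, "dd");
	pthread_create(&b, NULL, fill_jobid, "kworker");
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	/* With unlucky timing the two writers interleave and the result mixes
	 * fields from both, much like the "combined" jobid described above. */
	printf("jobid = %s\n", shared_jobid);
	return 0;
}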



 Comments   
Comment by Gerrit Updater [ 19/Oct/22 ]

"Feng Lei <flei@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48915
Subject: LU-16251 obdclass: fill jobid in an atomic way
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d4c2887f4b0c170b918e65fff72ae4e76bd82d4a

Comment by Feng Lei [ 27/Oct/22 ]

The bug can be reproduced by inserting an msleep(1) into jobid_interpret_string(). Here is the patch used to reproduce it:

diff --git a/lustre/obdclass/jobid.c b/lustre/obdclass/jobid.c
index 1b4bdd3168..b9b33044f8 100644
--- a/lustre/obdclass/jobid.c
+++ b/lustre/obdclass/jobid.c
@@ -32,6 +32,7 @@
  */

 #define DEBUG_SUBSYSTEM S_RPC
+#include <linux/delay.h>
 #include <linux/user_namespace.h>
 #include <linux/uidgid.h>
 #include <linux/utsname.h>
@@ -696,6 +697,7 @@ static int jobid_interpret_string(const char *jobfmt, char *jobid,
                        l = snprintf(jobid, joblen, "%u", current->pid);
                        break;
                case 'u': /* user ID */
+                       msleep(1);
                        l = snprintf(jobid, joblen, "%u",
                                     from_kuid(&init_user_ns, current_fsuid()));
                        break;
diff --git a/lustre/tests/sanity.sh b/lustre/tests/sanity.sh
index 437d86aa96..bf47268373 100755
--- a/lustre/tests/sanity.sh
+++ b/lustre/tests/sanity.sh
@@ -28908,6 +28908,27 @@ test_906() {
 }
 run_test 906 "Simple test for io_uring I/O engine via fio"

+
+test_1001() {
+       local file=$DIR/$tdir/$tfile
+
+       test_mkdir $DIR/$tdir
+
+	for i in {1..2}; do
+               echo "start worker $i..."
+               dd if=/dev/zero of=$file bs=1M count=100 &
+       done
+
+       echo "wait for all workers done..."
+       wait
+
+       $LCTL get_param *.*.job_stats | grep job_id | grep 'dd\.$'
+       if (( $? == 0)); then
+               error "found corrupted jobid"
+       fi
+}
+run_test 1001 "test corrupted jobid"
+
 complete $SECONDS
 [ -f $EXT2_DEV ] && rm $EXT2_DEV || true
 check_and_cleanup_lustre

The test result:

== sanity test 1001: test corrupted jobid ================ 09:20:20 (1666833620)
start worker 1...
start worker 2...
wait for all workers done...
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 1.63228 s, 64.2 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 1.65471 s, 63.4 MB/s
- job_id:          dd.
 sanity test_1001: @@@@@@ FAIL: found corrupted jobid 
  Trace dump:
  = ./../tests/test-framework.sh:6524:error()
  = sanity.sh:29016:test_1001()
  = ./../tests/test-framework.sh:6860:run_one()
  = ./../tests/test-framework.sh:6910:run_one_logged()
  = ./../tests/test-framework.sh:6732:run_test()
  = sanity.sh:29019:main()
Dumping lctl log to /tmp/test_logs/1666833512/sanity.test_1001.*.1666833622.log
FAIL 1001 (3s)

In this case a corrupted jobid "dd.", missing the uid, is generated because of the delay before the uid is appended. Other kinds of corruption should be just as easy to produce by inserting the delay at other points and triggering it with other jobid settings.
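
The landed fix is titled "fill jobid in a safe way" (patch 48915). As a hedged sketch of one common way to make such a fill safe, and not the code from that patch, the jobid can be formatted into a private buffer and only then copied into the shared slot in a single step. All names below (publish_jobid, JOBID_LEN) are hypothetical:

#include <stdio.h>
#include <string.h>

#define JOBID_LEN 32

/* Format the jobid in a private buffer, then copy the finished string into
 * the shared slot in one step, so no reader or concurrent writer ever sees
 * a half-built value.  (In kernel code the copy itself would typically be
 * protected by a lock or done via an atomic pointer swap.) */
static void publish_jobid(char *shared_jobid, const char *comm, unsigned int uid)
{
	char tmp[JOBID_LEN];

	snprintf(tmp, sizeof(tmp), "%s.%u", comm, uid);
	memcpy(shared_jobid, tmp, sizeof(tmp));
}

int main(void)
{
	char jobid[JOBID_LEN] = "";

	publish_jobid(jobid, "dd", 0);
	printf("jobid = %s\n", jobid);
	return 0;
}

With this pattern the msleep(1) window in the reproducer would no longer expose a half-built string, because the shared buffer is only ever updated with a complete jobid.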

Comment by Gerrit Updater [ 08/Nov/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48915/
Subject: LU-16251 obdclass: fill jobid in a safe way
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 9a0a89520e8b57bd63a9343fe3cdc56c61c41f6d

Comment by Peter Jones [ 08/Nov/22 ]

Landed for 2.16

Comment by James Beal [ 10/Nov/22 ]

I have just had a quick look at this patch, given that I am talking about this area in a few weeks' time.

Given:

- job_id:          kworker/u518:6eamtrynka
  snapshot_time:   1668009798
  read_bytes:      { samples:           1, unit: bytes, min:  663552, max:  663552, sum:          663552 }
  write_bytes:     { samples:           0, unit: bytes, min:       0, max:       0, sum:               0 }
 
- job_id:          kworker/u130:0
  snapshot_time:   1668009796
  read_bytes:      { samples:       21459, unit: bytes, min:    4096, max: 2297856, sum:       160104448 }
  write_bytes:     { samples:           0, unit: bytes, min:       0, max:       0, sum:               0 }
 
- job_id:          thread-pool-14113912
  snapshot_time:   1668009775
  read_bytes:      { samples:           2, unit: bytes, min:   36864, max:  983040, sum:         1019904 }
  write_bytes:     { samples:           0, unit: bytes, min:       0, max:       0, sum:               0 }
 
[root@lus23-oss1 ~]# lctl get_param obdfilter.lus23-OST0000.job_stats | grep job_id | grep thread-pool | wc -l
126
[root@lus23-oss1 ~]# lctl get_param obdfilter.lus23-OST0000.job_stats | grep job_id | grep kworker | wc -l
155
[root@lus23-oss1 ~]# lctl get_param obdfilter.lus23-OST0000.job_stats | grep job_id | wc -l
330

And https://review.whamcloud.com/c/fs/lustre-release/+/48915/4/lustre/obdclass/jobid.c#478

Should thread-pool be on the internal list? (I don't know the answer; it's an honest question.)

(And for the user community: is this a server-side fix or a client-side fix?)
