Details
-
Bug
-
Resolution: Fixed
-
Minor
Description
After analyzing an HSM workload online, we found the following contention between HSM archive requests and the hsm_cdtr thread:
hsm_archive:
  mdt_hsm_add_actions
    mdt_hsm_register_hal
      mdt_agent_record_add
        down_write(&cdt->cdt_llog_lock);
        llog_cat_add()
        up_write(&cdt->cdt_llog_lock);

// hsm_cdtr kernel daemon thread:
  mdt_coordinator()
    cdt_llog_process(mti->mti_env, mdt, mdt_coordinator_cb, &hsd, 0, 0, WRITE);
      down_write(&cdt->cdt_llog_lock);
      llog_cat_process
      up_write(&cdt->cdt_llog_lock);
# top
  PID USER  PR  NI  VIRT   RES   SHR S  %CPU %MEM    TIME+ COMMAND
 7675 root  20   0     0     0     0 R 100.0  0.0  8729:13 hsm_cdtr
    1 root  20   0 53984  6232  2604 S   0.0  0.0  3:17.86 systemd
    2 root  20   0     0     0     0 S   0.0  0.0  0:00.62 kthreadd
    4 root   0 -20     0     0     0 S   0.0  0.0  0:00.00 kworker/0:0H
HSM archive requests and the hsm_cdtr kernel thread both contend for cdt->cdt_llog_lock in order to add or update records in the hsm_action llog.
The test used max_requests = 10000:
mdt.isg-tiny-MDT0000.hsm.max_requests=10000
This means the "hsm_cdtr" kernel thread scans 10000 llog actions while holding the write lock for the whole pass. Reducing max_requests to a small value (e.g. 256) should mitigate the problem. Alternatively, we can release the write lock after scanning a certain number of HSM actions (128) in the llog, reschedule, and then re-acquire the write lock to continue scanning the hsm_action llog.
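The batched-release idea can be sketched in userspace C. This is only an illustration under stated assumptions, not the actual Lustre change: `llog_lock` is a pthread mutex standing in for the write side of cdt->cdt_llog_lock, `sched_yield()` stands in for a kernel reschedule, and `scan_in_batches`, `count_record` and `SCAN_BATCH` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for the write side of cdt->cdt_llog_lock. */
static pthread_mutex_t llog_lock = PTHREAD_MUTEX_INITIALIZER;

/* Batch size suggested in the description. */
#define SCAN_BATCH 128

/* Per-record callback; a non-zero return aborts the scan. */
typedef int (*record_cb_t)(size_t idx, void *data);

/*
 * Scan records [0, nr_records), holding the lock for at most SCAN_BATCH
 * records at a time. Between batches the lock is dropped and the CPU is
 * yielded so that waiters (e.g. archive requests) can take the lock.
 * Returns how many times the lock was taken, or -1 if the callback failed.
 */
static long scan_in_batches(size_t nr_records, record_cb_t cb, void *data)
{
	size_t idx = 0;
	long acquisitions = 0;

	assert(cb != NULL);

	while (idx < nr_records) {
		size_t end = idx + SCAN_BATCH;

		if (end > nr_records)
			end = nr_records;

		pthread_mutex_lock(&llog_lock);
		acquisitions++;
		for (; idx < end; idx++) {
			if (cb(idx, data) != 0) {
				pthread_mutex_unlock(&llog_lock);
				return -1;
			}
		}
		pthread_mutex_unlock(&llog_lock);

		/* Give other threads a chance before re-acquiring. */
		sched_yield();
	}
	return acquisitions;
}

/* Example callback: counts the records it is shown. */
static int count_record(size_t idx, void *data)
{
	(void)idx;
	(*(size_t *)data)++;
	return 0;
}
```

With 10000 records and a batch of 128, the lock is taken 79 times instead of being held once for the entire pass. In the real llog, records may be added while the lock is dropped, so the scan position would have to remain valid across the release; the sketch sidesteps this by using a fixed record count.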
In addition, when "hsm_cdtr" scans the hsm_action llog, we should record the latest cookie that has been handled, so that the next scan starts from the previous end position.
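A minimal userspace sketch of resuming from the last handled cookie follows. It assumes records are stored in ascending cookie order, which is an assumption of this example and not something the description states; `struct action_rec`, `struct scan_cursor`, `first_unhandled` and `scan_from_cursor` are hypothetical names, not Lustre APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: only the cookie matters for this sketch. */
struct action_rec {
	uint64_t cookie;
};

/* Hypothetical cursor kept between scan passes. */
struct scan_cursor {
	uint64_t last_cookie;	/* highest cookie already handled, 0 = none */
};

/*
 * Binary search for the first record whose cookie is greater than
 * last_cookie. Assumes recs[] is sorted by ascending cookie.
 */
static size_t first_unhandled(const struct action_rec *recs, size_t nr,
			      uint64_t last_cookie)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (recs[mid].cookie <= last_cookie)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Handle at most max_handled records, starting right after the position
 * where the previous pass stopped, and advance the cursor.
 * Returns the number of records handled in this pass.
 */
static size_t scan_from_cursor(const struct action_rec *recs, size_t nr,
			       struct scan_cursor *cur, size_t max_handled)
{
	size_t handled = 0;
	size_t i;

	assert(cur != NULL);

	for (i = first_unhandled(recs, nr, cur->last_cookie);
	     i < nr && handled < max_handled; i++) {
		/* A real scan would process recs[i] here. */
		cur->last_cookie = recs[i].cookie;
		handled++;
	}
	return handled;
}
```

Each pass picks up where the previous one ended instead of rescanning from the first record. A real implementation would also need a way to revisit earlier records that are still pending, since a pure forward cursor skips them.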
Activity
Link | Original: This issue is related to JFC-17 [ JFC-17 ] |
Link | Original: This issue is related to JFC-10 [ JFC-10 ] |
Link | New: This issue is related to JFC-20 [ JFC-20 ] |
Link | New: This issue is related to JFC-17 [ JFC-17 ] |
Fix Version/s | New: Lustre 2.15.0 [ 14791 ] | |
Resolution | New: Fixed [ 1 ] | |
Status | Original: Open [ 1 ] | New: Resolved [ 5 ] |
Link | New: This issue is related to JFC-10 [ JFC-10 ] |