[LU-4881] Lustre client: eviction on open/truncate Created: 11/Apr/14 Updated: 13/Oct/21 Resolved: 13/Oct/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Patrick Valentin (Inactive) | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Environment: | kernel 2.6.32_431.1.2 |
| Attachments: | open_truncate_evicted.c |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 13503 |
| Description |
|
Lustre client: eviction on open/truncate
Lustre 2.4.2 and kernel 2.6.32_431.1.2

At the CEA site, some applications opening the same file in O_TRUNC mode cause client evictions and EIO errors (Input/output error). The support team wrote a reproducer (attached file open_truncate_evicted.c) that reproduces the same error with 2 Bull Mesca nodes (32 cores each).

Below are the customer backtraces and the corresponding client and server log messages, followed by a copy of the reproducer output and the associated console output.

Customer backtraces and corresponding log messages

Backtrace of tasks trying to open the file:

PID: 22464 TASK: ffff880b13ad3500 CPU: 25 COMMAND: "XYZ_mpi"
#0 [ffff880b13ad5938] schedule at ffffffff81528762
#1 [ffff880b13ad5a00] cfs_waitq_wait at ffffffffa03686fe [libcfs]
#2 [ffff880b13ad5a10] cl_lock_state_wait at ffffffffa04d94fa [obdclass]
#3 [ffff880b13ad5a90] cl_enqueue_locked at ffffffffa04d9ceb [obdclass]
#4 [ffff880b13ad5ad0] cl_lock_request at ffffffffa04da86e [obdclass]
#5 [ffff880b13ad5b30] cl_io_lock at ffffffffa04dfb0c [obdclass]
#6 [ffff880b13ad5b90] cl_io_loop at ffffffffa04dfd42 [obdclass]
#7 [ffff880b13ad5bc0] cl_setattr_ost at ffffffffa0a830d8 [lustre]
#8 [ffff880b13ad5c20] ll_setattr_raw at ffffffffa0a511fe [lustre]
#9 [ffff880b13ad5cc0] ll_setattr at ffffffffa0a5188b [lustre]
#10 [ffff880b13ad5cd0] notify_change at ffffffff811a79f8
#11 [ffff880b13ad5d40] do_truncate at ffffffff811876c4
#12 [ffff880b13ad5db0] do_filp_open at ffffffff8119c371
#13 [ffff880b13ad5f20] do_sys_open at ffffffff81186389
#14 [ffff880b13ad5f70] sys_open at ffffffff811864a0
#15 [ffff880b13ad5f80] system_call_fastpath at ffffffff8100b072
RIP: 00007f959ee2e55d RSP: 00007fff6f9929e8 RFLAGS: 00010206
RAX: 0000000000000002 RBX: ffffffff8100b072 RCX: 0000000000000006
RDX: 00000000000001b6 RSI: 0000000000000241 RDI: 00007f95a32a173f
RBP: 00007fff6f992940 R8: 00007f959f3c0489 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: ffffffff811864a0
R13: ffff880b13ad5f78 R14: 00007f959f5eab00 R15: 0000000000000004
ORIG_RAX: 0000000000000002 CS: 0033 SS: 002b
The 11 hanging tasks waiting for the lock:

PID: 22467 TASK: ffff880b756d3540 CPU: 6 COMMAND: "XYZ_mpi"
#0 [ffff880b756d5a58] schedule at ffffffff81528762
#1 [ffff880b756d5b20] rwsem_down_failed_common at ffffffff8152ae25
#2 [ffff880b756d5b80] rwsem_down_write_failed at ffffffff8152af83
#3 [ffff880b756d5bc0] call_rwsem_down_write_failed at ffffffff8128ef23
#4 [ffff880b756d5c20] ll_setattr_raw at ffffffffa0a509cc [lustre]
#5 [ffff880b756d5cc0] ll_setattr at ffffffffa0a5188b [lustre]
#6 [ffff880b756d5cd0] notify_change at ffffffff811a79f8
#7 [ffff880b756d5d40] do_truncate at ffffffff811876c4
#8 [ffff880b756d5db0] do_filp_open at ffffffff8119c371
#9 [ffff880b756d5f20] do_sys_open at ffffffff81186389
#10 [ffff880b756d5f70] sys_open at ffffffff811864a0
#11 [ffff880b756d5f80] system_call_fastpath at ffffffff8100b072
RIP: 00007f16e686655d RSP: 00007fff8ef09148 RFLAGS: 00010206
RAX: 0000000000000002 RBX: ffffffff8100b072 RCX: 0000000000000006
RDX: 00000000000001b6 RSI: 0000000000000241 RDI: 00007f16eacd973f
RBP: 00007fff8ef090a0 R8: 00007f16e6df8489 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: ffffffff811864a0
R13: ffff880b756d5f78 R14: 00007f16e7022b00 R15: 0000000000000004
ORIG_RAX: 0000000000000002 CS: 0033 SS: 002b
Client log messages:

LustreError: 11-0: fs3-OST00e4-osc-ffff88045c004000: Communicating with A.B.C.D@o2ib10, operation ost_punch failed with -107.
Lustre: fs3-OST00e4-osc-ffff88045c004000: Connection to scratch3-OST00e4 (at A.B.C.D@o2ib10) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 2 previous similar messages
LustreError: 167-0: fs3-OST00e4-osc-ffff88045c004000: This client was evicted by fs3-OST00e4; in progress operations using this service will fail.
Lustre: fs3-OST00e4-osc-ffff88045c004000: Connection restored to fs3-OST00e4 (at A.B.C.D@o2ib10)
Lustre: Skipped 2 previous similar messages
Lustre: DEBUG MARKER: Wed Mar 5 15:15:01 2014
LustreError: 11-0: fs3-OST00e8-osc-ffff88045c004000: Communicating with A.B.C.D@o2ib10, operation obd_ping failed with -107.
LustreError: Skipped 1 previous similar message
LustreError: 167-0: fs3-OST00e8-osc-ffff88045c004000: This client was evicted by fs3-OST00e8; in progress operations using this service will fail.
LustreError: Skipped 1 previous similar message
LustreError: 3905:0:(ldlm_resource.c:804:ldlm_resource_complain()) fs3-OST00e8-osc-ffff88045c004000: namespace resource [0xe1b:0x0:0x0].0 (ffff8802cf99ce80) refcount nonzero (1) after lock cleanup; forcing cleanup.
LustreError: 3905:0:(ldlm_resource.c:1415:ldlm_resource_dump()) --- Resource: [0xe1b:0x0:0x0].0 (ffff8802cf99ce80) refcount = 2
Lustre: DEBUG MARKER: Wed Mar 5 15:20:01 2014

Server log messages:

LustreError: 0:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 202s: evicting client at A.B.C.E@o2ib10 ns: filter-fs3-OST00e4_UUID lock: ffff8810233b9b40/0xb9a496999f417c9b lrc: 3/0,0 mode: PW/PW res: [0xe18:0x0:0x0].0 rrc: 4 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x80010020 nid: A.B.C.E@o2ib10 remote: 0xf075be8a293a25dc expref: 4 pid: 24731 timeout: 4809968903 lvb_type: 0
LustreError: 0:0:(ldlm_lockd.c:391:waiting_locks_callback()) Skipped 11 previous similar messages

Reproducer output and associated console messages

Reproducer first run:

clascar54% ls -rtl
total 85424
-rwxr-x---. 1 percher s8 85353840 Mar 6 15:34 initfile
-rw-r-----. 1 percher s8 2097152 Mar 6 15:41 test0000.txt
clascar54% srun --resv-ports -n 64 -N 2-2 ~/_cc/opentruncate/e 3
ERREUR dans open fd2: 5: Input/output error
ERREUR dans write fd2: 5: Input/output error
ERREUR dans open fd2: 5: Input/output error
srun: error: clascar4174: task 3: Exited with exit code 1
srun: Terminating job step 108683.0
slurmd[clascar4175]: *** STEP 108683.0 KILLED AT 2014-03-06T15:41:51 WITH SIGNAL 9 ***
slurmd[clascar4174]: *** STEP 108683.0 KILLED AT 2014-03-06T15:41:51 WITH SIGNAL 9 ***
srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
slurmd[clascar4175]: *** STEP 108683.0 KILLED AT 2014-03-06T15:41:51 WITH SIGNAL 9 ***
slurmd[clascar4174]: *** STEP 108683.0 KILLED AT 2014-03-06T15:41:51 WITH SIGNAL 9 ***
srun: error: clascar4175: tasks 35,44: Exited with exit code 1
srun: error: clascar4174: tasks 0-2,4-31: Killed
srun: error: clascar4175: tasks 32-34,36-43,45-63: Killed

Reproducer second run:

clascar54% srun --resv-ports -n 64 -N 2-2 ~/_cc/opentruncate/e 3
ERREUR dans open fd2: 5: Input/output error
ERREUR dans open fd2: 5: Input/output error
ERREUR dans open fd2: 108: Cannot send after transport endpoint shutdown
srun: error: clascar4024: task 42: Exited with exit code 1
srun: Terminating job step 108699.0
slurmd[clascar4024]: *** STEP 108699.0 KILLED AT 2014-03-06T16:13:55 WITH SIGNAL 9 ***
slurmd[clascar4023]: *** STEP 108699.0 KILLED AT 2014-03-06T16:13:55 WITH SIGNAL 9 ***
srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
srun: error: clascar4023: tasks 20-21: Exited with exit code 1
slurmd[clascar4023]: *** STEP 108699.0 KILLED AT 2014-03-06T16:13:55 WITH SIGNAL 9 ***
slurmd[clascar4024]: *** STEP 108699.0 KILLED AT 2014-03-06T16:13:55 WITH SIGNAL 9 ***
srun: error: clascar4023: tasks 0-19,22-31: Killed
srun: error: clascar4024: tasks 32-41,43-63: Killed
clascar54%

Reproducer running on Lustre 2.1.6 (no error):

clascar53% srun --resv-ports -n 64 -N 2-2 ~/_cc/opentruncate/eae2 3
srun: job 262910 queued and waiting for resources
srun: job 262910 has been allocated resources
clascar53% srun --resv-ports -n 64 -N 2-2 ~/_cc/opentruncate/eae2 3
srun: job 262924 queued and waiting for resources
srun: job 262924 has been allocated resources
clascar53%

Console :::

clascar4174:
clascar4174: Lustre: DEBUG MARKER: Thu Mar 6 15:40:01 2014
clascar4174:
clascar4174: LustreError: 11-0: scratch3-OST00e1-osc-ffff8804658abc00: Communicating with JO.BOO.WL.LOT@o2ib10, operation ost_punch failed with -107.
clascar4174: Lustre: scratch3-OST00e1-osc-ffff8804658abc00: Connection to scratch3-OST00e1 (at JO.BOO.WL.LOT@o2ib10) was lost; in progress operations using this service will wait for recovery to complete
clascar4174: LustreError: 167-0: scratch3-OST00e1-osc-ffff8804658abc00: This client was evicted by scratch3-OST00e1; in progress operations using this service will fail.
clascar4174: Lustre: scratch3-OST00e1-osc-ffff8804658abc00: Connection restored to scratch3-OST00e1 (at JO.BOO.WL.LOT@o2ib10)
clascar4174: Lustre: DEBUG MARKER: Thu Mar 6 15:45:01 2014
clascar4174:
clascar4175: Lustre: DEBUG MARKER: Thu Mar 6 15:40:01 2014
clascar4175:
clascar4175: LustreError: 11-0: scratch3-OST00e2-osc-ffff880466736c00: Communicating with JO.BOO.WL.LOT@o2ib10, operation ldlm_enqueue failed with -107.
clascar4175: Lustre: scratch3-OST00e2-osc-ffff880466736c00: Connection to scratch3-OST00e2 (at JO.BOO.WL.LOT@o2ib10) was lost; in progress operations using this service will wait for recovery to complete
clascar4175: LustreError: 167-0: scratch3-OST00e2-osc-ffff880466736c00: This client was evicted by scratch3-OST00e2; in progress operations using this service will fail.
clascar4175: Lustre: scratch3-OST00e2-osc-ffff880466736c00: Connection restored to scratch3-OST00e2 (at JO.BOO.WL.LOT@o2ib10)
clascar4175: LustreError: 11-0: scratch3-OST00e1-osc-ffff880466736c00: Communicating with JO.BOO.WL.LOT@o2ib10, operation obd_ping failed with -107.
clascar4175: Lustre: scratch3-OST00e1-osc-ffff880466736c00: Connection to scratch3-OST00e1 (at JO.BOO.WL.LOT@o2ib10) was lost; in progress operations using this service will wait for recovery to complete
clascar4175: LustreError: 167-0: scratch3-OST00e1-osc-ffff880466736c00: This client was evicted by scratch3-OST00e1; in progress operations using this service will fail.
clascar4175: Lustre: scratch3-OST00e1-osc-ffff880466736c00: Connection restored to scratch3-OST00e1 (at JO.BOO.WL.LOT@o2ib10)
clascar4175: Lustre: DEBUG MARKER: Thu Mar 6 15:45:01 2014
clascar4175: |
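As a rough, hypothetical approximation of the access pattern the reproducer exercises (many tasks on two clients repeatedly re-opening the same shared file with O_CREAT|O_WRONLY|O_TRUNC and rewriting it), a script along the following lines can be launched simultaneously on two clients against the same Lustre file. This is not the attached reproducer source; the path, sizes and loop counts are assumptions:

    #!/bin/bash
    # Rough stand-in for the attached open_truncate_evicted.c reproducer (not
    # the actual source): every background task repeatedly re-opens the SAME
    # shared file with O_TRUNC and rewrites it.
    FILE=${1:-/mnt/fs3/shared/test0000.txt}   # assumed path on the Lustre mount
    NTASKS=${2:-32}                           # tasks per node in the runs above

    for t in $(seq 1 "$NTASKS"); do
        (
            for i in $(seq 1 100); do
                # The '>' redirection opens the shared file with
                # O_CREAT|O_WRONLY|O_TRUNC; the truncate is what drives the
                # ost_punch / cl_setattr_ost path seen in the backtraces above.
                if ! dd if=/dev/zero bs=1M count=2 2>/dev/null > "$FILE"; then
                    echo "task $t: write to $FILE failed on iteration $i" >&2
                    exit 1
                fi
            done
        ) &
    done
    wait

Running such tasks on two clients at once corresponds to the two-node srun runs shown above.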
| Comments |
| Comment by Bruno Faccini (Inactive) [ 11/Apr/14 ] |
|
Hello Patrick, |
| Comment by Lustre Bull [ 11/Apr/14 ] |
|
Hi Bruno,

>> Humm, it seems it reminds me of an old problem ...
Yes.

>> Just to fully qualify this problem, does the reproducer you attached always reproduce within CEA environment ?
Yes, they wrote in the MANTIS report that it is systematic and easy to reproduce with 2 nodes on Lustre 2.4, and that they do not have the issue on 2.1 with the same reproducer (and application ???).

>> And if yes and a lustre debug-log can be exported (??), could it be possible to run it again with full debug (or at least dlmtrace+rpctrace) enabled, a big debug log buffer configured, and also with dump_on_eviction set (or manual debug log dump) ?
I will ask them whether they think the logs can be exported, and pass on your requirements (trace level and log buffer size).

Patrick. |
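For reference, the debug setup requested above could be applied on a client along these lines (a sketch; the buffer size and dump-file path are assumptions, not values from this ticket):

    # add DLM and RPC tracing to the current client debug mask
    lctl set_param debug=+dlmtrace
    lctl set_param debug=+rpctrace
    # enlarge the kernel debug buffer (size in MB, value assumed here)
    lctl set_param debug_mb=1024
    # automatically dump the debug buffer when the client is evicted
    lctl set_param dump_on_eviction=1
    # or dump it manually after the failure
    lctl dk /tmp/lustre-debug.$(hostname).log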
| Comment by Patrick Farrell (Inactive) [ 11/Apr/14 ] |
|
Patrick, Bruno - I haven't investigated this deeply, but from reading the description, this may be a duplicate of
I don't know if it's mentioned in
Not sure if the |
| Comment by Jodi Levi (Inactive) [ 15/Apr/14 ] |
|
Bruno, |
| Comment by Bruno Faccini (Inactive) [ 16/Apr/14 ] |
|
Patrick F.: thanks for pointing to

Jodi: I will double-check if master is affected.

Patrick V./Lustre Bull: any news about the availability of a full debug log from the site?

Also, I have just been able to reproduce in-house by running the reproducer with a default striping of 12 and only 16 tasks running on 2 nodes with only 8 cores each ... |
| Comment by Patrick Valentin (Inactive) [ 16/Apr/14 ] |
|
Hi Bruno, |
| Comment by Bruno Faccini (Inactive) [ 17/Apr/14 ] |
|
Patrick V.: did you try to reproduce on your development cluster with a default striping != 1, or "wide" enough? And on the other hand, can you ask the site people whether their default striping is != 1 when they reproduce, and if so, to run their reproducer with a default striping = 1?

BTW, I am asking because, even though I have not yet completed the analysis of the full debug log from my in-house occurrence (but will!), I strongly suspect what Patrick F. suggested, that this ticket is a dup of |
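For reference, checking the directory's default striping and re-running with a stripe count of 1 could look like this (the test directory path is an assumption):

    # show the current default striping of the test directory
    lfs getstripe -d /mnt/fs3/testdir
    # force a default stripe count of 1 for the retry requested above
    lfs setstripe -c 1 /mnt/fs3/testdir
    # a wide default (e.g. the stripe count of 12 used in the in-house
    # reproduction) can be set back the same way
    lfs setstripe -c 12 /mnt/fs3/testdir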
| Comment by Patrick Valentin (Inactive) [ 18/Apr/14 ] |
|
Bruno, We built a lustre 2.4.3 with patch #9152 of |
| Comment by Bruno Faccini (Inactive) [ 21/Apr/14 ] |
|
I have been able to complete the debug-log analysis of my own in-house reproducer run, and it definitely shows the same scenario as the one described in both |
| Comment by Patrick Valentin (Inactive) [ 22/Apr/14 ] |
|
Bruno, |
| Comment by Bruno Faccini (Inactive) [ 03/Jun/14 ] |
|
Patrick, |