[LU-5469] Intermittent Clients hang during IO Created: 09/Aug/14 Updated: 10/Mar/16 Resolved: 10/Mar/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Mahmoud Hanafi | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Clients: SLES11 SP3, Lustre 2.4.3; Servers: CentOS, Lustre 2.4.3 |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 15236 |
| Description |
|
Clients hang during IO. This is a typical trace for the hung process:

[287554.530174] ld S ffff880989104a30 0 17536 1 0x00000000
[287554.551603] ffff88096e0759a8 0000000000000082 ffff88096e074010 0000000000011800
[287554.574176] 0000000000011800 0000000000011800 0000000000011800 ffff88096e075fd8
[287554.596747] ffff88096e075fd8 0000000000011800 ffff8809de16c640 ffff8802733aa040
[287554.619319] Call Trace:
[287554.626926] [<ffffffffa08916a5>] cl_sync_io_wait+0x365/0x450 [obdclass]
[287554.647277] [<ffffffffa0cc9169>] vvp_page_sync_io+0x59/0x120 [lustre]
[287554.667083] [<ffffffffa0cc9731>] vvp_io_commit_write+0x501/0x640 [lustre]
[287554.687953] [<ffffffffa0891ebc>] cl_io_commit_write+0x9c/0x1d0 [obdclass]
[287554.708816] [<ffffffffa0c9df14>] ll_commit_write+0x104/0x1f0 [lustre]
[287554.728608] [<ffffffffa0cb6fda>] ll_write_end+0x2a/0x60 [lustre]
[287554.747098] [<ffffffff810f83e2>] generic_perform_write+0x122/0x1c0
[287554.766095] [<ffffffff810f84e1>] generic_file_buffered_write+0x61/0xa0
[287554.786125] [<ffffffff810fb476>] __generic_file_aio_write+0x296/0x490
[287554.805895] [<ffffffff810fb6bc>] generic_file_aio_write+0x4c/0xb0
[287554.824646] [<ffffffffa0ccc111>] vvp_io_write_start+0xc1/0x2e0 [lustre]
[287554.844973] [<ffffffffa088ea09>] cl_io_start+0x69/0x140 [obdclass]
[287554.864028] [<ffffffffa0892dc3>] cl_io_loop+0xa3/0x190 [obdclass]
[287554.882804] [<ffffffffa0c71d91>] ll_file_io_generic+0x461/0x600 [lustre]
[287554.903365] [<ffffffffa0c72166>] ll_file_aio_write+0x236/0x290 [lustre]
[287554.923680] [<ffffffffa0c73373>] ll_file_write+0x203/0x290 [lustre]
[287554.942941] [<ffffffff8115b03e>] vfs_write+0xce/0x140
[287554.958555] [<ffffffff8115b1b3>] sys_write+0x53/0xa0
[287554.973915] [<ffffffff81479c92>] system_call_fastpath+0x16/0x1b
[287554.992131] [<00007ffe3f20e0b0>] 0x7ffe3f20e0af

We don't see any errors. OSS trace is attached. |
| Comments |
| Comment by Peter Jones [ 09/Aug/14 ] |
|
Lai Could you please assist with this issue? Thanks Peter |
| Comment by Mahmoud Hanafi [ 11/Aug/14 ] |
|
I think I have captured debug info during a client hang.

bridge4 ~ # lsof /nobackupp7
COMMAND   PID     USER FD   TYPE      DEVICE   SIZE/OFF               NODE NAME
csh     11017 jahousma cwd   DIR 1170,297908       4096 144124584776438323 /nobackupp7/jahousma/work/LAVA-FA/windTunnel_9x7/lava3/anaysis_data20140801/M1.6_x300
tar     11212 jahousma cwd   DIR 1170,297908       4096 144124584776438323 /nobackupp7/jahousma/work/LAVA-FA/windTunnel_9x7/lava3/anaysis_data20140801/M1.6_x300
tar     11212 jahousma  3r   REG 1170,297908 8905963520 144124584776438324 /nobackupp7/jahousma/work/LAVA-FA/windTunnel_9x7/lava3/anaysis_data20140801/M1.6_x300/prep.tar
tar     11212 jahousma  4w   REG 1170,297908    2072576 144124785633263686 /nobackupp7/jahousma/work/LAVA-FA/windTunnel_9x7/lava3/anaysis_data20140801/M1.6_x300/prep/GRID_DIR/grid.0441 (deleted)

bridge4 ~ # lfs getstripe /nobackupp7/jahousma/work/LAVA-FA/windTunnel_9x7/lava3/anaysis_data20140801/M1.6_x300/prep/GRID_DIR/grid.0441
/nobackupp7/jahousma/work/LAVA-FA/windTunnel_9x7/lava3/anaysis_data20140801/M1.6_x300/prep/GRID_DIR/grid.0441
lmm_stripe_count:   1
lmm_stripe_size:    4194304
lmm_layout_gen:     0
lmm_stripe_offset:  72
	obdidx		 objid		 objid		 group
	    72	      25921684	    0x18b8894	         0

bridge4 ~ # lfs path2fid /nobackupp7/jahousma/work/LAVA-FA/windTunnel_9x7/lava3/anaysis_data20140801/M1.6_x300/prep/GRID_DIR/grid.0441
[0x20008ba9b:0x46:0x0]

[0]kdb> btp 11212
Stack traceback for pid 11212
0xffff88276ff3a4c0 11212 11017 0 8 S 0xffff88276ff3ab30 tar
[<ffffffff8146fb6b>] thread_return+0x0/0x295
[<ffffffffa09046a5>] cl_sync_io_wait+0x365/0x450 [obdclass]
[<ffffffffa0d3c169>] vvp_page_sync_io+0x59/0x120 [lustre]
[<ffffffffa0d3c731>] vvp_io_commit_write+0x501/0x640 [lustre]
[<ffffffffa0904ebc>] cl_io_commit_write+0x9c/0x1d0 [obdclass]
[<ffffffffa0d10f14>] ll_commit_write+0x104/0x1f0 [lustre]
[<ffffffffa0d29fda>] ll_write_end+0x2a/0x60 [lustre]
[<ffffffff810f83e2>] generic_perform_write+0x122/0x1c0
[<ffffffff810f84e1>] generic_file_buffered_write+0x61/0xa0
[<ffffffff810fb476>] __generic_file_aio_write+0x296/0x490
[<ffffffff810fb6bc>] generic_file_aio_write+0x4c/0xb0
[<ffffffffa0d3f111>] vvp_io_write_start+0xc1/0x2e0 [lustre]
[<ffffffffa0901a09>] cl_io_start+0x69/0x140 [obdclass]
[<ffffffffa0905dc3>] cl_io_loop+0xa3/0x190 [obdclass]
[<ffffffffa0ce4d91>] ll_file_io_generic+0x461/0x600 [lustre]
[<ffffffffa0ce5166>] ll_file_aio_write+0x236/0x290 [lustre]
[<ffffffffa0ce6373>] ll_file_write+0x203/0x290 [lustre]
[<ffffffff8115b03e>] vfs_write+0xce/0x140
[<ffffffff8115b1b3>] sys_write+0x53/0xa0
[<ffffffff81479c92>] system_call_fastpath+0x16/0x1b
[<00007fffcebea0b0>] 0x7fffcebea0b0
obdidx 72 is nbp7-OST0048 on service194. Client is service64. See uploaded tar file ftp://.../uploads/LU-5469/LU-5469.tgz |
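For reference, the manual mapping above (open file, then FID, then OST index, then OST target) can be scripted roughly as follows. This is only a sketch: lfs path2fid, lfs getstripe -i and the lov.*.target_obd parameter are assumed to behave as on a stock 2.4 client, so verify on your build.

#!/bin/bash
# Sketch: given a file a hung process has open (e.g. taken from "lsof <mountpoint>"),
# print its FID and the OST target holding its first stripe.
set -eu

f=${1:?usage: $0 /path/to/file/on/lustre}

lfs path2fid "$f"                   # FID, for matching server-side log messages
idx=$(lfs getstripe -i "$f")        # OST index of the first (here: only) stripe
echo "first stripe on OST index $idx"

# Map the index to an OST target name via the client's LOV target list.
lctl get_param -n lov.*.target_obd | awk -v i="$idx" '$1 == i":"'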
| Comment by Oleg Drokin [ 11/Aug/14 ] |
|
Hm, I wonder what led you to think the problem is on this OSS; is there anything in the logs (I see you said no errors)? All threads appear to be pretty much idle. Does the client ever emerge from this state, or once it hits it does it always sit there and never complete? |
| Comment by Mahmoud Hanafi [ 11/Aug/14 ] |
|
I am not sure if the problem is on the OSS, but sometimes evicting the client from the OSS will free up the hung commands. See the attached file (service64.trace.gz) for all threads from the client. This client was rebooted, and without any other Lustre activity the untar was run, which caused it to hang. |
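For the record, the server-side eviction mentioned above is roughly the step below; nbp7-OST0048 and the client NID are placeholders, and the evict_client parameter should be confirmed for your server build.

# On the OSS serving the stuck OST: forcibly evict the hung client by NID.
lctl set_param obdfilter.nbp7-OST0048.evict_client=10.151.x.y@o2ib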
| Comment by Mahmoud Hanafi [ 11/Aug/14 ] |
|
Some additional info that may help. I had service64 in kdb and then returned it to a normal state; it reconnected to Lustre and was evicted due to timeouts. Here are the errors on the console for fid [0x20008ba9b:0x46:0x0]:

[ 3554.774736] Lustre: Evicted from MGS (at 10.151.27.38@o2ib) after server handle changed from 0xd98dcc9c3a960a33 to 0xd98dcc9c3ab9e594
[ 3554.814302] LustreError: 167-0: nbp7-MDT0000-mdc-ffff881757f23000: This client was evicted by nbp7-MDT0000; in progress operations using this service will fail.
[ 3554.814397] Lustre: MGC10.151.27.38@o2ib: Connection restored to MGS (at 10.151.27.38@o2ib)
[ 3554.888956] LustreError: 167-0: nbp7-OST0000-osc-ffff881757f23000: This client was evicted by nbp7-OST0000; in progress operations using this service will fail.
[ 3554.890014] LustreError: 14444:0:(file.c:163:ll_close_inode_openhandle()) inode 144124584776433707 mdc close failed: rc = -108
[ 3554.892398] LustreError: 14444:0:(file.c:163:ll_close_inode_openhandle()) inode 144124584776438323 mdc close failed: rc = -108
[ 3554.898727] Lustre: nbp7-MDT0000-mdc-ffff881757f23000: Connection restored to nbp7-MDT0000 (at 10.151.27.38@o2ib)
[ 3578.892107] Lustre: Evicted from nbp7-OST0024_UUID (at 10.151.27.52@o2ib) after server handle changed from 0x7995b8509a5a26d9 to 0x7995b8509a5b49bb
[ 3578.935531] LustreError: 167-0: nbp7-OST0024-osc-ffff881757f23000: This client was evicted by nbp7-OST0024; in progress operations using this service will fail.
[ 3578.982721] LustreError: Skipped 35 previous similar messages
[ 3578.982962] LustreError: 11-0: nbp7-OST0034-osc-ffff881757f23000: Communicating with 10.151.27.52@o2ib, operation ost_connect failed with -114.
[ 3578.983268] Lustre: nbp7-OST0038-osc-ffff881757f23000: Connection restored to nbp7-OST0038 (at 10.151.27.52@o2ib)
[ 3578.983272] Lustre: Skipped 36 previous similar messages
[ 3578.983348] LustreError: 11212:0:(vvp_io.c:1088:vvp_io_commit_write()) Write page 506 of inode ffff88100f4946f8 failed -5
[ 3578.983578] Lustre: 8742:0:(llite_lib.c:2501:ll_dirty_page_discard_warn()) nbp7: dirty page discard: 10.151.27.38@o2ib:/nbp7/fid: [0x20008ba9b:0x46:0x0]//jahousma/work/LAVA-FA/windTunnel_9x7/lava3/anaysis_data20140801/M1.6_x300/prep/GRID_DIR/grid.0441 may get corrupted (rc -108)
[ 3578.983702] LustreError: 14516:0:(osc_lock.c:817:osc_ldlm_completion_ast()) lock@ffff881048c364d8[2 3 0 1 1 00000000] W(2):[0, 18446744073709551615]@[0x100480000:0x18b8894:0x0] {
[ 3578.983712] LustreError: 14516:0:(osc_lock.c:817:osc_ldlm_completion_ast()) lovsub@ffff88376d8c3720: [0 ffff8810258aaa88 W(2):[0, 18446744073709551615]@[0x20008ba9b:0x46:0x0]]
[ 3578.983722] LustreError: 14516:0:(osc_lock.c:817:osc_ldlm_completion_ast()) osc@ffff881048cb0b00: ffff881049baa6c0 0x20040040001 0x17ccc6540b4b0a6d 3 ffff88073bb50c70 size: 0 mtime: 0 atime: 0 ctime: 0 blocks: 0
[ 3578.983727] LustreError: 14516:0:(osc_lock.c:817:osc_ldlm_completion_ast()) } lock@ffff881048c364d8
[ 3578.983731] LustreError: 14516:0:(osc_lock.c:817:osc_ldlm_completion_ast()) dlmlock returned -5
[ 3578.983800] LustreError: 14516:0:(ldlm_resource.c:804:ldlm_resource_complain()) nbp7-OST0048-osc-ffff881757f23000: namespace resource [0x18b8894:0x0:0x0].0 (ffff8807195475c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
[ 3578.983807] LustreError: 14516:0:(ldlm_resource.c:1415:ldlm_resource_dump()) --- Resource: [0x18b8894:0x0:0x0].0 (ffff8807195475c0) refcount = 2
[ 3578.983811] LustreError: 14516:0:(ldlm_resource.c:1418:ldlm_resource_dump()) Granted locks (in reverse order):
[ 3578.983821] LustreError: 14516:0:(ldlm_resource.c:1421:ldlm_resource_dump()) ### ### ns: nbp7-OST0048-osc-ffff881757f23000 lock: ffff881049baa6c0/0x17ccc6540b4b0a6d lrc: 3/0,0 mode: PW/PW res: [0x18b8894:0x0:0x0].0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->8191) flags: 0x126400000000 nid: local remote: 0x7995b8509a5a93bb expref: -99 pid: 11212 timeout: 0 lvb_type: 1
[ 3578.991462] LustreError: 11212:0:(cl_lock.c:1420:cl_unuse_try()) result = -5, this is unlikely!
[ 3578.991477] LustreError: 11212:0:(cl_lock.c:1435:cl_unuse_locked()) lock@ffff881048c36858[1 0 0 1 0 00000000] W(2):[0, 18446744073709551615]@[0x20008ba9b:0x46:0x0] {
[ 3578.991485] LustreError: 11212:0:(cl_lock.c:1435:cl_unuse_locked()) vvp@ffff881048ccf5d8:
[ 3578.991491] LustreError: 11212:0:(cl_lock.c:1435:cl_unuse_locked()) lov@ffff8810258aaa88: 1
[ 3578.991497] LustreError: 11212:0:(cl_lock.c:1435:cl_unuse_locked()) 0 0: ---
[ 3578.991501] LustreError: 11212:0:(cl_lock.c:1435:cl_unuse_locked())
[ 3578.991506] LustreError: 11212:0:(cl_lock.c:1435:cl_unuse_locked()) } lock@ffff881048c36858
[ 3578.991512] LustreError: 11212:0:(cl_lock.c:1435:cl_unuse_locked()) unuse return -5
[ 3579.409090] Lustre: Evicted from nbp7-OST0026_UUID (at 10.151.27.54@o2ib) after server handle changed from 0x8b8ade63867f09b8 to 0x8b8ade63867fe14c
[ 3579.409094] Lustre: Skipped 10 previous similar messages
[ 3579.409273] LustreError: 11-0: nbp7-OST0036-osc-ffff881757f23000: Communicating with 10.151.27.54@o2ib, operation ost_connect failed with -114.
[ 3580.026860] LustreError: 167-0: nbp7-OST0025-osc-ffff881757f23000: This client was evicted by nbp7-OST0025; in progress operations using this service will fail.
[ 3580.073983] LustreError: Skipped 32 previous similar messages
[ 3580.093257] Lustre: nbp7-OST0035-osc-ffff881757f23000: Connection restored to nbp7-OST0035 (at 10.151.27.53@o2ib)
[ 3580.127000] Lustre: Skipped 43 previous similar messages
[ 3604.564439] Lustre: Evicted from nbp7-OST0034_UUID (at 10.151.27.52@o2ib) after server handle changed from 0x7995b8509a5a26fc to 0x7995b8509a5b49a6
[ 3604.607860] Lustre: Skipped 33 previous similar messages
[ 3604.625298] LustreError: 167-0: nbp7-OST0034-osc-ffff881757f23000: This client was evicted by nbp7-OST0034; in progress operations using this service will fail.
[ 3604.672415] LustreError: Skipped 11 previous similar messages
[ 3604.677874] Lustre: nbp7-OST0034-osc-ffff881757f23000: Connection restored to nbp7-OST0034 (at 10.151.27.52@o2ib) |
| Comment by Oleg Drokin [ 11/Aug/14 ] |
|
So can you please elaborate on the "intermittent hang"? Does the client recover all by itself after some time? |
| Comment by Mahmoud Hanafi [ 11/Aug/14 ] |
|
The issue is isolated to some users and some directories. The client doesn't recover unless it is rebooted, but after a reboot it can hang again on the same directory. On the affected client/directory an ls will hang, but it will work from a second host. |
| Comment by Mahmoud Hanafi [ 11/Aug/14 ] |
|
'read-inodes' is a task that reads the inodes and runs every hour. We have been running it for a very long time. I can turn that task off. |
| Comment by Mahmoud Hanafi [ 11/Aug/14 ] |
|
list of recently applied patches
|
| Comment by Mahmoud Hanafi [ 12/Aug/14 ] |
|
It looks like this is a quota issue. Please see the uploaded file "uploads/ |
| Comment by Lai Siyao [ 12/Aug/14 ] |
|
Mahmoud, I encountered an error when downloading s194.dump.gz. Could you re-upload it or just paste the related logs here? |
| Comment by Oleg Drokin [ 12/Aug/14 ] |
|
Lai - the log appears to be fine for me. Note it's not a tar file, just gzipped text. |
| Comment by Mahmoud Hanafi [ 12/Aug/14 ] |
|
There are 2 quota issues that were causing these hangs.

1. The quota slaves had not connected to the master. We have seen this issue before and can only recover if the whole filesystem is unmounted and remounted. Is there a way to force the slave to retry connecting to the master without unmounting?

target name:    nbp7-OST0048
pool ID:        0
type:           dt
quota enabled:  none
conn to master: not setup yet
space acct:     ug
user uptodate:  glb[0],slv[0],reint[1]
group uptodate: glb[0],slv[0],reint[1]
osd-ldiskfs.nbp7-OST004c.quota_slave.info=

2. If the slave is not connected to the master and enforcement is enabled, it will delay the adjustment. There should be a maximum number of retries or a timeout for this, returning EDQUOT.

00040000:04000000:24.0:1407806066.559492:0:10442:0:(qsd_handler.c:111:qsd_ready()) $$$ connection to master not ready qsd:nbp7-OST0024 qtype:usr id:9567 enforced:1 granted:717643756 pending:0 waiting:0 req:0 usage:24808 qunit:0 qtune:0 edquot:0
00040000:04000000:24.0:1407806066.559494:0:10442:0:(qsd_handler.c:917:qsd_adjust()) $$$ delaying adjustment since qsd isn't ready qsd:nbp7-OST0024 qtype:usr id:9567 enforced:1 granted:717643756 pending:0 waiting:0 req:0 usage:24808 qunit:0 qtune:0 edquot:0
00040000:04000000:29.0:1407806067.111494:0:10463:0:(qsd_entry.c:219:qsd_refresh_usage()) $$$ disk usage: 11524 qsd:nbp7-OST003c qtype:usr id:9567 enforced:1 granted:671088640 pending:0 waiting:0 req:0 usage:11524 qunit:0 qtune:0 edquot:0
00040000:04000000:29.0:1407806067.111496:0:10463:0:(qsd_handler.c:111:qsd_ready()) $$$ connection to master not ready qsd:nbp7-OST003c qtype:usr id:9567 enforced:1 granted:671088640 pending:0 waiting:0 req:0 usage:11524 qunit:0 qtune:0 edquot:0
00040000:04000000:29.0:1407806067.111498:0:10463:0:(qsd_handler.c:917:qsd_adjust()) $$$ delaying adjustment since qsd isn't ready qsd:nbp7-OST003c qtype:usr id:9567 enforced:1 granted:671088640 pending:0 waiting:0 req:0 usage:11524 qunit:0 qtune:0 edquot:0
00040000:04000000:28.0:1407806067.879498:0:10663:0:(qsd_entry.c:219:qsd_refresh_usage()) $$$ disk usage: 30092 qsd:nbp7-OST0034 qtype:usr id:9567 enforced:1 granted:711605332 pending:0 waiting:0 req:0 usage:30092 qunit:0 qtune:0 edquot:0
00040000:04000000:28.0:1407806067.879500:0:10663:0:(qsd_handler.c:111:qsd_ready()) $$$ connection to master not ready qsd:nbp7-OST0034 qtype:usr id:9567 enforced:1 granted:711605332 pending:0 waiting:0 req:0 usage:30092 qunit:0 qtune:0 edquot:0
00040000:04000000:28.0:1407806067.879502:0:10663:0:(qsd_handler.c:917:qsd_adjust()) $$$ delaying adjustment since qsd isn't ready qsd:nbp7-OST0034 qtype:usr id:9567 enforced:1 granted:711605332 pending:0 waiting:0 req:0 usage:30092 qunit:0 qtune:0 edquot:0

Once I disabled enforcement all hung tasks freed up. |
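A rough sketch of the checks and workaround described above (the parameter names follow the 2.4 osd-ldiskfs output shown; "nbp7" is this filesystem, and the conf_param step has to run on the MGS node):

#!/bin/bash
# Sketch: flag OSTs whose quota slave never connected to the quota master,
# then optionally drop OST quota enforcement as a workaround.

# 1. On each OSS, list slaves that are not yet connected to the quota master.
lctl get_param osd-ldiskfs.*.quota_slave.info 2>/dev/null |
    awk '/^osd-ldiskfs/ {tgt=$0}
         /conn to master: not setup yet/ {print "NOT CONNECTED:", tgt}'

# 2. Workaround used here: disable OST quota enforcement filesystem-wide
#    (run on the MGS; re-enable with "ug" once the slaves reconnect).
# lctl conf_param nbp7.quota.ost=none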
| Comment by Peter Jones [ 12/Aug/14 ] |
|
Niu Could you please comment on these quotas errors? Thanks Peter |
| Comment by Niu Yawei (Inactive) [ 13/Aug/14 ] |
I can't download it either; perhaps anonymous downloads are no longer allowed? I don't know which user should be used for ftp.whamcloud.com.
The pinger will always retry the connection on all imports, so I don't think we need any way to force that. The real question is why the slave can't connect to the master without a umount. I'll check the log to see if there is anything useful once I get it.
I don't think it's correct to return EDQUOT in such a case. The old quota design retried the dqacq on the OST indefinitely, but that held an OST service thread until the connection to the master was established. The new quota design instead retries the write from the client side indefinitely. |
| Comment by Mahmoud Hanafi [ 13/Aug/14 ] |
|
The wait on the client side has the side effect that all commands, ls for example, hang on the affected directory. So if the user has the affected path in their path, logins will hang. At a minimum an error should be logged. |
| Comment by Niu Yawei (Inactive) [ 14/Aug/14 ] |
|
I can't see from the log why the slave can't connect to the master. Could you explain in detail how you resolved the problem? (Just umount then mount all MDTs & OSTs, or was a writeconf required?) I'm wondering if it's an upgraded system? (because of |
| Comment by Mahmoud Hanafi [ 14/Aug/14 ] |
|
Yes, the system was upgraded. We didn't do the writeconf. I checked the MDT0000-mdc in the client config log and it does show up as 2.1.5:

#15 (224)marker 5 (flags=0x02, v2.1.5.0) nbp7-MDT0000 'add mdc' Sun Jun 16 20:10:35 2013-

Doing a writeconf is not ideal, because it requires taking the filesystem offline. |
| Comment by Niu Yawei (Inactive) [ 15/Aug/14 ] |
Then what I can think of is:
But I strongly suggest you run a writeconf at the next maintenance downtime instead of using such a dangerous method. |
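For the maintenance window, the standard writeconf procedure is roughly the sketch below (per the Lustre manual; device paths are hypothetical and a combined MGS/MDT is assumed):

#!/bin/bash
# Sketch of regenerating the configuration logs with writeconf.

# 0. Unmount all clients, then all OSTs, then the MDT (filesystem fully down).

# 1. On the MDS (combined MGS/MDT assumed):
tunefs.lustre --writeconf /dev/mdt_device

# 2. On every OSS, for every OST device:
for dev in /dev/ost_device_a /dev/ost_device_b; do
    tunefs.lustre --writeconf "$dev"
done

# 3. Remount in order: MGS/MDT first, then all OSTs, then clients. The
#    configuration logs (and quota slave/master connections) are rebuilt
#    at mount time.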
| Comment by Mahmoud Hanafi [ 10/Mar/16 ] |
|
Please close; this is no longer an issue. |