[LU-458] silence excess 1.8 error messages Created: 24/Jun/11 Updated: 25/May/12 Resolved: 25/May/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.8, Lustre 1.8.6 |
| Fix Version/s: | Lustre 1.8.8 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Andreas Dilger | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Story Points: | 2 |
| Rank (Obsolete): | 9739 |
| Description |
|
Some CERROR(...) or CWARN() messages clutter up the syslog and should be changed to CDEBUG(D_*, ...). Ideally, it should be possible to mount then unmount Lustre under normal usage without getting a screen full of messages. Simply turning off all of the messages is NOT an acceptable solution for all of them. Please try to make changes to both b1_8 and master in a similar manner where possible.
Happens many times for each evicted client, but there is nothing the administrator can do about it. This is already fixed on master to use CDEBUG(D_INFO, ...).
Should be quieted to CDEBUG(D_INODE, ...) for the -ENOENT case, since this can happen with racing "rm -r" vs. "rm -r" or "ls -l".
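A minimal sketch of the idea above, assuming hypothetical stand-in names (`sketch_mask`, `sketch_mask_for_rc`); it does not use the real libcfs masks or macros, and only shows the decision of routing -ENOENT to the debug log while other failures keep their console error:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the debug masks; the real D_INODE and
 * D_ERROR definitions live in the libcfs headers and are not copied here. */
enum sketch_mask { SKETCH_D_INODE = 1, SKETCH_D_ERROR = 2 };

/* Choose where a failed operation is reported. -ENOENT is expected when
 * "rm -r" races with another "rm -r" or with "ls -l", so it is sent to
 * the debug log only; every other error is still reported as an error. */
static enum sketch_mask sketch_mask_for_rc(int rc)
{
    assert(rc < 0); /* only meant for failure paths */
    return rc == -ENOENT ? SKETCH_D_INODE : SKETCH_D_ERROR;
}
```

In real code the returned mask would be what gets passed as the first argument of CDEBUG().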
The "-28" (-ENOSPC), "-13" (-EACCES), and "-2" (-ENOENT) shouldn't print an error on the client console. However, in this case the problem isn't on the client, but rather because the server is returning an RPC with PTL_RPC_MSG_ERR set. The PTL_RPC_MSG_ERR flag should only be used for cases where there is an error in the RPC handling that prevented the server from even executing the RPC, and NOT for the case where the RPC was processed correctly but returned an error (e.g. -EPERM or -EACCES or -ENOSPC). Fixing these requires looking into the MDS/OSS code and seeing where the server is returning rc != 0 to the handler, or calling ptlrpc_error() (except in the case where it is not possible to pack a reply message). Some of these may already be fixed on master.
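The distinction described above could be modelled roughly as follows. This is a sketch with hypothetical types and names (`sketch_reply`, `sketch_make_reply`, `sketch_client_should_warn`), not the real ptlrpc structures: the message type records whether the server could execute the RPC at all, while the status field carries the result of the operation itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical reply model; SKETCH_MSG_ERR plays the role that
 * PTL_RPC_MSG_ERR plays in the description above. */
enum sketch_msg_type { SKETCH_MSG_REPLY, SKETCH_MSG_ERR };

struct sketch_reply {
    enum sketch_msg_type type; /* how the RPC handling itself fared */
    int status;                /* result of the operation: 0 or -errno */
};

/* handled == false: the server could not even execute the RPC.
 * handled == true:  the RPC was processed; rc is its (possibly failing)
 *                   result and must not turn the reply into an error reply. */
static struct sketch_reply sketch_make_reply(bool handled, int rc)
{
    struct sketch_reply r;

    assert(rc <= 0);
    r.type = handled ? SKETCH_MSG_REPLY : SKETCH_MSG_ERR;
    r.status = rc;
    return r;
}

/* Client side: only a handling failure deserves a console message. */
static bool sketch_client_should_warn(const struct sketch_reply *r)
{
    return r->type == SKETCH_MSG_ERR;
}
```

With this split, an operation that legitimately fails with -ENOSPC still reaches the caller as an error code, but the client has no reason to print it on the console.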
The exp_client_uuid should be kept in a static "last_uuid" string, and if the same UUID is evicted by another target on this node it doesn't need to be printed to the console again, only to debug logs. Use CDEBUG_LIMIT(D_CONSOLE | (mask), ...).
Looking through https://maloo.whamcloud.com/test_logs/eb3fd5de-98f0-11e0-9a27-52540025f9af to see which messages are printed on every mount, and which can be removed:
This can just use the "#ifdef CRAY_XT3" version and print "Lustre: Build Version: "BUILD_VERSION"\n", and a separate project that Brian is working on will fix the build version string.
Please ask Liang how important this is. Maybe it shouldn't be printed if it is 0xffff...?
Seems redundant with the message in obdclass.
Why do our test scripts specify a mount option in local.sh that is no longer useful?
It would be good to make these messages consistent with each other, like: Lustre: client lustre-client (fff880410c8f00) mount complete |
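For the static "last_uuid" suggestion in the description, a minimal sketch could look like the following. The function name, the buffer size, and the return convention are assumptions made for illustration; real code would also need locking around the static buffer, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_UUID_MAX 40 /* assumed buffer size for this sketch */

/* Returns true if the eviction of this client should go to the console,
 * false if the same UUID was the last one printed (in which case only the
 * debug log would get the message). Not thread-safe as written. */
static bool sketch_evict_to_console(const char *uuid)
{
    static char last_uuid[SKETCH_UUID_MAX];

    assert(uuid != NULL);
    if (strncmp(last_uuid, uuid, sizeof(last_uuid) - 1) == 0)
        return false;

    /* Remember this UUID so a repeat from another target stays quiet. */
    snprintf(last_uuid, sizeof(last_uuid), "%s", uuid);
    return true;
}
```

The caller would then add D_CONSOLE to the mask only when this returns true. Note that only the most recent UUID is remembered, so evictions of two clients that alternate would still both be printed.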
| Comments |
| Comment by Peter Jones [ 04/May/12 ] |
|
Yangsheng/Andreas, is this same change needed for b2_1 and master also? Thanks, Peter |
| Comment by Andreas Dilger [ 04/May/12 ] |
|
I previously landed a patch which quieted a number of similar messages for 2.2 (2.1.54+) via |
| Comment by Jian Yu [ 15/May/12 ] |
|
While testing Lustre 1.8.8-wc1 RC1, I found the following messages in the dmesg logs of client nodes:
Lustre: client ZZZZZZZZZZZZZZ�^�5(ffff88007d18cc00) umount complete
......
Lustre: client ZZZZZZZZZZZZZZ(ffff8800768b5c00) umount complete
......
Lustre: client ZZZZZZZZZZZZZZ(ffff880076ccfc00) umount complete
https://maloo.whamcloud.com/test_logs/e1da5dca-9d59-11e1-8587-52540035b04c/show_text
This is a regression introduced by change http://review.whamcloud.com/#change,2381. |
| Comment by Jian Yu [ 16/May/12 ] |
|
Patch on b1_8 branch to fix the above issue is in http://review.whamcloud.com/2799. |
| Comment by Jian Yu [ 21/May/12 ] |
Unfortunately, the above patch does not really fix the issue. Worse, it introduces another defect: the client profile is not deleted properly.
00000020:00000001:12:1337579915.876844:0:23445:0:(obd_mount.c:1735:lustre_common_put_super()) Process entered
In ll_put_super(), the memory pointed to by profilenm is in fact freed inside lustre_common_put_super(sb), which is called before LCONSOLE_WARN(), so the message prints freed memory. The new patch on the b1_8 branch is in http://review.whamcloud.com/2841. |
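The ordering problem described in this comment can be illustrated with a small stand-alone sketch. All names here (`sketch_sbi`, `sketch_common_put_super`, `sketch_put_super`) are hypothetical, and this is one possible way to avoid reading freed memory, not necessarily what change 2841 does: the name is copied to a local buffer before the teardown step that frees it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of per-mount state holding a heap-allocated name. */
struct sketch_sbi {
    char *profile;
};

/* Stand-in for the common teardown step, which frees the profile name. */
static void sketch_common_put_super(struct sketch_sbi *sbi)
{
    free(sbi->profile);
    sbi->profile = NULL;
}

/* Builds the "umount complete" message in out. The name is copied into a
 * local buffer first, so the message never reads memory that the teardown
 * has already freed. */
static void sketch_put_super(struct sketch_sbi *sbi, char *out, size_t outlen)
{
    char name[64];

    assert(sbi != NULL && sbi->profile != NULL);
    snprintf(name, sizeof(name), "%s", sbi->profile);
    sketch_common_put_super(sbi);
    snprintf(out, outlen, "client %s umount complete", name);
}
```

An alternative with the same effect would be to emit the message before calling the teardown function at all.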
| Comment by Peter Jones [ 25/May/12 ] |
|
Landed for 1.8.8-wc1 |