[LU-5567] /proc import file has binary data Created: 29/Aug/14 Updated: 26/Mar/18 Resolved: 26/Mar/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.2, Lustre 2.4.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Robert Read (Inactive) | Assignee: | Jian Yu |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 15524 |
| Description |
|
Lustre 2.5.2 Some tools have problems reading the import proc file in the mdc and osc directories. For example, "less" detects binary data in this file, and I've noticed a couple different YAML parsers have failed to parse the file. It turns out there is a sequence of NULs in the middle of the file which might be causing the issues: $ hexdump -C /proc/fs/lustre/osc/lustre-OST0001-osc-ffff88001b4abc00/import ... 000003d0 68 74 3a 20 30 0a 20 20 20 20 20 20 20 75 6e 72 |ht: 0. unr| 000003e0 65 67 69 73 74 65 72 69 6e 67 3a 20 30 0a 20 20 |egistering: 0. | 000003f0 20 20 20 20 20 74 69 6d 65 6f 75 74 73 3a 20 30 | timeouts: 0| 00000400 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 000004b0 00 00 00 00 00 00 00 00 00 00 00 0a 20 20 20 20 |............ | 000004c0 20 20 20 61 76 67 5f 77 61 69 74 74 69 6d 65 3a | avg_waittime:| 000004d0 20 34 36 35 20 75 73 65 63 0a 20 20 20 20 73 65 | 465 usec. se| 000004e0 72 76 69 63 65 5f 65 73 74 69 6d 61 74 65 73 3a |rvice_estimates:| ... |
| Comments |
| Comment by Andreas Dilger [ 05/Sep/14 ] |
|
Testing this on my local 2.4.3 I see there is a problem printing out the fractional read MB_per_sec, though it doesn't affect the write MB_per_sec, so I suspect it may be related to the actual value being printed: 0001760 sp M B _ p e r _ s e c : sp 1 2 . 0002000 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul * 0002140 nul nul nul nul nul nul nul nul nul nul nul nul nul 4 9 nl 0002160 sp sp sp sp w r i t e _ d a t a _ a 0002200 v e r a g e s : nl sp sp sp sp sp sp sp 0002220 b y t e s _ p e r _ r p c : sp 6 0002240 0 1 3 8 5 nl sp sp sp sp sp sp sp u s e 0002260 c _ p e r _ r p c : sp 4 5 1 0 3 0002300 nl sp sp sp sp sp sp sp M B _ p e r _ s 0002320 e c : sp 1 3 . 3 3 nl This is being printed from lprocfs_rd_import() in 2.4 by: i += snprintf(page + i, count - i,
" MB_per_sec: %u.%.02u\n",
k / j, (100 * k / j) % 100);
and in master it is using seq_printf() with the same format. It may be that this bug is in the kernel code? |
| Comment by Robert Read (Inactive) [ 05/Sep/14 ] |
|
I also see this in the mdc/import which doesn't include the write averages: 000003d0 72 69 6e 67 3a 20 30 0a 20 20 20 20 20 20 20 74 |ring: 0. t| 000003e0 69 6d 65 6f 75 74 73 3a 20 30 0a 20 20 20 20 20 |imeouts: 0. | 000003f0 20 20 61 76 67 5f 77 61 69 74 74 69 6d 65 3a 20 | avg_waittime: | 00000400 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000410 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000420 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000430 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000440 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000450 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000460 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000470 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000480 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000490 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 000004a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 000004b0 00 00 00 00 00 00 00 35 33 30 20 75 73 65 63 0a |.......530 usec.| 000004c0 20 20 20 20 73 65 72 76 69 63 65 5f 65 73 74 69 | service_esti| 000004d0 6d 61 74 65 73 3a 0a 20 20 20 20 20 20 20 73 65 |mates:. se| 000004e0 72 76 69 63 65 73 3a 20 31 20 73 65 63 0a 20 20 |rvices: 1 sec. | |
| Comment by Robert Read (Inactive) [ 05/Sep/14 ] |
|
Oh, all of those blocks of NUL all start on a 1k boundary (0x400 in my case and 02000 in your case). Perhaps the problem is with the seq_file handling, and not how we format the data. |
| Comment by Andreas Dilger [ 05/Sep/14 ] |
|
Yes, I found the same thing. Reading the data with "dd bs=80 if=/proc/.../import | less" shows this very clearly. The problem is no longer present in master, likely because the code uses seq_printf() directly. There were a few patches involved to get this correct:
I don't know if we want to backport all of this to b2_4 and b2_5, or just fix the actual bug in lprocfs_rd_import()? The patches do improve the "imports" contents, so they may still be worthwhile. |
| Comment by Robert Read (Inactive) [ 05/Sep/14 ] |
|
Yes, I see strange things with dd indeed. It looks like the last two patches have already been included in b2_5 with commit 87cb49b54b1a, so we're only missing the change to use seq_file, which is the important part. |
| Comment by Peter Jones [ 12/Sep/14 ] |
|
Yu, Jian Could you please look into this issue? Thanks Peter |
| Comment by Jian Yu [ 15/Sep/14 ] |
|
While back-porting patch http://review.whamcloud.com/7135 (commit 3270bfc2) to Lustre b2_5 branch, I hit conflicts with the follow files: lustre/obdclass/lprocfs_status.c lustre/osd-ldiskfs/osd_handler.c Trying to resolve the conflicts. |
| Comment by Jian Yu [ 17/Sep/14 ] |
|
Here are the back-ported patches for Lustre b2_5 branch:
|
| Comment by Jian Yu [ 26/Mar/18 ] |
|
Close old ticket on obsolete branch. |