[LU-5567] /proc import file has binary data Created: 29/Aug/14  Updated: 26/Mar/18  Resolved: 26/Mar/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.2, Lustre 2.4.3
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Robert Read (Inactive) Assignee: Jian Yu
Resolution: Incomplete Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 15524

 Description   

Lustre 2.5.2

Some tools have problems reading the import proc file in the mdc and osc directories. For example, "less" detects binary data in this file, and I've noticed a couple different YAML parsers have failed to parse the file. It turns out there is a sequence of NULs in the middle of the file which might be causing the issues:

$ hexdump -C /proc/fs/lustre/osc/lustre-OST0001-osc-ffff88001b4abc00/import
...
000003d0  68 74 3a 20 30 0a 20 20  20 20 20 20 20 75 6e 72  |ht: 0.       unr|
000003e0  65 67 69 73 74 65 72 69  6e 67 3a 20 30 0a 20 20  |egistering: 0.  |
000003f0  20 20 20 20 20 74 69 6d  65 6f 75 74 73 3a 20 30  |     timeouts: 0|
00000400  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000004b0  00 00 00 00 00 00 00 00  00 00 00 0a 20 20 20 20  |............    |
000004c0  20 20 20 61 76 67 5f 77  61 69 74 74 69 6d 65 3a  |   avg_waittime:|
000004d0  20 34 36 35 20 75 73 65  63 0a 20 20 20 20 73 65  | 465 usec.    se|
000004e0  72 76 69 63 65 5f 65 73  74 69 6d 61 74 65 73 3a  |rvice_estimates:|
...


 Comments   
Comment by Andreas Dilger [ 05/Sep/14 ]

Testing this on my local 2.4.3 I see there is a problem printing out the fractional read MB_per_sec, though it doesn't affect the write MB_per_sec, so I suspect it may be related to the actual value being printed:

0001760  sp   M   B   _   p   e   r   _   s   e   c   :  sp   1   2   .
0002000 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
*
0002140 nul nul nul nul nul nul nul nul nul nul nul nul nul   4   9  nl
0002160  sp  sp  sp  sp   w   r   i   t   e   _   d   a   t   a   _   a
0002200   v   e   r   a   g   e   s   :  nl  sp  sp  sp  sp  sp  sp  sp
0002220   b   y   t   e   s   _   p   e   r   _   r   p   c   :  sp   6
0002240   0   1   3   8   5  nl  sp  sp  sp  sp  sp  sp  sp   u   s   e
0002260   c   _   p   e   r   _   r   p   c   :  sp   4   5   1   0   3
0002300  nl  sp  sp  sp  sp  sp  sp  sp   M   B   _   p   e   r   _   s
0002320   e   c   :  sp   1   3   .   3   3  nl

This is being printed from lprocfs_rd_import() in 2.4 by:

                                i += snprintf(page + i, count - i,
                                              "       MB_per_sec: %u.%.02u\n",
                                              k / j, (100 * k / j) % 100);

and in master it is using seq_printf() with the same format. It may be that this bug is in the kernel code?

Comment by Robert Read (Inactive) [ 05/Sep/14 ]

I also see this in the mdc/import which doesn't include the write averages:

000003d0  72 69 6e 67 3a 20 30 0a  20 20 20 20 20 20 20 74  |ring: 0.       t|
000003e0  69 6d 65 6f 75 74 73 3a  20 30 0a 20 20 20 20 20  |imeouts: 0.     |
000003f0  20 20 61 76 67 5f 77 61  69 74 74 69 6d 65 3a 20  |  avg_waittime: |
00000400  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000410  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000420  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000430  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000440  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000450  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000460  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000470  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000480  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000490  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000004a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000004b0  00 00 00 00 00 00 00 35  33 30 20 75 73 65 63 0a  |.......530 usec.|
000004c0  20 20 20 20 73 65 72 76  69 63 65 5f 65 73 74 69  |    service_esti|
000004d0  6d 61 74 65 73 3a 0a 20  20 20 20 20 20 20 73 65  |mates:.       se|
000004e0  72 76 69 63 65 73 3a 20  31 20 73 65 63 0a 20 20  |rvices: 1 sec.  |

Comment by Robert Read (Inactive) [ 05/Sep/14 ]

Oh, all of those blocks of NUL all start on a 1k boundary (0x400 in my case and 02000 in your case). Perhaps the problem is with the seq_file handling, and not how we format the data.

Comment by Andreas Dilger [ 05/Sep/14 ]

Yes, I found the same thing. Reading the data with "dd bs=80 if=/proc/.../import | less" shows this very clearly. The problem is no longer present in master, likely because the code uses seq_printf() directly. There were a few patches involved to get this correct:

I don't know if we want to backport all of this to b2_4 and b2_5, or just fix the actual bug in lprocfs_rd_import()? The patches do improve the "imports" contents, so they may still be worthwhile.

Comment by Robert Read (Inactive) [ 05/Sep/14 ]

Yes, I see strange things with dd indeed.

It looks like the last two patches have already been included in b2_5 with commit 87cb49b54b1a, so we're only missing the change to use seq_file, which is the important part.

Comment by Peter Jones [ 12/Sep/14 ]

Yu, Jian

Could you please look into this issue?

Thanks

Peter

Comment by Jian Yu [ 15/Sep/14 ]

While back-porting patch http://review.whamcloud.com/7135 (commit 3270bfc2) to Lustre b2_5 branch, I hit conflicts with the follow files:

lustre/obdclass/lprocfs_status.c
lustre/osd-ldiskfs/osd_handler.c

Trying to resolve the conflicts.

Comment by Jian Yu [ 17/Sep/14 ]

Here are the back-ported patches for Lustre b2_5 branch:

Comment by Jian Yu [ 26/Mar/18 ]

Close old ticket on obsolete branch.

Generated at Sat Feb 10 01:52:36 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.