[LU-615] sanity test 180a: awk: (FILENAME=/proc/fs/lustre/osc/lustre-OST0000-osc-ffff88032273d800/import FNR=32) fatal error: internal error Created: 22/Aug/11 Updated: 15/Jul/13 Resolved: 06/Feb/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0, Lustre 2.2.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Jian Yu | Assignee: | WC Triage |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre Tag: v2_1_0_0_RC0 |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 5475 | ||||||||
| Description |
|
sanity test 180a failed as follows: == sanity test 180a: test obdecho on osc == 02:30:13 (1314005413) awk: (FILENAME=/proc/fs/lustre/osc/lustre-OST0000-osc-ffff88032273d800/import FNR=32) fatal error: internal error Maloo report: https://maloo.whamcloud.com/test_sets/cdf213b2-ccac-11e0-8d02-52540025f9af |
| Comments |
| Comment by Jian Yu [ 02/Sep/11 ] |
|
Lustre Build: http://newbuild.whamcloud.com/job/lustre-master/275/ sanity test 180a failed with the same issue: https://maloo.whamcloud.com/test_sets/e4d30bfa-d561-11e0-8d02-52540025f9af |
| Comment by Andreas Dilger [ 13/Sep/11 ] |
|
Probably related to |
| Comment by Mikhail Pershin [ 05/Oct/11 ] |
|
we are seeing that bug very often in Orion over rhel6. I have an idea that awk has troubles parsing proc files. The patch copy proc file to the regular one and use it, there is fix: http://review.whamcloud.com/1467 the Maloo run with it shows test 180a pass, but this is only single run. |
| Comment by Mikhail Pershin [ 18/Oct/11 ] |
|
It seems this is related to our implementation of proc read method there. The amount of data in 'import' is quite big, more than 1024 bytes. Strace shows the following while reading proc file: open("/proc/fs/lustre/osc/lustre-OST0001-osc-d9f1ea00/import", O_RDONLY|O_LARGEFILE) = 3 fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfdb9348) = -1 ENOTTY (Inappropriate ioctl for device) fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 fcntl64(3, F_SETFD, FD_CLOEXEC) = 0 read(3, "import:\n name: lustre-OST0001"..., 1024) = 1013 1013 bytes is on my local system, on Toro it is more than 1024, e.g. my local address is just 0@lo while on Toro it is full IP address. If we will use just tmp file with output of proc data that works fine because awk gets file size by fstat64, so this solution will work. |
| Comment by Mikhail Pershin [ 19/Oct/11 ] |
|
Redo 'import' proc file via seq_file API: http://review.whamcloud.com/#change,1544 I've made local tests to make sure it is able to read big amount of data, to be sure I've added several line of just garbage so output become bigger than 1024 and run awk with strace: open("/proc/fs/lustre/osc/lustre-OST0001-osc-d1a26e00/import", O_RDONLY|O_LARGEFILE) = 3 fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfe6bec8) = -1 ENOTTY (Inappropriate ioctl for device) fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 fcntl64(3, F_SETFD, FD_CLOEXEC) = 0 read(3, "import:\n name: lustre-OST0001"..., 1024) = 1024 read(3, "hat is why I put this line to ex"..., 1024) = 341 fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb765f000 write(1, "747\n", 4747 ) = 4 read(3, "", 1024) = 0 close(3) = 0 exit_group(0) = ? the command was: ' /proc/fs/lustre/osc/lustre-OST0001-osc-d1a26e00/import it searched usec_per_rpc string which is at end of file and we can see that awk did two reads - 1024 bytes and 341 bytes, string was found in second part. Previous implementation lprocfs_rd_import() wasn't able to return data from offset at all and just cut data at 1024 causing awk to behave unpredictable |
| Comment by Sarah Liu [ 31/Jan/12 ] |
|
hit the same issue when testing interop between 2.1-wc1 client and 2.1.55 server Lustre: DEBUG MARKER: == sanity test 180a: test obdecho on osc == 19:30:56 (1328067056)
|
| Comment by Peter Jones [ 06/Feb/12 ] |
|
duplicate of |