[LU-3996] LustreError: 8136:0:(llog_osd.c:241:llog_osd_read_header()) MGS-osd: error reading log header from [0xa:0xa:0x0]: rc = -14
Created: 23/Sep/13 Updated: 16/Oct/13 Resolved: 16/Oct/13
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Cliff White (Inactive) | Assignee: | Mikhail Pershin |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Attachments: |
|
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 10693 |
| Description |
|
Running parallel-scale on Hyperion, during initialization for the IOR test. Formatted with ZFS.
LustreError: 8136:0:(llog_osd.c:241:llog_osd_read_header()) MGS-osd: error reading log header from [0xa:0xa:0x0]: rc = -14
2013-09-23 11:32:12 LustreError: 8136:0:(mgs_llog.c:1386:record_start_log()) MGS: can't start log lustre-params: rc = -14
2013-09-23 11:32:12 BUG: unable to handle kernel NULL pointer dereference at 00000000000000b8
Console log attached. |
| Comments |
| Comment by Oleg Drokin [ 24/Sep/13 ] |
|
There's not enough information to see why llog init fails with EFAULT, but the cause of the crash is obvious: we fail to check the llog open status and, as a result, try to close an llog that was never opened. The patch for the crash is in http://review.whamcloud.com/7742 |
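The failure mode described above can be sketched in C as follows. This is a minimal illustration under stated assumptions: every `demo_*` name is hypothetical, and the sketch shows only the check-the-open-status-before-closing pattern, not the actual change in review 7742. The open fails with a negative errno (here the -EFAULT seen in the log), and the caller returns early instead of closing a handle that was never set.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for an llog handle (not Lustre's struct llog_handle). */
struct demo_llog_handle {
	int refcount;
};

/*
 * Hypothetical open: on failure it returns a negative errno and never
 * assigns *out (the analogue of an open that fails reading the header).
 */
static int demo_llog_open(int simulate_rc, struct demo_llog_handle **out)
{
	struct demo_llog_handle *h;

	if (simulate_rc < 0)
		return simulate_rc;	/* e.g. -EFAULT from a bad header read */

	h = calloc(1, sizeof(*h));
	if (h == NULL)
		return -ENOMEM;
	h->refcount = 1;
	*out = h;
	return 0;
}

/* Dereferences h unconditionally, so passing NULL would crash. */
static void demo_llog_close(struct demo_llog_handle *h)
{
	if (--h->refcount == 0)
		free(h);
}

/*
 * Check the open status and return early, so a log that was
 * never opened is never closed.
 */
static int demo_record_start_log(int simulate_rc)
{
	struct demo_llog_handle *h = NULL;
	int rc;

	rc = demo_llog_open(simulate_rc, &h);
	if (rc < 0)
		return rc;	/* h is still NULL here: do NOT close it */

	/* ... append records to the log ... */

	demo_llog_close(h);
	return 0;
}
```

Calling `demo_record_start_log(-EFAULT)` now propagates -EFAULT to the caller; without the early return it would pass NULL to `demo_llog_close()` and fault, much like the NULL pointer dereference in the console log.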
| Comment by Jodi Levi (Inactive) [ 25/Sep/13 ] |
|
Cliff to provide the additional debug logs. |
| Comment by Cliff White (Inactive) [ 26/Sep/13 ] |
|
Reproduced the crash on 2.4.93 with panic_on_oops=0. Console log and lctl dk output attached. |
| Comment by Oleg Drokin [ 27/Sep/13 ] |
|
Hm, the lctl dk output starts too late after the crash: it covers 11:11am to 11:14am on the 26th, but the oops was at 11:10. |
| Comment by Cliff White (Inactive) [ 27/Sep/13 ] |
|
dumplog.115529 contains the error. I have included the dumps from 30 seconds before and 30 seconds after. |
| Comment by Oleg Drokin [ 27/Sep/13 ] |
|
Thanks! OK, so the issue is that we are trying to do an 8k read but can read only a smaller number of bytes (there is not enough logging to see how many) from the referenced llog file. Right now it looks like the underlying MGS filesystem llog is damaged. We should mount the filesystem directly and check what is wrong with the llog file for [0xa:0xa:0x0]; it's probably way too short? |
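To make the short-read diagnosis concrete, here is a minimal C sketch, assuming (as the comment above suggests) that the header read requests 8 KiB and that any shorter read is reported as -EFAULT (-14 on Linux). `demo_read`, `demo_read_header` and `DEMO_HDR_SIZE` are hypothetical names, and an in-memory buffer stands in for the llog object; this is not the actual `llog_osd_read_header()` code.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Assumed header size, per the "8k read" described above. */
#define DEMO_HDR_SIZE 8192

/*
 * Hypothetical reader: copies up to `want` bytes from an in-memory
 * "file" of length file_len, so it can return fewer bytes than asked.
 */
static ssize_t demo_read(const char *file, size_t file_len,
			 char *buf, size_t want)
{
	size_t n = file_len < want ? file_len : want;

	memcpy(buf, file, n);
	return (ssize_t)n;
}

/* Treat any short header read as a damaged log and report -EFAULT. */
static int demo_read_header(const char *file, size_t file_len, char *hdr)
{
	ssize_t rc = demo_read(file, file_len, hdr, DEMO_HDR_SIZE);

	if (rc < 0)
		return (int)rc;		/* propagate a real read error */
	if ((size_t)rc != DEMO_HDR_SIZE)
		return -EFAULT;		/* file too short to hold a header */
	return 0;
}
```

With a full 8 KiB "file" the header read succeeds; with a truncated 100-byte one, `demo_read_header()` returns -EFAULT, the same rc = -14 that appears in the console log.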
| Comment by Cliff White (Inactive) [ 30/Sep/13 ] |
|
Each time this has failed, it has been on a freshly formatted filesystem. I can replicate it and look at the log if that is necessary. |
| Comment by Oleg Drokin [ 30/Sep/13 ] |
|
There's a fair chance the llog is created incorrectly at format time; if that is really the case, there is no debug log for this process. |
| Comment by Cliff White (Inactive) [ 30/Sep/13 ] |
|
All logs from the CONFIGS directory on the MGS after the crash, dumped to text with llog_reader. Same error as before:
2013-09-30 14:34:04 LustreError: 8465:0:(llog_osd.c:241:llog_osd_read_header()) MGS-osd: error reading log header from [0xa:0xa:0x0]: rc = -14
2013-09-30 14:34:04 LustreError: 8465:0:(mgs_llog.c:1386:record_start_log()) MGS: can't start log lustre-params: rc = -14
2013-09-30 14:34:04 BUG: unable to handle kernel NULL pointer dereference at 00000000000000b8
2013-09-30 14:34:04 IP: [<ffffffffa07ffe99>] llog_handle_put+0x9/0x70 [obdclass] |
| Comment by Cliff White (Inactive) [ 30/Sep/13 ] |
|
debug_mb=1024; the log was dumped every 30 seconds for the duration of the test. |
| Comment by Cliff White (Inactive) [ 02/Oct/13 ] |
|
Found an easy way to reproduce this:
|
| Comment by Mikhail Pershin [ 04/Oct/13 ] |
|
Cliff, can you reproduce that issue with commit http://git.whamcloud.com/?p=fs/lustre-release.git;a=commit;h=a217228ce3e1c93fdfeb1d1aa6ff48b3f82abf83 ? |
| Comment by Cliff White (Inactive) [ 04/Oct/13 ] |
|
No, I cannot: I ran the one-line test and it does not crash. Will run IOR shortly. |
| Comment by Cliff White (Inactive) [ 07/Oct/13 ] |
|
Ran IOR without any crashes. The latest build fixes the issue. |
| Comment by Jodi Levi (Inactive) [ 16/Oct/13 ] |
|
Removed fixversion as this is a duplicate. |