[LU-5654] MDT 0 failed to set up some OST OSPs when all OSTs were started in parallel for the first time Created: 24/Sep/14 Updated: 04/Jun/15 Resolved: 09/Oct/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | Lustre 2.7.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Li Wei (Inactive) | Assignee: | Li Wei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 15850 | ||||
| Description |
|
A file system with 2 MDTs and 163 OSTs was formatted and started like this:
After all mount commands succeeded, MDT 0 failed to set up OSPs for some OSTs:

Sep 23 07:02:43 lola-8 kernel: LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. quota=on. Opts:
Sep 23 07:02:44 lola-8 kernel: Lustre: ctl-soaked-MDT0000: No data found on store. Initialize space
Sep 23 07:02:44 lola-8 kernel: Lustre: soaked-MDT0000: new disk, initializing
Sep 23 07:02:44 lola-8 kernel: LustreError: 11-0: soaked-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
Sep 23 07:02:57 lola-8 kernel: LustreError: 11-0: soaked-MDT0001-osp-MDT0000: Communicating with 192.168.1.109@o2ib10, operation mds_connect failed with -11.
Sep 23 07:03:09 lola-8 kernel: Lustre: ctl-soaked-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400):a2:ost
Sep 23 07:03:14 lola-8 kernel: LustreError: 6677:0:(osd_io.c:1361:osd_ldiskfs_read()) soaked-MDT0000: can't read 32@512 on ino 222: rc = 0
Sep 23 07:03:14 lola-8 kernel: LustreError: 6677:0:(llog_osd.c:1466:llog_osd_get_cat_list()) soaked-MDT0000-osd: error reading CATALOGS: rc = -14
Sep 23 07:03:14 lola-8 kernel: LustreError: 6677:0:(osp_sync.c:1045:osp_sync_llog_init()) soaked-OST0010-osc-MDT0000: can't get id from catalogs: rc = -14
Sep 23 07:03:14 lola-8 kernel: LustreError: 6677:0:(osp_sync.c:1147:osp_sync_init()) soaked-OST0010-osc-MDT0000: can't initialize llog: rc = -14
Sep 23 07:03:14 lola-8 kernel: LustreError: 6677:0:(obd_config.c:561:class_setup()) setup soaked-OST0010-osc-MDT0000 failed (-14)
Sep 23 07:03:14 lola-8 kernel: LustreError: 6677:0:(obd_config.c:1609:class_config_llog_handler()) MGC192.168.1.108@o2ib10: cfg command failed: rc = -14
Sep 23 07:03:14 lola-8 kernel: Lustre: cmd=cf003 0:soaked-OST0010-osc-MDT0000 1:soaked-OST0010_UUID 2:192.168.1.102@o2ib10
Sep 23 07:03:14 lola-8 kernel:
Sep 23 07:03:14 lola-8 kernel: LustreError: 6584:0:(mgc_request.c:517:do_requeue()) failed processing log: -14
Sep 23 07:03:17 lola-8 kernel: Lustre: ctl-soaked-MDT0000: super-sequence allocation rc = 0 [0x0000000600000400-0x0000000640000400):9c:ost
Sep 23 07:03:17 lola-8 kernel: Lustre: Skipped 15 previous similar messages
Sep 23 07:03:22 lola-8 kernel: LustreError: 6699:0:(obd_config.c:776:class_add_conn()) try to add conn on immature client dev
Sep 23 07:03:22 lola-8 kernel: LustreError: 6699:0:(obd_class.h:938:obd_connect()) Device 13 not setup
Sep 23 07:03:22 lola-8 kernel: LustreError: 6699:0:(lod_lov.c:268:lod_add_device()) soaked-OST0010-osc-MDT0000: cannot connect to next dev soaked-OST0010_UUID (-19)
Sep 23 07:03:22 lola-8 kernel: LustreError: 6699:0:(obd_config.c:1609:class_config_llog_handler()) MGC192.168.1.108@o2ib10: cfg command failed: rc = -19
Sep 23 07:03:22 lola-8 kernel: Lustre: cmd=cf00d 0:soaked-MDT0000-mdtlov 1:soaked-OST0010_UUID 2:16 3:1
Sep 23 07:03:22 lola-8 kernel:
Sep 23 07:03:22 lola-8 kernel: LustreError: 6584:0:(mgc_request.c:517:do_requeue()) failed processing log: -19
Sep 23 07:03:27 lola-8 kernel: Lustre: ctl-soaked-MDT0000: super-sequence allocation rc = 0 [0x0000002380000400-0x00000023c0000400):3b:ost
Sep 23 07:03:27 lola-8 kernel: Lustre: Skipped 117 previous similar messages
Sep 23 07:03:52 lola-8 kernel: Lustre: ctl-soaked-MDT0000: super-sequence allocation rc = 0 [0x0000002a40000400-0x0000002a80000400):7a:ost
Sep 23 07:03:52 lola-8 kernel: Lustre: Skipped 26 previous similar messages

MDT 0/MGT was ldiskfs-based. |
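The negative return codes in the log follow the usual Linux kernel convention of negated errno values. As a reading aid, the sketch below decodes the three codes that appear here (-11, -14, -19) using the standard Linux errno numbering. The table and the helper name `describe_rc` are illustrative only and are not part of Lustre.

```python
# Hypothetical reading aid, not Lustre code.
# Values follow the standard Linux errno numbering.
LINUX_ERRNO = {
    11: ("EAGAIN", "Resource temporarily unavailable"),
    14: ("EFAULT", "Bad address"),
    19: ("ENODEV", "No such device"),
}


def describe_rc(rc):
    """Turn a kernel-style return code into a short description."""
    if rc >= 0:
        # Non-negative values are not errno codes (e.g. a byte count).
        return f"rc = {rc}: not an error code"
    name, text = LINUX_ERRNO.get(-rc, ("UNKNOWN", "not in this table"))
    return f"rc = {rc}: {name} ({text})"


for rc in (-11, -14, -19):
    print(describe_rc(rc))
```

Read this way, the sequence in the log is: mds_connect is retried (-11), the CATALOGS read fails and the OST0010 OSP setup aborts (-14), and the later attempt to connect to that device fails because it was never set up (-19).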
| Comments |
| Comment by Li Wei (Inactive) [ 24/Sep/14 ] |
|
The first problem is that osd_ldiskfs_read() does not correctly handle the case in which a block to be read has not been allocated yet (a hole). |
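To illustrate the failure mode, here is a toy model of a block store with holes. It is only a sketch under stated assumptions: the class `SparseStore`, its two read methods, and the 4096-byte block size are hypothetical and do not reflect the actual osd_ldiskfs_read() code or the change made in the linked patches. The read of 32 bytes at offset 512 mirrors the "can't read 32@512 ... rc = 0" message in the log. Treating offset 512 as slot 16 of 32-byte records is an assumption that happens to be consistent with OST0010 (index 16).

```python
BLOCK_SIZE = 4096  # assumed block size for this toy model


class SparseStore:
    """Toy file: blocks that were never written are absent (holes)."""

    def __init__(self):
        self.blocks = {}  # block number -> bytearray(BLOCK_SIZE)
        self.size = 0     # logical file size in bytes

    def write(self, offset, data):
        for i, byte in enumerate(data):
            pos = offset + i
            blk = self.blocks.setdefault(pos // BLOCK_SIZE,
                                         bytearray(BLOCK_SIZE))
            blk[pos % BLOCK_SIZE] = byte
        self.size = max(self.size, offset + len(data))

    def read_naive(self, offset, length):
        """Stops at the first unallocated block, giving a short read."""
        out = bytearray()
        end = min(offset + length, self.size)
        pos = offset
        while pos < end:
            blk = self.blocks.get(pos // BLOCK_SIZE)
            if blk is None:
                break  # hole: give up early
            start = pos % BLOCK_SIZE
            n = min(end - pos, BLOCK_SIZE - start)
            out += blk[start:start + n]
            pos += n
        return bytes(out)

    def read_hole_aware(self, offset, length):
        """Returns zeros for unallocated blocks inside the file size."""
        out = bytearray()
        end = min(offset + length, self.size)
        pos = offset
        while pos < end:
            blk = self.blocks.get(pos // BLOCK_SIZE)
            start = pos % BLOCK_SIZE
            n = min(end - pos, BLOCK_SIZE - start)
            out += blk[start:start + n] if blk is not None else bytes(n)
            pos += n
        return bytes(out)


store = SparseStore()
# Another writer extends the file first, leaving block 0 unallocated.
store.write(2 * BLOCK_SIZE, b"x" * 32)

slot_offset = 16 * 32  # 512, assuming 32-byte slots indexed by OST number
print(len(store.read_naive(slot_offset, 32)))       # short read
print(len(store.read_hole_aware(slot_offset, 32)))  # full, zero-filled read
```

In this model, a caller that expects exactly 32 bytes sees a zero-length result from the naive read and reports an error, whereas the hole-aware read hands back an all-zero record that can be interpreted as "no entry yet".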
| Comment by Li Wei (Inactive) [ 24/Sep/14 ] |
|
http://review.whamcloud.com/12035 addresses the osd_ldiskfs_read() problem. |
| Comment by Li Wei (Inactive) [ 24/Sep/14 ] |
|
http://review.whamcloud.com/12037 fixes a leak on osp_init0()'s error path. |
| Comment by Oleg Drokin [ 01/Oct/14 ] |
|
patch 12035 was reverted from master as causing |
| Comment by Li Wei (Inactive) [ 01/Oct/14 ] |
|
http://review.whamcloud.com/12145 is an updated version of 12035, addressing the racy conf-sanity test. |
| Comment by Jodi Levi (Inactive) [ 09/Oct/14 ] |
|
Patch landed to Master. |
| Comment by Li Wei (Inactive) [ 17/Oct/14 ] |
|
http://review.whamcloud.com/12319 (b2_5 port) |