Details
- Bug
- Resolution: Fixed
- Minor
- None
- 3
- 9223372036854775807
Description
Running writeconf on an MDT with an index greater than 0000 causes "add mdc" to be added to the $FSNAME-client config and "add osp" to be added to the $FSNAME-MDTXXXX configs.
However, the configs may already contain these directives. A duplicated MDC just returns -EEXIST, but a duplicated OSP device triggers an assertion failure.
One possible solution is to check the configs for duplicates before writing to them. However, this does not cover the case where we want to change the NIDs that are part of "add mdc" and "add osp".
Another solution is to mark the previous entries with SKIP flags, and this patch implements that approach. After the config lock is revoked, the clients and the MDTs receive the updated log and apply its newer entries, so OSP duplication still has to be handled, but this is only an issue immediately after writeconf processing.
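The SKIP-flag idea can be illustrated with a toy model. The sketch below is not Lustre code: the `ConfigRecord` class and the `writeconf_append` and `replay` helpers are hypothetical names, the record layout is invented for illustration, and the second NID is a made-up address. It only shows why marking older entries keeps a fresh replay of the log free of duplicates while still allowing the NID to change.

```python
from dataclasses import dataclass


@dataclass
class ConfigRecord:
    """One toy config-log entry; not the real Lustre llog record layout."""
    directive: str      # e.g. "add osp" or "add mdc"
    device: str         # e.g. "snx11117-MDT0002-osp-MDT0001"
    nid: str            # e.g. "10.9.100.16@o2ib3"
    skip: bool = False  # stands in for the SKIP flag on a log entry


def writeconf_append(log, directive, device, nid):
    """Mark earlier entries for the same directive/device as skipped, then append."""
    for rec in log:
        if rec.directive == directive and rec.device == device:
            rec.skip = True
    log.append(ConfigRecord(directive, device, nid))


def replay(log):
    """Return the entries that a fresh reader of the log would apply."""
    return [rec for rec in log if not rec.skip]


log = []
writeconf_append(log, "add osp", "snx11117-MDT0002-osp-MDT0001", "10.9.100.16@o2ib3")
# Second writeconf with a changed NID (hypothetical address, for illustration only).
writeconf_append(log, "add osp", "snx11117-MDT0002-osp-MDT0001", "10.9.100.17@o2ib3")

active = replay(log)
print(len(log), len(active), active[0].nid)
```

In this model a reader that replays the log from the start sees only the newest entry. A node that had already applied the first entry before the update would still receive the new one, which corresponds to the remaining OSP duplication case described above.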
[1904009.530445] LDISKFS-fs (md0): mounted filesystem with ordered data mode. quota=on. Opts:
[1904010.544738] LustreError: 11-0: snx11117-MDT0000-osp-MDT0001: Communicating with 10.9.100.10@o2ib3, operation mds_connect failed with -114.
[1904010.814980] Lustre: snx11117-MDD0001: changelog on
[1904019.177269] LustreError: 84835:0:(genops.c:345:class_newdev()) Device snx11117-MDT0002-osp-MDT0001 already exists at 7, won't add
[1904019.189880] LustreError: 84835:0:(obd_config.c:368:class_attach()) Cannot create device snx11117-MDT0002-osp-MDT0001 of type osp : -17
[1904019.202925] LustreError: 84835:0:(obd_config.c:1610:class_config_llog_handler()) MGC10.9.100.9@o2ib3: cfg command failed: rc = -17
[1904019.215616] Lustre: cmd=cf001 0:snx11117-MDT0002-osp-MDT0001 1:osp 2:snx11117-MDT0001-mdtlov_UUID
[1904019.215617]
[1904019.228104] LustreError: 84588:0:(mgc_request.c:517:do_requeue()) failed processing log: -17
[1904036.493105] LustreError: 85373:0:(obd_config.c:464:class_setup()) Device 7 already setup (type osp)
[1904036.503095] LustreError: 85373:0:(obd_config.c:1610:class_config_llog_handler()) MGC10.9.100.9@o2ib3: cfg command failed: rc = -17
[1904036.515779] Lustre: cmd=cf003 0:snx11117-MDT0002-osp-MDT0001 1:snx11117-MDT0002_UUID 2:10.9.100.16@o2ib3
[1904036.515780]
[1904036.528886] LustreError: 84588:0:(mgc_request.c:517:do_requeue()) failed processing log: -17
[1904044.588103] LustreError: 85579:0:(osp_dev.c:1175:osp_obd_connect()) ASSERTION( osp->opd_connects == 1 ) failed:
[1904044.599237] LustreError: 85579:0:(osp_dev.c:1175:osp_obd_connect()) LBUG
[1904044.606507] Pid: 85579, comm: llog_process_th
[1904044.611432]
[1904044.611433] Call Trace:
[1904044.616506] [<ffffffffa07dd895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[1904044.624041] [<ffffffffa07dde97>] lbug_with_loc+0x47/0xb0 [libcfs]
[1904044.630798] [<ffffffffa0919f2c>] osp_obd_connect+0x3bc/0x420 [osp]
[1904044.645834] [<ffffffffa06fb7c1>] lod_add_device+0x8d1/0x1e00 [lod]
[1904044.652674] [<ffffffffa06f4259>] lod_process_config+0xb89/0x1720 [lod]
[1904044.667268] [<ffffffffa099a370>] class_process_config+0x1900/0x1ac0 [obdclass]
[1904044.682914] [<ffffffffa099b664>] class_config_llog_handler+0xa34/0x18b0 [obdclass]
[1904044.697415] [<ffffffffa095ecf9>] llog_process_thread+0xaa9/0xe80 [obdclass]
[1904044.705421] [<ffffffffa095f115>] llog_process_thread_daemonize+0x45/0x70 [obdclass]
[1904044.722898] [<ffffffff8109ac66>] kthread+0x96/0xa0
[1904044.728341] [<ffffffff8100c20a>] child_rip+0xa/0x20
[1904044.745110]
[1904044.747419] Kernel panic - not syncing: LBUG
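For readers decoding the return codes in the log above, the following sketch pulls the negative codes out of a few of the logged lines and maps them to Linux errno names. The `LINUX_ERRNO` table and the `error_codes` helper are hypothetical and written only for this illustration; the table hardcodes the two Linux values seen here (17 is EEXIST, 114 is EALREADY) rather than relying on the host platform's errno numbering.

```python
import re

# Linux errno values for the two codes that appear in the log (hardcoded on purpose).
LINUX_ERRNO = {17: "EEXIST", 114: "EALREADY"}

# A minus sign not preceded by a word character, followed by digits.
# This skips hyphens inside device names and the "11-0" message tag.
NEG_CODE = re.compile(r"(?<!\w)-(\d+)\b")


def error_codes(line):
    """Return the errno names for negative return codes found in one log line."""
    return [LINUX_ERRNO.get(int(m), "unknown") for m in NEG_CODE.findall(line)]


sample = [
    "LustreError: 11-0: snx11117-MDT0000-osp-MDT0001: Communicating with "
    "10.9.100.10@o2ib3, operation mds_connect failed with -114.",
    "LustreError: 84835:0:(obd_config.c:368:class_attach()) Cannot create device "
    "snx11117-MDT0002-osp-MDT0001 of type osp : -17",
    "LustreError: 84588:0:(mgc_request.c:517:do_requeue()) failed processing log: -17",
]

decoded = [error_codes(line) for line in sample]
print(decoded)
```

The -17 results match the -EEXIST behaviour mentioned in the description: the duplicate "add osp" entry is rejected at attach and setup time before the later connect hits the assertion.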
Issue Links
- is duplicated by LU-10828 OBD devices and exports not cleaned up after llog processing failures (Closed)
- is related to LU-15151 conf-sanity test_119: mds1: ssh: Could not resolve hostname mds1: Name or service not known (Resolved)
- is related to LU-15000 MDS crashes with (osp_dev.c:1404:osp_obd_connect()) ASSERTION( osp->opd_connects == 1 ) failed (Resolved)