Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.13.0
-
After upgrade from 2.5 version
-
3
Description
LWP uses the fsname-client config to create its devices.
The error occurred while processing record #25:
00000040:00001000:2.0:1569263809.706616:0:96423:0:(llog.c:603:llog_process_thread()) lrh_index: 25 lrh_len: 112 (4304 remains)
00000020:00000001:2.0:1569263809.706616:0:96423:0:(obd_mount_server.c:804:client_lwp_config_process()) Process entered
00000020:00000001:2.0:1569263809.706617:0:96423:0:(obd_mount_server.c:738:lustre_lwp_add_conn()) Process entered
00000020:00000001:2.0:1569263809.706618:0:96423:0:(obd_mount_server.c:700:lustre_find_lwp()) Process entered
00000020:00000010:2.0:1569263809.706618:0:96423:0:(obd_mount_server.c:705:lustre_find_lwp()) kmalloced '*lwpname': 64 at ffff8807c129bd40.
00000020:00000001:2.0:1569263809.706619:0:96423:0:(obd_mount_server.c:346:tgt_name2lwp_name()) Process entered
00000020:00000010:2.0:1569263809.706619:0:96423:0:(obd_mount_server.c:348:tgt_name2lwp_name()) kmalloced 'fsname': 64 at ffff8807c129b780.
00000020:00000001:2.0:1569263809.706620:0:96423:0:(obd_mount_server.c:372:tgt_name2lwp_name()) Process leaving via cleanup (rc=0 : 0 : 0x0)
00000020:00000010:2.0:1569263809.706621:0:96423:0:(obd_mount_server.c:376:tgt_name2lwp_name()) kfreed 'fsname': 64 at ffff8807c129b780.
00000020:00000001:2.0:1569263809.706656:0:96423:0:(obd_mount_server.c:727:lustre_find_lwp()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)
00000020:00020000:2.0:1569263809.706657:0:96423:0:(obd_mount_server.c:742:lustre_lwp_add_conn()) xxx1116-OST0000: can't find lwp device.
00000020:00000001:2.0:1569263809.706660:0:96423:0:(obd_mount_server.c:743:lustre_lwp_add_conn()) Process leaving via out (rc=18446744073709551614 : -2 : 0xfffffffffffffffe)
00000020:00000010:2.0:1569263809.706661:0:96423:0:(obd_mount_server.c:772:lustre_lwp_add_conn()) kfreed 'lwpname': 64 at ffff8807c129bd40.
00000020:00000001:2.0:1569263809.706662:0:96423:0:(obd_mount_server.c:773:lustre_lwp_add_conn()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)
00000020:00000001:2.0:1569263809.706663:0:96423:0:(obd_mount_server.c:902:client_lwp_config_process()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)
Based on the config, this record should have been skipped:
#19 (224)END marker 7 (flags=0x02, v2.5.1.0) xxx1116-client 'mount opts' Thu Oct 1 15:29:07 2015-
#20 (224)SKIP START marker 11 (flags=0x05, v2.5.1.0) xxx1116-MDT0002 'add mdc' Thu Oct 1 15:55:50 2015-Thu Oct 1 16:22:50 2015
#21 (088)SKIP add_uuid nid=10.10.10.8@o2ib(0x500000a956a08) 0: 1:10.10.10.8@o2ib
#22 (128)SKIP attach 0:xxx1116-MDT0002-mdc 1:mdc 2:xxx1116-clilmv_UUID
#23 (144)SKIP setup 0:xxx1116-MDT0002-mdc 1:xxx1116-MDT0002_UUID 2:10.10.10.8@o2ib
#24 (088)SKIP add_uuid nid=10.10.10.7@o2ib(0x500000a956a07) 0: 1:10.10.10.7@o2ib
#25 (112)SKIP add_conn 0:xxx1116-MDT0002-mdc 1:10.10.10.7@o2ib
#26 (088)SKIP add_uuid nid=10.10.10.7@o2ib(0x500000a956a07) 0: 1:10.10.10.7@o2ib
#27 (112)SKIP add_conn 0:xxx1116-MDT0002-mdc 1:10.10.10.7@o2ib
#28 (168)SKIP modify_mdc_tgts add 0:xxx1116-clilmv 1:xxx1116-MDT0002_UUID 2:2 3:1 4:xxx1116-MDT0002-mdc_UUID
#29 (224)SKIP END marker 11 (flags=0x06, v2.5.1.0) xxx1116-MDT0002 'add mdc' Thu Oct 1 15:55:50 2015-Thu Oct 1 16:22:50 2015
#30 (224)marker 12 (flags=0x01, v2.5.1.0) xxx1116-client 'mount opts' Thu Oct 1 15:55:50 2015-
It looks like client_lwp_config_process() has a bug: it processes add_conn without having processed the preceding add_uuid.
For a marker record, it skips the record if the SKIP flag is set. For add_uuid, it relies on the flags saved from marker processing, so that record is skipped too. For add_conn, however, it processes the record, tries to find the LWP device, and fails with rc=-2, because that device would have been added by the skipped add_uuid record.
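The behaviour described above can be illustrated with a small toy model. This is only a sketch and not the Lustre code: the `Record` class, the `process()` function, and its `skip_add_conn` switch are hypothetical names made up for illustration, and the record list is an abbreviated version of records #20 to #29. The flag values assume the usual marker bits (START=0x01, END=0x02, SKIP=0x04), which is consistent with flags=0x05 and flags=0x06 in the dump.

```python
import errno
from dataclasses import dataclass

# Marker flag bits (assumed), consistent with flags=0x05 / 0x06 in the dump.
CM_START, CM_END, CM_SKIP = 0x01, 0x02, 0x04


@dataclass
class Record:
    index: int
    cmd: str          # "marker", "add_uuid", "add_conn"
    flags: int = 0    # only meaningful for marker records
    target: str = ""  # only meaningful for marker records


def process(records, skip_add_conn):
    """Toy model of the config handler; returns (rc, failing record index).

    skip_add_conn=False models the reported behaviour, where add_conn
    ignores the SKIP state. skip_add_conn=True is a hypothetical variant
    in which add_conn honours it, like add_uuid does.
    """
    skipping = False
    target = ""
    devices = set()
    for rec in records:
        if rec.cmd == "marker":
            if rec.flags & CM_START:
                skipping = bool(rec.flags & CM_SKIP)
                target = rec.target
            elif rec.flags & CM_END:
                skipping = False
        elif rec.cmd == "add_uuid":
            if not skipping:
                devices.add(target)  # the device is created here
        elif rec.cmd == "add_conn":
            if skipping and skip_add_conn:
                continue
            if target not in devices:
                return -errno.ENOENT, rec.index  # "can't find lwp device"
    return 0, None


# Abbreviated version of records #20..#29 from the config dump.
RECORDS = [
    Record(20, "marker", CM_START | CM_SKIP, "xxx1116-MDT0002"),
    Record(21, "add_uuid"),
    Record(24, "add_uuid"),
    Record(25, "add_conn"),
    Record(27, "add_conn"),
    Record(29, "marker", CM_END | CM_SKIP, "xxx1116-MDT0002"),
]

if __name__ == "__main__":
    print(process(RECORDS, skip_add_conn=False))  # (-2, 25)
    print(process(RECORDS, skip_add_conn=True))   # (0, None)
```

In the model, the first call fails at record #25 with -2, matching the log, while the variant that honours the SKIP state for add_conn gets through the block. Whether the landed patch takes this approach is not stated here.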
The workaround is to remove the SKIP records from the client config.
Running lctl clear_conf, or a writeconf, should also help.
Only SKIP records for the 'add mdc' command break LWP config processing.
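To check whether a client config is affected, one could scan a textual dump of it for SKIP add_conn records inside an 'add mdc' marker block. The following is a rough sketch that assumes one record per line in the same format as the dump quoted in this ticket; the function name `find_skipped_add_conn` and both regular expressions are made up for illustration and are not part of any Lustre tool.

```python
import re

# "#25 (112)SKIP add_conn 0:..." -> index, length, optional SKIP, rest
RECORD = re.compile(r"^#(\d+) \((\d+)\)(SKIP )?(.*)$")
# "... marker 11 (flags=0x05, v2.5.1.0) xxx1116-MDT0002 'add mdc' ..."
MARKER = re.compile(r"marker\s+\d+ \(flags=0x([0-9a-fA-F]+),.*?'([^']*)'")

CM_START, CM_END = 0x01, 0x02  # assumed marker flag bits


def find_skipped_add_conn(lines):
    """Return indices of SKIP add_conn records inside 'add mdc' blocks."""
    hits = []
    in_add_mdc = False
    for line in lines:
        rec = RECORD.match(line.strip())
        if not rec:
            continue
        index, skipped, rest = int(rec.group(1)), bool(rec.group(3)), rec.group(4)
        marker = MARKER.search(rest)
        if marker:
            flags = int(marker.group(1), 16)
            if flags & CM_START:
                in_add_mdc = marker.group(2) == "add mdc"
            elif flags & CM_END:
                in_add_mdc = False
        elif skipped and in_add_mdc and rest.startswith("add_conn"):
            hits.append(index)
    return hits


SAMPLE = [
    "#19 (224)END marker 7 (flags=0x02, v2.5.1.0) xxx1116-client 'mount opts' Thu Oct 1 15:29:07 2015-",
    "#20 (224)SKIP START marker 11 (flags=0x05, v2.5.1.0) xxx1116-MDT0002 'add mdc' Thu Oct 1 15:55:50 2015-Thu Oct 1 16:22:50 2015",
    "#24 (088)SKIP add_uuid nid=10.10.10.7@o2ib(0x500000a956a07) 0: 1:10.10.10.7@o2ib",
    "#25 (112)SKIP add_conn 0:xxx1116-MDT0002-mdc 1:10.10.10.7@o2ib",
    "#27 (112)SKIP add_conn 0:xxx1116-MDT0002-mdc 1:10.10.10.7@o2ib",
    "#29 (224)SKIP END marker 11 (flags=0x06, v2.5.1.0) xxx1116-MDT0002 'add mdc' Thu Oct 1 15:55:50 2015-Thu Oct 1 16:22:50 2015",
    "#30 (224)marker 12 (flags=0x01, v2.5.1.0) xxx1116-client 'mount opts' Thu Oct 1 15:55:50 2015-",
]

if __name__ == "__main__":
    print(find_skipped_add_conn(SAMPLE))  # [25, 27]
```

An empty result would suggest the config does not contain the pattern that triggers the failure, though the sketch only recognizes the exact line format shown above.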
Landed for 2.14