
Test failure on test suite conf-sanity, subtest test_32a

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.4.1, Lustre 2.5.0
    • Affects Version/s: Lustre 2.4.0, Lustre 2.4.1
    • None
    • 3
    • 5244

    Description

      This issue was created by maloo for Oleg Drokin <green@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/1e253cd6-17c2-11e2-a41f-52540035b04c.

      The sub-test test_32a failed with the following error:

      CMD: client-20-ib mount -t lustre -o loop,exclude=t32fs-OST0000 /tmp/t32/mdt /tmp/t32/mnt/mdt
      client-20-ib: mount.lustre: mount /dev/loop0 at /tmp/t32/mnt/mdt failed: No such file or directory
      client-20-ib: Is the MGS specification correct?
      client-20-ib: Is the filesystem name correct?
      client-20-ib: If upgrading, is the copied client log valid? (see upgrade docs)
      conf-sanity test_32a: @@@@@@ FAIL: Mounting the MDT
      test_32a failed with 1

      Info required for matching: conf-sanity 32a

          Activity

            [LU-2200] Test failure on test suite conf-sanity, subtest test_32a
            yujian Jian Yu added a comment -

            Patch http://review.whamcloud.com/6197 was cherry-picked to Lustre b2_4 branch.


            jay Jinshan Xiong (Inactive) added a comment -

            Reproduced again: https://maloo.whamcloud.com/test_sets/2b3a0a00-fcc8-11e2-9fdb-52540035b04c

            From the debug log of client 2: https://maloo.whamcloud.com/test_logs/b0e58a3a-fcc8-11e2-9fdb-52540035b04c/show_text

            10000000:01000000:1.0:1375567124.192540:0:27905:0:(mgc_request.c:1820:mgc_process_log()) MGC192.168.4.20@o2ib: configuration from log 'lustre-sptlrpc' failed (-2).

            Is it the same problem?

            utopiabound Nathaniel Clark added a comment - - edited

            Patch merged to master (post 2.4.51)


            utopiabound Nathaniel Clark added a comment -

            Because of LU-3357, using replace_nids causes the first lctl conf_param ($fsname-OST0000.osc.max_dirty_mb=15) to fail.

            utopiabound Nathaniel Clark added a comment -

            Testing lctl replace_nids now. Skipping the non-writeconf tests does allow 32b to run successfully on IB nodes (see patch set 10).

            adilger Andreas Dilger added a comment -

            What about using "lctl replace_nids" to fix up the configuration for IB testing?

            utopiabound Nathaniel Clark added a comment -

            Because the tcp-based NIDs are in the config logs, a writeconf is needed before the existing filesystem tarballs can be mounted on IB nodes. I hope that any conf-sanity/32 test that does a writeconf will be able to work on IB, but I believe the non-writeconf test (32a) has no chance of passing without major revisions to the Lustre stack.

            keith Keith Mannthey (Inactive) added a comment -

            I opened a possibly related LU, LU-3347: (local_storage.c:872:local_oid_storage_init()) ASSERTION( (*los)->los_last_oid >= first_oid ) failed: 0 < 1

            It is a timeout error for test_32a.

            utopiabound Nathaniel Clark added a comment -

            This seems to be the order in which llog_process_thread sends LCFG commands through class_config_llog_handler():

            1) LCFG_MARKER (10) - find if excluded
            2) LCFG_ADD_UUID (5)
            3) LCFG_ATTACH (1)
            4) LCFG_SETUP (3) - ERROR
            5) LCFG_LOV_ADD_OBD (d) - Only interaction with EXCLUDED flag: change to LCFG_LOV_ADD_INA (13)

            Step 5 is where the OST would be excluded by adding it as inactive, but during step 4 the OBD tries to set up a connection, and that fails.

            The call chain for LCFG_SETUP is:

            class_setup
             obd_setup
              osp_device_alloc (as osp::ldto_device_alloc)
               osp_init0
                client_obd_setup
                 client_import_add_conn
                  import_set_conn
                   ptlrpc_uuid_to_connection
                    ptlrpc_uuid_to_peer -- Fails to find peer
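The ordering problem described in that comment can be illustrated with a small toy model. This is only a sketch and not Lustre code: `Record`, `process_log`, `SetupError`, and the `peer_reachable` predicate are hypothetical names invented for illustration, and the only thing taken from the comment is the command order and the observation that exclusion affects step 5 alone.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One config-log record in this toy model (hypothetical type)."""
    cmd: str     # e.g. "LCFG_SETUP"
    target: str  # e.g. "t32fs-OST0000"


class SetupError(Exception):
    """Raised when the modelled LCFG_SETUP step cannot find its peer."""


def process_log(records, excluded, peer_reachable):
    """Replay records in order and return the commands that were applied.

    excluded:       set of target names passed via exclude=
    peer_reachable: assumed predicate (target -> bool) standing in for
                    the peer lookup at the bottom of the call chain
    """
    applied = []
    for rec in records:
        if rec.cmd == "LCFG_SETUP" and not peer_reachable(rec.target):
            # Step 4: fails before the exclusion in step 5 is consulted.
            raise SetupError(f"{rec.target}: cannot find peer")
        if rec.cmd == "LCFG_LOV_ADD_OBD" and rec.target in excluded:
            # Step 5: the only place the exclude list changes behaviour.
            applied.append(("LCFG_LOV_ADD_INA", rec.target))
            continue
        applied.append((rec.cmd, rec.target))
    return applied


# Command order as listed in the comment above.
ORDER = ["LCFG_MARKER", "LCFG_ADD_UUID", "LCFG_ATTACH",
         "LCFG_SETUP", "LCFG_LOV_ADD_OBD"]
records = [Record(cmd, "t32fs-OST0000") for cmd in ORDER]
```

With a reachable peer, the excluded OST ends up added as inactive. With an unreachable peer, `process_log` raises at the setup step even though the target is in the exclude set, which mirrors the failure reported here.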

            utopiabound Nathaniel Clark added a comment -

            Adding an OST to the exclude list seems to affect only the LCFG_LOV_ADD_OBD command (turning it into LCFG_LOV_ADD_INA). From my rough reading, the problem seems to stem from the LCFG_ADD_UUID command: the NID in question is 192.168.203.129@tcp, which fails on IB.
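To make the NID mismatch concrete, here is a rough sketch of checking whether a NID recorded in a config log names a network that the local node is on. The helper names (`parse_nid`, `normalize_net`, `on_local_network`) are hypothetical, and the check deliberately ignores LNet routing, so it is a simplification and not how Lustre performs the lookup.

```python
def parse_nid(nid):
    """Split 'address@network' into (address, network)."""
    addr, sep, net = nid.rpartition("@")
    if not sep or not addr or not net:
        raise ValueError(f"not a NID: {nid!r}")
    return addr, net


def normalize_net(net):
    """Treat a network name without a trailing number as network 0,
    so that 'tcp' and 'tcp0' compare equal."""
    return net if net[-1].isdigit() else net + "0"


def on_local_network(nid, local_nets):
    """True if the NID's network is one the local node is configured on.
    Simplification: LNet routers are not modelled."""
    _, net = parse_nid(nid)
    return normalize_net(net) in {normalize_net(n) for n in local_nets}


# The NID stored in the test image versus an IB-only node.
stored_nid = "192.168.203.129@tcp"
ib_only = ["o2ib"]
tcp_nid_usable = on_local_network(stored_nid, ib_only)
```

On a node configured only for o2ib, `tcp_nid_usable` is false, which is consistent with the peer lookup failing for the tcp NID recorded in the image.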

            People

              utopiabound Nathaniel Clark
              maloo Maloo