[LU-3850] mds-survey on a secondary MDT crashes Created: 28/Aug/13  Updated: 31/Dec/13  Resolved: 16/Nov/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.6.0, Lustre 2.5.1

Type: Bug Priority: Major
Reporter: Gregoire Pichon Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9970

 Description   

When running mds-survey on multiple MDTs, the test_mkdir operation on a secondary MDT, i.e. not MDT0000, crashes the system with the following LBUG:

LustreError: 6743:0:(lod_dev.c:69:lod_fld_lookup()) ASSERTION( fid_is_sane(fid) ) failed: Invalid FID [0x0:0x0:0x0]
LustreError: 6743:0:(lod_dev.c:69:lod_fld_lookup()) LBUG

crash> bt
PID: 6743   TASK: ffff8804b642a040  CPU: 1   COMMAND: "lctl"
 #0 [ffff880494b83650] machine_kexec at ffffffff8102c48b
 #1 [ffff880494b836b0] crash_kexec at ffffffff810abae2
 #2 [ffff880494b83780] panic at ffffffff81499d3d
 #3 [ffff880494b83800] lbug_with_loc at ffffffffa054deeb [libcfs]
 #4 [ffff880494b83820] lod_fld_lookup at ffffffffa0f01ac5 [lod]
 #5 [ffff880494b83880] lod_object_alloc at ffffffffa0f042e6 [lod]
 #6 [ffff880494b838c0] mdd_object_init at ffffffffa0cba412 [mdd]
 #7 [ffff880494b838f0] lu_object_alloc at ffffffffa067fc4d [obdclass]
 #8 [ffff880494b83950] lu_object_find_at at ffffffffa06807b5 [obdclass]
 #9 [ffff880494b83a10] echo_md_handler at ffffffffa076fc71 [obdecho]
#10 [ffff880494b83af0] echo_client_iocontrol at ffffffffa0775257 [obdecho]
#11 [ffff880494b83d90] class_handle_ioctl at ffffffffa063f4cf [obdclass]
#12 [ffff880494b83e40] obd_class_ioctl at ffffffffa06272ab [obdclass]
#13 [ffff880494b83e60] vfs_ioctl at ffffffff81181372
#14 [ffff880494b83ea0] do_vfs_ioctl at ffffffff81181514
#15 [ffff880494b83f30] sys_ioctl at ffffffff81181a91
#16 [ffff880494b83f80] system_call_fastpath at ffffffff81003072
    RIP: 0000003a3fedf7b7  RSP: 00007fffd6d74ba0  RFLAGS: 00010246
    RAX: 0000000000000010  RBX: ffffffff81003072  RCX: 0000000000000384
    RDX: 00007fffd6d74c20  RSI: 00000000824066dd  RDI: 0000000000000003
    RBP: 0000000000000001   R8: 00007fffd6d76c20   R9: 0000000000000240
    R10: 0000000000000001  R11: 0000000000000246  R12: 00000000824066dd
    R13: 00007fffd6d74c20  R14: 00000000006796c0  R15: 0000000000000003
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b

Steps to reproduce, using the internal commands launched by mds-survey:

[root@mo90 ~]# lctl dl
  0 UP osd-ldiskfs fs2-MDT0001-osd fs2-MDT0001-osd_UUID 11
  1 UP mgc MGC30.1.0.95@o2ib 3a9a8da6-eabc-340d-b7bd-ad6260f44767 5
  2 UP mds MDS MDS_uuid 3
  3 UP lod fs2-MDT0001-mdtlov fs2-MDT0001-mdtlov_UUID 4
  4 UP mdt fs2-MDT0001 fs2-MDT0001_UUID 7
  5 UP mdd fs2-MDD0001 fs2-MDD0001_UUID 4
  6 UP osp fs2-OST0003-osc-MDT0001 fs2-MDT0001-mdtlov_UUID 5
  7 UP osp fs2-OST0002-osc-MDT0001 fs2-MDT0001-mdtlov_UUID 5
  8 UP osp fs2-OST0001-osc-MDT0001 fs2-MDT0001-mdtlov_UUID 5
  9 UP osp fs2-OST0000-osc-MDT0001 fs2-MDT0001-mdtlov_UUID 5
 10 UP osp fs2-MDT0000-osp-MDT0001 fs2-MDT0001-mdtlov_UUID 5
 11 UP lwp fs2-MDT0000-lwp-MDT0001 fs2-MDT0000-lwp-MDT0001_UUID 5
[root@mo90 ~]# modprobe obdecho
[root@mo90 ~]# lctl << EOF
> attach echo_client fs2-MDT0001_ecc fs2-MDT0001_ecc_UUID
> setup fs2-MDT0001 mdd
> EOF
[root@mo90 ~]# lctl --device 12  test_mkdir /tests

Looking at the code, the FID that is null comes from the mdd_device associated with the echo_device.

crash> struct echo_device.ed_next 0xffff8804b70f0200
  ed_next = 0xffff880921903000
crash> print &((struct mdd_device *)0xffff880921903000)->mdd_md_dev.md_lu_dev
$4 = (struct lu_device *) 0xffff880921903000
crash> struct mdd_device 0xffff880921903000
struct mdd_device {
  mdd_md_dev = {
    md_lu_dev = {
      ld_ref = {
        counter = 18
      }, 
      ld_type = 0xffffffffa0cf3500, 
      ld_ops = 0xffffffffa0ce4a00, 
      ld_site = 0xffff8804abf14150, 
      ld_proc_entry = 0x0, 
      ld_obd = 0xffff8809219540b8, 
      ld_reference = {<No data fields>}, 
      ld_linkage = {
        next = 0xffff8804abf14030, 
        prev = 0xffff880921905030
      }
    }, 
    md_ops = 0xffffffffa0ce4a20, 
    md_upcall = {
      mu_upcall_sem = {
        count = 0, 
        wait_lock = {
          raw_lock = {
            slock = 0
          }
        }, 
        wait_list = {
          next = 0x0, 
          prev = 0x0
        }
      }, 
      mu_upcall_dev = 0x0, 
      mu_upcall = 0
    }
  }, 
  mdd_child_exp = 0xffff88092195bc00, 
  mdd_child = 0xffff880921950000, 
  mdd_bottom = 0xffff8804abf14000, 
  mdd_root_fid = {
    f_seq = 0, 
    f_oid = 0, 
    f_ver = 0
  }, 
  mdd_local_root_fid = {
    f_seq = 8589934593, 
    f_oid = 13, 
    f_ver = 0
  }, 
  mdd_dt_conf = {
    ddp_max_name_len = 255, 
    ddp_max_nlink = 65000, 
    ddp_block_shift = 12, 
    ddp_mntopts = 3, 
    ddp_max_ea_size = 4096, 
    ddp_mnt = 0xffff8804abcde3c0, 
    ddp_mount_type = 1, 
    ddp_maxbytes = 17592186040320, 
    ddp_grant_reserved = 2, 
    ddp_inodespace = 28, 
    ddp_grant_frag = 24576
  }, 
  mdd_orphans = 0xffff8804949173f8, 
  mdd_proc_entry = 0xffff880921956d40, 
  mdd_cl = {
    mc_lock = {
      raw_lock = {
        slock = 0
      }
    }, 
    mc_flags = 0, 
    mc_mask = -526337, 
    mc_index = 0, 
    mc_starttime = 4295358216, 
    mc_user_lock = {
      raw_lock = {
        slock = 0
      }
    }, 
    mc_lastuser = 0
  }, 
  mdd_atime_diff = 60, 
  mdd_dot_lustre = 0x0, 
  mdd_dot_lustre_objs = {
    mdd_obf = 0x0
  }, 
  mdd_lfsck = {
    ml_mutex = {
      count = {
        counter = 1
      }, 
      wait_lock = {
        raw_lock = {
          slock = 0
        }
      }, 
      wait_list = {
        next = 0xffff880921903148, 
        prev = 0xffff880921903148
      }, 
      owner = 0x0
    }, 
    ml_lock = {
      raw_lock = {
        slock = 0
      }
    }, 
    ml_list_scan = {
      next = 0xffff880921903168, 
      prev = 0xffff880921903168
    }, 
    ml_list_dir = {
      next = 0xffff880921903178, 
      prev = 0xffff880921903178
    }, 
    ml_list_double_scan = {
      next = 0xffff880921903188, 
      prev = 0xffff880921903188
    }, 
    ml_list_idle = {
      next = 0xffff880494901500, 
      prev = 0xffff880494901500
    }, 
    ml_thread = {
      t_link = {
        next = 0x0, 
        prev = 0x0
      }, 
      t_data = 0x0, 
      t_flags = 0, 
      t_id = 0, 
      t_pid = 0, 
      t_watchdog = 0x0, 
      t_svcpt = 0x0, 
      t_ctl_waitq = {
        lock = {
          raw_lock = {
            slock = 0
          }
        }, 
        task_list = {
          next = 0xffff8809219031e8, 
          prev = 0xffff8809219031e8
        }
      }, 
      t_env = 0x0, 
      t_name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
    }, 
    ml_time_last_checkpoint = 0, 
    ml_time_next_checkpoint = 0, 
    ml_bookmark_obj = 0xffff8804949015c0, 
    ml_bookmark_ram = {
      lb_magic = 538119197, 
      lb_version = 2, 
      lb_param = 0, 
      lb_speed_limit = 0, 
      lb_padding = 0, 
      lb_reserved = {0, 0, 0, 0, 0, 0}
    }, 
    ml_bookmark_disk = {
      lb_magic = 538119197, 
      lb_version = 2, 
      lb_param = 0, 
      lb_speed_limit = 0, 
      lb_padding = 0, 
      lb_reserved = {0, 0, 0, 0, 0, 0}
    }, 
    ml_pos_current = {
      lp_oit_cookie = 0, 
      lp_dir_parent = {
        f_seq = 0, 
        f_oid = 0, 
        f_ver = 0
      }, 
      lp_dir_cookie = 0
    }, 
    ml_obj_oit = 0xffff880494901680, 
    ml_obj_dir = 0x0, 
    ml_di_oit = 0x0, 
    ml_di_dir = 0x0, 
    ml_args_oit = 0, 
    ml_args_dir = 0, 
    ml_sleep_rate = 0, 
    ml_sleep_jif = 0, 
    ml_new_scanned = 0, 
    ml_paused = 0, 
    ml_oit_over = 0, 
    ml_drop_dryrun = 0, 
    ml_initialized = 1, 
    ml_current_oit_processed = 0
  }, 
  mdd_sync_permission = 1, 
  mdd_connects = 1, 
  mdd_los = 0xffff880494dccc40
}
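The zero FID is visible directly in mdd_root_fid in the dump above, while mdd_local_root_fid (8589934593 == 0x200000001) is populated. As a rough illustration, here is a simplified stand-in for the check that trips in lod_fld_lookup() (an assumption: the real fid_is_sane() in the kernel also validates the sequence range; this sketch only rejects the all-zero case from this crash):

```shell
# Simplified sketch of a FID sanity check: a FID whose sequence and object
# id are both zero -- [0x0:0x0:0x0] -- can never be sane.
fid_is_sane() {  # args: f_seq f_oid f_ver
    [ "$(( $1 ))" -ne 0 ] || [ "$(( $2 ))" -ne 0 ]
}

# mdd_local_root_fid from the dump: 8589934593 == 0x200000001
fid_is_sane 0x200000001 0xd 0x0 && echo "local root FID is sane"
# mdd_root_fid from the dump: triggers ASSERTION( fid_is_sane(fid) )
fid_is_sane 0x0 0x0 0x0 || echo "root FID is invalid"
```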

Does mds-survey support several MDTs?



 Comments   
Comment by Peter Jones [ 28/Aug/13 ]

Minh

Could you please advise on this ticket?

Thanks

Peter

Comment by Di Wang [ 29/Aug/13 ]

Thanks for testing this. Since the echo_client can only be attached to the local MDT, if you create a remote directory, the echo client sends the request to the MDT where the directory is located, which differs from a normal remote create (where the request is sent to the parent MDT). This is why you hit this problem.

Here is the patch to fix this problem; please try it: http://review.whamcloud.com/7502 Thanks.

Comment by Minh Diep [ 29/Aug/13 ]

Thanks, WangDi, for looking at this so quickly.

Comment by Gregoire Pichon [ 30/Aug/13 ]

Hi,

Thanks for the patch.
It works better, except that the namespace of the secondary MDT seems to overlap the namespace of MDT0.

When creating a directory on the secondary MDT, I get the following LustreError:

# lctl --device 12 test_mkdir /tests9
# dmesg | tail -n1
LustreError: 9810:0:(osp_md_object.c:762:osp_md_index_lookup()) fs2-MDT0000-osp-MDT0001: lookup [0x200000007:0x1:0x0] tests9 failed: rc = -2

When creating a directory on the secondary MDT whose name already exists on MDT0, it fails:

# lctl --device 12 test_mkdir /tests8
error: test_mkdir: File exists

When deleting a directory on the secondary MDT whose name exists on MDT0, it crashes:

# lctl --device 12 test_rmdir /tests8
<CRASH>
LustreError: 9820:0:(osp_sync.c:201:osp_sync_declare_add()) ASSERTION( ctxt ) failed: 
LustreError: 9820:0:(osp_sync.c:201:osp_sync_declare_add()) LBUG
crash> bt
PID: 9820   TASK: ffff88092be5b7f0  CPU: 4   COMMAND: "lctl"
 #0 [ffff880935ccf5e0] machine_kexec at ffffffff8102c48b
 #1 [ffff880935ccf640] crash_kexec at ffffffff810abae2
 #2 [ffff880935ccf710] panic at ffffffff81499d3d
 #3 [ffff880935ccf790] lbug_with_loc at ffffffffa093eeeb [libcfs]
 #4 [ffff880935ccf7b0] osp_sync_declare_add at ffffffffa13227f9 [osp]
 #5 [ffff880935ccf7f0] osp_md_declare_object_destroy at ffffffffa132c7ab [osp]
 #6 [ffff880935ccf820] lod_declare_object_destroy at ffffffffa12d3697 [lod]
 #7 [ffff880935ccf850] mdd_declare_finish_unlink at ffffffffa107f6d0 [mdd]
 #8 [ffff880935ccf880] mdd_unlink at ffffffffa10853aa [mdd]
 #9 [ffff880935ccf940] echo_md_destroy_internal at ffffffffa1396e1c [obdecho]
#10 [ffff880935ccf9a0] echo_destroy_object at ffffffffa1397a64 [obdecho]
#11 [ffff880935ccfa10] echo_md_handler at ffffffffa13a04d8 [obdecho]
#12 [ffff880935ccfaf0] echo_client_iocontrol at ffffffffa13a6267 [obdecho]
#13 [ffff880935ccfd90] class_handle_ioctl at ffffffffa09fa4cf [obdclass]
#14 [ffff880935ccfe40] obd_class_ioctl at ffffffffa09e22ab [obdclass]
#15 [ffff880935ccfe60] vfs_ioctl at ffffffff81181372
#16 [ffff880935ccfea0] do_vfs_ioctl at ffffffff81181514
#17 [ffff880935ccff30] sys_ioctl at ffffffff81181a91
#18 [ffff880935ccff80] system_call_fastpath at ffffffff81003072
    RIP: 0000003a3fedf7b7  RSP: 00007fff425f5818  RFLAGS: 00010206
    RAX: 0000000000000010  RBX: ffffffff81003072  RCX: 0000000000000018
    RDX: 00007fff425f5900  RSI: 00000000824066dd  RDI: 0000000000000003
    RBP: 0000000000000001   R8: 00007fff425f7900   R9: 0000000000000240
    R10: 00007fff425f55a0  R11: 0000000000000206  R12: 00000000824066dd
    R13: 00007fff425f5900  R14: 00000000006796a0  R15: 0000000000000003
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b
Comment by Di Wang [ 31/Aug/13 ]

Oh, the directory you create with lctl test_mkdir will be in the global namespace of all MDTs, instead of only on the secondary MDT. That is why you see "File exists" if you try to create a directory with the same name on MDT0. So you have to use a different name, even on a different MDT.

Still, you should not see an LBUG here; I will check.

Btw: why do you want to create a remote dir for the md echo client tests? Could you please describe in more detail what you want to do here? Thanks.

Comment by Di Wang [ 31/Aug/13 ]

Just updated the patch. I disabled unlinking a directory on the remote MDT, i.e. you can only test_rmdir tests8 on MDT0.

Because the echo client on MDT1 can only send requests to MDT1, and every unlink request must be sent to the MDT where the directory is located, you cannot do a "remote unlink" through the echo client.

Comment by Gregoire Pichon [ 02/Sep/13 ]

The tests I am doing are driven by the mds-survey script.

The logic of this script is to perform the following actions on every specified MDT device:

  • load obdecho on hosts where MDT devices are located
  • attach and setup an echo_client device for each MDT device
  • create dir_count directories in each echo_client device (lctl test_mkdir)
  • launch metadata operations in parallel on each echo_client device (lctl test_create, then lctl test_lookup, ..., then lctl test_destroy)
  • remove directories in each echo_client device (lctl test_rmdir)
  • cleanup and detach echo_client devices
  • unload obdecho module
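The sequence above can be sketched as a dry-run shell script. The device index 12 and the fs2-MDT0001 target are taken from the reproducer earlier in this ticket; the dir count, directory names, and the exact argument forms of the test_* commands are illustrative assumptions. The `run` wrapper only prints each command, so the sketch can be reviewed without a live Lustre setup (drop it to execute for real):

```shell
run() { echo "+ $*"; }   # dry-run: print instead of execute

MDT=fs2-MDT0001
DEV=12          # echo_client device index, as in the reproducer above

run modprobe obdecho
run lctl attach echo_client ${MDT}_ecc ${MDT}_ecc_UUID
run lctl setup ${MDT} mdd

# Per-MDT directory names: test_mkdir dirs live in the global namespace,
# so each echo client must use names unique across all MDTs.
for i in 1 2; do
    run lctl --device $DEV test_mkdir /survey_${MDT}_$i
done

run lctl --device $DEV test_create /survey_${MDT}_1   # metadata phase
run lctl --device $DEV test_lookup /survey_${MDT}_1
run lctl --device $DEV test_destroy /survey_${MDT}_1

for i in 1 2; do
    run lctl --device $DEV test_rmdir /survey_${MDT}_$i
done

run lctl --device $DEV cleanup
run lctl --device $DEV detach
run rmmod obdecho
```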

Ideally, the operations done by lctl test_xxx should not be visible in the filesystem namespace as seen by Lustre clients. This is the way obdfilter-survey works, and it allows running performance tests on formatted filesystems without impacting existing data. I don't know whether this is possible to implement.

At least, operations on every MDT device should generate the same internal operations, without any difference between MDT0 and remote MDTs. Tell me if I am wrong, but a Lustre client is able to create subdirectories in a directory hosted on MDT0 and, similarly, to create subdirectories in a directory hosted on a secondary MDT without connecting to MDT0. The lctl test_mkdir operation should do the same.

Comment by Di Wang [ 02/Sep/13 ]

Yes, test_mkdir is similar to a normal mkdir: after you create a remote dir on a secondary MDT, creating any subdirectories under this remote dir does not involve connecting to MDT0.

But any directories created by lctl test_mkdir are visible in the whole namespace, instead of being local to one MDT. So you have to make sure each echo client creates directories with different names in your script. Also, each echo client can only remove a dir (lctl test_destroy/test_rmdir) on its own MDT.

Comment by Gregoire Pichon [ 04/Sep/13 ]

Di Wang,

Is there a way to create a remote directory, or to look up the dir-striping of a directory, from the echo_client interface, similarly to what is done with 'lfs mkdir' from a Lustre client?

I am going to fix mds-survey to support multiple MDTs, and I need to create a remote directory for each specified MDT target. The mds-survey script runs on the Lustre servers and does not require any client to be mounted.

thanks.

Comment by Gregoire Pichon [ 05/Sep/13 ]

Oops, please ignore my previous questions. It appears that the echo_client interface completely ignores the remote directory settings done by 'lfs mkdir'. Actually, whatever the dir-striping of a directory, the objects created by echo_client are located on the MDT it is attached to.

I still have one question: how is it possible that an echo_client lookup on a secondary MDT has visibility of the whole filesystem namespace, like a lookup from MDT0? Does it forward the lookup requests to MDT0?

Comment by Di Wang [ 05/Sep/13 ]

Yes, each MDT has visibility of the whole namespace, and each MDT knows the root FID, which is unique across the whole namespace; a lookup will start from MDT0 (where the root is). Yes, requests will be forwarded to MDT0 or other MDTs if needed. But the point of mds-survey is testing local MDT performance, so you should probably avoid unnecessary cross-MDT operations here.

Comment by Gregoire Pichon [ 25/Oct/13 ]

Peter,

Do you think it would be possible to have the patch from Di Wang (http://review.whamcloud.com/7502) reviewed, so that it can land in master and also in the next 2.5 maintenance release?

thanks.

Comment by Peter Jones [ 25/Oct/13 ]

Yes. This is already on my list to follow up on.

Comment by Peter Jones [ 16/Nov/13 ]

Landed for 2.6. Will also consider for inclusion in 2.5.1 when work commences on that release.

Generated at Sat Feb 10 01:37:27 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.