meerkat.merge

merge(left: meerkat.dataframe.DataFrame, right: meerkat.dataframe.DataFrame, how: str = 'inner', on: Union[str, List[str]] = None, left_on: Union[str, List[str]] = None, right_on: Union[str, List[str]] = None, sort: bool = False, suffixes: Sequence[str] = ('_x', '_y'), validate=None) -> meerkat.dataframe.DataFrame

Perform a database-style join operation between two DataFrames.

Parameters
  • left (DataFrame) – Left DataFrame.

  • right (DataFrame) – Right DataFrame.

  • how (str, optional) – The join type. Defaults to “inner”.

  • on (Union[str, List[str]], optional) – The column(s) to join on. These columns must be ScalarColumn. Defaults to None, in which case the left_on and right_on parameters must be passed.

  • left_on (Union[str, List[str]], optional) – The column(s) in the left DataFrame to join on. These columns must be ScalarColumn. Defaults to None.

  • right_on (Union[str, List[str]], optional) – The column(s) in the right DataFrame to join on. These columns must be ScalarColumn. Defaults to None.

  • sort (bool, optional) – Whether to sort the result DataFrame by the join key(s). Defaults to False.

  • suffixes (Sequence[str], optional) – Suffixes to use when there are conflicting column names in the result DataFrame. Should be a sequence of length two, with suffixes[0] the suffix for the column from the left DataFrame and suffixes[1] the suffix for the column from the right DataFrame. Defaults to (“_x”, “_y”).

  • validate (str, optional) –

    The check to perform on the result DataFrame. Defaults to None, in which case no check is performed. Valid options are:

    • “one_to_one” or “1:1”: check if merge keys are unique in both left and right datasets.

    • “one_to_many” or “1:m”: check if merge keys are unique in left dataset.

    • “many_to_one” or “m:1”: check if merge keys are unique in right dataset.

    • “many_to_many” or “m:m”: allowed, but does not result in checks.
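
The validate options above amount to uniqueness checks on the merge keys. The following is a minimal pure-Python sketch of those checks, not meerkat's implementation: check_merge_keys is a hypothetical helper, and raising ValueError on failure is an assumption made for illustration.

```python
from typing import Hashable, Optional, Sequence


def check_merge_keys(
    left_keys: Sequence[Hashable],
    right_keys: Sequence[Hashable],
    validate: Optional[str] = None,
) -> None:
    """Hypothetical helper mirroring the documented `validate` options.

    Raises ValueError (an assumption) when the required key uniqueness
    does not hold.
    """
    # None and many-to-many both mean that no check is performed.
    if validate is None or validate in ("many_to_many", "m:m"):
        return

    # A key sequence is unique when deduplicating it loses no elements.
    left_unique = len(set(left_keys)) == len(left_keys)
    right_unique = len(set(right_keys)) == len(right_keys)

    if validate in ("one_to_one", "1:1"):
        ok = left_unique and right_unique
    elif validate in ("one_to_many", "1:m"):
        ok = left_unique
    elif validate in ("many_to_one", "m:1"):
        ok = right_unique
    else:
        raise ValueError(f"unknown validate option: {validate!r}")

    if not ok:
        raise ValueError(f"merge keys do not satisfy {validate!r}")
```

For example, left keys [1, 2, 3] and right keys [1, 1, 2] pass the “1:m” check, because only the left keys must be unique, but fail the “1:1” check.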

Returns

The merged DataFrame.

Return type

DataFrame
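
To make the join and suffix semantics concrete, here is a small pure-Python sketch of an inner join over lists of row dictionaries. It illustrates generic database-style join behaviour only; inner_join and the Row alias are hypothetical names, and nothing here reflects how meerkat stores columns or orders result rows.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def inner_join(
    left: List[Row],
    right: List[Row],
    on: str,
    suffixes: Sequence[str] = ("_x", "_y"),
) -> List[Row]:
    """Illustrative inner join over row dicts (not meerkat's implementation)."""
    # Non-key column names that appear on both sides need a suffix.
    left_cols = {col for row in left for col in row}
    right_cols = {col for row in right for col in row}
    conflicts = (left_cols & right_cols) - {on}

    # Index the right rows by join key for fast lookup.
    by_key: Dict[Any, List[Row]] = {}
    for row in right:
        by_key.setdefault(row[on], []).append(row)

    merged: List[Row] = []
    for lrow in left:
        # Inner join: left rows without a matching right key are dropped.
        for rrow in by_key.get(lrow[on], []):
            out: Row = {on: lrow[on]}
            for col, val in lrow.items():
                if col != on:
                    out[(col + suffixes[0]) if col in conflicts else col] = val
            for col, val in rrow.items():
                if col != on:
                    out[(col + suffixes[1]) if col in conflicts else col] = val
            merged.append(out)
    return merged


left = [{"id": 1, "score": 0.9}, {"id": 2, "score": 0.4}]
right = [
    {"id": 2, "score": 0.7, "label": "cat"},
    {"id": 3, "score": 0.1, "label": "dog"},
]
result = inner_join(left, right, on="id")
# result == [{"id": 2, "score_x": 0.4, "score_y": 0.7, "label": "cat"}]
```

Only id 2 appears in both inputs, so the result has one row. The shared score column is split into score_x and score_y, while label keeps its name because it exists only on the right.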