meerkat.sample

sample(data: Union[meerkat.dataframe.DataFrame, meerkat.columns.abstract.Column], n: int = None, frac: float = None, replace: bool = False, weights: Union[str, numpy.ndarray] = None, random_state: Union[int, numpy.random.mtrand.RandomState] = None) -> Union[meerkat.dataframe.DataFrame, meerkat.columns.abstract.Column]

Select a random sample of rows from a DataFrame or Column. Roughly equivalent to DataFrame.sample in pandas: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sample.html.

Parameters
  • data (Union[DataFrame, AbstractColumn]) – DataFrame or Column to sample from.

  • n (int) – Number of samples to draw. If frac is specified, this parameter should not be passed. Defaults to 1 if frac is not passed.

  • frac (float) – Fraction of rows to sample. If n is specified, this parameter should not be passed.

  • replace (bool) – Sample with or without replacement. Defaults to False.

  • weights (Union[str, np.ndarray]) – Weights to use for sampling. If None (default), rows are sampled uniformly. If a numpy array, rows are sampled with probability proportional to the corresponding entries. If a string and data is a DataFrame, the weights are taken from the column with that name. Weights that do not sum to 1 are normalized to sum to 1.

  • random_state (Union[int, np.random.RandomState]) – Random state or seed to use for sampling.

Returns

A random sample of rows from DataFrame or Column.

Return type

Union[DataFrame, AbstractColumn]
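
The parameter rules above can be sketched in plain Python. This is an illustration only, not Meerkat's implementation: the helper name sample_indices and its seed argument are hypothetical, it works on row indices rather than on a DataFrame or Column, and it assumes that frac is converted to a row count by rounding to the nearest integer (as pandas does), which this page does not state.

```python
import random
from typing import List, Optional, Sequence


def sample_indices(
    num_rows: int,
    n: Optional[int] = None,
    frac: Optional[float] = None,
    replace: bool = False,
    weights: Optional[Sequence[float]] = None,
    seed: Optional[int] = None,
) -> List[int]:
    """Hypothetical helper: choose row indices following the documented rules."""
    if n is not None and frac is not None:
        raise ValueError("Pass either n or frac, not both.")
    if frac is not None:
        # Assumption: round to the nearest integer, as pandas does.
        n = round(frac * num_rows)
    elif n is None:
        n = 1  # documented default when neither n nor frac is passed
    if not replace and n > num_rows:
        raise ValueError("Cannot draw more rows than exist without replacement.")

    if weights is None:
        probs = [1.0] * num_rows  # uniform sampling
    else:
        if len(weights) != num_rows:
            raise ValueError("weights must have one entry per row.")
        total = sum(weights)
        probs = [w / total for w in weights]  # normalize so the weights sum to 1

    rng = random.Random(seed)
    if replace:
        # With replacement: every draw uses the same distribution.
        return rng.choices(range(num_rows), weights=probs, k=n)

    # Without replacement: draw one index at a time and remove it from the pool.
    pool = list(range(num_rows))
    pool_probs = list(probs)
    chosen = []
    for _ in range(n):
        pos = rng.choices(range(len(pool)), weights=pool_probs, k=1)[0]
        chosen.append(pool.pop(pos))
        pool_probs.pop(pos)
    return chosen


# Usage: three distinct rows out of ten, reproducible through the seed.
picked = sample_indices(10, n=3, seed=0)
# Half of the rows, drawn with replacement.
half = sample_indices(10, frac=0.5, replace=True, seed=0)
# Unnormalized weights: only index 2 has non-zero weight, so it is always drawn.
only_two = sample_indices(4, n=1, weights=[0, 0, 5, 0], seed=0)
```

With Meerkat itself, the corresponding call would go through sample with the signature shown at the top of this page, for example passing weights as the name of a DataFrame column and random_state as an integer seed.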