Configuring Meerkat

Several aspects of Meerkat’s behavior can be configured by the user. For example, you may want to change the number of DataFrame rows shown in Jupyter Notebooks.

You can see the current state of the Meerkat configuration with:

In [1]: import meerkat as mk

In [2]: mk.config
Out[2]: MeerkatConfig(display=DisplayConfig(max_rows=10, show_images=True, max_image_height=128, max_image_width=128, show_audio=True), datasets=<meerkat.config.DatasetsConfig object at 0x7fcd8c4b1c10>, system=SystemConfig(use_gpu=True, ssh_identity_file='/home/runner/.meerkat/ssh/id_rsa'), engines=EnginesConfig(openai_api_key=None, openai_organization=None, anthropic_api_key=None))

Configuring with YAML

To make persistent changes to the configuration, edit the YAML file at ~/.meerkat/config.yaml. For example, the YAML file below changes the default directory to which datasets are downloaded and increases the maximum number of rows displayed in Jupyter Notebooks:

    root_dir: "/path/to/storage"
    max_rows: 20

If you would rather keep the YAML file elsewhere, set the environment variable MEERKAT_CONFIG to point to the file:

export MEERKAT_CONFIG="/path/to/mk/config.yaml"

If you’re using conda, you can permanently set this variable for your environment:

conda env config vars set MEERKAT_CONFIG="/path/to/mk/config.yaml"
conda activate env_name  # need to reactivate the environment
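
The lookup order described above (MEERKAT_CONFIG first, then ~/.meerkat/config.yaml) can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not Meerkat's actual loading code; the helper name resolve_config_path and the constant DEFAULT_CONFIG_PATH are hypothetical.

```python
import os
from pathlib import Path

# Default location named in this guide.
DEFAULT_CONFIG_PATH = Path.home() / ".meerkat" / "config.yaml"


def resolve_config_path(environ=None) -> Path:
    """Return the path named by MEERKAT_CONFIG, else the default location.

    Hypothetical helper for illustration; not part of Meerkat's API.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("MEERKAT_CONFIG")
    if override:
        # An explicit override wins over the default file.
        return Path(override).expanduser()
    return DEFAULT_CONFIG_PATH


# With the variable set, the override is used; without it, the default applies.
print(resolve_config_path({"MEERKAT_CONFIG": "/path/to/mk/config.yaml"}))
print(resolve_config_path({}))
```

Passing the environment as an argument keeps the sketch easy to check without modifying the real process environment.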

Configuring Programmatically

You can also update the config programmatically. Unlike the YAML method above, these changes do not persist beyond the lifetime of your program.

mk.config.datasets.root_dir = "/path/to/storage"
mk.config.display.max_rows = 20
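
Because programmatic changes last for the rest of the program, it can be convenient to scope an override to a single block of code and restore the previous value afterwards. The sketch below shows one way to do that with a context manager. The helper temporary_setting is hypothetical and not part of Meerkat, and a SimpleNamespace stands in for mk.config.display so the example runs without Meerkat installed.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_setting(section, name, value):
    """Set section.<name> to value inside the with-block, then restore it."""
    previous = getattr(section, name)
    setattr(section, name, value)
    try:
        yield section
    finally:
        # Restore the old value even if the block raises.
        setattr(section, name, previous)


# Stand-in for mk.config.display (assumption: it accepts plain attribute assignment).
display = SimpleNamespace(max_rows=10)

with temporary_setting(display, "max_rows", 20):
    inside = display.max_rows  # override is active here

after = display.max_rows  # previous value is back
```

With Meerkat imported, the same helper would presumably be called as temporary_setting(mk.config.display, "max_rows", 20), assuming the config sections behave like ordinary Python objects.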