Unstructured Datasets meet Foundation Models.

Meerkat is an open-source Python library designed to help technical teams interactively wrangle images, videos, text documents and more with foundation models.

Our goal is to make foundation models a more reliable software abstraction for processing unstructured datasets. Read our blog post to learn more.

Install Meerkat

$ pip install meerkat-ml

Notice

Meerkat is a research project, so users should expect rapid updates and rough edges. The current API is subject to change.

Data Frames for Images

A Meerkat DataFrame is a heterogeneous data structure with an API backed by foundation models.
  • Structured fields (e.g. numbers and dates) live alongside unstructured objects (e.g. images) and their tensor representations (e.g. embeddings).
  • Functions like mk.embed abstract away boilerplate ML code, keeping the focus on the data.
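
import meerkat as mk

# Load the structured metadata (assumes paintings.csv has an "img_path" column).
df = mk.from_csv("paintings.csv")
# Build an image column from the file paths stored in that column.
df["img"] = mk.files(df["img_path"])
# Embed each image with CLIP and keep the vectors alongside the other fields.
df["embedding"] = mk.embed(
    df["img"],
    engine="clip"
)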
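
To make the column model concrete, here is a small library-free sketch. It is not Meerkat's implementation: the table is just a dict of equal-length columns, and toy_embed is a hypothetical stand-in for a real encoder such as CLIP that hashes bytes into a short vector. The row values are made up for illustration.

```python
import hashlib


def toy_embed(payload: bytes, dim: int = 4) -> list:
    """Hypothetical stand-in for a real encoder: bytes -> fixed-length vector."""
    digest = hashlib.sha256(payload).digest()
    # Scale the first `dim` digest bytes into the range [0, 1].
    return [digest[i] / 255.0 for i in range(dim)]


# Column-oriented table: every column holds one entry per row.
table = {
    "title": ["untitled-1", "untitled-2"],        # structured field (made up)
    "year": [1900, 1950],                         # structured field (made up)
    "img": [b"fake-image-1", b"fake-image-2"],    # unstructured objects
}

# Derive a vector column by applying the encoder to each unstructured object.
table["embedding"] = [toy_embed(img) for img in table["img"]]

# Structured, unstructured and vector columns stay aligned row by row.
n_rows = len(table["title"])
print(all(len(col) == n_rows for col in table.values()))
```

The point of the sketch is only the shape of the data: one row can carry a number, a raw object and a vector at the same time, and the vector column is computed from the object column.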

Interactivity in Python

Meerkat provides interactive data frame visualizations that let you control foundation models as they process your data.
  • Meerkat visualizations are implemented in Python, so they can be composed and customized in notebooks or data scripts.
  • Labeling is critical for instructing and validating foundation models. Labeling GUIs are a priority in Meerkat.
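
# Match component: scores rows of df using the CLIP embedding column.
match = mk.gui.Match(
    df,
    against="embedding",
    engine="clip"
)
# Sort by the match score so the best-matching rows come first.
sorted_df = mk.sort(
    df,
    by=match.criterion.name,
    ascending=False
)
# Display the match component together with the ranked gallery.
gallery = mk.gui.Gallery(sorted_df)
mk.gui.html.div([match, gallery])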
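
The match-then-sort pattern above can be illustrated without any GUI. The sketch below is plain Python, not Meerkat code. It assumes the match score is cosine similarity between each row's embedding and a query embedding, which is a common choice but an assumption here. The helper names cosine and rank_by_match are hypothetical, and the vectors are toy values.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_match(rows, query, key="embedding"):
    """Attach a score to each row, then sort with the best match first."""
    scored = [dict(row, score=cosine(row[key], query)) for row in rows]
    return sorted(scored, key=lambda r: r["score"], reverse=True)


# Toy rows with 2-dimensional embeddings (illustrative values only).
rows = [
    {"id": "a", "embedding": [1.0, 0.0]},
    {"id": "b", "embedding": [0.0, 1.0]},
    {"id": "c", "embedding": [1.0, 1.0]},
]
query = [1.0, 0.0]

ranked = rank_by_match(rows, query)
print([r["id"] for r in ranked])  # prints ['a', 'c', 'b']
```

Sorting in descending order of score mirrors ascending=False in the snippet above: the rows most similar to the query end up at the top of the gallery.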

Built for technical teams

๐Ÿงช๏ธ Data Science Teams

Data frames, visualizations and interactive data analysis over unstructured data in Jupyter Notebooks with pure Python.

๐Ÿ‘จโ€๐Ÿ’ป๏ธ Software Engineering Teams

Fully custom applications in SvelteKit that seamlessly connect to unstructured data and model APIs in Python.

๐Ÿค–๏ธ Machine Learning Teams

Graphical user interfaces to prompt and control foundation models, collect feedback and iterate, all with Python scripting.

...with the support of

Stanford CRFM
Stanford HAI
Svelte
Tailwind
Flowbite
Pydantic
FastAPI
Pandas
NumPy
PyTorch
Apache Arrow
Hugging Face