Unstructured Datasets meet Foundation Models.
Meerkat is an open-source Python library designed to help technical teams interactively wrangle images, videos, text documents, and more with foundation models.
Our goal is to make foundation models a more reliable software abstraction for processing unstructured datasets. Read our blog post to learn more.
Install Meerkat
$ pip install meerkat-ml
Notice
Meerkat is a research project, so users should expect rapid
updates and rough edges. The current API is subject to
change.
Data Frames for Images
A Meerkat DataFrame is a heterogeneous data structure with an API backed by
foundation models.
- Structured fields (e.g. numbers and dates) live alongside unstructured objects (e.g. images), and their tensor representations (e.g. embeddings).
- Functions like mk.embed abstract away boilerplate ML code, keeping the focus on the data.
import meerkat as mk

df = mk.from_csv("paintings.csv")
df["img"] = mk.files("img_path")
df["embedding"] = mk.embed(
    df["img"],
    engine="clip"
)
Interactivity in Python
Meerkat provides interactive data frame visualizations that let you
control foundation models as they process your
data.
- Meerkat visualizations are implemented in Python, so they can be composed and customized in notebooks or data scripts.
- Labeling is critical for instructing and validating foundation models. Labeling GUIs are a priority in Meerkat.
match = mk.gui.Match(
    df,
    against="embedding",
    engine="clip"
)
sorted_df = mk.sort(
    df,
    by=match.criterion.name,
    ascending=False
)
gallery = mk.gui.Gallery(sorted_df)
mk.gui.html.div([match, gallery])
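To build intuition for the match-then-sort pattern in the snippet above, here is a minimal plain-Python sketch of ranking rows by how similar their embeddings are to a query embedding. It is not Meerkat's implementation and uses no Meerkat APIs. The helper names (cosine_similarity, rank_rows), the choice of cosine similarity as the score, and the tiny two-dimensional vectors standing in for real CLIP embeddings are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_rows(rows, query, key="embedding"):
    """Score every row against the query and return the rows, best match first.

    Hypothetical helper: each row is a dict mixing structured fields
    (here, a title) with an embedding stored under `key`.
    """
    scored = [dict(row, score=cosine_similarity(row[key], query)) for row in rows]
    return sorted(scored, key=lambda row: row["score"], reverse=True)


# Toy rows: 2-D vectors stand in for real image embeddings.
rows = [
    {"title": "A", "embedding": [1.0, 0.0]},
    {"title": "B", "embedding": [0.0, 1.0]},
    {"title": "C", "embedding": [1.0, 1.0]},
]

# In a real workflow the query vector would come from embedding a text prompt.
query = [1.0, 0.2]

ranked = rank_rows(rows, query)
ranked_titles = [row["title"] for row in ranked]
```

With these toy vectors, row A points most nearly in the query's direction, so it ranks first, followed by C and then B. Sorting in descending order of score plays the same role as ascending=False in the Meerkat snippet.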
Built for technical teams
🧪 Data Science Teams
Data frames, visualizations and interactive data analysis over unstructured data in Jupyter Notebooks with pure Python.
👨‍💻 Software Engineering Teams
Fully custom applications in SvelteKit that seamlessly connect to unstructured data and model APIs in Python.
🤖 Machine Learning Teams
Graphical user interfaces to prompt and control foundation models, collect feedback and iterate, all with Python scripting.