Unstructured Datasets meet Foundation Models.
Meerkat is an open-source Python library designed to help technical teams interactively wrangle images, videos, text documents, and more with foundation models.
Our goal is to make foundation models a more reliable software abstraction for processing unstructured datasets. Read our blog post to learn more.
$ pip install meerkat-ml
Meerkat is a research project, so users should expect rapid updates and rough edges. The current API is subject to change.
Data Frames for Images
A Meerkat DataFrame is a heterogeneous data structure with an API backed by foundation models.
- Structured fields (e.g. numbers and dates) live alongside unstructured objects (e.g. images), and their tensor representations (e.g. embeddings).
- Functions like mk.embed abstract away boilerplate ML code, keeping the focus on the data.
import meerkat as mk

df = mk.from_csv("paintings.csv")
df["img"] = mk.files("img_path")
df["embedding"] = mk.embed(
    df["img"],
    engine="clip"
)
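To make the idea of a heterogeneous data frame concrete, here is a minimal sketch in plain Python that does not use Meerkat at all. It keeps structured fields, a pointer to an unstructured object, and a derived vector column side by side. The `toy_embed` function, the column names, and the sample values are all hypothetical: a real encoder such as CLIP would compute the vector from the image contents, not from the path string.

```python
import hashlib


def toy_embed(path, dim=4):
    """Hypothetical stand-in for a real encoder.

    Derives a deterministic vector from the path string. A real model
    would load the image and compute the vector from its pixels.
    """
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


# Column-oriented frame: every column holds one entry per row.
# All values below are made-up sample data.
frame = {
    "title": ["sample_a", "sample_b"],            # structured: text
    "year": [1906, 1889],                         # structured: number
    "img_path": ["img/a.jpg", "img/b.jpg"],       # pointer to an unstructured object
}

# Derived column: one vector per row, stored next to the other fields.
frame["embedding"] = [toy_embed(p) for p in frame["img_path"]]


def row(frame, i):
    """Return row i as a dict mapping column name to value."""
    return {name: column[i] for name, column in frame.items()}


print(sorted(row(frame, 0).keys()))
```

The point of the sketch is only the layout: numbers, file references, and vectors share one row index, so a derived column such as `embedding` can be added without leaving the frame.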
Interactivity in Python
Meerkat provides interactive data frame visualizations that let you control foundation models as they process your data.
- Meerkat visualizations are implemented in Python, so they can be composed and customized in notebooks or data scripts.
- Labeling is critical for instructing and validating foundation models. Labeling GUIs are a priority in Meerkat.
match = mk.gui.Match(
    df,
    against="embedding",
    engine="clip"
)
sorted_df = mk.sort(
    df,
    by=match.criterion.name,
    ascending=False
)
gallery = mk.gui.Gallery(sorted_df)
mk.gui.html.div([match, gallery])
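The match-then-sort pattern in the snippet above can be illustrated without Meerkat. The sketch below scores each row by cosine similarity between its embedding and a query embedding, then sorts in descending order. This is an assumption about the general technique, not a description of how `mk.gui.Match` is implemented; the function names, the `match_score` field, and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_match(rows, query_embedding, against="embedding"):
    """Attach a match score to each row and return rows sorted best-first."""
    scored = [
        dict(r, match_score=cosine_similarity(r[against], query_embedding))
        for r in rows
    ]
    return sorted(scored, key=lambda r: r["match_score"], reverse=True)


# Toy rows with made-up 2-dimensional embeddings.
rows = [
    {"name": "a", "embedding": [1.0, 0.0]},
    {"name": "b", "embedding": [0.0, 1.0]},
    {"name": "c", "embedding": [0.6, 0.8]},
]

# In practice the query vector would come from encoding a text or image query.
query = [0.0, 1.0]

ranked = rank_by_match(rows, query)
print([r["name"] for r in ranked])  # ['b', 'c', 'a']
```

Sorting with `reverse=True` plays the same role as `ascending=False` in the snippet above: the rows that best match the query appear first in the gallery.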
Built for technical teams
🧪️ Data Science Teams
Data frames, visualizations and interactive data analysis over unstructured data in Jupyter Notebooks with pure Python.
👨‍💻 Software Engineering Teams
Fully custom applications in SvelteKit that seamlessly connect to unstructured data and model APIs in Python.
🤖️ Machine Learning Teams
Graphical user interfaces to prompt and control foundation models, collect feedback and iterate, all with Python scripting.