tribev2: pretrained models for predicting brain responses to videos

Predicting how the brain responds to complex stimuli like videos is a hard problem that spans neuroscience, machine learning, and multimodal data processing. tribev2, from Facebook Research, tackles it with pretrained models that predict brain activity patterns on a cortical mesh from video inputs, and it can also take text and audio. The repo is a solid resource if you work at the intersection of AI and brain imaging, or if you want to explore neural response modeling without building everything from scratch.

What tribev2 does and how it works

tribev2 is a Python-based project that exposes pretrained models for predicting brain responses to video stimuli. The predictions are mapped onto the fsaverage5 cortical mesh, a widely used standard surface with 10,242 vertices per hemisphere (roughly 20,000 in total), each representing a point on the cortical surface. Working on a shared mesh lets you visualize and analyze activation patterns spatially.
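The vertex count follows from how the fsaverage meshes are built: each hemisphere is an icosahedron whose faces are subdivided repeatedly, and fsaverage5 corresponds to five rounds of subdivision. As a quick check of the arithmetic (the function name is only for illustration):

```python
def icosphere_vertices(order: int) -> int:
    """Vertex count of an icosahedron after `order` rounds of 4-way face subdivision."""
    return 10 * 4 ** order + 2


per_hemisphere = icosphere_vertices(5)   # fsaverage5 uses order 5
both_hemispheres = 2 * per_hemisphere    # left + right hemisphere

print(per_hemisphere, both_hemispheres)  # 10242 20484
```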

The repo primarily consists of Jupyter Notebooks and Python code leveraging PyTorch under the hood. It integrates pretrained models hosted on HuggingFace, making model loading straightforward. The key feature is the ability to process multiple modalities: video, text, and audio. Text inputs are automatically converted to speech and transcribed to align word-level timing with the video timeline, enabling precise event extraction.

tribev2 also accounts for the hemodynamic lag, the delay of several seconds between neural activity and the blood-flow changes that fMRI actually measures, by offsetting predictions by about 5 seconds into the past. This temporal adjustment helps align the stimulus timeline with the recorded brain responses.
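How tribev2 applies this offset internally is not covered here, so the snippet below is only a toy illustration of the idea: a response observed at time t is attributed to the stimulus about 5 seconds earlier. The function name, the constant, and the 2-second sampling interval are all assumptions made for the example.

```python
import numpy as np

HEMODYNAMIC_LAG_S = 5.0  # approximate lag quoted above; an assumption, not a repo constant


def stimulus_times(response_times_s, lag_s=HEMODYNAMIC_LAG_S):
    """Map the times at which a response is observed back to the stimulus times that likely drove it."""
    return np.asarray(response_times_s, dtype=float) - lag_s


# Hypothetical response samples taken every 2 seconds, starting at t = 6 s
response_times = np.arange(6.0, 16.0, 2.0)
aligned = stimulus_times(response_times)

print(aligned)  # [1. 3. 5. 7. 9.]
```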

What makes tribev2 technically interesting

The repo’s strength lies in its domain-specific integration of neuroscience knowledge with machine learning pipelines. It’s not just a black-box video-to-output model; it embeds neuroscientific constraints explicitly, such as the hemodynamic lag compensation and mapping to a standard cortical mesh.

The model’s predictions represent the “average” subject rather than individual brain responses, which is a tradeoff that simplifies the model but limits personalization. This is worth understanding if you’re thinking about applying it to subject-specific neuroimaging data.

Code-wise, the repo is surprisingly clean given the complexity of the domain. It exposes a simple API through the TribeModel class with methods for loading pretrained weights, extracting time-aligned event dataframes from videos (or audio/text), and predicting brain responses.

The integration with HuggingFace makes the model loading and caching seamless, improving developer experience. The use of pandas dataframes for event timing is a practical design choice that fits well with scientific workflows.
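To make the dataframe point concrete, here is a minimal sketch of the kind of event table this design enables. The column names and values are hypothetical, not tribev2's actual schema, and `events_in_window` is a helper invented for the example.

```python
import pandas as pd

# Hypothetical event table: column names are illustrative, not tribev2's actual schema
events = pd.DataFrame(
    {
        "type": ["Video", "Word", "Word", "Word"],
        "start": [0.0, 0.4, 0.9, 1.6],      # seconds from stimulus onset
        "duration": [10.0, 0.3, 0.5, 0.4],  # seconds
        "text": [None, "the", "brain", "responds"],
    }
)


def events_in_window(df, t0, t1):
    """Return the events that overlap the half-open time window [t0, t1)."""
    end = df["start"] + df["duration"]
    return df[(df["start"] < t1) & (end > t0)]


window = events_in_window(events, 0.5, 1.0)
print(window[["type", "start", "text"]])
```

Because events are ordinary rows, the usual pandas tools (filtering, grouping, merging with other annotations) apply without any custom data structures.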

One limitation is that the repo focuses on inference and pretrained models; training your own models requires additional dependencies and setup, which are optional and installed separately.

Quick start with tribev2

Getting up and running with tribev2’s inference is straightforward if you have Python and pip set up. Here’s the exact sequence from the repo’s quick start section:

from tribev2 import TribeModel

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")

df = model.get_events_dataframe(video_path="path/to/video.mp4")
preds, segments = model.predict(events=df)
print(preds.shape)  # (n_timesteps, n_vertices)

This loads the pretrained model from HuggingFace, extracts event timings from a video file, and predicts the brain responses mapped on the fsaverage5 mesh.
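Once you have a predictions array, ordinary NumPy is enough for a first look. The sketch below uses a random array as a stand-in for preds, assuming the (n_timesteps, n_vertices) layout printed above and 20,484 fsaverage5 vertices, and ranks vertices by how much their predicted response varies over the clip. The variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for `preds`: 30 timesteps x 20,484 vertices (assumed layout)
n_timesteps, n_vertices = 30, 20484
preds = rng.standard_normal((n_timesteps, n_vertices))

# Temporal variability per vertex: how much the predicted response changes over time
variability = preds.std(axis=0)

# Indices of the 10 most variable vertices, most variable first
top_k = 10
top_vertices = np.argsort(variability)[::-1][:top_k]

print(variability.shape)  # (20484,)
print(top_vertices)
```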

You can also pass text_path or audio_path to get_events_dataframe to use those modalities. For text, the repo converts the input to speech and then transcribes it, which yields the word-level timings that the event-based prediction approach relies on.
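If you handle mixed stimulus files, a small dispatcher can pick the right keyword argument for you. This is a sketch under assumptions: the helper name and the extension-to-argument mapping are invented for illustration, and only the three argument names come from the description above.

```python
from pathlib import Path

# Assumed mapping from file extension to keyword argument.
# The extension list is illustrative, not taken from the repo.
_KWARG_BY_SUFFIX = {
    ".mp4": "video_path",
    ".wav": "audio_path",
    ".txt": "text_path",
}


def events_kwargs(path: str) -> dict:
    """Build the keyword argument for get_events_dataframe from a file path."""
    suffix = Path(path).suffix.lower()
    if suffix not in _KWARG_BY_SUFFIX:
        raise ValueError(f"Unsupported stimulus file type: {suffix!r}")
    return {_KWARG_BY_SUFFIX[suffix]: path}


print(events_kwargs("clip.MP4"))       # {'video_path': 'clip.MP4'}
print(events_kwargs("narration.wav"))  # {'audio_path': 'narration.wav'}

# With tribev2 installed and a model loaded as in the quick start:
# df = model.get_events_dataframe(**events_kwargs("story.txt"))
```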

Installation commands are explicit and separated by use case. Since they are editable installs, run them from the root of a local clone of the repo:

pip install -e .

for inference only, or

pip install -e ".[plotting]"

to add brain visualization support, and

pip install -e ".[training]"

if you want training dependencies like PyTorch Lightning and Weights & Biases.

Who tribev2 is for and final thoughts

tribev2 is a niche but valuable tool for researchers and developers working on computational neuroscience, brain imaging, and multimodal neural response modeling. It’s not a general-purpose video or audio analysis library — the models and data structures are designed specifically around fMRI brain response prediction and a standard cortical mesh.

If you want to experiment with brain activity prediction from videos or integrate this kind of modeling into neuroimaging research pipelines, tribev2 gives you a solid, well-documented starting point with pretrained weights and a clean API.

The tradeoff is clear: it predicts average brain responses, not personalized ones, and training your own models involves a more complex setup. Still, the repo’s modular design and HuggingFace integration make it approachable for Python researchers with some machine learning background.

Worth exploring if you want to bridge AI and neuroscience without reinventing the wheel.

→ GitHub Repo: facebookresearch/tribev2 ⭐ 2,383 · Jupyter Notebook

Noureddine RAMDI / tribev2: pretrained models for predicting brain responses to videos
