AI-ML-Cheatsheets offers a modular, offline-ready collection of concise AI/ML reference sheets from foundational math to transformers and large language models.
Fast3R from Meta FAIR reconstructs 3D scenes from 1000+ unordered images in a single forward pass, using a ViT-Large backbone and multi-view attention and eliminating iterative matching.
Lynx generates personalized videos from a single image using a frozen Diffusion Transformer with ID and Ref adapters. This modular design balances fidelity and efficiency.
pdf-document-layout-analysis is a Dockerized microservice for PDF layout analysis with two modes: a Vision Grid Transformer for higher accuracy and LightGBM models for faster processing. It also provides OCR, translation, and multi-format export.
Falcon-Perception is a PyTorch engine for multimodal autoregressive Transformers handling detection, segmentation, and OCR with FlexAttention and efficient caching.
The Hands-On Large Language Models repo is a practical, Jupyter notebook-based guide that runs from fundamentals to fine-tuning and is designed to work on free Colab GPUs.
LingBot-Map performs streaming 3D reconstruction from long image sequences at ~20 FPS using a geometric context transformer and paged KV cache attention for efficient memory management.
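To make the paged KV cache idea concrete, here is a minimal pure-Python sketch. It is not LingBot-Map's implementation: the class name `PagedKVCache`, its methods, and the oldest-page eviction policy are all hypothetical and chosen only for illustration. The sketch shows the core bookkeeping: keys and values live in fixed-size pages drawn from a shared pool, and a page table maps logical token positions to physical pages.

```python
class PagedKVCache:
    """Toy paged KV cache (illustrative only, not LingBot-Map's code)."""

    def __init__(self, num_pages, page_size):
        self.page_size = page_size
        # Each physical page stores up to page_size (key, value) pairs.
        self.pages = [[] for _ in range(num_pages)]
        self.free_pages = list(range(num_pages))
        self.page_table = []  # logical page index -> physical page id
        self.length = 0       # number of tokens currently retained

    def append(self, key, value):
        # A new physical page is claimed only when the last one is full.
        if self.length % self.page_size == 0:
            if not self.free_pages:
                raise MemoryError("KV cache pool exhausted")
            self.page_table.append(self.free_pages.pop())
        self.pages[self.page_table[-1]].append((key, value))
        self.length += 1

    def get(self, position):
        # Positions are relative to the oldest token still retained.
        page_id = self.page_table[position // self.page_size]
        return self.pages[page_id][position % self.page_size]

    def evict_oldest_page(self):
        # Assumed policy for streaming: drop the oldest page and
        # return it to the pool so new frames can reuse the memory.
        if not self.page_table:
            return
        page_id = self.page_table.pop(0)
        self.length -= len(self.pages[page_id])
        self.pages[page_id].clear()
        self.free_pages.append(page_id)
```

Because memory is handed out page by page, a long stream never needs one contiguous buffer, and evicting a page frees a fixed-size slot that the next `append` can reuse.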
Google DeepMind’s representations4d bundles three transformer-based approaches to self-supervised video learning, including a novel object-centric tracking method whose latent tokens move off the pixel grid.
ByteDance’s In-Place TTT enables adaptive transformer inference by updating MLP down-projection weights in-place at test time, supporting long-context reasoning without extra modules.
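The idea of updating only the MLP down-projection at test time can be sketched in a few lines of pure Python. This is a toy illustration, not ByteDance's code: the function names are hypothetical, and the squared-error objective toward a `target` vector is an assumption made here so the gradient is easy to write out. The blurb above does not say which objective the method actually uses.

```python
def relu(vector):
    return [max(0.0, x) for x in vector]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def mlp_forward(w_up, w_down, x):
    # Standard two-layer MLP: up-projection, activation, down-projection.
    hidden = relu(matvec(w_up, x))
    return hidden, matvec(w_down, hidden)


def ttt_step(w_up, w_down, x, target, lr=0.1):
    """One test-time step that mutates w_down in place; w_up stays frozen.

    Assumed loss: 0.5 * ||w_down @ hidden - target||^2, whose gradient
    with respect to w_down is the outer product of error and hidden.
    """
    hidden, output = mlp_forward(w_up, w_down, x)
    error = [o - t for o, t in zip(output, target)]
    for i, e in enumerate(error):
        for j, h in enumerate(hidden):
            w_down[i][j] -= lr * e * h
    # Loss measured before the update was applied.
    return 0.5 * sum(e * e for e in error)


w_up = [[1.0, 0.0], [0.0, 1.0]]
w_down = [[0.5, 0.0], [0.0, 0.5]]
loss_before = ttt_step(w_up, w_down, x=[1.0, 2.0], target=[1.0, 2.0])
loss_after = ttt_step(w_up, w_down, x=[1.0, 2.0], target=[1.0, 2.0])
```

No extra module or parameter is introduced: the same `w_down` list is reused and overwritten, which is the "in-place" property the blurb highlights.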
OmniStream uses a multi-frame transformer to process continuous video streams with patch-level temporal indexing, supporting downstream vision-language-action tasks.
OpenMythos implements a recurrent-depth transformer that recycles layers via looped blocks, using input injection to prevent signal drift. It scales from 1B to 1T parameters with up to a 1M-token context.
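A rough sketch of looped blocks with input injection follows. It is not OpenMythos code: a tiny elementwise `tanh` update stands in for a full transformer block, and the names `shared_block` and `recurrent_depth_forward`, along with the `decay` and `gain` parameters, are hypothetical. The point is only to show one weight-tied block being applied repeatedly while the original embedding is fed back in on every loop.

```python
import math


def shared_block(state, injected, decay=0.5, gain=1.0):
    # One weight-tied "layer": the same parameters are reused on every loop.
    return [math.tanh(decay * s + gain * e) for s, e in zip(state, injected)]


def recurrent_depth_forward(embedding, num_loops, inject=True):
    state = list(embedding)
    zeros = [0.0] * len(embedding)
    for _ in range(num_loops):
        # With injection, the input embedding is re-added at every
        # iteration so the state keeps a direct link to the input.
        state = shared_block(state, embedding if inject else zeros)
    return state


with_injection = recurrent_depth_forward([1.0, -1.0], num_loops=50)
without_injection = recurrent_depth_forward([1.0, -1.0], num_loops=50, inject=False)
```

In this toy setting, the state without injection shrinks toward zero as the loop count grows, while the injected version settles on a value that still reflects the sign and size of the input. Effective depth is set by `num_loops` rather than by adding parameters.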
This repo provides annotated PyTorch implementations of major deep learning papers with side-by-side explanations, aiding understanding and prototyping.
Hugging Face Transformers offers a unified Python API for over 1 million pretrained models across text, vision, and audio. Its Pipeline API wraps preprocessing, inference, and postprocessing in a single call.