ALICE is a Python-based toolkit for managing YOLO training datasets from home camera setups, featuring a unique single-file builder and seamless Frigate NVR integration.
DeepSpeed is a Python library that optimizes large-scale deep learning training with multi-hardware support and JIT CUDA extensions. Explore its architecture, strengths, and quick installation.
DiT4DiT uses a frozen Cosmos-Predict2.5 video transformer backbone combined with flow-matching action heads to model robot actions as video latent transitions, achieving near-perfect success on LIBERO and real-time humanoid control.
Hivemind is a PyTorch library enabling decentralized deep learning over the internet using a peer-to-peer Distributed Hash Table (DHT). It supports fault-tolerant training and decentralized parameter averaging without global sync.
ML-From-Scratch offers bare-bones Python implementations of key machine learning algorithms using only NumPy, focusing on transparency over efficiency. Explore how it demystifies ML fundamentals.
OmniGen2 unifies visual understanding, text-to-image generation, and image editing using distinct decoding pathways for text and images, built on Qwen-VL-2.5 with CPU offloading for accessibility.
OverlapNet uses Siamese networks on 2D range images from 3D LiDAR to detect loop closures by predicting overlap and relative yaw angle simultaneously. Practical demos included.
Pixal3D generates high-fidelity 3D assets with PBR textures from a single image using pixel-aligned projection conditioning. It offers a three-stage cascade and low-VRAM mode for consumer GPUs.
SAM3-UNet adapts Meta’s SAM3 foundation model for dense prediction tasks using a parameter-efficient adapter and U-Net decoder, enabling training under 6 GB GPU memory.
SVFR combines blind face restoration, colorization, and inpainting in a single stable video diffusion model, enabling efficient multi-task video face enhancement.
Tencent’s HY-World 2.0 generates persistent 3D assets from text, images, or video using a four-stage pipeline. It outputs editable worlds compatible with Blender, Unity, and Unreal Engine.
Explore how ai-by-hand-excel implements deep learning architectures like Transformers entirely in Excel formulas, exposing the math behind AI step-by-step without code.
CodeFormer uses a codebook transformer architecture for blind face restoration, letting users control the tradeoff between quality and fidelity with a unique fidelity weight parameter.
Medical-SAM3 adapts the SAM3 foundation model for universal prompt-driven medical image segmentation, offering pretrained weights and evaluation tools on diverse medical datasets.
Explore a 100-day machine learning coding challenge combining classical algorithms, deep learning, and curated resources. A practical, day-by-day learning path for self-directed devs.
AniGen is a Linux-only Python project for 3D animation generation using NVIDIA GPUs and CUDA. It integrates PyTorch, spconv, and pytorch3d with a smooth setup script for complex dependencies.
Awesome-Deblurring compiles 100+ key papers tracing image and video deblurring from classical optimization to modern deep learning, serving as a go-to bibliography for researchers and developers.
DIMO distills motion priors from text-conditioned and multi-view video models into a shared latent space, enabling diverse 3D motion generation for arbitrary objects using 3D Gaussian splatting and 4D rendering.
Magika replaces magic-byte heuristics with a tiny deep learning model for file type detection, achieving ~99% accuracy across 200+ types with 5ms CPU inference.
Omni-Diffusion models text, image, and speech tokens jointly via masked discrete diffusion, enabling any-to-any multimodal generation with a single unified model.