Transformers on Noureddine RAMDI

https://ramdi.fr/tags/transformers/
Recent content in Transformers on Noureddine RAMDI
Sat, 23 May 2026 20:41:27 +0000

AI-ML-Cheatsheets: a structured collection of AI and machine learning reference sheets
https://ramdi.fr/github-stars/ai-ml-cheatsheets-a-structured-collection-of-ai-and-machine-learning-reference-sheets/
Sat, 23 May 2026 20:41:14 +0000
AI-ML-Cheatsheets offers a modular, offline-ready collection of concise AI/ML reference sheets from foundational math to transformers and large language models.

Fast3R: scalable multi-view 3D reconstruction with a single forward pass
https://ramdi.fr/github-stars/fast3r-scalable-multi-view-3d-reconstruction-with-a-single-forward-pass/
Sat, 23 May 2026 20:41:14 +0000
Fast3R from Meta FAIR processes 1000+ unordered images simultaneously for 3D reconstruction using a ViT-Large backbone and multi-view attention, eliminating iterative matching.

Lynx: modular personalized video generation with dual adapters on a frozen diffusion transformer
https://ramdi.fr/github-stars/lynx-modular-personalized-video-generation-with-dual-adapters-on-a-frozen-diffusion-transformer/
Sat, 23 May 2026 20:41:14 +0000
Lynx generates personalized videos from a single image using a frozen Diffusion Transformer with ID and Ref adapters. This modular design balances fidelity and efficiency.

pdf-document-layout-analysis: a dual-model PDF layout analysis microservice with Docker deployment
https://ramdi.fr/github-stars/pdf-document-layout-analysis-a-dual-model-pdf-layout-analysis-microservice-with-docker-deployment/
Sat, 23 May 2026 20:41:14 +0000
pdf-document-layout-analysis is a Dockerized microservice using Vision Grid Transformer and LightGBM for PDF layout analysis, offering high accuracy or fast processing with OCR, translation, and multi-format export.

Falcon-Perception: a minimal multimodal PyTorch engine for object detection, segmentation, and OCR
https://ramdi.fr/github-stars/falcon-perception-a-minimal-multimodal-pytorch-engine-for-object-detection-segmentation-and-ocr/
Mon, 04 May 2026 10:23:02 +0000
Falcon-Perception is a PyTorch engine for multimodal autoregressive Transformers handling detection, segmentation, and OCR with FlexAttention and efficient caching.

Hands-On Large Language Models: A practical, visual journey through LLM engineering
https://ramdi.fr/github-stars/hands-on-large-language-models-a-practical-visual-journey-through-llm-engineering/
Mon, 04 May 2026 10:23:02 +0000
Explore the Hands-On Large Language Models repo, a Jupyter notebook-based practical guide from fundamentals to fine-tuning, designed for hands-on LLM learning on free Colab GPUs.

Streaming 3D scene reconstruction with LingBot-Map’s geometric context transformer
https://ramdi.fr/github-stars/streaming-3d-scene-reconstruction-with-lingbot-maps-geometric-context-transformer/
Mon, 04 May 2026 10:23:02 +0000
LingBot-Map performs streaming 3D reconstruction from long image sequences at ~20 FPS using a geometric context transformer and paged KV cache attention for efficient memory management.

Exploring DeepMind's representations4d: advanced self-supervised video representations with moving latent tokens
https://ramdi.fr/github-stars/exploring-deepmind-s-representations4d-advanced-self-supervised-video-representations-with-moving-latent-tokens/
Mon, 04 May 2026 10:23:01 +0000
Google DeepMind’s representations4d bundles three self-supervised video learning approaches using transformers, including a novel object-centric tracking method with latent tokens moving off the pixel grid.

In-Place TTT: Adaptive test-time training for transformer LLMs with in-place fast-weight updates
https://ramdi.fr/github-stars/in-place-ttt-adaptive-test-time-training-for-transformer-llms-with-in-place-fast-weight-updates/
Mon, 04 May 2026 10:23:01 +0000
ByteDance’s In-Place TTT enables adaptive transformer inference by updating MLP down-projection weights in-place at test time, supporting long-context reasoning without extra modules.

OmniStream: a multi-frame transformer for continuous video stream perception
https://ramdi.fr/github-stars/omnistream-a-multi-frame-transformer-for-continuous-video-stream-perception/
Mon, 04 May 2026 10:23:01 +0000
OmniStream uses a multi-frame transformer to process continuous video streams with patch-level temporal indexing, supporting downstream vision-language-action tasks.

OpenMythos: Exploring recurrent-depth transformers with input injection for sustained reasoning
https://ramdi.fr/github-stars/openmythos-exploring-recurrent-depth-transformers-with-input-injection-for-sustained-reasoning/
Mon, 04 May 2026 10:23:01 +0000
OpenMythos implements a recurrent-depth transformer that recycles layers via looped blocks, using input injection to prevent signal drift. It scales from 1B to 1T parameters with up to 1M token context.

annotated_deep_learning_paper_implementations: annotated PyTorch implementations of key deep learning papers
https://ramdi.fr/github-stars/annotated-deep-learning-paper-implementations-annotated-pytorch-implementations-of-key-deep-learning-papers/
Sat, 02 May 2026 20:07:04 +0000
This repo provides annotated PyTorch implementations of major deep learning papers with side-by-side explanations, aiding understanding and prototyping.

Hugging Face Transformers: a unified API for state-of-the-art AI models across modalities
https://ramdi.fr/github-stars/hugging-face-transformers-a-unified-api-for-state-of-the-art-ai-models-across-modalities/
Sun, 26 Apr 2026 09:31:26 +0000
Hugging Face Transformers offers a unified Python API to access over 1 million pretrained AI models for text, vision, and audio, simplifying complex pipelines with its Pipeline API.