Automating knowledge graph extraction from text with LangChain and GPT-4o

Knowledge graphs are a powerful way to represent entities and their relationships, but manually creating them from unstructured text is tedious and error-prone. This repository tackles that pain point by combining LangChain’s experimental graph transformers with OpenAI’s GPT-4o to extract a knowledge graph from text automatically and render it as an interactive visualization.

What this repo does and how it works

At its core, this project is a Streamlit application that takes unstructured text input and outputs a visual knowledge graph representing entities and their relationships. It uses LangChain’s experimental graph transformers to parse the text and identify nodes (entities) and edges (relationships), leveraging GPT-4o as the LLM engine for entity recognition and relation extraction.
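To make the extraction step concrete, here is a minimal sketch of how a graph transformer call might look. LLMGraphTransformer, ChatOpenAI, and Document are real LangChain classes, but the function names extract_graph_documents and to_triples are hypothetical, and whether the repo wires things up exactly this way is an assumption. The stand-in objects at the bottom only mimic the shape of a graph document so the flattening helper can be tried without an API call.

```python
from types import SimpleNamespace


def extract_graph_documents(text, model="gpt-4o"):
    """Ask the LLM to turn raw text into graph documents (needs OPENAI_API_KEY)."""
    # Imported lazily so the helper below works without these packages installed.
    from langchain_core.documents import Document
    from langchain_experimental.graph_transformers import LLMGraphTransformer
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(model=model, temperature=0)
    transformer = LLMGraphTransformer(llm=llm)
    return transformer.convert_to_graph_documents([Document(page_content=text)])


def to_triples(graph_documents):
    """Flatten graph documents into (source, relation, target) tuples."""
    triples = []
    for doc in graph_documents:
        for rel in doc.relationships:
            triples.append((rel.source.id, rel.type, rel.target.id))
    return triples


# Stand-in objects (hypothetical data) shaped like a graph document.
alice = SimpleNamespace(id="Alice", type="Person")
acme = SimpleNamespace(id="Acme", type="Organization")
demo = SimpleNamespace(
    nodes=[alice, acme],
    relationships=[SimpleNamespace(source=alice, target=acme, type="WORKS_AT")],
)
triples = to_triples([demo])
```

With a valid API key, to_triples(extract_graph_documents("some text")) would return a similar list, though the entities and relation names depend on what the model produces.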

The system supports two input modes: users can either type or paste text directly into a text box or upload a plain text (.txt) file. Once processed, the extracted graph data is rendered interactively using PyVis, which applies a physics-based layout to the nodes and edges, making it easy to explore complex graphs in the browser.
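The two input modes can be reduced to one small helper. The sketch below assumes the app ends up with either a string from a text box or raw bytes from an uploaded file (in Streamlit, st.text_area returns a string and reading the object from st.file_uploader yields bytes). The function name resolve_input_text and the rule that an upload takes priority are illustrative assumptions, not taken from the repo.

```python
def resolve_input_text(typed_text="", uploaded_bytes=None, encoding="utf-8"):
    """Return the text to process; an uploaded .txt file wins over typed text."""
    if uploaded_bytes is not None:
        # Replace undecodable bytes instead of failing on a stray character.
        text = uploaded_bytes.decode(encoding, errors="replace")
    else:
        text = typed_text
    text = text.strip()
    if not text:
        raise ValueError("No input text provided")
    return text
```

Rejecting empty input before calling the model avoids paying for a request that cannot produce a graph.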

Under the hood, the architecture centers on LangChain’s LLM framework. The graph transformer is an experimental feature within LangChain that orchestrates the extraction of structured graph data from raw text using a large language model. The extracted graph structure is then passed to PyVis for rendering.
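The hand-off to PyVis could look roughly like the following. Network, add_node, add_edge, barnes_hut, and save_graph are real PyVis calls; the helper names build_vis_payload and render_html, the output file name, and the choice of the Barnes-Hut physics preset are assumptions made for illustration. The node and relationship objects are expected to expose id, type, source, and target attributes, as LangChain graph documents do.

```python
from types import SimpleNamespace


def build_vis_payload(nodes, relationships):
    """Convert extracted nodes and relationships into plain dicts for rendering."""
    node_types = {n.id: n.type for n in nodes}
    edges = []
    for rel in relationships:
        # PyVis rejects edges whose endpoints were never added, so register them.
        node_types.setdefault(rel.source.id, rel.source.type)
        node_types.setdefault(rel.target.id, rel.target.type)
        edges.append({"source": rel.source.id, "target": rel.target.id, "label": rel.type})
    return {"nodes": [{"id": i, "type": t} for i, t in node_types.items()], "edges": edges}


def render_html(payload, path="knowledge_graph.html"):
    """Write an interactive, physics-driven HTML graph and return its path."""
    from pyvis.network import Network  # lazy import: only needed for rendering

    net = Network(height="600px", width="100%", directed=True)
    for node in payload["nodes"]:
        net.add_node(node["id"], label=node["id"], title=node["type"], group=node["type"])
    for edge in payload["edges"]:
        net.add_edge(edge["source"], edge["target"], label=edge["label"])
    net.barnes_hut()  # one of PyVis's built-in physics layouts
    net.save_graph(path)
    return path


# Hypothetical sample data to show the payload shape.
alice = SimpleNamespace(id="Alice", type="Person")
acme = SimpleNamespace(id="Acme", type="Organization")
rel = SimpleNamespace(source=alice, target=acme, type="WORKS_AT")
payload = build_vis_payload([alice], [rel])
```

Keeping the payload as plain dicts separates the extraction format from the renderer, so the same data could be passed to a different visualization library later.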

This repo demonstrates a practical pattern for combining LLMs with a graph visualization library to automate information extraction and representation, turning unstructured natural language into structured, navigable knowledge.

What stands out technically and the tradeoffs

The key technical strength here is the use of LangChain’s experimental graph transformers integrated with OpenAI’s GPT-4o model. This approach leverages the generative capabilities of GPT-4o to recognize entities and infer relationships without the need for handcrafted NLP pipelines or custom entity extraction models.

This means the repo sidesteps traditional heavy NLP preprocessing, instead relying on prompting strategies within LangChain to coax the LLM into outputting structured graph data. The tradeoff is that this method depends heavily on the quality and consistency of LLM outputs, which can sometimes vary or produce incomplete graphs.
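One way to soften that variability is a small post-processing pass over whatever the model returns. The sketch below is not part of the repo; clean_triples is a hypothetical helper that works on (source, relation, target) tuples, drops incomplete ones, normalizes relation names, and removes case-insensitive duplicates.

```python
def clean_triples(triples):
    """Normalize relation names, drop incomplete triples, and de-duplicate."""
    seen = set()
    cleaned = []
    for source, relation, target in triples:
        source = (source or "").strip()
        target = (target or "").strip()
        relation = (relation or "").strip().upper().replace(" ", "_")
        if not source or not target or not relation:
            continue  # the model left part of the triple empty
        key = (source.lower(), relation, target.lower())
        if key in seen:
            continue  # same fact already recorded, possibly with different casing
        seen.add(key)
        cleaned.append((source, relation, target))
    return cleaned
```

This does not recover relationships the model missed, but it keeps malformed or repeated edges out of the rendered graph.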

The code itself is organized as a Streamlit app, so you get an immediate web UI for input and visualization without extra frontend work. PyVis is a practical choice for visualization: it renders a physics-based, interactive graph in the browser, which is easier to explore than a static diagram.

However, this setup is not without limitations. Using an LLM for graph extraction can be costly for large inputs due to API usage, and the experimental status of the graph transformers in LangChain means the feature may evolve or encounter rough edges. Additionally, the entity and relationship extraction quality depends on the prompt design and the underlying model’s knowledge, which might not generalize well to very domain-specific texts.
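To get a feel for the cost concern, a rough sketch like the one below can split long input into chunks and estimate input-token spend before anything is sent to the API. Everything here is an assumption for illustration: the helper names, the 4,000-character chunk size, and the four-characters-per-token heuristic (a coarse rule of thumb for English text). The price is a parameter on purpose, since rates change, and the estimate ignores output tokens and prompt overhead.

```python
def chunk_text(text, max_chars=4000):
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Hard-split any single paragraph that exceeds the limit on its own.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            if current and len(current) + 2 + len(piece) > max_chars:
                chunks.append(current)
                current = piece
            else:
                current = f"{current}\n\n{piece}" if current else piece
    if current:
        chunks.append(current)
    return chunks


def estimate_input_cost(text, usd_per_million_input_tokens, chars_per_token=4.0):
    """Very rough input-side cost estimate in USD; not a substitute for a tokenizer."""
    tokens = len(text) / chars_per_token
    return tokens / 1_000_000 * usd_per_million_input_tokens
```

Each chunk could then be wrapped in its own document and passed to the graph transformer, with the resulting graphs merged afterwards.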

Quick start

Prerequisites

Python 3.8 or higher
OpenAI API key

Installation and setup

Clone the repo and move into the project directory:

git clone [repository-url]
cd knowledge_graph_app_2

Replace [repository-url] with the actual URL of the repository.

Then install the required dependencies:

pip install -r requirements.txt

Create a .env file in the root directory with your OpenAI API key:

OPENAI_API_KEY=your_openai_api_key_here

After this, run the Streamlit app (typically streamlit run app.py, assuming the main script is named app.py) and enter text or upload a file to see the extracted knowledge graph.
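If the app fails at startup, a missing or placeholder API key is a likely cause. The check below is a hypothetical helper, not code from the repo, and it assumes the .env file has already been loaded into the environment (for example by python-dotenv’s load_dotenv, which is a guess about how the app reads it).

```python
import os


def require_api_key(env=None):
    """Return the OpenAI API key or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    key = (env.get("OPENAI_API_KEY") or "").strip()
    # Treat the placeholder from the setup instructions as "not configured".
    if not key or key == "your_openai_api_key_here":
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```

Calling this once at startup surfaces a configuration problem immediately instead of on the first extraction request.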

Verdict

This repo is a solid example of using LLMs for automated knowledge graph extraction and visualization in a lightweight, user-friendly package. It’s ideal for developers exploring how to transform unstructured text into structured knowledge representations without building complex NLP pipelines.

That said, it is an experimental proof of concept rather than a production-grade solution. The reliance on GPT-4o means potential costs and variability in output quality. Also, the current focus on plain text input and basic graph rendering might require adaptation for real-world applications needing domain-specific tuning or integration with larger knowledge management systems.

If you want to prototype LLM-driven knowledge graph extraction quickly or study LangChain’s experimental graph transformers in action, this repo is worth your time. For production use, expect to build on top of it with additional validation, error handling, and possibly custom model fine-tuning.


→ GitHub Repo: thu-vu92/knowledge-graph-llms ⭐ 718 · Jupyter Notebook

Noureddine RAMDI / Automating knowledge graph extraction from text with LangChain and GPT-4o