Build a Custom Knowledge RAG Chatbot

Using n8n to build RAG-powered chatbots

Getting Started · Intermediate · 18 min

By Mihai Farcas
January 2025
Tags: RAG · AI · Vector Database · Embeddings

Overview

A Retrieval Augmented Generation (RAG) chatbot is an AI-powered assistant that answers questions using specific, indexed knowledge sources rather than relying only on its training data. This tutorial demonstrates building a GitHub API documentation chatbot with n8n's workflow automation.

What is RAG?

Retrieval Augmented Generation (RAG) is a technique that enhances AI responses by retrieving relevant information from a knowledge base before generating answers. This solves the problem of AI models having outdated or missing information by grounding their responses in your specific data.

  • Combines retrieval and generation for accurate responses
  • Uses vector databases for semantic search
  • Grounds AI responses in your specific knowledge
  • Reduces hallucinations and improves accuracy

Prerequisites

  • n8n account (cloud or self-hosted)
  • OpenAI API key for embeddings and chat
  • Pinecone account and API key for vector storage
  • Knowledge source (documents, PDFs, web pages, APIs)
  • Basic understanding of embeddings and vector databases

Architecture Overview

A RAG system has two main phases: 1) Indexing - where you process and store your knowledge in a vector database, and 2) Retrieval - where you search for relevant information and use it to generate responses. We'll build both phases in n8n.
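
The two phases can be sketched end to end in miniature. Everything below is a stand-in: the toy bag-of-words "embedding" and the dot-product score only illustrate the shape of the pipeline, while the real workflow uses OpenAI embeddings and Pinecone.

```javascript
// Toy "embedding": count occurrences of each vocabulary term.
function toyEmbed(text, vocab) {
  const words = text.toLowerCase().split(/\W+/);
  return vocab.map(term => words.filter(w => w === term).length);
}

// Dot product as a stand-in similarity score.
function score(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

const vocab = ["github", "api", "repository", "issue", "pull", "create"];

// Phase 1: indexing - embed each chunk and store it.
const index = [
  "List repositories for the authenticated user via the GitHub API",
  "Create an issue in a repository",
].map(text => ({ text, vector: toyEmbed(text, vocab) }));

// Phase 2: retrieval - embed the query and return the closest chunk.
function retrieve(query) {
  const queryVector = toyEmbed(query, vocab);
  return index
    .map(doc => ({ ...doc, score: score(queryVector, doc.vector) }))
    .sort((a, b) => b.score - a.score)[0];
}

console.log(retrieve("How do I create an issue?").text);
```

Swapping in real embeddings and a real vector store changes the quality of the match, not the structure of the pipeline.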

Step 1: Fetch Your Knowledge Source

Start by retrieving the content you want your chatbot to know about. In this example, we'll use GitHub's OpenAPI specification, but you could use any documents, PDFs, or web content.

```json
{
  "url": "https://raw.githubusercontent.com/github/rest-api-description/main/descriptions/api.github.com/api.github.com.json",
  "method": "GET",
  "responseFormat": "json"
}
```

Step 2: Process and Chunk Content

Large documents need to be split into smaller chunks for effective retrieval. Use a Text Splitter node (such as the Recursive Character Text Splitter) or a Code node to break your content into manageable pieces (typically 500-1000 characters).

```javascript
// Example chunking logic for an n8n Code node
const chunkSize = 800;  // characters per chunk
const overlap = 100;    // characters shared between adjacent chunks
const text = $json.content;
const chunks = [];

for (let i = 0; i < text.length; i += chunkSize - overlap) {
  chunks.push({
    text: text.slice(i, i + chunkSize),
    metadata: {
      source: $json.source,
      chunk: chunks.length
    }
  });
}

// n8n Code nodes must return items wrapped in { json: ... }
return chunks.map(chunk => ({ json: chunk }));
```

Step 3: Generate Embeddings

Transform text chunks into numerical vector representations using OpenAI's embedding model. These vectors capture semantic meaning, allowing for intelligent similarity searches.

```json
{
  "model": "text-embedding-3-small",
  "input": "{{$json.text}}",
  "dimensions": 1536
}
```
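
To build intuition for how these vectors are compared, here is cosine similarity, the metric the Pinecone index in the next step will use. The 3-dimensional vectors below are made up for illustration; real text-embedding-3-small vectors have 1536 dimensions.

```javascript
// Cosine similarity: 1.0 means same direction, near 0 means unrelated.
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const magnitude = v => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (magnitude(a) * magnitude(b));
}

// Illustrative 3-d vectors (real embeddings are 1536-d).
const listReposEndpoint = [0.9, 0.1, 0.2]; // "list repositories"
const reposQuestion     = [0.8, 0.2, 0.1]; // "how do I get my repos?"
const createIssueDoc    = [0.1, 0.9, 0.3]; // "create an issue"

console.log(cosineSimilarity(reposQuestion, listReposEndpoint)); // high
console.log(cosineSimilarity(reposQuestion, createIssueDoc));    // lower
```

Questions and documents about the same topic end up pointing in similar directions, which is what makes semantic search work without any keyword overlap.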

Step 4: Store in Vector Database

Use the Pinecone Vector Store node to index your embeddings. Configure your Pinecone index with the correct dimensions (1536 for OpenAI embeddings) and connect it to your n8n workflow.

  • Create a Pinecone index with dimension 1536
  • Set similarity metric to cosine
  • Configure namespace for organization
  • Insert document chunks with embeddings
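
The records you insert pair each chunk with its embedding. A sketch of that shape, using placeholder vector values (in the real workflow the values come from Step 3 and the chunks from Step 2):

```javascript
// Shape chunks and embeddings into Pinecone-style upsert records.
// `chunks` and `embeddings` stand in for the outputs of Steps 2 and 3.
const chunks = [
  {
    text: "GET /repos/{owner}/{repo} - get a repository",
    metadata: { source: "github-openapi", chunk: 0 }
  },
];
const embeddings = [new Array(1536).fill(0.01)]; // placeholder 1536-d vector

const records = chunks.map((chunk, i) => ({
  id: `${chunk.metadata.source}-${chunk.metadata.chunk}`, // stable unique id
  values: embeddings[i],            // length must match the index dimension
  metadata: { text: chunk.text, ...chunk.metadata }       // stored for retrieval
}));

console.log(records[0].id);            // "github-openapi-0"
console.log(records[0].values.length); // 1536
```

Storing the chunk text in the metadata is what lets the retriever hand readable context back to the agent later.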

Step 5: Build the Chat Interface

Create a new workflow for the chatbot. Add a Chat Trigger node to provide the user interface where people can ask questions about your indexed knowledge.

Step 6: Configure AI Agent

Add an AI Agent node and write a clear system message that explains the agent's role and knowledge domain.

System Message:

```text
You are a helpful assistant providing information about the GitHub API based on the official OpenAPI specifications.

When answering questions:
- Use the vector store tool to find relevant API documentation
- Provide accurate, specific information from the retrieved context
- Include code examples when helpful
- If information isn't in the context, say so honestly
- Be concise but thorough
```

Step 7: Add Chat Model

Connect an OpenAI Chat Model node (gpt-4o-mini or gpt-4) to your agent. This model will generate responses based on the retrieved context.

```json
{
  "model": "gpt-4o-mini",
  "temperature": 0.3,
  "maxTokens": 800
}
```

Step 8: Implement Memory

Add a Window Buffer Memory node to maintain conversation context. This allows users to ask follow-up questions that reference previous exchanges.
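
Under the hood the idea is just a sliding window over the message history. The node manages this for you; a minimal sketch of the behavior (the window size here is illustrative):

```javascript
// Keep only the most recent `windowSize` messages as context.
function windowBuffer(messages, windowSize = 6) {
  return messages.slice(-windowSize);
}

const history = [
  { role: "user", content: "What auth does the GitHub API use?" },
  { role: "assistant", content: "Token-based auth, such as PATs or GitHub Apps." },
  { role: "user", content: "How do I create a PAT?" },
];

// With a window of 2, the oldest message falls out of context.
console.log(windowBuffer(history, 2).length); // 2
```

A larger window preserves more conversational context at the cost of more tokens per request.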

Step 9: Add Vector Store Retrieval

Connect a Vector Store Tool to your agent. This gives the agent the ability to search your indexed knowledge and retrieve relevant information.

```json
{
  "vectorStore": "pinecone",
  "topK": 4,
  "scoreThreshold": 0.7,
  "name": "github_api_docs",
  "description": "Search the GitHub API documentation to answer questions about endpoints, parameters, and usage."
}
```
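
The topK and scoreThreshold settings interact like this: matches below the threshold are dropped, and the best K survivors become the agent's context. A sketch with made-up match scores:

```javascript
// Filter matches by score, then keep the top K, mirroring the
// topK/scoreThreshold settings above.
function selectContext(matches, topK = 4, scoreThreshold = 0.7) {
  return matches
    .filter(m => m.score >= scoreThreshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

const matches = [
  { text: "POST /repos/{owner}/{repo}/issues", score: 0.91 },
  { text: "GET /user/repos", score: 0.74 },
  { text: "GET /octocat", score: 0.42 }, // below threshold: dropped
];

console.log(selectContext(matches).length); // 2
```

Raising the threshold trades recall for precision; lowering topK keeps the prompt shorter but risks missing relevant context.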

Step 10: Test Your RAG Chatbot

Test with specific questions about your knowledge domain. Verify that the chatbot retrieves relevant information and generates accurate responses. Monitor retrieval quality and adjust parameters as needed.

  • Ask specific questions about your knowledge domain
  • Verify retrieved chunks are relevant
  • Check answer accuracy against source material
  • Test edge cases and ambiguous questions
  • Monitor token usage and response times

Optimization Tips

  • Experiment with chunk sizes (500-1000 characters works well)
  • Tune topK parameter (3-5 retrieved chunks is typical)
  • Set appropriate score thresholds to filter irrelevant results
  • Use metadata filters for multi-source knowledge bases
  • Consider hybrid search (keyword + semantic) for better accuracy
  • Monitor and iterate based on user feedback
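
As a sketch of the hybrid-search idea from the list above, one simple approach blends a keyword-overlap score with the semantic similarity score. The alpha weight is illustrative; production systems often use BM25 with reciprocal rank fusion instead.

```javascript
// Fraction of query terms that appear in the document text.
function keywordScore(query, text) {
  const queryTerms = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  const docTerms = new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
  const overlap = [...queryTerms].filter(t => docTerms.has(t)).length;
  return queryTerms.size ? overlap / queryTerms.size : 0;
}

// Weighted blend: alpha = 1 is pure semantic, alpha = 0 is pure keyword.
function hybridScore(semanticScore, query, text, alpha = 0.5) {
  return alpha * semanticScore + (1 - alpha) * keywordScore(query, text);
}

console.log(keywordScore("create issue", "Create an issue in a repository")); // 1
console.log(hybridScore(0.8, "create issue", "Create an issue in a repository")); // 0.9
```

Keyword matching rescues queries containing exact identifiers (endpoint paths, parameter names) that embedding similarity alone can miss.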

Conclusion

RAG chatbots combine the power of large language models with your specific knowledge, creating accurate, grounded AI assistants. n8n makes building RAG systems accessible through visual workflows that connect embeddings, vector databases, and chat models without complex coding.

Next Steps

  • Add multiple knowledge sources to your vector database
  • Implement hybrid search combining keywords and semantics
  • Experiment with different embedding models
  • Add citations showing which documents informed each answer
  • Create specialized RAG chatbots for different knowledge domains
  • Explore advanced retrieval techniques like re-ranking