How to Build a RAG for Cryptocurrency News
Learn how to build an AI-powered chatbot with Python, completely for free, using ChromaDB, Langchain, and LLMs.
The crypto space is constantly evolving, with market trends shifting rapidly in response to geopolitical events, protocol updates, social media buzz, significant cryptocurrency movements, and influential figures, among other factors.
As a result, crypto articles can become outdated quickly. What’s relevant one week may not be the next. Therefore, writers must continuously monitor trends and produce fresh content at a rate that is hard to see in other technological fields.
This constant stream of cryptocurrency, blockchain, and web3 news serves as a valuable resource for those looking to inform their investment strategies, develop apps around trending topics, analyze market sentiment, or simply stay up-to-date on the latest ecosystem advancements.
We can get the latest news with web scrapers, but what about fetching the information that we’re looking for without spending time reading through several articles or making extensive Google searches?
To address this challenge, I created a Retrieval-Augmented Generation (RAG) system specifically designed for crypto news. This system enables users to query topics related to cryptocurrencies, blockchain, and web3, and receive the most up-to-date information on those subjects.
For example, if you ask ChatGPT to explain Decentralized Finance (DeFi), it can provide a detailed and comprehensive answer. However, if you ask about the latest Ethereum price predictions, the chat interface may need to search the web, and its response will likely be limited.
In this piece, I will guide you through the steps I took to build CryptoRAG, a chatbot tailored to get the latest information about cryptocurrencies, blockchain and web3.
We’ll start by exploring the fundamentals of RAG systems and the specific architecture I implemented. Next, we’ll cover how to upload the data to the Chroma database by creating Langchain documents. In addition, we’ll look at the query function, the prompt template, and two different ways of using Llama3.3 in the pipeline. Finally, we’ll bring everything together by creating a user interface to enable question-answering capabilities.
Hands-on with my CryptoRAG architecture
Large Language Models (LLMs) are trained on huge volumes of publicly available data from the internet, and use billions of parameters to generate outputs that can answer your questions, create original content, or simply converse with you. However, just like human beings, these models don’t have all the information in the world, and certainly not the latest.
Retrieval-Augmented Generation (RAG) fixes this by integrating external data from a specific domain, document, or other sources with a Large Language Model (LLM). This hybrid approach enables the model to produce outputs that are tailored to a particular field or industry.
The RAG process is split into three main modules:
Retrieval Module: This module fetches and ranks relevant data within a vector database based on a given input. During this process, the input query is analyzed, and the system searches through indexed data to retrieve information that aligns with the query, prioritizing relevance.
Augmentation Module: This second module plays a crucial role in enhancing the performance of the RAG system. Its function is to take the top-ranked data retrieved from the Retrieval Module and seamlessly integrate it into the prompt that feeds the Large Language Model (LLM). By adding the relevant external data to the prompt, the Augmentation Module provides the LLM with additional context and information, enabling more accurate, informative, and relevant responses.
Generation Module: This module generates a response by combining the model’s language capabilities with the newly retrieved external data, which is incorporated into the prompt template for the LLM. The result is a response grounded in the external data but cleaned up and explained just as the LLM would do for its default training data. Depending on the prompt template, the output can also surface sources and other metadata.
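To make these modules concrete before diving into the real implementation, here is a minimal conceptual sketch in Python; the function and object names are purely illustrative, not the actual CryptoRAG code:

def retrieve(query, vector_db, k=3):
    """Retrieval: embed the query and fetch the k most similar chunks."""
    return [doc.page_content for doc in vector_db.similarity_search(query, k=k)]

def augment(query, chunks):
    """Augmentation: merge the retrieved chunks into the prompt."""
    context = "\n\n".join(chunks)
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

def generate(prompt, llm):
    """Generation: let the LLM answer with the augmented prompt."""
    return llm.invoke(prompt)

# answer = generate(augment(query, retrieve(query, db)), llm)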
Based on the above steps, my CryptoRAG architecture ended up like this:
Outside the RAG modules, there’s the Web Scraping script I run daily to get the latest cryptocurrency news using my Cointelegraph Scraper on Apify.
Since I don’t need a big storage solution, a simple Google Sheet serves to retain the crypto news data temporarily.
Then I used a script to extract the data from the Google Sheet and start the encoding process, where I used Langchain to split the data into chunks and add metadata. These chunks are then vectorized using a well-known embedding model from Hugging Face: sentence-transformers/all-MiniLM-L6-v2. At this point, we are already in the Retrieval Module.
Once the chunks are vectorized, they are stored in the Chroma vector database. I chose Chroma because it is a completely open-source solution with no associated costs, making it an accessible and budget-friendly option.
When an input is provided to the RAG system, it is also encoded using the same embedding model, and the most relevant chunks of information are fetched from the Chroma database using similarity search. In other words, it compares vectors of information, and the closest numbers match the information that is most related to the input.
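To illustrate what “comparing vectors” means in practice, here is a small sketch (with made-up sample texts) using the same Hugging Face embedding model; the cosine similarity between the query vector and each chunk vector is what drives the ranking:

import numpy as np
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

# Same model that embeds both the news chunks and the incoming query
model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2")

query_vec = np.array(model.embed_query("latest Ethereum upgrade"))
chunk_vecs = [np.array(v) for v in model.embed_documents([
    "Ethereum developers announced a new network upgrade...",  # made-up chunk
    "Bitcoin miners report lower fees this week...",           # made-up chunk
])]

# Cosine similarity: the higher the value, the closer the chunk is to the query
for vec in chunk_vecs:
    sim = np.dot(query_vec, vec) / (
        np.linalg.norm(query_vec) * np.linalg.norm(vec))
    print(round(float(sim), 3))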
Now we enter the Augmentation Module, where the most similar documents are used to create a prompt template to feed into the LLM model (Llama 3.3). Later, we’ll explore how to design an effective prompt and use the model, whether for free or via paid APIs.
Once the prompt template is applied to the LLM, it generates a response to the initial query, completing the Q&A process. The input and outputs are connected through a simple chat interface with some suggestions for those starting to use the app.
Access the data and create documents with Langchain
As mentioned previously, the scraper uploads the data to a Google Sheet. To access it, a few things must be configured in Google Cloud Platform beforehand, such as creating a service account and generating credentials. You can learn more about it here.
Once done, we can use the gspread Python library to access the Google Sheet and extract the information as a Pandas dataframe.
import pandas as pd
import gspread
from google.oauth2.service_account import Credentials
class GoogleAccess:
    """ Class containing the Google interaction functions """

    @staticmethod
    def google_authentication(credentials) -> object:
        """ This function gives authentication to the Google Account.
        Scopes defines the permissions to Google Sheets and Google Drive.
        Then, using gspread, it authorizes the authentication and
        returns the client to work with.
        """
        # Authenticate with Google Sheets using the JSON key file
        scope = ['https://www.googleapis.com/auth/spreadsheets',
                 'https://www.googleapis.com/auth/drive']
        creds = Credentials.from_service_account_file(
            credentials, scopes=scope)
        client = gspread.authorize(creds)
        return client

    @staticmethod
    def read_from_sheet(client, sheet_id) -> pd.DataFrame:
        """ This function reads the data from the Google Sheet. """
        # Open the Google Sheet by id
        sheet = client.open_by_key(sheet_id)
        # Access the worksheet by name
        ws = sheet.worksheet('main')
        # Create a dataframe from all records
        df = pd.DataFrame(data=ws.get_all_records())
        return df
The credentials are stored in a JSON file that contains the service account information. To create Langchain documents, we just need to use the credentials file along with the Google Sheet ID as inputs in this function:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
def create_documents(credentials_google, google_sheet_id):
    """ This function splits the articles into chunks
    from a dataset obtained by reading a Google Sheet,
    and creates a document for each chunk.
    """
    # Create client and read the sheet into a dataframe
    client = GoogleAccess.google_authentication(credentials_google)
    df = GoogleAccess.read_from_sheet(client, google_sheet_id)
    df['id'] = df['id'].astype('str')
    # Initialize the text splitter
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=5000,
        chunk_overlap=0)
    # Create a list of LangChain documents
    documents = []
    for _, row in df.iterrows():
        content_chunks = splitter.split_text(row['content'])
        for i, chunk in enumerate(content_chunks):
            documents.append(Document(
                page_content=chunk,
                metadata={
                    "category": row['category'],
                    "link": row['link']
                },
                id=f"{row['id']}{i}"
            ))
    return documents
For each article (row in the dataframe), I divide it into chunks of 5,000 characters. I opted for larger chunks because smaller ones seemed to negatively impact my RAG’s performance, resulting in poor-quality information about the articles. I also included the link in the metadata to embed the sources of information directly into the final chatbot interface.
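If you want to sanity-check the chunking before moving on, a quick call could look like this (hypothetical sheet ID, assuming the credentials file is in place):

# Hypothetical sanity check of the chunking step
documents = create_documents('credentials.json', 'YOUR_SHEET_ID')

print(len(documents))                   # total number of chunks
print(documents[0].metadata)            # {'category': ..., 'link': ...}
print(documents[0].page_content[:200])  # first 200 characters of the first chunk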
Upload the encoded documents to the Chroma vector database
The previous functions allowed me to create the Langchain documents, but I still needed to vectorize and upload them to the vector database. This is achieved in the following script:
import os
import shutil
import logging

from dotenv import load_dotenv
from langchain_huggingface.embeddings import HuggingFaceEmbeddings
from langchain_chroma import Chroma

from src.preprocess import create_documents

load_dotenv('keys.env')
logger = logging.getLogger(__name__)


def update_chromadb():
    """Update ChromaDB with new documents by deleting the existing
    collection and creating a new one."""
    temp_dir = "chroma_temp"
    try:
        # Create temporary directory for the new database
        os.makedirs(temp_dir, exist_ok=True)
        # Initialize Hugging Face embeddings
        embedding_model = HuggingFaceEmbeddings(
            model_name="sentence-transformers/all-MiniLM-L6-v2")
        # Create new documents
        documents = create_documents(
            'credentials.json', os.getenv("SHEET_ID"))
        # Create the new database in the temporary location
        Chroma.from_documents(
            documents=documents,
            collection_name="cointelegraph",
            embedding=embedding_model,
            persist_directory=temp_dir
        )
        # If successful, replace the old database with the new one
        if os.path.exists("chroma"):
            shutil.rmtree("chroma")
        shutil.move(temp_dir, "chroma")
        logger.info(
            "Successfully refreshed ChromaDB collection with new documents")
        return True
    except Exception as e:
        logger.error("Error updating ChromaDB: %s", str(e))
        if os.path.exists(temp_dir):
            shutil.rmtree(temp_dir)
        raise
The Chroma object, integrated with Langchain, is instantiated with the .from_documents() method to upload the previously created documents and simultaneously embed them using the sentence-transformers/all-MiniLM-L6-v2 model from Hugging Face.
The data is stored in the cointelegraph collection, which works like a table in a SQL database. During the process, the data is temporarily stored in a folder named chroma_temp. If the documents are successfully updated, they replace the existing data in the main database folder, chroma.
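A one-off refresh, for instance right after the daily scrape finishes, could look like this (assuming credentials.json and the SHEET_ID environment variable are set as above):

# Hypothetical manual refresh of the vector database
if update_chromadb():
    print("Chroma collection 'cointelegraph' rebuilt in the 'chroma' folder")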
When the application launches, the following function is triggered to get the Chroma db object without making any changes to the database:
def initialize_chatbot():
    """Initialize ChromaDB and embeddings."""
    global db, embedding_model, last_update
    try:
        # Initialize Hugging Face embeddings
        embedding_model = HuggingFaceEmbeddings(
            model_name="sentence-transformers/all-MiniLM-L6-v2")
        # Load the existing database
        db = Chroma(
            collection_name="cointelegraph",
            embedding_function=embedding_model,
            persist_directory="chroma"
        )
        if not last_update:
            last_update = datetime.now()
        logger.info("Successfully loaded ChromaDB collection")
    except Exception as e:
        logger.error("Error initializing ChromaDB: %s", str(e))
        raise
This db object will be used to query the vector database and obtain a response, as we will see in the next section.
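Even before wiring in the LLM, you can already query the db object directly; here is a quick sketch with an arbitrary question:

# Hypothetical raw similarity search against the Chroma collection
results = db.similarity_search_with_relevance_scores(
    "What is happening with Ethereum ETFs?", k=3)

for doc, score in results:
    print(round(score, 2), doc.metadata.get("link"))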
Query the RAG system and get a detailed output
This is the part where we finally use the LLM, and for that I tried two approaches. The first one is free to use with Ollama, but if you lack a GPU, chances are the model will take too long to respond. The second approach uses OpenRouter models with the OpenAI Python library.
Use an LLM for free in your RAG system
Let’s start by exploring the free method using Ollama. You first need to install it via curl in your WSL or Linux machine with this command:
curl -fsSL https://ollama.com/install.sh | sh
Next, run the following command to start Ollama:
ollama serve
Fetch the Ollama model you want to use:
ollama run llama3.3
The Llama3.3 model is huge! So either you have enough storage (and a GPU) to run it, or I recommend you look for lightweight Ollama model alternatives, such as mistral.
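If you go that route, fetching the lighter model works the same way:
ollama run mistral
You would then pass "mistral" as the model name to the query function shown below.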
Now guess what? There’s also a Langchain Python package for Ollama, so let’s install it:
pip install langchain-ollama
With this configuration, our query function would look like this:
from langchain_core.prompts import PromptTemplate
import os
from langchain_ollama.llms import OllamaLLM
def query_rag(query_text, db, prompt, model_name):
    """
    Query a Retrieval-Augmented Generation (RAG) system using the Chroma
    database, and format the response with links and categories.
    """
    # Retrieve the context from the DB using similarity search
    results = db.similarity_search_with_relevance_scores(query_text, k=3)
    # Check if there are any matching results or if
    # the relevance score is too low
    if len(results) == 0 or results[0][1] < 0.2:
        return (
            "No relevant information found. Try to ask something "
            "related to crypto, blockchain and web3.")
    else:
        # Combine context from matching documents
        context_text = "\n\n - -\n\n".join(
            [doc.page_content for doc, _score in results]
        )
        # Extract categories and links from metadata
        categories = ", ".join(
            {doc.metadata.get("category", "Unknown") for doc, _ in results})
        links = ", ".join(
            {doc.metadata.get("link", "N/A") for doc, _ in results})
        # Create and format the prompt
        template = PromptTemplate(
            input_variables=["context", "question", "categories", "links"],
            template=prompt,
        )
        prompt = template.format(
            context=context_text,
            question=query_text,
            categories=categories,
            links=links
        )
        # Use the language model to generate a response
        model = OllamaLLM(
            model=model_name, cache=False, verbose=True, num_ctx=500)
        response_text = model.invoke(prompt)
        # Format and return the response including generated text and sources
        formatted_response = f"{response_text}\nSources: {links}"
        return formatted_response
Note that the model_name input you see in the script above should be a string with the name of the model you are running in Ollama, for instance llama3.3.
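Putting it together, a call with the free Ollama setup could look like this; the question is just an example, and PROMPT_TEMPLATE_1 is the prompt template we will build in a later section:

# Hypothetical call with the local Ollama setup
answer = query_rag(
    query_text="What are the latest Ethereum price predictions?",
    db=db,
    prompt=PROMPT_TEMPLATE_1,  # prompt template defined later in the article
    model_name="llama3.3",     # or "mistral" for a lighter model
)
print(answer)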
Use OpenRouter models with OpenAI
If you don’t have a GPU, like me, the best option is to use a paid service. I prefer OpenRouter because it allows me to purchase credits with crypto and works seamlessly with the OpenAI Python package:
pip install openai
Once the package is installed, you need to generate a token in your OpenRouter account and change the query function to this instead:
from langchain_core.prompts import PromptTemplate
import os
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv("keys.env")
def query_rag(query_text, db, prompt):
    """
    Query a Retrieval-Augmented Generation (RAG) system using the Chroma
    database, and format the response with links and categories.
    """
    # Retrieve the context from the DB using similarity search
    results = db.similarity_search_with_relevance_scores(query_text, k=3)
    # Check if there are any matching results or if
    # the relevance score is too low
    if len(results) == 0 or results[0][1] < 0.2:
        return (
            "No relevant information found. Try to ask something "
            "related to crypto, blockchain and web3.")
    else:
        # Combine context from matching documents
        context_text = "\n\n - -\n\n".join(
            [doc.page_content for doc, _score in results]
        )
        # Extract categories and links from metadata
        categories = ", ".join(
            {doc.metadata.get("category", "Unknown") for doc, _ in results})
        links = ", ".join(
            {doc.metadata.get("link", "N/A") for doc, _ in results})
        # Create and format the prompt
        template = PromptTemplate(
            input_variables=["context", "question", "categories", "links"],
            template=prompt,
        )
        prompt = template.format(
            context=context_text,
            question=query_text,
            categories=categories,
            links=links
        )
        # Send the prompt to the model through OpenRouter
        client = OpenAI(
            base_url="https://openrouter.ai/api/v1",
            api_key=os.getenv("API_KEY"),
        )
        completion = client.chat.completions.create(
            model="meta-llama/llama-3.3-70b-instruct",
            messages=[
                {
                    "role": "user",
                    "content": prompt
                }
            ]
        )
        response_text = completion.choices[0].message.content
        # Format and return the response including generated text and sources
        formatted_response = f"{response_text}\nSources: {links}"
        return formatted_response
This approach produces much faster results than the free one, but it also comes with a cost. For instance, a prompt of 2,495 tokens cost me $0.000433.
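One practical note: the only secret the script expects is the OpenRouter token, which is read from keys.env via os.getenv("API_KEY"), so that file just needs a single line like this (placeholder value):
API_KEY=your-openrouter-token-here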
Build the right prompt template
As you have seen in the previous scripts, the query_rag() function takes a prompt as input. There are many ways to build it; mine took this direction:
PROMPT_TEMPLATE_1 = """
You are an expert assistant. Using the provided context from the
database, answer the question in detail. Write your response without
referencing articles, documents, or sources explicitly. Never start or
mention the following: "In the piece", "In the article" and so on in the
response. Provide clear and concise answers that are helpful and
relevant to the question.
Context:
{context}
Question:
{question}
Answer:
- Detailed response:
- Relevant categories: {categories}
- Links to explore further: {links}
"""
I had to explicitly state that I didn’t want the answer to begin by mentioning articles, as that happened a few times. Additionally, I wanted to integrate the context, the question, and the metadata information in the final output.
To create a prompt template for the model, I had to install langchain_core:
pip install langchain-core
Once done, I use the PromptTemplate class to create the final prompt object that you can see in the previous scripts (both the paid and free versions).
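In isolation, and just as an illustration with made-up filler values, the template object works like this:

from langchain_core.prompts import PromptTemplate

template = PromptTemplate(
    input_variables=["context", "question", "categories", "links"],
    template=PROMPT_TEMPLATE_1,
)

# Made-up values; in the pipeline these come from the retrieved documents
final_prompt = template.format(
    context="(retrieved news chunks go here)",
    question="What changed in Ethereum staking this week?",
    categories="Ethereum, Staking",
    links="https://cointelegraph.com/...",
)
print(final_prompt)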
Build the chat interface with Flask and Cursor IDE
I decided to use Python for the backend because it simplifies connecting to Langchain, the LLM packages, and the Chroma database; for the server itself, I used the Flask framework.
In addition, I used the capabilities of Cursor composer to help me build a robust app.py script that refreshes the data daily, because I need the latest information to be uploaded to the Chroma database while the application runs.
The final main function looks like this:
""""Flask web application for Web3 & Cryptocurrency Expert Chatbot."""
# pylint: disable=W0718
import logging
import sys
import os
from langchain_chroma import Chroma
from langchain_huggingface.embeddings import HuggingFaceEmbeddings
from flask import Flask, request, jsonify, render_template, send_from_directory
from flask_cors import CORS
from src.pipeline import query_rag
from src.prompts import PROMPT_TEMPLATE_1
from flask_apscheduler import APScheduler
from datetime import datetime
from src.db_handler import update_chromadb
class Config:
SCHEDULER_API_ENABLED = True
scheduler = APScheduler()
# Configure logging with more detailed format
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
handlers=[
logging.StreamHandler(sys.stdout),
logging.FileHandler('app.log')
]
)
logger = logging.getLogger(__name__)
# Initialize Flask app
app = Flask(__name__, static_folder='static', static_url_path='/static')
app.config.from_object(Config())
CORS(app)
# Initialize global variables for database and embedding model
db = None
embedding_model = None
last_update = None
def update_db_task():
"""Background task to update the database"""
global db, last_update
try:
logger.info("Starting scheduled database update...")
# Update the database
update_chromadb()
# Reinitialize the database connection
initialize_chatbot()
last_update = datetime.now()
logger.info("Scheduled database update completed successfully")
except Exception as e:
logger.error(f"Error in scheduled database update: {str(e)}")
@app.route('/')
def home():
"""Render the chat interface."""
try:
return render_template('index.html')
except Exception as e:
logger.error("Error serving home page: %s", str(e))
return "Internal server error", 500
@app.route('/chat', methods=['POST'])
def chat():
"""Handle chat messages using RAG system."""
try:
if db is None or embedding_model is None:
logger.error("Chatbot components not properly initialized")
return jsonify({'error': 'Chatbot not initialized properly'}), 503
data = request.get_json()
if not data or 'message' not in data:
logger.warning("Invalid request: missing message")
return jsonify({'error': 'No message provided'}), 400
user_message = data['message'].strip()
logger.info("Processing chat message: %s...", user_message[:50])
# Use RAG system to generate response
formatted_response= query_rag(
user_message, db, PROMPT_TEMPLATE_1)
logger.info("Successfully generated response")
return jsonify({
'response': formatted_response,
})
except Exception as e:
logger.error("Error processing chat message: %s", str(e))
return jsonify(
{'error': 'An error occurred while processing your request'}), 500
if __name__ == '__main__':
try:
# Clear log files at startup
clear_log_files()
# Initialize scheduler
scheduler.init_app(app)
# Schedule database updates every 10 minutes
scheduler.add_job(
id='update_db',
func=update_db_task,
trigger='interval',
days=1,
next_run_time=datetime.now() # Run once immediately
)
scheduler.start()
# Initialize chatbot components
logger.info("Starting application initialization...")
initialize_chatbot()
logger.info(
"Chatbot initialization and "
"scheduler setup completed successfully")
# Start Flask server
logger.info("Starting Flask server on port 5000...")
app.run(host='0.0.0.0', port=5000, debug=False, use_reloader=False)
except Exception as e:
logger.error("Critical server error: %s", str(e))
sys.exit(1)
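Once the server is running on port 5000, you can test the /chat endpoint without the front-end; here is a small sketch with the requests library (the question is arbitrary):

import requests

# Assumes the Flask app is running locally on port 5000
resp = requests.post(
    "http://localhost:5000/chat",
    json={"message": "What's the latest news about Bitcoin ETFs?"},
    timeout=120,
)
print(resp.json()["response"])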
For the front-end, I mostly relied on Cursor composer to build the interface, and then I adjusted the HTML elements myself. However, it’s clear that you don’t need to be a front-end developer to create a ready-to-go application nowadays. Here’s what the app looks like:
For deployment, I’m currently using Replit. I simply connected my GitHub account, made a few configurations, and clicked the deploy button.
Do you need help with automation, web scraping, AI, data, or anything that my laptop can deliver? Feel free to reach out to me on Upwork! 👨💻
Final thoughts
Building a RAG system doesn’t take much time today with the help of AI and the right tools. With so many frameworks and options available, it can be overwhelming to know where to start.
That’s why, even though it’s not the most user-friendly, I prefer sticking with Langchain. I need to install several libraries during the process, but at least I know they’re compatible and won’t break the pipeline. For vector databases, free and open-source solutions like Chroma DB work well, and when it comes to LLMs, just choose the one that fits your needs.
Concerning programming, there’s no going back: you either use Copilot, Cursor, or another AI-powered IDE, or your productivity will lag behind both developers and non-developers. It’s amazing to see how quickly we can build applications now!
I hope this tutorial helps you get started with your own RAG system and saves you time searching for the right tools. And remember, try CryptoRAG and leave your honest feedback/support if you find the chatbot useful!