Building a GenAI PDF Query App with a RAG Architecture

September 19, 2024

This article covers building a Generative AI PDF query web application that uses a retrieval-augmented generation (RAG) architecture. The web application reads your PDFs and answers your questions based on what is in them. The value of RAG is the ability to scope the LLM's generated responses, in our case from Claude 3.5 Sonnet, to up-to-date, accurate, and reliable source material. RAG allows for domain-specific, contextually relevant responses tailored to your data rather than to the model's static training data.

The PDF query web application leverages Facebook AI Similarity Search (FAISS) and Amazon Titan Embeddings to create vector representations of unstructured text and to store and search those embeddings. LangChain provides the prompt template that guides the model's response, the RetrievalQA chain that retrieves pertinent data, and various PDF processing tools. We will use Amazon Bedrock to access Claude 3.5 Sonnet and Amazon Titan Embeddings.
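To make the retrieval side concrete before walking through the full application, here is a minimal sketch of the embed-and-search step described above. It assumes the Python dependencies installed later in Step 3, Bedrock access in us-east-1, and uses two toy sentences that are purely illustrative.

import boto3
from langchain_aws import BedrockEmbeddings
from langchain_community.vectorstores.faiss import FAISS

# Amazon Titan Embeddings accessed through the Bedrock runtime client
bedrock = boto3.client(service_name='bedrock-runtime', region_name='us-east-1')
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1", client=bedrock)

# embed two toy "documents" into a FAISS index, then retrieve the closest match for a question
store = FAISS.from_texts(["Invoices are due within 30 days.", "Refunds require an original receipt."], embeddings)
print(store.similarity_search("When do I have to pay an invoice?", k=1))

The full application follows this same pattern, but loads PDFs from a directory, chunks them, persists the FAISS index to disk, and passes the retrieved chunks to Claude 3.5 Sonnet through a prompt template.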

PDF Query RAG-Based LLM GenAI Web Application Process Flow

Code:

GitHub repo with the files referenced in this blog: pdf-query-rag-llm-app

Prerequisites:

  • Amazon Web Services Account
  • Enable Amazon Bedrock access (specifically Amazon Titan Embeddings and Claude 3.5 Sonnet); see: Manage access to Amazon Bedrock foundation models. A quick access-check sketch follows this list.
  • EC2 Instance Role with AmazonBedrockFullAccess Policy Attached (note you can make this more secure by making a custom policy)
  • Verified on EC2 Instance Ubuntu 22.04 and Ubuntu 24.04
  • Verified with Python 3.10, 3.11, 3.12
  • Virtualenv
  • AWS default region is set to us-east-1; you can change the region in the pdf_query_rag_llm_app.py file under region_name='us-east-1'
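Before moving on, you can optionally run a quick check that the instance role can reach Bedrock and that the two models used in this article are listed in us-east-1. This is only a sketch and assumes the Python dependencies from Step 3 are installed; note that a model being listed does not guarantee access has been granted, so the definitive test is an actual invocation, which the application performs later.

import boto3

# list the Bedrock foundation models visible in us-east-1 and confirm the two used in this article appear
bedrock = boto3.client(service_name='bedrock', region_name='us-east-1')
model_ids = {m["modelId"] for m in bedrock.list_foundation_models()["modelSummaries"]}
for model_id in ("amazon.titan-embed-text-v1", "anthropic.claude-3-5-sonnet-20240620-v1:0"):
    print(model_id, "listed" if model_id in model_ids else "not listed in this region")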

AWS Resource Cost:

As with most AWS services, you will incur costs for usage.

EC2 Ubuntu Instance Setup:
(This article assumes you have an ubuntu user with a /home/ubuntu home directory.)

Step 0
Install some dependencies.


sudo apt -y update

sudo apt -y install build-essential openssl

sudo apt -y install libpq-dev libssl-dev libffi-dev zlib1g-dev

sudo apt -y install python3-pip python3-dev

sudo apt -y install nginx

sudo apt -y install virtualenvwrapper

Step 1
Setup the Python environment.

cd /home/ubuntu

virtualenv pdf-query-rag-llm-app_env

source pdf-query-rag-llm-app_env/bin/activate

Step 2
Create a directory for your project.

mkdir /home/ubuntu/pdf-query-rag-llm-app

cd /home/ubuntu/pdf-query-rag-llm-app

Step 3
Create a file requirements.txt for your Python dependencies.

awscli
pypdf
streamlit
boto3
faiss-cpu
langchain-aws
langchain_community

Install your Python dependencies.

pip install -r requirements.txt

Step 4
Create a file pdf_query_rag_llm_app.py containing the PDF query application code.

cd /home/ubuntu/pdf-query-rag-llm-app

vim pdf_query_rag_llm_app.py

import os
import boto3
import streamlit as st
from langchain_aws import BedrockEmbeddings
from langchain_aws.chat_models import ChatBedrock
from langchain_community.vectorstores.faiss import FAISS
from langchain_community.document_loaders.pdf import PyPDFDirectoryLoader
from langchain.chains.retrieval_qa.base import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.globals import set_verbose
set_verbose(False)

bedrock = boto3.client(service_name='bedrock-runtime', region_name='us-east-1')
titan_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1", client=bedrock)

# Data Preparation
def data_ingestion():
    loader = PyPDFDirectoryLoader("data")
    documents = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=500)
    docs = text_splitter.split_documents(documents)
    return docs

# Vector Store Setup
def setup_vector_store(documents):
    vector_store = FAISS.from_documents(
        documents,
        titan_embeddings,
    )
    vector_store.save_local("faiss_index")

# LLM Setup
def load_llm():
    llm = ChatBedrock(model_id="anthropic.claude-3-5-sonnet-20240620-v1:0", client=bedrock, model_kwargs={"max_tokens": 2048})
    return llm

# LLM Guidelines
prompt_template = """Use the following pieces of context to answer the question at the end. Follow these rules:
1. If the answer is not within the context knowledge, state that you do not know, and do not fabricate an answer.
2. If you find the answer, create a detailed, and concise response to the question. Aim for a summary of max 200 words.
3. Do not add extra information not within the context.

{context}

Question: {question}
Helpful Answer:"""

# Prompt Template
prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])

# Create QA Chain 
def get_result(llm, vector_store, query):
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vector_store.as_retriever(
            search_type="similarity", search_kwargs={"k": 3}
        ),
        chain_type_kwargs={"prompt": prompt},
        return_source_documents=True,
    )
    # Apply LLM
    result = qa_chain.invoke(query)
    return result['result']

# Streamlit Frontend UI Section
def streamlit_ui():
    st.set_page_config("PDF Query RAG LLM Application")
    st.markdown("""
        
    """, unsafe_allow_html=True)
    st.header("PDF Query with Generative AI")

    user_question = st.text_input(r":gray[$\textsf{\normalsize Ask me anything about your PDF collection.}$]")

    left_column, middleleft_column, middleright_column, right_column = st.columns(4, gap="small")
    if left_column.button("Generate Response", key="submit_question") or user_question:
        # first check if the vector store exists
        if not os.path.exists("faiss_index"):
            st.error("Please create the vector store first from the sidebar.")
            return
        if not user_question:
            st.error("Please enter a question.")
            return
        with st.spinner("Generating... this may take a minute..."):
            faiss_index = FAISS.load_local("faiss_index", embeddings=titan_embeddings,
                                           allow_dangerous_deserialization=True)
            llm = load_llm()
            st.write(get_result(llm, faiss_index, user_question))
            st.success("Generated")
    # the two middle columns act as spacers between the Generate Response and New Data Update buttons
    if middleleft_column:
        st.empty()
    if middleright_column:
        st.empty()
    if right_column.button("New Data Update", key="update_vector_store"):
        with st.spinner("Updating... this may take a few minutes as we go through your PDF collection..."):
            docs = data_ingestion()
            setup_vector_store(docs)
            st.success("Updated")

if __name__ == "__main__":
    streamlit_ui()

Step 5
Create a data directory to hold your PDFs. Note: place the PDFs you want to query in this directory.

cd /home/ubuntu/pdf-query-rag-llm-app

mkdir data
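Once you have copied a few PDFs into data, you can optionally smoke-test the ingestion and retrieval pipeline from a Python shell before wiring up the service. This is only a sketch: it assumes you run it from /home/ubuntu/pdf-query-rag-llm-app with the virtualenv active, and the example question is just a placeholder.

# run inside: python3 (from the project directory, virtualenv active)
from pdf_query_rag_llm_app import data_ingestion, setup_vector_store, load_llm, get_result, titan_embeddings
from langchain_community.vectorstores.faiss import FAISS

docs = data_ingestion()              # load and chunk the PDFs under ./data
setup_vector_store(docs)             # build and save the FAISS index to ./faiss_index
faiss_index = FAISS.load_local("faiss_index", embeddings=titan_embeddings,
                               allow_dangerous_deserialization=True)
print(get_result(load_llm(), faiss_index, "What topics does this PDF collection cover?"))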

Step 6
Create a Systemd service file. Systemd is a service manager, and in this case it will manage the Streamlit process.

sudo vim /etc/systemd/system/pdf-query-rag-llm-app.service

Add the following to pdf-query-rag-llm-app.service

[Unit]
Description=PDF query with Generative AI Service
After=network.target
 
[Service]
User=root
Group=root
WorkingDirectory=/home/ubuntu/pdf-query-rag-llm-app
Environment="PATH=/home/ubuntu/pdf-query-rag-llm-app_env/bin"
ExecStart=/home/ubuntu/pdf-query-rag-llm-app_env/bin/streamlit run /home/ubuntu/pdf-query-rag-llm-app/pdf_query_rag_llm_app.py --server.port 8080 --server.headless true
 
[Install]
WantedBy=multi-user.target

Step 7
Start your application via Systemd and enable it to start automatically after a reboot.

sudo systemctl start pdf-query-rag-llm-app.service

sudo systemctl enable pdf-query-rag-llm-app.service

Step 8
Create your application's NGINX config nginx_pdf-query-rag-llm-app.conf, which makes NGINX listen on port 80 and proxy requests to Streamlit on port 8080. NGINX helps with connection handling, which improves performance.

sudo vim /etc/nginx/sites-available/nginx_pdf-query-rag-llm-app.conf

Add the following to nginx_pdf-query-rag-llm-app.conf

server {
    listen 80 default_server;

    location / {
        include proxy_params;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Real-IP      $remote_addr;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_buffers 8 24k;
        proxy_buffer_size 4k;
        proxy_pass http://127.0.0.1:8080;
    }
    location /_stcore/stream {
        proxy_pass http://127.0.0.1:8080/_stcore/stream;
        proxy_http_version 1.1;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 86400;
    }
}

Step 9
Delete the default NGINX conf file, enable your new NGINX conf, and restart NGINX (port 80).


sudo rm /etc/nginx/sites-enabled/default

sudo ln -s /etc/nginx/sites-available/nginx_pdf-query-rag-llm-app.conf /etc/nginx/sites-enabled

sudo systemctl restart nginx

Step 10
Test your application. Remember to put the PDFs you want to query in the data directory on the instance and click New Data Update before querying.
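If you prefer to verify from the instance itself before opening a browser, this small sketch (run on the instance) only checks that NGINX on port 80 is proxying to Streamlit and returns an HTTP 200; it does not exercise the PDF query flow.

import urllib.request

# expect HTTP 200 if NGINX (port 80) is successfully proxying to Streamlit (port 8080)
response = urllib.request.urlopen("http://127.0.0.1/")
print(response.status)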

Notes:

  • Make sure to open up port 80 in the EC2 Security Group associated with the instance.
  • For HTTPS (TLS) you can use AWS ALB or AWS CloudFront.
  • Depending on how many PDFs you have, how big they are, and your CPU specifications, the New Data Update button can take a while as it builds your vector embeddings.
  • Any time you add or change PDFs, make sure to click “New Data Update” to rebuild your vector embeddings.
  • This application does not take security controls into consideration; that is your responsibility.
  • Please read the Amazon Bedrock FAQs for general questions about the AWS LLM resources used.

Summary:
Hopefully this blog article helps you create a generative AI PDF query web application that is scoped to your specific data and context by leveraging a RAG architecture. Retrieval-augmented generation (RAG) is a powerful architectural choice that combines a trained model, in this case Claude 3.5 Sonnet, with your organization's specific data (vectorized) without re-training the model, augmenting its knowledge base instead.

If you like this blog article, you may want to read my other article on Building an Asynchronous GenAI Image Generator API.
