This article covers building a generative AI PDF query web application that uses a retrieval-augmented generation (RAG) architecture. The application searches your PDFs and answers your questions based on what is in them. The value of RAG is the ability to scope the LLM's generated responses, in our case from Claude 3.5 Sonnet, to up-to-date, accurate, and reliable source material. RAG allows for domain-specific, contextually relevant responses tailored to your data rather than static training data.
The PDF query web application will leverage Facebook AI Similarity Search (FAISS) and Amazon Titan Embeddings to create vector representations of unstructured text and to store and search those embeddings. LangChain is used for the prompt template that guides the model's response, the RetrievalQA chain that retrieves pertinent data, and various PDF processing tools. We will use Amazon Bedrock to access Claude 3.5 Sonnet and Amazon Titan Embeddings.
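To make the moving parts concrete, here is a minimal sketch of that retrieval chain. It is not the repository's exact code: the class names, model IDs, index path, and prompt wording are assumptions based on common LangChain and Bedrock usage, so check pdf_query_rag_llm_app.py for the real implementation.

import boto3
from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.chat_models import BedrockChat
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# Bedrock runtime client (region assumed to be us-east-1, as in the app).
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Titan Embeddings turn text chunks into vectors; FAISS stores and searches them.
embeddings = BedrockEmbeddings(client=bedrock, model_id="amazon.titan-embed-text-v1")
vector_store = FAISS.load_local("faiss_index", embeddings,
                                allow_dangerous_deserialization=True)  # hypothetical index path

# Prompt template that keeps Claude 3.5 Sonnet grounded in the retrieved context.
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template="Answer the question using only the context below.\n\n{context}\n\nQuestion: {question}",
)

llm = BedrockChat(client=bedrock, model_id="anthropic.claude-3-5-sonnet-20240620-v1:0")
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vector_store.as_retriever(search_kwargs={"k": 3}),
    chain_type_kwargs={"prompt": prompt},
)

print(qa.invoke({"query": "What does my PDF say about pricing?"})["result"])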
Code:
GitHub repo with the files referenced in this blog: pdf-query-rag-llm-app
Prerequisites:
- Amazon Web Services Account
- Enable Amazon Bedrock Access (Specifically Amazon Titan Embeddings and Claude 3.5 Sonnet) see: Manage access to Amazon Bedrock foundation models
- EC2 Instance Role with the AmazonBedrockFullAccess policy attached (note: you can make this more secure with a custom least-privilege policy)
- Verified on EC2 instances running Ubuntu 22.04 and Ubuntu 24.04
- Verified with Python 3.10, 3.11, 3.12
- Virtualenv
- AWS default region is set to us-east-1; you can change the region in the pdf_query_rag_llm_app.py file under region_name='us-east-1' (see the snippet after this list)
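As a reference point, the region is typically set where the Bedrock runtime client is created. The variable names below are illustrative, not necessarily the ones used in pdf_query_rag_llm_app.py.

import boto3

# Change region_name here if you run Bedrock in a different region.
bedrock_client = boto3.client("bedrock-runtime", region_name="us-east-1")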
AWS Resource Cost:
As with most AWS services, you will incur costs for usage.
EC2 Ubuntu Instance Setup:
(This article assumes you have an ubuntu user with a /home/ubuntu home directory.)
Step 0
Install some dependencies
sudo apt -y update
sudo apt -y install build-essential openssl
sudo apt -y install libpq-dev libssl-dev libffi-dev zlib1g-dev
sudo apt -y install python3-pip python3-dev
sudo apt -y install nginx
sudo apt -y install virtualenvwrapper
Step 1
Clone the Git repository
cd /home/ubuntu
git clone https://github.com/nethacker/pdf-query-rag-llm-app.git
Step 2
Set up the Python Environment
virtualenv pdf-query-rag-llm-app_env
source pdf-query-rag-llm-app_env/bin/activate
Step 3
Install the PDF Query RAG LLM application package dependencies
cd /home/ubuntu/pdf-query-rag-llm-app
pip install -r requirements.txt
Step 4
Set up systemd to daemonize and bootstrap the PDF Query RAG LLM app (port 8080)
sudo cp systemd/pdf-query-rag-llm-app.service /etc/systemd/system/
sudo systemctl start pdf-query-rag-llm-app
sudo systemctl enable pdf-query-rag-llm-app.service
Step 5
Configure NGINX (installed in Step 0) to help scale and handle connections (port 80)
sudo cp nginx/nginx_pdf-query-rag-llm-app.conf /etc/nginx/sites-available/nginx_pdf-query-rag-llm-app.conf
sudo rm /etc/nginx/sites-enabled/default
sudo ln -s /etc/nginx/sites-available/nginx_pdf-query-rag-llm-app.conf /etc/nginx/sites-enabled
sudo systemctl restart nginx
Step 6
Test your application. Remember to put the PDFs that you want to query in the data directory on the instance and click New Data Update before querying.
http://{yourhost}
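Behind the scenes, New Data Update rebuilds the vector embeddings from whatever PDFs are in the data directory. The sketch below shows roughly what such a rebuild looks like; the loader, chunk sizes, index path, and model ID are assumptions for illustration rather than the repository's exact code.

import boto3
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.vectorstores import FAISS

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Load every PDF in the data directory and split it into overlapping chunks.
documents = PyPDFDirectoryLoader("data").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(documents)

# Embed the chunks with Titan Embeddings and persist a FAISS index for querying.
embeddings = BedrockEmbeddings(client=bedrock, model_id="amazon.titan-embed-text-v1")
FAISS.from_documents(chunks, embeddings).save_local("faiss_index")

This is also why the rebuild time grows with the number and size of your PDFs: every chunk is sent to Titan Embeddings before the FAISS index can be written.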
Notes:
- Make sure to open up port 80 in the EC2 Security Group associated with the instance.
- For HTTPS (TLS) you can use AWS ALB or AWS CloudFront.
- Depending on how many PDFs you have, how large they are, and your CPU specifications, the New Data Update button can take a while as it builds your vector embeddings.
- Any time you add or change PDFs, make sure to click “New Data Update” to rebuild your vector embeddings.
- This application does not implement security controls; those are your responsibility.
- Please read the Amazon Bedrock FAQs for general questions about the AWS LLM resources used.