We are currently exploring the role of local knowledge bases (KBs) in RAG (retrieval-augmented generation) AI processing. This post is part of a series documenting our “sandbox” knowledge bases (created over a period of about 20 years) and how we’re using them in various GenAI prototyping projects.
Hypotheses, observations, and takeaways from our small language model (SLM) experiments
Our most recent RAG prototype projects were a set of three Python programs involving a small language model (SLM) and our “Grocery Shopping” knowledge base (KB #1).

Since our current MacBooks are fairly new and have GPUs, we wanted to test their ability to run RAG programs locally.
Here’s what we found:
- Being able to observe SLM processing locally can be a good educational experience (see the sketch after this list).
- However, frequent updates to currently available GenAI tools, as well as incompatibilities between tools, make it difficult to manage the software stack on a local laptop.
- Also, responses can be slow unless top-of-the-line local hardware with GPUs is available.
- Therefore, we decided to move back to a cloud-based large language model (LLM) processing environment.
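For readers curious what local SLM processing looks like, here is a minimal sketch of the pattern (not our actual code). It assumes a local model server such as Ollama, with a small model already pulled and the `ollama` Python package installed; the model name and KB passage are illustrative.

```python
# Minimal local-SLM sketch (assumes Ollama is running locally, a small
# model has been pulled with `ollama pull llama3`, and the Python client
# is installed with `pip install ollama`).
import ollama

# Toy stand-in for a passage retrieved from a knowledge base.
kb_passage = "Store-brand rolled oats are in aisle 4, next to the cereal."
question = "Where can I find oats?"

# The core RAG step: hand the retrieved passage to the model as context.
response = ollama.chat(
    model="llama3",
    messages=[{
        "role": "user",
        "content": f"Context:\n{kb_passage}\n\nQuestion: {question}",
    }],
)
print(response["message"]["content"])
```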
Second thoughts about Thonny, our integrated development environment (IDE)
We also had concerns about our Python programming environment (Thonny), which was a little awkward for us to use and not ideal for educational purposes.


Google Colab, which we’re also familiar with, looked like a better fit: it’s free to use for small compute tasks and requires only a web browser to run its programming “notebooks.”
The code is easy to annotate and can be run in small chunks.
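If you haven’t used Colab before, the first cell of a notebook for this kind of experiment might look something like the sketch below (the secret name is simply whatever you chose when storing your OpenAI API key in Colab’s Secrets panel):

```python
# Typical setup cell in a Colab notebook; the leading "!" runs a shell
# command inside the notebook.
!pip install openai

# Read the API key from Colab's Secrets panel (key icon in the sidebar).
# "OPENAI_API_KEY" is the name we assume you gave the stored secret.
import os
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")
```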

Taking our RAG prototyping environment in a new direction
We decided to backtrack a bit and get all of our disruptions over with at once, so we repeated our three earlier RAG Python experiments using a large language model (gpt-3.5-turbo) instead of a small one, and Colab instead of Thonny.
All three repeated RAG experiments used our “Grocery Shopping” knowledge base and had similar programming objectives (a simplified sketch follows the list):
- “RAG processing with a single query”
- “RAG processing with chat”
- “RAG processing with chat and a system prompt”
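To give a flavor of what these notebooks do, here is a simplified, self-contained sketch of the third variant (chat with a system prompt). It is not our actual notebook code: the hard-coded “knowledge base” and keyword-overlap retrieval are toy stand-ins for a real KB and retriever, and it assumes OPENAI_API_KEY is set as above.

```python
# Simplified sketch of "RAG processing with chat and a system prompt".
# A toy KB and naive keyword-overlap retrieval stand in for real components.
import re
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Toy stand-in for our Grocery Shopping knowledge base.
kb = [
    "Bananas are on sale this week for $0.49 per pound.",
    "The store's own-brand pasta is in aisle 7.",
    "Organic milk is stocked in the rear dairy case.",
]

def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, k=1):
    """Naive retrieval: rank KB passages by keyword overlap with the query."""
    q = tokens(query)
    return sorted(kb, key=lambda p: len(q & tokens(p)), reverse=True)[:k]

# The system prompt constrains answers to the retrieved context.
messages = [{
    "role": "system",
    "content": "Answer using only the provided context. "
               "If the context doesn't cover the question, say so.",
}]

for question in ["Where is the pasta?", "How much are bananas this week?"]:
    context = "\n".join(retrieve(question))
    messages.append({
        "role": "user",
        "content": f"Context:\n{context}\n\nQuestion: {question}",
    })
    reply = client.chat.completions.create(model="gpt-3.5-turbo",
                                           messages=messages)
    answer = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})  # chat history
    print(f"Q: {question}\nA: {answer}\n")
```

The single-query variant is the same pattern without the loop or chat history, and the plain-chat variant simply drops the system message.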

To run our RAG processing experiments yourself
We hope you’ll find the new and improved versions of our Grocery Shopping-based RAG processing programs interesting and useful to look at.
Best of all, we believe that if you already have a little Colab experience (or can pick it up elsewhere), you should be able to run the shared programs yourself.
Once you’re set up in Colab, click these links to access our notebooks on the Colab website:
(These links open our shared notebooks, which have already been run and contain output. You can create your own copy if you’d like to experiment further.)
RAG processing with a single query (shared Colab notebook)
RAG processing with chat (shared Colab notebook)
RAG processing with chat and system prompt (shared Colab notebook)
For more information
To see PDF versions of the Colab-based programs:
RAG processing with a single query (PDF file)
RAG processing with chat (PDF file)
RAG processing with chat and system prompt (PDF file)
To read more about our original Python-based RAG processing programs (posted in November 2024):
RAG processing: Small language model (SLM) with single query (11 Nov 2024)
RAG processing: Small language model (SLM) with chat (15 Nov 2024)
RAG processing: Small language model (SLM) with chat and system prompt (21 Nov 2024)
What’s next?
We’re hopeful that our processing environment has finally stabilized, and that we can get on with many more interesting and enlightening RAG experiments.
We’ll be posting about them as we move forward.