# Accessing the Cluster
All RAG components run inside the `llama-stack-rag` namespace on OpenShift.
This page walks through what you should see for pods, services, and routes — and how to reach the chatbot UI.
## Checking Pods
Run the following to list all pods in the namespace:
```shell
oc get pods -n llama-stack-rag
```
A healthy deployment shows output similar to the following, with every pod in `Running` status (or `Completed` for one-shot Job pods):
```
NAME                            READY   STATUS      RESTARTS   AGE
llamastack-ui-xxxxxxxxx-xxxxx   1/1     Running     0          10m
llamastack-xxxxxxxxx-xxxxx      1/1     Running     0          10m
postgresql-xxxxxxxxx-xxxxx      1/1     Running     0          20m
rag-ingest-pipeline-xxxxx       0/1     Completed   0          12m
```
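Rather than eyeballing the listing, the status column can be screened with a short shell snippet. This is a sketch: the pod hashes in the here-doc are illustrative, and against a live cluster you would pipe `oc get pods -n llama-stack-rag --no-headers` into the function instead.

```shell
# check_pods: read `oc get pods --no-headers` output on stdin and fail
# if any pod is in a state other than Running or Completed.
check_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 " is " $3; bad = 1 }
       END { exit bad }'
}

# Illustrative listing; on a real cluster use:
#   oc get pods -n llama-stack-rag --no-headers | check_pods
check_pods <<'EOF'
llamastack-ui-5f6d8b9c7-abcde   1/1   Running     0   10m
llamastack-6c7d8e9f0-fghij      1/1   Running     0   10m
postgresql-7d8e9f0a1-klmno      1/1   Running     0   20m
rag-ingest-pipeline-pqrst       0/1   Completed   0   12m
EOF
echo "healthy=$?"   # healthy=0 means no pod was in an unexpected state
```

The function exits non-zero and names the offending pod as soon as any status other than `Running` or `Completed` appears, which makes it easy to drop into a readiness-check script.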
| Pod name contains | Purpose |
|---|---|
| `llamastack-ui` | Streamlit chat interface served to the user |
| `llamastack` | LlamaStack server — orchestrates the LLM, embeddings, and tools |
| `<model-name>-predictor` | KServe / vLLM model server for the generation model (e.g., `llama-3-2-3b-instruct`) |
| `<shield-model>-predictor` | KServe / vLLM model server for the guard/shield model (if enabled) |
| `postgresql` | PostgreSQL + PGVector — stores document embeddings for the vector store |
| `rag-ingest-pipeline` | Kubeflow Pipeline job that ingested the initial FantaCo documents — a `Completed` status is expected |

The predictor pods show up with names containing `-predictor`, one per KServe InferenceService.
## Checking Services
Services expose each component internally within the cluster:
```shell
oc get svc -n llama-stack-rag
```
Expected output:
```
NAME                    TYPE        CLUSTER-IP   PORT(S)    AGE
llamastack              ClusterIP   172.30.x.x   5000/TCP   20m
llamastack-ui           ClusterIP   172.30.x.x   8501/TCP   20m
llama-3-2-3b-instruct   ClusterIP   172.30.x.x   8080/TCP   20m
postgresql              ClusterIP   172.30.x.x   5432/TCP   20m
```
| Service | Purpose |
|---|---|
| `llamastack` | Internal REST API for the LlamaStack server (port 5000) |
| `llamastack-ui` | Internal HTTP endpoint for the Streamlit UI (port 8501) |
| `llama-3-2-3b-instruct` | vLLM inference endpoint for the generation model |
| `<shield-model>` | vLLM inference endpoint for the safety/shield model (if enabled) |
| `postgresql` | Internal PostgreSQL + PGVector endpoint (port 5432) |
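Because these are ClusterIP services, they are reachable only from inside the cluster, at the standard Kubernetes DNS names `<service>.<namespace>.svc.cluster.local:<port>`. The sketch below composes those internal URLs; the `/v1/health` path is an assumption about the LlamaStack server's REST API and may differ between versions.

```shell
# Internal DNS names for the ClusterIP services listed above.
ns="llama-stack-rag"
llamastack_url="http://llamastack.${ns}.svc.cluster.local:5000"
ui_url="http://llamastack-ui.${ns}.svc.cluster.local:8501"
pg_host="postgresql.${ns}.svc.cluster.local"   # PostgreSQL, port 5432

echo "${llamastack_url}/v1/health"

# From a throwaway pod inside the cluster you could then run, e.g.:
#   oc run curl-test --rm -it --image=curlimages/curl --restart=Never -- \
#     curl -s "${llamastack_url}/v1/health"
```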
## Checking Routes
Routes expose the UI and API to the outside world. Retrieve them with:
```shell
oc get routes -n llama-stack-rag
```
Expected output:
```
NAME            HOST/PORT                                             PATH   SERVICES        PORT   TERMINATION   WILDCARD
llamastack      llamastack-llama-stack-rag.apps.<cluster-domain>             llamastack      5000   edge          None
llamastack-ui   llamastack-ui-llama-stack-rag.apps.<cluster-domain>          llamastack-ui   8501   edge          None
```
| Route | Purpose |
|---|---|
| `llamastack-ui` | This is the URL you open in your browser. It serves the Streamlit chat interface over HTTPS. |
| `llamastack` | The LlamaStack REST API — useful for direct API access, scripting, or connecting external clients. |
Open the `HOST/PORT` value for `llamastack-ui` in your browser to launch the chat interface.
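The route hosts can also be pulled straight from the CLI for scripting. The sketch below composes the hostnames from the `<route>-<namespace>.apps.<cluster-domain>` pattern with an illustrative cluster domain; on a live cluster, read the real host with the `oc` command in the comment. The `/v1/models` path is an assumption about the LlamaStack REST API.

```shell
# On a live cluster, read the real host instead of composing it:
#   ui_host=$(oc get route llamastack-ui -n llama-stack-rag \
#     -o jsonpath='{.spec.host}')
cluster_domain="example.com"          # illustrative; substitute your own
ui_host="llamastack-ui-llama-stack-rag.apps.${cluster_domain}"
api_host="llamastack-llama-stack-rag.apps.${cluster_domain}"

echo "UI:  https://${ui_host}"
echo "API: https://${api_host}"

# Routes are edge-terminated, so external clients speak plain HTTPS, e.g.:
#   curl -s "https://${api_host}/v1/models"
```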
## What to Expect in the UI
When the Streamlit UI loads you will see:
- A chat input box at the bottom of the screen for submitting questions
- A left sidebar with configuration options:
  - Knowledge Base — select which vector store collection to query
  - Mode — toggle between Chat (simple RAG) and Agent (agentic RAG)
  - Sampling parameters — adjust temperature, top-p, and max tokens
  - System prompt — override the default system prompt

The application comes pre-loaded with FantaCo’s internal HR, procurement, sales, and IT documents so you can begin asking questions immediately.