Accessing the Cluster

All RAG components run inside the llama-stack-rag namespace on OpenShift. This page walks through what you should see for pods, services, and routes — and how to reach the chatbot UI.

Checking Pods

Run the following to list all pods in the namespace:

oc get pods -n llama-stack-rag

A healthy cluster will show output similar to the following. Every pod should be in Running status (or Completed for one-shot Job pods):

NAME                                        READY   STATUS      RESTARTS   AGE
llamastack-ui-xxxxxxxxx-xxxxx               1/1     Running     0          10m
llamastack-xxxxxxxxx-xxxxx                  1/1     Running     0          10m
postgresql-xxxxxxxxx-xxxxx                  1/1     Running     0          20m
rag-ingest-pipeline-xxxxx                   0/1     Completed   0          12m
Table 1. Key pods to verify

Pod name contains     Purpose
llamastack-ui         Streamlit chat interface served to the user
llamastack            LlamaStack server — orchestrates the LLM, embeddings, and tools
predictor (LLM)       KServe / vLLM model server for the generation model (e.g., Llama-3.2-3B-Instruct)
predictor (safety)    KServe / vLLM model server for the guard/shield model (e.g., Llama-Guard-3-8B) — only present if safety was enabled
postgresql            PostgreSQL + PGVector — stores document embeddings for the vector store
rag-ingest-pipeline   Kubeflow Pipeline job that ingested the initial FantaCo documents — Completed is the expected final state

In the default configuration, the predictor pods show 2/2 in the READY column; if the system was deployed with RAW_DEPLOYMENT=false, expect 3/3 instead. The llamastack pod will not reach Running until all predictor pods are healthy.
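Rather than eyeballing the table, the status check can be scripted. A minimal sketch — the awk filter assumes only the standard `oc get pods` column layout shown above (STATUS in the third column):

```shell
# Fail if any pod is in a state other than Running or Completed.
# Reads `oc get pods` output on stdin and prints offending pod names.
check_pods() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1; bad = 1 }
       END { exit bad }'
}

if oc get pods -n llama-stack-rag | check_pods; then
  echo "all pods healthy"
else
  echo "some pods are not Running/Completed (listed above)"
fi
```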

Filtering by Component

To focus on just the model server pods:

oc get pods -l component=predictor -n llama-stack-rag

To check the LlamaStack orchestration pod specifically:

oc get pods -l app.kubernetes.io/name=llamastack -n llama-stack-rag

Checking Services

Services expose each component internally within the cluster:

oc get svc -n llama-stack-rag

Expected output:

NAME                            TYPE        CLUSTER-IP       PORT(S)     AGE
llamastack                      ClusterIP   172.30.x.x       5000/TCP    20m
llamastack-ui                   ClusterIP   172.30.x.x       8501/TCP    20m
llama-3-2-3b-instruct           ClusterIP   172.30.x.x       8080/TCP    20m
postgresql                      ClusterIP   172.30.x.x       5432/TCP    20m
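Each service is also reachable in-cluster at the usual Kubernetes DNS name, `<service>.llama-stack-rag.svc.cluster.local`. A quick connectivity probe from a one-shot pod — a sketch, where the container image and the /v1/health path are assumptions to adjust for your environment:

```shell
# In-cluster DNS name for the LlamaStack service.
SVC=llamastack.llama-stack-rag.svc.cluster.local

# One-shot pod; --rm cleans it up when curl exits.
# "|| true" keeps a failed probe from aborting a calling script.
oc run curl-test --rm -it --restart=Never \
  --image=registry.access.redhat.com/ubi9/ubi-minimal \
  -n llama-stack-rag \
  -- curl -s "http://${SVC}:5000/v1/health" || true
```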
Table 2. What each service does

Service                 Purpose
llamastack              Internal REST API for the LlamaStack server (port 5000)
llamastack-ui           Internal HTTP endpoint for the Streamlit UI (port 8501)
llama-3-2-3b-instruct   vLLM inference endpoint for the generation model
llama-guard-3-8b        vLLM inference endpoint for the safety/shield model (if enabled)
postgresql              Internal PostgreSQL + PGVector endpoint (port 5432)
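To reach a service from your workstation without going through an external route, port-forwarding works too. A sketch — the /v1/health path is an assumption; substitute whatever health endpoint your LlamaStack build exposes:

```shell
# Forward local port 5000 to the llamastack service, probe it, clean up.
oc port-forward svc/llamastack 5000:5000 -n llama-stack-rag >/dev/null 2>&1 &
PF_PID=$!
sleep 2

# Health path is an assumption; "|| true" keeps a failed probe non-fatal.
curl -s http://localhost:5000/v1/health || true

kill "$PF_PID" 2>/dev/null || true
```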

Checking Routes

Routes expose the UI and API to the outside world. Retrieve them with:

oc get routes -n llama-stack-rag

Expected output:

NAME            HOST/PORT                                             PATH   SERVICES        PORT   TERMINATION   WILDCARD
llamastack      llamastack-llama-stack-rag.apps.<cluster-domain>             llamastack      5000   edge          None
llamastack-ui   llamastack-ui-llama-stack-rag.apps.<cluster-domain>          llamastack-ui   8501   edge          None
Table 3. Route breakdown

Route           Purpose
llamastack-ui   The URL you open in your browser — serves the Streamlit chat interface over HTTPS.
llamastack      The LlamaStack REST API — useful for direct API access, scripting, or connecting external clients.

Open the HOST/PORT value for llamastack-ui in your browser to launch the chat interface.
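Rather than copying the host out of the table by hand, the route host can be pulled with jsonpath — a sketch, where `2>/dev/null || true` simply keeps it non-fatal if the route is absent:

```shell
# Extract the UI route's hostname and print the URL to open.
UI_HOST=$(oc get route llamastack-ui -n llama-stack-rag \
  -o jsonpath='{.spec.host}' 2>/dev/null || true)
echo "https://${UI_HOST}"
```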

What to Expect in the UI

When the Streamlit UI loads you will see:

  • A chat input box at the bottom of the screen for submitting questions

  • A left sidebar with configuration options:

    • Knowledge Base — select which vector store collection to query

    • Mode — toggle between Chat (simple RAG) and Agent (agentic RAG)

    • Sampling parameters — adjust temperature, top-p, and max tokens

    • System prompt — override the default system prompt

The application comes pre-loaded with FantaCo’s internal HR, procurement, sales, and IT documents so you can begin asking questions immediately.
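Everything the UI does goes through the llamastack route, so you can also ask a question straight from the command line. A hedged sketch — the endpoint path and model id below are assumptions (recent LlamaStack builds expose an OpenAI-compatible API); adjust both to match your deployment:

```shell
# Route host lookup; non-fatal if oc or the route is unavailable.
API_HOST=$(oc get route llamastack -n llama-stack-rag \
  -o jsonpath='{.spec.host}' 2>/dev/null || true)

# Model id and endpoint path are assumptions -- check your deployment.
PAYLOAD='{"model": "llama-3-2-3b-instruct",
          "messages": [{"role": "user", "content": "What is the PTO policy?"}]}'

curl -sk "https://${API_HOST}/v1/openai/v1/chat/completions" \
  -H 'Content-Type: application/json' -d "$PAYLOAD" || true
```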