Accessing the Cluster

All RAG components run inside the llama-stack-rag namespace on OpenShift. This page walks through what you should see for pods, services, and routes — and how to reach the chatbot UI.

Checking Pods

Run the following to list all pods in the namespace:

oc get pods -n llama-stack-rag

A healthy cluster will show output similar to the following. Every pod should be in Running status (or Completed for one-shot Job pods):

NAME                                        READY   STATUS      RESTARTS   AGE
llamastack-ui-xxxxxxxxx-xxxxx               1/1     Running     0          10m
llamastack-xxxxxxxxx-xxxxx                  1/1     Running     0          10m
postgresql-xxxxxxxxx-xxxxx                  1/1     Running     0          20m
rag-ingest-pipeline-xxxxx                   0/1     Completed   0          12m
Table 1. Key pods to verify

Pod name contains     Purpose
llamastack-ui         Streamlit chat interface served to the user
llamastack            LlamaStack server — orchestrates the LLM, embeddings, and tools
predictor (LLM)       KServe / vLLM model server for the generation model (e.g., Llama-3.2-3B-Instruct)
predictor (safety)    KServe / vLLM model server for the guard/shield model (e.g., Llama-Guard-3-8B) — only present if safety was enabled
postgresql            PostgreSQL + PGVector — stores document embeddings for the vector store
rag-ingest-pipeline   Kubeflow Pipeline job that ingested the initial FantaCo documents — Completed is the expected final state

In the default configuration, the predictor pods show 2/2 in the READY column; if the system was deployed with RAW_DEPLOYMENT=false, expect 3/3 instead. The llamastack pod will not reach Running until all predictor pods are healthy.
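Rather than eyeballing the table, the status check can be scripted. A minimal sketch — the awk filter assumes only the standard `oc get pods` column layout shown above (STATUS in the third column):

```shell
# Fail if any pod is in a state other than Running or Completed.
# Reads `oc get pods` output on stdin and prints offending pod names.
check_pods() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1; bad = 1 }
       END { exit bad }'
}

if oc get pods -n llama-stack-rag | check_pods; then
  echo "all pods healthy"
else
  echo "some pods are not Running/Completed (listed above)"
fi
```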

Filtering by Component

To focus on just the model server pods:

oc get pods -l component=predictor -n llama-stack-rag

To check the LlamaStack orchestration pod specifically:

oc get pods -l app.kubernetes.io/name=llamastack -n llama-stack-rag

Checking Services

Services expose each component internally within the cluster:

oc get svc -n llama-stack-rag

Expected output:

NAME                            TYPE        CLUSTER-IP       PORT(S)     AGE
llamastack                      ClusterIP   172.30.x.x       5000/TCP    20m
llamastack-ui                   ClusterIP   172.30.x.x       8501/TCP    20m
llama-3-2-3b-instruct           ClusterIP   172.30.x.x       8080/TCP    20m
postgresql                      ClusterIP   172.30.x.x       5432/TCP    20m
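Each service is also reachable in-cluster at the usual Kubernetes DNS name, `<service>.llama-stack-rag.svc.cluster.local`. A quick connectivity probe from a one-shot pod — a sketch, where the container image and the /v1/health path are assumptions to adjust for your environment:

```shell
# In-cluster DNS name for the LlamaStack service.
SVC=llamastack.llama-stack-rag.svc.cluster.local

# One-shot pod; --rm cleans it up when curl exits.
# "|| true" keeps a failed probe from aborting a calling script.
oc run curl-test --rm -it --restart=Never \
  --image=registry.access.redhat.com/ubi9/ubi-minimal \
  -n llama-stack-rag \
  -- curl -s "http://${SVC}:5000/v1/health" || true
```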
Table 2. What each service does

Service                 Purpose
llamastack              Internal REST API for the LlamaStack server (port 5000)
llamastack-ui           Internal HTTP endpoint for the Streamlit UI (port 8501)
llama-3-2-3b-instruct   vLLM inference endpoint for the generation model
llama-guard-3-8b        vLLM inference endpoint for the safety/shield model (if enabled)
postgresql              Internal PostgreSQL + PGVector endpoint (port 5432)
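To reach a service from your workstation without going through an external route, port-forwarding works too. A sketch — the /v1/health path is an assumption; substitute whatever health endpoint your LlamaStack build exposes:

```shell
# Forward local port 5000 to the llamastack service, probe it, clean up.
oc port-forward svc/llamastack 5000:5000 -n llama-stack-rag >/dev/null 2>&1 &
PF_PID=$!
sleep 2

# Health path is an assumption; "|| true" keeps a failed probe non-fatal.
curl -s http://localhost:5000/v1/health || true

kill "$PF_PID" 2>/dev/null || true
```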

Checking Routes

Routes expose the UI and API to the outside world. Retrieve them with:

oc get routes -n llama-stack-rag

Expected output:

NAME            HOST/PORT                                             PATH   SERVICES        PORT   TERMINATION   WILDCARD
llamastack      llamastack-llama-stack-rag.apps.<cluster-domain>             llamastack      5000   edge          None
llamastack-ui   llamastack-ui-llama-stack-rag.apps.<cluster-domain>          llamastack-ui   8501   edge          None
Table 3. Route breakdown

Route           Purpose
llamastack-ui   The URL you open in your browser — serves the Streamlit chat interface over HTTPS.
llamastack      The LlamaStack REST API — useful for direct API access, scripting, or connecting external clients.

Open the HOST/PORT value for llamastack-ui in your browser to launch the chat interface.
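Rather than copying the host out of the table by hand, the route host can be pulled with jsonpath — a sketch, where `2>/dev/null || true` simply keeps it non-fatal if the route is absent:

```shell
# Extract the UI route's hostname and print the URL to open.
UI_HOST=$(oc get route llamastack-ui -n llama-stack-rag \
  -o jsonpath='{.spec.host}' 2>/dev/null || true)
echo "https://${UI_HOST}"
```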

What to Expect in the UI

When the Streamlit UI loads you will see:

  • A chat input box at the bottom of the screen for submitting questions

  • A left sidebar with configuration options:

    • Knowledge Base — select which vector store collection to query

    • Mode — toggle between Chat (simple RAG) and Agent (agentic RAG)

    • Sampling parameters — adjust temperature, top-p, and max tokens

    • System prompt — override the default system prompt

The application comes pre-loaded with FantaCo’s internal HR, procurement, sales, and IT documents so you can begin asking questions immediately.
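Everything the UI does goes through the llamastack route, so you can also ask a question straight from the command line. A hedged sketch — the endpoint path and model id below are assumptions (recent LlamaStack builds expose an OpenAI-compatible API); adjust both to match your deployment:

```shell
# Route host lookup; non-fatal if oc or the route is unavailable.
API_HOST=$(oc get route llamastack -n llama-stack-rag \
  -o jsonpath='{.spec.host}' 2>/dev/null || true)

# Model id and endpoint path are assumptions -- check your deployment.
PAYLOAD='{"model": "llama-3-2-3b-instruct",
          "messages": [{"role": "user", "content": "What is the PTO policy?"}]}'

curl -sk "https://${API_HOST}/v1/openai/v1/chat/completions" \
  -H 'Content-Type: application/json' -d "$PAYLOAD" || true
```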