
Abstract Architecture for LLM Applications

In developing our abstract architecture for generic LLM applications below, we integrated the Gartner reference architecture1 with traditional IT components. This abstract architecture comprises a development environment and a runtime environment. Fine-tuning of the LLM using training data takes place in the development environment, while deployment and use of the LLM take place in the runtime environment. To comprehensively cover typical LLM use cases, we also integrated retrieval-augmented generation (RAG) components and LLM agents into the abstract LLM application architecture. Note that the abstract LLM application architecture is a conceptual diagram and not a blueprint.

Fig 1: Abstract Architecture for LLM Applications

The cybersecurity controls specific to LLMs are organised by architecture component in this playbook. The architecture components of LLM applications are labelled in black shaded boxes, in brackets, as shown in Fig 1.

1Dennis Xu, Kevin Schmidt (2024, Jan 17). Gartner: Generative AI Adoption: Top Security Threats, Risks and Mitigations.

Architecture components of LLM Applications

ED Engineering device
TD Training data
FM Foundation model
TM Fine-tuned model
RM Runtime model
VS Vector store
LA LLM interface application
RL Retrieval and LLM agent

Government Agencies may refer to existing IM8 security policies for cybersecurity controls applicable to the traditional IT components of LLM applications, demarcated with dotted lines in Fig 1. For the same IT components, other organisations may refer to NIST 800-53, CIS Benchmarks, ISO 27001 and ISO 27002. In addition, LLM components that are required to run on accelerated (high-performance) compute are demarcated with orange shaded boxes in Fig 1.

Additional Notes on Architecture Components of LLM Applications

Foundation LLM refers to a pre-trained model that can be used as a baseline for deployment, or for further fine-tuning.
Fine-tuned LLM refers to an LLM that has undergone training on a specific dataset or task.
Training Data refers to the dataset that is required for fine-tuning the LLM.
Secure Proxy refers to an API Gateway or Application Load Balancer fronted by a WAF, a Message Queue, a File Transfer Service, etc.
Runtime LLM refers to an LLM that the Agency deploys and retains full control over, which can alleviate privacy concerns, unlike a Model-as-a-Service LLM.
Retrieval (Query/Response) responds to the user prompt and improves its accuracy by integrating knowledge from the LLM, Vector Store, Internet Resource, and Intranet Resource (e.g., knowledge base).
LLM Agent refers to the intermediary application used to invoke third-party services to perform a specific task, for example, executing a line of code.
LLM User Interface Application refers to the intermediary application through which users communicate with the LLM.
Internet Resource refers to whitelisted internet-accessible data sources accessed by the Retrieval component.
Intranet Resource refers to whitelisted Government Enterprise Network (GEN)-accessible data sources accessed by the Retrieval component.
Model as a Service refers to a machine learning model that is offered as a cloud service.
Accelerated Compute refers to the underlying high-performance computing infrastructure required for running LLMs and for fine-tuning them with training data.
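The Retrieval (Query/Response) flow described above can be illustrated with a minimal sketch: a user prompt is matched against a vector store and the retrieved knowledge is prepended to the prompt before it reaches the LLM. The bag-of-words embedding, the in-memory vector store, and all function names below are illustrative stand-ins, not part of the abstract architecture or any specific product.

```python
# Minimal, hypothetical sketch of the Retrieval (Query/Response) flow.
# A real deployment would use a proper embedding model and vector store.
from math import sqrt

def embed(text: str) -> dict:
    """Toy bag-of-words embedding (stand-in for a real embedding model)."""
    vec: dict = {}
    for token in text.lower().split():
        vec[token] = vec.get(token, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, vector_store: list, k: int = 1) -> list:
    """Return the k documents most similar to the query."""
    qv = embed(query)
    ranked = sorted(vector_store, key=lambda item: cosine(qv, item[0]), reverse=True)
    return [doc for _, doc in ranked[:k]]

def build_prompt(query: str, context: list) -> str:
    """Augment the user prompt with retrieved knowledge before the LLM call."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"

# Index two whitelisted documents, then augment a query with retrieved context.
docs = ["Fine-tuning adapts a foundation model to a task.",
        "A secure proxy fronts the runtime model."]
store = [(embed(d), d) for d in docs]
context = retrieve("What does fine-tuning do?", store)
prompt = build_prompt("What does fine-tuning do?", context)
```

The augmented prompt is what the Retrieval component would forward to the Runtime LLM; in the abstract architecture, the same pattern extends to Internet and Intranet Resources as additional retrieval sources.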