Get Started Building AI Apps With Spring AI

Artificial intelligence technologies are advancing at an unprecedented pace, yet until recently, Java developers faced limited options for seamlessly integrating these powerful capabilities directly into their Spring-based applications. Spring AI fundamentally alters this landscape by applying familiar Spring conventions, such as dependency injection and a configuration-first philosophy, to a modern AI development framework, thereby lowering the barrier to entry for millions of developers. This framework abstracts the complexities of interacting with various large language models (LLMs), enabling developers to focus on building innovative features rather than managing provider-specific SDKs. By learning to integrate AI into Spring applications, developers can unlock new possibilities for creating intelligent, responsive, and sophisticated software solutions. The following sections provide a comprehensive guide, starting with a simple example that sends a request to OpenAI, progressing to the use of prompt templates for handling user-generated queries, and culminating in an exploration of retrieval augmented generation (RAG) using a vector store to manage and query external documents.

1. An Introduction to the Spring AI Framework

Spring AI, which began as a project in 2023 and reached its first milestone version in early 2024 before its general availability release in May 2025, is designed to abstract the intricate processes involved in communicating with large language models. This abstraction is conceptually similar to how Spring Data simplifies database access procedures, allowing developers to work with a consistent API regardless of the underlying data source. The framework provides high-level abstractions for managing prompts, selecting different AI models, and processing the responses generated by these models. It includes robust support for a wide array of AI providers, including industry leaders like OpenAI and Anthropic, as well as open-source alternatives such as Hugging Face and Ollama for running local LLMs. This multi-provider support ensures that applications are not locked into a single vendor, offering flexibility and future-proofing against the rapidly changing AI ecosystem. By handling the low-level details of API calls, authentication, and data formatting, Spring AI empowers developers to integrate advanced AI functionalities with minimal boilerplate code, accelerating the development lifecycle and fostering innovation within the Java community.

The operational model of Spring AI allows developers to effortlessly switch between different AI providers by simply modifying configuration properties, a hallmark of the Spring ecosystem’s flexibility. In practice, a developer configures the necessary AI resources within the application’s application.yaml or application.properties file, specifying details like API keys and model preferences. Following this configuration, Spring’s dependency injection container automatically wires in beans that provide standard, pre-defined interfaces for AI operations. Developers then write their application logic against these consistent interfaces, such as ChatClient for conversational AI tasks. Spring transparently manages all the backend complexities of interacting with the specific models chosen in the configuration. This decoupling of application code from the AI provider’s implementation means that a project can be developed and tested using a local model via Ollama and later deployed to production using a more powerful, cloud-based model from OpenAI, with the only required change being a few lines in a properties file. This powerful approach significantly enhances code maintainability, testability, and adaptability in AI-driven projects.
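
As an illustration, pointing an application at a local Ollama instance instead of OpenAI amounts, at the configuration level, to swapping the provider block in application.yaml, roughly as sketched below. This is a hedged sketch: the corresponding Ollama starter dependency must also replace the OpenAI starter on the classpath, and the model name shown is an assumption.

spring:
  ai:
    ollama:
      base-url: http://localhost:11434  # Ollama's default local endpoint
      chat:
        options:
          model: llama3  # illustrative; any locally pulled model works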

2. Building a Foundational Spring Application to Query OpenAI

To begin building a simple Spring MVC application that can communicate with an external AI provider, the first step involves setting up the project and configuring the LLM provider. This can be achieved by creating a new project and including the necessary dependencies for “Spring Web” and “OpenAI.” The core of the configuration resides in the application.yaml file, where an ai section is defined under the main spring block. Within this section, a subsection for openai is created to hold provider-specific settings. A critical property is the api-key, which should be securely managed, for instance, by referencing an environment variable like ${OPENAI_API_KEY}. Additionally, chat options can be specified, the most important of which is the model. While the default might be a cost-effective model like gpt-4o-mini, a more advanced model such as gpt-5 can be selected for tasks requiring structured reasoning or multi-step logic. Other common configuration options include maxTokens to limit response length and temperature, which controls the randomness of the output. A low temperature value (e.g., 0.3) produces more deterministic and repeatable responses, ideal for tasks such as code generation, while a higher value (e.g., 0.7) encourages more creative output, better suited to open-ended design tasks. It is important to note that certain models may have specific requirements; for example, gpt-5 requires a temperature of 1, and the framework will raise an error if a different value is set.
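
Putting those properties together, a minimal application.yaml for this setup might look like the following sketch; the option values shown are illustrative rather than recommendations:

spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}  # read from the environment, never hard-coded
      chat:
        options:
          model: gpt-4o-mini  # swap in a more capable model as needed
          max-tokens: 500     # caps the length of each response
          temperature: 0.3    # low value for deterministic output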

Once the application is configured, the next phase is to build the service layer that will handle the interaction with the LLM. Because the OpenAI details are present in the application.yaml file, Spring’s auto-configuration mechanism automatically creates a ChatClient.Builder bean. This builder can be injected directly into a service class, such as SpringAIService, and used to construct a ChatClient instance, which serves as the primary interface for interacting with chat-based models. To send a query, the ChatClient’s prompt() method is invoked, passing the user’s question as a string. This method returns a ChatClientRequestSpec instance, which allows for further configuration of the LLM call. For a straightforward request, the call() method is invoked to dispatch the message to the LLM. The call() method returns a CallResponseSpec, from which the response can be extracted in two primary ways: as a raw text string by calling the content() method, or by mapping the response directly to a Java record or class by invoking the entity() method. To expose this functionality, a SpringAiController is created with a @PostMapping endpoint, for example, at /simpleQuery. This controller accepts a request body, passes the query to the SpringAIService, and returns the structured response, completing the request-response cycle.
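
A minimal sketch of this service and controller pair follows. The class names match those used above, while the raw-string request body and package-private visibility are simplifying assumptions:

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@Service
class SpringAIService {

    private final ChatClient chatClient;

    // Spring auto-configures a ChatClient.Builder once the OpenAI
    // properties are present in application.yaml.
    SpringAIService(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    String simpleQuery(String question) {
        return chatClient.prompt(question) // wrap the raw question in a prompt
                .call()                    // dispatch the request to the LLM
                .content();                // extract the response as plain text
        // Use .entity(SomeRecord.class) instead of .content() to map the
        // response onto a Java record.
    }
}

@RestController
class SpringAiController {

    private final SpringAIService service;

    SpringAiController(SpringAIService service) {
        this.service = service;
    }

    // Accepts the question as the raw request body; a JSON payload mapped
    // to a record would work equally well.
    @PostMapping("/simpleQuery")
    String simpleQuery(@RequestBody String question) {
        return service.simpleQuery(question);
    }
}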

3. Handling Dynamic User Input with Prompt Templates

While sending a simple string to an LLM is a good starting point, real-world business applications often require more structured and controllable prompts that can incorporate user-specified parameters. This is where prompt templates become indispensable. Spring AI provides robust support for this pattern through its PromptTemplate class. Although templates can be defined inline within the code, the established convention is to create them as separate files with an .st extension located in the src/main/resources/templates directory. This practice promotes the separation of concerns, making prompts easier to manage, version, and internationalize. For example, to create a prompt that generates a joke based on user input, a file named joke-template.st could contain the text: Tell me a {type} joke about {topic}. Here, {type} and {topic} are placeholders that will be dynamically replaced with user-provided values, such as “silly” and “Java.” To use this template within the application, it can be loaded into a service class as a Resource using Spring’s @Value annotation with a classpath reference, such as @Value("classpath:/templates/joke-template.st"). This approach keeps the prompt logic external to the Java code, enhancing flexibility and maintainability.

With the template file created and referenced in the service, the next step is to integrate it into the application’s logic to process user input. A new method, such as tellMeAJoke(String type, String topic), can be added to the service class. Inside this method, a new PromptTemplate is instantiated using the Resource that was injected via the @Value annotation. To populate the placeholders, a Map is created where the keys correspond to the variable names in the template ("type", "topic") and the values are the user-provided arguments. This map is passed to the PromptTemplate’s create() method, which returns a fully formed Prompt object. This Prompt object is then passed to the ChatClient’s prompt() method instead of a raw string. The rest of the call chain remains the same: the call() method sends the request, and the entity() method maps the LLM’s response to a corresponding Java record like JokeResponse. To expose this functionality through an API, the controller is updated with a new @PostMapping endpoint, such as /tellMeAJoke, which accepts a request body containing the type and topic fields. This entire workflow provides a scalable and robust pattern for building applications that can generate highly customized and context-aware responses from an LLM based on user interactions.
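
A hedged sketch of this workflow follows; the JokeResponse field and the decision to place the method in its own service are assumptions:

import java.util.Map;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.PromptTemplate;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;

@Service
class JokeService {

    // joke-template.st contains: Tell me a {type} joke about {topic}
    @Value("classpath:/templates/joke-template.st")
    private Resource jokeTemplate;

    private final ChatClient chatClient;

    JokeService(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    // The single field is an assumed shape; the LLM populates whatever
    // structure the record declares.
    record JokeResponse(String joke) {}

    JokeResponse tellMeAJoke(String type, String topic) {
        PromptTemplate template = new PromptTemplate(jokeTemplate);
        // Map keys must match the {type} and {topic} placeholders.
        Prompt prompt = template.create(Map.of("type", type, "topic", topic));
        return chatClient.prompt(prompt)
                .call()
                .entity(JokeResponse.class);
    }
}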

4. Implementing Retrieval Augmented Generation with Spring AI

Beyond basic queries and templated prompts, a significantly more powerful application of LLMs involves retrieval augmented generation (RAG), a technique that enables models to answer questions about proprietary or domain-specific information they were not originally trained on. The concept is straightforward: instead of relying solely on its internal knowledge, the LLM is provided with relevant context from external documents alongside the user’s question. A typical prompt structure for RAG would be: Use the following context to answer the user's question. If the question cannot be answered from the context, state that clearly. Context: {context} Question: {question}. The primary challenge in this process is efficiently storing and retrieving the correct context from a potentially vast corpus of documents, such as an internal corporate knowledge base. Sending thousands of pages to the LLM for every query would be prohibitively expensive and would likely exceed token limits. This is where a vector store becomes essential. A vector store is a specialized database that uses an embedding model to convert documents into high-dimensional numerical vectors. When a user asks a question, their query is also converted into a vector, and the store performs a similarity search to find the document chunks whose vectors are mathematically closest to the query vector. These top-matching chunks are then supplied as the context, effectively giving the LLM a highly relevant “cheat sheet” to answer the question accurately.

To implement RAG in a Spring AI application, the first step is to configure a vector store. For development and testing, Spring AI provides the SimpleVectorStore, an in-memory implementation that simplifies setup. This can be configured by creating a Spring @Configuration class that defines a SimpleVectorStore bean. The bean definition method can receive an EmbeddingModel supplied by Spring’s auto-configuration. Inside this method, the code can scan the classpath for document files (e.g., text files in a src/main/resources/documents directory), read their contents using Spring AI’s TextReader, and add the resulting Document objects to the vector store instance. While SimpleVectorStore is suitable for testing, Spring AI also supports production-grade vector stores like Pinecone or Qdrant, which can be configured directly in the application.yaml file. After setting up the store, a new service, such as SpringAIRagService, is created. This service injects both the ChatClient and the configured VectorStore. Its core logic resides in a query() method that takes a user’s question, builds a SearchRequest specifying the query and the number of desired results (e.g., topK(3)), and executes a similaritySearch() on the vector store. The text from the returned documents is then concatenated to form the context string for the RAG prompt.
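
A sketch of such a configuration class follows, assuming the Spring AI 1.0 builder API (earlier milestones constructed SimpleVectorStore directly) and plain-text documents on the classpath:

import java.io.IOException;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.reader.TextReader;
import org.springframework.ai.vectorstore.SimpleVectorStore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;
import org.springframework.core.io.support.PathMatchingResourcePatternResolver;

@Configuration
class VectorStoreConfig {

    // The EmbeddingModel parameter is supplied by the OpenAI starter's
    // auto-configuration.
    @Bean
    SimpleVectorStore simpleVectorStore(EmbeddingModel embeddingModel) throws IOException {
        SimpleVectorStore store = SimpleVectorStore.builder(embeddingModel).build();
        // Load every .txt file under src/main/resources/documents;
        // TextReader converts each file into Document objects.
        Resource[] files = new PathMatchingResourcePatternResolver()
                .getResources("classpath:/documents/*.txt");
        for (Resource file : files) {
            store.add(new TextReader(file).get());
        }
        return store;
    }
}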

With the service logic in place, the final steps involve creating the RAG prompt template and exposing the functionality through a controller. A rag-template.st file is created with a structure that instructs the LLM to use the provided context to answer the user’s question and to explicitly state if an answer cannot be found within that context. This instruction is crucial to prevent the LLM from “hallucinating” or falling back on its general training data. The SpringAIRagService uses this template along with the retrieved context and the original question to construct the final Prompt. This Prompt is then sent to the ChatClient to get the answer. A corresponding SpringAIRagController is implemented with a @PostMapping endpoint, like /springAIQuestion, which accepts the user’s question and delegates it to the service. A critical aspect of this implementation is validation. Testing the system with a question that can be answered from the loaded documents should yield an accurate, context-derived response. Conversely, asking a question outside the scope of the documents, such as “Who created Java?”, should result in the LLM correctly reporting that the information is not available in the provided context. This confirms that the RAG pipeline is working as intended, using only the supplied information to generate its answers.
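
The pieces of the RAG service described above might be assembled as in the following sketch, assuming the Spring AI 1.0 API (SearchRequest.builder() and Document.getText(); earlier milestones used a fluent SearchRequest and getContent()):

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.PromptTemplate;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;

@Service
class SpringAIRagService {

    // rag-template.st holds the {context} / {question} structure shown earlier.
    @Value("classpath:/templates/rag-template.st")
    private Resource ragTemplate;

    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    SpringAIRagService(ChatClient.Builder builder, VectorStore vectorStore) {
        this.chatClient = builder.build();
        this.vectorStore = vectorStore;
    }

    String query(String question) {
        // Retrieve the three document chunks most similar to the question.
        List<Document> matches = vectorStore.similaritySearch(
                SearchRequest.builder().query(question).topK(3).build());
        String context = matches.stream()
                .map(Document::getText)
                .collect(Collectors.joining("\n"));
        // Fill the template and send the augmented prompt to the LLM; the
        // controller endpoint simply delegates the incoming question here.
        Prompt prompt = new PromptTemplate(ragTemplate)
                .create(Map.of("context", context, "question", question));
        return chatClient.prompt(prompt).call().content();
    }
}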

From Foundation to Advanced AI Integration

This exploration introduced the core principles of using Spring AI to integrate large language model capabilities into Spring-based applications. The journey began with the fundamental steps of configuring an LLM provider and making simple queries, then progressed to using prompt templates to handle dynamic user input, and finally culminated in the implementation of a retrieval augmented generation service using a vector store. The examples demonstrated how Spring AI provides a powerful abstraction layer, allowing developers to interact with complex AI technologies through familiar Spring paradigms, much like Spring Data abstracts database interactions. This approach significantly lowers the barrier to entry for Java developers looking to build sophisticated AI-powered features. The foundational knowledge acquired through these exercises provides the skills needed to configure, access, and leverage LLMs within a structured application context. With this groundwork established, developers are well positioned to explore more advanced AI programming paradigms, such as building autonomous AI agents capable of enhancing and automating complex business processes, moving beyond simple question-answering systems to create truly intelligent applications.
