This article is a pragmatic guide to leveraging Spring AI in real enterprise environments and building intelligent systems that “automate the un-automatable”. In a corporate environment, we can use Spring AI to design end-to-end automation for production pipeline checks and application troubleshooting, with no human intervention required.
When Spring AI first showed up in early 2024, it was promising but clearly early-stage. It made LLM integration easier, but the features weren’t quite robust enough to satisfy real-world requirements. Fast forward to the end of 2025: the project has evolved continuously, and I am gobsmacked by the breakneck speed at which this framework has matured. Today Spring AI integrates with dozens of LLMs and vector stores, and lets us create RAG pipelines and implement tool calling with ease. If this sounds like Latin, this article is a great place for you to get started. Spring AI gives Java developers the ability to build applications that can think, reason, and take action.
Before diving into Spring AI and making our Java applications intelligent, it is imperative to understand a few key concepts and terms, without which we’ll find ourselves shooting in the dark. The second aspect is to understand the strengths and limitations of LLMs and apply them to our use cases correctly and efficiently. I also want to say that although Spring AI does some heavy lifting under the hood, it is the developer’s job to configure things correctly.
Understanding Large Language Models (LLMs)
Let’s go with two definitions here. The first is from Wikipedia, which is rather scary and feels like Latin:
“A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation”
The second is my own; although not technically accurate, it is enough for us to understand the concept in plain English: “An LLM is a piece of magic software that’s not necessarily intelligent, but can get stuff done for you”
Some famous LLMs are GPT by OpenAI, Llama by Meta, Claude by Anthropic, and Gemini by Google.
Limitations of LLMs and How to Overcome Them
In my own definition of an LLM above, I used the phrase “not necessarily intelligent”. Here’s why:
Questions LLMs Handle Well
- When did Neil Armstrong land on the moon?
- What is the population of Mumbai?
- Which team won the T20 Cricket World Cup in the West Indies in 2024?
- What are the best tourist spots in New York?
Questions LLMs Cannot Answer Without External Data
- What’s the current stock price of Tesla at the New York Stock Exchange?
- What’s the weather like in London today?
- Did my Uber ride reach the roundabout near my street?
- Does my Java application have memory issues?
For the first three questions the LLM cannot answer, there is a mechanism called function calling (which Spring AI exposes as tool calling). For the last question, related to memory issues, in real JVM environments we can plug in external memory diagnostic tools like yCrash or GCeasy, and Spring AI can tap into that data to automate decisions.
LLMs have incredible capabilities and can make some decisions on their own – we need to empower them and tune them to fit our use case. How to do this is a great question; we will see that in the first part of this article. After this, we will have successfully “given a piece of our brain” to the LLM – it now understands our private corporate data, but it’s still a brain without arms – it’s still handicapped! It cannot do anything or take any action based on the data we previously fed it.
We must empower the LLM to conditionally execute pre-registered actions using Spring AI. If this sounds interesting, refer to the second part of the article. This gives the LLM execution capabilities such as calling APIs, sending emails, triggering alert notifications to support teams, inserting records into a database, retrying a failed job, flagging abnormal memory consumption, and so on. Using these two capabilities in tandem, our Java code can do probably more than you think – a significant milestone in building intelligent apps using Spring Boot.
Why LLMs Need Context and Actions
Lastly, I want to point out one more limitation. LLMs are stateless by default, meaning they don’t retain past conversations unless we explicitly provide that context. So chat memory management is something that needs to be taken care of – and Spring AI, as always, will make our lives easier here.
Wrapper Applications and How They Work with LLMs
You’ve probably heard of or used ChatGPT by now. It is a wrapper application for OpenAI models. Wrapper applications are user interfaces that make it easier for users like us to interact with an LLM. They do some magic behind the scenes, such as web search, chat memory management, prompt relay, and response parsing.
| Wrapper application | Underlying LLM |
| --- | --- |
| ChatGPT | OpenAI GPT models |
| Claude Desktop | Anthropic Claude (e.g., Sonnet) |
| Le Chat | Mistral AI models |
| Perplexity AI | Llama, among others |
Core Concepts: Tokens, Prompts, and Chat Memory
Before we start building with Spring AI, it’s important to understand three core concepts that shape how LLMs work: tokens, prompts, and chat memory.
What Tokens Are and Why They Matter
Tokens are a fancy term for the units of data exchanged with the LLM; the more tokens, the higher the cost for us. They are the exchange mechanism for messages from the user to the LLM and vice versa. Generally, one English word corresponds to roughly one token – although I want to call out that this is not a rule of thumb you can always rely on; often a single English word is split into two tokens. Tokenization is a vast topic in itself and is not in scope for this article. However, to shed some more light on it, we can see the tokens in our message by using online token calculators.
Token calculator by OpenAI : https://platform.openai.com/tokenizer
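Spring AI also reports the tokens consumed by each call, which is handy for keeping an eye on costs. Below is a minimal sketch (assuming Spring AI 1.x, where this metadata API lives; accessor names have shifted between versions) that reads the usage off a ChatResponse:
import org.springframework.ai.chat.metadata.Usage;
import org.springframework.ai.chat.model.ChatResponse;

// Ask for the full ChatResponse instead of just the content string
ChatResponse response = chatClient.prompt()
        .user("Summarize yesterday's production incidents")
        .call()
        .chatResponse();

// Token counts reported back by the model provider
Usage usage = response.getMetadata().getUsage();
System.out.println("Prompt tokens: " + usage.getPromptTokens());
System.out.println("Total tokens:  " + usage.getTotalTokens());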
Understanding Prompts and System Messages
Prompts are the natural-language sentences the user types to the LLM. This is called a user message. The user message can be enhanced by giving the LLM more context via a system message. The system message sets the tone of the conversation between the LLM and the user; we’ll see more of it in the examples below. The user message and the context together form the prompt, though we can add a few more things to it. The responses returned to the prompt are typically called assistant messages.
Managing Chat Memory Efficiently in Spring AI
Chat history must be persisted in order to retain context across requests without inflating costs. Spring AI provides database support through its JDBC chat memory repository: all user and LLM messages can be saved into a relational database table, and only a bounded window of them is replayed to the model, which keeps token usage under control.
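As a concrete illustration, here is a minimal sketch of a JDBC-backed chat memory bean (assuming Spring AI 1.x and its JDBC chat memory starter; class names may differ in other versions). The message window is what actually curbs token usage:
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.MessageWindowChatMemory;
import org.springframework.ai.chat.memory.repository.jdbc.JdbcChatMemoryRepository;
import org.springframework.context.annotation.Bean;
import org.springframework.jdbc.core.JdbcTemplate;

@Bean
public ChatMemory chatMemory(JdbcTemplate jdbcTemplate) {
    // Messages are persisted to a relational table via the JDBC repository
    JdbcChatMemoryRepository repository = JdbcChatMemoryRepository.builder()
            .jdbcTemplate(jdbcTemplate)
            .build();

    // Replay at most the last 20 messages to the LLM on each request
    return MessageWindowChatMemory.builder()
            .chatMemoryRepository(repository)
            .maxMessages(20)
            .build();
}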
Enhancing LLM Responses Through Prompt Stuffing
What Prompt Stuffing Is
We can enhance the user prompt by providing additional context to the LLM. This context is typically constructed from data retrieved by a semantic search over a vector database. Spring AI comes to our rescue again, providing a fluent API for prompt stuffing. Prompt templates are easy to create and use, keep your code clean, and make prompts reusable.
Creating and Using Prompt Templates in Spring AI
An example system message with prompt stuffing looks like:
You are a helpful assistant, answering questions on my retail website “ABC” based on the given context in the DOCUMENTS section and no prior knowledge. If the answer is not in the DOCUMENTS section, then reply “I do not know the answer to this question” in a polite way.
DOCUMENTS:
-------------------
{ documents }
-------------------
The { documents } placeholder is stuffed at run time with the results of a similarity search over a vector store. In plain terms: the search selects the stored data most relevant to the conversation at hand.
The prompt template is loaded from the classpath and injected as follows:
@Value("classpath:/promptTemplates/system_prompt_template.st")
private Resource systemPromptTemplate;
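At run time, the template is rendered by binding the documents parameter before the call goes out. A minimal sketch (assuming Spring AI 1.x; vectorStore and chatClient are beans configured elsewhere, and systemPromptTemplate is the Resource injected above):
public String answer(String userQuestion) {
    // Retrieve the documents closest in meaning to the question
    String relevantDocs = vectorStore.similaritySearch(userQuestion).stream()
            .map(Document::getText)
            .collect(Collectors.joining("\n"));

    // The 'documents' parameter fills the { documents } placeholder in the template
    return chatClient.prompt()
            .system(s -> s.text(systemPromptTemplate).param("documents", relevantDocs))
            .user(userQuestion)
            .call()
            .content();
}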
Providing Custom Intelligence with RAG Pipelines
At a high level, embeddings or vectors are numerical representations of text. When we pass a sentence or even an entire document to an embedding model, it converts that text into an array of numbers. These numbers capture the meaning of the text in a way that allows computers to compare different pieces of information based on similarity. For example, phrases like “Java memory leak” and “Troubleshooting GC issues” would generate vectors that sit close together in a multidimensional space because their meanings are related.
These arrays of numbers are what we refer to as vectors. A vector is simply a list of float values, usually running into hundreds or thousands of dimensions. Specialized vector databases such as Pinecone, Milvus, and Redis (with its vector similarity search) support similarity searches, and Postgres offers the pgvector extension as well. When a user asks our application a question, Spring AI converts the question into a vector using the same embedding model. It then performs a similarity search in the vector store to locate the closest matching embeddings. The documents associated with those embeddings are retrieved and fed into the LLM as additional context. This is where we connect the dots with the prompt stuffing we learnt earlier. It is important to note that we cannot provide “too much input” to the LLM in our prompt, since tokens are expensive and there is a limit to the number of tokens we can pass to an LLM. Thus, we retrieve only the relevant information and stuff it into the prompt.
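For instance, with the pgvector starter on the classpath, the store can be configured via properties such as the following (a sketch based on the Spring AI 1.x documentation; verify the exact property names against your version):
# Create the vector table and index on startup
spring.ai.vectorstore.pgvector.initialize-schema=true
# Must match the dimensionality of your embedding model
spring.ai.vectorstore.pgvector.dimensions=1536
spring.ai.vectorstore.pgvector.index-type=HNSW
spring.ai.vectorstore.pgvector.distance-type=COSINE_DISTANCE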
This process forms the backbone of Retrieval-Augmented Generation (RAG). Instead of relying solely on its trained knowledge, the LLM can now answer questions using your private data, be it support tickets, logs, documentation, or product manuals. Spring AI simplifies this workflow by providing convenient abstractions for creating embeddings, storing them, and performing similarity searches through its VectorStore APIs.
Generating embeddings in Spring AI is straightforward. We typically call an embedding model with a piece of text and receive a vector in return. That vector can then be stored in your chosen vector database for future lookups. When we later perform a similarity search, the most relevant vectors are retrieved and injected into our prompt template before sending the final request to the LLM. This ensures that the model responds with contextually correct and domain-specific information. Application logs, GC reports, and heap summaries generated by tools like fastThread, HeapHero, or yCrash can also become part of your RAG knowledge base, letting AI provide richer production insights.
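A minimal sketch of that round trip (assuming Spring AI 1.x, with vectorStore auto-configured): we index a couple of documents, then fetch the closest matches for a question:
import java.util.List;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;

// Index domain knowledge; the configured EmbeddingModel vectorizes each document
vectorStore.add(List.of(
        new Document("GC pauses above 2 seconds usually indicate heap pressure."),
        new Document("Frequent full GCs in the order service point to an undersized heap.")));

// Later: retrieve the top matches for a user question
List<Document> matches = vectorStore.similaritySearch(SearchRequest.builder()
        .query("Why are my GC pauses so long?")
        .topK(3)
        .build());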
More about RAG :
https://docs.spring.io/spring-ai/reference/api/retrieval-augmented-generation.html
Enabling LLMs to Take Action with Tool Calling
This is one of the most powerful and interesting capabilities Spring AI brings to Java applications. At its core, tool calling gives our LLM the ability to execute predefined actions in our Java codebase. Spring AI makes this feature extremely easy to integrate. We simply annotate a Java method with @Tool, describe what the tool does, and Spring AI automatically exposes that function to the LLM. When the model determines that our method is needed to answer the user’s request, it returns a JSON structure instructing Spring AI to call that tool with the appropriate parameters. Spring AI then executes the method, captures its output, and feeds that output back into the LLM as part of the final response. The entire cycle is handled programmatically, without the developer ever needing to build complex glue code.
For example, in Java systems, you could expose diagnostic tools such as yCrash or GCeasy as safe tool-calls, allowing Spring AI to fetch heap insights, analyze GC behavior, or trigger automated remediation tasks.
More about tool calling : https://docs.spring.io/spring-ai/reference/api/tools.html
Here is how we define our tools:
@Tool(name = "getCurrentLocalTime", description = "Returns the user's current local time")
public String getCurrentLocalTime() {
return LocalDateTime.now().toString();
}
@Tool(name = "getCurrentTimeByZone", description = "Get the current time in a particular time zone")
public String getCurrentTimeByZoneId(@ToolParam(description = "Value representing the time zone") String timeZone) {
return LocalDateTime.now(ZoneId.of(timeZone))
.toString();
}
Custom business logic can be wrapped inside a tool as follows:
@Tool(description = "Fetch the status of the tickets based on a given username")
List<HelpDeskTicket> getTicketStatus(ToolContext toolContext) {
String username = getUsernameFromContext(toolContext);
List<HelpDeskTicket> tickets = helpDeskService.getTicketsByUsername(username);
log.info("Found {} tickets for user: {}", tickets.size(), username);
return tickets;
}
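Tying this back to the diagnostics idea mentioned earlier, a tool can just as easily wrap an external service. The endpoint below is hypothetical – adapt it to whatever API your diagnostic tooling (yCrash, GCeasy, etc.) actually exposes:
@Tool(description = "Fetch the latest heap and GC health report for an application")
public String getMemoryHealthReport(
        @ToolParam(description = "Name of the application to diagnose") String appName) {
    // 'restClient' is an injected Spring RestClient; the /diagnostics path is a
    // hypothetical in-house gateway in front of the diagnostic tooling
    return restClient.get()
            .uri("/diagnostics/{app}/memory-report", appName)
            .retrieve()
            .body(String.class);
}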
Building a ChatClient to Connect Your Java App with an LLM
Merging all the bits we have learnt so far, we can create the ChatClient (our API for talking to the LLM), wiring in the system prompt, chat memory, and tools:
@Bean("helpDeskChatClient")
public ChatClient helpDeskChatClient(ChatClient.Builder chatClientBuilder,
ChatMemory chatMemory,
HelpDeskTools helpDeskTools) {
return chatClientBuilder
.defaultSystem(systemPromptTemplate)
.defaultTools(helpDeskTools)
.build();
}
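Invoking the client is then straightforward. A usage sketch (assuming Spring AI 1.x; the conversation id keys the chat memory, and the tool context carries the username that getTicketStatus reads):
String reply = helpDeskChatClient.prompt()
        .user("What is the status of my open tickets?")
        // Scope chat memory to this user's conversation
        .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, "user-42"))
        // Passed through to tools via ToolContext
        .toolContext(Map.of("username", "jane.doe"))
        .call()
        .content();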
Testing Intelligent Java Applications Using Spring AI Evaluators
The Spring AI code we write yields responses from an LLM which are proxied over to our end user. The question here is: “Will the LLM’s responses be accurate and relevant enough to please our user?” That’s a fair ask, and we obviously cannot have a human being check every response before it is sent – that would be mental. Spring AI comes with an API called “Evaluators” for unit testing LLM responses. The unit tests we write “evaluate” the accuracy of the LLM response. How do they do that? By using another LLM. Yes, that means more tokens and more cost, but as far as I can see, this is the only practical way to test the intelligent Java apps you build.
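A sketch of a relevancy check using the Evaluator API (assuming Spring AI 1.x; helpDeskChatClient and chatModel are injected beans, and a second model acts as the judge):
import static org.junit.jupiter.api.Assertions.assertTrue;
import java.util.List;
import org.junit.jupiter.api.Test;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.evaluation.EvaluationRequest;
import org.springframework.ai.evaluation.EvaluationResponse;
import org.springframework.ai.evaluation.RelevancyEvaluator;

@Test
void helpDeskAnswersShouldBeRelevant() {
    String question = "How do I reset my password?";
    String answer = helpDeskChatClient.prompt().user(question).call().content();

    // The evaluator asks a second LLM whether the answer addresses the question
    RelevancyEvaluator evaluator = new RelevancyEvaluator(ChatClient.builder(chatModel));
    EvaluationResponse evaluation = evaluator.evaluate(
            new EvaluationRequest(question, List.of(), answer));

    assertTrue(evaluation.isPass(), "Response was not relevant to the question");
}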
Conclusion: Building Robust Intelligent Systems with Spring AI
In this article, we walked through the core ideas behind LLMs and Spring AI. We covered essential concepts like tokens, prompts, and chat memory, and saw how they come together when integrating LLMs with Spring Boot. With a solid grasp of these fundamentals, businesses can use AI more effectively to innovate, optimize costs, and accelerate their goals. As you continue exploring Spring AI, remember to lean on these foundations to guide your decisions. As enterprises build intelligent systems with Spring AI, combining AI reasoning with observability tools such as yCrash, GCeasy, fastThread, and HeapHero can open the door to automated production diagnostics and faster decision-making.

Share your Thoughts!