This article is a pragmatic guide to leveraging Spring AI in real enterprise environments and building intelligent systems that “automate the un-automatable”. In a corporate environment, we can use Spring AI to design end-to-end automation for production pipeline checks and application troubleshooting, with no human intervention required.
When Spring AI first showed up in early 2024, it was promising but clearly early-stage. It made LLM integration easier, but the features weren’t quite robust enough to satisfy real-world requirements. Fast forward to the end of 2025: the project has evolved continuously, and I am gobsmacked by the breakneck speed at which this framework has matured. Today Spring AI integrates with dozens of LLMs and vector stores, and lets us create RAG pipelines and implement tool calling with ease. If this sounds like Latin, this article is a great place for you to get started. Spring AI gives Java developers the ability to build applications that can think, reason, and take action.
Before diving into Spring AI and making our Java applications intelligent, it is imperative to understand a few key concepts and terms, without which we’ll find ourselves shooting in the dark. The second aspect is to understand the strengths and limitations of LLMs and apply them to our use cases correctly and efficiently. I also want to say that although Spring AI does some heavy lifting under the hood, it is the developer’s job to configure things correctly.
Understanding Large Language Models (LLMs)
Let’s go with two definitions here. The first is from Wikipedia, which is rather scary and feels like Latin:
“A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation”
The second is my own; although not technically accurate, it is enough for us to understand the concept in plain English: “An LLM is a piece of magic software that’s not necessarily intelligent, but can get stuff done for you”
Some famous LLMs are GPT by OpenAI, Llama by Meta, Claude by Anthropic, and Gemini by Google.
Limitations of LLMs and How to Overcome Them
In my own definition of an LLM above, I used the phrase “not necessarily intelligent”. Here’s why:
Questions LLMs Handle Well
- When did Neil Armstrong land on the moon?
- What is the population of Mumbai?
- Which team won the T20 Cricket World Cup in the West Indies in 2024?
- What are the best tourist spots in New York?
Questions LLMs Cannot Answer Without External Data
- What’s the current stock price of Tesla at the New York Stock Exchange?
- What’s the weather like in London today?
- Did my Uber ride reach the roundabout near my street?
- Does my Java application have memory issues?
For the first three questions the LLM cannot answer, there is a mechanism called function calling (which Spring AI exposes as tool calling). For the last question, related to memory issues, in real JVM environments we can plug in external memory diagnostic tools like yCrash or GCeasy, and Spring AI can tap into that data to automate decisions.
LLMs have incredible capabilities and can make some decisions on their own – we need to empower them and tune them to fit our use case. How to do this is a great question; we will see that in the first part of this article. After this, we will have successfully “given a piece of our brain” to the LLM – it now understands our private corporate data, but it’s still a brain without arms – it’s still handicapped! It cannot do anything or take any action based on the data we previously fed it.
We must empower the LLM to conditionally execute pre-registered actions using Spring AI. If this sounds interesting, refer to the second part of the article. This gives the LLM execution capabilities such as calling APIs, sending emails, triggering alert notifications to support teams, inserting records into a database, retrying a failed job, flagging abnormal memory consumption, and so on. Using these two capabilities in tandem, our Java code can do probably more than you think – a significant milestone in building intelligent apps using Spring Boot.
Why LLMs Need Context and Actions
Lastly, I want to point out one more limitation. LLMs are stateless by default, meaning they don’t retain past conversations unless we explicitly provide that context. So chat memory management is something that needs to be taken care of – and Spring AI, as always, will make our lives easier here.
Wrapper Applications and How They Work with LLMs
You’ve probably heard of or used ChatGPT by now. It is a wrapper application for OpenAI models. Wrapper applications are user interfaces that make it easier for users like us to interact with an LLM. They do some magic behind the scenes, such as web search, chat memory management, prompt relay, and response parsing.
| Wrapper application | Underlying LLM |
| --- | --- |
| ChatGPT | OpenAI GPT models |
| Claude Desktop | Anthropic Claude (e.g., Sonnet) |
| Le Chat | Mistral AI models |
| Perplexity AI | Llama, among others |
Core Concepts: Tokens, Prompts, and Chat Memory
Before we start building with Spring AI, it’s important to understand three core concepts that shape how LLMs work: tokens, prompts, and chat memory.
What Tokens Are and Why They Matter
Tokens are a fancy term for the units of data exchanged with the LLM; the more tokens, the higher the cost for us. They are the exchange mechanism for messages from the user to the LLM and vice versa. Generally, one English word corresponds to roughly one token – although I want to call out that this is not a rule of thumb you can always rely on; often a single English word is split into two tokens. Tokenization is a vast topic in itself and is not in scope for this article. However, to shed some more light on it, we can see the tokens in our message by using online token calculators.
Token calculator by OpenAI : https://platform.openai.com/tokenizer
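Spring AI also reports the tokens consumed by each call, which is handy for keeping an eye on costs. Below is a minimal sketch (assuming Spring AI 1.x, where this metadata API lives; accessor names have shifted between versions) that reads the usage off a ChatResponse:
import org.springframework.ai.chat.metadata.Usage;
import org.springframework.ai.chat.model.ChatResponse;

// Ask for the full ChatResponse instead of just the content string
ChatResponse response = chatClient.prompt()
        .user("Summarize yesterday's production incidents")
        .call()
        .chatResponse();

// Token counts reported back by the model provider
Usage usage = response.getMetadata().getUsage();
System.out.println("Prompt tokens: " + usage.getPromptTokens());
System.out.println("Total tokens:  " + usage.getTotalTokens());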
Understanding Prompts and System Messages
Prompts are the natural-language sentences the user types to the LLM. This is called a user message. The user message can be enhanced by giving the LLM more context via a system message. The system message sets the tone of the conversation between the LLM and the user; we’ll see more of it in the examples below. The user message and the context together form the prompt, though we can add a few more things to it. The responses returned to the prompt are typically called assistant messages.
Managing Chat Memory Efficiently in Spring AI
Chat history must be persisted in order to retain context across requests without inflating costs. Spring AI provides database support through its JDBC chat memory repository: all user and LLM messages can be saved into a relational database table, and only a bounded window of them is replayed to the model, which keeps token usage under control.
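As a concrete illustration, here is a minimal sketch of a JDBC-backed chat memory bean (assuming Spring AI 1.x and its JDBC chat memory starter; class names may differ in other versions). The message window is what actually curbs token usage:
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.MessageWindowChatMemory;
import org.springframework.ai.chat.memory.repository.jdbc.JdbcChatMemoryRepository;
import org.springframework.context.annotation.Bean;
import org.springframework.jdbc.core.JdbcTemplate;

@Bean
public ChatMemory chatMemory(JdbcTemplate jdbcTemplate) {
    // Messages are persisted to a relational table via the JDBC repository
    JdbcChatMemoryRepository repository = JdbcChatMemoryRepository.builder()
            .jdbcTemplate(jdbcTemplate)
            .build();

    // Replay at most the last 20 messages to the LLM on each request
    return MessageWindowChatMemory.builder()
            .chatMemoryRepository(repository)
            .maxMessages(20)
            .build();
}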
Enhancing LLM Responses Through Prompt Stuffing
What Prompt Stuffing Is
We can enhance the user prompt by providing additional context to the LLM. This context is typically constructed from data retrieved by a semantic search over a vector database. Spring AI comes to our rescue again, providing a fluent API for prompt stuffing. Prompt templates are easy to create and use, keep your code clean, and make prompts reusable.
Creating and Using Prompt Templates in Spring AI
An example system message with prompt stuffing looks like:
You are a helpful assistant, answering questions on my retail website “ABC” based on the given context in the DOCUMENTS section and no prior knowledge. If the answer is not in the DOCUMENTS section, then reply “I do not know the answer to this question” in a polite way.
DOCUMENTS:
-------------------
{ documents }
-------------------
The { documents } placeholder is stuffed at run time with the results of a similarity search over a vector store. In plain terms: the search selects the stored data most relevant to the conversation at hand.
The prompt template is loaded from the classpath and injected as follows:
@Value("classpath:/promptTemplates/system_prompt_template.st")
private Resource systemPromptTemplate;
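At run time, the template is rendered by binding the documents parameter before the call goes out. A minimal sketch (assuming Spring AI 1.x; vectorStore and chatClient are beans configured elsewhere, and systemPromptTemplate is the Resource injected above):
public String answer(String userQuestion) {
    // Retrieve the documents closest in meaning to the question
    String relevantDocs = vectorStore.similaritySearch(userQuestion).stream()
            .map(Document::getText)
            .collect(Collectors.joining("\n"));

    // The 'documents' parameter fills the { documents } placeholder in the template
    return chatClient.prompt()
            .system(s -> s.text(systemPromptTemplate).param("documents", relevantDocs))
            .user(userQuestion)
            .call()
            .content();
}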
Providing Custom Intelligence with RAG Pipelines
At a high level, embeddings or vectors are numerical representations of text. When we pass a sentence or even an entire document to an embedding model, it converts that text into an array of numbers. These numbers capture the meaning of the text in a way that allows computers to compare different pieces of information based on similarity. For example, phrases like “Java memory leak” and “Troubleshooting GC issues” would generate vectors that sit close together in a multidimensional space because their meanings are related.
These arrays of numbers are what we refer to as vectors. A vector is simply a list of float values, usually running into hundreds or thousands of dimensions. Specialized vector databases such as Pinecone, Milvus, and Redis (with its vector similarity search) support similarity searches, and Postgres offers the pgvector extension as well. When a user asks our application a question, Spring AI converts the question into a vector using the same embedding model. It then performs a similarity search in the vector store to locate the closest matching embeddings. The documents associated with those embeddings are retrieved and fed into the LLM as additional context. This is where we connect the dots with the prompt stuffing we learnt earlier. It is important to note that we cannot provide “too much input” to the LLM in our prompt, since tokens are expensive and there is a limit to the number of tokens we can pass to an LLM. Thus, we retrieve only the relevant information and stuff it into the prompt.
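For instance, with the pgvector starter on the classpath, the store can be configured via properties such as the following (a sketch based on the Spring AI 1.x documentation; verify the exact property names against your version):
# Create the vector table and index on startup
spring.ai.vectorstore.pgvector.initialize-schema=true
# Must match the dimensionality of your embedding model
spring.ai.vectorstore.pgvector.dimensions=1536
spring.ai.vectorstore.pgvector.index-type=HNSW
spring.ai.vectorstore.pgvector.distance-type=COSINE_DISTANCE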
This process forms the backbone of Retrieval-Augmented Generation (RAG). Instead of relying solely on its trained knowledge, the LLM can now answer questions using your private data, be it support tickets, logs, documentation, or product manuals. Spring AI simplifies this workflow by providing convenient abstractions for creating embeddings, storing them, and performing similarity searches through its VectorStore APIs.
Generating embeddings in Spring AI is straightforward. We typically call an embedding model with a piece of text and receive a vector in return. That vector can then be stored in your chosen vector database for future lookups. When we later perform a similarity search, the most relevant vectors are retrieved and injected into our prompt template before sending the final request to the LLM. This ensures that the model responds with contextually correct and domain-specific information. Application logs, GC reports, and heap summaries generated by tools like fastThread, HeapHero, or yCrash can also become part of your RAG knowledge base, letting AI provide richer production insights.
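A minimal sketch of that round trip (assuming Spring AI 1.x, with vectorStore auto-configured): we index a couple of documents, then fetch the closest matches for a question:
import java.util.List;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;

// Index domain knowledge; the configured EmbeddingModel vectorizes each document
vectorStore.add(List.of(
        new Document("GC pauses above 2 seconds usually indicate heap pressure."),
        new Document("Frequent full GCs in the order service point to an undersized heap.")));

// Later: retrieve the top matches for a user question
List<Document> matches = vectorStore.similaritySearch(SearchRequest.builder()
        .query("Why are my GC pauses so long?")
        .topK(3)
        .build());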
More about RAG :
https://docs.spring.io/spring-ai/reference/api/retrieval-augmented-generation.html
Enabling LLMs to Take Action with Tool Calling
This is one of the most powerful and interesting capabilities Spring AI brings to Java applications. At its core, tool calling gives our LLM the ability to execute predefined actions in our Java codebase. Spring AI makes this feature extremely easy to integrate. We simply annotate a Java method with @Tool, describe what the tool does, and Spring AI automatically exposes that function to the LLM. When the model determines that our method is needed to answer the user’s request, it returns a JSON structure instructing Spring AI to call that tool with the appropriate parameters. Spring AI then executes the method, captures its output, and feeds that output back into the LLM as part of the final response. The entire cycle is handled programmatically, without the developer ever needing to build complex glue code.
For example, in Java systems, you could expose diagnostic tools such as yCrash or GCeasy as safe tool-calls, allowing Spring AI to fetch heap insights, analyze GC behavior, or trigger automated remediation tasks.
More about tool calling : https://docs.spring.io/spring-ai/reference/api/tools.html
Here is how we define our tools:
@Tool(name = "getCurrentLocalTime", description = "Returns the user's current local time")
public String getCurrentLocalTime() {
return LocalDateTime.now().toString();
}
@Tool(name = "getCurrentTimeByZone", description = "Get the current time in a particular time zone")
public String getCurrentTimeByZoneId(@ToolParam(description = "Value representing the time zone") String timeZone) {
return LocalDateTime.now(ZoneId.of(timeZone))
.toString();
}
Custom business logic can be wrapped inside a tool as follows:
@Tool(description = "Fetch the status of the tickets based on a given username")
List<HelpDeskTicket> getTicketStatus(ToolContext toolContext) {
String username = getUsernameFromContext(toolContext);
List<HelpDeskTicket> tickets = helpDeskService.getTicketsByUsername(username);
log.info("Found {} tickets for user: {}", tickets.size(), username);
return tickets;
}
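Tying this back to the diagnostics idea mentioned earlier, a tool can just as easily wrap an external service. The endpoint below is hypothetical – adapt it to whatever API your diagnostic tooling (yCrash, GCeasy, etc.) actually exposes:
@Tool(description = "Fetch the latest heap and GC health report for an application")
public String getMemoryHealthReport(
        @ToolParam(description = "Name of the application to diagnose") String appName) {
    // 'restClient' is an injected Spring RestClient; the /diagnostics path is a
    // hypothetical in-house gateway in front of the diagnostic tooling
    return restClient.get()
            .uri("/diagnostics/{app}/memory-report", appName)
            .retrieve()
            .body(String.class);
}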
Building a ChatClient to Connect Your Java App with an LLM
Merging all the bits we have learnt so far, we can create the ChatClient (our API for talking to the LLM), wiring in the system prompt, chat memory, and tools:
@Bean("helpDeskChatClient")
public ChatClient helpDeskChatClient(ChatClient.Builder chatClientBuilder,
ChatMemory chatMemory,
HelpDeskTools helpDeskTools) {
return chatClientBuilder
.defaultSystem(systemPromptTemplate)
.defaultTools(helpDeskTools)
.build();
}
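Invoking the client is then straightforward. A usage sketch (assuming Spring AI 1.x; the conversation id keys the chat memory, and the tool context carries the username that getTicketStatus reads):
String reply = helpDeskChatClient.prompt()
        .user("What is the status of my open tickets?")
        // Scope chat memory to this user's conversation
        .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, "user-42"))
        // Passed through to tools via ToolContext
        .toolContext(Map.of("username", "jane.doe"))
        .call()
        .content();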
Testing Intelligent Java Applications Using Spring AI Evaluators
The Spring AI code we write yields responses from an LLM which are proxied over to our end user. The question here is: “Will the LLM’s responses be accurate and relevant enough to please our user?” That’s a fair ask, and we obviously cannot have a human being check every response before it is sent – that would be mental. Spring AI comes with an API called “Evaluators” for unit testing LLM responses. The unit tests we write “evaluate” the accuracy of the LLM response. How do they do that? By using another LLM. Yes, that means more tokens and more cost, but as far as I can see, this is the only practical way to test the intelligent Java apps you build.
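A sketch of a relevancy check using the Evaluator API (assuming Spring AI 1.x; helpDeskChatClient and chatModel are injected beans, and a second model acts as the judge):
import static org.junit.jupiter.api.Assertions.assertTrue;
import java.util.List;
import org.junit.jupiter.api.Test;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.evaluation.EvaluationRequest;
import org.springframework.ai.evaluation.EvaluationResponse;
import org.springframework.ai.evaluation.RelevancyEvaluator;

@Test
void helpDeskAnswersShouldBeRelevant() {
    String question = "How do I reset my password?";
    String answer = helpDeskChatClient.prompt().user(question).call().content();

    // The evaluator asks a second LLM whether the answer addresses the question
    RelevancyEvaluator evaluator = new RelevancyEvaluator(ChatClient.builder(chatModel));
    EvaluationResponse evaluation = evaluator.evaluate(
            new EvaluationRequest(question, List.of(), answer));

    assertTrue(evaluation.isPass(), "Response was not relevant to the question");
}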
Conclusion: Building Robust Intelligent Systems with Spring AI
In this article, we walked through the core ideas behind LLMs and Spring AI. We covered essential concepts like tokens, prompts, and chat memory, and saw how they come together when integrating LLMs with Spring Boot. With a solid grasp of these fundamentals, businesses can use AI more effectively to innovate, optimize costs, and accelerate their goals. As you continue exploring Spring AI, remember to lean on these foundations to guide your decisions. As enterprises build intelligent systems with Spring AI, combining AI reasoning with observability tools such as yCrash, GCeasy, fastThread, and HeapHero can open the door to automated production diagnostics and faster decision-making.

Share your Thoughts!