
Spring AI: Building Production-Ready AI Applications


About the experts

Josh Long

Spring Developer Advocate at Pivotal, Java Champion and author of 6 books


Introduction

Hi, my name is Josh Long, and I'm here with the GOTO State of the Art series, looking at Spring AI. We're going to talk about how it can help you advance your AI engineering use cases. It is GA as of the 20th of May, 2025, and it comes with a lot of amazing features.

One of the things I think people take for granted and don't quite understand is that, contrary to whatever you might have heard, Java is uniquely well situated. In my estimation, it's arguably a much better choice for most AI workloads today when it comes to dealing with chat models, image models, transcription models and the like.

Because it's a matter of integration, and integration with REST APIs is not something that is exclusively the province of, for example, a Python application. It's something that we've been able to do increasingly, and remarkably, well over the decades in the Java ecosystem. So we're going to talk about Spring AI.

Addressing AI Challenges

And, as much as we love AI technologies today, there are some issues. So we're going to talk about how you can overcome some of those issues with support for certain patterns. One problem is that AI models don't tend to stay on track. They lack direction. They just respond to each prompt, to each request.

You can put them on track by giving them a system prompt, which aligns their responses to your goals. They are stateless; that is to say, they forget things. So you can give them a chat memory. They are unable to interact with the outside world. You can give them access to that outside world via tool calling.

They don't know about your data, so you can put the data in the body of the request. That's called prompt stuffing. And you don't want to send all your data, just the data that's pertinent to the request. You can use retrieval augmented generation, or RAG, to do so by talking to a vector store, which is a data structure optimized for similarity search.

Then finally, given that they're chat models and they're just designed to chat, sometimes they'll chat confidently even about things about which they have no idea. That's called hallucination. You can use evaluation to validate that what has been produced is actually a response that matters given the input. So we're going to see how Spring AI makes it trivial to pull together patterns like these to get a production-worthy application.

Building a Dog Adoption Application

Let's go to start.spring.io. We're going to build an application to support adopting dogs; you can imagine a shelter wanting to support that effort. I'm going to use the Postgres vector store capability. I use GraalVM, I use the web support, I use OpenAI, but there are dozens of different models that you can use there.

What have I done? I've got GraalVM. I've got OpenAI. So there's that. I'm going to need a SQL data mapping technology. I'm going to need DevTools to make it trivial to do live reloads. What else do we want? We want the web support to build a web service.

So I think I'm happy with that stuff there. We can also imagine bringing in the Actuator to do observability. So I'm going to call this an assistant, and it's going to talk to a SQL database and give us information about the dogs available in the shelter. And we've got one dog in particular, named Prancer, that went viral for all the wrong reasons when his owner put up an ad trying to find a new home for this dog.

She describes the dog by saying there's not a very big market for neurotic, man-hating, animal-hating, children-hating dogs that look like gremlins. Right. A very funny ad. It was covered by all sorts of media, including BuzzFeed, USA Today and The New York Times.

This is a very, very funny, well-known ad about a dog. So the point is, we want to support adopting that dog. We've got an assistant here. I'm going to open this in my IDE, and we're going to build an assistant. Now, there are a couple of things we need to specify here.

I'm going to add a dependency before we even get started doing anything. Here I'll use the Spring AI vector store capabilities; we'll come back to that in a bit. I'm going to make sure that we're using the snapshot versions of Spring AI, because remember, while it's going to come out GA in just a week or so, it's not GA yet.

I'm going to enjoy living on the edge here. So we'll paste that in there, and then we'll upgrade this to the Spring AI snapshot. Okay, so there's this. So with that all in place, let's go ahead and build the application and connect it to our SQL data store, which I'll do via some connectivity properties here.

spring.datasource.password is secret. Don't tell anybody; that's our little secret, isn't it? So there's our username. And then finally the URL is jdbc:postgresql://localhost/mydatabase. And then this. So here's the assistant application. And what are we going to do? Well, we're going to build an assistant.

It's going to be an HTTP controller. We'll have an AssistantController here, and I'm going to have an HTTP endpoint that takes the username as a path variable. We can imagine using Spring Security. And we're going to just call it assistant, okay? So a String assistant, and it'll take the path variable as a user.

We'll have a question that gets sent from the user, to ask questions about the dogs in the shelter. That's how normal dog adoption processes work. And in order to support responses to those questions, we're going to use the Spring AI ChatClient, which we can build by injecting the builder, like so.

So this.ai.prompt().user(...), passing in the user prompt (that's the question from the user), then get the content, etc. Okay. Now, where does that chat client come from? Well, we need to have specified an API key for OpenAI. I'm using OpenAI here; you can do whatever you want. There are dozens of different supported models, including Amazon Bedrock, Google's Gemini, Ollama, whatever: some local, some open source, some hosted, some public, whatever.
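The code being typed isn't shown in the transcript, but the controller described here looks roughly like this. This is a minimal sketch assuming the Spring AI 1.x ChatClient API; the class name and URL path are illustrative:

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.*;

@RestController
class AssistantController {

    private final ChatClient ai;

    // Spring AI autoconfigures a ChatClient.Builder for the model on the classpath
    AssistantController(ChatClient.Builder builder) {
        this.ai = builder.build();
    }

    @GetMapping("/{user}/assistant")
    String assistant(@PathVariable String user, @RequestParam String question) {
        return this.ai
                .prompt()
                .user(question) // the user's question becomes the prompt
                .call()
                .content();
    }
}
```

The OpenAI API key would go in configuration, e.g. `spring.ai.openai.api-key` in `application.properties`.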

So we're going to do that. And, I think we've already got something here. Let's go ahead and start this up and see what that does. Okay. So now if I go here and I ask a very simple question, okay.

Tell it my name. It's complaining because that's the wrong URL. No problem. So we go to /jlong/assistant. Okay. It says to give it more context. What I've done is I've accidentally sent the user path variable instead of the prompt. So we start that again and resubmit our question this time, okay.

Hoping for a better answer. Okay, great. Let's see, does it remember? It doesn't have any persistent state, so it's forgotten me. I seem to have that effect on people: they meet me and then they forget me immediately. The way we can get around that is by advising it with an advisor, which is like a filter on each request.

You can imagine: I create a PromptChatMemoryAdvisor, a filter that preprocesses the request, and I'm going to use a MessageWindowChatMemory builder, passing in an in-memory chat memory that gets consulted, okay. And I'm going to put that in a ConcurrentHashMap, so each user will have its own instance of that whole arrangement.

So: a map of PromptChatMemoryAdvisors, called memory, a new ConcurrentHashMap, okay. Then we'll say this.memory.computeIfAbsent(user, ...), and then that. And I'm going to use that now to get the current advisor for the user and configure it on the request being sent off to the model. So, advisorForUser, okay.

Now again, you can use in-memory just to keep it simple for a little demo here. But you can imagine using something backed by persistence, like Redis or Neo4j or JDBC or whatever. So: what's my name? It doesn't know, of course. My name is Josh. Great. Now: what's my name? Your name is Josh.
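The per-user memory arrangement described above might be sketched like this inside the controller. This assumes the Spring AI 1.x chat-memory API (PromptChatMemoryAdvisor, MessageWindowChatMemory); treat the exact names as approximate, since the talk was on a snapshot build:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.ai.chat.client.advisor.PromptChatMemoryAdvisor;
import org.springframework.ai.chat.memory.InMemoryChatMemoryRepository;
import org.springframework.ai.chat.memory.MessageWindowChatMemory;

// one advisor (and thus one conversation history) per user
private final Map<String, PromptChatMemoryAdvisor> memory = new ConcurrentHashMap<>();

private PromptChatMemoryAdvisor advisorFor(String user) {
    return this.memory.computeIfAbsent(user, key -> PromptChatMemoryAdvisor.builder(
                    MessageWindowChatMemory.builder()
                            .chatMemoryRepository(new InMemoryChatMemoryRepository())
                            .build())
            .build());
}

// then on each request:
// this.ai.prompt().user(question).advisors(advisorFor(user)).call().content();
```

For production, the in-memory repository would be swapped for a persistent ChatMemoryRepository implementation.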

So it's remembered. Good. Now I can ask it about anything, and that's the next problem we have to worry about: it has no focus. So we're going to give it a system prompt to dictate that it's to be acting on behalf of a dog adoption agency. And I just happen to have a system prompt that I've pre-written.

So I'll take that over here. All right. I'll put that in the buffer, paste it there and then reload. Okay. So it says: you are an AI-powered assistant to help people adopt a dog from the adoption agency, with locations in, you know, all these various cities, including London and San Francisco. So, let's see, Control-C, Control-R.

There we go. It's going to fail because of the wrong URL. And we go here and we say /jlong/assistant. Okay. It says: I'm sorry, but I don't have any information about specific dogs available for adoption at Pooch Palace. Right. So it doesn't know about the dogs, but at least it knows it's meant to be a representative of our fictitious dog adoption agency, called Pooch Palace.
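Wiring in the pre-written system prompt might look like this on the ChatClient builder (the prompt text below is a stand-in for the one read out in the talk, not the actual prompt used):

```java
// illustrative system prompt; the real one lists the agency's locations
var system = """
        You are an AI-powered assistant to help people adopt a dog
        from the adoption agency Pooch Palace, with locations in
        various cities including London and San Francisco.
        """;

// defaultSystem aligns every response to the assistant's role
this.ai = builder
        .defaultSystem(system)
        .build();
```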

Okay, great. So now what I want to do is give it access to that data, but not all of that data. Remember, we don't want to send too much, because we might overwhelm our token count. And the token count is an approximation of how much data is allowed to be sent to and received from a given model.

Okay, so we want to talk to a SQL database here. I'll just create a simple Spring Data JDBC model, sorry, representation, of the domain here. Okay. And I've got a record here that I'll create, called Dog.

It'll have an id, a String name, a String owner and a String description, okay. And we'll annotate the id. There we go. Okay. And we'll inject a DogRepository, and we're going to write out the results that we get from the repository to this vector store on program startup. So we'll say repository.findAll(), and for each dog that comes back, okay...

We're going to create a Document, which is a Spring AI Document with some text in it: the dog's name and description, formatted together. And we're going to say vectorStore.add with a list of documents. And then we're going to tell Spring AI to consult the vector store with the newly encoded data by configuring, yes, you guessed it, an advisor.
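The ingestion step and the retrieval advisor described here can be sketched as follows. This assumes a `Dog` record with `id`, `name` and `description` components and an injected VectorStore and QuestionAnswerAdvisor from Spring AI; the document format string is illustrative:

```java
import java.util.List;
import org.springframework.ai.document.Document;

// on startup: encode each dog as a Document and write it to the vector store
repository.findAll().forEach(dog -> {
    var document = new Document("id: %s, name: %s, description: %s"
            .formatted(dog.id(), dog.name(), dog.description()));
    vectorStore.add(List.of(document));
});

// then, per request, let RAG pull in only the pertinent dogs:
// this.ai.prompt()
//        .user(question)
//        .advisors(QuestionAnswerAdvisor.builder(vectorStore).build())
//        .call()
//        .content();
```

The advisor does a similarity search against the vector store and stuffs only the matching documents into the prompt, which keeps the token count down.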

There's this. And if we run the program again, it'll restart. It's going to take a little while to finish, because this time it's going to write data to a SQL table called vector_store.

Okay, the thing has finished. There you go. It's finished. It took a long time, though; notice that. So now, let's see: do you have any neurotic dogs?

There you go: we've got Prancer. So it now knows about our data. Let's make sure not to reinitialize that database, okay, so we'll just comment this out, in effect, so it doesn't rerun. So it knows about Prancer, which is good. And that means it's able to talk to our database to get the data before offering a response. What's the next thing you'd logically like to do?

Well, I can imagine wanting to adopt this dog, couldn't you? So I would want my AI model to have access to tools to help it do scheduling. So: the DogAdoptionScheduler, okay. I'm going to create a method here: schedule an adoption for a given dog ID, and we'll have a String dog name, and we'll just say, return...

Return Instant.now().plus, and I'll just say, you know, three days in the future, .toString(), okay. And we'll say System.out, scheduling adoption for that. And we're going to export this as a tool, so we describe it: schedule an appointment to adopt a dog from a Pooch Palace location.

Then a @ToolParam description for the name of the dog, and a @ToolParam description for the ID of the dog. Okay. So let's give our model access to this tool, just as before, just injecting it and using it accordingly. So we'll specify it as available as a default tool. So: scheduler. There we go.
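The tool described here might look roughly like this, using Spring AI's @Tool and @ToolParam annotations. The "three days out" return value is only demo scheduling logic, as in the talk:

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

import org.springframework.ai.tool.annotation.Tool;
import org.springframework.ai.tool.annotation.ToolParam;
import org.springframework.stereotype.Component;

@Component
class DogAdoptionScheduler {

    // the description tells the model when this tool is relevant
    @Tool(description = "schedule an appointment to adopt a dog from a Pooch Palace location")
    String scheduleAdoption(
            @ToolParam(description = "the id of the dog") int dogId,
            @ToolParam(description = "the name of the dog") String dogName) {
        System.out.println("scheduling adoption for " + dogName);
        // demo logic: appointments are always three days out
        return Instant.now().plus(3, ChronoUnit.DAYS).toString();
    }
}
```

It would then be registered on the ChatClient builder, e.g. `builder.defaultTools(scheduler)`, so the model can invoke it.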

Now we're going to ask the same question again; we should get the same response, naturally, okay. And then we want to say, let's see, Control-C, Control-R, HTTP.

Okay, that looks right. But we want to change this URL again, because I keep entering the old URL there. Okay. And so, fantastic: when can I schedule an appointment to adopt Prancer from the San Francisco location? It says May 16th; that's three days from today. We can confirm as much by going to the console here, Command-5, and confirming that indeed our scheduling logic was invoked.

Now, this is local tool calling. One thing that might be of interest to people today is how to extract this logic out into a reusable form via a protocol, and that is to say, the Model Context Protocol. The Model Context Protocol is a protocol from Anthropic, makers of Claude Desktop, that debuted in November of 2024, and the Spring team was among the very first to leap to support it.

And so we're going to make our code talk to a service via MCP, this protocol. And again, very, very useful to do. Let's go over here to the Initializr and create a server. So I'll say: scheduler. We'll say web, we'll use MCP Server, and again I'll use the snapshot here.

Okay, so let's just do 3.4.5, okay. Hit enter, copy and paste that.

Scheduler, okay. So first things first: we want this to be a snapshot, so I want to go get the snapshot dependencies here and paste them at the bottom. In the code itself, we're going to extract that business logic, that patent-pending business logic, from the assistant, and we're going to export it as an MCP service.

So let's go over here and create a single bean, a MethodToolCallbackProvider, okay. Return that, and then this. And then we'll inject the DogAdoptionScheduler, okay, there's this. And then the only thing left is to run this on port 8081. And we're going to restart this. Okay, so that's the scheduler on port 8081.
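On the server side, exporting the scheduler's @Tool methods over MCP is roughly a single bean definition (assuming Spring AI's MCP server starter is on the classpath):

```java
import org.springframework.ai.tool.method.MethodToolCallbackProvider;
import org.springframework.context.annotation.Bean;

// exposes the @Tool methods on the scheduler as MCP tools
@Bean
MethodToolCallbackProvider methodToolCallbackProvider(DogAdoptionScheduler scheduler) {
    return MethodToolCallbackProvider.builder()
            .toolObjects(scheduler)
            .build();
}
```

Plus `server.port=8081` in `application.properties` so it doesn't clash with the assistant.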

Let's rewrite this code over here to now take advantage of that newly stood-up scheduler service. You can see it's got the tool registered there. We'll go over here: a bean, an McpSyncClient. var mcp = McpClient.sync, and it'll be a new HTTP transport pointing at http://localhost:8081, okay. And there's this. So we build it, and then we call mcp.initialize().

Good. So let's restart that. We're going to use that client now, so instead of using the local tool, we're going to call that remote one, just as before. Good. And then we'll just replace this with a tool callback, okay.

Okay. And we'll get rid of that, and we'll say new SyncMcpToolCallbackProvider, there it is again, okay. All right. So we restart this on the left, and we've got that on the right. The code on the left is now calling out via this protocol to this other thing running on the other node. So we can ask the same questions.
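The client side described here can be sketched as follows. The transport and class names come from the MCP Java SDK that Spring AI builds on, so treat the exact signatures as approximate:

```java
import io.modelcontextprotocol.client.McpClient;
import io.modelcontextprotocol.client.McpSyncClient;
import io.modelcontextprotocol.client.transport.HttpClientSseClientTransport;
import org.springframework.context.annotation.Bean;

// a synchronous MCP client pointed at the remote scheduler service
@Bean
McpSyncClient mcpSyncClient() {
    var mcp = McpClient
            .sync(HttpClientSseClientTransport.builder("http://localhost:8081").build())
            .build();
    mcp.initialize(); // handshake with the server before first use
    return mcp;
}

// then, instead of the local default tool, hand the remote tools to the chat client:
// builder.defaultToolCallbacks(new SyncMcpToolCallbackProvider(mcpSyncClient));
```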

Do you have any neurotic dogs? Great. And: when can I schedule an appointment to adopt Prancer from the San Francisco location? There you go. So it works; it actually did the interchange via this protocol. So now we've got our MCP service up and running, easy; couldn't be more trivial. Obviously I want this to scale as well as anything.

So I'm going to use virtual threads to handle the blocking IO efficiently, and then, you know, taking it a step further, you can also turn this into a GraalVM native image. For example, one of these nodes could be a GraalVM native image. You can add that support by specifying GraalVM; I think I already have it over here, but we'll do the other one to be more efficient.

Okay. So just copy and paste that. Go over here to the plugins on the build side.

Paste. Then on the command line we do mvn -DskipTests -Pnative native:compile. When you do that, it'll create an optimized, ahead-of-time compiled native image that you can run at some factor smaller than the footprint of a JVM application, and it requires no JRE because it's self-contained.
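Concretely, the build and run steps look like this (assuming the GraalVM native-maven-plugin has been added to the build, and a binary name of `scheduler`, matching the project name):

```shell
# ahead-of-time compile to a native binary; requires a GraalVM toolchain
./mvnw -DskipTests -Pnative native:compile

# run the resulting self-contained executable (no JRE needed)
./target/scheduler
```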

And it starts up, you know, usually many, many times faster than a JRE-based equivalent application. So far we've been running everything on the JRE, but GraalVM is a much more optimized form factor for your programs, and for a lot of programs it'll just work out of the box. You can watch some older videos to learn how to get it done more efficiently.

I'll let that run. And I'm hoping the editors will just fast forward. Might take about 30 seconds.

Okay, it took 49 seconds: target/scheduler. So there. Oh, it's running; the port's already in use. We'll stop this. Okay. There's the application, up in 34 milliseconds. Right. Here's the existing one, and, you know, the same as before: do you have any neurotic dogs?

Fine. And then: when can I adopt them? And there we go, we got an answer, three days in the future. So with that, we've looked at how to use Spring AI to build scalable, production-worthy services that are observable. They could be secure; they could be all sorts of things. Using the convenient Spring AI project, we've demonstrated that it's as trivial as in any other language or on any other platform, if not more so, to build production-worthy AI systems with Java and Spring.

And we are in a uniquely great position to do so. Hopefully you can see how things have changed over the last few years. It's been a roller coaster since AI as we know it in the modern era burst onto the scene with ChatGPT back in, what was it, late 2022.

And you can see all these patterns that have sort of sprouted up to support it, and that are well supported in Spring AI. The next natural step, of course, would be to build an agentic system, right: to use AI to drive the flow of AI processes. And you can do that with Spring AI, too.

Naturally. Thank you for your time. And, see you next time.