On this article, you’ll discover ways to rework a primary tool-calling script right into a resilient agent that gracefully handles failures from misbehaving instruments, malformed mannequin outputs, and unavailable providers.
Subjects we’ll cowl embody:
- The best way to construction an iterative agent loop with a security cap on iteration rely.
- The 4 distinct classes of failure an agent encounters when calling instruments, and how you can deal with every one.
- The best way to design instrument error messages that educate the mannequin how you can recuperate, lowering wasted iterations.
Constructing a Multi-Software Gemma 4 Agent with Error Restoration
Introduction
In a earlier article, we wired up Gemma 4 to a handful of Python features utilizing Ollama’s tool-calling API. That gave us a working single-turn dispatcher: the mannequin picks a instrument, our code runs it, the mannequin solutions. It’s a helpful start line, however it’s a great distance from an agent.
One of many issues that turns a tool-calling demo into an precise agent is the way it handles issues going fallacious. Instruments fail. The mannequin hallucinates a perform title, or passes a string the place you wished a quantity, or asks a few metropolis your lookup desk has by no means heard of. An upstream API occasions out. A required argument is lacking. Within the earlier tutorial, any of those would both crash the script or get swallowed by a strive/besides that prints a message and provides up. That’s positive for a single path demo. It’s not positive for something you’d need to go away operating.
This text rebuilds the agent across the assumption that issues will go fallacious, and reveals how you can recuperate gracefully after they do. The sample is straightforward: catch errors on the boundary, convert them into messages the mannequin can learn, ship them again to the mannequin, and let the mannequin determine whether or not to retry, route round the issue, or clarify the failure to the consumer. We’ll additionally wrap the whole lot in a correct iterative agent loop with a security cap on iteration rely.
The full script could be discovered right here. This text walks via the elements that matter.
Rethinking the Software Loop
The unique dispatcher ran a single spherical: ship the consumer question, gather instrument calls, run them, ship the outcomes again, print the mannequin’s reply. That’s a one-shot interplay. It really works positive when the mannequin’s first response appropriately solutions the consumer’s query, however it has nowhere to go when one thing goes fallacious. If a instrument fails, the mannequin will get one probability to react after which we’re carried out. If the mannequin needs to name one other instrument after seeing the primary consequence, too dangerous; we already exited.
A correct agent loop is iterative. The construction is easy:
- Ship the present message historical past to the mannequin.
- If the mannequin produces instrument calls, execute every one, append each consequence to the historical past, and loop once more.
- If the mannequin produces a plain textual content response, that’s the ultimate reply. Return.
- Cap the loop at
MAX_ITERATIONSso a confused mannequin can’t burn via your CPU endlessly.
That final level is non-negotiable. Small fashions often get caught calling the identical instrument repeatedly, or oscillating between two instruments, and there’s nothing extra demoralizing than strolling again to your terminal to seek out your laptop computer’s followers screaming as a result of Gemma determined to search for the climate in London thirty occasions in a row.
Right here’s the loop:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 |
def run_agent(user_query): messages = [{“role”: “user”, “content”: user_query}]
for iteration in vary(1, MAX_ITERATIONS + 1): payload = { “mannequin”: MODEL_NAME, “messages”: messages, “instruments”: available_tools, “stream”: False, }
print(f“[EXECUTION — iteration {iteration}]”) print(” ● Querying mannequin…n”)
strive: response_data = call_ollama(payload) besides Exception as e: print(f” └─ [ERROR] Error calling Ollama API: {e}”) print(f” └─ Ensure Ollama is operating and {MODEL_NAME} is pulled.”) return
message = response_data.get(“message”, {}) tool_calls = message.get(“tool_calls”) or []
# Department A: the mannequin needs to make use of instruments if tool_calls: print(f“[TOOL EXECUTION — {len(tool_calls)} call(s)]”) messages.append(message) tool_messages = print_tool_calls(tool_calls) messages.prolong(tool_messages) print() proceed
# Department B: the mannequin produced a remaining reply print(“[RESPONSE]”) print(message.get(“content material”, “”) + “n”) return
# Security rail: we exhausted MAX_ITERATIONS and not using a remaining reply print(“[RESPONSE]”) print( f“Hit the {MAX_ITERATIONS}-iteration cap and not using a remaining reply. “ “This normally means the mannequin is caught in a tool-calling loop. “ “Attempt simplifying the question.n” ) |
The sample is price committing to reminiscence as a result of it reveals up in each agent framework you’ll ever learn: the message historical past is the state. For every iteration we ship your complete dialog (the unique consumer question, the mannequin’s tool-call request, our instrument outcomes, any follow-up mannequin messages) again to the mannequin. The mannequin is stateless; the listing is the agent’s reminiscence.
This iterative construction can be what makes error restoration attainable. When a instrument fails and we ship the error again as a instrument message, the mannequin will get to see that error and react to it on the following iteration. With out the loop, there’s nothing to react into.
Constructing the Software Registry
Right here we construct our 4 instruments, all deterministic, all offline. No API keys, no community calls, no flaky exterior providers to debug. The purpose of this text is the error-handling structure, not the instruments themselves, so we wish the instruments to behave predictably so we are able to give attention to the framework round them, and so we are able to intentionally set off each failure mode at will.
The instruments are:
get_weather(metropolis): seems to be up a metropolis in a small dict of canned climate informationget_local_time(metropolis): computes the true present time in that metropolis’s timezone utilizingzoneinfoconvert_currency(quantity, from_currency, to_currency): does the maths towards a hardcoded USD-anchored charge deskget_city_population(metropolis): one other lookup towards a small dict
The static information lives on the prime of the file:
|
CITY_DATA = { “london”: {“timezone”: “Europe/London”, “inhabitants”: 8_982_000}, “tokyo”: {“timezone”: “Asia/Tokyo”, “inhabitants”: 13_960_000}, “sao paulo”: {“timezone”: “America/Sao_Paulo”, “inhabitants”: 12_330_000}, “paris”: {“timezone”: “Europe/Paris”, “inhabitants”: 2_161_000}, “ny”: {“timezone”: “America/New_York”, “inhabitants”: 8_336_000}, “sydney”: {“timezone”: “Australia/Sydney”, “inhabitants”: 5_312_000}, “mumbai”: {“timezone”: “Asia/Kolkata”, “inhabitants”: 20_410_000}, }
EXCHANGE_RATES = { “USD”: 1.00, “EUR”: 0.92, “GBP”: 0.79, “JPY”: 156.40, “BRL”: 5.12, “CAD”: 1.37, “AUD”: 1.51, “INR”: 83.20, } |
The features are intentionally easy, however they increase on dangerous enter reasonably than returning error strings. Right here’s get_weather:
|
def get_weather(metropolis: str) -> str: “”“Returns present climate circumstances for a recognized metropolis.”“” key = metropolis.decrease().strip() if key not in WEATHER_DATA: increase ValueError( f“Unknown metropolis: ‘{metropolis}’. Identified cities: {‘, ‘.be a part of(sorted(WEATHER_DATA.keys()))}.” ) information = WEATHER_DATA[key] return f“The climate in {metropolis.title()} is {information[‘conditions’]} with a temperature of {information[‘temp_c’]}°C.” |
Two issues to name out about that error message. First, it’s particular: it tells the caller what went fallacious and what the legitimate choices are. Second, the instrument increases a ValueError reasonably than returning the error as a string. Don’t catch and string-format errors contained in the instrument; as an alternative, allow them to propagate. We wish the dispatcher to deal with each sort of failure in a single place, and we wish the message the mannequin sees on a foul enter to be informative sufficient that the mannequin can appropriate itself.
get_local_time does the one actual work — precise timezone-aware datetime arithmetic — and that’s additionally the instrument we’ll later use to exhibit sleek degradation towards a simulated upstream failure:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 |
def get_local_time(metropolis: str) -> str: “”“Returns the present native time for a metropolis, with a cached fallback.”“” key = metropolis.decrease().strip()
# Simulate an upstream geocoding service that will fail unpredictably if SIMULATE_GEOCODING_OUTAGE and random.random() < 0.6: if key in TIMEZONE_FALLBACK_CACHE: tz_name = TIMEZONE_FALLBACK_CACHE[key] now = datetime.datetime.now(ZoneInfo(tz_name)) return ( f“[cached] The present native time in {metropolis.title()} is “ f“{now.strftime(‘%H:%M on %A, %B %d, %Y’)} ({tz_name}). “ “Notice: geocoding service is presently unavailable; this worth is from the native cache.” ) increase ToolUnavailableError( f“Geocoding service is unavailable and ‘{metropolis}’ isn’t within the native cache. “ “Please strive once more later or use a metropolis from the cache: “ f“{‘, ‘.be a part of(sorted(TIMEZONE_FALLBACK_CACHE.keys()))}.” )
if key not in CITY_DATA: increase ValueError(f“Unknown metropolis: ‘{metropolis}’. Identified cities: {‘, ‘.be a part of(sorted(CITY_DATA.keys()))}.”) tz_name = CITY_DATA[key][“timezone”] now = datetime.datetime.now(ZoneInfo(tz_name)) return f“The present native time in {metropolis.title()} is {now.strftime(‘%H:%M on %A, %B %d, %Y’)} ({tz_name}).” That <code>SIMULATE_GEOCODING_OUTAGE</code> flag lets us reproduce a actual–world failure mode with out needing actual infrastructure to fail. We‘ll come again to it.
The instrument schemas are unchanged from the earlier tutorial’s</a> model: commonplace Ollama perform–calling format, with clear descriptions of what every instrument does and what arguments it expects.
<h2>The 4 Error Restoration Patterns</h2> Time to get severe. There are 4 distinct failure modes you‘ll encounter when an agent talks to instruments, and every one wants its personal technique. They’re dealt with in a single dispatcher perform, however it‘s price understanding them as separate ideas.
Sample 1: Software Execution ErrorsThe primary protection is the dispatcher itself. It wraps each instrument name in a structured
def dispatch_tool_call(tool_call): function_name = tool_call[“function”][“name”] arguments = tool_call[“function”][“arguments”] or {}
# Protection 1: validate the instrument title towards the registry if function_name not in TOOL_FUNCTIONS: return “error”, ( f”Unknown instrument ‘{function_name}‘. “ f”Legitimate instruments are: {‘, ‘.be a part of(TOOL_FUNCTIONS.keys())}.“ )
func = TOOL_FUNCTIONS[function_name]
# Protection 2: catch argument errors (fallacious sorts, lacking or additional args) strive: consequence = func(**arguments) return “okay“, str(consequence) besides TypeError as e: return “error“, f”Unhealthy arguments for {function_name}: {e}“ besides ValueError as e: return “error“, str(e) besides ToolUnavailableError as e: return “error“, f”Software quickly unavailable: {e}“ besides Exception as e: return “error“, f”Surprising error in {function_name}: {sort(e).__name__}: {e}“ |
The important thing perception: return the error to the mannequin as a instrument consequence as an alternative of elevating it again to the agent loop. The mannequin can learn the error, see that it requested for “Atlantis” and Atlantis isn’t a recognized metropolis, and pivot to a distinct metropolis, or apologize to the consumer. For those who increase as an alternative, you’ve stripped the mannequin of the power to recuperate.
Discover the 4 completely different exception sorts and the catch-all on the backside. Every one corresponds to an actual class of failure: area errors (ValueError), signature mismatches (TypeError), infrastructure outages (ToolUnavailableError), and the Don Rumsfeld unknown unknowns (Exception). Separating them provides you cleaner error messages, which give the mannequin higher alerts for restoration.
The catch-all is necessary and maybe controversial. Some model guides will inform you by no means to catch a naked Exception. In an agent dispatcher, the choice — letting an surprising exception kill the loop — is worse. The mannequin loses the prospect to recuperate, the consumer loses the response, and also you lose the dialog historical past you might have used to debug what occurred. Higher to catch, log, and hand the message to the mannequin.
Sample 2: Malformed Software Calls From the Mannequin
The mannequin often hallucinates a instrument title that doesn’t exist, or sends arguments underneath the fallacious keys (city as an alternative of metropolis, for instance). The primary protection within the snippet above handles the primary case: earlier than we even attempt to dispatch, we examine the title towards the registry and return a corrective message itemizing the legitimate names.
The incorrect-argument case is dealt with by the second protection. Python’s **arguments unpacking raises TypeError if the mannequin sends a key phrase the perform doesn’t settle for, or omits a required one. We catch the TypeError, format it cleanly, and the mannequin will get a helpful error on the following iteration:
|
[ERROR]: Unhealthy arguments for get_weather: get_weather() obtained an surprising key phrase argument ‘city’ |
That message incorporates the whole lot the mannequin must appropriate itself: the instrument title, the offending argument, and an implicit sign that the correct title is one thing else. In apply the mannequin normally fixes the decision on its subsequent flip.
There’s additionally a extra delicate argument-related failure: sort drift. The mannequin is aware of quantity must be a quantity, however in longer conversations it often begins sending "100" as a string. Letting convert_currency increase on that may drive an additional flip for the mannequin to appropriate itself. A greater strategy is defensive coercion within the instrument itself:
|
def convert_currency(quantity: float, from_currency: str, to_currency: str) -> str: # Defensive sort coercion: the mannequin typically sends numbers as strings strive: quantity = float(quantity) besides (TypeError, ValueError): increase ValueError(f“‘quantity’ have to be a quantity, obtained: {quantity!r}”) # … remainder of the perform |
This silently fixes the widespread case ("100" turns into 100.0) whereas nonetheless elevating a clear error for the genuinely damaged case ("fifty"). The precept: be liberal in what you settle for from the mannequin, and strict in what you complain about.
Sample 3: Area-Degree Errors
These are the errors the instrument itself raises when the inputs are well-formed however the request can’t be glad, resembling asking for the climate in Atlantis, or changing from a forex that isn’t within the charge desk. These ought to produce error messages that educate the mannequin how you can recuperate, not simply say “failed.”
Examine these two error messages:
|
Good: “Unknown metropolis: ‘Atlantis’. Identified cities: london, mumbai, ny, paris, sao paulo, sydney, tokyo.” |
The nice model provides the mannequin the whole lot it must both retry with a legitimate enter or clarify the limitation to the consumer. The dangerous model forces the mannequin to guess. Each error message within the instrument features follows this sample: say what went fallacious, and the place attainable, listing the legitimate alternate options.
This isn’t only a UX nicety. It straight impacts what number of iterations the agent loop will burn earlier than attending to an excellent reply. A obscure error can price you a full additional spherical journey whereas the mannequin gropes for a repair. A particular error normally will get corrected on the very subsequent flip or, when the enter is genuinely unrecoverable, lets the mannequin produce a clear clarification with out attempting once more in any respect.
Sample 4: Swish Degradation for Unavailable Instruments
The final sample is for the scenario the place a instrument isn’t damaged, simply gone — a geocoding service is down, an API quota is exhausted, a database is having a foul day. You will have three choices right here, roughly so as of how a lot you belief the mannequin to deal with the scenario:
- Return a cached or default worth and flag it within the consequence. Greatest when the instrument’s freshness isn’t vital.
- Skip the instrument completely and return a transparent message about what couldn’t be offered. Let the mannequin determine whether or not to retry or work round it.
- Floor the outage to the consumer by having the agent cease and ask for steerage.
get_local_time demonstrates possibility 1. When SIMULATE_GEOCODING_OUTAGE is on and the random examine journeys, the instrument first tries the native cache:
|
if SIMULATE_GEOCODING_OUTAGE and random.random() < 0.6: if key in TIMEZONE_FALLBACK_CACHE: tz_name = TIMEZONE_FALLBACK_CACHE[key] now = datetime.datetime.now(ZoneInfo(tz_name)) return ( f“[cached] The present native time in {metropolis.title()} is “ f“{now.strftime(‘%H:%M on %A, %B %d, %Y’)} ({tz_name}). “ “Notice: geocoding service is presently unavailable; this worth is from the native cache.” ) increase ToolUnavailableError( f“Geocoding service is unavailable and ‘{metropolis}’ isn’t within the native cache. “ “Please strive once more later or use a metropolis from the cache: “ f“{‘, ‘.be a part of(sorted(TIMEZONE_FALLBACK_CACHE.keys()))}.” ) |
If the town is within the cache, the instrument returns a profitable consequence tagged with [cached] and a observe explaining that the reside service is unavailable. The mannequin sees a superbly usable reply and a small caveat it may select to say to the consumer. If the town isn’t within the cache, the instrument falls via to possibility 2: it raises ToolUnavailableError with a message itemizing what is cached.
That ToolUnavailableError is deliberately a separate exception sort reasonably than a ValueError. The dispatcher provides it its personal catch arm with a definite error prefix (“Software quickly unavailable”) so the mannequin can inform the distinction between “you requested for one thing I don’t have” and “the service is down proper now.” These two failures have very completely different applicable responses — retry later versus decide a distinct enter — and giving the mannequin a transparent sign helps it decide the correct one.
In manufacturing, you’d prolong this sample with a retry-with-backoff coverage earlier than falling via to the fallback. The construction stays the identical: the dispatcher distinguishes recoverable from unrecoverable failures, and the mannequin is advised sufficient about every one to make a wise subsequent transfer.
Placing It All Collectively
Time to truly run the factor. Right here’s a question that workouts the whole lot — a number of cities, a number of instruments, and an intentional dangerous enter to set off error restoration in flight:
|
python important.py “What is the climate in London, Tokyo, and Atlantis proper now? And convert 50 GBP to JPY.” |
The precise iteration rely and tool-call ordering will fluctuate from run to run relying on how Gemma decides to sequence the work, however right here’s a consultant hint, barely trimmed:

Take a look at what occurred in iteration 3. The mannequin requested about Atlantis, the instrument raised ValueError, the dispatcher transformed it into an error message itemizing the legitimate cities, and the mannequin — on iteration 5 — folded that data right into a clear response. It didn’t retry Atlantis. It didn’t crash. It observed the failure, built-in it with the profitable outcomes, and produced a solution that acknowledged the limitation. That’s your complete payoff of the error-recovery structure in a single hint.
To see sleek degradation in motion, flip SIMULATE_GEOCODING_OUTAGE to True and run a question that asks for native time:
|
python important.py “What is the native time in London and Paris?” |
About 60% of the time you’ll see the [cached] prefix within the instrument consequence and the mannequin will point out the cached supply in its remaining response. The remainder of the time the instrument will return efficiently and the cached path gained’t set off. Both manner, the loop completes and the consumer will get a solution.
Conclusion
We constructed three issues on prime of the inspiration from the primary tutorial: an iterative agent loop with a tough iteration cap, a layered dispatcher that catches each class of instrument failure, and power features whose error messages educate the mannequin how you can recuperate. Collectively they’re the distinction between a tool-calling demo and an agent you’d truly need to go away operating unsupervised.
A number of pure subsequent steps embody:
- Persistent reminiscence throughout periods, so the agent can bear in mind what it realized about you final week
- Retry-with-backoff insurance policies for transient upstream failures
- Reincorporating the exterior APIs instead of the static lookup tables, which largely simply means accepting that timeouts and charge limits develop into a part of the traditional failure floor
The full script is on GitHub. Clone it, run it, break it intentionally to observe the restoration in motion, and incorporate the following steps above.

![Why Each Staff Wants a Content material Engineer [MozCon 2025 Speaker Series]](https://blog.aimactgrow.com/wp-content/uploads/2025/09/MozCon-25-Speaker-Profile-Cards-15-120x86.png)



![How creators and entrepreneurs are utilizing AI to hurry up & succeed [data]](https://blog.aimactgrow.com/wp-content/uploads/2025/06/Untitled20design-Apr-07-2023-08-24-35-4586-PM-120x86.png)


