The Model Combined Two Searches on Its Own
A tool is a function you expose to the model. You define what it does (via a name and description) and what arguments it accepts (via a JSON schema). The model doesn't execute the function — it reads the definition, decides whether to call it, and returns the function name and arguments. Your code runs the function and sends the result back. A tool call is one of these model-requested function executions.
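In the OpenAI-style function-calling format, a tool definition and the model's side of a call look roughly like this. A minimal sketch: the schema details and the example ID are illustrative, not the agent's actual code.

```python
# One tool definition: a name, a description, and a JSON schema for the arguments.
# The model never runs this function -- it only reads this JSON.
detail_tool = {
    "type": "function",
    "function": {
        "name": "get_candidate_detail",
        "description": "Return the full stored profile for one candidate by ID.",
        "parameters": {
            "type": "object",
            "properties": {"candidate_id": {"type": "string"}},
            "required": ["candidate_id"],
        },
    },
}

# When the model decides to use it, its response contains a tool call like:
#   function.name      -> "get_candidate_detail"
#   function.arguments -> '{"candidate_id": "c-042"}'
# Your code parses the arguments, runs the real function, and sends the result
# back as the next message in the conversation.
```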
I built a sourcing agent with four tools — keyword search, semantic search, candidate detail lookup, and a signal to flag when the candidate pool is too thin. No instructions on which tools to call or in what order. Just tool definitions and a job description.
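Wired up, that amounts to handing the model the four definitions plus the job description and nothing else. A sketch under those assumptions: the helper, the schemas, the model name, and all description wording except the keyword-search one (quoted later) are illustrative.

```python
from openai import OpenAI

client = OpenAI()

def tool(name, description, properties, required):
    """Build one OpenAI-format tool definition."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }

TOOLS = [
    tool("search_candidates_by_skills",
         "Search the candidate pool by required skills. Returns candidates whose "
         "skill list contains ALL of the specified skills. Use this for precise "
         "technical stack matching.",
         {"skills": {"type": "array", "items": {"type": "string"}}}, ["skills"]),
    tool("search_candidates_semantic",
         "Find candidates whose overall profile is similar to a free-text query. "
         "Use this when keyword search is too narrow or returns too few results.",
         {"query": {"type": "string"}}, ["query"]),
    tool("get_candidate_detail",
         "Return the full profile for one candidate. Use this only to verify "
         "ambiguous or borderline candidates.",
         {"candidate_id": {"type": "string"}}, ["candidate_id"]),
    tool("flag_sourcing_gap",
         "Signal that the candidate pool is too thin for this role.",
         {"reason": {"type": "string"}}, ["reason"]),
]

job_description = "Backend engineer: Python, PostgreSQL, AWS, Docker, 5+ years."

# The entire request: tool definitions plus the job description.
# No required order, no scripted sequence -- the model chooses.
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder for any tool-capable model
    messages=[{"role": "user", "content": job_description}],
    tools=TOOLS,
)
```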
The model made four calls autonomously:
| Step | Tool Called | Why | Result |
|---|---|---|---|
| 1 | search_candidates_by_skills | Precise stack matching (Python, PostgreSQL, AWS, Docker, 5+ yr) | 2 candidates |
| 2 | search_candidates_semantic | Keyword results were thin — broaden with semantic search | 8 candidates (overlaps + new) |
| 3 | get_candidate_detail | Verify Principal Engineer's Docker gap | Kubernetes covers Docker — credited |
| 4 | get_candidate_detail | Verify a borderline candidate's fit | Confirmed as weak match |
Final shortlist: 4 candidates sourced — two exact skill matches (0.95, 0.93), one credited via Kubernetes-Docker overlap (0.82), one borderline (0.61).
Two things worth noting: the model excluded the junior frontend developer without being told to, and credited the Principal Engineer's Kubernetes experience against the Docker requirement after verifying via detail lookup. I also tested a niche JD (Ray, Triton Inference Server, CUDA) where keyword search returned zero hits — it extended to four steps: keyword search → semantic search → re-search with relaxed parameters → flag_sourcing_gap. Same code, different strategies.
| | Who decides | Sequence | Adapts to input? |
|---|---|---|---|
| Pipeline | Code | Step 1 → 2 → 3 → ... → N | No — same every time |
| Tool use | Model | Code offers tools → Model picks → Code executes | Yes — strategy varies per input |
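In code, the pipeline row is just a hard-coded sequence. A minimal sketch, assuming helper names that are not from the project:

```python
def source_candidates_pipeline(job_description: str) -> list[dict]:
    """Fixed pipeline: the code decides the sequence -- the same one every run."""
    required = extract_required_skills(job_description)     # step 1
    exact = search_candidates_by_skills(required)            # step 2
    broad = search_candidates_semantic(job_description)      # step 3
    return rank_and_dedupe(exact + broad)                    # step 4
```

Tool use keeps the same capabilities but hands the sequencing decision to the model: the code exposes them as tools and executes whichever ones the model asks for, in whatever order it asks.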
The Agentic Loop
The agentic loop is the mechanism that makes tool use work:
One detail the diagram doesn't show: the loop needs a `MAX_TOOL_CALLS` safety limit.¹ Without it, the model can loop indefinitely if it never converges on a final answer.
Tool Descriptions Are Decision-Making Instructions
This was the most important finding. The description field in a tool definition isn't documentation for humans — it's an instruction the model reads when deciding which tool to call and when to stop calling. Every word in the description shapes the model's strategy the same way a system prompt shapes a chatbot's behavior, except here it controls tool selection instead of conversation tone.
```python
# Bad: model doesn't know when to use this vs. semantic search
"description": "Search candidates"

# Good: model knows the precise use case and when to prefer alternatives
"description": (
    "Search the candidate pool by required skills. Returns candidates "
    "whose skill list contains ALL of the specified skills. Use this "
    "for precise technical stack matching."
)
```

I assumed the phrase "Use this for precise technical stack matching" was what caused the model to call keyword search first. So I tested it — swapped to vague descriptions ("Search candidates", "Find candidates using AI") and ran the agent 3 times. The model still called keyword search first every time. Routing order didn't change. What changed was stopping behavior:
| | Specific descriptions | Vague descriptions |
|---|---|---|
| Routing order | Keyword first | Keyword first (same) |
| Detail lookups | 2.0 avg | 5.3 avg (2.6x more) |
| Total tool calls | 4.3 avg | 8.0 avg (1.9x more) |
The model didn't know when to stop examining candidates because the vague descriptions gave no implicit stopping guidance. It verified every candidate it found — even clearly strong matches.
Where Tool Use Breaks Down
Over-calling tools. Without stopping criteria, the model verified every candidate it found. Adding one sentence to the system prompt made a measurable difference:
| | Without stopping criteria | With stopping criteria |
|---|---|---|
| Detail lookups | 6.3 avg | 3.0 avg (52% fewer) |
| Total tool calls | 8.3 avg | 4.7 avg (43% fewer) |
The sentence: "Use get_candidate_detail ONLY for ambiguous borderline candidates. Stop once you have 3-5 confident candidates." Each unnecessary call adds latency and cost — the difference between a 30-second sourcing step and a minute-plus one. The token overhead compounds it: tool definitions are sent on every iteration of the agentic loop, not once per request.
| Tools sent | Extra input tokens | At 10k requests/day (4 iterations each) |
|---|---|---|
| 1 tool | ~637 | ~25M tokens/day |
| 3 tools | ~900 | ~36M tokens/day |
| 4 tools | ~1,200 | ~48M tokens/day |
The first tool is expensive (~637 tokens) because the API injects scaffolding. Each additional tool adds ~240–290 tokens. A token/cost budget per request — short-circuiting the loop when exhausted and returning the best result so far — is the right fix.
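A sketch of what that budget check could look like inside the agentic loop, assuming the footnote's loop structure and the OpenAI usage fields; the budget value, model name, and helpers (`parse_result`, `finalize_with_current_evidence`, `execute_and_append`) are illustrative:

```python
TOKEN_BUDGET = 20_000   # illustrative per-request budget; tune for your workload
MAX_TOOL_CALLS = 8

def run_agent(messages: list, tools: list):
    tokens_used = 0
    for _ in range(MAX_TOOL_CALLS):
        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder model
            messages=messages,
            tools=tools,
            temperature=0,
        )
        tokens_used += response.usage.total_tokens   # prompt + completion this round
        msg = response.choices[0].message

        if not msg.tool_calls:
            return parse_result(msg.content)          # model finished on its own

        if tokens_used > TOKEN_BUDGET:
            # Budget exhausted: short-circuit instead of paying for another round
            # of tool definitions, and return the best result assembled so far.
            return finalize_with_current_evidence(messages)

        execute_and_append(msg, messages)             # run tools, append results
    return finalize_with_current_evidence(messages)
```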
No fallback on failure. If all tools return empty results, the model has no deterministic path to fall back on. In the fixed pipeline, sourcing always produces output — even if it's bad output. Tool use can produce nothing, which means downstream agents have nothing to work with. The fix: if the tool-use agent returns empty results or exceeds its cost budget, fall back to the fixed pipeline. The pipeline is less adaptive but always produces output — a reliable floor beneath the flexible ceiling.
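The floor itself is a thin wrapper: try the tool-use agent first, and if it comes back empty (or blew its budget without producing anything usable), run the fixed pipeline sketched earlier. A sketch with assumed names:

```python
def source_candidates(job_description: str) -> list[dict]:
    shortlist = run_agent(build_messages(job_description), TOOLS)

    if shortlist:                       # the agent converged within its budget
        return shortlist

    # Fallback: the fixed pipeline is less adaptive, but it always returns
    # something downstream agents can work with.
    return source_candidates_pipeline(job_description)
```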
Takeaways
- Write tool descriptions like you're briefing a colleague — specific about what the tool does, when to use it, and when to prefer alternatives. Vagueness doesn't break routing, it breaks stopping.
- Use `temperature=0` for tool selection — it stabilizes tool order but not tool count. For fully deterministic behavior, combine it with explicit stopping criteria in the system prompt.
- Keep fixed pipelines for steps that always run the same way. Tool use shines when the optimal sequence genuinely varies by input. The practical pattern is mixing both — fixed pipeline for the overall flow, tool use for the adaptive steps.
Footnotes
1. The agentic loop in code. The loop sends messages to the model, checks for tool calls, executes them, and repeats until the model returns a final answer. `MAX_TOOL_CALLS` caps the iterations to prevent runaway loops:

    ```python
    messages = [system_msg, user_msg]

    for _ in range(MAX_TOOL_CALLS):
        response = client.chat.completions.create(
            messages=messages,
            tools=TOOLS,
            temperature=0,
        )
        msg = response.choices[0].message

        if not msg.tool_calls:
            return parse_result(msg.content)  # done

        messages.append(msg)
        for tc in msg.tool_calls:
            result = execute_tool(tc.function.name, json.loads(tc.function.arguments))
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": f"Tool: {tc.function.name}\nResult:\n{result}",
            })
    ```

    Each iteration appends the model's response and tool results to `messages`, so the model sees the full conversation history — including its own prior tool calls and their results — on every round-trip. ↩