An introduction to AI function calls (via Cloudflare Workers)

6 Dec 2024 · ai · cloudflare · workers

In this guide I'll introduce you to AI function calls, which are a means of integrating generative AI into wider logic flows.

AI function calls are supported by multiple AI providers such as OpenAI and Cloudflare, but it's the latter we'll be using in this article, in the context of Cloudflare Workers, Cloudflare's compute platform.

The choice of Cloudflare isn't coincidental; it turns out that Cloudflare has an ace up its sleeve in the arena of AI function calls, something called "embedded" function calls, as we'll see later. For now, let's start with industry-standard, "traditional" AI function calls.

The problem 🔗

If you've ever used generative AI (e.g. GPT or other LLMs) you might at some point have wondered, "this is all great, but how do I get a specific response in a specific format?"

Suppose you need your LLM to give you JSON. You probably tried to ensure this via prompt engineering, e.g.

From the words in the input, extract the commonest three themes and return them as a JSON array, each theme an object with a property, "theme".

This is problematic simply because LLMs aren't generally in the business of obeying your commands 100% of the time (assuming they even fully understand the instruction you gave). Depending on the model and the instruction, you might get what you want some of the time, but sometimes the AI will simply disobey.

Sometimes, even though you explicitly tell it not to, the LLM will insist on prefixing your JSON with a line of conversational text, like "Sure, here's your JSON!", which you'd then have to parse out of the response.
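
To make the brittleness concrete, here's a rough sketch (not part of our project, just for illustration) of the kind of defensive parsing you end up writing when the model won't reliably return bare JSON:

const extractJson = (llmOutput) => {
    //strip any conversational preamble by grabbing the first [...] block
    const match = llmOutput.match(/\[[\s\S]*\]/);
    if (!match) return null;
    try {
        return JSON.parse(match[0]);
    } catch {
        return null; //the "JSON" wasn't valid after all
    }
};

//e.g. extractJson('Sure, here\'s your JSON! [{"theme": "hope"}]') => [{ theme: 'hope' }]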

Meet AI function calls 🔗

AI functions are the solution to that problem. They harness generative AI not to produce conversational responses but to produce responses of a type and content that you specify (actually, demand), such that your upstream code can rely on them and use them.

Commonly, this upstream code talks to an external API, but needs info from the LLM before it can do so. As such, it depends on receiving certain arguments. AI functions allow you to specify what those arguments are, and how they should be derived.

AI function calls are supported only by certain models - you can't just use any LLM. With Cloudflare this (currently) means, for example, its hermes-2-pro-mistral-7b model.

Set up a Worker 🔗

Let's get cracking. First, create a Cloudflare Worker, which we'll call "ai-funcs-tut". Somewhere on your machine, run the following to trigger the Worker bootstrapper:

npm create cloudflare@latest -- ai-funcs-tut

This will guide you through a series of questions. Answer as follows:

  • Start with = Hello World example
  • Template = Hello World Worker
  • Language = JavaScript
  • Version control = no
  • Deploy = no

Now enter the newly-created directory.

cd ai-funcs-tut

Next up, let's install a few dependencies. We'll install a router, Itty Router, to handle our app's endpoints, and also Cloudflare's AI utils package, which we'll need later on.

npm i itty-router
npm i @cloudflare/ai-utils

The Worker setup process will have created a config file for your worker, wrangler.toml. Open it up and find the commented-out AI binding that looks like the following:

# [ai]
# binding = "AI"

Uncomment it (remove # from the start of each line). This will give our Worker access to Cloudflare's AI suite. Then save and close.
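
Once uncommented, that section of wrangler.toml should read:

[ai]
binding = "AI"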

Let's run our Worker! Run:

npm run dev

This will launch our Worker on http://localhost:8787. Head there, and you'll see the "Hello World!" response.

Set up the AI function call 🔗

For this example we're going to make a traditional AI function call that identifies a fruit denoted by a cryptic clue passed in by a user, and returns its name and shape, such that we can pass these to a function to output the fruit info.

Let's open up our Worker's src/index.js file and replace its entire contents with the following:

//prep
import { AutoRouter } from 'itty-router'
import { runWithTools } from '@cloudflare/ai-utils'
const model = '@hf/nousresearch/hermes-2-pro-mistral-7b'

//upstream function
const fruitInfo = ({name, shape}) => `The fruit, ${name}, is ${shape}!`

//create router
const router = AutoRouter()

//route
//TODO

//export router
export default router

As well as bootstrapping our router, we also declare two things of note:

  • model, which stores the LLM we'll use. As mentioned earlier, AI functions work only with supported models, and with Cloudflare that means the hermes-2-pro-mistral-7b model.
  • fruitInfo, our upstream function. We'll pass this the info derived from AI.

Let's set up our route first. It'll accept a query string parameter, hint, into which we'll pass the cryptic clue for a fruit. Swap out the //TODO under our route with the following.

router.get('/fruitInfo', async (req, env) => {
    const hint = req.query.hint;
    if (!hint) return 'No incoming fruit hint...';
    const response = await env.AI.run(model, {
        messages: [{
            role: 'system',
            content: `
                The user message is a cryptic clue about a fruit.
                Extract the name and shape of the fruit.
            `
        }, {
            role: 'user',
            content: req.query.hint
        }],
        tools: [{
            name: 'fruitInfo',
            description: 'Extract the fruit name and shape',
            parameters: {
                type: 'object',
                properties: {
                    name: {
                        type: 'string',
                        description: 'The name of the fruit'
                    },
                    shape: {
                        type: 'string',
                        description: 'The shape of the fruit'
                    }
                },
                required: ['name', 'shape']
            }
        }]
    });
    return response;
})

What's happening there is we're building the config of our AI call. Most of this will be familiar if you've used generative AI, in that we pass in an array of messages - system messages, user messages, or both. The new part is tools, an array of tool definitions we want to harness.

We're not yet using our function, fruitInfo - we'll plumb that in shortly - for now we're just returning the AI response so we can see the structure of what we get back.

In the browser, visit http://localhost:8787/fruitInfo?hint=garden of eden. You should see output like this:

{ "response": null, "tool_calls": [{ "arguments": { "name": "apple", "shape": "round" }, "name": "fruitInfo" } ]}

As you can see, AI functions (Cloudflare or otherwise) return a tool_calls property, containing an array of objects, one per tool call we specified. It's this object that (hopefully) contains the arguments we asked for. We hinted at an apple, with our "garden of eden" clue, and it did the business.

Let's finish up by replacing our return response; line with a call to our function, passing along the arguments.

if (!response.tool_calls[0]) return 'Unknown fruit...';
return fruitInfo(response.tool_calls[0].arguments);

Refresh the browser, and you should see:

"The fruit, apple, is round!"

Embedded function calling 🔗

I mentioned up top that Cloudflare has an ace up its sleeve when it comes to AI function calls, and that ace is called embedded function calls.

The idea here is that, because Cloudflare is a compute platform and not just an AI provider, we can send our function along with the call to the LLM, and Cloudflare will execute the function for us, right next to the model - sparing us a round trip and cutting latency.

Compute and latency aren't the only differences with embedded function calls - there's one other major one which we'll come to shortly, but first, let's look at an example.

Duplicate the route we set up above and change the route path to /fruitInfo2. Then make the following changes to the route's callback.

First, change:

const response = await env.AI.run(model, {

To:

return await runWithTools(env.AI, model, {

Then, hook our function onto the object in the tools array:

tools: [{
    ...
    function: fruitInfo
}]

Lastly, remove the last two lines, the ones beginning with if and return.
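
Putting those edits together, the new route should look roughly like this (same messages and tool definition as before, repeated in full for reference):

router.get('/fruitInfo2', async (req, env) => {
    const hint = req.query.hint;
    if (!hint) return 'No incoming fruit hint...';
    return await runWithTools(env.AI, model, {
        messages: [{
            role: 'system',
            content: `
                The user message is a cryptic clue about a fruit.
                Extract the name and shape of the fruit.
            `
        }, {
            role: 'user',
            content: req.query.hint
        }],
        tools: [{
            name: 'fruitInfo',
            description: 'Extract the fruit name and shape',
            parameters: {
                type: 'object',
                properties: {
                    name: { type: 'string', description: 'The name of the fruit' },
                    shape: { type: 'string', description: 'The shape of the fruit' }
                },
                required: ['name', 'shape']
            },
            //embedded function calling: Cloudflare runs this function for us
            function: fruitInfo
        }]
    });
})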

Now point your browser at http://localhost:8787/fruitInfo2?hint=garden of eden and you should see output, but it won't be your function's return value like last time; it'll be something the LLM has generated.

"The fruit mentioned in the cryptic clue is an "apple," and its shape is "round.""

Why? This points to a major caveat with embedded function calls.

Important caveat / which to use? 🔗

Why did we get conversational output from the LLM, rather than our function's return value, with the embedded function call example above? Because - and this is a key difference between traditional and embedded function calls - your function's return value is fed back into the LLM.

Essentially, the function's return value is appended to message history, messages. This differs from traditional function calling, where the AI is used only to derive the arguments to pass to our function. We then call the function ourselves, by which time AI has been and gone.
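
A quick way to picture the difference in flow:

//traditional: model derives arguments → we call fruitInfo() ourselves → we return its value
//embedded:    model derives arguments → Cloudflare calls fruitInfo() → its return value goes
//             back into messages → the model's final text is what our route returns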

This caveat is an important factor in choosing which type of AI function you ultimately use:

  • If you want your function to return something, e.g. to output, use traditional functions
  • If you want your function to merely do something, e.g. call an API, use embedded functions

Suppose that, instead of our route simply outputting info about our fruit, we wanted instead to send the fruit name off to some imaginary fruit API, to retrieve further data about it (export value, main cultivating countries, etc.), and output that data. That would be a job for a traditional AI function call.

If, in contrast, we didn't need the API response, or we wanted to return something the LLM said about the API response, that's a good fit for an embedded function call.
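
As a sketch of that second case, here's roughly what an embedded tool calling out to an imaginary fruit API might look like (the URL and response shape are made up for illustration); we'd attach it to the tool definition via function: fetchFruitData, just as we attached fruitInfo above:

//hypothetical embedded tool - fetches extra data about the fruit, then lets the LLM phrase the answer
const fetchFruitData = async ({ name }) => {
    const res = await fetch(`https://fruit-api.example.com/fruits/${name}`);
    const data = await res.json();
    //whatever we return here is fed back into the LLM, not to our route
    return JSON.stringify(data);
};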

I tend to stick with traditional AI functions since they give more visibility and control, which seems a fair price for the slightly higher latency.

---

There's more to AI functions - you can chain them, for one thing, and Cloudflare also provides a way to quickly spin up tools based on an OpenAPI spec - but that'll do for now. I hope you found this useful.

Did I help you? Feel free to be amazing and buy me a coffee on Ko-fi!