Most of the really valuable things that automation products (ChatGPT included) do come down to letting language models use tools. For instance, you can currently ask ChatGPT to let you know, once per week, what new restaurants have opened in your area. You could build an automation to do that... but why would you, when you can just "ask" ChatGPT? ChatGPT can do this because it has access to a bunch of external APIs that it can call whenever it deems necessary.
We can already achieve something similar to tool use in Runchat using multiple nodes. For instance, to build the recommendation app described above in a language-model-first fashion, we could have a prompt node take a user request ("find me new restaurants in my area") and format it as a request to some third-party API (Yelp? TripAdvisor? Google Search?). We then make the request and pass the output to another prompt node to format it as a pretty response in Markdown. If we run this app on a schedule then voila, we have the same functionality. But this has several UX downsides compared to "just asking ChatGPT":
We need to know which APIs to use
We need to know how to format the inputs to those APIs
We need to hard-code our prompt to explicitly produce the data structures these APIs expect (roughly as in the sketch below)
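To make that last point concrete, here is roughly what the hard-coding looks like. The endpoint, parameters and prompt wording are all hypothetical; the point is simply that the request structure is fixed up front rather than chosen by the model.

```ts
// Hypothetical sketch: the prompt node is forced to emit a fixed structure
// that matches whichever search API we picked ahead of time.
const systemPrompt = `
Turn the user's request into JSON of the form:
{ "query": string, "location": string, "openedAfter": "YYYY-MM-DD" }
Respond with JSON only.`;

// The API node then has to be wired to that exact shape.
async function searchRestaurants(args: { query: string; location: string; openedAfter: string }) {
  const url = new URL("https://api.example-search.com/v1/places"); // placeholder endpoint
  url.searchParams.set("q", args.query);
  url.searchParams.set("near", args.location);
  url.searchParams.set("opened_after", args.openedAfter);
  const res = await fetch(url);
  return res.json();
}
```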
It would be cool if prompt nodes in Runchat could decide for themselves when they need to call external tools to respond to a user question, and this is exactly what the Function Calling spec for large language models enables. Normally, with function calling, the function itself needs to be defined somewhere in code. The nice thing about Runchat is that we should be able to define the function as a regular old runchat, and just teach the language model how to run it via the headless API. What is even nicer is that any other automation platform could use Runchat to build tools, and then define them as functions within their own language model integrations. In other words - Runchat is the tool-building platform. These tools can be used by other runchats, or by other agents on the internet.
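The function definition itself is just a JSON schema describing the function's name, purpose and parameters. A minimal sketch of how a Runchat's existing metadata could map onto an OpenAI-style tool definition (the Runchat-side field names here are assumptions, not the actual schema):

```ts
// Hypothetical Runchat metadata; the real schema may differ.
interface RunchatDefinition {
  id: string;
  name: string;
  description: string;
  inputs: { name: string; type: "string" | "number" | "boolean"; description: string }[];
}

// Convert a Runchat into an OpenAI-style function/tool definition.
function runchatToTool(rc: RunchatDefinition) {
  return {
    type: "function" as const,
    function: {
      name: rc.name,
      description: rc.description,
      parameters: {
        type: "object",
        properties: Object.fromEntries(
          rc.inputs.map((i) => [i.name, { type: i.type, description: i.description }])
        ),
        required: rc.inputs.map((i) => i.name),
      },
    },
  };
}
```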
Runchat has the advantage of giving users full control over exactly how these tools work, rather than them being black boxes, and of making it easy to add more if they want to. Users can easily copy and edit existing tools to customize them for their needs. For instance, we might have a Search tool that users could then tweak to search only for academic papers, or only for PDFs, or only for content on a specific website, and so on. Creating good tools quickly and easily and letting language models decide how to use them could be really powerful. Especially because in Runchat, calling a tool could lead to automatically calling other tools: every Runchat with a prompt node that also has access to tools can continue to delegate tasks for as long as is required. We do need to watch out for the potential for infinite loops, but the concept is exciting.
Adding Function Calling to Runchat should be pretty easy since we already have most of the building blocks:
Runchats already define their own APIs, names, descriptions and input types
Runchats keep track of their references to other Runchats, making it easy to detect and avoid infinite loops
We can call the headless API to run a Runchat from within a prompt calculation (sketched below)
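A minimal sketch of that last building block, assuming a hypothetical headless endpoint and payload shape; the real API will differ:

```ts
// Hypothetical headless call: run a Runchat by id with the arguments the
// model produced, and wait for its outputs. Endpoint and shapes are assumptions.
async function runRunchat(runchatId: string, args: Record<string, unknown>, apiKey: string) {
  const res = await fetch(`https://runchat.app/api/v1/runchats/${runchatId}/run`, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify({ inputs: args }),
  });
  if (!res.ok) throw new Error(`Runchat ${runchatId} failed: ${res.status}`);
  return res.json();
}
```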
The UI Implementation
We already have a spec that allows any Runchat to become a tool that can be called by the Function Calling API in the prompt node. We also already have a pattern that allows users to “install” collections of frequently used Runchats (called Libraries) and place them on the canvas from the search menu. Installing libraries seems like a good way of avoiding the needle-in-a-haystack problem of finding a Runchat to perform a particular task, and we can use installed libraries as the starting point for the list of tools to make available to our prompt node.
We want to keep the prompt node pretty space-efficient, so we don’t want to show a big menu of libraries. Instead, we show a search bar that filters installed libraries and displays them as a condensed list of buttons. Clicking a button installs the tool; installed tools are added to the space above the search bar, where clicking them uninstalls them. There is still a lot to do here in terms of making it obvious to a user what this feature does (for now there are very few clues), but it’s a good starting point for testing.
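A rough sketch of that interaction, assuming a simple in-memory list of installed tools (names and shapes are illustrative):

```ts
// Minimal state sketch for the tool picker inside the prompt node.
interface ToolRef { id: string; name: string }

let installedTools: ToolRef[] = [];

// The search bar filters the user's installed libraries by name.
function filterLibraries(libraries: ToolRef[], query: string): ToolRef[] {
  return libraries.filter((l) => l.name.toLowerCase().includes(query.toLowerCase()));
}

// Clicking a search result installs the tool; clicking an installed tool removes it.
function toggleTool(tool: ToolRef) {
  installedTools = installedTools.some((t) => t.id === tool.id)
    ? installedTools.filter((t) => t.id !== tool.id)
    : [...installedTools, tool];
}
```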
Calling Tools
The prompt node is pretty simple in its implementation: it makes a request to a language model and displays the content part of the result. Function Calling adds quite a bit of complexity to this process, as we need to:
Check if the model wants to call one or more functions
Run the corresponding Runchat(s) via the headless API and await the responses
Make a second request to the language model, providing the function responses (sketched in code below)
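A minimal sketch of the whole loop, assuming an OpenAI-compatible chat API and the hypothetical runRunchat helper from earlier. Tool calls run in parallel, so the latency they add is bounded by the slowest one:

```ts
import OpenAI from "openai";

// From the earlier headless API sketch (hypothetical).
declare function runRunchat(id: string, args: Record<string, unknown>, apiKey: string): Promise<unknown>;

const client = new OpenAI();

async function promptWithTools(userMessage: string, tools: any[], apiKey: string) {
  const messages: any[] = [{ role: "user", content: userMessage }];

  // First request: let the model decide whether it needs any tools.
  const first = await client.chat.completions.create({ model: "gpt-4o-mini", messages, tools });
  const reply = first.choices[0].message;
  if (!reply.tool_calls?.length) return reply.content;

  // Run every requested Runchat in parallel via the headless API.
  // Here we assume the tool name maps back to a Runchat id.
  messages.push(reply);
  const toolResults = await Promise.all(
    reply.tool_calls.map(async (call) => ({
      role: "tool" as const,
      tool_call_id: call.id,
      content: JSON.stringify(
        await runRunchat(call.function.name, JSON.parse(call.function.arguments), apiKey)
      ),
    }))
  );
  messages.push(...toolResults);

  // Second request: give the model the tool outputs so it can answer.
  const second = await client.chat.completions.create({ model: "gpt-4o-mini", messages });
  return second.choices[0].message.content;
}
```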
This process adds a lot of latency to the prompt request. At a minimum, prompts that use tools will take roughly twice as long, since we need to make two language model requests. Then we also need to factor in the time it takes to actually call the tools. We can make the tool calls themselves in parallel, so the latency they add is whatever the longest response time is.
A better user experience would be to stream responses to the user and display these intermediate results. However, Runchat doesn’t currently support intermediate responses from nodes, so this would be a lot of overhead to integrate.
Another idea would be to check if the prompt node calls any functions, and then simply add these runchat nodes to the canvas and run them. The advantage of this approach is that it would help with discoverability for users developing their own runchats, but the disadvantage is that this pattern would be difficult to implement in the headless API.
As an intermediate step we could have the prompt node output the object that it wants to use to call the function and display this to the user. The user could then go and create this node themselves, leading to an OK UX for building workflows with LLM guidance. Node outputs look a bit like this:
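In essence it is the model's tool call: a function name plus an args object. The exact field names and values below are illustrative:

```ts
// Illustrative shape of a prompt node's function-call output.
const nodeOutput = {
  name: "search_restaurants",        // which Runchat/tool the model wants to run
  args: {                            // arguments the model generated
    query: "new restaurant openings",
    location: "my neighbourhood",
    openedAfter: "2024-05-01",
  },
};
```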
And, if we want to actually plug this into a runchat node, we need to grab the args object and create parameters for each of its values:
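Something like the following sketch, where the parameter shape a Runchat node expects is an assumption:

```ts
// Turn the model's args object into a list of named parameters for a Runchat node.
function argsToParameters(args: Record<string, unknown>) {
  return Object.entries(args).map(([name, value]) => ({ name, value }));
}

argsToParameters({ query: "new restaurant openings", location: "my neighbourhood" });
// → [ { name: "query", value: "new restaurant openings" },
//     { name: "location", value: "my neighbourhood" } ]
```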
Not too bad. Because we always expect function arguments in the input object, we can make a Runchat node that performs this parameter-grabbing operation for us and avoids the need for users to write code. Sometimes the models will split the task up into several function requests, and we want to make sure we handle this and pass lists to Runchat input parameters if required (see the sketch below). Interestingly, the Llama models have more of a tendency to create verbose lists than Gemini.
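A sketch of handling that case, merging the args from several tool calls into per-parameter lists (shapes are assumptions):

```ts
// When the model emits several calls to the same tool, collect each argument
// into a list so it can feed a Runchat input parameter that expects one.
function mergeToolCallArgs(calls: { name: string; args: Record<string, unknown> }[]) {
  const merged: Record<string, unknown[]> = {};
  for (const call of calls) {
    for (const [key, value] of Object.entries(call.args)) {
      (merged[key] ??= []).push(value);
    }
  }
  return merged;
}

// e.g. two search calls become { query: ["restaurants", "cafes"], location: [...] }
```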
Hang on, this could be a Runchat Copilot!
One of the issues we have with Runchat is that non-developers can find some of the nodes difficult to use because of the nomenclature. What is “format” exactly? What is an API method? etc. If we also passed our four building-block nodes (Prompt, API, Code and Image) as tools, then we could have the prompt node format API requests for us, or generate images, or even write code directly. This seems like a nice value add, so let’s implement that and see how it feels:
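Conceptually this just means exposing the building-block nodes as tools alongside installed Runchats. A sketch for the API node, with made-up names and descriptions; the other three would follow the same pattern:

```ts
// Hypothetical tool definition for the API building-block node.
const apiNodeTool = {
  type: "function" as const,
  function: {
    name: "api_node",
    description: "Make an HTTP request and return the JSON response.",
    parameters: {
      type: "object",
      properties: {
        url: { type: "string", description: "Full request URL" },
        method: { type: "string", enum: ["GET", "POST", "PUT", "DELETE"] },
        headers: { type: "object", description: "Request headers" },
        body: { type: "string", description: "JSON-encoded request body" },
      },
      required: ["url", "method"],
    },
  },
};
```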
This works pretty well for basic requests. It would be just as easy to wrap the API node in some other Runchat that provides more specific notes on how to use it, hard-codes variables (like API keys) into the API node rather than exposing them as parameters, and so on; but as a starting point to help users work with nodes this is useful. For example, we can now head to fal.ai, copy a cURL command and connect everything straight through to the API node.
Does Function Calling solve new problems?
Not yet. There are several reasons for this:
Using functions / tools is still unreliable, especially when using multiple tools
Orchestrating several tool calls still requires a lot of planning
Creating inputs to nodes is a band-aid solution as a copilot: we really want Runchat to simply create these nodes for us.
These create a few tantalizing opportunities for design and development for the coming weeks.