Introducing EdgeRunner Command: A 7B Function Calling Model for Air-Gapped Workflow Execution

We’re excited to announce the release of EdgeRunner Command, a state-of-the-art 7B-parameter language model designed specifically for function calling. Initialized from EdgeRunner-Tactical-7B, EdgeRunner Command offers performance comparable to much larger models while maintaining efficiency and speed at the edge.

What We’re Solving For:

Out of the box, LLMs are limited to the information contained in their training data. With function calling, however, these models can perform tasks that extend beyond that static knowledge. This advancement lets language models interact with external systems via API calls, greatly enhancing their utility. Function calling allows models to invoke and execute predefined functions, streamlining workflows and significantly expanding their range of applications.

Some examples of predefined functions:

  • API calls (both internal and external)
  • Database operations and real-time data access, augmenting the model’s static knowledge
  • Long-term context: storing and retrieving relevant information to maintain continuity across responses
  • Specialized execution of predefined tasks (functions)
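Predefined functions are typically exposed to the model as JSON-schema documents describing the function’s name, purpose, and parameters. Below is a hypothetical example in that common style; the field names and format are illustrative, not the exact schema EdgeRunner Command was trained on.

```python
# Hypothetical function document in the JSON-schema style widely used for
# function calling. All names here are illustrative assumptions.
get_weather_schema = {
    "name": "get_current_weather",
    "description": "Fetch the current weather for a city via an external API.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Austin'"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}
```

Given a set of such documents, the model’s job is to decide which function (if any) to call and with what arguments.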

How This Works:

  1. Task Identification: The LLM detects when a task requires using an external API or executing specific code.
  2. Function Selection: It identifies the appropriate function to execute from a predefined set of available functions.
  3. Parameter Assignment: The LLM determines and assigns the correct parameters needed for the chosen function.
  4. Interaction and Execution: It communicates with the external tool or API, passing the parameters and retrieving the results of the function call.
  5. Output Generation: Finally, the LLM produces structured output that developers can integrate with other parts of their system, all driven by natural language prompts.
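The five steps above can be sketched as a simple host-side loop: the model emits a function call as JSON, the application looks up and executes the matching function, and the result is returned in a structured form. This is a minimal illustration; the function names and JSON shape are assumptions, not EdgeRunner Command’s exact output format.

```python
import json

def get_current_weather(city: str, unit: str = "celsius") -> dict:
    # Placeholder for a real external API call (step 4).
    return {"city": city, "temp": 22, "unit": unit}

# The predefined set of available functions (step 2 selects from these).
AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}

def execute_tool_call(model_output: str) -> str:
    call = json.loads(model_output)            # steps 1-2: the model has identified the task and function
    fn = AVAILABLE_FUNCTIONS[call["name"]]     # step 2: function selection
    result = fn(**call["arguments"])           # steps 3-4: parameters assigned, function executed
    return json.dumps(result)                  # step 5: structured output for downstream use

print(execute_tool_call('{"name": "get_current_weather", "arguments": {"city": "Austin"}}'))
# → {"city": "Austin", "temp": 22, "unit": "celsius"}
```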

Highlights:

  • Advanced Function Calling: Excels at interpreting, executing, and chaining function calls, such as image generation, browser integration, and external APIs.
  • Dual-Mode Functionality:
    • Tool Router: Can serve as intelligent middleware for request analysis and routing.
    • Standalone Chat Agent: Capable of human-like conversations and independent task completion.
  • Efficient Performance: Delivers rapid response times, suitable for real-time applications and resource-constrained environments at the edge.
  • General Helpfulness: Achieves strong scores on popular benchmarks, including Arena Hard and MT-Bench.
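The dual-mode behavior implies a simple host-side dispatch: if the model’s reply parses as a function call, route it through tool middleware; otherwise, treat it as a standalone chat answer. The sketch below assumes a JSON function-call format, which is an illustrative convention rather than EdgeRunner Command’s documented output schema.

```python
import json

def route(model_reply: str) -> tuple[str, str]:
    """Return ("tool", function_name) for a function call, else ("chat", reply)."""
    try:
        call = json.loads(model_reply)
        if isinstance(call, dict) and "name" in call:
            return ("tool", call["name"])
    except json.JSONDecodeError:
        pass  # Not JSON: fall through to chat mode.
    return ("chat", model_reply)

assert route('{"name": "search_web", "arguments": {"q": "edge AI"}}') == ("tool", "search_web")
assert route("Sure, here is a summary of the document.") == ("chat", "Sure, here is a summary of the document.")
```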

We’re releasing EdgeRunner Command under an Apache 2.0 license. It is free for both research and commercial use and can be found on our Hugging Face page here.

Training Method and Data:

For training, we gathered, synthesized, and filtered data to compile approximately 200,000 samples. Our function-calling dataset is organized into several key categories following the Berkeley Function-Calling Leaderboard.

  • Simple Function: Focuses on evaluating a single function call from one JSON function document.
  • Multiple Function: Involves selecting the most appropriate function to invoke from 2 to 4 JSON documents.
  • Parallel Function: Requires handling multiple function calls in parallel based on a single user query.
  • Parallel Multiple Function: Combines both parallel and multiple function scenarios, adding complexity.
  • Function Relevance: Evaluates the ability to recognize when no function call is necessary, preventing unnecessary or hallucinated calls.
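To make the categories concrete, here is the rough shape of one "Multiple Function" training sample: the model must pick the correct function out of several candidate documents. The field names and structure are assumptions for illustration, not the exact dataset format.

```python
# Illustrative "Multiple Function" sample: 2-4 candidate function documents,
# one correct call. Field names are hypothetical.
sample = {
    "query": "Convert 100 USD to EUR.",
    "functions": [
        {"name": "convert_currency", "parameters": {"type": "object"}},          # full schemas elided
        {"name": "get_stock_price", "parameters": {"type": "object"}},
        {"name": "get_exchange_rate_history", "parameters": {"type": "object"}},
    ],
    "answer": {
        "name": "convert_currency",
        "arguments": {"amount": 100, "from": "USD", "to": "EUR"},
    },
}
```

A "Parallel Function" sample would instead pair one query with several expected calls, and a "Function Relevance" sample would pair a query with no valid call at all.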

We fine-tuned EdgeRunner-Tactical-7B using Supervised Fine-Tuning (SFT), ensuring a solid foundation for function call handling. We then constructed a Direct Preference Optimization (DPO) dataset to further refine the model. During the DPO phase, we meticulously annotated responses exhibiting common mistakes, such as the wrong number of functions, incorrect variable names, or calling the wrong function, as rejected responses. This rigorous approach allowed us to train the model to avoid these errors, resulting in a final version that excels at managing diverse and complex function calling scenarios with high accuracy.
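A DPO preference pair from this process might look like the sketch below: the chosen response is the correct call, while the rejected response reproduces one of the annotated error classes (here, an incorrect variable name). The field names are illustrative, not the exact training format.

```python
import json

# Hypothetical DPO preference pair. "chosen" uses the correct parameter name
# ("city"); "rejected" demonstrates the "incorrect variable name" mistake class.
dpo_pair = {
    "prompt": "What's the weather in Austin in fahrenheit?",
    "chosen":   '{"name": "get_current_weather", "arguments": {"city": "Austin", "unit": "fahrenheit"}}',
    "rejected": '{"name": "get_current_weather", "arguments": {"location": "Austin", "unit": "fahrenheit"}}',
    "error_type": "incorrect variable name",
}

chosen_args = json.loads(dpo_pair["chosen"])["arguments"]
rejected_args = json.loads(dpo_pair["rejected"])["arguments"]
print(sorted(chosen_args), sorted(rejected_args))
# → ['city', 'unit'] ['location', 'unit']
```

Training on such pairs penalizes the rejected pattern directly, which is why the error classes had to be annotated explicitly.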

Benchmarking:

Our model was evaluated on the Berkeley Function-Calling Leaderboard Benchmark, achieving the following scores across different categories:

Function Calling Task         Accuracy (%)
Multiple Function             94
Parallel Multiple Function    83
Parallel Function             77
Simple                        91

Our model achieved a strong overall score of 86.25% on the AST Summary task, positioning it among the top-performing models on the Berkeley Function-Calling Leaderboard. Compared to arcee-ai/Arcee-Agent, which also uses Qwen2-7B as its base model and attained a score of 82.76%, our model outperforms it significantly.
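The 86.25% AST Summary figure is consistent with an unweighted mean of the four per-category accuracies in the table above:

```python
# Per-category BFCL accuracies from the table above.
scores = {
    "Multiple Function": 94,
    "Parallel Multiple Function": 83,
    "Parallel Function": 77,
    "Simple": 91,
}

ast_summary = sum(scores.values()) / len(scores)
print(ast_summary)  # → 86.25
```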

Other Benchmarks:

Benchmark     Score
Arena Hard    31.99
MMLU-Redux    67.82
GSM           80.89
MT-Bench      8.32

We welcome developers, researchers, and commercial partners to leverage EdgeRunner Command’s function calling capabilities for the edge. The Hugging Face model card is available here.