Machine learning workloads have always had an infrastructure problem. The data lives on one machine, the compute lives on another, the researcher is on a third, and the results need to go somewhere else. Coordinating all of that — getting data to the training node, launching the job, monitoring progress, retrieving results — is a substantial piece of operational overhead that has nothing to do with the actual research or engineering.

I want to explain why I think the combination of the Smart Automation Framework and the Agentic API is particularly well-suited to this problem — and work through some concrete scenarios to make the argument tangible.

The Core Structural Problem in ML Infrastructure

In a typical ML workflow, you have some combination of the following:

  • Raw data stored on local lab servers, edge devices, or researcher workstations — rarely in a central cloud bucket
  • GPU compute that is either on-premises hardware (expensive to provision in the cloud at scale), a shared university cluster, or local workstations with gaming GPUs that happen to be powerful enough for mid-sized training runs
  • A researcher who is not always physically present at the machine where the training is happening
  • A need to run multiple experiments in sequence or in parallel, retrieve results, compare metrics, and iterate

The traditional answer to this is a combination of SSH tunnels, manual scp transfers, screen sessions, and a lot of hoping nothing breaks overnight. It works, technically. But it is fragile, requires constant attention, and does not scale.

The Agentic API as the Access Fabric for ML

The Agentic API solves the access problem cleanly. Register your GPU workstation, your lab server, or your edge device with awaBerry. Create a project with a Project Key and a precisely scoped set of permissions — the directories your training scripts need to read from, the commands they are allowed to execute, whether root access is required. You now have programmatic, authenticated access to that device from anywhere, over an outbound-only HTTPS tunnel with no open firewall ports.
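To make the permission model concrete, here is a minimal sketch of what a project-scoped permission check could look like. This is illustrative only: the `ProjectScope` class, its field names, and the check logic are hypothetical stand-ins, not the actual awaBerry API.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

@dataclass
class ProjectScope:
    """Hypothetical sketch of the scope attached to a Project Key."""
    readable_dirs: tuple[str, ...] = ()      # directories training scripts may read
    allowed_commands: tuple[str, ...] = ()   # executables the project may invoke
    allow_root: bool = False                 # whether root access is granted

    def can_read(self, path: str) -> bool:
        # A path is readable only if it sits under a permitted directory.
        p = PurePosixPath(path)
        return any(p.is_relative_to(d) for d in self.readable_dirs)

    def can_run(self, command: str) -> bool:
        # Only the command's executable name is checked in this sketch.
        return command.split()[0] in self.allowed_commands

# A scope limited to the training data directory and two commands.
scope = ProjectScope(
    readable_dirs=("/data/experiments",),
    allowed_commands=("python", "nvidia-smi"),
)
print(scope.can_read("/data/experiments/run42/train.csv"))  # True
print(scope.can_run("rm -rf /"))                            # False
```

The point of the sketch is the shape of the contract: every read and every command an automated workflow issues is checked against a scope the researcher defined up front, rather than inheriting the full privileges of a shell session.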

What this means for ML is that your training node does not need to be on the same network as you, or even in the same country. You access it exactly the same way whether it is on your desk or in a remote lab. And crucially, you can grant the Smart Automation Framework access to it as part of an automated workflow — which is where things get interesting.

The Smart Automation Framework as the ML Orchestration Layer

If the Agentic API is the access fabric, the Smart Automation Framework is the orchestration brain. It handles the logic: what data to move, where to move it, which training command to run, how to monitor progress, when to retrieve results, and what to do with them.

Here is a concrete scenario. Suppose you are running a series of hyperparameter tuning experiments on a GPU workstation in your lab. Each experiment takes several hours. You want to:

  1. Prepare the training data by pulling the latest version from a data collection device
  2. Launch the training run with a specific configuration
  3. Monitor the training log until the run completes
  4. Evaluate the resulting model on a validation set
  5. Write the metrics to a results file, including the configuration that produced them
  6. Start the next experiment automatically

In the traditional approach, this requires a person to be available at each transition point, or a significant investment in custom MLOps tooling. With the Smart Automation Framework and Agentic API combined, you describe this workflow in plain English, the framework writes the orchestration logic, and it runs unattended — over the zero-trust tunnel, on the actual hardware, without you being present.
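The six steps above reduce to a plain sequential loop. The sketch below shows that control flow with the device interactions stubbed out as callables; `run_training`, `wait_until_done`, and `evaluate` are hypothetical placeholders for whatever the framework generates, not real API calls.

```python
import json
from typing import Callable

def run_experiment_queue(
    configs: list[dict],
    run_training: Callable[[dict], str],    # launches a run, returns a handle
    wait_until_done: Callable[[str], None], # blocks until the run completes
    evaluate: Callable[[dict], dict],       # scores the finished run on validation data
    results_path: str = "results.jsonl",
) -> list[dict]:
    """Hypothetical orchestration loop: train, monitor, evaluate, record, repeat."""
    results = []
    for cfg in configs:                      # step 6: next experiment starts automatically
        handle = run_training(cfg)           # step 2: launch with this configuration
        wait_until_done(handle)              # step 3: monitor until completion
        metrics = evaluate(cfg)              # step 4: validation-set evaluation
        record = {"config": cfg, "metrics": metrics}
        results.append(record)
        with open(results_path, "a") as f:   # step 5: metrics plus the config that produced them
            f.write(json.dumps(record) + "\n")
    return results
```

In practice the researcher never writes this loop; the value proposition is precisely that the framework derives it from the plain-English description. The sketch only makes visible how little logic stands between "manual babysitting" and "unattended queue" once device access is programmatic.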

Distributed Data Collection for Training Sets

One of the ML use cases I find most compelling is distributed data collection. If your training data comes from sensors, log files, or application outputs on multiple remote devices — a common situation in IoT ML, federated research, and healthcare AI — the Agentic API gives you a way to access each of those devices programmatically, and the Smart Automation Framework gives you a way to automate the collection.

A framework project on a central aggregation machine can, on a schedule:

  • Connect to each registered data source device via the Agentic API
  • Pull the latest data files from the configured directory
  • Apply preprocessing steps — normalisation, format conversion, outlier filtering — locally on the aggregation machine
  • Append to a growing training dataset, ready for the next training run
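One round of that scheduled pipeline can be sketched as follows. The `pull_files` callable stands in for whatever transport the Agentic API actually provides, and the device names are invented for illustration.

```python
from typing import Callable, Iterable

def collect_round(
    devices: Iterable[str],
    pull_files: Callable[[str], list[str]],  # hypothetical: fetch new records from one device
    preprocess: Callable[[str], str],        # runs locally on the aggregation machine
    dataset: list[str],
) -> list[str]:
    """One collection round: pull from each source, preprocess locally, append."""
    for device in devices:
        for raw in pull_files(device):       # latest files from the configured directory
            dataset.append(preprocess(raw))  # normalisation / conversion / filtering here
    return dataset

# Stubbed usage: two sensor devices, trivial preprocessing.
dataset: list[str] = []
collect_round(
    devices=["sensor-a", "sensor-b"],
    pull_files=lambda d: [f" {d}:reading-1 ", f" {d}:reading-2 "],
    preprocess=str.strip,
    dataset=dataset,
)
print(len(dataset))  # 4
```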

The data never leaves the controlled infrastructure. The preprocessing runs locally. The whole pipeline runs on a schedule with no manual involvement after initial setup.

Automated Model Evaluation and Result Reporting

The evaluation phase of ML work is often just as tedious as the training phase. Running the validation suite, collecting metrics, formatting a comparison table, archiving the model checkpoint — these are straightforward tasks that nonetheless take time and attention away from the actual analysis.

The framework handles evaluation pipelines well. Describe what you want: run the test suite against the model checkpoint in the specified directory, extract the relevant metrics (accuracy, F1, loss, latency), write a summary to the results log, and archive the checkpoint if the metrics exceed a defined threshold. The framework generates the script, and it runs automatically after each training job completes.
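The threshold-gated archiving step might be sketched like this. The metric names and the `archive` callable are assumptions made for the example, not the framework's actual output.

```python
import json
from typing import Callable

def report_and_maybe_archive(
    metrics: dict[str, float],
    thresholds: dict[str, float],  # minimum acceptable value per metric
    results_log: list[str],
    archive: Callable[[], None],   # e.g. copy the checkpoint to long-term storage
) -> bool:
    """Record a metrics summary; archive the checkpoint only if all thresholds pass."""
    results_log.append(json.dumps(metrics, sort_keys=True))
    passed = all(metrics.get(k, float("-inf")) >= v for k, v in thresholds.items())
    if passed:
        archive()
    return passed

log: list[str] = []
ok = report_and_maybe_archive(
    {"accuracy": 0.94, "f1": 0.91, "loss": 0.21, "latency_ms": 12.0},
    thresholds={"accuracy": 0.90, "f1": 0.88},
    results_log=log,
    archive=lambda: None,
)
print(ok)  # True
```

Note that this sketch treats every threshold as a minimum; a real pipeline would also want maximum-style gates for metrics like loss and latency, where lower is better.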

Why This Matters at Scale

A single researcher running experiments on a single machine does not necessarily need all of this automation. But the moment you have multiple machines, multiple researchers, or multiple simultaneous experiments, the operational overhead of manual coordination grows quickly. The combination of the Smart Automation Framework and the Agentic API gives you an ML infrastructure that scales without scaling your operational overhead — because the tedious coordination work runs itself.

This is what I mean when I say version 2 changes the nature of what awaBerry is. It is not just remote access. It is the infrastructure layer that makes your devices autonomous — including the devices doing your machine learning.

Explore the Power of Combination →