Apple Silicon LLM Fine-Tuning

PMetal

High-performance LLM fine-tuning built for Apple Silicon. 18-crate Rust framework with native Metal GPU and Apple Neural Engine support. LoRA, QLoRA, DoRA, GRPO, knowledge distillation, and 20+ architectures — all on your hardware.

20+
LLM Architectures
18
Specialized Crates
Metal
GPU + ANE Native
v0.3.13
Current Version

Own Your AI

No cloud dependency. No per-token fees. No data leaving your hardware. Fine-tune production LLMs at the cost of electricity.

Data Never Leaves

Your training data, your model weights, your hardware. On-premises by design with zero telemetry.

Cost = Electricity

No per-token API charges, no subscription fees, no cloud egress costs. Run on Mac hardware you already own.

Apple Silicon Native

Optimized for M1 through M5. Metal GPU kernels and Apple Neural Engine acceleration auto-detected at runtime.

Production-Ready

Enterprise security, distributed training, quantization, and model merging — not a research prototype.

Fine-Tuning Methods

State-of-the-art parameter-efficient fine-tuning with sequence packing and reasoning training. Every method runs natively on Metal.

LoRA / QLoRA / DoRA

Full suite of parameter-efficient fine-tuning methods with sequence packing for maximum GPU utilization on Apple Silicon.

Low-Rank Adaptation (LoRA)
Quantized LoRA (QLoRA)
Weight-Decomposed LoRA (DoRA)
Sequence packing
Rank-adaptive training
Metal-accelerated adapters
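
For intuition, the LoRA update reduces to a few lines of math: the frozen base weight W is augmented by a low-rank product B·A scaled by α/r, so only the small A and B matrices are trained. The plain-Python sketch below illustrates that math only; it is not PMetal's adapter implementation.

```python
# Illustrative LoRA forward pass: y = (W + (alpha / r) * B @ A) @ x.
# W is frozen; only the low-rank factors A (r x in) and B (out x r) train.

def matmul(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]

def lora_forward(x, W, A, B, alpha, r):
    base = matmul(W, x)               # frozen base projection
    delta = matmul(B, matmul(A, x))   # rank-r update path
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# 2x2 identity base weight with a rank-1 adapter
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]                      # r x in = 1 x 2
B = [[0.5], [0.5]]                    # out x r = 2 x 1
y = lora_forward([2.0, 0.0], W, A, B, alpha=2.0, r=1)
print(y)  # [4.0, 2.0]
```

With rank r much smaller than the weight dimensions, the trainable parameter count drops by orders of magnitude, which is what makes fine-tuning feasible in unified memory.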

GRPO / DAPO Reasoning

Group Relative Policy Optimization (GRPO) and Decoupled Clip and Dynamic Sampling Policy Optimization (DAPO) with custom reward function support for reasoning model training.

GRPO implementation
DAPO alignment
Custom reward functions
Verifiable reward signals
Reasoning chain training
RLHF pipelines
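
The core of GRPO is a critic-free advantage estimate: each sampled completion's reward is normalized against its own sampling group rather than a learned value function. A minimal sketch of that step (illustrative only, not PMetal's trainer internals):

```python
# Group-relative advantages as used in GRPO: sample G completions per
# prompt, score each with a reward function, then normalize within the group.
import math

def group_advantages(rewards):
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0   # guard against zero-variance groups
    return [(r - mean) / std for r in rewards]

# Four sampled completions for one prompt, scored 0/1 by a verifiable reward
advs = group_advantages([1.0, 0.0, 1.0, 0.0])
print(advs)  # [1.0, -1.0, 1.0, -1.0]
```

Completions that beat their group mean get positive advantage and are reinforced; the rest are suppressed, which is why verifiable reward signals pair so naturally with this method.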

Knowledge Distillation

Transfer knowledge from large teacher models to efficient student models with multiple distillation strategies and RLKD support.

Online distillation
Offline distillation
Progressive distillation
Reinforcement Learning KD (RLKD)
Intermediate layer matching
Response-based distillation
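
Response-based distillation typically minimizes the KL divergence between temperature-softened teacher and student output distributions. The self-contained sketch below shows that loss (the T² scaling follows Hinton et al.); it illustrates the objective, not PMetal's training loop.

```python
# Soft-target distillation loss: KL(teacher || student) on
# temperature-softened logits, scaled by T^2.
import math

def softmax(logits, T):
    exps = [math.exp(l / T) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_kl(teacher_logits, student_logits, T=2.0):
    p = softmax(teacher_logits, T)   # teacher soft targets
    q = softmax(student_logits, T)   # student predictions
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

loss = distill_kl([4.0, 1.0, 0.0], [4.0, 1.0, 0.0])
print(round(loss, 6))  # 0.0 when the student matches the teacher exactly
```

Raising the temperature spreads probability mass over the teacher's "dark knowledge" about near-miss tokens, which is the signal the student learns from.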

20+ Model Architectures

First-class support for all major LLM families with architecture-specific optimizations baked into Metal kernels.

Llama 3.x series
Qwen 2.5 / QwQ
DeepSeek R1 / V3
Mistral / Mixtral
Gemma 2 / 3
15+ additional architectures

Quantization & GGUF

Export fine-tuned models to GGUF with 13 quantization format options. Directly compatible with llama.cpp and Ollama.

13 GGUF format variants
Q4_K_M, Q8_0, F16, BF16
Dynamic quantization
llama.cpp compatibility
Ollama-ready exports
Size vs quality tradeoffs
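
For a sense of how block quantization works, here is a simplified Q8_0-style round trip: weights are split into blocks, and each block is stored as int8 values plus one scale. Real GGUF packs blocks of 32 values with an f16 scale; this plain-Python version only illustrates the principle.

```python
# Simplified Q8_0-style block quantization: per-block absmax scaling to int8.

def quantize_q8_block(block):
    amax = max(abs(v) for v in block) or 1.0
    scale = amax / 127.0                 # map absmax to the int8 range
    qs = [round(v / scale) for v in block]
    return scale, qs

def dequantize_q8_block(scale, qs):
    return [q * scale for q in qs]

weights = [0.5, -1.0, 0.25, 0.0]
scale, qs = quantize_q8_block(weights)
restored = dequantize_q8_block(scale, qs)
print(all(abs(a - b) < 0.01 for a, b in zip(weights, restored)))  # True
```

The K-quant variants (Q4_K_M and friends) refine this idea with nested super-block scales, trading a little extra metadata for better accuracy at low bit widths.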

Model Merging

Combine multiple fine-tuned adapters or base models using 12 merge strategies including TIES, DARE, and linear interpolation.

12 merge strategies
TIES merging
DARE pruning
Linear interpolation
Task arithmetic
Model soup ensembles
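
Two of the listed strategies reduce to simple per-parameter arithmetic. The sketch below shows linear interpolation and task arithmetic on flat weight vectors; PMetal's merge crate operates on full checkpoints, so treat this as illustration of the math only.

```python
# Per-parameter merge math on flat weight vectors.

def lerp_merge(a, b, t=0.5):
    """Linear interpolation: (1 - t) * a + t * b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def task_arithmetic(base, finetuned_list, scale=1.0):
    """Add scaled task vectors (finetuned - base) back onto the base model."""
    merged = list(base)
    for ft in finetuned_list:
        for i, (f, b) in enumerate(zip(ft, base)):
            merged[i] += scale * (f - b)
    return merged

print(lerp_merge([0.0, 2.0], [2.0, 0.0]))                     # [1.0, 1.0]
print(task_arithmetic([1.0, 1.0], [[2.0, 1.0], [1.0, 3.0]]))  # [2.0, 3.0]
```

TIES and DARE build on the task-arithmetic form by pruning and sign-resolving the task vectors before summation, which reduces interference when merging many adapters.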

Every Interface, Every Workflow

From polished desktop GUI to scriptable Python SDK — use PMetal the way your workflow demands.

Tauri + Svelte

Desktop GUI

Native macOS desktop application built with Tauri and Svelte. Visual training dashboards, real-time loss curves, hyperparameter controls, and model management — no terminal required.

Real-time training visualization
Hyperparameter configuration UI
Dataset management
Checkpoint browser
Export wizard
Native macOS integration

9 tabs

Terminal TUI

Full-featured terminal user interface with 9 dedicated tabs covering training, evaluation, hardware monitoring, logs, and more. Keyboard-driven with vi-style navigation.

9 specialized tabs
Training progress & metrics
Hardware utilization monitor
Log viewer with filtering
Dataset inspector
vi-style keyboard navigation

20+ commands

CLI

Comprehensive command-line interface with 20+ commands for scripting, CI/CD pipelines, and headless server workflows. Shell completion included.

20+ subcommands
Shell completion (zsh/bash/fish)
JSON output mode
Config file support
Environment variable overrides
Scriptable & composable

pip install pmetal

Python SDK

Pythonic interface for Jupyter notebooks, research scripts, and ML pipelines. Full feature parity with the Rust core through PyO3 bindings.

PyO3 native bindings
Jupyter notebook support
Async/await API
HuggingFace datasets integration
Weights & Biases logging
Type hints throughout

Built for Apple Silicon

PMetal doesn't use Metal as an afterthought — the entire training pipeline is designed around the unified memory architecture of M-series chips.

Metal GPU Kernels

Custom Metal Shading Language kernels for matrix multiplication, attention, and gradient computation. Auto-tuned per chip generation.

M1 through M5 · MSL kernels · Auto-tuned

Apple Neural Engine

Offload inference and certain training operations to the ANE. Runtime detection routes operations to the fastest compute unit.

ANE offloading · Runtime routing · Zero config

Unified Memory

Full utilization of Apple Silicon unified memory — no host↔device copies. 16–192 GB addressable depending on your Mac.

Zero copy · Up to 192 GB · Shared bandwidth

Chip Generation Support

Runtime auto-detection selects the optimal execution strategy per chip.

M1 / M1 Pro / Max / Ultra
M2 / M2 Pro / Max / Ultra
M3 / M3 Pro / Max
M4 / M4 Pro / Max
M5 (preview)

18-Crate Modular Architecture

Every concern is a focused crate. Compose exactly what your project needs.

pmetal

Main facade crate

Prelude re-exports
Feature flags
Ergonomic APIs

pmetal-metal

Metal GPU backend

MSL compute kernels
Command queue mgmt
Chip auto-detection

pmetal-ane

Apple Neural Engine

ANE operation routing
CoreML bridge
Runtime fallback

pmetal-lora

LoRA / QLoRA / DoRA

Adapter injection
Rank scheduling
Sequence packing

pmetal-grpo

GRPO / DAPO training

Policy gradient
Custom rewards
Group sampling

pmetal-distill

Knowledge distillation

Online / offline
Progressive stages
RLKD support

pmetal-quant

Quantization & GGUF

13 GGUF formats
Dynamic quant
Export pipelines

pmetal-merge

Model merging

12 strategies
TIES / DARE
Task arithmetic

pmetal-dist

Distributed training

mDNS discovery
Ring All-Reduce
Fault tolerance

pmetal-models

Architecture definitions

20+ architectures
Weight loaders
Attention variants

pmetal-data

Dataset pipeline

Streaming loaders
Tokenizer support
Augmentation

pmetal-gui

Tauri desktop app

Svelte frontend
Training dashboards
Checkpoint browser

pmetal-tui

Terminal interface

9 tab layout
Ratatui backend
vi navigation

pmetal-cli

Command-line tools

20+ subcommands
Shell completions
JSON output

pmetal-py

Python bindings

PyO3 bridge
Async support
HuggingFace compat

pmetal-eval

Evaluation suite

Benchmark runners
Perplexity
Task metrics

pmetal-checkpoint

Checkpoint management

SafeTensors format
Resume training
Version tracking

pmetal-telemetry

Metrics & logging

Prometheus metrics
W&B integration
TensorBoard

Distributed Training

Scale Across Multiple Macs

Connect multiple Apple Silicon machines over your local network with zero configuration. mDNS discovery finds peers automatically; Ring All-Reduce synchronizes gradients efficiently.

Zero-Config Discovery: mDNS finds peers on your LAN automatically — no static IPs, no manual configuration.
Ring All-Reduce: Bandwidth-efficient gradient synchronization scales linearly with the number of nodes.
Fault Tolerance: Automatic checkpoint recovery and peer re-connection if a node drops during training.
Mixed Hardware: Combine M1, M2, M3, M4, and M5 machines in the same training cluster.

Node Discovery: mDNS Automatic
Gradient Sync: Ring All-Reduce
Fault Recovery: Checkpoint Resume
Topology: Peer-to-Peer Mesh
On-Premises: Your data never leaves your network. Cost = electricity.
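
The bandwidth-efficient synchronization above can be simulated in a few lines: each of the N peers exchanges one chunk per step with its ring neighbor, so a full reduce takes 2(N-1) steps regardless of gradient size. The single-process simulation below illustrates the algorithm; it is not PMetal's networking code.

```python
# Single-process simulation of Ring All-Reduce gradient averaging.
# One chunk per node for simplicity: N-1 reduce-scatter steps,
# then N-1 all-gather steps.

def ring_all_reduce(grads):
    n = len(grads)
    chunks = [list(g) for g in grads]   # chunks[node][chunk_index]
    # Reduce-scatter: after n-1 steps, node i holds the full sum of chunk (i+1) % n
    for step in range(n - 1):
        for i in range(n):
            c = (i - step) % n               # chunk node i forwards this step
            chunks[(i + 1) % n][c] += chunks[i][c]
    # All-gather: circulate the completed chunks for another n-1 steps
    for step in range(n - 1):
        for i in range(n):
            c = (i + 1 - step) % n
            chunks[(i + 1) % n][c] = chunks[i][c]
    # Average so every node ends with the mean gradient
    return [[x / n for x in node] for node in chunks]

out = ring_all_reduce([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
print(out[0])  # [4.0, 5.0, 6.0] -- every node holds the mean gradient
```

Because each node only ever talks to its ring neighbor, per-node traffic stays constant as peers are added, which is the property behind the linear scaling claim.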

Quickstart

From zero to fine-tuned model in minutes. Metal acceleration is automatic.

Install PMetal and verify Apple Silicon detection
# Install via cargo
cargo add pmetal

# Or pin the version in Cargo.toml:
#   [dependencies]
#   pmetal = "0.3"

# Install the CLI tool
cargo install pmetal-cli

# Verify installation + hardware detection
pmetal info

# Output:
# PMetal v0.3.13
# Hardware: Apple M3 Max
# Metal GPU: 40-core GPU (detected)
# Neural Engine: 16-core ANE (detected)
# Unified Memory: 128 GB
# Compute Strategy: Metal + ANE hybrid

Enterprise Security by Default

Every PMetal deployment is an air-gapped deployment. There is no opt-out because there is no cloud component to opt out of.

Zero Telemetry

PMetal makes zero outbound network requests. No usage metrics, no crash reports, no model weight syncing. Offline-first.

Air-Gapped Ready

Works entirely offline after initial model download. No CDN dependencies, no license checks, no cloud validation.

Data Stays Local

Training data, fine-tuned weights, evaluation results — everything stays on disk under your control.

MIT / Apache-2.0

Dual-licensed. Use commercially, modify freely, audit the full source. No proprietary blobs or binary-only components.

Auditable Codebase

With 18 focused crates, every piece of the stack is inspectable, replaceable, and independently verifiable.

No Subscription

Buy or build your hardware once. Fine-tune as many times as you need. Your cost model is electricity, nothing else.

Ready to own your AI?

PMetal is open-source and ready today. Talk to us about custom deployment, enterprise support, or fine-tuning your specific domain.