PMetal
High-performance LLM fine-tuning built for Apple Silicon. 18-crate Rust framework with native Metal GPU and Apple Neural Engine support. LoRA, QLoRA, DoRA, GRPO, knowledge distillation, and 20+ architectures — all on your hardware.
Own Your AI
No cloud dependency. No per-token fees. No data leaving your hardware. Fine-tune production LLMs at the cost of electricity.
Data Never Leaves
Your training data, your model weights, your hardware. On-premises by design with zero telemetry.
Cost = Electricity
No per-token API charges, no subscription fees, no cloud egress costs. Run on Mac hardware you already own.
Apple Silicon Native
Optimized for M1 through M5. Metal GPU kernels and Apple Neural Engine acceleration auto-detected at runtime.
Production-Ready
Enterprise security, distributed training, quantization, and model merging — not a research prototype.
Fine-Tuning Methods
State-of-the-art parameter-efficient fine-tuning with sequence packing and reasoning training. Every method runs natively on Metal.
LoRA / QLoRA / DoRA
Full suite of parameter-efficient fine-tuning methods with sequence packing for maximum GPU utilization on Apple Silicon.
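The core idea behind LoRA can be shown without any framework code. Below is a minimal pure-Rust sketch of merging a rank-r adapter into a frozen weight matrix, W' = W + (alpha/r)·BA. The function names (`matmul`, `lora_merge`) and the Vec-based matrices are illustrative only, not PMetal's API.

```rust
// Row-major matmul: a is m x k, b is k x n, result is m x n.
fn matmul(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    let mut out = vec![0.0; m * n];
    for i in 0..m {
        for p in 0..k {
            let aip = a[i * k + p];
            for j in 0..n {
                out[i * n + j] += aip * b[p * n + j];
            }
        }
    }
    out
}

/// Effective weight W' = W + (alpha / r) * B A, where B is d_out x r
/// and A is r x d_in. Only A and B are trained; W stays frozen.
fn lora_merge(w: &[f32], b: &[f32], a: &[f32],
              d_out: usize, d_in: usize, r: usize, alpha: f32) -> Vec<f32> {
    let delta = matmul(b, a, d_out, r, d_in);
    let scale = alpha / r as f32;
    w.iter().zip(delta.iter()).map(|(wi, di)| wi + scale * di).collect()
}

fn main() {
    // 2x2 identity base weight, rank-1 adapter, alpha = 2.
    let w = vec![1.0, 0.0, 0.0, 1.0];
    let b = vec![1.0, 1.0]; // 2x1
    let a = vec![0.5, 0.5]; // 1x2
    let merged = lora_merge(&w, &b, &a, 2, 2, 1, 2.0);
    println!("{:?}", merged); // [2.0, 1.0, 1.0, 2.0]
}
```

QLoRA applies the same update on top of a quantized base weight, and DoRA additionally decomposes the merged weight into magnitude and direction.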
GRPO / DAPO Reasoning
Group Relative Policy Optimization (GRPO) and Decoupled Clip and Dynamic Sampling Policy Optimization (DAPO), with custom reward-function support for training reasoning models.
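GRPO's defining trick is computing advantages relative to a group of completions sampled for the same prompt, rather than from a learned value function. A pure-Rust sketch of that normalization (the function name `group_relative_advantages` is illustrative, not PMetal's API):

```rust
/// GRPO-style advantage: A_i = (r_i - mean(r)) / std(r), computed over
/// the group of rewards for completions of one prompt.
fn group_relative_advantages(rewards: &[f32]) -> Vec<f32> {
    let n = rewards.len() as f32;
    let mean = rewards.iter().sum::<f32>() / n;
    let var = rewards.iter().map(|r| (r - mean).powi(2)).sum::<f32>() / n;
    let std = var.sqrt().max(1e-8); // guard against a zero-variance group
    rewards.iter().map(|r| (r - mean) / std).collect()
}

fn main() {
    // Rewards from four sampled completions of the same prompt.
    let adv = group_relative_advantages(&[0.0, 1.0, 1.0, 2.0]);
    println!("{:?}", adv); // approx [-1.414, 0.0, 0.0, 1.414]
}
```

A custom reward function simply fills in the `rewards` slice; the advantages then weight the policy-gradient update per token.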
Knowledge Distillation
Transfer knowledge from large teacher models to efficient student models with multiple distillation strategies and RLKD support.
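The classic distillation objective (Hinton-style) softens teacher and student logits with a temperature T and penalizes their KL divergence. A self-contained Rust sketch of that loss, with illustrative names (`softmax`, `distill_kl`) that are not PMetal's API:

```rust
/// Temperature-softened softmax over a logit vector.
fn softmax(logits: &[f32], t: f32) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|z| ((z - max) / t).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// KL(teacher || student) on temperature-softened distributions,
/// scaled by T^2 to keep gradient magnitudes comparable across temperatures.
fn distill_kl(teacher: &[f32], student: &[f32], t: f32) -> f32 {
    let p = softmax(teacher, t);
    let q = softmax(student, t);
    t * t * p.iter().zip(q.iter()).map(|(pi, qi)| pi * (pi / qi).ln()).sum::<f32>()
}

fn main() {
    let teacher = [2.0, 1.0, 0.0];
    let student = [0.5, 1.0, 0.1];
    println!("loss = {}", distill_kl(&teacher, &student, 2.0));
}
```

Other strategies (hidden-state matching, RLKD) change what is compared, but the soft-target KL above is the common baseline.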
20+ Model Architectures
First-class support for all major LLM families with architecture-specific optimizations baked into Metal kernels.
Quantization & GGUF
Export fine-tuned models to GGUF with 13 quantization format options. Directly compatible with llama.cpp and Ollama.
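GGUF's quantization formats all build on the same primitive: map float weights to small integers plus a stored scale. As a minimal sketch (symmetric per-tensor int8; real GGUF formats quantize per block with per-block scales, and the function names here are illustrative):

```rust
/// Symmetric int8 quantization: q = round(w / scale), scale = max|w| / 127.
fn quantize_i8(weights: &[f32]) -> (Vec<i8>, f32) {
    let amax = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    let scale = if amax == 0.0 { 1.0 } else { amax / 127.0 };
    let q = weights.iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Reconstruct approximate floats from the quantized values.
fn dequantize_i8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&x| x as f32 * scale).collect()
}

fn main() {
    let w = [0.5, -1.0, 0.25, 0.0];
    let (q, scale) = quantize_i8(&w);
    let back = dequantize_i8(&q, scale);
    println!("q = {:?}, scale = {}, roundtrip = {:?}", q, scale, back);
}
```

Each quantized value reconstructs to within half a scale step of the original, which is the error bound that block-wise formats shrink by using many local scales.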
Model Merging
Combine multiple fine-tuned adapters or base models using 12 merge strategies including TIES, DARE, and linear interpolation.
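The simplest of these strategies is linear interpolation of two checkpoints' weights; TIES and DARE refine it by operating on task deltas with sign election and random dropping. A pure-Rust sketch of the linear case (the name `linear_merge` is illustrative, not PMetal's API):

```rust
/// Weighted linear merge of two same-shape weight tensors:
/// out = weight_a * a + (1 - weight_a) * b.
fn linear_merge(a: &[f32], b: &[f32], weight_a: f32) -> Vec<f32> {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b.iter())
        .map(|(x, y)| weight_a * x + (1.0 - weight_a) * y)
        .collect()
}

fn main() {
    let model_a = [1.0, 2.0, 3.0];
    let model_b = [3.0, 2.0, 1.0];
    // 50/50 merge of two fine-tunes of the same base model.
    println!("{:?}", linear_merge(&model_a, &model_b, 0.5)); // [2.0, 2.0, 2.0]
}
```

Merging only makes sense between models sharing a base architecture and tokenizer; the merge weight is the main hyperparameter to sweep.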
Every Interface, Every Workflow
From polished desktop GUI to scriptable Python SDK — use PMetal the way your workflow demands.
Desktop GUI
Native macOS desktop application built with Tauri and Svelte. Visual training dashboards, real-time loss curves, hyperparameter controls, and model management — no terminal required.
Terminal TUI
Full-featured terminal user interface with 9 dedicated tabs covering training, evaluation, hardware monitoring, logs, and more. Keyboard-driven with vi-style navigation.
CLI
Comprehensive command-line interface with 20+ commands for scripting, CI/CD pipelines, and headless server workflows. Shell completion included.
Python SDK
Pythonic interface for Jupyter notebooks, research scripts, and ML pipelines. Full feature parity with the Rust core through PyO3 bindings.
Built for Apple Silicon
PMetal doesn't use Metal as an afterthought — the entire training pipeline is designed around the unified memory architecture of M-series chips.
Metal GPU Kernels
Custom Metal Shading Language kernels for matrix multiplication, attention, and gradient computation. Auto-tuned per chip generation.
Apple Neural Engine
Offload inference and certain training operations to the ANE. Runtime detection routes operations to the fastest compute unit.
Unified Memory
Full utilization of Apple Silicon unified memory — no host↔device copies. 16–192 GB addressable depending on your Mac.
Chip Generation Support
Runtime auto-detection selects the optimal execution strategy per chip.
18-Crate Modular Architecture
Every concern is a focused crate. Compose exactly what your project needs.
Main facade crate
Metal GPU backend
Apple Neural Engine
LoRA / QLoRA / DoRA
GRPO / DAPO training
Knowledge distillation
Quantization & GGUF
Model merging
Distributed training
Architecture definitions
Dataset pipeline
Tauri desktop app
Terminal interface
Command-line tools
Python bindings
Evaluation suite
Checkpoint management
Metrics & logging
Scale Across Multiple Macs
Connect multiple Apple Silicon machines over your local network with zero configuration. mDNS discovery finds peers automatically; Ring All-Reduce synchronizes gradients efficiently.
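Ring all-reduce sums each peer's gradient vector in 2(N-1) steps of neighbor-to-neighbor chunk exchange, keeping per-link traffic constant as peers are added. Below is an in-process simulation of the algorithm on plain `Vec`s; PMetal's actual network transport and API are not shown, and sequential in-place updates stand in for the simultaneous sends of a real ring.

```rust
/// Simulated ring all-reduce: `grads[i]` is peer i's gradient vector,
/// split into N chunks. After N-1 reduce-scatter steps each peer owns the
/// full sum of one chunk; N-1 all-gather steps then circulate those sums.
fn ring_all_reduce(grads: &mut [Vec<f32>]) {
    let n = grads.len();
    let len = grads[0].len();
    assert!(len % n == 0, "gradient length must divide into n chunks");
    let chunk = len / n;
    // Reduce-scatter: in step s, peer i sends chunk (i - s) mod n to peer i+1,
    // which accumulates it. Each peer's outgoing chunk differs from the chunk
    // it receives in the same step, so sequential in-place updates are safe.
    for s in 0..n - 1 {
        for i in 0..n {
            let c = (i + n - s) % n;
            for k in 0..chunk {
                let v = grads[i][c * chunk + k];
                grads[(i + 1) % n][c * chunk + k] += v;
            }
        }
    }
    // All-gather: in step s, peer i forwards fully reduced chunk (i + 1 - s) mod n.
    for s in 0..n - 1 {
        for i in 0..n {
            let c = (i + 1 + n - s) % n;
            for k in 0..chunk {
                let v = grads[i][c * chunk + k];
                grads[(i + 1) % n][c * chunk + k] = v;
            }
        }
    }
}

fn main() {
    let mut g = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    ring_all_reduce(&mut g);
    println!("{:?}", g); // both peers hold [4.0, 6.0]
}
```

The bandwidth-optimal property comes from every peer sending exactly one chunk per step, which is why the pattern scales well even over a shared local network.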
Quickstart
From zero to fine-tuned model in minutes. Metal acceleration is automatic.
# Install via cargo
cargo add pmetal
# Or add to Cargo.toml
[dependencies]
pmetal = "0.3"
# Install the CLI tool
cargo install pmetal-cli
# Verify installation + hardware detection
pmetal info
# Output:
# PMetal v0.3.13
# Hardware: Apple M3 Max
# Metal GPU: 40-core GPU (detected)
# Neural Engine: 16-core ANE (detected)
# Unified Memory: 128 GB
# Compute Strategy: Metal + ANE hybrid
Enterprise Security by Default
Every PMetal deployment is an air-gapped deployment. There is no opt-out because there is no cloud component to opt out of.
Zero Telemetry
PMetal makes zero outbound network requests. No usage metrics, no crash reports, no model weight syncing. Offline-first.
Air-Gapped Ready
Works entirely offline after initial model download. No CDN dependencies, no license checks, no cloud validation.
Data Stays Local
Training data, fine-tuned weights, evaluation results — everything stays on disk under your control.
MIT / Apache-2.0
Dual-licensed. Use commercially, modify freely, audit the full source. No proprietary blobs or binary-only components.
Auditable Codebase
18 focused crates means every piece of the stack is inspectable, replaceable, and independently verifiable.
No Subscription
Buy or build your hardware once. Fine-tune as many times as you need. Your cost model is electricity, nothing else.
Ready to own your AI?
PMetal is open-source and ready today. Talk to us about custom deployment, enterprise support, or fine-tuning your specific domain.