Type4Me macOS Voice Input
name: type4me-macos-voice-input
by adisinghstudent · published 2026-04-01
$ claw add gh:adisinghstudent/adisinghstudent-type4me-macos-voice-input
---
name: type4me-macos-voice-input
description: macOS voice input tool built in Swift, with local/cloud ASR engines, LLM text optimization, and fully local storage
triggers:
- add a new ASR provider to type4me
- build and deploy type4me from source
- configure local voice recognition with sherpa
- set up volcengine speech recognition
- add custom prompt mode for voice input
- implement speech recognizer protocol
- troubleshoot type4me voice input not working
- extend type4me with new cloud ASR service
---
# Type4Me macOS Voice Input
> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.
Type4Me is a macOS voice input tool that captures audio via global hotkey, transcribes it using local (SherpaOnnx/Paraformer/Zipformer) or cloud (Volcengine/Deepgram) ASR engines, optionally post-processes text via LLM, and injects the result into any app. All credentials and history are stored locally — no telemetry, no cloud sync.
## Architecture Overview

```
Type4Me/
├── ASR/                              # ASR engine abstraction
│   ├── ASRProvider.swift             # Provider enum + protocols
│   ├── ASRProviderRegistry.swift     # Plugin registry
│   ├── Providers/                    # Per-vendor config files
│   ├── SherpaASRClient.swift         # Local streaming ASR
│   ├── SherpaOfflineASRClient.swift
│   ├── VolcASRClient.swift           # Volcengine streaming ASR
│   └── DeepgramASRClient.swift       # Deepgram streaming ASR
├── Bridge/                           # SherpaOnnx C API Swift bridge
├── Audio/                            # Audio capture
├── Session/                          # Core state machine: record→ASR→inject
├── Input/                            # Global hotkey management
├── Services/                         # Credentials, hotwords, model manager
├── Protocol/                         # Volcengine WebSocket codec
└── UI/                               # SwiftUI (FloatingBar + Settings)
```

## Installation

### Prerequisites
```bash
# Xcode Command Line Tools
xcode-select --install
# CMake (for local ASR engine)
brew install cmake
```

### Build & Deploy from Source
```bash
git clone https://github.com/joewongjc/type4me.git
cd type4me
# Step 1: Compile SherpaOnnx local engine (~5 min, one-time)
bash scripts/build-sherpa.sh
# Step 2: Build, bundle, sign, install to /Applications, and launch
bash scripts/deploy.sh
```

### Download Pre-built App
Download `Type4Me-v1.2.3.dmg` from releases (cloud ASR only, no local engine):
https://github.com/joewongjc/type4me/releases/tag/v1.2.3

If macOS blocks the app, remove the quarantine attribute:

```bash
xattr -d com.apple.quarantine /Applications/Type4Me.app
```

### Download Local ASR Models
```bash
mkdir -p ~/Library/Application\ Support/Type4Me/Models
# Each option assumes the archive has already been downloaded to ~/Downloads.
# Option A: Lightweight ~20MB
tar xjf ~/Downloads/sherpa-onnx-streaming-zipformer-small-ctc-zh-int8-2025-04-01.tar.bz2 \
  -C ~/Library/Application\ Support/Type4Me/Models/
# Option B: Balanced ~236MB (recommended)
tar xjf ~/Downloads/sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13.tar.bz2 \
  -C ~/Library/Application\ Support/Type4Me/Models/
# Option C: Bilingual Chinese+English ~1GB
tar xjf ~/Downloads/sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2 \
  -C ~/Library/Application\ Support/Type4Me/Models/
```

Expected structure for the Paraformer model:
```
~/Library/Application Support/Type4Me/Models/
└── sherpa-onnx-streaming-paraformer-bilingual-zh-en/
    ├── encoder.int8.onnx
    ├── decoder.int8.onnx
    └── tokens.txt
```

## Key Protocols

### SpeechRecognizer Protocol
Every ASR client must implement this protocol:
```swift
import AVFoundation

protocol SpeechRecognizer: AnyObject {
    /// Start a new recognition session
    func startRecognition() async throws
    /// Feed raw PCM audio data
    func appendAudio(_ buffer: AVAudioPCMBuffer) async
    /// Stop and get final result
    func stopRecognition() async throws -> String
    /// Cancel without result
    func cancelRecognition() async
    /// Callback for streaming partial results; non-streaming clients may never call it
    var partialResultHandler: ((String) -> Void)? { get set }
}
```

### ASRProviderConfig Protocol
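Each vendor supplies a config type that declares its Settings metadata and credential fields, validates credentials, and creates its recognizer: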
```swift
protocol ASRProviderConfig {
    /// Unique identifier string
    static var providerID: String { get }
    /// Display name in Settings UI
    static var displayName: String { get }
    /// Credential fields shown in Settings
    static var credentialFields: [CredentialField] { get }
    /// Validate credentials before use
    static func validate(_ credentials: [String: String]) -> Bool
    /// Create the recognizer instance
    static func createClient(
        credentials: [String: String],
        config: RecognitionConfig
    ) throws -> SpeechRecognizer
}
```

## Adding a New ASR Provider
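The three steps below all follow one pattern: a config type validates credentials and acts as a factory for a recognizer, and the registry is a plain list of config types. A minimal, self-contained sketch of that pattern, assuming lookup by `providerID`. `SimpleProviderConfig`, `SimpleRecognizer`, `EchoProvider`, and `makeRecognizer` are hypothetical stand-ins, not Type4Me types; the real protocols take `AVAudioPCMBuffer` input and a `RecognitionConfig`.

```swift
import Foundation

// Hypothetical, simplified stand-ins for SpeechRecognizer and ASRProviderConfig.
protocol SimpleRecognizer: AnyObject {
    func transcribe(_ samples: [Float]) -> String
}

protocol SimpleProviderConfig {
    static var providerID: String { get }
    static var requiredKeys: [String] { get }
    static func validate(_ credentials: [String: String]) -> Bool
    static func makeClient(credentials: [String: String]) -> any SimpleRecognizer
}

extension SimpleProviderConfig {
    // Default rule: every required key is present and non-empty.
    static func validate(_ credentials: [String: String]) -> Bool {
        requiredKeys.allSatisfy { !(credentials[$0] ?? "").isEmpty }
    }
}

// A toy recognizer that only reports how many samples it received.
final class EchoRecognizer: SimpleRecognizer {
    func transcribe(_ samples: [Float]) -> String { "\(samples.count) samples" }
}

struct EchoProvider: SimpleProviderConfig {
    static let providerID = "echo"
    static let requiredKeys = ["api_key"]
    static func makeClient(credentials: [String: String]) -> any SimpleRecognizer {
        EchoRecognizer()
    }
}

// The registry holds config *types*; adding a provider means adding one entry.
let registry: [any SimpleProviderConfig.Type] = [EchoProvider.self]

// Finds the provider by ID, validates credentials, then builds the client.
func makeRecognizer(providerID: String,
                    credentials: [String: String]) -> (any SimpleRecognizer)? {
    guard let provider = registry.first(where: { $0.providerID == providerID }),
          provider.validate(credentials) else { return nil }
    return provider.makeClient(credentials: credentials)
}
```

Keeping validation and construction as static members means the Settings UI can list and validate providers without instantiating any network client.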
### Step 1: Create Provider Config
Create `Type4Me/ASR/Providers/OpenAIWhisperProvider.swift`:
```swift
import Foundation

struct OpenAIWhisperProvider: ASRProviderConfig {
    static let providerID = "openai_whisper"
    static let displayName = "OpenAI Whisper"

    static let credentialFields: [CredentialField] = [
        CredentialField(
            key: "api_key",
            label: "API Key",
            placeholder: "sk-...",
            isSecret: true
        ),
        CredentialField(
            key: "model",
            label: "Model",
            placeholder: "whisper-1",
            isSecret: false
        )
    ]

    static func validate(_ credentials: [String: String]) -> Bool {
        guard let apiKey = credentials["api_key"], !apiKey.isEmpty else {
            return false
        }
        // OpenAI secret keys start with "sk-".
        return apiKey.hasPrefix("sk-")
    }

    static func createClient(
        credentials: [String: String],
        config: RecognitionConfig
    ) throws -> SpeechRecognizer {
        guard let apiKey = credentials["api_key"], !apiKey.isEmpty else {
            throw ASRError.missingCredential("api_key")
        }
        // The model field is optional: fall back to whisper-1 when it is
        // missing or left blank in Settings (`??` alone misses "").
        let trimmedModel = credentials["model"]?.trimmingCharacters(in: .whitespaces) ?? ""
        let model = trimmedModel.isEmpty ? "whisper-1" : trimmedModel
        return OpenAIWhisperASRClient(apiKey: apiKey, model: model, config: config)
    }
}
```

### Step 2: Implement the ASR Client
Create `Type4Me/ASR/OpenAIWhisperASRClient.swift`:
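```swift
import Foundation
import AVFoundation

final class OpenAIWhisperASRClient: SpeechRecognizer {
    // Whisper is a batch API, so partial results are never reported.
    var partialResultHandler: ((String) -> Void)?

    private let apiKey: String
    private let model: String
    private let config: RecognitionConfig
    private var pcmData = Data()          // 16-bit mono PCM, native (little-endian) byte order
    private var sampleRate: Int = 16_000  // overwritten from each incoming buffer

    init(apiKey: String, model: String, config: RecognitionConfig) {
        self.apiKey = apiKey
        self.model = model
        self.config = config
    }

    func startRecognition() async throws {
        pcmData = Data()
    }

    func appendAudio(_ buffer: AVAudioPCMBuffer) async {
        // Only Float32 buffers are handled; channel 0 is used as mono.
        guard let channelData = buffer.floatChannelData?[0] else { return }
        sampleRate = Int(buffer.format.sampleRate)
        let samples = UnsafeBufferPointer(start: channelData, count: Int(buffer.frameLength))

        // Clamp in Float first so out-of-range samples cannot trap
        // in the Float -> Int16 conversion.
        let int16Samples = samples.map { sample -> Int16 in
            Int16(max(-1, min(1, sample)) * Float(Int16.max))
        }
        int16Samples.withUnsafeBytes { pcmData.append(contentsOf: $0) }
    }

    func stopRecognition() async throws -> String {
        defer { pcmData = Data() }

        // The transcription endpoint expects an audio file (e.g. WAV),
        // not headerless PCM, so wrap the samples in a WAV container.
        let wav = Self.wavFile(pcm: pcmData, sampleRate: sampleRate)

        // Build multipart form request to Whisper API
        var request = URLRequest(url: URL(string: "https://api.openai.com/v1/audio/transcriptions")!)
        request.httpMethod = "POST"
        request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
        let boundary = UUID().uuidString
        request.setValue("multipart/form-data; boundary=\(boundary)",
                         forHTTPHeaderField: "Content-Type")

        var body = Data()
        // Audio file part
        body.append("--\(boundary)\r\n".data(using: .utf8)!)
        body.append("Content-Disposition: form-data; name=\"file\"; filename=\"audio.wav\"\r\n".data(using: .utf8)!)
        body.append("Content-Type: audio/wav\r\n\r\n".data(using: .utf8)!)
        body.append(wav)
        body.append("\r\n".data(using: .utf8)!)
        // Model part
        body.append("--\(boundary)\r\n".data(using: .utf8)!)
        body.append("Content-Disposition: form-data; name=\"model\"\r\n\r\n".data(using: .utf8)!)
        body.append("\(model)\r\n".data(using: .utf8)!)
        body.append("--\(boundary)--\r\n".data(using: .utf8)!)
        request.httpBody = body

        let (data, response) = try await URLSession.shared.data(for: request)
        guard let http = response as? HTTPURLResponse, http.statusCode == 200 else {
            let status = (response as? HTTPURLResponse)?.statusCode ?? -1
            let detail = String(data: data, encoding: .utf8) ?? ""
            throw ASRError.networkError("Whisper API returned HTTP \(status): \(detail)")
        }
        return try JSONDecoder().decode(WhisperResponse.self, from: data).text
    }

    func cancelRecognition() async {
        pcmData = Data()
    }

    /// Prepends a 44-byte RIFF/WAVE header for 16-bit mono PCM.
    private static func wavFile(pcm: Data, sampleRate: Int) -> Data {
        var wav = Data()
        func appendLE<T: FixedWidthInteger>(_ value: T) {
            withUnsafeBytes(of: value.littleEndian) { wav.append(contentsOf: $0) }
        }
        wav.append(contentsOf: Array("RIFF".utf8)); appendLE(UInt32(36 + pcm.count))
        wav.append(contentsOf: Array("WAVE".utf8))
        wav.append(contentsOf: Array("fmt ".utf8)); appendLE(UInt32(16))
        appendLE(UInt16(1))               // audio format: PCM
        appendLE(UInt16(1))               // channels: mono
        appendLE(UInt32(sampleRate))
        appendLE(UInt32(sampleRate * 2))  // byte rate
        appendLE(UInt16(2))               // block align
        appendLE(UInt16(16))              // bits per sample
        wav.append(contentsOf: Array("data".utf8)); appendLE(UInt32(pcm.count))
        wav.append(pcm)
        return wav
    }
}

private struct WhisperResponse: Decodable {
    let text: String
}
```

### Step 3: Register the Provider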
In `Type4Me/ASR/ASRProviderRegistry.swift`, add to the `all` array:
```swift
struct ASRProviderRegistry {
    static let all: [any ASRProviderConfig.Type] = [
        SherpaParaformerProvider.self,
        VolcengineProvider.self,
        DeepgramProvider.self,
        OpenAIWhisperProvider.self, // ← Add your provider here
    ]
}
```

## Credentials Storage
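The storage contract described in this section comes down to a JSON file that only the owner can read or write. A hypothetical sketch of that contract using Foundation only; `saveCredentials`, `loadCredentials`, and the nested `[providerID: [key: value]]` layout are assumptions for illustration, not Type4Me's actual `CredentialStore`.

```swift
import Foundation

// Hypothetical on-disk layout: { "providerID": { "key": "value" } }.
typealias CredentialMap = [String: [String: String]]

// Writes the map as JSON, then restricts the file to owner read/write (0600).
// Note: there is a brief window between the write and the chmod; a hardened
// version would create the file with 0600 permissions up front.
func saveCredentials(_ map: CredentialMap, to url: URL) throws {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
    let data = try encoder.encode(map)
    try FileManager.default.createDirectory(
        at: url.deletingLastPathComponent(),
        withIntermediateDirectories: true)
    try data.write(to: url, options: .atomic)
    try FileManager.default.setAttributes(
        [.posixPermissions: NSNumber(value: 0o600)], ofItemAtPath: url.path)
}

// Returns an empty map when no credentials have been saved yet.
func loadCredentials(from url: URL) throws -> CredentialMap {
    guard FileManager.default.fileExists(atPath: url.path) else { return [:] }
    return try JSONDecoder().decode(CredentialMap.self, from: Data(contentsOf: url))
}
```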
Credentials are stored at `~/Library/Application Support/Type4Me/credentials.json` with permissions `0600`. Never hardcode secrets — always load via `CredentialStore`:
```swift
// Reading credentials
let store = CredentialStore.shared
let apiKey = store.get(providerID: "openai_whisper", key: "api_key")
// Writing credentials
store.set(providerID: "openai_whisper", key: "api_key", value: userInputKey)
// Checking if configured
let isConfigured = store.isConfigured(providerID: "openai_whisper",
                                      fields: OpenAIWhisperProvider.credentialFields)
```

## Custom Processing Modes with Prompt Variables
Custom processing modes send the recognized text to an LLM using a prompt template that can reference three context variables:
| Variable | Value |
|---|---|
| `{text}` | Recognized speech text |
| `{selected}` | Text selected in active app at record start |
| `{clipboard}` | Clipboard content at record start |
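To make the substitution concrete, here is a minimal sketch of expanding these variables. `PromptContext` and `renderPrompt` are hypothetical names, and this is not necessarily how Type4Me performs the substitution; the sketch assumes unknown `{...}` tokens are left untouched.

```swift
import Foundation

// Context captured when recording starts (hypothetical type).
struct PromptContext {
    var text: String       // recognized speech
    var selected: String   // selection in the active app at record start
    var clipboard: String  // clipboard contents at record start
}

// Replaces {text}, {selected}, {clipboard} in one left-to-right pass, so
// placeholder-like text inside a value (e.g. spoken "{selected}") is never
// expanded a second time. Unknown {tokens} are kept verbatim.
func renderPrompt(_ template: String, with ctx: PromptContext) -> String {
    let values = ["text": ctx.text, "selected": ctx.selected, "clipboard": ctx.clipboard]
    var output = ""
    var rest = Substring(template)
    while let open = rest.firstIndex(of: "{") {
        output += rest[..<open]
        let afterOpen = rest.index(after: open)
        if let close = rest[afterOpen...].firstIndex(of: "}"),
           let value = values[String(rest[afterOpen..<close])] {
            output += value
            rest = rest[rest.index(after: close)...]
        } else {
            // Not a known variable: emit the brace literally and move on.
            output += "{"
            rest = rest[afterOpen...]
        }
    }
    output += rest
    return output
}
```

The single pass is the design choice worth noting: chaining `replacingOccurrences` calls would re-expand a variable name that happens to appear in the user's speech or selection.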
Example custom mode prompts:
```swift
// Translate selection using voice command
let translatePrompt = """
The user selected this text: {selected}
Voice command: {text}
Execute the command on the selected text. Output only the result.
"""
// Code review via voice
let codeReviewPrompt = """
Code to review:
{clipboard}
Review instruction: {text}
Provide focused feedback addressing the instruction.
"""
// Email reply drafting
let emailPrompt = """
Original email: {selected}
My reply intent (spoken): {text}
Write a professional email reply. Output only the email body.
"""
```

## Built-in Processing Modes
```swift
enum ProcessingMode {
    case fast                    // Direct ASR output, zero latency
    case performance             // Dual-channel: streaming + offline refinement
    case englishTranslation      // Chinese speech → English text
    case promptOptimize          // Raw prompt → optimized prompt via LLM
    case command                 // Voice command + selected/clipboard context → LLM action
    case custom(prompt: String)  // User-defined prompt template
}
```

## Session State Machine
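The flow described in this section can be read as a small transition function. A hypothetical sketch: `SessionState`, `SessionEvent`, and `nextState` are illustrative names derived from the diagram, not the actual types in `Session/`, and it assumes events arriving in the wrong state are simply ignored.

```swift
import Foundation

// Hypothetical states mirroring the diagram in this section.
enum SessionState: Equatable {
    case idle, recording, processing, postProcessing, injecting
}

// Hypothetical events that drive the transitions.
enum SessionEvent {
    case hotkey                       // press to start, release/second press to stop
    case asrFinished(needsLLM: Bool)  // whether the current mode needs an LLM pass
    case llmFinished
    case injected
}

func nextState(_ state: SessionState, on event: SessionEvent) -> SessionState {
    switch (state, event) {
    case (.idle, .hotkey):                     return .recording
    case (.recording, .hotkey):                return .processing
    case (.processing, .asrFinished(let llm)): return llm ? .postProcessing : .injecting
    case (.postProcessing, .llmFinished):      return .injecting
    case (.injecting, .injected):              return .idle
    default:                                   return state  // ignore out-of-order events
    }
}
```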
The core recording flow in `Session/`:
```
[Idle]
  → hotkey pressed → [Recording] → audio streams to ASR client
  → hotkey released/pressed again → [Processing]
  → ASR returns text → [LLM Post-processing] (if mode requires)
  → [Injecting] → text injected into active app
  → [Idle]
```

## Updating After Source Changes
```bash
cd type4me
git pull
bash scripts/deploy.sh
# SherpaOnnx does NOT need recompiling unless the engine version changed
```

## Troubleshooting

### App won't open (security warning)
```bash
xattr -d com.apple.quarantine /Applications/Type4Me.app
```

### Local model not recognized in Settings
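The model folder must contain three specific files. A hypothetical Swift helper (`missingModelFiles` is an illustrative name, not part of Type4Me) that reports which of the Paraformer files listed in this guide are absent; it checks presence only and assumes Type4Me's own detection uses the same file names.

```swift
import Foundation

// Returns the required files that are missing (or are directories) in
// `modelDir`; an empty result means all expected files are present.
func missingModelFiles(
    in modelDir: URL,
    required: [String] = ["encoder.int8.onnx", "decoder.int8.onnx", "tokens.txt"]
) -> [String] {
    required.filter { name in
        var isDir: ObjCBool = false
        let path = modelDir.appendingPathComponent(name).path
        let exists = FileManager.default.fileExists(atPath: path, isDirectory: &isDir)
        return !(exists && !isDir.boolValue)
    }
}

// Example (macOS):
// let dir = FileManager.default.homeDirectoryForCurrentUser
//     .appendingPathComponent("Library/Application Support/Type4Me/Models")
//     .appendingPathComponent("sherpa-onnx-streaming-paraformer-bilingual-zh-en")
// print(missingModelFiles(in: dir))
```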
Verify the directory structure exactly matches:
```bash
ls ~/Library/Application\ Support/Type4Me/Models/sherpa-onnx-streaming-paraformer-bilingual-zh-en/
# Must show: encoder.int8.onnx decoder.int8.onnx tokens.txt
```

### SherpaOnnx build fails
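```bash
# Ensure cmake is installed
brew install cmake
# Clean and retry
rm -rf Frameworks/
bash scripts/build-sherpa.sh
```

### New ASR provider not appearing in Settings

Check that the provider type is listed in `ASRProviderRegistry.all`, then rebuild and reinstall with `bash scripts/deploy.sh`.

### Audio not captured / no floating bar

### Credentials not saving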
```bash
# Check file exists and has correct permissions
ls -la ~/Library/Application\ Support/Type4Me/credentials.json
# Should show: -rw------- (0600)
# Fix permissions if needed:
chmod 0600 ~/Library/Application\ Support/Type4Me/credentials.json
```

### Export history to CSV
Open Settings → History → select date range → Export CSV. The SQLite database is at:
```
~/Library/Application\ Support/Type4Me/history.db
```

```bash
# Direct query:
sqlite3 ~/Library/Application\ Support/Type4Me/history.db \
  "SELECT datetime(timestamp,'unixepoch'), text FROM records ORDER BY timestamp DESC LIMIT 20;"
```

## System Requirements