HIPAA Compliance Auditor
name: hipaa-compliance-auditor
by aipoch-ai · published 2026-04-01
$ claw add gh:aipoch-ai/aipoch-ai-hipaa-compliance-auditor---
name: hipaa-compliance-auditor
description: Automatically detect and de-identify PII (Personal Identifiable Information)
and PHI (Protected Health Information) from clinical/medical text to ensure HIPAA
compliance. Trigger when processing medical records, patient data, clinical notes,
insurance information, or any healthcare-related text containing potential patient
identifiers.
version: 1.0.0
category: Clinical
tags: []
author: AIPOCH
license: MIT
status: Draft
risk_level: Medium
skill_type: Tool/Script
owner: AIPOCH
reviewer: ''
last_updated: '2026-02-06'
---
# HIPAA Compliance Auditor
A clinical-grade PII/PHI detection and de-identification tool for healthcare text data.
Overview
This skill analyzes text for HIPAA-protected identifiers and automatically redacts or anonymizes them. It uses a combination of regex patterns, NLP entity recognition, and contextual analysis to identify 18 HIPAA identifier categories.
Features
Usage
Command Line
python scripts/main.py --input "patient_text.txt" --output "deidentified.txt"
python scripts/main.py --text "Patient John Doe, SSN 123-45-6789..." --audit-log audit.jsonPython API
from scripts.main import HIPAAAuditor
auditor = HIPAAAuditor()
result = auditor.deidentify("Patient John Doe was admitted on 2024-01-15...")
print(result.cleaned_text) # De-identified output
print(result.detected_pii) # List of found PII entitiesParameters
| Parameter | Type | Default | Required | Description |
|-----------|------|---------|----------|-------------|
| `--input`, `-i` | string | - | No | Path to input text file |
| `--text` | string | - | No | Direct text input (alternative to file) |
| `--output`, `-o` | string | - | No | Path for de-identified output file |
| `--audit-log` | string | - | No | Path for JSON audit log |
| `--confidence` | float | 0.7 | No | Minimum confidence threshold (0.0-1.0) |
| `--preserve-structure` | bool | true | No | Maintain document structure |
| `--custom-patterns` | string | - | No | Path to custom regex patterns JSON |
HIPAA Identifier Categories Detected
1. Names (patient, relatives, employers)
2. Geographic subdivisions smaller than state
3. Dates (except year) related to individual
4. Phone numbers
5. Fax numbers
6. Email addresses
7. SSN
8. Medical record numbers
9. Health plan beneficiary numbers
10. Account numbers
11. Certificate/license numbers
12. Vehicle identifiers
13. Device identifiers
14. URLs
15. IP addresses
16. Biometric identifiers
17. Full-face photos
18. Any other unique identifying numbers
Output Format
De-identified Text
Original identifiers replaced with semantic tags:
Audit Log JSON
{
"timestamp": "2024-01-15T10:30:00Z",
"input_hash": "sha256:abc123...",
"detections": [
{
"type": "PATIENT_NAME",
"position": [10, 18],
"confidence": 0.95,
"replacement": "[PATIENT_NAME_1]",
"original_length": 8
}
],
"statistics": {
"total_pii_found": 5,
"categories_detected": ["NAME", "DATE", "PHONE", "SSN"]
}
}Technical Architecture
1. **Preprocessing**: Normalize text encoding, handle line breaks
2. **Regex Engine**: Pattern matching for structured identifiers (SSN, phone, email, MRN)
3. **NLP Pipeline**: spaCy NER for names, organizations, locations
4. **Context Filter**: Remove false positives (e.g., "Dr. Smith" vs. "smith fracture")
5. **Replacement Engine**: Sequential replacement with semantic tokens
6. **Validation**: Ensure no original PII remains in output
Dependencies
See `references/requirements.txt` for full dependency list.
Limitations & Warnings
⚠️ **CRITICAL**: This tool is designed as a helper, not a replacement for human review.
References
Technical Difficulty: High
Complex NLP pipelines, contextual disambiguation, regulatory compliance requirements.
Risk Assessment
| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
Security Checklist
Prerequisites
# Python dependencies
pip install -r requirements.txtEvaluation Criteria
Success Metrics
Test Cases
1. **Basic Functionality**: Standard input → Expected output
2. **Edge Case**: Invalid input → Graceful error handling
3. **Performance**: Large dataset → Acceptable processing time
Lifecycle Status
- Performance optimization
- Additional feature support
More tools from the same signal band
Order food/drinks (点餐) on an Android device paired as an OpenClaw node. Uses in-app menu and cart; add goods, view cart, submit order (demo, no real payment).
Sign plugins, rotate agent credentials without losing identity, and publicly attest to plugin behavior with verifiable claims and authenticated transfers.
The philosophical layer for AI agents. Maps behavior to Spinoza's 48 affects, calculates persistence scores, and generates geometric self-reports. Give your...