// Skill profile

🎵 Voice Note to MIDI

name: voice-note-to-midi

by danbennettuk · published 2026-03-22

ๅผ€ๅ‘ๅทฅๅ…ทAPI้›†ๆˆ
Last updated
2026-03
// Install command
$ claw add gh:danbennettuk/danbennettuk-voice-note-to-midi
// Full documentation

---

name: voice-note-to-midi

description: Convert voice notes, humming, and melodic audio recordings to quantized MIDI files using ML-based pitch detection and intelligent post-processing

author: Clawd

tags: [audio, midi, music, transcription, machine-learning]

---

# 🎵 Voice Note to MIDI

Transform your voice memos, humming, and melodic recordings into clean, quantized MIDI files ready for your DAW.

## What It Does

This skill provides a complete audio-to-MIDI conversion pipeline built from the following stages:

1. **Stem Separation** - Uses HPSS (Harmonic-Percussive Source Separation) to isolate melodic content from drums, noise, and background sounds

2. **ML-Powered Pitch Detection** - Leverages Spotify's Basic Pitch model for accurate fundamental frequency extraction

3. **Key Detection** - Automatically detects the musical key of your recording using Krumhansl-Kessler key profiles

4. **Intelligent Quantization** - Snaps notes to a configurable timing grid with optional key-aware pitch correction

5. **Post-Processing** - Applies octave pruning, overlap-based harmonic removal, and legato note merging for clean output
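
As a rough illustration of the key detection stage, the sketch below correlates a duration-weighted pitch-class histogram against the 24 rotated Krumhansl-Kessler profiles and returns the best match. It is not the code `hum2midi` ships: the function names (`pearson`, `detect_key`) and the `(midi_pitch, duration)` input shape are assumptions made for this example.

```python
from math import sqrt

# Krumhansl-Kessler key profiles, index 0 = tonic.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def detect_key(notes):
    """Return e.g. 'G major' for an iterable of (midi_pitch, duration) pairs."""
    hist = [0.0] * 12
    for pitch, duration in notes:
        hist[pitch % 12] += duration

    best_score, best_name = None, None
    for tonic in range(12):
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            # Rotate the profile so that its tonic weight lands on `tonic`.
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            score = pearson(hist, rotated)
            if best_score is None or score > best_score:
                best_score, best_name = score, f"{NAMES[tonic]} {mode}"
    return best_name


# Hypothetical hummed phrase: (MIDI pitch, duration in beats).
melody = [(67, 2.0), (71, 1.0), (74, 1.0), (72, 0.5), (69, 0.5), (67, 2.0)]
print(detect_key(melody))
```

Weighting the histogram by duration rather than note count is one common choice; short passing tones then have less influence on the result.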

## Pipeline Architecture

Audio Input (WAV/M4A/MP3)
    ↓
┌─────────────────────────────────────┐
│ Step 1: Stem Separation (HPSS)      │
│ - Isolate harmonic content          │
│ - Remove drums/percussion           │
│ - Noise gating                      │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ Step 2: Pitch Detection             │
│ - Basic Pitch ML model (Spotify)    │
│ - Polyphonic note detection         │
│ - Onset/offset estimation           │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ Step 3: Analysis                    │
│ - Pitch class distribution          │
│ - Key detection                     │
│ - Dominant note identification      │
└─────────────────────────────────────┘
    ↓
┌─────────────────────────────────────┐
│ Step 4: Quantization & Cleanup      │
│ - Timing grid snap                  │
│ - Key-aware pitch correction        │
│ - Octave pruning (harmonic removal) │
│ - Overlap-based pruning             │
│ - Note merging (legato)             │
│ - Velocity normalization            │
└─────────────────────────────────────┘
    ↓
MIDI Output (Standard MIDI File)
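
Step 1 relies on HPSS, for which the References section points to librosa. To show the idea behind it, here is a toy pure-Python sketch of median-filter separation on a tiny magnitude spectrogram: sustained tones form horizontal ridges (smooth over time), clicks and drum hits form vertical ridges (smooth over frequency), and each bin goes to whichever median is larger. The name `hpss_binary`, the `width` parameter and the hard binary mask are assumptions for illustration. The library version works on real STFTs and adds refinements such as soft masking.

```python
from statistics import median


def hpss_binary(spec, width=3):
    """Split spec[freq][time] into (harmonic, percussive) with a hard mask.

    A bin is kept as harmonic when its median along time is at least as
    large as its median along frequency; otherwise it is percussive.
    """
    n_freq, n_time = len(spec), len(spec[0])
    half = width // 2
    harmonic = [[0.0] * n_time for _ in range(n_freq)]
    percussive = [[0.0] * n_time for _ in range(n_freq)]

    for f in range(n_freq):
        for t in range(n_time):
            # Median over neighbouring frames: large on sustained tones.
            along_time = median(spec[f][max(0, t - half): t + half + 1])
            # Median over neighbouring bins: large on broadband clicks.
            along_freq = median(
                spec[g][t]
                for g in range(max(0, f - half), min(n_freq, f + half + 1))
            )
            if along_time >= along_freq:
                harmonic[f][t] = spec[f][t]
            else:
                percussive[f][t] = spec[f][t]
    return harmonic, percussive


# Toy input: a sustained tone in bin 2 plus a click at frame 2.
toy = [[1.0 if (f == 2 or t == 2) else 0.0 for t in range(5)] for f in range(5)]
tone, click = hpss_binary(toy)
```

In this toy case the whole of row 2 ends up in `tone`, while the rest of column 2 ends up in `click`.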

## Setup

### Prerequisites

  • Python 3.11+ (Python 3.14+ recommended)
  • FFmpeg (for audio format support)
  • pip

### Installation

**Quick Install (Recommended):**

    cd /path/to/voice-note-to-midi
    ./setup.sh

This automated script will:

  • Check that Python 3.11+ is installed
  • Create the `~/melody-pipeline` directory
  • Set up the virtual environment
  • Install all dependencies (basic-pitch, librosa, music21, etc.)
  • Download and configure the hum2midi script
  • Add melody-pipeline to your PATH

**Manual Install:**

If you prefer manual setup:

    mkdir -p ~/melody-pipeline
    cd ~/melody-pipeline
    python3 -m venv venv-bp
    source venv-bp/bin/activate
    pip install basic-pitch librosa soundfile mido music21
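    chmod +x ~/melody-pipeline/hum2midi

The `chmod` step assumes the `hum2midi` script is already present in `~/melody-pipeline`; the commands above do not fetch it.

**Add to your PATH (optional):**

    echo 'export PATH="$HOME/melody-pipeline:$PATH"' >> ~/.bashrc
    source ~/.bashrc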

### Verify Installation

    cd ~/melody-pipeline
    ./hum2midi --help

## Usage

### Basic Usage

Convert a voice memo to MIDI:

    ./hum2midi my_humming.wav

This creates `my_humming.mid` with 16th-note quantization.

### Specify Output File

    ./hum2midi input.wav output.mid

### Command-Line Options

| Option | Description | Default |
|--------|-------------|---------|
| `--grid <value>` | Quantization grid: `1/4`, `1/8`, `1/16`, `1/32` | `1/16` |
| `--min-note <ms>` | Minimum note duration in milliseconds | `50` |
| `--no-quantize` | Skip quantization (output raw Basic Pitch MIDI) | disabled |
| `--key-aware` | Enable key-aware pitch correction | disabled |
| `--no-analysis` | Skip pitch analysis and key detection | disabled |
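
To make the `--grid` values concrete, the sketch below turns a grid string into a step size in MIDI ticks and snaps note boundaries to it. It assumes a resolution of 960 ticks per quarter note, which would be consistent with the `Grid: 240 ticks (1/16)` line in the sample output but is not confirmed by the script itself. The helper names (`grid_ticks`, `snap`, `quantize_note`) and the tie-breaking rule are hypothetical.

```python
from fractions import Fraction


def grid_ticks(grid: str, ppq: int = 960) -> int:
    """Ticks per grid step for a value such as '1/16' (a fraction of a whole note)."""
    step = Fraction(grid) * 4 * ppq  # a whole note spans four quarter notes
    if step <= 0 or step.denominator != 1:
        raise ValueError(f"grid {grid!r} does not divide evenly at {ppq} PPQ")
    return int(step)


def snap(tick: int, step: int) -> int:
    """Round a tick position to the nearest grid line (ties go to the later line)."""
    return ((tick + step // 2) // step) * step


def quantize_note(start: int, end: int, step: int) -> tuple[int, int]:
    """Snap both ends of a note, keeping it at least one grid step long."""
    q_start = snap(start, step)
    q_end = max(snap(end, step), q_start + step)
    return q_start, q_end


step = grid_ticks("1/16")             # 240 ticks at 960 PPQ
print(quantize_note(250, 300, step))  # (240, 480)
```

Forcing a minimum length of one grid step is a design choice that keeps very short notes from collapsing to zero duration after snapping.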

### Usage Examples

#### Quantize to eighth notes

    ./hum2midi melody.wav --grid 1/8

#### Key-aware quantization (recommended for tonal music)

    ./hum2midi song.wav --key-aware
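
One plausible reading of "key-aware pitch correction" is that any note outside the detected scale is moved to the nearest scale tone. The sketch below does exactly that for major and natural minor scales. The function name `snap_to_scale` and the rule that ties resolve downward are assumptions, not documented behaviour of `hum2midi`.

```python
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)
MINOR_STEPS = (0, 2, 3, 5, 7, 8, 10)  # natural minor


def snap_to_scale(pitch: int, tonic_pc: int, mode: str = "major") -> int:
    """Move a MIDI pitch to the nearest tone of the given scale.

    tonic_pc is the tonic's pitch class (0 = C, 7 = G, ...). In these
    scales every chromatic note lies one semitone from a scale tone, so
    checking the pitch itself, then -1, then +1 is sufficient.
    """
    steps = MAJOR_STEPS if mode == "major" else MINOR_STEPS
    scale = {(tonic_pc + s) % 12 for s in steps}
    for offset in (0, -1, 1):
        if (pitch + offset) % 12 in scale:
            return pitch + offset
    return pitch  # unreachable for the two scales above


# F4 (65) is outside G major and resolves down to E4 (64).
print(snap_to_scale(65, tonic_pc=7))  # 64
```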

#### Require longer minimum notes

    ./hum2midi humming.wav --min-note 100

#### Skip analysis for faster processing

    ./hum2midi quick.wav --no-analysis

#### Combine options

    ./hum2midi recording.wav output.mid --grid 1/8 --key-aware --min-note 80

### Processing MIDI Input

You can also process existing MIDI files through the quantization pipeline:

    ./hum2midi input.mid output.mid --grid 1/16 --key-aware

This skips the audio processing steps and goes directly to analysis and quantization.

### Sample Output

    โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
      hum2midi - Melody-to-MIDI Pipeline (Basic Pitch Edition)
      [Key-Aware Mode Enabled]
    โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
    
    Input:  my_humming.wav
    Output: my_humming.mid
    
    โ†’ Step 1: Stem Separation (HPSS)
      Isolating melodic content...
      Loaded: 5.23s @ 44100Hz
      โœ“ Melody stem extracted โ†’ 5.23s
    
    โ†’ Step 2: Audio-to-MIDI Conversion (Basic Pitch)
      Running Spotify's Basic Pitch ML model on melody stem...
      โœ“ Raw MIDI generated (Basic Pitch)
    
    โ†’ Step 3: Pitch Analysis & Key Detection
      Notes detected: 42 total, 7 unique
      Note range: C3 - G4
      Pitch classes: C3, E3, G3, A3, C4, D4, G4
      Dominant note: G3 (23.8% of notes)
      Detected key: G major
    
    โ†’ Step 4: Quantization & Cleanup
      Octave pruning: removed 3 harmonic notes above 67 (median+12)
      Overlap pruning: removed 2 harmonic notes at overlapping positions
      Note merging: merged 5 staccato chunks into legato notes (gap<=60 ticks)
      Grid:   240 ticks (1/16)
      Notes:  38 notes
      Key:    G major
      Key-aware: 2 notes corrected to scale
      Tempo:  120 BPM
      โœ“ Quantized MIDI saved
    
    โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
      โœ“ Done! Output: my_humming.mid
    โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
    
    ๐Ÿ“Š ANALYSIS SUMMARY
    โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
      Detected Notes: C3, E3, G3, A3, C4, D4, G4
      Detected Key:   G major
      Quantization:   Key-aware mode (notes snapped to scale)
    
    MIDI Info: 38 notes, 7 unique pitches, 120 BPM
    Pitches: C3, E3, G3, A3, C4, D4, G4
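
The Step 4 log lines hint at two cleanup rules: drop notes more than an octave above the median pitch (`median+12`), and merge same-pitch fragments separated by a small gap (`gap<=60 ticks`). The sketch below implements those two rules as read from the log. The function names, the `(start, end, pitch)` tuple layout and the assumption of a mostly monophonic note list are illustrative, not taken from the script.

```python
from statistics import median


def prune_octaves(notes):
    """Drop notes whose pitch is above median pitch + 12 semitones.

    notes is a list of (start_tick, end_tick, midi_pitch) tuples.
    Returns (kept_notes, limit); limit is None for an empty list.
    """
    if not notes:
        return [], None
    limit = median(pitch for _, _, pitch in notes) + 12
    return [n for n in notes if n[2] <= limit], limit


def merge_legato(notes, max_gap=60):
    """Merge consecutive same-pitch notes separated by at most max_gap ticks."""
    merged = []
    for start, end, pitch in sorted(notes):
        if merged and merged[-1][2] == pitch and start - merged[-1][1] <= max_gap:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), pitch)
        else:
            merged.append((start, end, pitch))
    return merged


raw = [(0, 200, 55), (240, 440, 55), (480, 900, 57), (960, 1200, 55), (1200, 1400, 79)]
kept, limit = prune_octaves(raw)   # median is 55, so 79 (> 67) is dropped
print(merge_legato(kept))          # [(0, 440, 55), (480, 900, 57), (960, 1200, 55)]
```

Because `merge_legato` only compares each note with the one immediately before it, it suits single-line melodies; overlapping voices would need per-pitch tracking.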

## Notes & Limitations

### Audio Quality Matters

  • **Clear, loud melody** produces the best results
  • **Background noise** can cause false note detection
  • **Reverb and effects** may confuse pitch detection
  • **Close-mic'd vocals** work significantly better than room recordings

### Musical Considerations

  • **Monophonic sources** work best (single melody line)
  • **Polyphonic audio** (chords, multiple instruments) will produce messy results
  • **Vibrato and pitch bends** may be quantized to stepped pitches
  • **Rapid note passages** may be missed or merged

### Technical Limitations

  • **Tempo is fixed** at 120 BPM in output (time positions are preserved, but tempo may need adjustment in your DAW)
  • **Note velocities** are normalized but may need manual adjustment
  • **Very short notes** (<50ms) may be filtered out by default
  • **Extreme pitch ranges** may cause octave detection issues
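
The fixed 120 BPM tempo matters mainly when you retime the file. The sketch below shows the arithmetic: how seconds map to ticks at a given tempo, and how a tick position would have to be rescaled to keep the same wall-clock time under a different tempo. The 960 ticks-per-quarter resolution is an assumption inferred from the sample output, and the helper names are hypothetical.

```python
def seconds_to_ticks(seconds: float, bpm: float = 120.0, ppq: int = 960) -> int:
    """Convert a time in seconds to MIDI ticks at a constant tempo."""
    return round(seconds * bpm / 60.0 * ppq)


def ticks_to_seconds(ticks: int, bpm: float = 120.0, ppq: int = 960) -> float:
    """Convert MIDI ticks back to seconds at a constant tempo."""
    return ticks * 60.0 / (bpm * ppq)


def retime_ticks(ticks: int, file_bpm: float, true_bpm: float) -> int:
    """Rescale a tick position so it keeps its wall-clock time at true_bpm."""
    return round(ticks * true_bpm / file_bpm)


# A note that starts 0.5 s into the recording:
print(seconds_to_ticks(0.5))        # 960 ticks at 120 BPM
print(retime_ticks(960, 120, 90))   # 720 ticks keeps it at 0.5 s under 90 BPM
```

Under the same assumptions, the default `--min-note 50` threshold corresponds to 96 ticks.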

### Post-Processing Recommendations

After generating MIDI, you may want to:

1. **Import into your DAW** and adjust tempo to match your original recording

2. **Quantize further** if stricter timing is needed

3. **Adjust note velocities** for dynamics

4. **Apply swing/groove** templates if the rigid grid sounds too mechanical

5. **Edit individual notes** that were misdetected (common with fast runs)

## Supported Audio Formats

Input formats supported via FFmpeg:

  • WAV, AIFF, FLAC (lossless, best quality)
  • MP3, M4A, AAC (lossy, acceptable)
  • OGG, OPUS (open formats)
  • Most other formats FFmpeg supports

## Troubleshooting

### No notes detected

  • Check that the input file isn't silent or corrupted
  • Try lowering the `--min-note` threshold so short notes are not filtered out
  • Verify the audio has clear melodic content (not just noise)

### Too many notes / messy output

  • Octave pruning and overlap pruning are already on by default, so no flag is needed for them
  • Use `--key-aware` to constrain notes to the detected scale
  • Check for background noise in the source audio

### Wrong key detected

  • Key detection works best with at least 8-10 measures of music
  • Chromatic passages may confuse the detector
  • Manually review and adjust in your DAW if needed

### Notes in wrong octave

  • Basic Pitch sometimes detects harmonics instead of fundamentals
  • The pipeline includes pruning, but some harmonics may slip through
  • Use your DAW's transpose function for simple octave shifts

## References

  • [Basic Pitch](https://github.com/spotify/basic-pitch) - Spotify's polyphonic pitch detection model
  • [librosa HPSS](https://librosa.org/doc/latest/generated/librosa.decompose.hpss.html) - Harmonic-Percussive Source Separation
  • [Krumhansl-Kessler Key Profiles](https://rnhart.net/articles/key-finding/) - Key detection algorithm

## License

This skill integrates Basic Pitch by Spotify, which is licensed under Apache 2.0. The pipeline script and documentation are provided under the MIT license.
