When making a music theory library, one of the first questions you’ll face is: how do you represent pitches like A, Bb, or C##?

By pitch, I mean:

  • A letter name from A through G, and
  • An accidental, like a sharp, a flat, a double sharp, a triple sharp, … or a natural (no accidental)

A string? This isn’t JavaScript.

Some powerful music theory libraries, like Tonal, use strings ("Eb", "Ax"), and while that works, there’s a few problems.

First, strings are generally costly. Due to their variable size, strings need to allocated on the heap, which is usually orders of magnitude slower than allocating on the stack. Sometimes you can’t avoid this— but in this case, we definitely can! In addition to needing heap allocation, strings usually take 16/24 bytes of stack space. At this rate, the first movement of Beethoven’s “Moonlight Sonata” would take ~23Kb for notes alone! 1

assert_eq!(size_of::<String>(), 24); // both Rust and C++ use 24 bytes

Additionally, when using strings, every method must be prepared for invalid notes. For example, Interval.distance(from: string, to: string) => string from Tonal must always consider the case where either of the passed in strings are invalid.

// this works as you'd expect
Interval.distance("C4", "G4"); // => "5P"
// but if your string is invalid, you don't even get an error
Interval.distance("c flat", "G3") // => ""

Ideally, you’d want to the Pitch class/struct to make sure the note is valid. This also poses the issue of frequently needing to parse the string whenever it’s used, which isn’t exactly efficient.

A better representation

If you’re like me, the thought of using a string to model data like a musical pitch probably felt pretty uncomfortable. This was my first thought of modeling pitches.

First, half of a pitch is the letter, so let’s make an enum for that.

#[repr(u8)]
pub enum Letter {  
    C = 0,  
    D,  
    E,  
    F,  
    G,  
    A,  
    B,  
}

Next is the accidental— an enum should work for that too.

#[repr(i8)]
pub enum AccidentalSign {  
    DoubleFlat = -2,  
    Flat = -1,  
    Natural = 0,  
    Sharp = 1,  
    DoubleSharp = 2,  
}

Or does it? Music theory doesn’t forbid you from having a triple sharp; a Cb diminished seventh chord would have the notes Cb, Ebb, Bbb, Bbbb!

pub struct AccidentalSign {
    pub offset: i8, // If you need more than 127 sharps, I have some questions.
}

// Associated constants make the common accidentals easy!
impl AccidentalSign {
    pub const DOUBLE_FLAT = Self { offset: -2 };
    pub const FLAT = Self { offset: -2 };
    pub const NATURAL = Self { offset: 0 };
    pub const SHARP = Self { offset: 1 };
    pub const DOUBLE_SHARP = Self { offset: 2 };
}

Now that we have the building blocks, let’s make our pitch struct!

pub struct Pitch(pub Letter, pub AccidentalSign);

// Hey, that's pretty good!
assert_eq!(size_of::<Pitch>(), 2);

This is a much better representation of a pitch! For the small price of two bytes, you can represent any note with up to 128 flats or 127 sharps! Also, once you create a Pitch, you can use it without wondering if it’s invalid. Can you think of any issues?

Not really. I mean, technically we’re “wasting” five bits in our Letter enum. There’s only seven notes, which can be represented in 3 bits (23 = 8 ≥ 7). This leaves 5 other bits unused. If these bits could be used for the accidental, you could have 4096 flats/sharps! That being said, if your problem is that you need 4096 sharps/flats instead of 127, composing music might not be for you. :)

Let’s transpose our pitches!

Okay, now that we have our pitches, lets make some chords! Our favorite diminished seventh chord is built up from the intervals [P1, m3, d5, d7]. Let’s focus on the last interval, the diminished seventh.

If I had gave you a note, say G, how would you transpose up by a diminished seventh interval? A d7 has 9 semitones, and 9 semitones up from G is E/Fb/D##— what’s the right spelling? Well, seven letters after G is F, so the correct pitch must start with F. That means the correct pitch is Fb.

How would you write that in code? Admittedly, it’s not too terrible; pseudocode would look something like this:

1. calculate how many semitones the interval has
  a. map interval number to base semitones (1 -> 0, 2 -> 2, ... 7 -> 11)
  b. adjust the number by quality (minor -> -1, ...)
  c. for example, a d7 has 9 semitones
2. move the letter of the start by the interval number
  a. For example, G + 7 = F
3. Figure out how many semitones the start is from the new letter
  a. semitone_difference(G, F) == 10
4. Find the difference between the semitones you just found, and the semitones from the interval
  a. 9 - 10 = -1
5. Add that offset to the letter
  a. -1 -> flat, so Fb

I decided to go a different route however.

The Circle of Fifths

First, let’s briefly talk about the circle of fifths, a sequence of pitches perfect fifths away from each other.

Circle of Fifths diagram

There’s a slight misconception many circle of fifths diagrams carry. What happens at the bottom of the circle of fifths if you keep moving? This particular diagram is well made, showing that moving a perfect fifth up from F# (clockwise), lands you on C# / Db. 2 Some diagrams, however, omit C# here, since you’re unlikely to encounter the key of C# major in practice.

However, let’s look closer at that C#. What happens if you move up by another P5 (perfect fifth)? Moving clockwise lands us on Ab— or does it? Technically, it’s G#, not Ab. But before you go blaming whoever designed this diagram, realize the key of G# major has eight sharps— counting the double flat on F —and in nine times out of ten, you’d see the enharmonic equivalent, Ab.

And there’s the misconception: the circle of fifths isn’t technically a circle. You can go infinitely in any direction and keep getting distinct pitch spellings. And not only that, but you can get any combination of letter and accidental.

Wait, isn’t that what we’re looking for?

The new representation

The way my music theory library represents pitches is based on how many perfect fifths a pitch is from C. If that doesn’t make sense, don’t worry— it’s not immediately obvious. Here’s a helpful table I found that might help you visualize it: 3

Fbb Cbb Gbb Dbb Abb Ebb Bbb
-15 -14 -13 -12 -11 -10 -9
Fb Cb Gb Db Ab Eb Bb
-8 -7 -6 -5 -4 -3 -2
F C G D A E B
-1 0 1 2 3 4 5
F# C# G# D# A# E# B#
6 7 8 9 10 11 12
Fx Cx Gx Dx Ax Ex Bx
13 14 15 16 17 18 19

The number under each pitch shows how many perfect fifths it is away from C. By extending the table up and down, you get triple sharps or triple flats, and you can keep extending it indefinitely. This table makes the pattern easy to see: moving ±7 P5s adds a sharp/flat. And, using integer division and the modulo operator %, you can figure out the letter or accidental.

With this, here’s what the pitch struct looks like in my music theory library:

#[derive(Copy, Clone, Eq, PartialEq)]  
pub struct Pitch(i16);

Seriously, that’s it.

Since I don’t want to remember how many fifths away from C every pitch is, let’s add a few methods:

impl Pitch {
    pub fn from_letter_and_accidental(letter: Letter, accidental: AccidentalSign) -> Self {
        let col_offset = accidental_sign.offset;

        let pitch = letter.fifths_from_c() + 7 * col_offset;

        Self::from_fifths_from_c(pitch)
    }
    // ...
}

impl Letter {
    // ...
    pub(crate) fn fifths_from_c(self) -> i16 {
        match self {
            Self::C => 0,
            Self::D => 2,
            Self::E => 4,
            Self::F => -1,
            Self::G => 1,
            Self::A => 3,
            Self::B => 5,
        }
    }
}

Add to check what the letter & accidental is:

impl Pitch {
    // ...
    pub fn letter(self) -> Letter {
        // .rem_euclid() is used instead of the % operator
        // due to how the % operator handles negative numbers
        match (self.as_fifths_from_c() + 1).rem_euclid(7) {
            0 => Letter::F,
            1 => Letter::C,
            2 => Letter::G,
            3 => Letter::D,
            4 => Letter::A,
            5 => Letter::E,
            6 => Letter::B,
            _ => unreachable!("i16::rem_euclid(7) must be [0, 7)"),
        }
    }
    
    pub fn accidental(self) -> AccidentalSign {  
        AccidentalSign {
            // .div_euclid() is used instead of the / operator
            // due to how the / operator handles negative numbers
            offset: (self.as_fifths_from_c() + 1).div_euclid(7)  
        }  
    }
    // ...
}

Awesome! Since we’re using an i16, this new representation is the same size as the old representation. What’s more, this representation uses all the bits in the struct, allowing us to store ~4680 sharps/flats! If you still need more— why?

Earlier, I had mentioned transposition. What was that all about?

Transposition

Since our pitches are stored as a number of fifths from C, transposing up or down by a fifth is easy. If you look at the table, adding one will always move up a P5, since that’s how the table was built in the first place.

Fbb Cbb Gbb Dbb Abb Ebb Bbb
-15 -14 -13 -12 -11 -10 -9
Fb Cb Gb Db Ab Eb Bb
-8 -7 -6 -5 -4 -3 -2
F C G D A E B
-1 0 1 2 3 4 5
F# C# G# D# A# E# B#
6 7 8 9 10 11 12
Fx Cx Gx Dx Ax Ex Bx
13 14 15 16 17 18 19

What’s more, every interval can be mapped to a consistent jump across the table, no matter where you start! For example, a minor third is a jump of -3, so if you start a E# (11), adding -3 gets you to G# (8).

This also works for more complex intervals! A doubly augmented seventh is a jump of 16, so Bb (-2) + AA7 (+16) = Cx (14). Furthermore, the code for converting an interval into a jump isn’t that complicated:

impl Pitch {
    // ...
    pub fn transpose(self, interval: Interval) -> Self {  
        use IntervalQuality as Q;  
  
        let start = match interval.number().as_simple().get().abs() {  
            1 | 8 => 7,  
            2 => 9,  
            3 => 11,  
            4 => 6,  
            5 => 8,  
            6 => 10,  
            7 => 12,  
            _ => unreachable!("a simple interval can't be bigger than an octave")  
        };  
  
        let quality_offset = match interval.quality() {  
            Q::Augmented(n) => -(n.get() as i16 - 1),  
            Q::Perfect | Q::Major => 1,  
            Q::Minor => 2,  
  
            Q::Diminished(n) => match interval.number().is_perfect() {  
                true => n.get() as i16 + 1,  
                false => n.get() as i16 + 2,  
            }  
        };  
  
        let offset = start - 7 * quality_offset;  

        // `dir_offset` here is the jump
        let dir_offset = if interval.is_ascending() { offset } else { -offset };  
  
        self.transpose_fifths(dir_offset)  
    }
    // ...
}

That’s all!

Arguably, the code for transposing an interval using the normal representation, ^ isn’t much more complicated than this. And in practice, the difference between a max of 4680 sharps verses of 127 is unlikely to make or break any project. So why did I choose to represent pitches in this way?

Why not?


  1. according to music21, the first movement of Beethoven’s Moonlight Sonata has 958 notes. 24 bytes * 958 notes = 22992b ≈ 23Kb↩︎

  2. https://wikipedia.org/wiki/Circle_of_fifths ↩︎

  3. https://www.reddit.com/r/rust/comments/eu5fop/comment/ffnyh80/ ↩︎