Using a piano keyboard as a computer keyboard

keyboard midi

Alex Gordon · May 8, 2011 · Viewed 81k times · Source

I have RSI problems and have tried 30 different computer keyboards which all caused me pain. Playing piano does not cause me pain. I have played piano for around 20 years without any pain issues. I would like to know if there is a way to capture MIDI from a MIDI keyboard and output keyboard strokes. I know nothing at all about MIDI but I would like some guidance on how to convert this signal into a keystroke.

Answer

I haven't done any MIDI programming in years, but your fundamental idea is very sound (no pun).

MIDI is a stream of "events" (or "messages"), two of the most fundamental being "note on" and "note off" which carry with them the note number (0 = C five octaves below middle C, through 127 = G five octaves above the G above middle C, in semi-tones). These events carry a "velocity" number on keyboards that are velocity sensitive ("touch sensitive"), with a force of (you guessed it) between 0 and 127.

Between velocity, chording, and the pedals, I'd think you could come up with quite a good "typing" interface for the piano keyboard. Chording in particular could be a very powerful technique — as I mentioned in the comments, it's why rank-and-file stenographers can use a stenotype machine to keep up with people talking for hours in a row, when even top-flight typists wouldn't be able to for any length of time via normal typewriter-style keyboards. As with machine stenography, you'd need a "dictionary" of the meanings of chords and sequences of chords. (Can you tell I used to work in the software side of machine stenography?)

To do this, the fundamental pieces are:

Receiving MIDI input. Don't try to do this yourself, use a library. Edit: Apparently, the Java Sound API supports MIDI, including receiving events from MIDI controllers. Cool. This page may also be useful.
Converting that data into the keystrokes you want to send, e.g. via the dictionary I mentioned above.
Outputting the keystrokes to the computer.

To be most broadly-compatible with software, you'd have to write this as a keyboard device driver. This is a plug-in to the operating system that serves as a source for keyboard events, talking to the underlying hardware (in your case, the piano keyboard). For Windows and Linux, you're probably going to want to use C for that.

However, since you're just generating keystrokes (not trying to intercept them, which I was trying to do years ago), you may be able to use whatever features the operating system has for sending artificial keystrokes. Windows has an interface for doing that (probably several, the one I'm thinking of is SendInput but I know there's some "journal" interface that does something similar), and I'm sure other operating systems do as well. That may well be sufficient for your purposes — it's where I'd start, because the device driver route is going to be awkward and you'd probably have to use a different language for it than Java. (I'm a big fan of Java, but the interfaces that operating systems use to talk to device drivers tend to be more easily consumed via C and similar.)

Update: More about the "dictionary" of chords to keystrokes:

Basically, the dictionary is a trie (thanks, @Adam) that we search with longest-prefix matching. Details:

In machine stenography, the stenographer writes by pressing multiple keys on the stenotype machine at the same time, then releasing them all. They call this a "stroke" of the keyboard; it's like playing a chord on the piano. Strokes frequently (but not always) correspond to a syllable of spoken language. Like syllables, sometimes one stroke (chord) has meaning all on its own, other times it only has meaning combined with following strokes. (Think "good" vs. "good" followed by "bye"). Although they'll be heavily influenced by the school at which they studied, each stenographer will have their own "dictionary" of what strokes they use to mean what, a dictionary they will continuously hone over the course of their working lives. The dictionary will have entries where the stenographic part ("steno", for short) is one stroke long, or multiple strokes long. Frequently, there will be several entries with the same starting stroke which are differentiated by their length and by the subsequent strokes. For instance (and I won't use real steno here, just placeholders), there may be these entries:

A     = alpha
A/B   = alphabet
A/B/C = alphabetic
A/C   = air conditioning
B     = bee
B/C   = because
C     = sea
D     = dog
D/D   = Dee Dee

(Those letters aren't meant to be musical notes, just abstract markers.)

Note that A starts multiple entries, and also note that how you translate a C stroke depends on whether you've previously seen an A, a B, or you're starting fresh.

Also note that (although not shown in the very small sample above), there may be multiple ways to "play" the same word or phrase, rather than just one. Stenographers do that to make it easier to flow from a preceding word to the next depending on hand position. There's an obvious analogy to music there, and you could use that to make your typing flow more akin to playing music, in order to both prevent this from negatively affecting your piano playing and to maximize the likelihood of this actually helping with the RSI.

When translating steno into standard text, again we use a "longest-prefix match" search: The translation algorithm starts with the first stroke ever written, and looks for entries starting with that stroke. If there is only one entry, and it's one stroke long, then we can reliably say "that's the entry to use", output the corresponding text, and then start fresh with the next stroke. But more likely, that stroke starts multiple entries of varying lengths. So we look at the next stroke and see if there are entries that start with those two strokes in order; and so on until we get a match.

So with the dictionary above, suppose we saw this sequence:

A C B B C A B C A B D

Here's how we'd translate it:

A is the start of three entries of varying lengths; look at next stroke: C
A/C matches only one entry; output "air conditioning" and start fresh with next stroke: B
B starts two entries; look at next stroke: B
B/B doesn't start anything; take the longest previous match (B) and output that ("bee")
Having output B = "bee", we still have a B stroke in our buffer. It starts two entries, so look at the next stroke: C
B/C matches one entry; output "because" and start fresh with the next stroke: A
A starts three entries; look at the next stroke: B
A/B starts two entries; look at the next stroke: C
A/B/C only matches one entry; output "alphabetic" and start fresh with the next stroke: A
A starts three entries; look at next stroke: B
A/B starts two entries; look at next stroke: D
A/B/D doesn't match anything, so take the longest previous match (A/B) and use it to output "alphabet". That leaves us with D still in the buffer.
D starts two entries, so we would normally look at the next stroke — but we've processed all the strokes, so consider it in isolation. In isolation, it translates as "dog" so output that.

Aspects of the above to note:

You have a buffer of strokes you've read but haven't translated yet.
You always want to match the most strokes against a single entry that you can. A/B should be translated as "alphabet", not "alpha" and "bee".
(Not shown above) You may well have sequences of strokes that you can't translate, because they don't match anything in the dictionary. (Steno people use the noun "untranslate" -- e.g., with our dictionary, the strokes E would be an "untranslate".)
(Not shown above) Some theories of steno allow the same set of strokes to mean more than one thing, based on a broader context. Steno people call these "conflicts". You probably want to disallow them in your project, and in fact when steno used to be translated manually by the stenographer, conflicts were fine because they'd know just by where in the sentence they were what the right choice was, but with the rise of machine translation, conflict-free theories of steno arose specifically to avoid having to go through the resulting translated text and "fix" conflicts.
Translating in real time (which you'd be doing) means that if you receive a partial match, you'll want to hold onto it while waiting for the next chord — but probably only up to a timeout, at which point you'd translate what you have in the buffer as best you can. (Or maybe you don't want a timeout; it's your call.)
Probably best to have a stroke that says "disregard the previous stroke"
Probably best to have a stroke that says "completely clear the buffer without outputting anything"

Using a piano keyboard as a computer keyboard

Answer

Related questions