Monday, August 10, 2015

Unicode input with X11

Unicode!

Whether you like it or not, Unicode is here. Modern programming languages allow it to be used in variable names and for operator symbols, older ones are adding it as extensions, and inputting it is getting easier all the time.
And frankly, I think most of us would rather read x ≠ 23 instead of x /= 23 or x != 23 or even x =/= 23 . Or how about x ∈ A instead of element(x, A)?
So here's how I set up my X11 keyboard to allow me to input the more popular programming symbols - at least for Haskell - directly from the keyboard, without having to use some editor-specific magic.

Dead keys warning

Those of you already using non-ASCII characters - for instance, the various accented Latin characters popular in Europe - may be using dead keys to get to those characters. The changes below may either make the dead keys stop working, or the dead keys could prevent the changes from working, depending on your keymap.
If you aren't sure, you can check for them by running the command xmodmap -pke | grep dead. If you get output, you're using dead keys. You'll want to save that for later!
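If you'd like something easier to save than raw grep output, here is a minimal Python sketch that summarizes which keycodes carry dead keys. The function name find_dead_keys is my own invention, and the sketch assumes the usual "keycode N = keysym ..." line layout that xmodmap -pke prints.

```python
def find_dead_keys(xmodmap_pke_output):
    """Map keycode -> list of dead_* keysyms found on that key."""
    found = {}
    for line in xmodmap_pke_output.splitlines():
        # Assumed layout: "keycode  49 = grave asciitilde grave asciitilde"
        left, sep, right = line.partition("=")
        fields = left.split()
        if not sep or len(fields) != 2 or fields[0] != "keycode":
            continue
        dead = [sym for sym in right.split() if sym.startswith("dead_")]
        if dead:
            found[int(fields[1])] = dead
    return found
```

Feed it the text printed by xmodmap -pke (for example, captured with subprocess.run) and keep the result around for the dead key fix-ups later on.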
But onwards...

Setup

For any of this to work, your environment and applications must be set up for 8-bit input and Unicode output.
First, you need to be using a locale ending in UTF-8 or similar to get Unicode output. Normally, you can fix it by just setting the LANG environment variable. The format is a two-letter language code in lower case, an underscore, a two-letter country code in upper case, a period, and then the character encoding. So I use en_US.UTF-8.
The command locale will tell you your current settings. If it doesn't use a Unicode character map like UTF-8, use locale -a to get a list of locales you can use.
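The naming rule above can be sketched as a small Python check. This is only an illustration: the name is_utf8_locale is hypothetical, and I'm assuming that the utf8 spelling (which locale -a prints on many systems) should count as UTF-8 too.

```python
import re

# language_COUNTRY.codeset, e.g. en_US.UTF-8; an optional @modifier may follow.
LOCALE_RE = re.compile(r"^([a-z]{2,3})_([A-Z]{2})\.([A-Za-z0-9_-]+)(@\w+)?$")

def is_utf8_locale(name):
    """Return True if name looks like a UTF-8 locale such as en_US.UTF-8."""
    match = LOCALE_RE.match(name)
    if match is None:
        return False
    # Treat UTF-8, utf8 and similar spellings of the codeset as the same thing.
    codeset = match.group(3).lower().replace("-", "").replace("_", "")
    return codeset == "utf8"
```

Calling it on the value of the LANG environment variable tells you whether that setting already names a UTF-8 locale.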
If you want this to work in a terminal emulator or text editor, you should check the documentation to make sure they are set up properly for 8-bit input and Unicode output. Modern applications should adapt properly once you've set the environment variable.
You'll also need a font that has the appropriate glyphs. There are a number of good choices available. Personally, I use the DejaVu font family.
Get all that installed, restart your editor or terminal emulator or IDE, and set the font to the one you've chosen for this.

Mode switch

You can configure a Mode switch key as another shift key, allowing you to enter two (or more) additional characters for each key: one with and one without the shift key also being held down.
It's normal for the AltGraph key to already be assigned to this, if you have one. The xkeycaps tool can help you find out whether you have a key assigned to it. Start it, select your keyboard type, and select Ok. Then hover over the various shift/window/bucky keys, and the KeySym line will display Mode_switch for the key you want to use.
If not, I'll tell you how to add one. If so, you can use it as is, or follow the instructions below to change it.

Assign Mode_switch

If you couldn't find the Mode_switch key, or want to use one other than the one you found, you need to create an xmodmap file to change things.

Find the keycode

First, we need to find the keycode for the key you want to use for this shift key. Keycodes aren't normally a good thing to use in an xmodmap file, because they are very specific to your keyboard and system. However, given that you may be picking some non-standard key to start with, the alternative - a keysym - isn't likely to be any more portable, so we might as well do it the simpler way.
Again, you can use the xkeycaps tool to find the key. Start it, and then press the key you're interested in. It should highlight on the keyboard image in xkeycaps, which will then display the keycode and keysym info you need.
Alternatively, the xev command can be used. Run xev -event keyboard, and a new window will open. Press the key, and you'll get a couple of blocks of text, one for the KeyPress event and one for the KeyRelease event, including the keycode and keysym values. This happens for the various shift keys as well.
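If the xev output scrolls by too fast to read, a short Python sketch like the following can pull out just the keycode and keysym name. The name keys_seen is hypothetical, and the regular expression assumes the "keycode N (keysym 0x..., Name)" wording that xev uses in its event blocks.

```python
import re

# Matches the part of an xev event block that looks like:
#   state 0x0, keycode 64 (keysym 0xffe9, Alt_L), same_screen YES,
KEY_RE = re.compile(r"keycode (\d+) \(keysym (0x[0-9a-fA-F]+), (\w+)\)")

def keys_seen(xev_output):
    """Return the distinct (keycode, keysym name) pairs in xev output, in order."""
    seen = []
    for code, _hex_value, name in KEY_RE.findall(xev_output):
        pair = (int(code), name)
        if pair not in seen:
            seen.append(pair)
    return seen
```

Since the KeyPress and KeyRelease blocks report the same key, each key you press shows up once in the result.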

Assign and activate it

Now that we've found the keycode for the key we want to use, we need to tell X11 about it. The xmodmap command will do that. If you already use an xmodmap file, just add the lines below to it, probably near the top. If not, the normal name is .xmodmap.
First, set up the key to send the Mode_switch keysym. I chose keycode 64, so I used:
keycode 64 = Mode_switch Mode_switch
To activate it, you need to assign a modifier to it. Run the command xmodmap -pm and find a line that has no keys assigned. For me, that was the one starting mod5. Any of the lines starting mod should be usable except for mod1, which some applications assume is the Alt key. So now add the lines (again, I chose mod5; if you chose something different modify the lines):
clear mod5
add mod5 = Mode_switch
The clear mod5 probably isn't needed, but I'm being paranoid about things.
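Picking the unused modifier can also be done mechanically. Here is a minimal Python sketch, assuming the xmodmap -pm output has one row per modifier with the modifier name first and an empty remainder when nothing is assigned. The name free_modifiers is my own, and mod1 is skipped for the reason given above.

```python
# mod1 is left out on purpose: some applications assume it is the Alt key.
USABLE = ("mod2", "mod3", "mod4", "mod5")

def free_modifiers(xmodmap_pm_output):
    """Return the usable modifier rows that have no keys assigned."""
    free = []
    for line in xmodmap_pm_output.splitlines():
        fields = line.split()
        # An unused row consists of the modifier name and nothing else.
        if len(fields) == 1 and fields[0] in USABLE:
            free.append(fields[0])
    return free
```

Any name it returns can stand in for mod5 in the clear and add lines.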
While we're at it, let's turn on a couple keys with extra symbols:
! Contains as a member (∋) and superset of (⊃)
keysym period = period greater U220B U2283

! element of (∈) and subset of (⊂)
keysym comma = comma less U2208 U2282
As the comments say, this makes the ,< and .> keys also send ∈⊂ and ∋⊃. The format for these lines is the keysym command, then the keysym currently on the key, then an = followed by four keysyms. A "keysym" like U2208 is a U followed by the hexadecimal code point of the Unicode character we want. The first keysym is the unmodified key, then the shifted key, then the mode key, and finally the shifted mode key.
So once this is added to your .xmodmap file, just run xmodmap .xmodmap (adjusting if you used a different file) to install it. You should now be able to enter the '∈', '∋', '⊂' and '⊃' characters by holding down your Mode key and typing ',', '.', '<', and '>', in order.
If that doesn't work, start up xkeycaps again, and check that the key you want for Mode_switch has that symbol assigned, and that the < and > keys have all four characters assigned, as that's the most likely source of problems.
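Looking up code points by hand gets tedious, so here is a small Python sketch that builds a keysym line in the four-column format described above. The helper names unicode_keysym and keysym_line are made up for this example; the only real rule it relies on is that a U followed by the hexadecimal code point names a Unicode keysym.

```python
def unicode_keysym(char):
    """Return the U<hex> keysym name for a single character, e.g. U2208."""
    return "U{:04X}".format(ord(char))

def keysym_line(key, shifted, mode_char, mode_shift_char):
    """Build: keysym KEY = plain shifted mode mode+shift."""
    return "keysym {0} = {0} {1} {2} {3}".format(
        key,
        shifted,
        unicode_keysym(mode_char),
        unicode_keysym(mode_shift_char),
    )
```

Called with the comma and less keysyms and the two set-membership characters, it reproduces the comma line shown earlier.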

And now lots of characters

Ok, now that we can enter Unicode characters, let's add some!
Here's a list of what I'm using. The only thing I'm not particularly happy with is the ∩. Using mode-u for ∪ makes sense because it's both the first character of union and looks like the symbol, but neither mode-shift-u nor mode-i for intersection seems right.
! not sign (¬)
keysym grave = grave asciitilde U00AC U00AC

! division sign (÷)
keysym slash = slash question U00F7 U00F7

! dot operator (⋅) & ring operator (∘)
keysym 8 = 8 asterisk U22C5 U2218

! Contains as a member (∋) and superset of (⊃)
keysym period = period greater U220B U2283

! element of (∈) and subset of (⊂)
keysym comma = comma less U2208 U2282

! increment (∆)
keysym minus = minus underscore U2206 U2206

! Greek alpha (α) (Α)
keysym a = a A U03B1 U0391

! Greek beta (β) (Β)
keysym b = b B U03B2 U0392

! Greek delta (δ) (Δ)
keysym d = d D U03B4 U0394

! Greek epsilon (ε) (Ε)
keysym e = e E U03B5 U0395

! Greek gamma (γ) (Γ)
keysym g = g G U03B3 U0393

! An alternative for intersection (∩), Greek small letter iota (ι)
keysym i = i I U2229 U03B9

! Greek lambda (λ) (Λ)
keysym l = l L U03BB U039B

! micro symbol (µ), Greek capital mu (Μ)
keysym m = m M U00B5 U039C

! Greek omega (ω) (Ω)
keysym o = o O U03C9 U03A9

! pi (π) and n-ary product operator (∏)
keysym p = p P U03C0 U220F

! Rationals (ℚ)
keysym q = q Q U211A U211A

! Greek rho (ρ) & reals (ℝ)
keysym r = r R U03C1 U211D

! Greek sigma (σ) and n-ary sum operator (∑)
keysym s = s S U03C3 U2211

! Nil/undefined/bottom (⊥), Greek capital tau (Τ)
keysym t = t T U22A5 U03A4

! Union (∪) & intersection (∩) (latter needs to be better)
keysym u = u U U222A U2229

! Square root symbol (√)
keysym v = v V U221A U221A

! Greek small letter zeta (ζ), integers (ℤ)
keysym z = z Z U03B6 U2124

! Thumbs up and down, and a couple of surprises
keysym Up = Up NoSymbol U1F44D NoSymbol
keysym Down = Down NoSymbol U1F44E NoSymbol
keysym Left = Left NoSymbol U1F595 NoSymbol
keysym Right = Right NoSymbol U1F594 NoSymbol

Fix any dead keys

If you use dead keys and install this as is, either the dead keys will stop working, or the mode switch won't work for those keys. If an assignment here takes effect, the key is no longer a dead key, so the dead key stops working. If the keysym I named isn't in your keymap, then the assignment didn't happen, so the mode switch won't work for that key.
In either case, you can fix this by changing the ASCII symbol name to the appropriate dead key name in the assignment. If the dead keys quit working, just change the names on the right side of the = sign. If the mode key isn't working, change the names on both sides.
The only dead keys I think might cause problems - given the keysyms I used above - are dead_tilde, which should replace asciitilde, dead_grave, which should replace grave, and possibly dead_cedilla, which should replace comma.

Run it automatically

Now, you need to set things up to run xmodmap when you start X. Given that every distribution seems to set things up differently, I'm not going to cover that. If you have a way to run programs when you start X, that's probably the place to put it. But first, try simply creating .xmodmap in your home directory, as many systems used to check for that file and load it by default, and some still do.

Multi_key

But we're not done yet. You can use a slightly newer facility: Compose. This lets you assign the symbol Multi_key to a key, and pressing that key will cause multiple keyboard characters to be collected and translated into a single character.

Supported on your system?

First, check to see if it's supported at all. You're looking for the Compose file for your locale. This will be someplace like /usr/share/X11/locale, though that will vary from distribution to distribution. In it, you'll find directories whose names match locales. For me, that's en_US.UTF-8. If there's a Compose file there, then you're set. You may want to look that file over, as you'll now be able to use all the things it defines.

Add the Multi_key

This is a little bit easier to set up. First, see if you already have one by running xmodmap -pk | grep Multi. Again, if you have one, you can use it. If not, or if you want to use a different one, here's how to change it.
First, find a key you want to use. Since I'm using a very small keyboard without a lot of extra keys, I chose to put it on the same key as the Mode_switch key, only shifted. So now the keycode I use is:
keycode 64 = Mode_switch Multi_key
You can do that yourself, or set up another key by setting all four keysyms to Multi_key.
The downside of sharing the Mode_switch key that way is that the order in which I press the two keys matters! If I hit the Mode_switch key first, then I get the fourth character for the key I type. If I hit the Shift key first, then I have just entered Multi_key, and need to release the Mode_switch key to start entering more keystrokes.

Test it

The Compose facility is a library, and any application that is linked with it will work. Exactly what happens if the glyph isn't available will depend on the application. But once you've set up your xmodmap file to enable the Multi_key and run xmodmap to install it, you can restart those applications to use them.
So restart your editor/IDE/terminal emulator, and try typing one of the sequences from the Compose file we found earlier. Mine has <o> <o> in it as the degree symbol, °, so try that one. If that doesn't work, something like a' should get you an accented a. If that still doesn't work - back to xkeycaps, to make sure your Multi_key key is properly enabled.

And a bunch more symbols!

Now you can install a bunch of math/programming symbols. These go into ~/.XCompose. Once you've set that file up, it will be loaded by applications that support it when they start.
Here's the set I use:
# Get system defaults. Might be useful things there as well.
include "%L"

# Note: There are a few cases where a two-character sequence is a
# prefix of a three-character sequence. The two-character sequence
# gets a <space> suffix to disambiguate them.

# Logical operators for programming languages that support them.
<Multi_key> <equal> <equal>               : "≡" U2261 # IDENTICAL TO
<Multi_key> <equal> <slash>               : "≢" U2262 # NOT IDENTICAL TO
<Multi_key> <equal> <U00AC>               : "≢" U2262 # NOT IDENTICAL TO
<Multi_key> <equal> <exclam>              : "≢" U2262 # NOT IDENTICAL TO
<Multi_key> <equal> <asciitilde>          : "≢" U2262 # NOT IDENTICAL TO
<Multi_key> <slash> <equal>               : "≠" U2260 # NOT EQUAL TO
<Multi_key> <U00AC> <equal>               : "≠" U2260 # NOT EQUAL TO
<Multi_key> <exclam> <equal>              : "≠" U2260 # NOT EQUAL TO
<Multi_key> <asciitilde> <equal>          : "≠" U2260 # NOT EQUAL TO
<Multi_key> <ampersand> <ampersand>       : "∧" U2227 # LOGICAL AND
<Multi_key> <bar> <bar> <space>           : "∨" U2228 # LOGICAL OR
<Multi_key> <less> <equal>                : "≤" U2264 # LESS-THAN OR EQUAL TO
<Multi_key> <greater> <equal>             : "≥" U2265 # GREATER-THAN OR EQUAL TO
<Multi_key> <slash> <less>                : "≮" U226E # NOT LESS-THAN
<Multi_key> <U00AC> <less>                : "≮" U226E # NOT LESS-THAN
<Multi_key> <exclam> <less>               : "≮" U226E # NOT LESS-THAN
<Multi_key> <asciitilde> <less>           : "≮" U226E # NOT LESS-THAN
<Multi_key> <slash> <greater>             : "≯" U226F # NOT GREATER-THAN
<Multi_key> <U00AC> <greater>             : "≯" U226F # NOT GREATER-THAN
<Multi_key> <exclam> <greater>            : "≯" U226F # NOT GREATER-THAN
<Multi_key> <asciitilde> <greater>        : "≯" U226F # NOT GREATER-THAN
<Multi_key> <greater> <greater> <space>   : "≫" U226B # MUCH GREATER-THAN
<Multi_key> <less> <less> <space>         : "≪" U226A # MUCH LESS-THAN
<Multi_key> <less> <less> <less>          : "⋘" U22D8 # VERY MUCH LESS-THAN
<Multi_key> <greater> <greater> <greater> : "⋙" U22D9 # VERY MUCH GREATER-THAN

# Symbols for set theory, using mode-shifted multikey goodies
<Multi_key> <slash> <U2208>               : "∉" U2209 # NOT AN ELEMENT OF
<Multi_key> <U00AC> <U2208>               : "∉" U2209 # NOT AN ELEMENT OF
<Multi_key> <exclam> <U2208>              : "∉" U2209 # NOT AN ELEMENT OF
<Multi_key> <asciitilde> <U2208>          : "∉" U2209 # NOT AN ELEMENT OF
<Multi_key> <slash> <U220B>               : "∌" U220C # DOES NOT CONTAIN AS MEMBER
<Multi_key> <U00AC> <U220B>               : "∌" U220C # DOES NOT CONTAIN AS MEMBER
<Multi_key> <exclam> <U220B>              : "∌" U220C # DOES NOT CONTAIN AS MEMBER
<Multi_key> <asciitilde> <U220B>          : "∌" U220C # DOES NOT CONTAIN AS MEMBER
<Multi_key> <U2282> <equal>               : "⊆" U2286 # SUBSET OF OR EQUAL TO
<Multi_key> <U2283> <equal>               : "⊇" U2287 # SUPERSET OF OR EQUAL TO
<Multi_key> <slash> <U2282> <space>       : "⊄" U2284 # NOT A SUBSET OF
<Multi_key> <U00AC> <U2282> <space>       : "⊄" U2284 # NOT A SUBSET OF
<Multi_key> <exclam> <U2282> <space>      : "⊄" U2284 # NOT A SUBSET OF
<Multi_key> <asciitilde> <U2282> <space>  : "⊄" U2284 # NOT A SUBSET OF
<Multi_key> <slash> <U2283> <space>       : "⊅" U2285 # NOT A SUPERSET OF
<Multi_key> <U00AC> <U2283> <space>       : "⊅" U2285 # NOT A SUPERSET OF
<Multi_key> <exclam> <U2283> <space>      : "⊅" U2285 # NOT A SUPERSET OF
<Multi_key> <asciitilde> <U2283> <space>  : "⊅" U2285 # NOT A SUPERSET OF
<Multi_key> <slash> <U2282> <equal>       : "⊈" U2288 # NEITHER A SUBSET OF NOR EQUAL TO
<Multi_key> <U00AC> <U2282> <equal>       : "⊈" U2288 # NEITHER A SUBSET OF NOR EQUAL TO
<Multi_key> <exclam> <U2282> <equal>      : "⊈" U2288 # NEITHER A SUBSET OF NOR EQUAL TO
<Multi_key> <asciitilde> <U2282> <equal>  : "⊈" U2288 # NEITHER A SUBSET OF NOR EQUAL TO
<Multi_key> <slash> <U2283> <equal>       : "⊉" U2289 # NEITHER A SUPERSET OF NOR EQUAL TO
<Multi_key> <U00AC> <U2283> <equal>       : "⊉" U2289 # NEITHER A SUPERSET OF NOR EQUAL TO
<Multi_key> <exclam> <U2283> <equal>      : "⊉" U2289 # NEITHER A SUPERSET OF NOR EQUAL TO
<Multi_key> <asciitilde> <U2283> <equal>  : "⊉" U2289 # NEITHER A SUPERSET OF NOR EQUAL TO

# More symbols.
<Multi_key> <braceleft> <braceright>      : "∅" U2205 # EMPTY SET
<Multi_key> <slash> <slash>               : "∖" U2216 # SET MINUS
<Multi_key> <less> <minus> <space>        : "←" U2190 # LEFTWARDS ARROW
<Multi_key> <minus> <greater>             : "→" U2192 # RIGHTWARDS ARROW
<Multi_key> <minus> <less>                : "⤙" U2919 # LEFTWARDS ARROW-TAIL
<Multi_key> <greater> <minus>             : "⤚" U291A # RIGHTWARDS ARROW-TAIL
<Multi_key> <equal> <less>                : "⇐" U21D0 # LEFTWARDS DOUBLE ARROW
<Multi_key> <equal> <greater>             : "⇒" U21D2 # RIGHTWARDS DOUBLE ARROW
<Multi_key> <colon> <colon>               : "∷" U2237 # PROPORTION
<Multi_key> <plus> <plus> <space>         : "⧺" U29FA # DOUBLE PLUS
<Multi_key> <less> <bar>                  : "⊲" U22B2 # NORMAL SUBGROUP OF
<Multi_key> <bar> <greater>               : "⊳" U22B3 # CONTAINS AS NORMAL SUBGROUP
<Multi_key> <greater> <less>              : "⋈" U22C8 # BOWTIE
<Multi_key> <less> <asterisk> <greater>   : "⊛" U229B # CIRCLED ASTERISK OPERATOR
<Multi_key> <less> <plus> <greater>       : "⊕" U2295 # CIRCLED PLUS
<Multi_key> <less> <minus> <greater>      : "⊖" U2296 # CIRCLED MINUS
<Multi_key> <asterisk> <asterisk> <asterisk> : "⁂" U2042 # ASTERISM
<Multi_key> <plus> <plus> <plus>          : "⧻" U29FB # TRIPLE PLUS
<Multi_key> <bar> <bar> <bar>             : "⫴" U2AF4 # TRIPLE VERTICAL BAR BINARY RELATION
I've included some really obscure symbols, because the languages I'm working with use them. As the comments note, there are some character sequences - usually a pair of the same character - that are prefixes of some three-character sequence - usually three of that same character. For those to work, the shorter sequence has to end with a <space>. The exception is ← (<less> <minus> <space>), whose first two keys are a prefix of ⊖ (<less> <minus> <greater>). I'm not happy about this space, but it's the best alternative I've found so far. If you've got a better solution, let me know.
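Prefix clashes like these are easy to miss as the file grows, so here is a rough Python sketch for spotting them. Both helper names (parse_sequence and prefix_conflicts) are mine, and the parser only understands the simple <key> ... : "result" lines used in this file, not the full Compose syntax.

```python
import re

def parse_sequence(line):
    """Return the keys after <Multi_key> on a Compose line, as a tuple."""
    left, _sep, _right = line.partition(":")
    keys = re.findall(r"<([^>]+)>", left)
    if keys and keys[0] == "Multi_key":
        keys = keys[1:]
    return tuple(keys)

def prefix_conflicts(sequences):
    """Return (short, long) pairs where short is a proper prefix of long."""
    conflicts = []
    for short in sequences:
        for long in sequences:
            if len(short) < len(long) and long[:len(short)] == short:
                conflicts.append((short, long))
    return conflicts
```

Run parse_sequence over every non-comment line of your ~/.XCompose and hand the results to prefix_conflicts. An empty list means no sequence is cut off by a shorter one, which is exactly what the <space> suffix is there to guarantee.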