Mi blog lah! Το ιστολόγιό μου

5 Mar 2008 (2 comments)

Testing the updated IM support in GTK+

In "Improving input method support in GTK+-based apps", we talked about some work to update the list of compose sequences that GTK+ knows to the latest version that comes from Xorg. Instead of 691 compose sequences, GTK+ now supports over 5000.

The patch has landed in GTK+ (trunk), and here are instructions for testing.

  1. If you have not used jhbuild before, read the jhbuild instructions and install it.
  2. Add the following to your ~/.jhbuildrc file
    branches['gtk+'] = None    # Makes sure you build from the trunk of GTK+
  3. Install gtk+ using the command (see James's comment on this post for how to avoid Step 6 below)
    jhbuild build gtk+
  4. About 40 minutes and about 700MB of disk space later (~600MB for the source, ~100MB for the installed files), you should have a working copy of GTK+ 2.12.
  5. You can use this compiled version of GTK+ by running
    jhbuild shell

    This should give you a new shell, and whatever you run from it will use our fresh GTK+. Try running "gedit". You will notice that the theme is different; the freshly built GTK+ falls back to the default theme. This shell sets special environment variables so that the programs you run use the fresh GTK+. The rest of the libraries come from your distribution.

  6. If you try to type compose sequences, you will notice no improvement. This is because at the moment jhbuild builds the 2.12 branch of GTK+, not trunk. We need to download GTK+ from trunk and rebuild.
    cd ~/checkout/gnome2/
    mv gtk+ gtk+-branch-2.12
    svn co svn://svn.gnome.org/svn/gtk+/trunk gtk+
    jhbuild build --no-network gtk+
  7. Repeat Step 5 and get gedit running.

How to test?

  • Set up a keyboard layout that supports a good variety of dead keys. My preference is GBr (United Kingdom). Here, AltGr+[];'#/ and AltGr+{}:@~? produce different dead keys. Press one of these combinations and then press a letter. If a compose sequence exists for that combination, the resulting character gets printed. For example, the old GTK+ produces öõóôòx åōőxxx. The new GTK+ produces öõóôòọ åōőǒŏȯ (12 dead keys).
  • Set up Greek, Polytonic (Ancient Greek). The dead keys are [];' {}:@ AltGr+[] (10 dead keys). Produce characters such as ᾅᾂᾷῗὕὒᾥᾢῷ.
  • Try compose sequences as described in the upstream Compose file at Xorg. For example,
    ComposeKey ( 1 0 ) produces ⑩. Try the same for 0-20, a-z and A-Z.
  • Other miscellaneous characters: Ṩǟấẫǡ (using the GBr layout)

The next step would be to parse the list of compose sequences and produce a documentation file.
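
As a rough idea of what such a parser could look like, here is a minimal Python sketch. It assumes the usual Compose file line shape (key names in angle brackets, a colon, then the result in double quotes). The two sample lines are written in that style for illustration and are not copied from the upstream file, and the names parse_compose and SAMPLE are made up for this example.

```python
import re

# Illustrative lines in the style of the Xorg Compose file (an assumption,
# not a verbatim copy of the upstream file).
SAMPLE = '''\
# Comment lines start with a hash.
<dead_acute> <a> : "á" aacute # LATIN SMALL LETTER A WITH ACUTE
<Multi_key> <parenleft> <1> <0> <parenright> : "⑩" U2469 # CIRCLED NUMBER TEN
'''

# One or more <key> names, a colon, then the quoted result string.
LINE_RE = re.compile(r'^\s*((?:<[^>]+>\s*)+):\s*"((?:[^"\\]|\\.)*)"')


def parse_compose(text):
    """Return a list of (keys, result) pairs, one per compose sequence."""
    sequences = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # blank line, comment, or anything else we do not handle
        keys = re.findall(r'<([^>]+)>', match.group(1))
        # Escapes inside the quoted result are kept as written, not decoded.
        sequences.append((keys, match.group(2)))
    return sequences


# Print one documentation line per sequence.
for keys, result in parse_compose(SAMPLE):
    print(' + '.join(keys), '->', result)
```

Feeding it the real Compose file instead of SAMPLE would give the raw material for a documentation page; directives and escaped characters would need extra handling.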

20 Feb 2008 (7 comments)

Keyboard layout for combining diacritics

Typically, if you want to type characters with accents, such as á, ë, ś, you need to configure a suitable keyboard layout that includes compose sequences for those characters. The characters produced are what we call precomposed characters, which were included in the early stages of Unicode. Nowadays, the idea is that you do not need to define á as a distinct character because it can be represented as a and ´, where the latter is a combining diacritic.

When a character and a combining diacritic are put together, they fuse, producing a seemingly single character. á (U+00E1) is precomposed (really one character), while á (U+0061 U+0301) is the letter a followed by the combining diacritic called acute (two characters). You can type the latter á as follows:

  1. Type a
  2. Press Ctrl+Shift+u, then type 301, then press space bar.
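
To see the difference between the two forms outside a text editor, here is a small Python sketch using the standard unicodedata module. The variable names are just for illustration; the 301 is the same hexadecimal code typed after Ctrl+Shift+u above.

```python
import unicodedata

precomposed = "\u00e1"                 # á as a single code point
combining = "a" + chr(int("301", 16))  # a followed by U+0301 (combining acute)

# They look the same on screen but are different strings.
print(len(precomposed), len(combining))  # 1 2
print(precomposed == combining)          # False

# Unicode normalization converts between the two forms.
print(unicodedata.normalize("NFC", combining) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == combining)  # True

# Show what each code point in the combining form is.
for ch in combining:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

This is also why searching for á in a document may miss the other form unless the application normalizes the text first.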

Western languages do not really require combining marks, so the existing keyboard layouts do not use them. Other layouts, such as the Congolese keyboard layout (based on the Latin script), make good use of them.

Gedit, pango and combining diacritics

This is gedit showing off pango and DejaVu fonts (default font in major distributions).

Line 3 is a bit of an extreme, showing a sandwich of combining diacritics.

Line 4 shows the base character a with the combining diacritics from the Unicode range 0x300 to 0x315.
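
If you do not have such a keyboard layout at hand, a line like this can also be generated programmatically. The following Python sketch is not how the screenshot was made (that was typed with the layout); it only pairs the base character with each combining diacritic in the same range, and combining_sampler is a made-up helper name.

```python
import unicodedata


def combining_sampler(base, first, last):
    """Return base followed by each combining mark in first..last, repeated."""
    return "".join(base + chr(cp) for cp in range(first, last + 1))


# The base character a with every combining diacritic from 0x300 to 0x315.
line = combining_sampler("a", 0x300, 0x315)
print(line)

# List the marks one per line with their Unicode names.
for cp in range(0x300, 0x315 + 1):
    print(f"U+{cp:04X}", "a" + chr(cp), unicodedata.name(chr(cp)))
```

Paste the output into gedit to check how your font and Pango render each mark.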

Both lines 3 and 4 were produced easily with a modified keyboard layout, which is shown below.

Line 5 is just me being silly. You can have combining diacritics that enclose your base character.

$ cat  /usr/share/X11/xkb/symbols/combining
partial alphanumeric_keys alternate_group
xkb_symbols "combining" {

    name[Group1] = "Combining diacritics";

    key.type[Group1] = "FOUR_LEVEL";

    key <AD11> { [ NoSymbol, NoSymbol, 0x1000300, 0x1000301 ] }; // à   á
    key <AD12> { [ NoSymbol, NoSymbol, 0x1000302, 0x1000303 ] }; // â   ã

    key <AC10> { [ NoSymbol, NoSymbol, 0x1000304, 0x1000305 ] }; // ā   a̅
    key <AC11> { [ NoSymbol, NoSymbol, 0x1000306, 0x1000307 ] }; // ă   ȧ
    key <BKSL> { [ NoSymbol, NoSymbol, 0x1000308, 0x1000309 ] }; // ä    ả

    key <AB08> { [ NoSymbol, NoSymbol, 0x1000310, 0x1000311 ] }; // a̐     ȃ
    key <AB09> { [ NoSymbol, NoSymbol, 0x1000312, 0x1000313 ] }; // a̒     a̓
    key <AB10> { [ NoSymbol, NoSymbol, 0x1000314, 0x1000315 ] }; // a̔     a̕
};
$ diff -u /usr/share/X11/xkb/symbols/us.ORIGINAL /usr/share/X11/xkb/symbols/us
--- /usr/share/X11/xkb/symbols/us.ORIGINAL      2008-02-20 11:11:13.000000000 +0000
+++ /usr/share/X11/xkb/symbols/us       2008-02-20 13:02:07.000000000 +0000
@@ -492,3 +492,12 @@
     name[Group1]= "U.S. English - Macintosh";
 };

+partial alphanumeric_keys modifier_keys
+xkb_symbols "combining_us" {
+
+    include "us"
+    include "combining"
+
+    key.type[Group1] = "FOUR_LEVEL";
+    name[Group1] = "U.S. English - Combining";
+};
$ diff -u /usr/share/X11/xkb/rules/xorg.xml.ORIGINAL /usr/share/X11/xkb/rules/xorg.xml
--- /usr/share/X11/xkb/rules/xorg.xml.ORIGINAL  2008-02-20 11:27:00.000000000 +0000
+++ /usr/share/X11/xkb/rules/xorg.xml   2008-02-20 11:27:48.000000000 +0000
@@ -3643,6 +3643,12 @@
             <description xml:lang="zh_TW">Macintosh</description>
           </configItem>
         </variant>
+        <variant>
+          <configItem>
+            <name>combining_us</name>
+            <description>Combining</description>
+          </configItem>
+        </variant>
       </variantList>
     </layout>
     <layout>
$ _
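
The values such as 0x1000300 in the symbols file are X11 Unicode keysyms: the Unicode code point plus 0x1000000. As a quick check of which diacritic ends up on which key, here is a small Python sketch that decodes them. The LAYOUT table only repeats a few entries from the file above, and keysym_to_char is a made-up helper name.

```python
import unicodedata

UNICODE_KEYSYM_BASE = 0x1000000

# A few (level 3, level 4) keysym pairs copied from the symbols file above.
LAYOUT = {
    "AD11": (0x1000300, 0x1000301),
    "AC11": (0x1000306, 0x1000307),
    "AB10": (0x1000314, 0x1000315),
}


def keysym_to_char(keysym):
    """Convert an X11 Unicode keysym (0x1000100..0x110FFFF) to its character."""
    if not 0x1000100 <= keysym <= 0x110FFFF:
        raise ValueError(f"not a Unicode keysym: {keysym:#x}")
    return chr(keysym - UNICODE_KEYSYM_BASE)


for key, keysyms in LAYOUT.items():
    for level, keysym in zip((3, 4), keysyms):
        mark = keysym_to_char(keysym)
        print(f"<{key}> level {level}: U+{ord(mark):04X} {unicodedata.name(mark)}")
```

The first line printed is for <AD11> level 3, which decodes to U+0300 COMBINING GRAVE ACCENT, matching the à comment in the symbols file.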

Then, you select this keyboard layout (U.S. English) and variant (Combining) in the Keyboard Indicator applet.

Unlike dead keys, with combining diacritics you first type the base character (such as a) and then any combining diacritics.
Our sample layout variant puts the diacritics on the physical keys for [];'#,./. For example,

  • a + AltGr+[ : à
  • a + AltGr+Shift+[ : á
  • a + AltGr+[ + AltGr+' : ằ

If your language has needs that combining diacritics can meet, this is how to address them.

It is quite important to create keyboard layouts for all languages, and actually make good use of them.

12 Jun 2006 (0 comments)

Can you read Coptic?

Coptic is the most recent phase of ancient Egyptian. It is the direct descendant of the ancient language written in Egyptian hieroglyphic, hieratic, and demotic scripts. The Coptic alphabet is a slightly modified form of the Greek alphabet, with some letters (which vary from dialect to dialect) deriving from demotic. As a living language of daily conversation, Coptic flourished from ca. 200 to 1100. The last record of its being spoken was during the 17th century. Coptic survives today as the liturgical language of the Coptic Orthodox Church. Egyptian Arabic is the spoken and national language of Egypt today.

Source: Wikipedia on Coptic Language

Coptic, as used today, has signs of influence from the Greek language. If you speak Greek, you should be able to recognise every entry in the screenshot (it comes from the dictionary that is available from http://copticlang.bizhat.com/).

There is a Coptic Unicode block and there are at least three Unicode fonts available with Coptic glyphs.

I am not aware of a keyboard definition to write Unicode Coptic; Coptic uses several combining diacritical marks (accents) and appears to surpass even Ancient Greek/Polytonic in this respect. An easy way to create a layout (would it be easy to write with?) would be to start from the Greek keyboard layout and replace the codepoints with the Coptic ones. For the 9 combining diacritical marks, three keys should be dedicated, each giving three marks: 1) pressed as is, 2) pressed with Shift, 3) pressed with Alt. To avoid using dead keys, one would have to type the letter first and then the diacritical mark.

In modern Greek we use the ";:" key (to the right of L) to produce the acute and the diaeresis (with Shift) accents. The second suitable key could be the ' " key, and the third the "/?" key (debatable).
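
To make the "replace the codepoints" idea concrete, here is a hypothetical Python sketch that prints XKB key lines for the first four letters. It assumes the Greek layout keeps α, β, γ, δ on the keys <AC01>, <AB05>, <AC05>, <AC03>, and that each Coptic capital sits one code point before its small letter in the Coptic block. The table and the helper xkb_line are made up for illustration, and the output is an untested starting point, not a finished layout.

```python
import unicodedata

UNICODE_KEYSYM_BASE = 0x1000000

# (XKB key, Greek letter currently on that key, Coptic small letter to use)
GREEK_TO_COPTIC = [
    ("AC01", "α", 0x2C81),  # COPTIC SMALL LETTER ALFA
    ("AB05", "β", 0x2C83),  # COPTIC SMALL LETTER VIDA
    ("AC05", "γ", 0x2C85),  # COPTIC SMALL LETTER GAMMA
    ("AC03", "δ", 0x2C87),  # COPTIC SMALL LETTER DALDA
]


def xkb_line(key, small_cp):
    """Build one XKB key line: small letter as is, capital with Shift."""
    capital_cp = small_cp - 1  # the capital precedes the small letter
    return "    key <%s> { [ 0x%X, 0x%X ] };  // %s %s" % (
        key,
        UNICODE_KEYSYM_BASE + small_cp,
        UNICODE_KEYSYM_BASE + capital_cp,
        chr(small_cp),
        chr(capital_cp),
    )


for key, greek, small_cp in GREEK_TO_COPTIC:
    # Print the Greek letter being replaced and the Coptic name, then the line.
    print("    // replaces", greek, "with", unicodedata.name(chr(small_cp)))
    print(xkb_line(key, small_cp))
```

The combining diacritical marks would then go on the three dedicated keys in the same way, using their own code points.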

There are several efforts to convert the non-Unicode fonts distributed on the Coptic Church website. Moheb added the Coptic glyphs to the Freefonts. There is more work required to get them added by default to Linux distros. There is a discussion forum on Coptic.

Therefore, the most important task is to create a keyboard layout so that one can write in Unicode Coptic.

Then, existing (non-Unicode) text should be converted to Unicode Coptic so that there is material available. Moheb created support for this in iconv (glibc). There should be a bug report at http://sources.redhat.com/bugzilla/ under product glibc, component libc.

Source: Wikipedia (Coptic script)

Free Unicode fonts already exist to display the text. The conversion of the Coptic Church fonts to Unicode would be beneficial as well. To have them included in Linux distros, the distribution license should be set to one of the FLOSS licenses. An option could be to add the glyphs to the DejaVu fonts (allowed by the license) so that there is a general purpose open font that is easy to work with.

I, for one, would love to write Greek using a Coptic keyboard layout and a Coptic Unicode font. :)

Update: Screenshot that demonstrates how well Unicode Coptic fonts behave when combining marks are used.

Update #2: You can test the above on your system by opening this OpenDocument file using OpenOffice.org or any other OpenDocument-compatible application. OpenOffice.org has been verified to display combining marks. Your mileage may vary; your comments will be appreciated.

Get Unicode fonts with Coptic coverage.

   
