linux/drivers
Egmont Koblinger 2f1a2ccb9c console UTF-8 fixes
The UTF-8 part of the vt driver suffers from the following issues which are
addressed in my patch:

1) If there's no glyph found for a particular valid UTF-8 character, we try
   to display U+FFFD. However, if this one is not found either, here's what
   the current kernel does:

   - First, if the Unicode value is less than the number of glyphs, use the
     glyph directly from that position of the glyph table. While this may be
     a good idea in the 8-bit world, it makes absolutely no sense with
     Unicode in mind. For example, if a Latin-2 font is loaded and an
     application prints U+00FB ("u with circumflex", not present in Latin-2),
     then as a fallback the glyph at position 0xFB of the Latin-2 font
     (which is a "u with double acute accent", a different character) is
     displayed.

   - Second, if this fallback fails too, a simple ASCII question mark is
     printed, which is visually indistinguishable from a real question mark.

   I changed the code to skip the first step (except in non-UTF-8 mode),
   and changed the second step to print the question mark with inverse color
   attributes, so it is visually clear that it's not a real question mark
   and it more closely resembles the common glyph of U+FFFD.

2) The UTF-8 decoder is buggy in many ways:

   - Lone continuation bytes (section 3.1 of Markus Kuhn's UTF-8 stress
     test) are not caught; they are displayed as some "random" glyphs (taken
     directly from the font table, see above) instead of the replacement
     character.

   - Incomplete sequences (sections 3.2 and 3.3 of the stress test) emit no
     replacement character, but rather cause the subsequent valid character
     to be displayed multiple times(!).

   - The decoder is not safe: overlong sequences are currently not caught;
     they are displayed as if they were valid representations. This may
     even have security implications.

   - The decoder does not handle D800..DFFF and FFFE..FFFF specially; it
     just emits these code points and lets them be looked up in the glyph
     table. Since these are invalid code points, I replace them with U+FFFD
     and hence give them no chance to be looked up in the glyph table.
     (Assuming no font ships glyphs for these code points, this change is
     not visible to users, since the glyph shown will be the same.)

   With my fixes to the decoder it now behaves exactly as Markus Kuhn's
   stress test recommends.

3) The driver has no concept of double-width (CJK) characters. It's way
   beyond the scope of my patch to try to display them, but at least I think
   it's important for the cursor to jump two positions when printing such
   characters, since this is what applications (such as text editors)
   expect. Currently the cursor only jumps one position, and hence
   applications suffer from display and refresh problems, and editing
   English letters that are preceded by CJK characters on the same line is
   a nightmare. With my patch an additional space is inserted after the CJK
   character has been printed (which usually means a replacement symbol, of
   course). (If U+FFFD isn't available and hence an inverse question mark
   is displayed in the first cell, I keep the inverted state for the space
   in the second column, so it's quite easy to see that they are tied
   together.)

4) There is a small built-in table of zero-width spaces that are not to be
   printed but silently skipped. U+200A is included there, but it's not a
   zero-width character, so I remove it from there.
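
To illustrate the decoder rules from item 2, here is a minimal userspace
sketch in C. It is not the code from this patch: the name utf8_decode_one,
the one-shot buffer interface and the 4-byte / U+10FFFF limit are
assumptions made for the example (the in-kernel decoder works on a byte
stream). It yields U+FFFD for lone continuation bytes, incomplete
sequences, overlong forms, D800..DFFF and FFFE..FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT 0xFFFDu

/*
 * Decode one code point from s[0..n-1] into *out.
 * Returns the number of bytes consumed (always >= 1).
 * Hypothetical helper for illustration only.
 */
size_t utf8_decode_one(const unsigned char *s, size_t n, uint32_t *out)
{
	/* Smallest value that legitimately needs 2, 3 or 4 bytes. */
	static const uint32_t min_value[5] = { 0, 0, 0x80, 0x800, 0x10000 };
	uint32_t c;
	size_t len, i;

	assert(s != NULL && out != NULL && n > 0);

	if (s[0] < 0x80) {
		/* Plain ASCII. */
		*out = s[0];
		return 1;
	}
	if ((s[0] & 0xE0) == 0xC0) {
		len = 2;
		c = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		len = 3;
		c = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {
		len = 4;
		c = s[0] & 0x07;
	} else {
		/* Lone continuation byte (0x80..0xBF) or invalid lead byte. */
		*out = REPLACEMENT;
		return 1;
	}

	for (i = 1; i < len; i++) {
		if (i >= n || (s[i] & 0xC0) != 0x80) {
			/* Incomplete: report it and leave s[i] for the next call. */
			*out = REPLACEMENT;
			return i;
		}
		c = (c << 6) | (s[i] & 0x3F);
	}

	/* Overlong forms, surrogates, FFFE/FFFF and out-of-range values. */
	if (c < min_value[len] ||
	    (c >= 0xD800 && c <= 0xDFFF) ||
	    c == 0xFFFE || c == 0xFFFF ||
	    c > 0x10FFFF)
		c = REPLACEMENT;

	*out = c;
	return len;
}
```

For the bytes E2 82 41, the first call reports U+FFFD and consumes two
bytes, so the following "A" is decoded exactly once by the next call.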

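The glyph fallback (item 1), the two-cell advance (item 3) and the
zero-width skip (item 4) can be sketched together as follows. This is a
simplified userspace model, not the vt driver code: struct cell,
render_char, the lookup callback, the toy ascii_only_lookup font and the
abbreviated width ranges are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct cell {
	int glyph;	/* index into the font's glyph table */
	bool inverse;	/* draw with inverted color attributes */
};

/* Hypothetical font lookup: glyph index for code point c, or -1. */
typedef int (*glyph_lookup_fn)(uint32_t c);

/* Toy font for the example: it only has glyphs for ASCII. */
int ascii_only_lookup(uint32_t c)
{
	return c < 0x80 ? (int)c : -1;
}

/* Illustrative subset of double-width ranges, not a complete table. */
static bool is_double_width(uint32_t c)
{
	return (c >= 0x1100 && c <= 0x115F) ||	/* Hangul Jamo */
	       (c >= 0x3040 && c <= 0xA4CF) ||	/* kana, CJK ideographs, Yi */
	       (c >= 0xAC00 && c <= 0xD7A3) ||	/* Hangul syllables */
	       (c >= 0xFF00 && c <= 0xFF60);	/* fullwidth forms */
}

static bool is_zero_width(uint32_t c)
{
	/* U+200A is deliberately absent: it is not zero-width. */
	return (c >= 0x200B && c <= 0x200F) || c == 0xFEFF;
}

/* Fill out[] for code point c; returns the number of cells used (0..2). */
int render_char(uint32_t c, glyph_lookup_fn lookup, bool utf8_mode,
		int nglyphs, struct cell out[2])
{
	bool inverse = false;
	int glyph, space;

	assert(lookup != NULL && out != NULL && nglyphs >= 0);
	if (is_zero_width(c))
		return 0;	/* silently skipped */

	glyph = lookup(c);
	if (glyph < 0) {
		if (!utf8_mode && c < (uint32_t)nglyphs) {
			/* Legacy 8-bit mode only: index the table directly. */
			glyph = (int)c;
		} else {
			glyph = lookup(0xFFFD);
			if (glyph < 0) {
				/* No U+FFFD either: inverse question mark. */
				glyph = lookup('?');
				if (glyph < 0)
					glyph = '?';
				inverse = true;
			}
		}
	}
	out[0].glyph = glyph;
	out[0].inverse = inverse;
	if (!utf8_mode || !is_double_width(c))
		return 1;

	/* Pad the second column so the cursor advances two cells. */
	space = lookup(' ');
	out[1].glyph = space < 0 ? ' ' : space;
	out[1].inverse = inverse;
	return 2;
}
```

With the ASCII-only toy font in UTF-8 mode, U+00FB comes out as one
inverse "?" cell rather than whatever sits at position 0xFB, and a CJK
ideograph such as U+4E2D occupies two inverse cells ("?" plus a space).
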
Signed-off-by: Egmont Koblinger <egmont@uhulinux.hu>
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:12 -07:00
acorn
acpi PNPACPI sets pnpdev->dev.archdata 2007-05-08 11:15:08 -07:00
amba
ata Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm 2007-05-06 13:20:10 -07:00
atm PCI: Cleanup the includes of <linux/pci.h> 2007-05-02 19:02:35 -07:00
auxdisplay
base fix hotplug for legacy platform drivers 2007-05-08 11:15:10 -07:00
block cciss: include scsi/scsi.h unconditionally 2007-05-08 11:15:10 -07:00
bluetooth [Bluetooth] Correct SCO buffer for another Broadcom based dongle 2007-05-05 00:36:22 +02:00
cdrom mm: remove destroy_dirty_buffers from invalidate_bdev() 2007-05-07 12:12:55 -07:00
char console UTF-8 fixes 2007-05-08 11:15:12 -07:00
clocksource
connector
cpufreq Add a new deferrable delayed work init 2007-05-08 11:15:05 -07:00
crypto [CRYPTO] padlock: Remove pointless padlock module 2007-05-02 22:08:26 +10:00
dio
dma
edac Fix 82875 PCI setup 2007-05-08 11:15:07 -07:00
eisa virtual_eisa_root_init() should be __init 2007-05-08 11:15:02 -07:00
fc4
firmware remove "struct subsystem" as it is no longer needed 2007-05-02 18:57:59 -07:00
hid header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
hwmon Apple SMC driver (hardware monitoring and control) 2007-05-08 11:15:00 -07:00
i2c header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
ide ide-cs: recognize 2GB CompactFlash from Transcend 2007-05-05 22:03:51 +02:00
ieee1394 header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
infiniband header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
input header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
isdn header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
kvm KVM: Remove unused 'instruction_length' 2007-05-03 10:52:32 +03:00
leds
macintosh header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
mca
md Remove do_sync_file_range() 2007-05-08 11:15:04 -07:00
media header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
message remove unused header file: drivers/message/i2o/i2o_lan.h 2007-05-08 11:15:02 -07:00
mfd header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
misc Add keyboard blink driver 2007-05-08 11:15:10 -07:00
mmc Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm 2007-05-06 13:20:10 -07:00
mtd slab allocators: Remove SLAB_DEBUG_INITIAL flag 2007-05-07 12:12:57 -07:00
net header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
nubus
oprofile
parisc header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
parport parport_serial: fix PCI must_checks 2007-05-08 11:15:08 -07:00
pci header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
pcmcia fix hotplug for legacy platform drivers 2007-05-08 11:15:10 -07:00
pnp pnpbios: convert to use the kthread API 2007-05-08 11:15:11 -07:00
ps3 ps3av: Use __func__ instead of __FUNCTION__ 2007-05-04 17:59:09 -07:00
rapidio
rtc rtc: add RTC class driver for the Maxim MAX6900 2007-05-08 11:15:03 -07:00
s390 Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 2007-05-05 13:30:44 -07:00
sbus header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
scsi Replace deprecated SA_xxx interrupt flags 2007-05-08 11:15:08 -07:00
serial serial_txx9: zap changelog from source code 2007-05-08 11:15:12 -07:00
sh
sn
spi layered parport code uses parport->dev 2007-05-08 11:15:05 -07:00
tc
telephony replace pci_find_device in drivers/telephony/ixj.c 2007-05-08 11:15:02 -07:00
usb header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
video header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
w1
zorro Amiga Zorro bus: kill resource_size_t warnings 2007-05-04 17:59:08 -07:00
Kconfig
Makefile i2c: Add i2c_board_info and i2c_new_device() 2007-05-01 23:26:31 +02:00