[links-list] UTF-8 terminal I/O recoding patch for links-2.0pre1 + notes

BC Sittler bsittler at iname.com
Wed May 22 15:52:29 PDT 2002


(Also some notes on links -g, near the end.)

First, I would like to congratulate everyone who worked on the new features in links-2.0pre1. It's great to have a graphical browser for the Linux framebuffer and X11 which shares its cache & features with the text-mode web browser I use over ssh and from Mac OS X Terminal.app. I look forward to really stress-testing that JavaScript engine, too ;)

Attached to this message is the UTF-8 terminal I/O recoding patch updated for links-2.0pre1. This allows UTF-8 terminals to be used with the text mode of Links.
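
For anyone unfamiliar with what the output half of such recoding involves, here is a minimal sketch of turning one Unicode code point into UTF-8 bytes. This is not the code from the attached patch; the function name utf8_encode and its interface are made up purely for illustration.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from the patch.
 * Encode one Unicode code point as UTF-8 into out, which must have
 * room for at least 4 bytes. Returns the number of bytes written,
 * or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {                      /* 1 byte: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                   /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes: 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

A terminal output routine would presumably call something like this once per character, after mapping from the terminal charset Links is configured for to Unicode, and write the resulting bytes instead of the single 8-bit byte.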

This patch does not seem to affect the graphics mode (-g) at all; it only modifies Links's behavior in text mode. Nor does it use any of the new UTF-8 features introduced with the graphics mode.


Notes on links-2.0pre1 -g

First, to compile on Mac OS X one must work around a cpp-precomp coredump:

make CC="cc -no-cpp-precomp" font_include.o

(cpp-precomp is a version of the C preprocessor which allows precompiled headers [with the ensuing speedup] but it's also a bit buggy [in this file it just coredumps!] and doesn't understand a lot of GCC syntax.)

A logical next step for UTF-8 terminals would be to implement a "UTF-8 terminal" version of the graphics mode interface to Links, perhaps using the xterm's window for image display (as w3m does with some patches). This seems like the sane way to handle things like multi-byte characters, combining characters, wide vs. narrow characters, bidirectional scripts, etc. A lot of the code would be the same as for the graphics interface, and quite different from the existing terminal interface.
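
To make the problem concrete, here is a small sketch (again hypothetical, not from Links or from the patch; the name utf8_count_codepoints is invented) showing that in UTF-8 the number of bytes, the number of code points, and the number of screen cells are three different things. It only counts code points; working out cells would additionally need width and combining-class data.

```c
#include <stddef.h>

/* Hypothetical helper: count the code points in a NUL-terminated
 * string that is assumed to be valid UTF-8. Continuation bytes have
 * the bit pattern 10xxxxxx, so every byte that does NOT match that
 * pattern starts a new code point. */
size_t utf8_count_codepoints(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For example, "e" followed by U+0301 (combining acute accent) is 3 bytes and 2 code points but should occupy a single cell, while a katakana character is 3 bytes and 1 code point but is normally drawn 2 cells wide. Code that assumes one byte per cell, as an 8-bit terminal interface reasonably can, has no good way to represent either case.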

I also played around a bit with the UTF-8 support in this newest Links version, and noticed some bizarre stuff. For instance, some of the Japanese hiragana and katakana characters are mysteriously missing (hiragana ri, katakana a, katakana ka, etc.) when all surrounding characters (including similar ones like katakana small a and katakana ga) are present. Some of the Latin-1 characters are also absent: for instance, there's a masculine ordinal indicator but no feminine ordinal indicator, and all the vulgar fractions are missing. Was this intentional, an oversight, or just a problem of not having an appropriate source font? In the last case, you might consider using Unifont bitmaps as a low-resolution fallback font:

http://dvdeug.dhis.org/unifont.html

This covers a large portion of Unicode.

And finally, I found the "splat" for missing characters somewhat amusing :)

-Ben


-------------- next part --------------
A non-text attachment was scrubbed...
Name: links-2.0pre1-utf-8.diff
Type: application/octet-stream
Size: 11146 bytes
Desc: not available
URL: <http://lists.linuxfromscratch.org/pipermail/links-list/attachments/20020523/37c57dca/attachment.obj>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: links-2.0pre1-utf-8.diff.txt
URL: <http://lists.linuxfromscratch.org/pipermail/links-list/attachments/20020523/37c57dca/attachment.txt>

