Tuesday, March 6, 2012

Terminally Insane

This one is a long rambling mess.  Sorry in advance...

I found a simple program at the Linux Productivity Magazine to display the curses ACS special characters.  When I tried it I discovered that some looked wrong, and some were missing on the zipit.  All of the box drawing characters were there, but the arrows were garbled, the "lantern" or vertical tab char was missing, and all of the 1 3 5 7 9 single scan line glyphs were missing except for the ones that do double duty as the dash and the underline.

I took a peek at the unicode table from my console font and discovered the unicode mappings for the scan line glyphs were way up in the Chinese section of the giant unicode table.  So I moved them back down into the 0x2xxx range, rebuilt the font file and voila, the scan lines were there.  Turns out ACS_LANTERN was just missing from my mcurses code. The arrows were a different story.  They're not included in the 32 characters of the DEC special graphics set.  I need to print a unicode character in order to display them, so for now I went with the default standby ASCII substitutes of <>^v for the arrows and # for ACS_BOARD and ACS_BLOCK.

It drives me nuts that I've got all these nice glyphs in the font and yet have no way to display them.  So I went looking for some simple C code to print a unicode character.  This turns out to be all tangled up with a monster sized locale and nls system on my desktop linux box. 

There are about a dozen environment variables that control all sorts of settings.  So I tried setting these up for US English with UTF-8 according to the notes on the internet.  Not only did this have no effect, but when I try to set LC_CTYPE something (bash, I think) spits back an error message.  WTF?  It's just an environment variable!  Shut up and set it.  You might think it's some sort of conspiracy to force a huge bloated locale database onto the zipit in /usr/lib/locale.  Ok, it's only 2.5M but that's too much for the jffs and I think the FAT filesystem of IZ2S will bloat that way up because it can't use soft or hard links to avoid duplication.  Plus I really don't need the zipit to know what order I want my socks sorted, or what time in the afternoon to serve me tea.  I just want to be able to print all of the characters I loaded into my font.  Is that too much to ask?

Well I did some soul searching, and some google searching and eventually unearthed the magic escape sequences to put the terminal into UTF-8 mode and back (ESC%G and ESC%@).  Armed with this knowledge I wrote a short C program to print the ACS_BLOCK character.  And it worked!
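For the curious, here is roughly what such a program boils down to.  This is a minimal sketch rather than the exact code I ran: the function name print_block and the macro names are made up for illustration, and it assumes the block glyph is U+2588 (bytes e2 96 88 in UTF-8).

```c
#include <stdio.h>

/* Escape sequences from the post: ESC % G selects UTF-8, ESC % @ switches back. */
#define UTF8_ON  "\033%G"
#define UTF8_OFF "\033%@"

/* U+2588 FULL BLOCK encoded as three UTF-8 bytes (assumed glyph for ACS_BLOCK). */
#define FULL_BLOCK "\xe2\x96\x88"

/* Hypothetical helper: write the glyph wrapped in the two mode switches.
 * Returns the number of bytes written, or a negative value on error. */
int print_block(FILE *out)
{
    int n = fprintf(out, "%s%s%s", UTF8_ON, FULL_BLOCK, UTF8_OFF);
    fflush(out);
    return n;
}
```

Calling print_block(stdout) from main() is all it takes.  The escape sequences go through "%s" arguments rather than the format string, so the "%G" is never mistaken for a printf conversion.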

Well, it worked in my gnome term on the desktop linux, and it worked on the zipit linux console in both bash and the busybox ash shell.  But it didn't work in gnu screen or dvtm.  Fooey!  Actually it turns out I can make it work in screen if I issue the ESC%G from a shell script first and then run "screen -U".  Ok.  I can live with that, but dvtm just eats the ESC% and spits out the G. (I can't remember, but I think this script may have finally let me type UTF-8 at the bash command prompt, instead of just in the busybox ash shell.)

Fortunately the dvtm code is pretty straightforward (unlike screen, which may have a secret built-in nethack mode?). I found the ESC handler code and added a little patch to eat the entire ESC%G sequence and toggle the internal is_utf8 flag.  But then dvtm just prints three nonsense characters for my 0xe2,0x96,0x88 sequence instead of converting to the proper unicode BLOCK glyph.  The man pages tell me mbrtowc and the reverse wcrtomb functions are under the influence of LC_CTYPE.  It's that darn conspiracy again! 
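The idea behind the patch can be sketched as a tiny state machine.  To be clear, this is not the dvtm source: struct term, term_feed and the state names are invented for illustration, and only the is_utf8 flag name comes from dvtm.  A real emulator would hand every other escape sequence off to its existing handlers.

```c
#include <stdbool.h>

/* Hypothetical parser states for recognizing ESC % G and ESC % @. */
enum esc_state { ST_NORMAL, ST_ESC, ST_ESC_PERCENT };

struct term {
    enum esc_state state;
    bool is_utf8;           /* toggled by ESC % G (on) and ESC % @ (off) */
};

/* Feed one byte.  Returns true if the byte was swallowed as part of an
 * escape sequence, false if the caller should process it some other way. */
bool term_feed(struct term *t, unsigned char c)
{
    switch (t->state) {
    case ST_NORMAL:
        if (c == 0x1b) { t->state = ST_ESC; return true; }
        return false;
    case ST_ESC:
        if (c == '%') { t->state = ST_ESC_PERCENT; return true; }
        t->state = ST_NORMAL;   /* other escapes are out of scope here */
        return false;
    case ST_ESC_PERCENT:
        if (c == 'G')
            t->is_utf8 = true;
        else if (c == '@')
            t->is_utf8 = false;
        t->state = ST_NORMAL;
        return true;            /* eat the final byte instead of printing it */
    }
    return false;
}
```

The important bit is the last state: the final G or @ gets eaten along with the ESC% instead of being spat out onto the screen.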

Back to the internet for a workaround.  This sorta spells out the problem and it gives a hint about the leader of the conspiracy: glibc.  I'd bet uclibc doesn't even know about those locale files.  Anyhow, I found some magic C code to convert utf8 to utf16 with no mention of the evil LC_CTYPE conspiracy.  It worked somewhat.  Debugging showed everything converted ok inside dvtm, but ncurses still eats the final result, no matter what form I try to push through.  Probably in league with the conspiracy.  So now what to do?
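A locale-free decoder along those lines might look like the sketch below.  It is not the code I found, just an illustration of the technique; utf8_decode is a made-up name.  It yields a unicode code point, which for anything in the 0x0000-0xFFFF range (including all the box drawing glyphs) is the same number as the single utf16 unit.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (n bytes available) without touching
 * setlocale() or LC_CTYPE.  Stores the code point in *cp and returns the
 * number of bytes consumed, or 0 if the input is invalid or truncated. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c, min;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) {                      /* plain ASCII */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xe0) == 0xc0) {     /* 110xxxxx: 2 bytes */
        len = 2; c = s[0] & 0x1f; min = 0x80;
    } else if ((s[0] & 0xf0) == 0xe0) {     /* 1110xxxx: 3 bytes */
        len = 3; c = s[0] & 0x0f; min = 0x800;
    } else if ((s[0] & 0xf8) == 0xf0) {     /* 11110xxx: 4 bytes */
        len = 4; c = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;                           /* stray continuation byte */
    }

    if (n < len)
        return 0;
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xc0) != 0x80)          /* must be 10xxxxxx */
            return 0;
        c = (c << 6) | (s[i] & 0x3f);
    }
    /* Reject overlong forms, surrogates and out-of-range values. */
    if (c < min || c > 0x10ffff || (c >= 0xd800 && c <= 0xdfff))
        return 0;
    *cp = c;
    return len;
}
```

Feeding it e2 96 88 gives 0x2588 (the block), and e2 94 a4 gives 0x2524 (one of the box drawing tees).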

Maybe I should try to build a small ncurses example and get it working before moving back to dvtm. 

Or maybe I should do something about the nl_langinfo() function.  This function is used by both the dvtm vt.c init code and the ncurses ncurses/tinfo/lib_setup.c file to determine if the tty is in UTF-8 mode.  The nl_langinfo function is in the uClibc libc.so.0 library and I suspect it's just a stub, hardcoded to POSIX and C locales.  According to the uclibc FAQ you can compile in UTF-8 locales but that makes it significantly bigger, depending on the number of locales. (http://uclibc.org/FAQ.html)  The IZ2S libc is pretty small at just under 300K.  Plus you supposedly had to fetch uClibc-locale-030818.tgz manually to build the locale support, so I doubt it's in there.  Grep found the word POSIX, but no mention of UTF, LC, or LANG.  At least uClibc seems to be on my side, resisting the bloat, just not in a useful way that lets me use my font.

It looks like I could undefine HAVE_LANGINFO_CODESET and recompile ncurses so it uses an internal nc_get_locale() function instead.  But then I might have to undefine HAVE_LOCALE_H so nc_get_locale uses getenv instead of setlocale(LC_CTYPE, 0) which always returns "C" instead of "en_US.UTF-8" (or anything.UTF-8).  And what about the NCURSES_NO_UTF8_ACS environment variable hack?  I may need to set that once I'm in utf8 mode in ncurses.  There appear to be some special case hacks in the code to do this for gnu screen and when TERM=linux.

I ended up with a special version of nl_langinfo() that scans the environment vars for UTF-8.  I also bypassed setlocale() in the ncurses lib_setup.c file and now I see a UTF-8 locale.  But the meta and control conversions are taking precedence over the characters.  e2 94 a4 is printed as M-b ~T M-$.  And the block glyph e2 96 88 is printed as M-b ~V ~H.  Darn it!  Ncurses is a tough nut to crack!
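The environment-scanning replacement is simple enough to sketch.  This is an approximation, not my actual patch: env_codeset and mentions_utf8 are hypothetical names, the LC_ALL, LC_CTYPE, LANG lookup order is the usual POSIX precedence, and the "ASCII" fallback string is just a placeholder I picked for the sketch.

```c
#include <stdlib.h>
#include <string.h>

/* True if the locale string names a UTF-8 codeset in a common spelling. */
static int mentions_utf8(const char *s)
{
    return strstr(s, "UTF-8") || strstr(s, "utf-8") ||
           strstr(s, "UTF8")  || strstr(s, "utf8");
}

/* Hypothetical stand-in for nl_langinfo(CODESET): the first of LC_ALL,
 * LC_CTYPE, LANG that is set and non-empty decides the answer. */
const char *env_codeset(void)
{
    static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
    size_t i;

    for (i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *v = getenv(vars[i]);
        if (v && *v)
            return mentions_utf8(v) ? "UTF-8" : "ASCII";
    }
    return "ASCII";     /* nothing set: assume the plain C locale */
}
```

With LANG=en_US.UTF-8 in the environment and nothing else set, this reports "UTF-8" without ever consulting the locale database.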

Anyhow, dvtm is off the hook.  I can't even get a simple standalone ncursesw program to print utf8.  Worse, I can't even get the test programs that come with ncurses to print utf8!  I'm either building it wrong, or something in uclibc is broken.  Right now, I can't even tell if ncursesw is even using the wide char routines, though it is 20K larger than the non-wide ncurses.  Stupid speed optimization macros make it really hard to decipher the code.  Besides, it's more likely that some other fn like isprint or isascii from uclibc is steering ncurses the wrong way.  Apparently ncurses uses just about every single function that deals with text somewhere.  No wonder it's so fat.  And yes, it uses isprint() in multiple places, and the man page implies isprint() needs a working setlocale() to handle utf8.  However the uclibc implementation of isprint is a friggin ugly macro in ctype.h, so I can't easily replace it.  The macro doesn't look like it even considers utf-8.  That means multiple hacks will be needed in the obtuse ncurses code instead.  Ouch.

Meanwhile the ncurses source directory has a wcwidth.h file that looks suspiciously like the substitute wcwidth fn I dug up.  And there are tantalizing hints in the test directory that just maybe there's a configure option to use libutf8 instead of the libc defaults.  Is that just for the test programs?  Maybe I should try it.  Then if it works try it with my test program, and then dvtm if that works.  Unfortunately I think it's a hidden configure option.  configure --help=short gives no clue how to change it.

My list of hacks was getting sorta big, so I decided to try libutf8 to replace the whole pile of crap.  It's kinda big though at 160K, and the easier to use shared lib LD_PRELOAD plug version wouldn't link in my scratchbox toolchain.  So in the end, all I can say is that I'm still working on it...

Someday when I get things together I'll post the updated fonts, and maybe a new dvtm-wide.
