Macro Fig: utf8

Friday, April 13, 2012

Latin Lessons (and Other Things)

Wow, it's been over a month since I did any sort of real blog post. This must be what writer's block feels like. I've been off tinkering with so many different things that it's difficult to put enough coherent thoughts together for a decent post. After I put up my previous "Where's the Beef?" post, I went back and edited the dead links in all my old posts to redirect them to the "Where's the Beef?" message. It's quite a relief to be done with that. But that's just cleanup. Nothing new.

So why am I so scatterbrained? Well, I'm currently lugging 5 zipits around, all with different stuff on the internal flash. In addition, I've got a pocket full of micro/mini SD cards with various versions of IZ2S, Flashstock, openwrt, etc. And the pocket is really rough on the labels I've made for them. They keep fading or falling off, so I can no longer remember what's on what card. At least some of the zipits are still labelled, and I'm testing some of the official zipit skin stickers to see if they're more resilient, or easier to remember.

Zipit 1. I finally loaded Mozzwald's openwrt rescue onto a virgin zipit, as a prelude to loading slug's shiny looking gmenu/gmu rescue image. But now this particular zipit no longer likes any of my SD cards, so I'm back to being disappointed with uboot again. I could probably leave it at home in a drawer somewhere, but I keep hoping I'll come up with a brilliant plan to restore the blob bootloader and the stock kernel via the wifi.

Zipit 2. After I gave up on zipit 1, I decided to update a zipit from Mozzwald's initramfs uboot rescue image to slug's gmenu/gmu image. The initramfs was functional, but limited due to the high memory use of the filesystem in RAM. This zipit seems to have less trouble with the SD cards, even though it has the exact same uboot image. Weird. Anyhow, aside from my general misgivings about uboot, this one looks quite promising. However, testing revealed a problem where some component of gmu (sdl, alsa, pthreads, nanosleep, or whatever) is causing it to use 70% of the CPU instead of the expected 17% or so. I did a test build of rockbox for openwrt a while back which also performed poorly. I wonder if it had the same problem...

Zipit 3. I've been updating my gmenu/gmu image for the blob bootloader and the stock kernel. It's not as pretty as slug's, but it's still fun to work on. I'm currently fiddling with the wifi setup scripts in an effort to add options for multiple profiles. I've also started poking through slug's gmenu2x updates to see how hard it would be to backport the wifi setup dialogs and other changes. I've also done some work on running a 2nd tty so I can have gmu in the background and a clock on the screen, but running in another tty. Both ttyclock and dgclock seem to function with gmu in the background. However, this required me to map the home key as F1 so I could enter the magic ctrl-alt-F1 sequence recognized by SDL programs to escape to tty1. I created a t2 script to launch a new tty and a cls script to rotate the tty, turn on the cursor, and clear the screen, just in case it gets hosed somehow. I also added a deallocvt utility to the lcdval mcb so I can clean up abandoned ttys.

Zipit 4. I've been tinkering with the rockbox/glinks image that dronz whipped up. I built a smaller partial static glinks for sdl and added latin1 keyboard support so now there's a little bit of room to grow. Here it is with a totally fictional foreign URL.

I also built a slightly smaller mpg123 in case I want to grow this jffs into something with internet radio support. And I've been tinkering with the keymap because I really want to standardize on one sticky latin15 keymap and one non-stick keymap for all my IZ2S derivatives. I still need to do something about the other Fn keys though.

Zipit 5. This has my original jffs base image and plenty of free space for testing things while I attempt to shrink them down. For example, I used this one to test a latin1 build of pspmaps. I recycled the latin1 kbd routines from glinks and made pspmaps accept latin1 locations. Then I discovered RFC 3986, which says you now need to convert URLs to utf-8 before percent encoding and assembling your http GET request. So I finally got my foot in the door of a utf-8 app (well, half utf-8 anyhow). My son just went overseas with his classmates so I offered the zipit, just in case he needed to use it for directions. He took his ipod instead. C'est la vie...
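For the curious, the latin1 to utf-8 to percent-encoding step looks roughly like this. It's a minimal sketch, not the actual pspmaps code: the function name latin1_to_pct_utf8 is made up for illustration, and it assumes the input really is Latin-1 and that only the RFC 3986 "unreserved" characters should pass through untouched.

```c
#include <stddef.h>

/* RFC 3986 "unreserved" characters are sent as-is. */
static int is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

/* Hypothetical helper (not from pspmaps): convert a Latin-1 string to
 * UTF-8 and percent-encode the result in one pass.
 * Returns the encoded length, or -1 if out is too small. */
int latin1_to_pct_utf8(const char *in, char *out, size_t outsz)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;

    if (outsz == 0)
        return -1;

    for (; *in; in++) {
        unsigned char c = (unsigned char)*in;
        unsigned char u[2];
        size_t i, ulen;

        /* Latin-1 values equal Unicode code points, so anything above
         * 0x7F becomes a two-byte UTF-8 sequence. */
        if (c < 0x80) {
            u[0] = c;
            ulen = 1;
        } else {
            u[0] = (unsigned char)(0xC0 | (c >> 6));
            u[1] = (unsigned char)(0x80 | (c & 0x3F));
            ulen = 2;
        }

        for (i = 0; i < ulen; i++) {
            if (is_unreserved(u[i])) {
                if (n + 1 >= outsz)      /* keep room for the NUL */
                    return -1;
                out[n++] = (char)u[i];
            } else {
                if (n + 3 >= outsz)
                    return -1;
                out[n++] = '%';
                out[n++] = hex[u[i] >> 4];
                out[n++] = hex[u[i] & 0x0F];
            }
        }
    }
    out[n] = '\0';
    return (int)n;
}
```

Feed it the Latin-1 bytes for "Mâcon" (M, 0xE2, c, o, n) and you get "M%C3%A2con", which is the form a utf-8 aware server expects in the GET request.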

With the new pspmaps you can travel from Paris to Mâcon, without crossing the Atlantic to wind up in Macon, GA.

I also used htop on zipit 5 to do some baseline performance testing of gmu and mpg123 on IZ2S based systems. Mpg123 seems to use about 13% to 18% of the CPU to play and gmu about 2% more. While testing I discovered that I need a .bashrc file in the home directory on the jffs to set the terminfo path if I want to run curses programs like htop in an ssh session. Gotta add that to the next revision of all the IZ2S based jffs systems.

Finally, I'm currently investigating toybox and its oneit utility to see if it's more efficient than own-tty or this busybox sh trickery. Toybox is a tiny busybox alternative with some interesting features for my jffs images. It's fairly compact and comes with a nice set of utilities that complement the stock busybox without all the overhead of the full busybox replacement from IZ2S. I could use it to replace nc, killall, tee, and a few other standalone programs that I've built and get quite a bit more for just a few K. Since toybox comes with a mini bzcat, I figured I might as well look for mini versions of zip and gzip to round out the compression options. I found some, but more testing is needed.

There will be goodies when I track down where I posted them on the #zipit IRC channel...

links-latin-sdl-iz2s.zip
Here's the source, finally: lnks-sdl-iz2s-latin-src.tar.bz2
Here's a comprehensive patch for links-2.3 on the zipit: links-2.3-zipit.patch

pspmaps-utf8latin-iz2s.zip
pspmaps-utf8latin-iz2s-src.zip

mpg123-iz2jffs.zip

Here's toybox, deallocvt, and related scripts for using multiple vts (as shown below).
toybox-iz2jffs.zip

Source for the new lcdval mcb with deallocvt: ez2sutils-src.zip

For some reason I felt compelled to compile the lynx browser. lynx-iz2s.zip

Tuesday, March 6, 2012

Terminally Insane

This one is a long rambling mess. Sorry in advance...

I found a simple program at the Linux Productivity Magazine to display the curses ACS special characters. When I tried it I discovered that some looked wrong, and some were missing on the zipit. All of the box drawing characters were there, but the arrows were garbled, the "lantern" or vertical tab char was missing, and all of the 1 3 5 7 9 single scan line glyphs were missing except for the ones that do double duty as the dash and the underline.

I took a peek at the unicode table from my console font and discovered the unicode mappings for the scan line glyphs were way up in the Chinese section of the giant unicode table. So I moved them back down into the 0x2xxx range, rebuilt the font file and voila, the scan lines were there. Turns out ACS_LANTERN was just missing from my mcurses code. The arrows were a different story. They're not included in the 32 characters of the DEC special graphics set. I need to print a unicode character in order to display them, so for now I went with the default standby ASCII substitutes of <>^v for the arrows and # for ACS_BOARD and ACS_BLOCK.

It drives me nuts that I've got all these nice glyphs in the font and yet have no way to display them. So I went looking for some simple C code to print a unicode character. This turns out to be all tangled up with a monster sized locale and nls system on my desktop linux box.

There are about a dozen environment variables that control all sorts of settings. So I tried setting these up for US English with UTF-8 according to the notes on the internet. Not only did this have no effect, but when I try to set LC_CTYPE something (bash, I think) spits back an error message. WTF? It's just an environment variable! Shut up and set it. You might think it's some sort of conspiracy to force a huge bloated locale database onto the zipit in /usr/lib/locale. Ok, it's only 2.5M but that's too much for the jffs and I think the FAT filesystem of IZ2S will bloat that way up because it can't use soft or hard links to avoid duplication. Plus I really don't need the zipit to know what order I want my socks sorted, or what time in the afternoon to serve me tea. I just want to be able to print all of the characters I loaded into my font. Is that too much to ask?

Well I did some soul searching, and some google searching and eventually unearthed the magic escape sequences to put the terminal into UTF-8 mode and back (ESC%G and ESC%@). Armed with this knowledge I wrote a short C program to print the ACS_BLOCK character. And it worked!
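Something along these lines does the trick. This is a reconstruction under assumptions rather than the exact program: the function names are made up, and it assumes the block glyph is U+2588 (bytes e2 96 88 in UTF-8) and that the terminal honors ESC%G and ESC%@ the way the Linux console does.

```c
#include <stdio.h>

#define UTF8_ON   "\033%G"          /* ESC % G: put the terminal in UTF-8 mode */
#define UTF8_OFF  "\033%@"          /* ESC % @: switch back to the default     */
#define BLOCK     "\xe2\x96\x88"    /* U+2588 FULL BLOCK encoded as UTF-8      */

/* The whole byte stream: mode on, one glyph, newline, mode off. */
const char *utf8_block_demo(void)
{
    return UTF8_ON BLOCK "\n" UTF8_OFF;
}

/* Write the demo sequence to a stream; returns 0 on success, -1 on error.
 * fputs is used instead of printf so the '%' bytes are sent verbatim. */
int print_utf8_block(FILE *out)
{
    if (fputs(utf8_block_demo(), out) == EOF)
        return -1;
    return fflush(out) == EOF ? -1 : 0;
}
```

Call print_utf8_block(stdout) from main and you should see a single solid block on a terminal that understands the escapes. No locale calls are involved at all, which is the whole point.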

Well, it worked in my gnome term on the desktop linux, and it worked on the zipit linux console in both bash and the busybox ash shell. But it didn't work in gnu screen or dvtm. Fooey! Actually it turns out I can make it work in screen if I issue the ESC%G from a shell script first and then run "screen -U". Ok. I can live with that, but dvtm just eats the ESC% and spits out the G. (I can't remember, but I think this script may have finally let me type UTF-8 at the bash command prompt, instead of just in the busybox ash shell.)

Fortunately the dvtm code is pretty straightforward (unlike screen, which may have a secret built-in nethack mode?). I found the ESC handler code and added a little patch to eat the entire ESC%G sequence and toggle the internal is_utf8 flag. But then dvtm just prints three nonsense characters for my 0xe2,0x96,0x88 sequence instead of converting to the proper unicode BLOCK glyph. The man pages tell me mbrtowc and the reverse wcrtomb functions are under the influence of LC_CTYPE. It's that darn conspiracy again!

Back to the internet for a workaround. This sorta spells out the problem and it gives a hint about the leader of the conspiracy: glibc. I'd bet uclibc doesn't even know about those locale files. Anyhow, I found some magic c code to convert utf8 to utf16 with no mention of the evil LC_CTYPE conspiracy. It worked somewhat. Debugging showed everything converted ok inside dvtm, but ncurses still eats the final result, no matter what form I try to push through. Probably in league with the conspiracy. So now what to do?
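The conversion itself is only a handful of shifts and masks. Here's a minimal sketch of a locale-free decoder, not the code I actually found: the name utf8_decode_bmp is hypothetical, and it only handles the 1 to 3 byte forms, which is all a 16-bit result can hold anyway.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (at most len bytes) into *cp.
 * Returns the number of bytes consumed, or 0 on malformed input.
 * Only 1- to 3-byte forms are accepted, so the result always fits
 * a single UTF-16 code unit. No locale functions are involved. */
size_t utf8_decode_bmp(const unsigned char *s, size_t len, uint16_t *cp)
{
    uint32_t v;

    if (len == 0)
        return 0;

    if (s[0] < 0x80) {                       /* 0xxxxxxx: plain ASCII */
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {             /* 110xxxxx 10xxxxxx */
        if (len < 2 || (s[1] & 0xC0) != 0x80)
            return 0;
        v = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        if (v < 0x80)                        /* overlong encoding */
            return 0;
        *cp = (uint16_t)v;
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {             /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (len < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return 0;
        v = ((uint32_t)(s[0] & 0x0F) << 12) |
            ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        if (v < 0x800 || (v >= 0xD800 && v <= 0xDFFF))
            return 0;                        /* overlong or surrogate */
        *cp = (uint16_t)v;
        return 3;
    }
    return 0;                                /* 4-byte forms not handled */
}
```

The three bytes e2 96 88 come out as 0x2588, the block glyph, without LC_CTYPE ever entering the picture.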

Maybe I should try to build a small ncurses example and get it working before moving back to dvtm.

Or maybe I should do something about the nl_langinfo() function. This function is used by both the dvtm vt.c init code and the ncurses ncurses/tinfo/lib_setup.c file to determine if the tty is in UTF-8 mode. The nl_langinfo function is in the uClibc libc.so.0 library and I suspect it's just a stub, hardcoded to POSIX and C locales. According to the uclibc FAQ you can compile in UTF-8 locales but that makes it significantly bigger, depending on the number of locales. (http://uclibc.org/FAQ.html) The IZ2S libc is pretty small at just under 300K. Plus you supposedly had to fetch uClibc-locale-030818.tgz manually to build the locale support, so I doubt it's in there. Grep found the word POSIX, but no mention of UTF, LC, or LANG. At least uClibc seems to be on my side, resisting the bloat, just not in a useful way that lets me use my font.

It looks like I could undefine HAVE_LANGINFO_CODESET and recompile ncurses so it uses an internal nc_get_locale() function instead. But then I might have to undefine HAVE_LOCALE_H so nc_get_locale uses getenv instead of setlocale(LC_CTYPE, 0), which always returns "C" instead of "en_US.UTF-8" (or anything.UTF-8). And what about the NCURSES_NO_UTF8_ACS environment variable hack? I may need to set that once I'm in utf8 mode in ncurses. There appear to be some special case hacks in the code to do this for gnu screen and when TERM=linux.

I ended up with a special version of nl_langinfo() that scans the environment vars for UTF-8. I also bypassed setlocale() in the ncurses lib_setup.c file and now I see a UTF-8 locale. But the meta and control conversions are taking precedence over the characters. e2 94 a4 is printed as M-b ~T M-$. And the block glyph e2 96 88 is printed as M-b ~V ~H. Darn it! Ncurses is a tough nut to crack!
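The environment scan is simple enough to sketch. This is an illustration, not my actual patch: the function names are made up, and the LC_ALL, LC_CTYPE, LANG lookup order is the usual POSIX precedence, which I'm assuming is the sensible thing to mimic.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if the first non-empty variable among LC_ALL, LC_CTYPE and
 * LANG mentions UTF-8, else 0. Only the environment is consulted, so
 * no locale database is needed. */
int env_wants_utf8(void)
{
    static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
    size_t i;

    for (i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *v = getenv(vars[i]);
        if (v != NULL && *v != '\0')
            return strstr(v, "UTF-8") != NULL || strstr(v, "utf8") != NULL;
    }
    return 0;
}

/* Hypothetical stand-in for nl_langinfo(CODESET). The non-UTF-8 answer
 * is the name glibc reports for the C locale. */
const char *codeset_from_env(void)
{
    return env_wants_utf8() ? "UTF-8" : "ANSI_X3.4-1968";
}
```

With LANG=en_US.UTF-8 exported, codeset_from_env() answers "UTF-8", which is the string the ncurses and dvtm init code compare against.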

Anyhow, dvtm is off the hook. I can't even get a simple standalone ncursesw program to print utf8. Worse, I can't even get the test programs that come with ncurses to print utf8! I'm either building it wrong, or something in uclibc is broken. Right now, I can't even tell if ncursesw is even using the wide char routines, though it is 20K larger than the non-wide ncurses. Stupid speed optimization macros make it really hard to decipher the code. Besides, it's more likely that some other fn like isprint or isascii from uclibc is steering ncurses the wrong way. Apparently ncurses uses just about every single function that deals with text somewhere. No wonder it's so fat. And yes, it uses isprint() in multiple places, and the man page implies isprint() needs a working setlocale() to handle utf8. However the uclibc implementation of isprint is a friggin ugly macro in ctype.h, so I can't easily replace it. The macro doesn't look like it even considers utf-8. That means multiple hacks will be needed in the obtuse ncurses code instead. Ouch.
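The M-b ~V ~H output I mentioned earlier is what you'd expect if each byte is judged unprintable on its own and rendered in meta/control notation. Here's a little sketch that reproduces the pattern. It's inferred from the output, not lifted from the ncurses source, and show_byte is a made-up name.

```c
#include <stddef.h>
#include <stdio.h>

/* Render one byte in the notation seen in the garbled output:
 *   0x00-0x1F -> ^X     0x20-0x7E -> the character itself
 *   0x80-0x9F -> ~X     0xA0-0xFF -> M-x
 * buf should hold at least 5 bytes. This mirrors the observed output
 * only; it is an assumption about what ncurses does, not its code. */
void show_byte(unsigned char c, char *buf, size_t bufsz)
{
    if (c >= 0xA0) {
        unsigned char low = c & 0x7F;
        if (low == 0x7F)
            snprintf(buf, bufsz, "M-^?");
        else
            snprintf(buf, bufsz, "M-%c", low);
    } else if (c >= 0x80) {
        snprintf(buf, bufsz, "~%c", c - 0x80 + '@');
    } else if (c == 0x7F) {
        snprintf(buf, bufsz, "^?");
    } else if (c < 0x20) {
        snprintf(buf, bufsz, "^%c", c + '@');
    } else {
        snprintf(buf, bufsz, "%c", c);
    }
}
```

Run the bytes e2, 96, 88 through it one at a time and you get M-b, ~V and ~H, which matches the mess on my screen. So the utf8 bytes are reaching the output path intact, they're just never being treated as one multibyte character.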

Meanwhile the ncurses source directory has a wcwidth.h file that looks suspiciously like the substitute wcwidth fn I dug up. And there are tantalizing hints in the test directory that just maybe there's a configure option to use libutf8 instead of the libc defaults. Is that just for the test programs? Maybe I should try it. Then if it works try it with my test program, and then dvtm if that works. Unfortunately I think it's a hidden configure option. configure --help=short gives no clue how to change it.

My list of hacks was getting sorta big, so I decided to try libutf8 to replace the whole pile of crap. It's kinda big though at 160K, and the easier to use shared lib LD_PRELOAD plug version wouldn't link in my scratchbox toolchain. So in the end, all I can say is that I'm still working on it...

Someday when I get things together I'll post the updated fonts, and maybe a new dvtm-wide.