Creating an ASCII table

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#1 Post by MochiMoppel »

My next update of MMview will add more viewing options for binary files. Dealing with binaries involves identifying control characters and hex values, and I often felt the need to consult ASCII maps to find out what a specific hex value represents.

It's easy to find ASCII tables online but it seems that Puppy doesn't include an offline version, so I wrote one myself.

I started with a table generated with awk before I found that in older Puppies the file /usr/share/cups/charmaps/windows-1252.txt and in newer Puppies the file /usr/lib/aspell/cp1252.cset include descriptions for each character (though not the characters themselves). They also contain Unicode codepoints for characters in extended ASCII range 0x80-0x9F of codepage 1252. These codepoints do not correspond with the hex values of this range and therefore can't easily be generated otherwise.
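
To illustrate that last point: byte 0x80 in codepage 1252 is the euro sign, whose Unicode codepoint is U+20AC, not U+0080. A minimal sketch, assuming an iconv that knows the names CP1252 and UTF-16BE; the function name cp1252_codepoint is made up for this example:

```shell
# Print the Unicode codepoint of a single CP1252 byte given as hex (e.g. 80).
# cp1252_codepoint is a hypothetical helper, not part of MMview.
cp1252_codepoint() {
    # Turn the hex value into an octal escape so that printf emits the raw byte.
    octal=$(printf '%03o' "0x$1")
    # Convert the byte to big-endian UTF-16 and dump the result as hex digits.
    hex=$(printf "\\$octal" | iconv -f CP1252 -t UTF-16BE | od -An -tx1 | tr -d ' \n' | tr 'a-f' 'A-F')
    printf 'U+%s\n' "$hex"
}

cp1252_codepoint 41   # plain ASCII: prints U+0041, same as the byte value
cp1252_codepoint 80   # euro sign: prints U+20AC, unlike the byte value
```

Bytes that codepage 1252 leaves undefined (0x81, for example) may make iconv report an error instead of printing a codepoint.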

I intend to integrate the following script into MMview, but before I do I would like to know

1) if it contains unexpected bugs
2) if it could possibly be simplified
3) if either one of the 2 files I mentioned is indeed present in all Puppies, which would eliminate the need for the fall-back option /tmp/asciimap.tmp

Thanks.

Code:

#!/bin/bash
LOCATION_1=/usr/share/cups/charmaps/windows-1252.txt
LOCATION_2=/usr/lib/aspell/cp1252.cset
LOCATION_3=/usr/lib64/aspell/cp1252.cset
if   [[ -s $LOCATION_1 ]];then CHARMAP=$LOCATION_1
elif [[ -s $LOCATION_2 ]];then CHARMAP=$LOCATION_2
elif [[ -s $LOCATION_3 ]];then CHARMAP=$LOCATION_3
else #may never be needed as one of above should always be present
    CHARMAP=/tmp/asciimap.tmp
    awk '
    BEGIN {
    for (i=0  ;i<=127;i++) printf "%02X\t%04X\t%s\n",i,i,"#"
    for (i=128;i<=159;i++) printf "%02X\t%04X_<control_character>\t%s\n",i,i,"#"
    for (i=160;i<=255;i++) printf "%02X\t%04X\t%s\n",i,i,"#"
    }' > $CHARMAP
fi
LC_ALL=C awk '
BEGIN {
hline="---------------------------------------------------------"
print "=========== ASCII Table ==========\n\nDEC\tHEX\tCHR\tCODEPNT NAME\n"hline
}
/^[0-9A-F]/{                        # process only lines starting with hex characters
text=substr($0, match($0,/#/)+1)    # extract from position of first encountered #, plus 1 to excl. #
sub(/^ /,"",text)                   # remove leading space from text (cp1252.cset)
$1=substr($1, match($1,/x/)+1)      # remove 0x from 0xnn (windows-1252.txt) or keep nn (cp1252.cset)
dec=strtonum("0x" $1)               # hex to dec 
cpnt="U+"substr($2, match($2,/x/)+1)# remove 0x from 0xnnnn (windows-1252.txt) or keep nnnn (cp1252.cset)
if (cpnt ~ /#/) cpnt="U+00"$1       # if cpnt rendered as U+#UNDEFINED (windows-1252.txt)
if (dec<32||dec==127) char=""; else char=sprintf("%c",dec)
if (dec==7) text="BELL (esc \\a)"
if (dec==8) text="BACKSPACE (esc \\b)"
if (dec==9) {text="HORIZONTAL TABULATION (esc \\t)";char="TAB"}
if (dec==10)    {text="LINE FEED (esc \\n)";char="LF" }
if (dec==11)    text="VERTICAL TABULATION (esc \\v)"
if (dec==12)    text="FORM FEED (esc \\f)"
if (dec==13)    {text="CARRIAGE RETURN (esc \\r)";char="CR"}
if ($1==20) print hline"\nPrintable ASCII\n"hline
if ($1==80) print hline"\nExtended ASCII (example: codepage 1252)\n"hline
printf "%03d\t%s\t%s\t%s\t%s\n" ,dec,$1,char,cpnt,text
}' $CHARMAP | iconv -c -f CP1252 -t UTF-8 | gxmessage -file -
EDIT: Tentatively added LOCATION_3 for Fatdog users.
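
The script above uses the same substr/match idiom twice to accept both file formats. As a small sketch of why it works (strip_0x is a hypothetical wrapper, used here only for illustration): match() returns 0 when there is no "x", so substr() then starts at position 1 and returns the value unchanged.

```shell
# strip_0x is a hypothetical wrapper around the idiom used in the script above.
strip_0x() {
    awk -v v="$1" 'BEGIN {
        # match() returns the position of the first "x", or 0 if there is none,
        # so substr() starts either right after the "x" or at position 1.
        print substr(v, match(v, /x/) + 1)
    }'
}

strip_0x 0x80     # windows-1252.txt style, prints 80
strip_0x 80       # cp1252.cset style, prints 80
strip_0x 0x20AC   # 4-digit codepoint, prints 20AC
```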

.
Attachments
Screenshot.png
Screenshot shows output using /usr/share/cups/charmaps/windows-1252.txt
Output using /usr/lib/aspell/cp1252.cset would show different descriptions
(44.15 KiB) Downloaded 532 times
Last edited by MochiMoppel on Sat 05 Oct 2019, 03:06, edited 3 times in total.

williams2
Posts: 337
Joined: Fri 14 Dec 2018, 22:18

#2 Post by williams2 »

> it seems that Puppy doesn't include an offline version
BionicPup64 has /usr/local/bin/ascii.sh
It displays using gtk_text_info

The bash script is modified from a script here:
http://tldp.org/LDP/abs/html/asciitable.html
There is an awk script on that web page, too.

tallboy
Posts: 1760
Joined: Tue 21 Sep 2010, 21:56
Location: Drøbak, Norway

#3 Post by tallboy »

Hi MochiMoppel.
In both my Dpup Stretch-7.5 RC4, and Tahr64-6.0.6-uefi, there is no /usr/share/cups/charmaps/windows-1252.txt. There is no directory /usr/share/cups/charmaps/ at all, but there is one named /charset/.
The windows-1252.txt file is not found by pFind in any system file.

The file /usr/lib/aspell/cp1252.cset is there, in both Puppys.

Dpup Stretch-7.5 RC4 has /usr/local/bin/ascii.sh
True freedom is a live Puppy on a multisession CD/DVD.

step
Posts: 1349
Joined: Fri 04 May 2012, 11:20

#4 Post by step »

Hi MochiMoppel,

Fatdog64 has /usr/lib64/aspell/cp1252.cset but no /usr/share/cups/charmaps. There's /usr/share/cups/charsets but the files in there define font mappings. Perhaps the getunimap command might be of help:
> getunimap - dump the unicode map for the current console to stdout
[url=http://murga-linux.com/puppy/viewtopic.php?t=117546]Fatdog64-810[/url]|[url=http://goo.gl/hqZtiB]+Packages[/url]|[url=http://goo.gl/6dbEzT]Kodi[/url]|[url=http://goo.gl/JQC4Vz]gtkmenuplus[/url]

some1
Posts: 117
Joined: Thu 17 Jan 2013, 11:07

#5 Post by some1 »

On a decent distro :)

/usr/share/i18n/charmaps

-Contains even an edition of all unicodes
..
The compression used may vary between distros.

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#6 Post by MochiMoppel »

@williams2, @tallboy: Thanks for pointing me to ascii.sh. Not exactly what I was looking for, but good to know that an ASCII table exists in some Puppies.
williams2 wrote:The bash script is modified from a script here:
http://tldp.org/LDP/abs/html/asciitable.html
There is an awk script on that web page, too.
Makes me wonder why the awk script hasn't been used for the modification. Would be about 20 times faster.

@tallboy: I expect that only one of the 2 files would be present but not both. I edited my post to make it clearer (?).

@step: So the answer to my question 3) is no? Would the script work for you when replacing /usr/lib/aspell/cp1252.cset with /usr/lib64/aspell/cp1252.cset ?
On my system there is no getunimap command.

@some1: /usr/share/i18n/charmaps doesn't seem to contain charmap 1252, which is strange since 1252 is said to be the most widely used charmap. Though ISO-8859-1 comes close it is not the same.
As you already mentioned differing compression methods could be another show stopper. Would be interesting to know if indeed differences exist between distros.

some1
Posts: 117
Joined: Thu 17 Jan 2013, 11:07

#7 Post by some1 »

https://en.wikipedia.org/wiki/Windows-1252

Yes I know about the differences.

The € -might be handy - but who needs the flyspecs? :)

One may note - that 2% seem to like the ISO,6% the Windows codepage--
the rest can do without.

Anyway . nice awk and idea.

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#8 Post by MochiMoppel »

some1 wrote:One may note - that 2% seem to like the ISO,6% the Windows codepage--
the rest can do without.
According to your link it's even much less. UTF-8 rulez!
Nevertheless when searching for ways to display old MS Word or WordPerfect documents I found that the "body texts" of such documents often contain strange non-ASCII characters, represented in hexdump as periods. That's where your flyspecks come into play. All sorts of curly, left, right and who knows what quotation marks were in use and cp1252 helps to identify them.

rufwoof
Posts: 3690
Joined: Mon 24 Feb 2014, 17:47

#9 Post by rufwoof »

some1 wrote:The € -might be handy
On my UK laptop keyboard the € is AltGr 4 (alt to the right of the spacebar). Used to near never use AltGr other than for the € .. and even then very rarely. Nowadays however I use AltGr SPACE regularly as both that and the regular Alt SPACE launches my program launcher (xlunch). WIN SPACE launches skippy-xd (live) window selector. jwmrc snippet ...

Code:

    <Key mask="C" key="Down">exec:amixer -c 1 set Master 2%- </Key>
    <Key mask="C" key="Up">exec:amixer set -c 1 Master 2%+ </Key>
    <Key mask="C" key="0">exec:amixer -c 1 sset Master,0 toggle </Key>
    <Key mask="4" key="space">exec:skippy-xd</Key>
    <Key mask="A" key="space">exec:/usr/local/bin/xlunch-show.sh</Key>
    <Key mask="5" keycode="65">exec:/usr/local/bin/xlunch-show.sh</Key> # AltGr space
Those key combinations for window selecting and program launching alongside the touchpad fits well for me. More often use that with the arrow keys for program selecting than I do use the touchpad. Also blends well IMO with the laptop's rightmost vertical strip of keys for jumping to the top of a web page (HOME), bottom (END) and Page Up/Down (or arrow up/down).
[size=75]( ͡° ͜ʖ ͡°) :wq[/size]
[url=http://murga-linux.com/puppy/viewtopic.php?p=1028256#1028256][size=75]Fatdog multi-session usb[/url][/size]
[size=75][url=https://hashbang.sh]echo url|sed -e 's/^/(c/' -e 's/$/ hashbang.sh)/'|sh[/url][/size]

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#10 Post by MochiMoppel »

No response from step yet, so maybe a fatdog user can answer my question to him.
I don't know fatdog, but in another 64bit Puppy (bionicpup64-8.0) the directory /usr/lib64/aspell is symlinked to /usr/lib/aspell, so in bionicpup64-8.0 there shouldn't be any problems. Same as in fatdog?

rufwoof
Posts: 3690
Joined: Mon 24 Feb 2014, 17:47

#11 Post by rufwoof »

Fatdog ...

# pwd
/usr/lib64
# ls -l aspell
lrwxrwxrwx 1 root root 11 Jul 31 19:19 aspell -> aspell-0.60

... and ...

/usr/lib has no aspell entry at all

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#12 Post by MochiMoppel »

Thanks. I added LOCATION_3 to the script above and *assume* that this will work for Fatdog users. I can't test it.
If there are no other locations in the Puppy universe I can remove the fall-back solution.
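
With the list of candidate locations growing, the if/elif chain could also be written as a loop. This is only a sketch: first_nonempty is a hypothetical helper name, and the paths are the three mentioned so far in this thread.

```shell
# first_nonempty is a hypothetical helper: print the first argument that is
# an existing, non-empty file; return non-zero if none qualifies.
first_nonempty() {
    for f in "$@"; do
        if [ -s "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

CHARMAP=$(first_nonempty \
    /usr/share/cups/charmaps/windows-1252.txt \
    /usr/lib/aspell/cp1252.cset \
    /usr/lib64/aspell/cp1252.cset) || CHARMAP=   # empty if no candidate exists
echo "CHARMAP=${CHARMAP:-<none found>}"
```

Adding another location then means adding one more line to the argument list, and the empty CHARMAP case is where a fall-back (if still wanted) would go.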

rufwoof
Posts: 3690
Joined: Mon 24 Feb 2014, 17:47

#13 Post by rufwoof »

In my (wiak's build scripts) voidlinux, aspell is in /sbin

charmaps are in /usr/share/i18n/charmaps

step
Posts: 1349
Joined: Fri 04 May 2012, 11:20

#14 Post by step »

Hi MochiMoppel, I tested the script with LOCATION_3 in Fatdog64 and it works. Thanks.

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#15 Post by MochiMoppel »

@rufwoof: "aspell is in /sbin"? :shock: Are you sure you mean the directory aspell and not the executable file with the same name?

@step: Thanks for testing.
some1 wrote:On a decent distro :)
/usr/share/i18n/charmaps
Seems that at least the location is consistent in all Puppies. Let's give it a try. I adapted the script and I'm surprised that, despite the decompression involved, it's pretty fast. As expected, not all charmaps are supported by iconv (at least not by my iconv version). I tested all of them and listed which ones work with the script:

Code:

File                works
---------------------------
ANSI_X3.4-1968.gz   yes
CP737.gz            no
CP775.gz            no
IBM437.gz           no
IBM850.gz           yes
IBM852.gz           no
IBM855.gz           no
IBM857.gz           no
IBM860.gz           no
IBM861.gz           no
IBM862.gz           no
IBM863.gz           no
IBM865.gz           no
IBM866.gz           no
IBM866NAV.gz        no
IBM869.gz           no
ISO-8859-1.gz       yes
ISO-8859-2.gz       yes
ISO-8859-10.gz      yes
ISO-8859-11.gz      yes
ISO-8859-13.gz      yes
ISO-8859-14.gz      yes
ISO-8859-15.gz      yes
ISO-8859-16.gz      yes
UTF-8.gz            n.a.
My favorite is IBM850 because it's the only one with printable characters in hex range 80~9F.
ISO-8859-15 and ISO-8859-16 include the EURO sign.
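
A rough way to pre-check such a list is to ask iconv whether it accepts each charmap name at all. This is a sketch, not the test used for the table above: iconv_knows is a hypothetical helper, and "iconv recognises the name" is a weaker condition than "works with the script".

```shell
# iconv_knows is a hypothetical helper: succeed if iconv accepts NAME as a
# source encoding. Empty input is used, so only the name lookup is tested.
iconv_knows() {
    iconv -f "$1" -t UTF-8 </dev/null >/dev/null 2>&1
}

# List every charmap file in the directory with a yes/no verdict.
for f in /usr/share/i18n/charmaps/*.gz; do
    [ -e "$f" ] || continue            # directory missing or empty
    name=$(basename "$f" .gz)
    if iconv_knows "$name"; then verdict=yes; else verdict=no; fi
    printf '%-20s %s\n' "$name" "$verdict"
done
```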

I hope that this script works in all Puppies :

Code:

#!/bin/bash
CODEPAGE=IBM850     # name of a charmap in /usr/share/i18n/charmaps that iconv also knows
FONTSIZE=10

# The charmap files carry a .gz suffix; GNU gzip appends it automatically when decompressing
gzip -cd /usr/share/i18n/charmaps/$CODEPAGE |  LC_ALL=C gawk -v cp=$CODEPAGE '
BEGIN {
hline="---------------------------------------------------------"
print "======== CODEPAGE "cp" =======\n\nDEC\tHEX\tCHR\tCODEPNT NAME\n"hline
}
/^<U.*> /{                          # process only mapping lines: <Unnnn> /xnn NAME
utf="U+"substr($1,3,4)              # <Unnnn> to U+nnnn
hex=substr($2,3)                    # /xnn to nn
dec=strtonum("0x" hex)              # hex to dec (strtonum requires gawk)
txt=substr($0,index($0,$3))         # name: everything from the 3rd field onward
if (dec<32||dec==127) char=""; else char=sprintf("%c",dec)  # no glyph for control characters
if (dec==9)  char="TAB"
if (dec==10) char="LF"
if (dec==13) char="CR"
if (dec==32)  print hline"\nPrintable ASCII\n"hline
if (dec==128) print hline"\nExtended  ASCII\n"hline
printf "%03d\t%s\t%s\t%s\t%s\n",dec,hex,char,utf,txt
}' | iconv -c -f $CODEPAGE -t UTF-8 2>&1 | gxmessage -title "CODEPAGE $CODEPAGE" -c -fn $FONTSIZE -file -
EDIT1: Changed awk to gawk and /<U.*> /{ to /^<U.*> /{
EDIT2: Added LC_ALL=C
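
For readers without the charmap files at hand, here is a sketch of what the field extraction does to a single line. The sample line is typed in by hand in the layout the script expects (<Unnnn>, /xnn, name); parse_charmap_line is a hypothetical name, and only plain awk functions are used, so it should not need gawk.

```shell
# parse_charmap_line is a hypothetical helper that applies the same field
# extraction as the script above to lines read from stdin.
parse_charmap_line() {
    awk '/^<U.*> / {
        utf = "U+" substr($1, 3, 4)        # <U00E9> to U+00E9
        hex = substr($2, 3)                # /xe9 to e9
        txt = substr($0, index($0, $3))    # name, from the 3rd field onward
        printf "%s %s %s\n", utf, hex, txt
    }'
}

# Hand-written sample line, not copied from a real charmap file.
printf '%s\n' '<U00E9>     /xe9         LATIN SMALL LETTER E WITH ACUTE' | parse_charmap_line
# prints: U+00E9 e9 LATIN SMALL LETTER E WITH ACUTE
```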
Attachments
Screenshot.png
(57.58 KiB) Downloaded 222 times
Last edited by MochiMoppel on Tue 08 Oct 2019, 14:39, edited 2 times in total.

rufwoof
Posts: 3690
Joined: Mon 24 Feb 2014, 17:47

#16 Post by rufwoof »

MochiMoppel wrote:@rufwoof: "aspell is in /sbin"? :shock: Are you sure you mean the directory aspell and not the executable file with the same name?
Directory/folder for aspell is /usr/lib/aspell-0.60/

There's also an aspell file in /usr/bin that has the exact same file size as the one in /sbin.

step
Posts: 1349
Joined: Fri 04 May 2012, 11:20

#17 Post by step »

MochiMoppel wrote:
> I hope that this script works in all Puppies

It works in Fatdog64, with the caveat that iconv prints some weird character in front of the good one, see the screenshot. Btw, the script requires gawk due to strtonum not being POSIX awk.
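
If the gawk dependency is unwanted, the hex conversion can be done with functions that every POSIX awk has. A minimal sketch, with hex2dec as a hypothetical replacement for strtonum("0x" hex):

```shell
# hex2dec is a hypothetical helper showing a strtonum-free conversion that
# should run in any POSIX awk (gawk, mawk, busybox awk).
hex2dec() {
    awk -v h="$1" '
    function hex2dec(s,    i, d, n) {
        s = toupper(s)
        n = 0
        for (i = 1; i <= length(s); i++) {
            d = index("0123456789ABCDEF", substr(s, i, 1)) - 1
            if (d < 0) return -1          # not a hex digit
            n = n * 16 + d
        }
        return n
    }
    BEGIN { print hex2dec(h) }'
}

hex2dec 80    # prints 128
hex2dec ff    # prints 255
```

Note that this only removes the strtonum dependency; how sprintf("%c") treats values above 127 can still differ between awk implementations.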
Attachments
codepage-20191007-001.png
(39.94 KiB) Downloaded 391 times

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#18 Post by MochiMoppel »

Not only does your iconv print weird characters (┬ U+252C and ├ U+251C) in front of the good ones, it even stops printing good ones. Starting with dec 192 it stops printing BOX DRAWINGS and prints what should start with hex 80 (dec 128); see my previous screenshot.
What happens when you use the iconv command of the first script (CP1252) and leave the rest as is? Should produce wrong names but good characters. I'm using iconv (GNU libc) 2.15

Thanks for mentioning the gawk requirement. Here awk is symlinked to gawk so I would never notice.

some1
Posts: 117
Joined: Thu 17 Jan 2013, 11:07

#19 Post by some1 »

Just a hunch/idea reading the latest posts.
1. set LC_ALL=C gawk
2. check the manual about (s)printf %c

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#20 Post by MochiMoppel »

some1 wrote: 1. set LC_ALL=C gawk
Oh yes, sorry. Forgot that in the second script (it's in the first and that made it work for step).

The reason for using LC_ALL=C: gawk 4.1.4 or newer requires the -b option (--characters-as-bytes) when printing extended ASCII. Older gawks (like mine) neither need nor know this option and produce a syntax error when it is used.
LC_ALL=C comes to the rescue and removes the need for an option.
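
A quick way to see the effect is to look at the bytes that printf "%c" actually emits. This is a sketch; awk_char_bytes is a hypothetical helper, and it relies on od and tr being available.

```shell
# awk_char_bytes is a hypothetical helper: print, as hex, the bytes that awk
# emits for printf "%c" with the given number when run in the C locale.
awk_char_bytes() {
    LC_ALL=C awk -v n="$1" 'BEGIN { printf "%c", n + 0 }' | od -An -tx1 | tr -d ' \n'
    echo
}

awk_char_bytes 192    # one byte: c0

# In a UTF-8 locale, gawk treats 192 as codepoint U+00C0 and emits the two
# bytes C3 80 instead. Decoding those two bytes as IBM850 reproduces the
# stray leading character reported above (assumes iconv supports IBM850):
#   printf '\303\200' | iconv -f IBM850 -t UTF-8
```

With glibc's iconv the commented command should print ├ followed by Ç, that is, the stray ├ plus the character that belongs to hex 80, which matches what step's screenshot showed from dec 192 onward.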

I edited the script. Should work now. Thanks for your reminder.
