(OLD) (ARCHIVED) Puppy Linux Discussion Forum
Puppy HOME page : puppylinux.com
"THE" alternative forum : puppylinux.info

This forum can also be accessed as http://oldforum.puppylinux.com
It is now read-only and serves only as archives.

Please register on the NEW forum
https://forum.puppylinux.com
and continue your work there. Thank you.

 
All times are UTC - 4
Building MMview, a universal file viewer
step

Joined: 04 May 2012
Posts: 1352

PostPosted: Sun 19 Apr 2020, 13:10    Post subject:  

Perhaps it would begin to make sense if we suspected that letter Ì's encoding includes a null byte and awk is reading the string byte-wise rather than character-wise.
I can't reproduce your findings. I copied the four lines in your post and pasted them into a file, then ran the awk commands (no grep) with the file as input. LC_ALL is unset, my locale is English for the US, en_US.UTF-8. What's yours?

Edit: Here, too, mm_view's Find function prints the input file correctly when given "GetHelpSourceDialog" as the search term.

This post discusses a similar issue specifically about GNU awk. Basically it suggests that the input data forms invalid text in the OP's locale, therefore awk produces invalid output.

_________________
Fatdog64-810|+Packages|Kodi|gtkmenuplus
MochiMoppel


Joined: 26 Jan 2011
Posts: 2084
Location: Japan

PostPosted: Tue 21 Apr 2020, 06:08    Post subject:  

step wrote:
LC_ALL is unset, my locale is English for the US, en_US.UTF-8. What's yours?
Same
Quote:
I copied the four lines in your post and pasted them into a file, then ran the awk commands (no grep) with the file as input.
If that's all you did then it's not a valid comparison. Your file will be UTF-8 encoded, no problem for awk. You have to change to ISO-8859-1.

When I create this script in Geany
    #!/bin/bash
    awk 'BEGIN {print substr("MenuOkÌ128ÍselfÎGetHelpSourceDialog",1)}'
and run with F5 the result looks perfect: MenuOkÌ128ÍselfÎGetHelpSourceDialog.
Now in Geany I select Menu > Document > Set Encoding > West European > Western (ISO-8859-1) and run the script again. All I get is MenuOk.

My grep version: 2.14
My awk version: 3.1.
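
A minimal sketch of the setup described above, assuming a POSIX shell with printf, mktemp and iconv available. The file names and the is_valid_utf8 helper are hypothetical; the octal escapes are the UTF-8 and ISO-8859-1 byte values of Ì, Í and Î. It only shows that the ISO-8859-1 copy of the test string is not valid UTF-8, which is the precondition for the truncated awk output.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the file decodes cleanly as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

utf8_file=$(mktemp)
latin1_file=$(mktemp)

# Same text, two encodings. In UTF-8, Ì Í Î are the byte pairs C3 8C, C3 8D, C3 8E ...
printf 'MenuOk\303\214128\303\215self\303\216GetHelpSourceDialog\n' > "$utf8_file"
# ... and in ISO-8859-1 they are the single bytes CC, CD, CE.
printf 'MenuOk\314128\315self\316GetHelpSourceDialog\n' > "$latin1_file"

for f in "$utf8_file" "$latin1_file"; do
    if is_valid_utf8 "$f"; then
        echo "$f: valid UTF-8"
    else
        echo "$f: not valid UTF-8"
    fi
done
```

Pasting text from the forum into an editor normally produces the first kind of file, which may be why the problem does not show up without changing the encoding.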

Quote:
Edit: Here, too, mm_view's Find function prints the input file correctly when given "GetHelpSourceDialog" as the search term.
Are you saying that MMview displays all 4 lines correctly? That would be close to a miracle, since the edit widget is unable to display files other than plain ASCII or UTF-8. Trying to read ISO-8859-1 results in an error. Or do you mean that MMview (like in my screenshot) prints the input file incorrectly? What exactly is your output in the viewer pane when you search in directory /usr/share for "GetHelpSourceDialog"?

Quote:
This post discusses a similar issue specifically about GNU awk. Basically it suggests that the input data forms invalid text in the OP's locale, therefore awk produces invalid output.
Probably related.
step

Joined: 04 May 2012
Posts: 1352

PostPosted: Tue 21 Apr 2020, 09:20    Post subject:  

So we use the same locale en_US.UTF-8.
MochiMoppel wrote:

My grep version: 2.14
My awk version: 3.1.


Mine:
# awk --version
GNU Awk 4.2.0, API: 2.0 (GNU MPFR 4.0.1, GNU MP 6.1.2)
# grep --version
grep (GNU grep) 3.1

Gawk 4.0.0 was released in the second half of 2011 (source).
GNU awk version history: https://www.gnu.org/software/gawk/manual/html_node/Feature-History.html.
[Attachment: encoding-20200421.gif] geany before saving the file. Text selected and pasted directly from the forum. File encoding (in status line) automatically set by geany, not by me.
[Attachment: encoding-20200421-001.gif] geany after saving the file: file encoding unchanged.
[Attachment: encoding-20200421-002.gif] File contents (I forgot I could use mm_view to show hex bytes).
[Attachment: encoding-20200421-003.gif] awk: garbage in, garbage out, but with a warning.
[Attachment: encoding-20200421-004.gif] geany terminal: grep does find a match, but it's binary data, so grep doesn't print it, as the grep manual explains.
[Attachment: encoding-20200421-005.gif] geany terminal: add -a to grep to force printing binary file matches.
[Attachment: encoding-20200421-006.gif] mm_view search input dialog.
[Attachment: encoding-20200421-007.gif] mm_view text pane after search.
[Attachment: encoding-20200421-008.gif] mm_view status line after search.


some1

Joined: 17 Jan 2013
Posts: 123

PostPosted: Wed 22 Apr 2020, 05:31    Post subject:  

(I am far away from Linux, iconv and awk, so this is
just a possible idea/workaround to try.)


Assuming a UTF-8 locale:
iconv -f ISO-8859-1 < ISO-8859-1-data.txt | awk ....

Will the iconv conversion produce valid data for awk?
step

Joined: 04 May 2012
Posts: 1352

PostPosted: Wed 22 Apr 2020, 17:24    Post subject:  

@some1: Yes, it helps grep
Code:

# iconv -t UTF-8 -f ISO-8859-1 < geany1.sh | grep Help
awk 'BEGIN {print substr("MenuOkÌ128ÍselfÎGetHelpSourceDialog",1)}'
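
A hedged sketch of how the conversion could be made conditional, so that input that is already valid UTF-8 passes through untouched and only input that fails UTF-8 validation is converted. The to_utf8 name, the sample file, and the assumption that every non-UTF-8 file is ISO-8859-1 are illustrative only, not part of mm_view.

```shell
#!/bin/sh
# Hypothetical helper: print a file as UTF-8.
# Assumption: anything that is not valid UTF-8 is ISO-8859-1.
to_utf8() {
    if iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
        cat "$1"                             # already UTF-8 (or plain ASCII)
    else
        iconv -f ISO-8859-1 -t UTF-8 "$1"    # re-encode the legacy bytes
    fi
}

# Example: normalize an ISO-8859-1 sample, then search it as in the pipeline above.
sample=$(mktemp)
printf 'MenuOk\314128\315self\316GetHelpSourceDialog\n' > "$sample"
to_utf8 "$sample" | grep Help
```

The validation step is what keeps -f ISO-8859-1 away from files that are already UTF-8; the price is that each file is read twice.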

MochiMoppel


Joined: 26 Jan 2011
Posts: 2084
Location: Japan

PostPosted: Wed 22 Apr 2020, 22:53    Post subject:  

Thanks for the feedback and sorry for the late reply. I had no access to my computer.

@step: Your screenshot shows that you are using an outdated MMview version, making it impossible to answer my question ("What exactly is your output in the viewer pane when you search in directory /usr/share for "GetHelpSourceDialog" ?"). My problem occurs with the newest version and its file text search function. The old version doesn't have this capability and doesn't use awk - at least not for this purpose.

Still it's interesting to see that awk has changed, unfortunately not for the better.
If awk now issues an "Invalid multibyte data detected" warning, I assume that it does so on stderr, which means that nothing is output to stdout and no match will be reported in MMview. Previously, awk at least tried to output the portion it could read, which was better than nothing.

With the new grep we don't have to worry about encoding anymore. In its old version grep treats ISO-8859-1 data as text, which would be in line with mime text/plain detection by the file command. Your screenshot shows that the newer grep now treats it as binary. The terse "Binary file blabla matches" line is all we get, no matching text line. Always ASCII and therefore no problem for awk.
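
A small sketch of the grep behaviour discussed here, assuming GNU grep. With ISO-8859-1 bytes in a UTF-8 locale, newer versions may report only that a binary file matches, while -a (--text) forces the matching line to be printed. The sample file is hypothetical.

```shell
#!/bin/sh
sample=$(mktemp)
printf 'MenuOk\314128\315self\316GetHelpSourceDialog\n' > "$sample"

# Depending on grep version and locale, this prints either the matching
# line or only a "Binary file ... matches" notice.
grep GetHelpSourceDialog "$sample"

# -a treats the file as text, so the matching line itself is printed,
# including the bytes that are invalid in a UTF-8 locale.
grep -a GetHelpSourceDialog "$sample"
```

Note that the line printed by -a still carries the raw ISO-8859-1 bytes, so anything downstream that expects UTF-8 sees the same invalid data.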

@some1
yes, I think iconv is the only reliable way to avoid encoding problems with awk and gtkdialog, though not the way you and step tried to use it. We can't use -f ISO-8859-1 here, as this would destroy all UTF-8 characters of the input file. Instead I don't use -f, only -t.
In mm_view I collect the grep/awk output in tmpfile; when it's done I move it to msgfile for display in the viewer pane. This was done with a simple mv tmpfile msgfile command. If tmpfile contains even a single ISO-8859-1 or otherwise invalid UTF-8 character, the msgfile is spoiled and unreadable for gtkdialog. I therefore replaced the mv command with
iconv -ct UTF-8 tmpfile > msgfile. Seems to work. It reads tmpfile, irrespective of its encoding (most of the time it will be UTF-8), and translates it into UTF-8 (most of the time needlessly, but who cares, it's a precaution). The important part is the -c option: all invalid UTF-8 characters, e.g. the ones in ISO-8859-1 files, will be removed and only ASCII and valid UTF-8 remain.
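
A minimal sketch of that replacement, with one assumption made explicit: -f UTF-8 is spelled out here, whereas the command in the post omits -f and so relies on the locale's encoding. The tmpfile and msgfile variables stand in for mm_view's real paths, and the sample content is made up. iconv may return a non-zero exit status when it drops bytes, even with -c, so the status is ignored.

```shell
#!/bin/sh
tmpfile=$(mktemp)
msgfile=$(mktemp)

# One valid UTF-8 line and one line containing ISO-8859-1 bytes.
printf 'valid: \303\214\n' > "$tmpfile"
printf 'MenuOk\314128\315self\316GetHelpSourceDialog\n' >> "$tmpfile"

# -c drops byte sequences that cannot be decoded as UTF-8.
# The exit status is ignored because dropped bytes may be reported as an error.
iconv -c -f UTF-8 -t UTF-8 "$tmpfile" > "$msgfile" || true

cat "$msgfile"
```

The valid line survives unchanged; the second line loses its three undecodable bytes but keeps the ASCII text around them, which is what the viewer pane needs.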
rockedge


Joined: 11 Apr 2012
Posts: 1874
Location: Connecticut, United States

PostPosted: Sat 16 May 2020, 17:31    Post subject:  

I was looking for a font while I had mm_view running, and now I wish I could examine fonts as they would appear, using mm_view, as well as see the actual content of the TTF.

Crazy, right? Just a thought. I like using the program.
MochiMoppel


Joined: 26 Jan 2011
Posts: 2084
Location: Japan

PostPosted: Sun 17 May 2020, 01:24    Post subject:  

I don't know what you mean by "seeing the actual content". There is not much to see in a binary font file.
You could examine the files with tools like fc-query. Gives you more info than you probably need.

Speaking of fonts I also have a question: Does anyone know how to convert the "charset" values into Unicode code points?
What I want to know and display are the Unicode blocks a given font file supports. DejaVu Sans supports a lot, others, e.g. Type1 Dingbats, very few. My only problem is the conversion.
[Attachment: fcquery.png]

SFR


Joined: 26 Oct 2011
Posts: 1802

PostPosted: Sun 17 May 2020, 06:30    Post subject:  

MochiMoppel wrote:
I don't know what you mean by "seeing the actual content".

I might be wrong, but perhaps what Rockedge had in mind is previewing a specific font, e.g. something like:
Code:
#!/bin/sh

gen_line() {
   fc-list : family | xargs -I{} echo '
      <text use-markup="true" wrap="false">
         <label>"<span font='"'{}'"' size='"'x-large'"'>The quick brown fox jumps over the lazy dog.</span>"</label>
      </text>
   '
}

echo '
<window>
   <vbox visible="false">
      '$(gen_line)'
      <variable>vVBOX</variable>
   </vbox>
   <action signal="map-event">show:vVBOX</action>
</window>
' | gtkdialog -s

# showing vVBOX on map-event, because if it's visible from the beginning,
# the 'wrap="false"' attribute for <text> causes an ugly, empty space
# in the bottom part of the window


MochiMoppel wrote:
Speaking of fonts I also have a question: Does anyone know how to convert the "charset" values into Unicode code points?
What I want to know and display are the Unicode blocks a given font file supports. DejaVu Sans supports a lot, others, e.g. Type1 Dingbats, very few. My only problem is the conversion.

Again, not sure if that's what you want, but take a look at this answer: https://stackoverflow.com/a/60475015
I only changed fc-match to fc-query to operate on specific files and it looks like this:
Code:
# ./ls-chars Fonts/OLDE_PL.TTF
0020     0021 !   0022 "   0023 #   0024 $   0025 %   0026 &   0027 '   0028 (   0029 )   
002a *   002b +   002c ,   002d -   002e .   002f /   0030 0   0031 1   0032 2   0033 3   
0034 4   0035 5   0036 6   0037 7   0038 8   0039 9   003a :   003b ;   003c <   003d =   
003e >   003f ?   0040 @   0041 A   0042 B   0043 C   0044 D   0045 E   0046 F   0047 G   
0048 H   0049 I   004a J   004b K   004c L   004d M   004e N   004f O   0050 P   0051 Q   
0052 R   0053 S   0054 T   0055 U   0056 V   0057 W   0058 X   0059 Y   005a Z   005b [   
005c \   005d ]   005e ^   005f _   0060 `   0061 a   0062 b   0063 c   0064 d   0065 e   
0066 f   0067 g   0068 h   0069 i   006a j   006b k   006c l   006d m   006e n   006f o   
0070 p   0071 q   0072 r   0073 s   0074 t   0075 u   0076 v   0077 w   0078 x   0079 y   
007a z   007b {   007c |   007d }   007e ~   008f    00a0     00a2 ¢   00a3 £   00a5 ¥   
00a8 ¨   00a9 ©   00ae ®   00af ¯   00b0 °   00b3 ³   00b4 ´   00b6 ¶   00b7 ·   00b9 ¹   
00bf ¿   00c6 Æ   00ca Ê   00d1 Ñ   00d3 Ó   00e6 æ   00ea ê   00f1 ñ   00f3 ó   0104 Ą   
0105 ą   0106 Ć   0107 ć   0118 Ę   0119 ę   0141 Ł   0142 ł   0143 Ń   0144 ń   0152 Œ   
0153 œ   015a Ś   015b ś   0178 Ÿ   0179 Ź   017a ź   017b Ż   017c ż   02c9 ˉ   2010 ‐   
2013 –   2014 —   2018 ‘   2019 ’   201c “   201d ”   2022 •   2026 …   2122 ™   2219 ∙   

#

Greetings!

_________________
[O]bdurate [R]ules [D]estroy [E]nthusiastic [R]ebels => [C]reative [H]umans [A]lways [O]pen [S]ource
Omnia mea mecum porto.
MochiMoppel


Joined: 26 Jan 2011
Posts: 2084
Location: Japan

PostPosted: Sun 17 May 2020, 11:24    Post subject:  

SFR wrote:
# showing vVBOX on map-event, because if it's visible from the beginning,
# the 'wrap="false"' attribute for <text> causes an ugly, empty space
# in the bottom part of the window

You can keep it visible and need no map-event when using
Code:
<window resizable="false">


Quote:
I only changed fc-match to fc-query to operate on specific files and it looks like this:

Can't reproduce this. Here the script produces seq errors because the fc-query command returns gibberish:
Code:
# fc-query --format='%{charset}' /usr/share/fonts/default/TTF/DejaVuSans.ttf
  |>^1!|>^1!P0oWQ |>^1!|>^1!|>^1!!!!%#|>^1!|>^1!|>^1!|>^1!|>^1!|>^1!|>^1!|>^1!!!!)$|>^1!|>^1!|>^1!|>^1!|>^1!|>^1!|>^1!!3Vg{!!!.%|>^1!|>^1!xxCV&O3j4U|>[gE|>^0{|>^1!|>^1!!!!1&|>^1!|>^1!|>^1!|>^1!|>^1!|>^1!|>^1!|>^1!!!!4(9WIli|>K!h{ma}P|>^0~!!
What did I miss?
SFR


Joined: 26 Oct 2011
Posts: 1802

PostPosted: Sun 17 May 2020, 12:23    Post subject:  

MochiMoppel wrote:
You can keep it visible and need no map-event when using
Code:
<window resizable="false">

Yeah, but it's not always desirable to disable resizability.

MochiMoppel wrote:
Can't reproduce this. Here the script produces seq errors because the fc-query command returns gibberish:

Hmm, the output of that command has probably changed over time.
In my case (Fatdog, fontconfig version 2.13.0) it's:

Code:
# fc-query --format="%{charset}" DejaVuSans.ttf
20-7e a0-2e9 2ec-2ee 2f3 2f7 300-34f 351-353 357-358 35a 35c-362 370-377 37a-37f 384-38a 38c 38e-3a1 3a3-525 531-556 559-55f 561-587 589-58a 5b0-5c3 5c6-5c7 5d0-5ea 5f0-5f4 606-607 609-60a 60c 615 61b 61f 621-63a 640-655 657 65a 660-670 674 679-6bf 6c6-6c8 6cb-6cc 6ce 6d0 6d5 6f0-6f9 7c0-7e7 7eb-7f5 7f8-7fa e3f e81-e82 e84 e87-e88 e8a e8d e94-e97 e99-e9f ea1-ea3 ea5 ea7 eaa-eab ead-eb9 ebb-ebd ec0-ec4 ec6 ec8-ecd ed0-ed9 edc-edd 10a0-10c5 10d0-10fc 1401-1407 1409-141b 141d-1435 1437-144a 144c-1452 1454-14bd 14c0-14ea 14ec-1507 1510-153e 1540-1550 1552-156a 1574-1585 158a-1596 15a0-15af 15de 15e1 1646-1647 166e-1676 1680-169c 1d00-1d14 1d16-1d23 1d26-1d2e 1d30-1d5b 1d5d-1d6a 1d77-1d78 1d7b 1d7d 1d85 1d9b-1dbf 1dc4-1dc9 1e00-1efb 1f00-1f15 1f18-1f1d 1f20-1f45 1f48-1f4d 1f50-1f57 1f59 1f5b 1f5d 1f5f-1f7d 1f80-1fb4 1fb6-1fc4 1fc6-1fd3 1fd6-1fdb 1fdd-1fef 1ff2-1ff4 1ff6-1ffe
[...]

Same in Xenial-7.5 (FC 2.11.94), but in Tahr-6.0.5 (FC 2.11.0) it looks just like yours.
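
As a rough illustration of how the range notation above can be consumed, here is a sketch that expands it into individual hexadecimal code points. It is not the Stack Overflow script; expand_ranges is a hypothetical helper and assumes the space-separated first-last format shown above.

```shell
#!/bin/sh
# Hypothetical helper: expand "20-22 2f3" into one code point per line.
expand_ranges() {
    for r in $1; do
        first=${r%-*}    # text before the dash (whole token if there is no dash)
        last=${r#*-}     # text after the dash (whole token if there is no dash)
        i=$((0x$first))
        end=$((0x$last))
        while [ "$i" -le "$end" ]; do
            printf '%04x\n' "$i"
            i=$((i + 1))
        done
    done
}

expand_ranges "20-22 2f3"
# prints:
# 0020
# 0021
# 0022
# 02f3
```

This only works with the newer, human-readable output; the encoded form printed by older fontconfig versions would need a different approach.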

Greetings!

rockedge


Joined: 11 Apr 2012
Posts: 1874
Location: Connecticut, United States

PostPosted: Sun 17 May 2020, 15:45    Post subject:  

Hello SFR!

Yes, you are correct: I would like to select the font.ttf and preview it in the quick view panel.

Thank you for the handy script. I'll experiment with it.

The code works great.
[Attachment: Screenshot(45).png]

MochiMoppel


Joined: 26 Jan 2011
Posts: 2084
Location: Japan

PostPosted: Sun 17 May 2020, 23:29    Post subject:  

@SFR: If I knew how to translate the (unformatted) charset matrix produced by a simple fc-query command
Code:
charset:
0000: 00000000 ffffffff ffffffff 7fffffff 00000000 ffffffff ffffffff ffffffff
0001: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
0002: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff 008873ff
0003: ffffffff ffffffff f58e7fff 7cff0007 ffffd7f0 fffffffb ffffffff ffffffff
[...]
into something human readable like the formatted fc-query output
Code:
20-7e a0-2e9 2ec-2ee 2f3 2f7 300-34f 351-353 357-358 35a 35c-362 370-377 37a-37f 384-38a 38c 38e-3a1 3a3-525 531-556 559-55f 561-587 589-58a 5b0-5c3 5c6-5c7 5d0-5ea 5f0-5f4 606-607 609-60a 60c 615 61b 61f 621-63a 640-655 657 65a 660-670 674 679-6bf 6c6-6c8 6cb-6cc 6ce 6d0 6d5 6f0-6f9 7c0-7e7 7eb-7f5 7f8-7fa e3f e81-e82 e84 e87-e88 e8a e8d e94-e97 e99-e9f ea1-ea3 ea5 ea7
[...]
I could accomplish my goal. Must be possible. I just don't know how.
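
One possible approach, sketched under an assumption inferred from comparing the two listings above rather than taken from fontconfig documentation: each row is a page of 256 code points (row label times 0x100), each of the eight words covers 32 code points, and the least significant bit of a word is the lowest code point. charset_to_ranges is a hypothetical name. The hex parsing is done by hand so that the script does not depend on gawk's strtonum.

```shell
#!/bin/sh
# Hypothetical helper: read an fc-query charset matrix on stdin,
# print the supported code points as hex ranges.
charset_to_ranges() {
    awk '
    # Parse a hexadecimal string without relying on gawk extensions.
    function hexval(s,    i, n) {
        n = 0
        for (i = 1; i <= length(s); i++)
            n = n * 16 + index("0123456789abcdef", tolower(substr(s, i, 1))) - 1
        return n
    }
    # Append the pending range (or single code point) to the output.
    function flush() {
        if (start >= 0) {
            out = out sep (start == prev ? sprintf("%x", start) : sprintf("%x-%x", start, prev))
            sep = " "
        }
        start = -1
    }
    BEGIN { start = -1 }
    # A matrix row: "PAGE:" followed by eight 32-bit words.
    NF == 9 && $1 ~ /^[0-9a-fA-F]+:$/ {
        page = $1
        sub(/:$/, "", page)
        base = hexval(page) * 256
        for (w = 0; w < 8; w++) {
            for (d = 0; d < 8; d++) {
                # Walk the hex digits right to left: digit d holds bits 4d to 4d+3.
                nib = hexval(substr($(w + 2), 8 - d, 1))
                for (b = 0; b < 4; b++) {
                    if (int(nib / 2 ^ b) % 2 == 0) continue
                    cp = base + w * 32 + d * 4 + b
                    if (start >= 0 && cp != prev + 1) flush()
                    if (start < 0) start = cp
                    prev = cp
                }
            }
        }
    }
    END { flush(); print out }
    '
}

# The first three rows of the matrix quoted above.
charset_to_ranges <<'EOF'
charset:
0000: 00000000 ffffffff ffffffff 7fffffff 00000000 ffffffff ffffffff ffffffff
0001: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
0002: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff 008873ff
EOF
# prints: 20-7e a0-2e9 2ec-2ee 2f3 2f7
```

In practice the input would come from plain fc-query output for a font file; lines that are not matrix rows are skipped by the NF and pattern check.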

@rockedge: Currently, previewing the font in the viewer pane is technically not possible because the pane uses an edit widget, which does not support Pango formatting. SFR's script uses a text widget. The pane consists of only 2 edit and 2 pixmap widgets, of which only one is visible at any given time. To implement your idea I would have to add 2 text widgets (wrapped and unwrapped), with all the extra code and performance overhead that comes with them. Running an application/script in an external window is always possible, of course, but it steals the focus, prevents smooth scrolling and somewhat defeats the purpose of MMview. Your idea is not crazy at all and I would love to implement it, but sorry ...
fredx181


Joined: 11 Dec 2013
Posts: 4481
Location: holland

PostPosted: Thu 21 May 2020, 03:34    Post subject:  

@rockedge, I've put something together for previewing fonts, see here:
http://murga-linux.com/puppy/viewtopic.php?p=1058549#1058549

@SFR Thanks for your code ! I've used it as a start.

Fred

_________________
Dog Linux website
Tinylinux blog by wiak
rockedge


Joined: 11 Apr 2012
Posts: 1874
Location: Connecticut, United States

PostPosted: Thu 21 May 2020, 10:21    Post subject:  

Thank you Fred!