Puppy Linux Discussion Forum
Puppy HOME page : puppylinux.com
"THE" alternative forum : puppylinux.info
 
The time now is Sun 21 Oct 2018, 08:37
All times are UTC - 4
A bash script to convert .xml files to .txt
musher0

Joined: 04 Jan 2009
Posts: 12832
Location: Gatineau (Qc), Canada

Posted: Sat 17 Feb 2018, 00:28
Subject description: Reducing the number of line spacings, NOT removing all of them.
 

Hello all.

Converting xml to txt, I have often met head on with the problem of too many --
or too few -- line spacings between the paragraphs and subtitles.

I have come up with a Bash solution to REDUCE the number of line spacings
in a text document. All other tips on the Internet (really; this is no exaggeration!)
being about completely removing them, there was a need for this "middle-of-
the-road" approach.

If you completely remove line spacings, it is as bad as when there are too many
of them: the reader has more trouble focusing on the content, because the difference
between foreground and background is either closer to nil -- with no line spacings --,
or too great -- when there are too many. The reader has to do a mental correction
as (s)he reads, and that gets in the way of faster and better understanding.

I have written a short article which expands on the above ideas, on the French side
of the forum. Feel free to use the DeepL Translator on that post. I am available
if there remains any confusion about some sentences; just ask.

Enjoy. BFN.

_________________
musher0
~~~~~~~~~~
Fidèle elle commença, ainsi elle restera. (Prov. canadien) /
Faithful she began, so will she stay. (Canadian prov.)
MochiMoppel


Joined: 26 Jan 2011
Posts: 1663
Location: Japan

Posted: Sat 17 Feb 2018, 02:16
 

musher0 wrote:
All other tips on the Internet (really; this is no exaggeration!)
being about completely removing them,
???
Look closer. One of the simplest ways is
Code:
cat -s filename

I normally use sed but I know that you don't like sed.

Your code seems to destroy content.
Input file, containing 12 lines:
Code:



4
5

7
8


11
12

Output file contains 6 lines:
Code:


5
7




BTW: Your bash code posted here also swallows content. The sample output skips some listitems ("Desktop" etc.) present in the original XML document.
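
For anyone who wants to reproduce the comparison, here is a small sketch that rebuilds the 12-line sample above and runs cat -s on it. The temp file is arbitrary, and the counts assume a cat that supports -s (GNU coreutils does).

```shell
# Rebuild the 12-line sample: lines 1-3, 6, 9 and 10 are empty.
sample=$(mktemp)
printf '\n\n\n4\n5\n\n7\n8\n\n\n11\n12\n' > "$sample"

# cat -s squeezes every run of empty lines down to a single empty line,
# so the 6 numbered lines survive and the 6 empty lines become 3.
cat -s "$sample"

# Count total lines and non-empty lines, before and after.
wc -l < "$sample"               # 12
cat -s "$sample" | wc -l        # 9
grep -c . "$sample"             # 6
cat -s "$sample" | grep -c .    # 6
```

The two grep counts matching is the point: the spacing changes, but no numbered line goes missing.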
musher0

Posted: Sat 17 Feb 2018, 07:30

Hi MochiMoppel.

I have tried < cat -s textfile >. It does a fair job, but it does not care whether the text
has 2 or 3 line spacings in a row: it condenses all of them into one line spacing. That is
OK, but in editing texts, the number of line spacings in a row means something.

Usually:
One line space between paragraphs;
Two line spaces between sections;
Three line spaces between chapters (or more major sections).

My script tries to respect that custom, whereas < cat -s > does not.
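
To illustrate the idea (this is NOT the script from the attachment, only a minimal sketch), here is one way to reduce line spacings without flattening them: cap each run of empty lines at a maximum instead of squeezing it to one. The function name cap_blank_runs and the default cap of 3 are assumptions made for the example, and only truly empty lines count as spacing.

```shell
# cap_blank_runs [MAX]
# Copy stdin to stdout, keeping at most MAX consecutive empty lines
# (default 3). Every non-empty line is passed through unchanged.
cap_blank_runs() {
    local max=${1:-3} run=0 line
    # The "|| [ -n ... ]" part keeps a last line that has no final newline.
    while IFS= read -r line || [ -n "$line" ]; do
        if [ -z "$line" ]; then
            run=$((run + 1))
            if [ "$run" -le "$max" ]; then
                printf '\n'
            fi
        else
            run=0
            printf '%s\n' "$line"
        fi
    done
}

# Example: five empty lines between "a" and "b" are cut down to three.
printf 'a\n\n\n\n\n\nb\n' | cap_blank_runs 3
```

With a cap of 3, the one/two/three custom above passes through untouched and only longer runs get shortened. With a cap of 1, the result should match cat -s.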

I have attached illustrations of the original txt, of the cat -s version, and of the
version produced by my script, so people can better grasp the concept. Only the
ending of the text is illustrated, but IMO it is telling enough about the line spacings,
AND about the line count. The sources for those texts are also attached.

~~~~~~~~~~~~~~~~

As to the missing content, thanks for noticing. I think I said it before: the result of
this type of tool always needs to be compared with the original by a human editor,
and that human should bring corrections to the final draft when necessary.
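
Part of that comparison can be automated. A rough sketch, assuming the tool is only supposed to add or drop empty lines: strip the empty lines from both files and check that what remains is identical. The function name same_content and the sample files are made up for the example.

```shell
# same_content ORIGINAL RESULT
# Succeeds (exit status 0) when both files hold the same non-empty lines
# in the same order, i.e. when they differ only in their empty lines.
same_content() {
    [ "$(grep -v '^$' "$1")" = "$(grep -v '^$' "$2")" ]
}

# Example: a result that lost lines 4, 8, 11 and 12 gets flagged.
orig=$(mktemp); result=$(mktemp)
printf '\n\n\n4\n5\n\n7\n8\n\n\n11\n12\n' > "$orig"
printf '\n\n5\n7\n\n\n' > "$result"
if same_content "$orig" "$result"; then
    echo "only the spacing changed"
else
    echo "content differs, check by hand"
fi
```

This only says that something is missing or altered, not where; a human still has to look, for instance with diff on the two stripped versions.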

~~~~~~~~~~~~~~~~

That said, this forum needs more good critics like yourself. (I'm serious!) Thanks
for bringing this to my attention. I'll see what I can do to solve the content
problem from within the script.

BFN.
Attachments:
sourcesadvanced.PA.zip (zip, 7.26 KB)
line-spacings-from-my-script.jpg (99.09 KB)
line-spacings-from-cat-s.jpg (104.03 KB)
line-spacings-in-original.jpg (69.2 KB)


musher0

Posted: Mon 19 Feb 2018, 18:42

Hello all.

Here is a nice resource on how to lay out a document in text format, with the goal of
enhancing content accessibility:
https://www.w3.org/TR/WCAG-TECHS/text.html

You would follow the above guidelines once the xml to txt conversion is finished, and
you want to prettify and / or standardize your results.

BFN.

puppy_apprentice


Joined: 07 Feb 2012
Posts: 194

Posted: Tue 06 Mar 2018, 17:01

musher0, in my Slacko 5.7 (I didn't check other Pups) I have found docbook.css, a stylesheet for DocBook XML files like those PekWM ones.

Code:
/usr/share/examples/xml/


As with my CSS, you should add these lines to the beginning of every XML file in DocBook format:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/usr/share/examples/xml/docbook.css"?>
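
Adding the line by hand gets tedious with many files, so here is a hedged sketch of a helper. The function name add_stylesheet is made up, the path is the one found on Slacko 5.7 and may differ on other Pups, and the script assumes that an XML declaration, if present, sits alone on the first line.

```shell
# add_stylesheet FILE
# Print FILE with the docbook.css stylesheet line added. If line 1 is an
# XML declaration, the new line goes right after it; otherwise it goes
# first. No encoding is assumed, so no declaration is invented.
add_stylesheet() {
    local css=/usr/share/examples/xml/docbook.css
    local pi="<?xml-stylesheet type=\"text/css\" href=\"$css\"?>"
    awk -v pi="$pi" '
        NR == 1 && /^<\?xml[ \t]/ { print; print pi; next }
        NR == 1                   { print pi }
                                  { print }
    ' "$1"
}

# Example with a throwaway file standing in for a real DocBook document.
demo=$(mktemp)
printf '<?xml version="1.0" encoding="UTF-8"?>\n<book/>\n' > "$demo"
add_stylesheet "$demo"
```

The function writes to stdout and leaves the original alone, so a typical call would be add_stylesheet doc.xml > doc-styled.xml (both file names hypothetical). It does not check whether a stylesheet line is already there.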
Attachment: xml-view.jpg (55.12 KB). Without colors and other bells and whistles like in my css, but good enough for quick reading.

musher0

Posted: Tue 06 Mar 2018, 23:18

Thanks, puppy_apprentice.