thread_saver

big_bass
Posts: 1740
Joined: Mon 13 Aug 2007, 12:21

#1 Post by big_bass »

Thinking about the forum downtime, it's good to back up stuff before problems occur

*Mu and seaside had the idea for this and seaside used gtkdialog *
http://www.murga-linux.com/puppy/viewto ... 1975402649

I wanted to do the same thing but I prefer Xdialog

and I wanted to test the three input option (--3inputsbox), so I rewrote the GUI part in Xdialog

Major update
Updated and modified a lot to be easier for big threads
it makes a folder, dates it, and renames and renumbers the files
so that it's easier on the browser to scroll quickly

12-29-2011
added long names and a filter to clean poorly formatted names

the end goal is that from here the files can be read and edited quickly
and you can filter out the unneeded posts

Code: Select all

#!/bin/bash
# (bash is needed: the script uses arrays)

# thread_saver big_bass completely rewritten to the basics and using Xdialog
# 12-29-2011
# added the date to the folder name, rewrote the download part
# and the naming and numbering of the files

# original idea was based on
# ThreadGet Seaside 11-24-2010 (Based on Mu's Fetchforum basic program)
# For use on  phpBB forums
# update 3-29-2011 --add startpage, append to existing files


#------------------------------------------------
SEL=`Xdialog \
        --title "thread_saver" \
        --separator "\n" --stdout \
        --3inputsbox  "thread_saver" 0 0 \
            "URL (download this link)" "$1" \
            "end page number" "$2" \
            "name the html file" "$3"`

# split the three values into one array (word splitting on whitespace)
SEL_ARRARY=($SEL)

THREAD=${SEL_ARRARY[0]}
NPAGE=${SEL_ARRARY[1]}
NAME=${SEL_ARRARY[@]:2:20}

# NAME: long names start at the third word
# and take at most 20 words

# clean up badly formatted names that have spaces and symbols:
# symbols become spaces, then each run of spaces becomes a single "_"
NAME_FIXED=`echo  "$NAME" | tr  ';"<>,+!@#$?%^*&(){}[]' ' ' | tr -s ' ' '_*'`
#------------------------------------------------

add_date=`date "+%m-%d-%y"`
URL_NAME=`basename "$THREAD"`
mkdir -p "/root/Forum_Threads/${NAME_FIXED}-folder-${add_date}"
# stop here if the folder is not usable, so wget does not fill the current directory
cd "/root/Forum_Threads/${NAME_FIXED}-folder-${add_date}" || exit 1


# zero start count fix: pages 1..NPAGE map to offsets 0..15*(NPAGE-1)
NPAGE=$((NPAGE-1))
ALL_POSTS=$((15*NPAGE))

# simplified the renaming renumbering code a lot  big_bass
for i in $(seq 0 15 $ALL_POSTS); do

   # page index: 0 for the first page, 1 for the second, ...
   N_ADJ=$((i/15))

   wget -N "$THREAD&start=$i"
   mv "$URL_NAME&start=$i" "${NAME_FIXED}_${N_ADJ}"
   echo "$URL_NAME&start=$i" "${NAME_FIXED}_${N_ADJ}"
done



Xdialog --title "done" \
       --msgbox "Thread downloaded to /root/Forum_Threads " 0 0 3000 
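
A side note on the name filter: the tr list in the script does not include slashes or backslashes, so a name containing them would still break the folder name. A minimal sketch of a stricter filter, assuming it is acceptable to keep only letters, digits, dots and dashes (sanitize_name is a made-up helper name, not part of thread_saver):

```shell
#!/bin/bash
# Hypothetical helper: turn a free-form thread title into a safe file name.
sanitize_name() {
    # Replace every character that is not a letter, digit, dot or dash
    # with "_", squeezing each run of replaced characters into one "_".
    cleaned=$(printf '%s' "$1" | tr -cs 'A-Za-z0-9.-' '_')
    # Trim one leading and one trailing "_" left over from the replacement.
    cleaned=${cleaned#_}
    cleaned=${cleaned%_}
    printf '%s\n' "$cleaned"
}
```

It could be used in place of the two tr calls, for example NAME_FIXED=$(sanitize_name "$NAME"). A whitelist was chosen here because it is easier to reason about than listing every bad symbol.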



Last edited by big_bass on Thu 29 Dec 2011, 15:45, edited 12 times in total.

puppyluvr
Posts: 3470
Joined: Sun 06 Jan 2008, 23:14
Location: Chickasha Oklahoma

#2 Post by puppyluvr »

:D Hello,
Thanks.... 8)
I was thinking along the same lines, but never got that far...
Sure beats 1 pg @ a time... :roll:

Works great, and all on 1 page...
Nice, and useful..

All right now everyone Backup your threads... :wink:

@Edit..
Works on puppylinux.info too!!
Close the Windows, and open your eyes, to a whole new world
I am Lead Dog of the
Puppy Linux Users Group on Facebook
Join us!

Puppy since 2.15CE...

jpeps
Posts: 3179
Joined: Sat 31 May 2008, 19:00

Re: thread_saver

#3 Post by jpeps »

big_bass wrote:

Code: Select all

# lets get the three values in 3 separate  arrays 
cat /tmp/downloader-info | tr '|' '\n' >/tmp/downloader-info2 
select_array=(`cat /tmp/downloader-info2`)
echo ${select_array[0]}
echo ${select_array[1]}
echo ${select_array[2]}

THREAD=${select_array[0]}
NPAGE=${select_array[1]}
NAME=${select_array[2]}
#------------------------------------------------

another way:

Code: Select all

var="$(cat /tmp/downloader-info)"
IFS="|"
set -- $var
THREAD="$1"
NPAGE="$2"
NAME="$3"

big_bass
Posts: 1740
Joined: Mon 13 Aug 2007, 12:21

#4 Post by big_bass »

I went the longer way to avoid using

Code: Select all

IFS="|"
set -- 

because you have to restore IFS afterwards
for any code that follows,
and the pipe is frequently used

I already changed the separator to --separator "|" in Xdialog
*because the URL has quite a few "/" to filter out
and the default separator in Xdialog is "/" too

the main point expressed was how to use the three inputs in Xdialog
it reduced all that gtkdialog code to just a few lines

Joe
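
For what it's worth, the IFS change can also be limited to a single command, so nothing has to be restored afterwards. A small sketch, assuming the Xdialog output is already in SEL with "|" as separator (the sample value below is made up for the demonstration):

```shell
#!/bin/bash
# Sample value standing in for Xdialog output with --separator "|".
SEL='http://murga-linux.com/puppy/viewtopic.php?t=74404|5|Xdialog tips'

# IFS is changed only for this one read command; the rest of the
# script keeps its normal IFS, and $1 $2 $3 are left untouched.
IFS='|' read -r THREAD NPAGE NAME <<EOF
$SEL
EOF

printf '%s\n' "$THREAD" "$NPAGE" "$NAME"
```

Because NAME is the last variable given to read, it receives the rest of the line, so a name with spaces stays in one piece.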

jpeps
Posts: 3179
Joined: Sat 31 May 2008, 19:00

#5 Post by jpeps »

big_bass wrote:
because you have to unset the IFS
for any code that follows
because the pipe is frequently used command
perhaps setting the --separator to " " ? Just an alternative...works fine the way you have it :)

Code: Select all

var="$(cat /tmp/downloader-info)"
 set -- $var
THREAD="$1"
NPAGE="$2"
NAME="$3"

big_bass
Posts: 1740
Joined: Mon 13 Aug 2007, 12:21

#6 Post by big_bass »

Major update
Updated and modified a lot to be easier for big threads
it makes a folder and renames and renumbers the files
so that it's easier on the browser to scroll quickly

the end goal is that from here the files can be read and edited quickly
and you can filter out the unneeded posts

Joe

jpeps wrote:
perhaps setting the --separator to " " ? Just an alternative...works fine the way you have it
I will look at that part again, thanks. It's still an alpha version *I want to combine some other html tools I wrote with this

Code: Select all

#!/bin/bash

# thread_saver big_bass rewrote the GUI to use Xdialog to test the three input option
# 12-19-2011

# ThreadGet Seaside 11-24-2010 (Based on Mu's Fetchforum basic program)
# For use on  phpBB forums
# update 3-29-2011 --add startpage, append to existing files


#------------------------------------------------
Xdialog --separator "|" --3inputsbox  "Big Thread downloader" 0 0 "URL (download this link)" "$1"  "end page number" "$2" "name the html file" "$3" 2> /tmp/downloader-info

# lets get the three values in 3 separate  arrays 
cat /tmp/downloader-info | tr '|' '\n' >/tmp/downloader-info2 
select_array=(`cat /tmp/downloader-info2`)
echo ${select_array[0]}
echo ${select_array[1]}
echo ${select_array[2]}

THREAD=${select_array[0]}
NPAGE=${select_array[1]}
NAME=${select_array[2]}
#------------------------------------------------

n=$((NPAGE*15-15)) 
mkdir -p /root/Forum_Threads/
mkdir -p /tmp/Forum_Threads
cd /tmp/Forum_Threads
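# SP (start page) is left over from ThreadGet and is not set anywhere in this
# script, so unless it comes from the environment this falls through to SP=0
if [[ $SP =~ ^[0-9]+$ ]]; then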
SP=$((SP*15-15))
else
SP=0
fi

# simplified big_bass
for i in $(seq $SP 15 $n) ; do 
wget -O $(printf "%04d" $i) -c "$THREAD&start=$i"
done


# file renumber and rename
cd /tmp/Forum_Threads
START_NUMBER=1

NUM=0
ls -1 | sort -n >/tmp/list_forum_pages.txt
for i in `cat /tmp/list_forum_pages.txt`
do

    echo "renumber file --> $NUM"
    mv /tmp/Forum_Threads/$i /tmp/Forum_Threads/$NAME"_"$START_NUMBER.htm
    let NUM=$NUM+1
    let START_NUMBER=$START_NUMBER+1
done

mkdir -p /root/Forum_Threads/$NAME

mv $NAME* /root/Forum_Threads/$NAME

rm -r  /tmp/Forum_Threads/
rm -f  /tmp/downloader-info
rm -f  /tmp/downloader-info2
rm -f  /tmp/list_forum_pages.txt

Xdialog --title "done" \
	    --msgbox "Thread downloaded to /root/Forum_Threads " 0 0 3000
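
The page arithmetic (n=$((NPAGE*15-15)) and the seq loop) can be checked on its own without downloading anything. A minimal sketch, assuming 15 posts per page as in the script; page_offsets and POSTS_PER_PAGE are made-up names for illustration:

```shell
#!/bin/bash
# Hypothetical helper: print the start= offset for each page of a thread.
# 15 matches the value used in the script; other boards may differ.
POSTS_PER_PAGE=15

page_offsets() {
    pages=$1
    page=0
    while [ "$page" -lt "$pages" ]; do
        echo $((page * POSTS_PER_PAGE))
        page=$((page + 1))
    done
}

# A 5 page thread needs start=0, 15, 30, 45 and 60.
page_offsets 5
```

Each printed number is what gets appended to the thread URL as &start=N.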

aarf

#7 Post by aarf »

if you can name the downloaded-page-thread by pulling the title from the <title>title</title>
then it can be automated to get many threads.
i have been waiting for years for a gui that produces the necessary code to do matching. my short term memory quickly forgets the syntax and i have to start from scratch if i want to do this matching stuff. sorry.

big_bass
Posts: 1740
Joined: Mon 13 Aug 2007, 12:21

#8 Post by big_bass »

aarf
its doable but ... one example here you see all the spaces in the names it can be done it just needs some
adjustments some conditioning to be a "correct" file name ... hey no problem that's easy to do
<title>Puppy Linux Discussion Forum :: View topic - Classic Pup 2.14X -- Updated 2 series</title>


Joe
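
A rough sketch of pulling the title out of an already saved page, assuming the <title> element sits on a single line and the board puts "View topic - " in front of the topic name as in the example above (page_title is a made-up helper name, and other boards may format the title differently):

```shell
#!/bin/bash
# Hypothetical helper: print the topic title found in a saved forum page.
# $1 is the path of the saved html file.
page_title() {
    # 1. keep only the text between <title> and </title>
    # 2. use the first match only
    # 3. drop the board name and the "View topic - " prefix
    sed -n 's/.*<title>\(.*\)<\/title>.*/\1/p' "$1" \
        | head -n 1 \
        | sed 's/^.*View topic - //'
}
```

The result still contains spaces and symbols, so it would have to go through the same name filter as a typed-in name before it is used as a file or folder name.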

aarf

#9 Post by aarf »

big_bass wrote:aarf
its doable but ... one example here you see all the spaces in the names it can be done it just needs some
adjustments some conditioning to be a "correct" file name ... hey no problem that's easy to do
<title>Puppy Linux Discussion Forum :: View topic - Classic Pup 2.14X -- Updated 2 series</title>


Joe
ok i think that as well as the relevant title bits, a date from the first post could also feature in the name. that will eliminate duplicate names and make it easier to reference. go further and also add the date of the last post and it will be easier to do new backups. something like
1.nov.2010 classic pup 2.14X -- Updated 2 series 20.nov.2011.htm. or whatever date format is easy to search or order in time. possibly include the original thread number in the name too, for future backup reference.

(i'll pull my request for .mht image containing files for now, till it progresses further. :wink: )
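
On the date format: dates written as YYYY-MM-DD sort correctly with a plain alphabetical sort, which makes them easy to order in time. A tiny sketch of such a naming scheme; backup_name, the argument order and the thread number 12345 are all made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: build a backup file name.
# $1 = thread number, $2 = first post date, $3 = last post date, $4 = cleaned title
# Dates are expected as YYYY-MM-DD so that ls and sort order them by time.
backup_name() {
    echo "${2}_${4}_${3}_t${1}.htm"
}

backup_name 12345 2010-11-01 2011-11-20 classic_pup_2.14X
```

Getting the two dates out of the downloaded pages is a separate matching job that is not covered here.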

seaside
Posts: 934
Joined: Thu 12 Apr 2007, 00:19

#10 Post by seaside »

big_bass,

Nice work. (It just shows what can happen when someone who knows what they're doing gets a hold on things) :)

Also the file i/o could be eliminated-

Code: Select all

SEL=`Xdialog --separator "|" --stdout --3inputsbox  "Thread downloader" 0 0 "URL (download this link)" "$1"  "end page number" "$2" "name the html file" "$3"`

THREAD=`echo "$SEL" | cut -f1 -d'|'`
NPAGE=`echo "$SEL" | cut -f2 -d'|'`
NAME=`echo "$SEL" | cut -f3 -d'|'`

For the life of me, I can't remember why I thought wget had to be run in a terminal for this to work :)

Regards,
s

aarf

#11 Post by aarf »

probably would be a good idea to pop over to phpbb devs forum and check to see we're not re-inventing the wheel.

big_bass
Posts: 1740
Joined: Mon 13 Aug 2007, 12:21

#12 Post by big_bass »

Hey seaside

you did a great job
and I also took your suggestion about the
code snippet you posted today and used it thanks
I got a little 'array happy' in that part
*it's a habit for me to use extra output files when testing
so I can debug stuff quickly. Xdialog either works or it doesn't,
with no good error messages, but it's mostly easy


@hey jpeps you had a good code snippet too and worked
but I went with seasides

@aarf I have to do some heavy testing with the auto naming part before I add it
some people even included slashes ,back slashes , spaces and other symbols in the file names that doesnt play nicely with making directories

updated main post with seasides suggested
shortened code snippet

Joe

aarf

#13 Post by aarf »

big_bass wrote:
@aarf I have to do some heavy testing with the auto naming part before I add it
some people even included slashes ,back slashes , spaces and other symbols in the file names that doesnt play nicely with making directories



Joe
there has got to be a ready made code snippet that does the job, just a matter of knowing where to find it
perhaps autoname them to their thread number for now. still useful and unique and google search can still be used to find content.
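
Naming by thread number could look roughly like this, assuming the URL carries the topic id as a t=NUMBER parameter like the viewtopic.php links on this forum (thread_number is a made-up helper name):

```shell
#!/bin/bash
# Hypothetical helper: print the topic number from a viewtopic URL.
# Only a t= that directly follows "?" or "&" is matched, so the
# "t=" at the end of "&start=15" is not picked up by mistake.
thread_number() {
    printf '%s\n' "$1" | sed -n 's/.*[?&]t=\([0-9][0-9]*\).*/\1/p'
}

thread_number "http://murga-linux.com/puppy/viewtopic.php?t=74404"
```

If the URL has no t= parameter the function prints nothing, so the caller can fall back to a typed-in name.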

jpeps
Posts: 3179
Joined: Sat 31 May 2008, 19:00

#14 Post by jpeps »

big_bass wrote:
@hey jpeps you had a good code snippet too and worked
but I went with seasides
I agree....so

Code: Select all

SEL=`Xdialog --separator " " --stdout --3inputsbox  "Thread downloader" 0 0 "URL (download this link)" "$1"  "end page number" "$2" "name the html file" "$3"`


set -- $SEL

THREAD="$1"
NPAGE="$2"
NAME="$3"

aarf

#15 Post by aarf »

aarf wrote:
big_bass wrote:
@aarf I have to do some heavy testing with the auto naming part before I add it
some people even included slashes ,back slashes , spaces and other symbols in the file names that doesnt play nicely with making directories



Joe
there has got to be a ready made code snippet that does the job, just a matter of knowing where to find it
perhaps autoname them to their thread number for now. still useful and unique and google search can still be used to find content.
naming to their thread number won't need any matching at all. it will be simple to just replace their name in the code by the number of the step variable. should be ready to start already. (but am not in thinking mode at present. :lol: ) will need a test for empty downloads so the page number wouldn't be needed, or will need to match and thus get the number of pages from the first page.
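
A very rough sketch of the second option, getting the page count from the first page. It assumes the saved page contains text of the form "Page 1 of N" once the HTML tags are stripped; that is a guess about the board template and has to be checked against a real saved page. page_count is a made-up helper name:

```shell
#!/bin/bash
# Hypothetical helper: guess the number of pages from a saved first page.
# $1 is the path of the saved html file.
page_count() {
    # strip the tags first, then look for "Page 1 of N" and print N
    sed 's/<[^>]*>//g' "$1" \
        | sed -n 's/.*Page 1 of \([0-9][0-9]*\).*/\1/p' \
        | head -n 1
}
```

When nothing matches, the function prints nothing, which could be treated as a single page thread or as a reason to ask the user for the page number.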

big_bass
Posts: 1740
Joined: Mon 13 Aug 2007, 12:21

#16 Post by big_bass »

I was trying to use an array instead and figured out how to get it
by setting --separator "\n" so each value comes out on a new line :D

I cleaned up the code here a bit too
in the actual code I dont use echo but its here only for a working snippet

thanks for all the suggestions
all working code is good code I just wanted to use arrays
*because later it will be easier to translate this to BaCon code


*args $1,$2,$3 are sometimes arguments from the command line
so I don't like to force 'set' those


Code: Select all

SEL=`Xdialog \
        --title "thread_saver" \
        --separator "\n" \
        --stdout \
        --3inputsbox  "thread_saver" 0 0 \
            "URL (download this link)" "$1" \
            "end page number" "$2" \
            "name the html file" "$3"`

# lets get the three values in 3 separate  arrays 
SEL_ARRARY=($SEL)

THREAD=${SEL_ARRARY[0]}
NPAGE=${SEL_ARRARY[1]}
NAME=${SEL_ARRARY[2]}

echo $THREAD
echo $NPAGE
echo $NAME
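
One thing to watch with SEL_ARRARY=($SEL): the split happens on any whitespace, so a name with spaces ends up in several array elements. A small sketch of an alternative that reads exactly one value per line, assuming --separator "\n" (the sample value below is made up and stands in for the Xdialog output; this does not use an array):

```shell
#!/bin/bash
# Sample value standing in for Xdialog output with --separator "\n".
SEL='http://murga-linux.com/puppy/viewtopic.php?t=74404
5
Xdialog tips and tricks'

# Read one field per line. Spaces inside a field are preserved
# because each read is given a single variable name.
{
    IFS= read -r THREAD
    IFS= read -r NPAGE
    IFS= read -r NAME
} <<EOF
$SEL
EOF

printf '%s\n' "$THREAD" "$NPAGE" "$NAME"
```

Like the array version, this leaves $1 $2 $3 alone.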

jpeps
Posts: 3179
Joined: Sat 31 May 2008, 19:00

#17 Post by jpeps »

looks good!

BTW/ that will work with --separator " " as well.

puppyluvr
Posts: 3470
Joined: Sun 06 Jan 2008, 23:14
Location: Chickasha Oklahoma

#18 Post by puppyluvr »

:D Hello,
Wow, talk about clean..
IDK how (but I will) that works, but it does..

SEL=`Xdialog \

allows you to chain the commands so cleanly..
Why wasnt I informed of this... :wink:

Why are the arrays # from 0 but as variables, saved from 1.??
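
On the numbering question, a short illustration (bash only, since plain sh has no arrays): positional parameters start at $1 because $0 is reserved for the name of the script, while bash arrays simply count their elements from 0. The variable name words is made up for the demonstration:

```shell
#!/bin/bash
# Positional parameters: $0 holds the script name,
# so the first real argument is $1.
set -- first second third
echo "positional: $1 $2 $3"

# Bash arrays are indexed from 0, so the same three words
# end up in elements 0, 1 and 2.
words=("$@")
echo "array: ${words[0]} ${words[1]} ${words[2]}"
```

Both echo lines print the same three words.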

vovchik
Posts: 1507
Joined: Tue 24 Oct 2006, 00:02
Location: Ukraine

#19 Post by vovchik »

Dear Joe,

Thanks from me and many of us here.

With kind regards,
vovchik

seaside
Posts: 934
Joined: Thu 12 Apr 2007, 00:19

#20 Post by seaside »

Even more.....

Code: Select all

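# --separator can be "\n" or " "
# (a comment cannot follow the line-continuation backslash, so it goes up here)
SEL=`Xdialog \
        --title "thread_saver" \
        --separator "\n" \
        --stdout \
        --3inputsbox  "thread_saver" 0 0 \
            "URL (download this link)" "$THREAD" \
            "end page number" "$NPAGE" \
            "name the html file" "$NAME"`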
That won't set the vars, but will avoid the $1, $2,$3 ( actually, you could use anything $A,$B,$C....etc..)

Another way is to set a default text for each item which is helpful to know the exact syntax to enter (a sort of tooltip) and it also highlights each item.

Code: Select all

SEL=`Xdialog \
        --title "thread_saver" \
        --separator " " \
        --stdout \
        --3inputsbox  "thread_saver" 0 0 \
            "URL (download this link)" "http://murga-linux.com/puppy/viewtopic.php?t=74404" \
            "end page number" "5" \
            "name the html file" "Xdialog-tips"` 
I guess we've worked this one over pretty well :)



Cheers,
s
