Puppy Linux Discussion Forum
Puppy HOME page : puppylinux.com
"THE" alternative forum : puppylinux.info
 
All times are UTC - 4
how to download a complete website
Moderators: Flash, Ian, JohnMurga
Page 1 of 2 [27 Posts]
aarf

Joined: 30 Aug 2007
Posts: 3620
Location: around the bend

Posted: Sat 11 Jul 2009, 03:16    Post subject: how to download a complete website
Subject description: with wget
 

Copy the command below and paste it into a console (use Shift+Insert to paste). The leading # is the root shell prompt, not part of the command.
Code:
# wget -c --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains www.farmfountain.com -P mnt/home/home/user/ --no-parent www.farmfountain.com

For an explanation of each option, run:
Code:
# wget --help
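
For readers who want the options spelled out next to the command, here is a minimal sketch that rebuilds the same argument list one option at a time, with a comment on each. It only prints the command and downloads nothing. The variable names SITE, DEST and WGET_CMD are illustrative and not part of the original post. Note that newer wget releases list --html-extension under the name --adjust-extension.

```shell
#!/bin/sh
# Annotated rebuild of the wget command from the post.
# Nothing is downloaded: the script only assembles and prints the command.
SITE="www.farmfountain.com"     # site from the post
DEST="mnt/home/home/user/"      # download directory from the post

set --                                       # start with an empty argument list
set -- "$@" -c                               # continue partially downloaded files
set -- "$@" --recursive                      # follow links and fetch the linked pages
set -- "$@" --no-clobber                     # do not re-download files that already exist
set -- "$@" --page-requisites                # also fetch images, CSS etc. needed to show a page
set -- "$@" --html-extension                 # save HTML pages with an .html suffix
set -- "$@" --convert-links                  # rewrite links so the copy can be browsed offline
set -- "$@" --restrict-file-names=windows    # escape characters that Windows file names forbid
set -- "$@" --domains "$SITE"                # only follow links within this domain
set -- "$@" -P "$DEST"                       # directory prefix: save everything under DEST
set -- "$@" --no-parent                      # never climb above the starting directory
set -- "$@" "$SITE"                          # start URL

WGET_CMD="wget $*"
echo "$WGET_CMD"
```

To run the download for real, replace the last two lines with `wget "$@"`.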

_________________

ASUS EeePC Flare series 1025C 4x Intel Atom N2800 @ 1.86GHz RAM 2063MB 800x600p ATA 320G
_-¤-_

<º))))><.¸¸.•´¯`•.#.•´¯`•.¸¸. ><((((º>
eztuxer


Joined: 06 Nov 2008
Posts: 485
Location: Belgium

Posted: Sat 25 Jul 2009, 17:42

Thanks !

Interesting website by the way.

_________________
Don't poop it down... Pup it Up !
eztuxer


Joined: 06 Nov 2008
Posts: 485
Location: Belgium

Posted: Sat 25 Jul 2009, 17:53

Too bad it only downloads the index page here:

Code:
# wget -c --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains www.pupitup.phpbb3now.com -P mnt/home/home/user/ --no-parent www.pupitup.phpbb3now.com
--23:45:23--  http://www.pupitup.phpbb3now.com/
           => `mnt/home/home/user/www.pupitup.phpbb3now.com/index.html'
Resolving www.pupitup.phpbb3now.com... 174.37.114.54
Connecting to www.pupitup.phpbb3now.com|174.37.114.54|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://pupitup.phpbb3now.com/ [following]
File `mnt/home/home/user/pupitup.phpbb3now.com/index.html' already there; not retrieving.


FINISHED --23:45:23--
Downloaded: 0 bytes in 0 files
Converting mnt/home/home/user/pupitup.phpbb3now.com/index.html... 0-2
Converted 1 files in 0.002 seconds.


What could I do to force a complete copy of this forum?

http://pupitup.phpbb3now.com/

paulh177


Joined: 22 Aug 2006
Posts: 875
Location: ST862228

Posted: Sat 25 Jul 2009, 18:07

wget -m might do it
_________________
Want to report a bug or problem? Have a read of this first ...
eztuxer


Joined: 06 Nov 2008
Posts: 485
Location: Belgium

Posted: Sun 26 Jul 2009, 05:38

Didn't work either.
The problem may be that this host is bug-ridden or partially locked (new members can't register), which could also affect wget.
Or they deliberately configured the server not to respond to wget (perhaps by enforcing a minimum delay between page views) to keep their "clients" captive to their lousy service.

I've downloaded another forum with the same parameters, and it worked fine.
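
If the server really does throttle fast page views, one thing to try is a slower crawl that pauses between requests and sends a browser-like user agent. The sketch below only prints such a command. --mirror, --wait, --random-wait and --user-agent are standard wget options, but the delay value and the user-agent string are assumptions, and nothing in this thread confirms that the host throttles wget at all.

```shell
#!/bin/sh
# Sketch only: prints a slower, browser-like wget invocation instead of running it.
SITE="pupitup.phpbb3now.com"        # host taken from the thread
DELAY=2                             # assumed pause in seconds between requests
AGENT="Mozilla/5.0 (X11; Linux)"    # assumed user-agent string

# --wait pauses between retrievals; --random-wait varies that pause.
POLITE_CMD="wget --mirror --page-requisites --convert-links --wait=$DELAY --random-wait --user-agent=\"$AGENT\" http://$SITE/"
echo "$POLITE_CMD"
```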

gposil


Joined: 06 Apr 2009
Posts: 1305
Location: Stanthorpe (The Granite Belt), QLD, Australia

Posted: Sun 26 Jul 2009, 05:46

Try PMirrorget. It's in the new Puppy but not on ibiblio yet. Let me know how it goes.

http://www.gposil.com/pets/PMirrorget-0.1.pet

Tested on your site and it works perfectly... for me.

_________________
Dpup Home
BarryK
Puppy Master


Joined: 09 May 2005
Posts: 7099
Location: Perth, Western Australia

Posted: Sun 26 Jul 2009, 06:19

Yes, Gposil's nice little Pmirrorget will be in 4.3beta (or pre-beta) but it is delayed a couple of days. Expect it mid-week.

See announcement re Pmirrorget:

http://puppylinux.com/blog/?viewDetailed=00915

_________________
http://bkhome.org/news/
eztuxer


Joined: 06 Nov 2008
Posts: 485
Location: Belgium

Posted: Sun 26 Jul 2009, 08:23

It's working! :)
Thank you gposil, I'll just have to FTP it when it has finished downloading.
Great software, because HTTrack wasn't working on this site either.
You saved my butt.

And yes, Barry, it is a must-have standard program for the new Puppy: small size, great job.

eztuxer


Joined: 06 Nov 2008
Posts: 485
Location: Belgium

Posted: Sun 26 Jul 2009, 08:36

OOOPPPSSS !!!

It downloaded the forum OK, but viewing it offline doesn't work. I guess that's normal, because the page links haven't been rewritten for offline use, and I presume it should work all right once uploaded to the forum via FTP.
I'll do that AFTER walking Arobas (my dog), and let you know.

tlchost

Joined: 05 Aug 2007
Posts: 1741
Location: Baltimore, Maryland USA

Posted: Sun 26 Jul 2009, 09:05

gposil wrote:
Try PMirrorget. It's in the new Puppy but not on ibiblio yet


Does it respect the robots.txt file and thus stay out of directories that robots.txt disallows?

Will it download files in /cgi-bin and other files such as graphics and css files that are called by the website, but stored on the server other than the site directory itself?

Thanks
aarf

Joined: 30 Aug 2007
Posts: 3620
Location: around the bend

Posted: Sun 26 Jul 2009, 10:13

without taking anything away from PMirrorget which i haven't tried and didn't know of previously,
eztuxer wrote:
Too bad it only downloads the index page here:

Code:
# wget -c --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains www.pupitup.phpbb3now.com -P mnt/home/home/user/ --no-parent www.pupitup.phpbb3now.com
--23:45:23--  http://www.pupitup.phpbb3now.com/
           => `mnt/home/home/user/www.pupitup.phpbb3now.com/index.html'
Resolving www.pupitup.phpbb3now.com... 174.37.114.54
Connecting to www.pupitup.phpbb3now.com|174.37.114.54|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://pupitup.phpbb3now.com/ [following]
File `mnt/home/home/user/pupitup.phpbb3now.com/index.html' already there; not retrieving.


FINISHED --23:45:23--
Downloaded: 0 bytes in 0 files
Converting mnt/home/home/user/pupitup.phpbb3now.com/index.html... 0-2
Converted 1 files in 0.002 seconds.


What could I do to force a complete copy of this forum?

http://pupitup.phpbb3now.com/


www.pupitup.phpbb3now.com is possibly the problem: the log shows the server redirecting it (301) to pupitup.phpbb3now.com, so starting from pupitup.phpbb3now.com may work.

Code:
# wget -c --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains .pupitup.phpbb3now.com -P mnt/home/home/user/ --no-parent pupitup.phpbb3now.com


I started it and it seemed to go OK, then I stopped it. It seems to be downloading a lot of files with .html extensions (perhaps remove --html-extension), which might make them difficult to upload unless you can use the batch file renamer in ROX. With .html extensions the pages will link locally. With .php extensions they will need a locally installed PHP-enabled server to display properly, and you will also need the original PHP code for the pages, which you will NOT get through unprivileged access to any website.

Overall, though, your best bet would be to back up the database from within the admin panel and then upload that backup file to the database at your new hosting site. If you do not have access to that resource, perhaps ask the site owner for the backup file; otherwise you are still aiming at a lot of work.
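
The redirect mentioned at the top of this post can be read straight out of wget's output. Below is a small sketch, assuming the log has the same shape as the one quoted above, that pulls the Location target out of the log text and reduces it to a host name that can be used as the start URL. The function names redirect_target and url_host are made up for this example.

```shell
#!/bin/sh
# Two lines copied from the wget output quoted in this thread.
LOG='HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://pupitup.phpbb3now.com/ [following]'

# Print the URL from the first "Location:" line of a wget log.
redirect_target() {
  printf '%s\n' "$1" | sed -n 's/^Location: \([^ ]*\).*/\1/p' | head -n 1
}

# Reduce a URL to its host name (drop the scheme and the path).
url_host() {
  printf '%s\n' "$1" | sed -e 's|^[a-z]*://||' -e 's|/.*$||'
}

TARGET=$(redirect_target "$LOG")
HOST=$(url_host "$TARGET")
echo "start URL: $TARGET"
echo "host:      $HOST"
```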

Last edited by aarf on Sun 26 Jul 2009, 11:40; edited 2 times in total
aarf

Joined: 30 Aug 2007
Posts: 3620
Location: around the bend

Posted: Sun 26 Jul 2009, 10:40

I have edited the wget code in my last post to be as it should.
As far as I know, mirroring a site that has a database is NOT possible in any reasonably useful way via ordinary public internet access.
The only way to mirror a forum successfully is to get the 'backup.sql' from the database on the server and then use it to create the tables on your new site. Depending on your privileges, it may or may not also be possible to relocate a forum from within phpBB2 admin.
Using any form of over-the-web copy/mirror of a PHP forum is a recipe for vast amounts of wasted time and effort. However, that is your choice.
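
For anyone who does have database access, the backup-and-restore route could look roughly like the dry-run sketch below. It assumes the forum runs on MySQL, which the thread does not state, and the database name and user are hypothetical. The script only prints the two commands; mysqldump and mysql would each prompt for a password when actually run.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# Assumes a MySQL database; every name below is hypothetical.
DB_NAME="phpbb_forum"     # hypothetical database name
DB_USER="forum_admin"     # hypothetical database user
BACKUP="backup.sql"       # dump file name mentioned in the post

# On the old server: dump all tables into one SQL file.
DUMP_CMD="mysqldump -u $DB_USER -p --result-file=$BACKUP $DB_NAME"

# On the new server: replay the dump into an existing, empty database.
RESTORE_CMD="mysql -u $DB_USER -p $DB_NAME -e 'source $BACKUP'"

echo "$DUMP_CMD"
echo "$RESTORE_CMD"
```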

aarf

Joined: 30 Aug 2007
Posts: 3620
Location: around the bend

Posted: Sun 26 Jul 2009, 11:59

tlchost wrote:
gposil wrote:
Try PMirrorget. It's in the new Puppy but not on ibiblio yet


Does it respect the robots.txt file and thus stay out of directories that robots.txt disallows?

Will it download files in /cgi-bin and other files such as graphics and css files that are called by the website, but stored on the server other than the site directory itself?

Thanks

I haven't had a look at PMirrorget, but I very much doubt that you will get the source code out of cgi-bin. However, an HTML image of the cgi-bin output is possible, as are graphics files.
CSS files are downloaded by wget, so I can't see that they would be a problem for PMirrorget.
robots.txt is also downloaded first, but how or whether it is used in any way I don't know.

wget cannot access .htaccess-locked directories without the username and password. Whether robots.txt can give that sort of denial protection I don't know and haven't tested.
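
On the wget side, the manual describes both cases. A recursive download fetches robots.txt and skips the paths it disallows, and the wgetrc setting robots=off (passed with -e) turns that off. robots.txt is only a request to well-behaved crawlers and does not block access the way a password-protected directory does. For the latter, wget takes --http-user and --http-password. The sketch below just prints both command forms; the user name and password are placeholders, and how PMirrorget treats robots.txt is not covered here.

```shell
#!/bin/sh
# Sketch only: prints the commands; nothing is fetched.
SITE="pupitup.phpbb3now.com"   # host from the thread
HTUSER="someuser"              # hypothetical .htaccess user name
HTPASS="secret"                # hypothetical password

# Recursive wget normally honours robots.txt; -e robots=off disables that.
NO_ROBOTS_CMD="wget --recursive -e robots=off http://$SITE/"

# Directories protected by HTTP authentication need credentials.
AUTH_CMD="wget --recursive --http-user=$HTUSER --http-password=$HTPASS http://$SITE/"

echo "$NO_ROBOTS_CMD"
echo "$AUTH_CMD"
```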

eztuxer


Joined: 06 Nov 2008
Posts: 485
Location: Belgium

Posted: Sun 26 Jul 2009, 13:40

After a 4-mile walk and over an hour of Taï Chi Chuan, I feel refreshed and I'm back in the starting blocks.
As I feared, this is no simple challenge, and it goes way beyond my knowledge of Linux and Puppy.
I've managed a few forums as admin, but this is my first time with a domain-based phpBB3.

Here's the URL with me logged in as admin:

http://pupitup.phpbb3now.com/adm/index.php?sid=ad157ef79760fa499eecfe13da175d27&i=bots&mode=bots

I don't know if this might help.
I do not have access to the database, and the folks running phpBB3now are not reachable:

http://forum.phpbb3now.com/
Can't register either:
http://forum.phpbb3now.com/ucp.php?mode=register&sid=e945b6e37cab250141aa2c85bb3e45ee

One thing I could do, maybe, is add a new bot in the admin panel so that the robot is allowed to roam through freely.

If it were possible to convert the links (if necessary) so that the copy works at http://pupitup.org/forum/phpBB3/, that would be a breeze.
If not, I'll just copy and paste all forum sections manually, and possibly give up on the existing posts.
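
For a static copy, rewriting absolute links from the old address to the new one can be sketched with find and sed as below. The two base URLs come from this post; the function name rewrite_links and the demo link are made up for illustration. This only adjusts links inside saved .html files and does not turn the copy into a working phpBB forum.

```shell
#!/bin/sh
# Sketch: rewrite absolute links in a static copy so they point at the new location.
OLD="http://pupitup.phpbb3now.com/"        # old base URL from the thread
NEW="http://pupitup.org/forum/phpBB3/"     # new base URL from the thread

# Rewrite every .html file under the given directory in place.
rewrite_links() {
  find "$1" -type f -name '*.html' | while IFS= read -r f; do
    sed "s|$OLD|$NEW|g" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done
}

# Demonstration on a throwaway directory, so no real mirror is touched.
# The viewtopic link is a made-up example.
DEMO=$(mktemp -d)
echo '<a href="http://pupitup.phpbb3now.com/viewtopic.php?t=1">topic</a>' > "$DEMO/index.html"
rewrite_links "$DEMO"
RESULT=$(cat "$DEMO/index.html")
echo "$RESULT"
rm -rf "$DEMO"
```

To apply it to a real download, call rewrite_links with the directory that holds the mirrored files, after making a backup copy.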

gposil


Joined: 06 Apr 2009
Posts: 1305
Location: Stanthorpe (The Granite Belt), QLD, Australia

Posted: Sun 26 Jul 2009, 19:39

PMirrorget will download all normal site files (HTML, CSS, TXT and graphics pointed to by site files). It will not download SQL database material.