how to download a complete website

aarf


#1 Post by aarf »

copy and paste this into a console (use shift+Insert to paste):

Code:

wget -c --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains www.farmfountain.com -P mnt/home/home/user/ --no-parent www.farmfountain.com
for explanations of the options:

Code:

wget --help
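
For anyone who would rather not read through the whole help output, here is the wget command from this post again with a comment for each option. It is only a sketch: `run`, `mirror_site` and `DRY_RUN` are helper names made up for this example (they are not part of wget), and by default the script just prints the command instead of starting a download.

```shell
#!/bin/sh
# Sketch: the wget command from this post, with each option explained.
# run(), mirror_site() and DRY_RUN are hypothetical helpers, not part of wget.

run() {
    # Print the command when DRY_RUN is 1 (the default); execute it otherwise.
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

mirror_site() {
    host=$1   # site to copy
    dest=$2   # directory prefix the copy is saved under

    # -c                             continue partially downloaded files
    # --recursive                    follow links and fetch the pages they point to
    # --no-clobber                   do not re-download files that already exist locally
    # --page-requisites              also fetch images, CSS and other files a page needs
    # --html-extension               save HTML pages with an .html suffix
    # --convert-links                rewrite links so the copy can be browsed offline
    # --restrict-file-names=windows  escape characters that Windows file names cannot hold
    # --domains                      only follow links to the listed host(s)
    # -P                             directory prefix to save under
    # --no-parent                    never climb above the starting directory
    run wget -c --recursive --no-clobber --page-requisites --html-extension \
        --convert-links --restrict-file-names=windows \
        --domains "$host" -P "$dest" --no-parent "$host"
}

mirror_site www.farmfountain.com mnt/home/home/user/
```

Run it with `DRY_RUN=0` to actually download. Newer wget releases call `--html-extension` `--adjust-extension`; the old name is still accepted.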

eztuxer
Posts: 494
Joined: Thu 06 Nov 2008, 11:19
Location: Hungary

#2 Post by eztuxer »

Thanks !

Interesting website by the way.
Don't poop it down... Pup it Up !

eztuxer
Posts: 494
Joined: Thu 06 Nov 2008, 11:19
Location: Hungary

#3 Post by eztuxer »

Too bad it only downloads the index page here:

Code:

# wget   -c   --recursive      --no-clobber      --page-requisites      --html-extension      --convert-links      --restrict-file-names=windows      --domains www.pupitup.phpbb3now.com   -P  mnt/home/home/user/      --no-parent          www.pupitup.phpbb3now.com
--23:45:23--  http://www.pupitup.phpbb3now.com/
           => `mnt/home/home/user/www.pupitup.phpbb3now.com/index.html'
Resolving www.pupitup.phpbb3now.com... 174.37.114.54
Connecting to www.pupitup.phpbb3now.com|174.37.114.54|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://pupitup.phpbb3now.com/ [following]
File `mnt/home/home/user/pupitup.phpbb3now.com/index.html' already there; not retrieving.


FINISHED --23:45:23--
Downloaded: 0 bytes in 0 files
Converting mnt/home/home/user/pupitup.phpbb3now.com/index.html... 0-2
Converted 1 files in 0.002 seconds.
What could I do to force a total copy of this forum?

http://pupitup.phpbb3now.com/

paulh177
Posts: 975
Joined: Tue 22 Aug 2006, 20:41

#4 Post by paulh177 »

wget -m might do it
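
For reference, the wget manual describes `-m` (`--mirror`) as shorthand for `-r -N -l inf --no-remove-listing`. The sketch below only spells that expansion out; `expand_mirror` is a made-up helper name and nothing is downloaded.

```shell
#!/bin/sh
# Sketch: expand wget's -m shortcut into the options the manual lists for it.
# expand_mirror is a hypothetical helper; it only prints text.

expand_mirror() {
    out=
    for arg in "$@"; do
        case $arg in
            # -r recursive, -N timestamping, -l inf unlimited depth,
            # --no-remove-listing keep FTP directory listings
            -m|--mirror) arg='-r -N -l inf --no-remove-listing' ;;
        esac
        out="${out:+$out }$arg"
    done
    echo "$out"
}

expand_mirror wget -m http://pupitup.phpbb3now.com/
```

As far as I know wget will not combine timestamping (`-N`) with `--no-clobber`, so `-m` would have to replace the `--no-clobber` in the longer command rather than be added to it.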

eztuxer
Posts: 494
Joined: Thu 06 Nov 2008, 11:19
Location: Hungary

#5 Post by eztuxer »

Didn't work either.
The problem may be that this host is bug-ridden or partially locked (I can't register any new members), and this could also affect wget.
Or they purposely configured the server not to respond to wget (by enforcing a minimum delay between page views?) to keep their "clients" captive to their lousy service.

I've downloaded another forum with the same parameters, and it worked fine.
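
If the server really does object to rapid requests or to wget's default User-Agent (that is only a guess here, nobody has confirmed it), wget has options for a slower, more browser-like crawl. A minimal sketch, assuming a 2 second base delay and a generic agent string; `polite_cmd` is a made-up helper name and it only prints the command:

```shell
#!/bin/sh
# Sketch: a slower, browser-like crawl, printed rather than executed.
# polite_cmd and its default values are assumptions for illustration.

polite_cmd() {
    url=$1
    delay=${2:-2}             # base pause in seconds between requests
    agent=${3:-Mozilla/5.0}   # User-Agent string sent instead of wget's own

    # --wait         pause between retrievals
    # --random-wait  vary each pause so requests look less regular
    # --user-agent   identify as something other than wget
    echo "wget --recursive --wait=$delay --random-wait --user-agent=$agent $url"
}

polite_cmd http://pupitup.phpbb3now.com/
```

Whether this would have helped with this particular host is untested.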

gposil
Posts: 1300
Joined: Mon 06 Apr 2009, 10:00
Location: Stanthorpe (The Granite Belt), QLD, Australia

#6 Post by gposil »

Try PMirrorget... it's in new Puppy but not on ibiblio yet... let me know

http://www.gposil.com/pets/PMirrorget-0.1.pet

Tested on your site and it works perfectly... for me.
Dpup Home: http://www.dpup.org

BarryK
Puppy Master
Posts: 9392
Joined: Mon 09 May 2005, 09:23
Location: Perth, Western Australia

#7 Post by BarryK »

Yes, Gposil's nice little Pmirrorget will be in 4.3beta (or pre-beta) but it is delayed a couple of days. Expect it mid-week.

See announcement re Pmirrorget:

http://puppylinux.com/blog/?viewDetailed=00915
https://bkhome.org/news/

eztuxer
Posts: 494
Joined: Thu 06 Nov 2008, 11:19
Location: Hungary

#8 Post by eztuxer »

It's working ! :)
Thank you gposil, I'll just have to FTP it when finished downloading.
Great soft cause HTTrack wasn't working on this site either.
You saved my butt.

And yes, Barry, it is a must have standard soft in the new Puppy, small size, great job.

eztuxer
Posts: 494
Joined: Thu 06 Nov 2008, 11:19
Location: Hungary

#9 Post by eztuxer »

OOOPPPSSS !!!

It downloaded the forum OK, but viewing it offline doesn't work. I guess that's normal because the page links haven't been rearranged for offline use, and I presume it should work all right once uploaded to the forum via FTP.
I'll do that AFTER walking Arobas (my dog), and let you know.

tlchost
Posts: 2057
Joined: Sun 05 Aug 2007, 23:26
Location: Baltimore, Maryland USA

#10 Post by tlchost »

gposil wrote: Try PMirrorget... it's in new Puppy but not on ibiblio yet
Does it respect the robots.txt file and thus not look in directories protected by robots.txt?

Will it download files in /cgi-bin and other files, such as graphics and CSS files, that are called by the website but stored on the server outside the site directory itself?

Thanks

aarf

#11 Post by aarf »

without taking anything away from PMirrorget which i haven't tried and didn't know of previously,
eztuxer wrote:Too bad it only downloads the index page here:

What could I do to force total copy of this forum ?

http://pupitup.phpbb3now.com/
the www. prefix is possibly the problem: the log shows www.pupitup.phpbb3now.com being redirected (301) to pupitup.phpbb3now.com, so pupitup.phpbb3now.com may work

Code:

# wget   -c   --recursive      --no-clobber      --page-requisites      --html-extension      --convert-links      --restrict-file-names=windows      --domains .pupitup.phpbb3now.com   -P  mnt/home/home/user/      --no-parent          pupitup.phpbb3now.com
i started it and it seemed to go ok, then i stopped it. it seems to be downloading a lot of files with .html extensions (perhaps remove --html-extension), which might make them difficult to upload unless you use the batch file renamer in rox. with the .html extension the pages will link locally. with .php extensions they need a locally installed php-enabled server to display properly, and you would also need the original php code for the pages, which you will NOT get with non-privileged access to any website.
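
If the files were already downloaded with `--html-extension`, a shell loop can do the renaming instead of the rox batch renamer. A minimal sketch, assuming the copy sits in a directory you pass in; `strip_html_suffix` is a made-up name and the file names in the comments are invented examples:

```shell
#!/bin/sh
# Sketch: strip the ".html" suffix that --html-extension added to PHP pages.
# strip_html_suffix is a hypothetical helper name.

strip_html_suffix() {
    dir=$1
    # Only names containing ".php" are touched, e.g. "viewtopic.php@t=1.html"
    # becomes "viewtopic.php@t=1"; a plain "index.html" is left alone.
    find "$dir" -type f -name '*.php*.html' | while IFS= read -r f; do
        mv -- "$f" "${f%.html}"
    done
}

# Example call (the path is an assumption):
# strip_html_suffix mnt/home/home/user/pupitup.phpbb3now.com
```

Note that links already rewritten by `--convert-links` would still point at the `.html` names, so this only makes sense for files meant for upload, not for offline browsing. The `@` in the example names is what `--restrict-file-names=windows` uses in place of `?`.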

overall, though, your best bet would be to do a backup of the database from within admin and then upload that backup file to the database at your new hosting site. if you do not have access to that, perhaps ask the site owner for the backup file; otherwise it is still a lot of work you are aiming at.
Last edited by aarf on Sun 26 Jul 2009, 15:40, edited 2 times in total.

aarf

#12 Post by aarf »

have edited the wget code to be as it should in my last post.
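
For anyone wondering why the host name matters: the log in post #3 shows www.pupitup.phpbb3now.com answering with a 301 redirect to pupitup.phpbb3now.com, while `--domains` only allowed the www name, so links on the redirected host were probably treated as off limits. `--domains` accepts a comma-separated list, so one cautious sketch is to allow both spellings. `domain_list` is a made-up helper name, and the command is only printed, not run.

```shell
#!/bin/sh
# Sketch: build a --domains list covering a host with and without "www.".
# domain_list is a hypothetical helper name.

domain_list() {
    bare=${1#www.}          # drop a leading "www." if there is one
    echo "$bare,www.$bare"  # comma-separated, as --domains expects
}

echo "wget --recursive --domains $(domain_list www.pupitup.phpbb3now.com) pupitup.phpbb3now.com"
```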
as far as i know mirroring a site that has a database is NOT possible in any way, shape or form that is reasonably useful via internet global user access.
the only way to mirror a forum successfully is to get the 'backup.sql' from the database on the server then use that to create tables in your new site. it may or may not also be possible to relocate a forum from within phpbb2 admin depending on your privileges.
using any form of over the web copy/mirror of a php forum is a recipe for vast amounts of time wasting effort. however that is your choice.
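
For someone who does have database access, the dump-and-restore route usually comes down to two commands. This is a sketch only: the user, database and file names are placeholders and not values from this thread, and the helpers just print the commands so they can be checked first.

```shell
#!/bin/sh
# Sketch: dump the old forum database and load it into the new one.
# dump_cmd and restore_cmd are hypothetical helpers; all names are placeholders.

dump_cmd() {
    # $1 = database user, $2 = database name, $3 = output file
    echo "mysqldump -u $1 -p $2 > $3"
}

restore_cmd() {
    # $1 = database user, $2 = database name, $3 = dump file to load
    echo "mysql -u $1 -p $2 < $3"
}

dump_cmd forum_user old_forum_db backup.sql
restore_cmd forum_user new_forum_db backup.sql
```

`-p` makes both tools prompt for the password. The first command has to run where the old database is reachable, which is exactly the access a hosted forum may not give you.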

aarf

#13 Post by aarf »

tlchost wrote:
gposil wrote: Try PMirrorget... it's in new Puppy but not on ibiblio yet
Does it respect the robots.txt file and thus not look in directories protected by robots.txt?

Will it download files in /cgi-bin and other files such as graphics and css files that are called by the website, but stored on the server other than the site directory itself?

Thanks
i haven't had a look at PMirrorget, but i very much doubt that you will get the source code out of cgi-bin. however, an html image of the cgi-bin output is possible, as are graphics files.
css files are downloaded by wget, so i can't see that they would be a problem for PMirrorget.
robots.txt is also downloaded first, but how or if it is used in any way i don't know.

wget cannot access htaccess-locked directories without the username/password. whether robots.txt is able to give that sort of denial protection i don't know and haven't tested.
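
Going by the wget manual, wget does obey robots.txt during recursive downloads unless it is started with `-e robots=off`; whether PMirrorget behaves the same way is not settled in this thread. robots.txt is only a request to crawlers and gives no password-style protection. To see what a site asks crawlers to stay out of, the Disallow lines can be pulled out of a saved robots.txt. A minimal sketch (`disallowed_paths` is a made-up name, and the sample rules are invented):

```shell
#!/bin/sh
# Sketch: list the paths a robots.txt asks crawlers to avoid.
# disallowed_paths is a hypothetical helper; it ignores which User-agent
# section a rule belongs to and simply prints every Disallow value.

disallowed_paths() {
    # Reads robots.txt text on stdin, prints one path per Disallow line.
    sed -n 's/^[Dd]isallow:[[:space:]]*\([^[:space:]#]*\).*$/\1/p'
}

disallowed_paths <<'EOF'
User-agent: *
Disallow: /adm/
Disallow: /cgi-bin/
EOF
```

For directories that are password protected, wget takes the credentials through `--user` and `--password`.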

eztuxer
Posts: 494
Joined: Thu 06 Nov 2008, 11:19
Location: Hungary

#14 Post by eztuxer »

After a 4 miles walk and over an hour of Taï Chi Chuan, I feel refreshed and I'm back in the starting blocks.
As I was fearing, this is no simple challenge and it goes way beyond my knowledge in Linux and Puppy.
I've managed a few forums as admin, but this is my first time for domain based phpBB3.

Here's the URL with me logged in as admin:

http://pupitup.phpbb3now.com/adm/index. ... &mode=bots

I don't know if this might help.
I do not have access to the data base, and the folks running phpBB3now are not reachable:

http://forum.phpbb3now.com/
Can't register either:
http://forum.phpbb3now.com/ucp.php?mode ... 85bb3e45ee

One thing I could do is add a new bot to let robots roam through freely, maybe.

If it would be possible to convert (if necessary) links so that it could work @ http://pupitup.org/forum/phpBB3/ that would be a breeze.
If not, I'll just copy/paste all forum sections manually, and would eventually forget about the existing posts.

gposil
Posts: 1300
Joined: Mon 06 Apr 2009, 10:00
Location: Stanthorpe (The Granite Belt), QLD, Australia

#15 Post by gposil »

PMirrorget will download all normal site files, html, css, txt and graphics pointed to by site files, it will not download sql database material...

aragon
Posts: 1698
Joined: Mon 15 Oct 2007, 12:18
Location: Germany

#16 Post by aragon »

@eztuxer

you might want to try httrack/ghttrack.

http://www.murga-linux.com/puppy/viewto ... 70&t=38413

i've tested with the windows-version (winhttrack) and it seems to do what you need.

good luck
aragon

aarf

#17 Post by aarf »

eztuxer wrote: As I was fearing, this is no simple challenge and it goes way beyond my knowledge in Linux and Puppy.
I've managed a few forums as admin, but this is my first time for domain based phpBB3.
the skill level required to install a forum yourself on your own domain and then move an existing forum to it is not high, and it is not too difficult IF you have the backup.sql file from the old forum.
doing the install and transfer requires no detailed knowledge of php, mysql databases or linux.
further, the knowledge you already have of puppy should enable you to install a functioning forum locally on your own computer to play with and test. installing the required apache or hiawatha server, mysql database and php on your own computer is now a fairly simple affair.
if you want to proceed, search the puppy forum for LAMPP or hiawatha for information on installing the server. you will also need the forum scripts from the phpBB forum. all are freely available for download.

eztuxer
Posts: 494
Joined: Thu 06 Nov 2008, 11:19
Location: Hungary

#18 Post by eztuxer »

@ aragon: GHTTrack can't download it.

@ aarf: It wasn't too difficult to install phpBB3 with MySQL on the domain; it's just that it was a first-time job.

I've copy/pasted all forums manually and have started doing the same with some of the posts.
It takes time but it will be done.

ttuuxxx
Posts: 11171
Joined: Sat 05 May 2007, 10:00
Location: Ontario Canada,Sydney Australia

#19 Post by ttuuxxx »

I liked Pmirrorget so much I included it in 2.14x. It's small and real easy to use :) http://www.murga-linux.com/puppy/viewto ... f5179b7c41
ttuuxxx
http://audio.online-convert.com/ <-- excellent site
http://samples.mplayerhq.hu/A-codecs/ <-- Codec Test Files
http://html5games.com/ <-- excellent HTML5 games :)

PupGeek
Posts: 353
Joined: Sun 06 Sep 2009, 11:30

#20 Post by PupGeek »

yeah the wget -rx method is pretty useful..... but gotta watch out for all the junk too.
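
One way to keep the junk down with `wget -rx` is to limit the recursion depth and reject file types you don't want. A minimal sketch: `tidy_cmd`, the depth of 3 and the reject patterns are assumptions picked for illustration, and the command is only printed.

```shell
#!/bin/sh
# Sketch: wget -rx with a depth limit and a reject list.
# tidy_cmd and its defaults are hypothetical; adjust them to the site.

tidy_cmd() {
    url=$1
    depth=${2:-3}              # how many link levels to follow
    reject=${3:-*.zip,*.iso}   # comma-separated patterns to skip

    # -r        recurse through links
    # -x        force a directory hierarchy on disk
    # --level   stop after this many levels
    # --reject  do not keep files matching these patterns
    echo "wget -r -x --level=$depth --reject '$reject' $url"
}

tidy_cmd http://example.org/
```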
