Use shell functions to fetch information online

Flash
Official Dog Handler
Posts: 13071
Joined: Wed 04 May 2005, 16:04
Location: Arizona USA

#1 Post by Flash »

I don't know if this will be useful or even belongs in the Programming section. I just saw it and thought it looked like it might.
By Marco Fioretti
January 2, 2012, 9:00 AM PST

Takeaway: Marco Fioretti shows two examples of shell functions that you can use for web scraping when all you need is a quick way to extract text from a given website.

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#2 Post by technosaurus »

If anyone wants more examples, I have written quite a few web scraping scripts. L18L is using my Google Translate code for localizing shell scripts, jpeps has started using my Yahoo Finance example, and Barry incorporated my Google search grokking into Puppy's alternative man command after die.net changed their formatting. There are a lot more, but those are all I can remember.

Here is the basic process:
Use the site's forms to get the appropriate results (keep a note of what each field does).
Save the page's HTML, open it, and look for <form> .... </form>
(you will need to prepend the website and any subdirectories to the "action").
Grok the hell out of that until you get it down to a minimum.
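
A rough sketch of the "look for <form>" step, assuming the page has been saved to a local file. list_form_fields is a made-up helper name, not something from the scripts mentioned above, and it assumes attributes are double-quoted, which real pages do not always honor. It also relies on grep -o, which GNU and busybox grep provide but POSIX does not require.

```shell
#!/bin/sh
# list_form_fields FILE: print the action, name and value attributes of the
# form-related tags in a saved HTML page. Hypothetical helper for illustration.
# Assumes double-quoted attributes; adjust the patterns for other pages.
list_form_fields() {
    # Join the file into one line, then start a new line at every "<",
    # so each tag sits at the beginning of its own line (minus the "<").
    tr '\n' ' ' < "$1" | tr '<' '\n' |
        # Keep only the tags that matter for building a request.
        grep -i -E '^(form|input|select|textarea)[ >]' |
        # Print just the attributes needed for the query string.
        grep -o -i -E '(action|name|value)="[^"]*"'
}
```

Running list_form_fields saved_page.html (saved_page.html being whatever file you saved) should give a short list of action=, name= and value= lines to work from.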

[stop here if you just want to use it in a web page]

Each of the form's name=name1 value=value1 pairs translates to a corresponding name1=value1 in the query string. The first pair follows a ? and the rest are joined with &.

You can simulate the form being submitted by opening a browser to:

<URLofpage><action>?name1=value1&name2=value2....
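
The URL pattern above can be sketched as a pair of shell functions. urlencode and build_url are hypothetical names chosen for this example, and the encoder only handles plain ASCII input, so treat it as a starting point rather than a complete implementation.

```shell
#!/bin/sh
# urlencode STRING: percent-encode everything except unreserved characters.
# Hypothetical helper; ASCII input only.
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
}

# build_url BASE NAME=VALUE...: append the pairs as ?n1=v1&n2=v2...
# BASE is <URLofpage><action> from the form.
build_url() {
    url=$1
    shift
    sep='?'
    for pair in "$@"; do
        name=${pair%%=*}     # text before the first "="
        value=${pair#*=}     # text after the first "="
        url=$url$sep$(urlencode "$name")=$(urlencode "$value")
        sep='&'
    done
    printf '%s\n' "$url"
}
```

For example, build_url 'http://example.com/search' 'q=puppy linux' 'hl=en' prints http://example.com/search?q=puppy%20linux&hl=en (example.com and the field names are placeholders).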

[stop here if you just want to use it to get a page]

If that works, try it with wget (you may need to add -U firefox to wget to defeat anti-crawler blocks).

If you send wget's output to stdout (-O -), you can pipe it through sed, grep, cut, etc. to format it however you like.
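
A minimal sketch of the wget-to-pipe idea. The function names fetch, page_title and strip_tags are invented for this example, and the sed patterns are deliberately naive, so expect to tune them per site.

```shell
#!/bin/sh
# fetch URL: write the page to stdout.
# -q silences progress output, -O - sends the document to stdout,
# -U sets the User-Agent string.
fetch() {
    wget -q -U firefox -O - "$1"
}

# page_title: read HTML on stdin and print the text of its <title> element.
# Assumes the page has a single <title>.
page_title() {
    tr '\n' ' ' |
        sed -n 's/.*<[Tt][Ii][Tt][Ll][Ee][^>]*>\([^<]*\)<.*/\1/p'
}

# strip_tags: read HTML on stdin and delete anything that looks like a tag.
# Naive: tags split across lines are not handled.
strip_tags() {
    sed 's/<[^>]*>//g'
}
```

Typical use would be something like fetch 'http://example.com/' | page_title, with the URL swapped for the one you built from the form.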

[see other tutorials for various types of formatting]

Check out my github repositories (https://github.com/technosaurus). I may eventually get around to updating my blogspot (http://bashismal.blogspot.com).

sunburnt
Posts: 5090
Joined: Wed 08 Jun 2005, 23:11
Location: Arizona, U.S.A.

#3 Post by sunburnt »

It's what I used for the gtkDialog GUIs for a Debian downloader:
download the page with wget and parse it for the needed text.

But trying to get it to resolve dependencies proved to be a real struggle.
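
As an illustration of the download-and-parse step only (not the actual downloader code, which is not shown in this thread), here is a sketch that pulls .deb links out of a package directory listing. deb_links is a hypothetical name, and it assumes the listing uses double-quoted href attributes.

```shell
#!/bin/sh
# deb_links: read HTML on stdin and print every href value ending in .deb.
# Hypothetical helper; assumes href="..." with double quotes.
deb_links() {
    grep -o 'href="[^"]*\.deb"' | sed 's/^href="//; s/"$//'
}
```

It would be fed from wget, e.g. wget -q -O - "$url" | deb_links, where $url is the directory page you want to scan. Dependency resolution is a separate problem that this does not touch.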
