Puppy Linux Discussion Forum
How do I remove duplicate words from a text list? [solved]
scsijon

Joined: 23 May 2007
Posts: 1042
Location: the australian mallee

Posted: Fri 01 Mar 2013, 05:19    Subject: How do I remove duplicate words from a text list? [solved]

I have a list of 24000+ 'words' in a text file. It's in a vertical list format, one 'word' per line.

However, more than 40% of them are duplicates and I'd like to delete them.

The 'words' can be basically anything that can be typed on a keyboard.

I have used 'return-newline' as the separator.

Does anyone have a simple script I can run to fix the problem?

Thanks

SFR


Joined: 26 Oct 2011
Posts: 1068

Posted: Fri 01 Mar 2013, 06:57    Subject: Re: how do I remove duplicate words on a text list?

Hey Scsijon

To make it clear - the list looks like this:
abc
def
abc
abc
blablabla
zxzx
something
blablabla
...


and should look like this:
abc
def
blablabla
zxzx
something
...


right?

If you don't mind the lines also being sorted:
Code:
sort -u input_file

But if that's a problem, here's a cute awk one-liner I just found on Stack Overflow:
Code:
awk '!_[$0]++' input_file
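
For anyone wondering why the awk one-liner keeps the original order, here is a rough sketch of the same idea with a readable array name. The function name `dedupe_keep_order` and the array name `seen` are made up for illustration; the one-liner above uses `_` as the array name, which does the same thing.

```shell
# dedupe_keep_order: print each distinct line once, in first-seen order.
# "seen" is an awk associative array keyed by the whole line ($0).
# seen[$0]++ yields the old count (0 the first time a line appears),
# so !seen[$0]++ is true only for the first occurrence, and awk's
# default action for a true pattern is to print the line.
dedupe_keep_order() {
    awk '!seen[$0]++' "$@"
}

# Sample data taken from the example list in this post.
printf '%s\n' abc def abc abc blablabla zxzx something blablabla | dedupe_keep_order
```

Called with a file name (`dedupe_keep_order input_file`) it reads the file; with no arguments it reads standard input. The whole list of distinct lines is held in memory, which should not matter for a list of about 24000 lines.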


Greetings!

_________________
[O]bdurate [R]ules [D]estroy [E]nthusiastic [R]ebels => [C]reative [H]umans [A]lways [O]pen [S]ource
Omnia mea mecum porto.
vovchik


Joined: 23 Oct 2006
Posts: 1285
Location: Ukraine

Posted: Fri 01 Mar 2013, 08:49

Dear guys,

This is pretty easy too:

Code:

cat some.txt | sort | uniq


With kind regards,
vovchik
amigo

Joined: 02 Apr 2007
Posts: 2247

Posted: Fri 01 Mar 2013, 11:52

I recently had the same problem and wrote something for it. But I can't find it right now, so I've re-created it:
Code:
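#!/bin/bash
# uniq_no-sort
# print out unique lines, but without sorting them

FILE=$1

OUT="$FILE.uniq"
: > "$OUT"

# IFS= and -r keep leading whitespace and backslashes in each line intact
while IFS= read -r LINE ; do
   # -F fixed string, -x whole-line match, -q quiet; test grep's exit status.
   # "--" stops option parsing in case LINE starts with a dash.
   if ! grep -Fxq -- "$LINE" "$OUT" ; then
      printf '%s\n' "$LINE" >> "$OUT"
   fi
done < "$FILE"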
tallboy


Joined: 21 Sep 2010
Posts: 444
Location: Oslo, Norway

Posted: Fri 01 Mar 2013, 12:28

Sorry, mistake, could not find a way to delete post.
_________________
True freedom is a live Puppy on a multisession CD/DVD.
GustavoYz


Joined: 07 Jul 2010
Posts: 896
Location: .ar

Posted: Fri 01 Mar 2013, 14:50

vovchik wrote:

Code:

cat some.txt | sort | uniq


With kind regards,
vovchik


cat isn't really needed:
Code:
sort file.txt | uniq
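
A short sketch of why the sort matters here: uniq only compares each line with the one directly before it, so it removes duplicates only when they are adjacent. The variable name `unsorted` and its sample data are made up for illustration.

```shell
# Three lines where the duplicate "abc" entries are not next to each other.
unsorted=$(printf '%s\n' abc def abc)

# uniq alone leaves the list unchanged, because no two adjacent lines match.
printf '%s\n' "$unsorted" | uniq

# Sorting first puts the duplicates side by side, so uniq can drop them.
printf '%s\n' "$unsorted" | sort | uniq

# sort -u does the sort and the de-duplication in one step.
printf '%s\n' "$unsorted" | sort -u
```

On a list that is already sorted, plain `uniq` would be enough on its own.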

scsijon

Joined: 23 May 2007
Posts: 1042
Location: the australian mallee

Posted: Fri 01 Mar 2013, 18:17

Sorry folks, I wish it was that simple.

I have already sorted the list; that was when I realized how many duplicates were in it.


consider this:


aaa
aba
ada
ad
aea
aea
aea
agd
agd
ased
ased
ased-ss
ased-ss<p

and on we go.

I want to remove all the duplicates found.

It's what happens when you need to rebuild a crashed component list from backups.
Keef


Joined: 20 Dec 2007
Posts: 628
Location: Staffordshire

Posted: Fri 01 Mar 2013, 19:15

Code:

# cat list.txt
aaa
aba
ada
ad
aea
aea
aea
agd
agd
ased
ased
ased-ss
ased-ss<p
# sort list.txt | uniq
aaa
aba
ad
ada
aea
agd
ased
ased-ss
ased-ss<p
#

Seems to work for me....
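
The commands in this thread print the result to the terminal. As a rough sketch, assuming you want the cleaned list written back to the file and a count of how many lines were dropped, something like this should do it. The temporary file stands in for the real list and is only there so the example can be run as-is.

```shell
# Hypothetical sample file standing in for the real list.
list=$(mktemp)
printf '%s\n' aaa aba ada ad aea aea aea agd agd ased ased ased-ss 'ased-ss<p' > "$list"

# Count lines before; $(( )) strips any padding some wc versions add.
before=$(($(wc -l < "$list")))

# -u drops duplicate lines; -o names the output file, and sort allows
# the output file to be the same file as the input.
sort -u -o "$list" "$list"

after=$(($(wc -l < "$list")))
echo "lines before: $before, after: $after, removed: $((before - after))"

rm -f "$list"
```

Keeping a backup copy of the real list before overwriting it would be sensible.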
scsijon

Joined: 23 May 2007
Posts: 1042
Location: the australian mallee

Posted: Fri 01 Mar 2013, 19:59

Sorry, vovchik, GustavoYz and Keef.

Yes, it does. I must have made a typo the first time I tried it.

Thanks, all.