Concatenating unknown filetypes

inherit2
Posts: 10
Joined: Mon 23 Jun 2014, 15:05

#1 Post by inherit2 »

Hi,

The cat command concatenates files.

I have three files that need to be concatenated: FILE1.rar, FILE2.rar and FILE3.rar. If there were only one file in them as the content, a movie for instance, I would use:

Code:

cat FILE* > ./one_file.avi

But what if:
1. I don't know the content of the files that need to be concatenated. I don't know whether it is a video file or anything else.
2. There are two or more files as the content of FILE1, FILE2, FILE3. For example a text file, a PDF file and a JPEG file.

thx

s243a
Posts: 2580
Joined: Tue 02 Sep 2014, 04:48

#2 Post by s243a »

Just make it a double arrow (>> appends to the output file instead of overwriting it):

Code:

cat FILE* >> ./one_file.avi

although you might want to pipe it to a sort function first.

Edit: but you'll need to do it in a loop instead of using shell patterns (i.e. *). I'll give more details later; alternatively, you could write a filter function for the pipeline.
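
For what it's worth, a loop along those lines might look like the sketch below. It is not the promised follow-up, just an illustration: the stand-in files and the name joined.bin are made up, and it assumes GNU sort (for -V) and file names without newlines. The sort matters because a plain FILE* glob would put FILE10 before FILE2.

```shell
# Work in a scratch directory with three small stand-in parts.
work=$(mktemp -d)
cd "$work"
for i in 1 2 10; do
    printf 'part%s\n' "$i" > "FILE$i.rar"
done

# Start from an empty output file, because >> appends to whatever is there.
: > joined.bin

# List the parts, sort them in natural numeric order, append one at a time.
printf '%s\n' FILE*.rar | sort -V | while IFS= read -r part; do
    cat -- "$part" >> joined.bin
done

cat joined.bin
```

Note that the output file must not match the FILE* pattern itself, otherwise it would be picked up as one of its own inputs.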

musher0
Posts: 14629
Joined: Mon 05 Jan 2009, 00:54
Location: Gatineau (Qc), Canada

Re: cat and concatenating files

#3 Post by musher0 »

inherit2 wrote:
Hi,
(...)
But what if:
1. I don't know the content of the files that need to be concatenated. I don't know whether it is a video file or anything else.
2. There are two or more files as the content of FILE1, FILE2, FILE3. For example a text file, a PDF file and a JPEG file.

thx

Hello inherit2.

Before concatenating, may I suggest that you use the "file" utility
to discover the type of the various files. The -b (brief) option, which leaves
the file name out of the output, should be enough; if you want the name shown
as well, just omit it.

E.g.

Code:

[~]>file -b spm3rc 
UTF-8 Unicode text, with very long lines

[~]>file -b Menu.zip 
Zip archive data, at least v1.0 to extract

[~]>file -b lshw-short.rpt
UTF-8 Unicode text

[/mnt/ram1/Downloads]>file desc-heral-histor-partic.pdf 
desc-heral-histor-partic.pdf: PDF document, version 1.5

[/mnt/sdc6/Films]>file Documentary\ -\ \'Meet\ The\ Coywolf\'-MhtuHXInt88.mkv
Documentary - 'Meet The Coywolf'-MhtuHXInt88.mkv: Matroska data

[/mnt/sdc6/Films]>file in_her_shoes169.avi
in_her_shoes169.avi: RIFF (little-endian) data, AVI, 1024 x 768, 25.00 fps, video: Microsoft MPEG-4 v2, audio: MPEG-1 Layer 3 (stereo, 44100 Hz)

Then you change the extension at the end of the file name
to reflect the type of file it is and clear up the confusion.

IHTH
musher0
~~~~~~~~~~
"You want it darker? We kill the flame." (L. Cohen)

6502coder
Posts: 677
Joined: Mon 23 Mar 2009, 18:07
Location: Western United States

#4 Post by 6502coder »

I don't understand your question, because what you are proposing to do makes no sense to me.

When the "cat" command concatenates files, it simply takes all the bytes of the first file, appends all the bytes of the 2nd file, appends to that all the bytes of the 3rd file, etc., to produce one big file.

Cat does not inspect each file to see what type it is, because cat does not care. It just blindly dumps all the bytes into one file. Some files, like plain text files, have no special internal structure. But many files, such as JPGs and MP3s and .doc files and so forth, DO have a specific internal structure. If you concatenate these kinds of files, all you get is a big mess that no longer has the required internal structure.
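
A tiny experiment makes the "blind byte dump" point concrete. This is only a sketch with made-up file names (first.bin, second.bin, both.bin): the output is exactly as long as the inputs together, and nothing is added between them.

```shell
# Work in a scratch directory.
work=$(mktemp -d)
cd "$work"

printf 'abc' > first.bin     # 3 bytes
printf 'de'  > second.bin    # 2 bytes
cat first.bin second.bin > both.bin

# The size of the result is the sum of the input sizes.
size_first=$(wc -c < first.bin)
size_second=$(wc -c < second.bin)
size_both=$(wc -c < both.bin)
echo "first=$((size_first)) second=$((size_second)) both=$((size_both))"

# The content is just the two byte sequences back to back.
cat both.bin; echo
```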

Suppose for example you have 2 JPG image files, a.jpg and b.jpg. Then after

# cat a.jpg b.jpg > c.jpg

the file c.jpg is NOT a proper JPG file, because JPG files have to have a specific internal structure:
https://en.wikipedia.org/wiki/JPEG_File ... nge_Format

The file c.jpg violates this structure. Moreover, to recover the two original images from c.jpg, you'd need a program that would search c.jpg looking for the header data that marks the start of the bytes that came from b.jpg file, and split c.jpg at that point.

Similarly, if you have two ZIP archives this.zip and that.zip, after

# cat this.zip that.zip > theother.zip

the file theother.zip is NOT a proper ZIP archive, because ZIP files have a specific internal structure [see Wikipedia entry for "ZIP (file format)"], and the file theother.zip violates this structure. For the same reason, I do not believe your claim that RAR archives can be concatenated to produce a valid RAR archive.

And the example you asked about, with

# cat FILE1.txt FILE2.pdf FILE3.jpg > bigmess

would simply produce a giant mess. To recover the original 3 files, you'd need a program that would first scan "bigmess" to find the header sections of the PDF and JPG, and split the file at those points.
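
To give an idea of what such a "scan and split" program involves, here is a rough sketch on tiny fake files. It assumes GNU grep (for -a, -b and -o) and uses the well-known signatures "%PDF-" for PDF and the bytes FF D8 FF for JPEG; the out* names are made up. Real files can contain these byte sequences more than once (an embedded thumbnail, for instance), so treat this as an illustration rather than a reliable tool.

```shell
# Work in a scratch directory and build a small stand-in for "bigmess".
work=$(mktemp -d)
cd "$work"
printf 'plain text\n' > FILE1.txt
printf '%%PDF-1.5 fake body\n' > FILE2.pdf      # begins with "%PDF-"
printf '\377\330\377fake jpeg' > FILE3.jpg      # begins with bytes FF D8 FF
cat FILE1.txt FILE2.pdf FILE3.jpg > bigmess

# Find the byte offset of the first occurrence of each signature.
jpeg_magic=$(printf '\377\330\377')
pdf_off=$(LC_ALL=C grep -aboF -- '%PDF-' bigmess | head -n 1 | LC_ALL=C cut -d: -f1)
jpg_off=$(LC_ALL=C grep -aboF -- "$jpeg_magic" bigmess | head -n 1 | LC_ALL=C cut -d: -f1)
echo "PDF starts at byte $pdf_off, JPEG starts at byte $jpg_off"

# Split at those offsets (tail -c +N counts from 1, hence the +1).
head -c "$pdf_off" bigmess > out1.txt
tail -c "+$((pdf_off + 1))" bigmess | head -c "$((jpg_off - pdf_off))" > out2.pdf
tail -c "+$((jpg_off + 1))" bigmess > out3.jpg

# Check that each piece is byte-identical to its original.
cmp FILE1.txt out1.txt && cmp FILE2.pdf out2.pdf && cmp FILE3.jpg out3.jpg \
    && echo "recovered all three pieces"
```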

Burn_IT
Posts: 3650
Joined: Sat 12 Aug 2006, 19:25
Location: Tamworth UK

#5 Post by Burn_IT »

I rather suspect he is hoping that it WILL do the search and conversion for him.

I usually take a quick look at unknown files with a hex viewer; the first or last block tends to make the format obvious.
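
Without a dedicated hex viewer, head and od can do the same job from the command line. The sketch below builds a fake FILE1.rar that merely starts with the RAR 1.5 to 4.x signature (52 61 72 21 1A 07 00), dumps its first bytes in hex and compares them against a few common signatures. The short list in the case statement is only an example, not a complete table.

```shell
# Work in a scratch directory with a fake stand-in file.
work=$(mktemp -d)
cd "$work"
printf 'Rar!\032\007\000payload' > FILE1.rar

# First 7 bytes as one lowercase hex string, spaces and newlines removed.
magic=$(head -c 7 FILE1.rar | od -An -tx1 | tr -d ' \n')
echo "first bytes: $magic"

# Compare against a few well-known signatures.
case $magic in
    526172211a07*) kind="RAR archive" ;;
    504b0304*)     kind="ZIP archive" ;;
    25504446*)     kind="PDF document" ;;
    ffd8ff*)       kind="JPEG image" ;;
    *)             kind="unknown" ;;
esac
echo "looks like: $kind"
```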
"Just think of it as leaving early to avoid the rush" - T Pratchett

inherit2
Posts: 10
Joined: Mon 23 Jun 2014, 15:05

#6 Post by inherit2 »

Sorry, maybe I wasn't too clear. I have three file parts: FILE1.rar, FILE2.rar and FILE3.rar. Those 3 files are the result of splitting (by some file-splitting software). I am not sure what the content of those parts is.

Now, what should I do to get at the content? I thought cat would come in handy here.

There seems to be no problem when the split parts make up a single rar file. We can simply use: cat FILE* > file.rar and after concatenating we get file.rar, which we then need to extract. But the problem is that I don't know the content.

Of course I can use cat FILE* > file and then file file - but the output says "file: RAR archive data, v14, flags: Archive volume, os: Win32" - which is not true. I know there is an avi file.

PS
With the help of ... some other system whose name starts with "w" I know that the content is a large avi file. There I just extract FILE1 and the output is: holiday.avi. But I am a Linux user and I don't use that "w" OS. :)

On Linux unraring in the same way results in an error: "Truncated RAR file data"

Flash
Official Dog Handler
Posts: 13071
Joined: Wed 04 May 2005, 16:04
Location: Arizona USA

#7 Post by Flash »

Inherit2, did you read 6502coder's post? I'd say it answers your question pretty well.

Burn_IT
Posts: 3650
Joined: Sat 12 Aug 2006, 19:25
Location: Tamworth UK

#8 Post by Burn_IT »

If RAR itself split the resulting archive because it became too big, then RAR is the tool to use to combine the parts again and extract the contents.
If I remember correctly from doing this sort of thing years ago, an attempted extract from just one of the parts will cause RAR to ask for the others.

It should not matter in the least what OS RAR ran under to create the file in the first place.

6502coder
Posts: 677
Joined: Mon 23 Mar 2009, 18:07
Location: Western United States

#9 Post by 6502coder »

Well, it seems to me that what we're dealing with is one of two possible scenarios:

1) There was a big file, or set of files, that was packed as a RAR archive. Then that RAR archive file was split into 3 pieces, FILE1.rar, FILE2.rar, FILE3.rar. Of course this is a misuse of the .rar extension, since none of the 3 pieces is in fact a proper RAR archive file, but whatever...

In this case, you would simply undo the splitting with

# cat FILE1.rar FILE2.rar FILE3.rar > FILE.rar

AND THEN unpack the resulting FILE.rar archive file. Note that

# file FILE.rar

will indeed say that FILE.rar is a RAR archive, because it IS. The "file" command does not unpack a RAR (or ZIP, or whatever) archive to determine its contents; all it sees is that FILE.rar itself is a RAR archive. The fact that FILE.rar contains an AVI file (or whatever) is completely irrelevant.

2) The alternative scenario is that an original big file was first split into 3 pieces, and then each piece was separately packed as a RAR archive. In this case, all 3 .rar files would indeed be proper RAR archive files.

In this case, you would FIRST unpack each RAR file, THEN use "cat" to recombine the pieces into one big file.
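
Scenario 1 works because plain splitting and "cat" are exact inverses at the byte level. Here is a small sketch of that round trip, with the standard split utility standing in for whatever splitting software was actually used (that part is an assumption) and a 10-byte file standing in for the archive:

```shell
# Work in a scratch directory.
work=$(mktemp -d)
cd "$work"
printf '0123456789' > original.bin

# Cut the file into 4-byte pieces: FILE.part.aa, FILE.part.ab, FILE.part.ac
split -b 4 original.bin FILE.part.
ls FILE.part.*

# Undo the splitting by concatenating the pieces in order.
cat FILE.part.aa FILE.part.ab FILE.part.ac > rebuilt.bin

# The rebuilt file is byte-for-byte the same as the original.
cmp original.bin rebuilt.bin && echo "rebuilt file is byte-identical"
```

If the pieces were produced this way, the rebuilt file is the original archive and can be unpacked normally. If they were produced by RAR's own volume splitting, each piece carries its own headers and this round trip does not apply.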

Burn_IT
Posts: 3650
Joined: Sat 12 Aug 2006, 19:25
Location: Tamworth UK

#10 Post by Burn_IT »

Then there is a third possibility: the original RAR compression produced an archive that was too big for a single output file, and RAR itself split it into 3 volumes.
