- slide 2 of 9
There are several ways to join files in Linux. Some of the options are smart and can merge only the differences; others simply join files end to end. Why would you need this? Think of the following scenarios: you shared a file and someone has sent it back with amendments, and now you need to join the two into one file. Or you started a new version of a really old file, then found the older version and want to join the two. Most of the commands discussed here are part of Coreutils (SDIFF ships with the GNU Diffutils package instead) and should therefore be available in every distribution.
Whatever you end up using after reading this article, you will be able to join the files in Linux.
- slide 3 of 9
All of the commands below have a manual page. If you are new to the Linux command line, the manual pages explain what options there are for you to use. In most cases they also include example uses. There are websites dedicated to just displaying these manual pages; however, you can access most man pages directly in any shell. Just type "man COMMAND" (without double quotes).
If you want a manual that is more complete than the man page, try this command: "info coreutils 'nl invocation'" (without double quotes). This example produces the complete manual for the NL command. For more about INFO itself, read its man page: "man info".
- slide 4 of 9
SDIFF is one of the smart commands for joining files. If you have two files that are mostly the same but have slight differences, SDIFF shows them side by side and lets you merge only the differences. SDIFF is derived from the original UNIX command DIFF. DIFF was developed in the early 1970s at AT&T Bell Labs. The final version of DIFF was released in 1974 as part of the 5th Edition of UNIX.
sdiff [OPTION] FILE1 FILE2
SDIFF can be used in so many ways that describing them all would take an article of its own.
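As a minimal sketch of what SDIFF shows, the commands below compare two small scratch files. The file names and contents are hypothetical, and the sketch assumes the sdiff from GNU Diffutils is installed. It only displays the differing lines. To actually merge, sdiff offers the -o FILE option, which works interactively and is therefore not run here.

```shell
# Hypothetical scratch files, created only for this illustration.
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$workdir/old.txt"
printf 'alpha\nBETA\ngamma\n' > "$workdir/new.txt"

# -s hides lines that are identical in both files; -w 60 limits the output width.
# sdiff exits with status 1 when the files differ, hence the "|| true".
sdiff_out=$(sdiff -s -w 60 "$workdir/old.txt" "$workdir/new.txt" || true)
printf '%s\n' "$sdiff_out"
```

The single line printed shows the old text on the left, the new text on the right, and a "|" between them to mark the line as changed.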
For more information, check out the SDIFF man page. In a terminal, type:
man sdiff
- slide 5 of 9
NL, also a UNIX command, was originally used for numbering lines. Unfortunately I couldn't find any more history on the NL command. With NL we can join two files directly: it doesn't compare anything, it just writes the files out one after the other with line numbers added. NL isn't really meant for file joining but does a decent job of it nonetheless, as long as you don't mind the numbers.
nl FILE1 FILE2 > FILE3
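A small sketch of combining two files with NL, using hypothetical scratch files made up for the illustration. It assumes the default NL settings, under which non-empty lines are numbered and the numbering carries on from the first file into the second.

```shell
# Hypothetical scratch files, created only for this illustration.
workdir=$(mktemp -d)
printf 'apple\nbanana\n' > "$workdir/first.txt"
printf 'cherry\ndate\n'  > "$workdir/second.txt"

# nl reads both files in order and numbers the lines as one document,
# writing the result to a third file so neither input is overwritten.
nl "$workdir/first.txt" "$workdir/second.txt" > "$workdir/numbered.txt"
cat "$workdir/numbered.txt"
```

Writing to a third file matters here: redirecting the output onto one of the input files would replace that file's contents.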
- slide 6 of 9
SORT is actually used for sorting the lines of text files. To accomplish this, it uses keys to sort on. By default, SORT takes the entire line as the key. SORT sorts, merges and compares files; therefore we can also use it to join two files into a third, sorted file:
sort FILE1 FILE2 > FILE3
Check the man page of SORT for all the options ("man sort"), or check the info page for a full manual: "info coreutils 'sort invocation'" (without double quotes).
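To make the behaviour concrete, here is a short sketch that joins two hypothetical scratch files with SORT. The file names and contents are assumptions made for the example. Setting LC_ALL=C is also an assumption added here, so that the ordering does not depend on the system locale.

```shell
# Hypothetical scratch files, created only for this illustration.
workdir=$(mktemp -d)
printf 'pear\napple\n' > "$workdir/first.txt"
printf 'fig\nbanana\n' > "$workdir/second.txt"

# sort reads the lines of both files and writes them out in one sorted list.
# LC_ALL=C forces plain byte-order comparison.
LC_ALL=C sort "$workdir/first.txt" "$workdir/second.txt" > "$workdir/sorted.txt"
cat "$workdir/sorted.txt"
```

The result contains every line from both inputs, so the original order of each file is lost. If that order matters, one of the other commands is a better fit.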
- slide 7 of 9
Despite popular belief, the UNIX PASTE command is meant to be used to join files. PASTE has two main options: -d, which sets the delimiter placed between the joined lines (a tab by default), and -s, which appends the data in serial instead of in parallel (each file's lines are strung together on one row, one file at a time, instead of matching lines from each file being placed side by side). Using PASTE, we can merge two files into a third file:
paste -d ',' FILE1 FILE2 > FILE3
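The difference between parallel and serial mode is easiest to see on a tiny example. The sketch below uses two hypothetical scratch files (names and contents are made up for the illustration) and runs PASTE both ways.

```shell
# Hypothetical scratch files, created only for this illustration.
workdir=$(mktemp -d)
printf 'alice\nbob\ncarol\n' > "$workdir/names.txt"
printf '30\n25\n41\n'        > "$workdir/ages.txt"

# Parallel (default): line N of each file becomes one comma-separated row.
paste -d ',' "$workdir/names.txt" "$workdir/ages.txt" > "$workdir/parallel.csv"
cat "$workdir/parallel.csv"

# Serial (-s): each input file is flattened onto a single output line.
paste -s -d ',' "$workdir/names.txt" "$workdir/ages.txt" > "$workdir/serial.csv"
cat "$workdir/serial.csv"
```

In parallel mode the output has three rows such as "alice,30", while in serial mode it has two rows, one per input file.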
- slide 8 of 9
The CAT command is meant to be used to concatenate - or join - files. The abbreviation stands for catenate, which is a synonym for concatenate. CAT is often also used to display a file's contents. If you search for definitions or uses of the CAT command, you often find the term Useless Use Of Cat (UUOC). This phrase was made popular by the comp.unix.shell group on Usenet. The phrase was coined because users of that newsgroup were of the opinion that using CAT without concatenating was useless. This is also referred to as CAT abuse. According to the group, using CAT in this way wastes time and an extra process. However, you still see a lot of Linux tutorials (on several subjects) use it in this manner. You could therefore conclude that this use of CAT is now widely accepted.
Here is the example:
cat FILE1 >> FILE2
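As a brief sketch, the two common ways of joining files with CAT are shown below on hypothetical scratch files (the names and contents are assumptions for the illustration). One writes a new third file and the other appends to an existing file with ">>".

```shell
# Hypothetical scratch files, created only for this illustration.
workdir=$(mktemp -d)
printf 'line from first\n'  > "$workdir/first.txt"
printf 'line from second\n' > "$workdir/second.txt"

# Concatenate into a new third file; neither input is changed.
cat "$workdir/first.txt" "$workdir/second.txt" > "$workdir/combined.txt"

# Append first.txt to the end of second.txt; second.txt itself is modified.
cat "$workdir/first.txt" >> "$workdir/second.txt"

cat "$workdir/combined.txt"
cat "$workdir/second.txt"
```

Note the double ">>": a single ">" would overwrite the target file instead of appending to it.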
- slide 9 of 9
The JOIN command is similar to the join you might know from SQL. It works much the same as a join in relational databases: lines from the two files are combined when they share the same value in a join field, and both files need to be sorted on that field. In addition to other options, you can use -t to set the field delimiter. This is handy if you are joining .csv files, for instance. If you know the format of the files, you can also select the field to join on with the -1 FIELD and -2 FIELD options. It's great if you need to quickly join two files on a specific field rather than on the default first field. Here is the general form:
join [OPTION] FILE1 FILE2
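A minimal sketch of JOIN on two comma-separated files follows. The file names, contents and the choice of the first field as the key are all assumptions made for the illustration, and both files are already sorted on that key.

```shell
# Hypothetical scratch files, created only for this illustration.
workdir=$(mktemp -d)
printf '1,alice\n2,bob\n3,carol\n'    > "$workdir/names.csv"
printf '1,london\n3,paris\n4,tokyo\n' > "$workdir/cities.csv"

# -t ',' sets the field separator; -1 1 and -2 1 pick the first field
# of each file as the join field. Inputs must be sorted on that field.
join -t ',' -1 1 -2 1 "$workdir/names.csv" "$workdir/cities.csv" > "$workdir/joined.csv"
cat "$workdir/joined.csv"
```

Only keys present in both files appear in the output (here 1 and 3), much like an inner join in SQL.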
JOIN has more options, so again check the man page ("man join") or the info page by using the command below:
info coreutils 'join invocation'