Learn How To Join Files In Linux


Copyright of image used: geekz.co.uk

Introduction

There are several ways to join files in Linux. Some of the options are very smart and can merge only the differences. Others are just straight file joiners. Why would you need this? Think of the following scenarios: you shared a file and someone has sent it back to you with amendments, and now you need to join the two into one file. Or you have a really old file and started on a new version, and now you have found the older version and want to join the two. Most of the commands discussed here are part of the Coreutils and should therefore be available in every distribution.

Whatever you end up using after reading this article, you will be able to join the files in Linux.

Man Pages

All of the commands below have a manual page. If you are new to the Linux command line, the manual pages explain what options there are for you to use. In most cases they also include example uses. There are websites dedicated to just displaying these manual pages; however, you can access most man pages directly in any shell. Just type "man COMMAND" (without double quotes).

If you want a manual that is more complete than just the man page, try this command: "info coreutils 'nl invocation'" (without double quotes). This example produces the complete manual for the NL command. To learn more about INFO itself, read its man page: "man info".

SDIFF

SDIFF example in terminal

SDIFF is one of the smart commands to join a file. If you have two files that are largely the same but have slight differences, SDIFF shows them side by side and can merge only the differences. SDIFF is a derivative of the original UNIX command DIFF. DIFF was developed in the early 1970s at AT&T Bell Labs. The final version of DIFF was released in 1974 as part of the 5th edition of UNIX.

Example:

sdiff [OPTION] FILE1 FILE2

SDIFF can be used in so many ways that describing everything you can do with it would take up an article of its own.

For more information check out the SDIFF man page:

In a terminal type:

man sdiff
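
As a rough sketch of what SDIFF output looks like, the commands below compare two small files. The file names and contents are made up for illustration, and the sketch assumes the GNU version of sdiff (from the diffutils package).

```shell
# Work in a throwaway directory; old.txt and new.txt are hypothetical files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'apple\nbanana\ncherry\n' > old.txt
printf 'apple\nblueberry\ncherry\n' > new.txt

# Show both files side by side; a line that differs is marked with "|".
# sdiff exits with status 1 when the files differ, so "|| true" keeps
# a script running under "set -e" from stopping here.
sdiff old.txt new.txt || true

# -s (--suppress-common-lines) prints only the lines that differ.
sdiff -s old.txt new.txt || true

# To merge interactively into a third file you would run (not run here,
# because it prompts for input on every difference):
#   sdiff -o merged.txt old.txt new.txt
```

With -o, SDIFF asks at each difference which side to keep, which is the "merge only the differences" behaviour described above.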

NL

NL example in terminal

NL, also a UNIX command, was originally used for numbering lines. Unfortunately I couldn't find any more history on the NL command. With NL we can just straight join two files: it doesn't compare anything, it simply writes the files out one after the other and puts a line number in front of each non-empty line. NL isn't really meant for file joining, but it does a decent job of it nonetheless, as long as you don't mind the line numbers in the result.

Example:

nl FILE1 FILE2 > FILE3
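
A minimal sketch of joining with NL, assuming two hypothetical files part1.txt and part2.txt and the GNU version of nl: both files are passed as arguments and the numbered result goes to a third file, so neither input is overwritten.

```shell
# Work in a throwaway directory; the file names are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'first\nsecond\n' > part1.txt
printf 'third\nfourth\n' > part2.txt

# nl reads the files in order and numbers the lines continuously,
# so part2.txt carries on at line 3 rather than restarting at 1.
nl part1.txt part2.txt > numbered.txt

cat numbered.txt
```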

SORT

SORT example in terminal

SORT is actually used for sorting the lines of text files. To accomplish this, it uses keys to sort on. By default, SORT takes the entire line as the key. SORT sorts, merges and compares files; therefore we can also use it to join two files. Keep in mind that the result is sorted, so the original order of the lines is lost:

sort FILE1 FILE2 > FILE3

Check the man page of SORT for all the options: "man sort" (without double quotes). Or check the info page for a full manual: "info coreutils 'sort invocation'" (without double quotes).
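
Here is a small sketch of joining with SORT, using two hypothetical files list1.txt and list2.txt. It also shows the -u option, which drops duplicate lines from the combined result.

```shell
# Work in a throwaway directory; the file names are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'pear\napple\n' > list1.txt
printf 'mango\napple\n' > list2.txt

# Read both files, sort all lines together, write the result to a third file.
sort list1.txt list2.txt > combined.txt

# Same, but -u keeps only one copy of lines that appear more than once.
sort -u list1.txt list2.txt > unique.txt

cat combined.txt
cat unique.txt
```

If both inputs are already sorted, "sort -m" merges them without sorting everything again.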

PASTE

PASTE example in terminal

Despite popular belief, the UNIX PASTE command is meant to be used to join files. PASTE has two main options: -d, which sets the delimiter placed between the joined lines (a tab by default), and -s, which appends the data in serial instead of in parallel (read horizontal instead of vertical). Using PASTE, we can merge two files line by line into a third file:

paste -d ',' FILE1 FILE2 > FILE3
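
To make the difference between parallel and serial pasting concrete, here is a short sketch with two hypothetical files, names.txt and ages.txt.

```shell
# Work in a throwaway directory; the file names are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'alice\nbob\n' > names.txt
printf '30\n25\n' > ages.txt

# Parallel (default): line N of each file ends up on output line N.
paste -d ',' names.txt ages.txt > parallel.csv

# Serial (-s): all lines of one file are joined into a single output line,
# then the next file follows on the next line.
paste -s -d ',' names.txt ages.txt > serial.csv

cat parallel.csv
cat serial.csv
```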

CAT

CAT example in terminal

The CAT command is meant to be used to concatenate, or join, files. The abbreviation stands for catenate, which is a synonym for concatenate. CAT is often also used to display a file's contents. If you search for definitions or uses of the CAT command, you often find the term Useless Use Of Cat (UUOC). This phrase was made popular by the comp.unix.shell group on Usenet. It was coined because users of that newsgroup were of the opinion that using CAT without concatenating was useless. This is also referred to as CAT abuse. According to the group, using CAT in this way wastes time and an extra process. However, you still see a lot of Linux tutorials (on several subjects) use it in this manner. You could therefore conclude that this use of CAT is now widely accepted.

Here is an example, which appends the contents of FILE1 to the end of FILE2:

cat FILE1 >> FILE2
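
The sketch below, using hypothetical files a.txt and b.txt, shows the two common ways of joining with CAT: writing the result to a new file with ">" and appending to an existing file with ">>".

```shell
# Work in a throwaway directory; the file names are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'header\n' > a.txt
printf 'body\n' > b.txt

# Join both files into a new third file; a.txt and b.txt stay untouched.
cat a.txt b.txt > c.txt

# Append a.txt to the end of b.txt; this changes b.txt in place.
cat a.txt >> b.txt

cat c.txt
cat b.txt
```

Never use one of the input files as the target of ">" (for example "cat a.txt b.txt > a.txt"), because the shell empties that file before CAT gets to read it.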

JOIN

JOIN Example in terminal

The JOIN command is similar to the join you might know from SQL. It works much the same as a join in relational databases: lines from the two files are paired up when they share the same value in a join field, which is the first field by default. Both files need to be sorted on that field. In addition to other options, you can use -t to specify the field delimiter. This is handy if you are joining .csv files, for instance. If you know the format of the files, you can also select which field to join on by using the -1 FIELD and -2 FIELD options, for the first and second file respectively. This is great when you need to quickly match two files on one particular field. Here is a simple example:

join [OPTION] FILE1 FILE2

JOIN has more options as well, so again check the man or info page by using the commands below:

man join

info coreutils 'join invocation'
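
As an illustration of -t, -1 and -2, here is a small sketch with made-up .csv files. The file names and contents are hypothetical, and every file is already sorted on the field it is joined on.

```shell
# Work in a throwaway directory; the file names are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '1,alice\n2,bob\n3,carol\n' > names.csv
printf '1,london\n3,paris\n' > cities.csv

# Join on the first field of both files (the default), using "," as delimiter.
# Line 2 of names.csv has no partner in cities.csv, so it is left out.
join -t ',' names.csv cities.csv > joined.csv

# Here the key is the second field in the first file, so tell join with -1 2.
printf 'alice,1\ncarol,3\n' > by_name.csv
join -t ',' -1 2 -2 1 by_name.csv cities.csv > joined_by_field.csv

cat joined.csv
cat joined_by_field.csv
```

In both cases JOIN prints the join field first, followed by the remaining fields of the first file and then those of the second.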