How to use PostgreSQL COPY and \COPY command to improve database operation


Use PostgreSQL COPY and \COPY command to improve database operation

PostgreSQL has very useful COPY command and \COPY command to input or output data between database and file. We can save our time by utilizing the commands effectively.

Of course, if we need to add many records into database, you create a lot of SQL INSERT statements, but COPY and \COPY command is very convenient and performance is very good.


Understanding difference between COPY command and \COPY command

PostgreSQL provides two COPY commands. One is "COPY" and the other is "\COPY". Before start using COPY command effectively, we need to understand the difference.

COPY command is to input and output data between database and file ONLY in database server. If we connect to database remotely (E.g. from other computer), we cannot use "COPY" command from remote computer. In other words, input file or output file have to be in database server.

On the other hand, "\COPY" is to input or output file in client computer. If we connect to database server remotely by utilizing psql command, we can input or put file in client computer.


Prepare test data

In order to use COPY and \COPY command, let's create a CSV file and a database table. CSV file contains id column and name column, also it has file header line (id,name). We will use COPY and \COPY command by using the CSV file and database table.
$ cat colors.csv
id,name
1,red
2,blue
3,yellow
4,pink
5,green


=> \d colors
    Table "public.colors"
+--------+---------+-----------+
| Column |  Type   | Modifiers |
+--------+---------+-----------+
| id     | integer |           |
| name   | text    |           |
+--------+---------+-----------+


Load CSV file into table

In order to load CSV file into the colors table, we use psql command, so we need to use \COPY command. Also, we can use COPY command and \COPY command as follows.

COPY table_name FROM file_name   (Load file into table)
COPY table_name TO file_name     (Write table data to file)

\COPY table_name FROM file_name  (Load file into table)
\COPY table_name TO file_name    (Write table data to file)

Now, we load CSV file into table. Since the file data is separated by comma (,), we need to tell \COPY command we use CSV format. Also, the file contains header line (id,name), we need to skip the first line by using "HEADER" option. We are going to load file data into table, "\COPY table_name FROM file_name" we need to use as follows.
=> \COPY colors FROM 'colors.csv' CSV HEADER

=> SELECT * FROM colors;
+----+--------+
| id |  name  |
+----+--------+
|  1 | red    |
|  2 | blue   |
|  3 | yellow |
|  4 | pink   |
|  5 | green  |
+----+--------+
(5 rows)


Write table data to file

In order to write table data to file, we need to use "\COPY table_name TO file_name" command. Also, the file data needs to be separated by comma (,), we need to tell \COPY command we use CSV format. We also need to write header line (id,name), we add "HEADER" option to \COPY command.

=> \COPY colors TO 'test.csv' CSV HEADER

$ cat test.csv
id,name
1,red
2,blue
3,yellow
4,pink
5,green


Load and write tab separated data between table and file

In real world, we sometimes need to use tab separated (\t) data. Let's see how to use \COPY command to make tab separated file.

The difference between comma separated file and tab separated file is that we need to add "DELIMITER E'\t'" option to tell \COPY command to use tab (\t) character. E'\t' means that PostgreSQL recognizes "\" and "t" are different characters, so PostgreSQL thinks there are two characters, but DELIMITER options takes only one character (1 byte). we need to add "E" at the beginning of the delimiter, so that PostgreSQL thinks \ is a special character. As a result of that, \COPY command can understand the delimiter (separator) is tab character (\t).

Let's load tab separated file into table.

$ cat colors.tsv
id      name
1       red
2       blue
3       yellow
4       pink
5       green

=> DELETE FROM colors;

=> \COPY colors FROM 'colors.tsv' CSV DELIMITER E'\t' HEADER

=> SELECT * FROM colors;
+----+--------+
| id |  name  |
+----+--------+
|  1 | red    |
|  2 | blue   |
|  3 | yellow |
|  4 | pink   |
|  5 | green  |
+----+--------+
(5 rows)


Then, let's write table data into tab separated file
=> \COPY colors TO 'test.tsv' CSV DELIMITER E'\t' HEADER

$ cat test.tsv
id      name
1       red
2       blue
3       yellow
4       pink
5       green


Adding double-quotes into each column value

If we have comma or tab character in column value in file or table, we don't know from where to where value is continued. In that case, all column values need to be quoted by double-quotes (""), so that EXCEL file can separate values properly.

In order to test, we made colors2.txt file as follows. This data contains comma (,) in column value. Also, all column values are quoted by double-quotes.

$ cat colors2.csv
id,name
"1","red"
"2","bl,ue"
"3","yel,low"
"4","pi,nk"
"5","gr,een"

=> DELETE FROM colors;

=> \COPY colors FROM 'colors2.csv' CSV HEADER QUOTE '"'

=> SELECT * FROM colors;
+----+---------+
| id |  name   |
+----+---------+
|  1 | red     |
|  2 | bl,ue   |
|  3 | yel,low |
|  4 | pi,nk   |
|  5 | gr,een  |
+----+---------+
(5 rows)


=> \COPY colors TO 'foobar.csv' CSV HEADER QUOTE '"'

$ cat foobar.csv
id,name
1,red
2,"bl,ue"
3,"yel,low"
4,"pi,nk"
5,"gr,een"


If we add only "QUOTE" option without '"', only some values are quoted by double-quotes. If we need to double-quote all column values, we need to add FORCE option with asterisk (*), so that all column values are wrapped by double-quotes.
=> \COPY colors TO 'foobar.csv' CSV FORCE QUOTE *

$ cat foobar.csv
id,name
"1","red"
"2","bl,ue"
"3","yel,low"
"4","pi,nk"
"5","gr,een"


If we need to add double-quotes to a specific columns, we need to tell which columns need to be quoted with FORCE QUOTE option followed by comma separated column names as follows.
=> \COPY colors TO 'foobar.csv' CSV HEADER FORCE QUOTE name

$ cat foobar.csv
id,name
1,"red"
2,"bl,ue"
3,"yel,low"
4,"pi,nk"
5,"gr,een"


=> \COPY colors TO 'foobar.csv' CSV HEADER FORCE QUOTE id,name

$ cat foobar.csv
id,name
"1","red"
"2","bl,ue"
"3","yel,low"
"4","pi,nk"
"5","gr,een"

If we have double-quote character in data, we need to use single-quote character (') to wrap column value as follows.
=> \COPY colors TO 'xyz.csv' CSV HEADER QUOTE ''''

$ cat xyz.csv
id,name
1,red
2,'bl,ue'
3,'yel,low'
4,'pi,nk'
5,'gr,een'


Write table data to console (standard output)

\COPY command can write table data to console (standard output). We can use "stdout" output file, so that we can display table data on console (standard output) as follows.

=> \COPY colors TO stdout
1       red
2       bl,ue
3       yel,low
4       pi,nk
5       gr,een

=> \COPY colors TO stdout CSV HEADER FORCE QUOTE *
id,name
"1","red"
"2","bl,ue"
"3","yel,low"
"4","pi,nk"
"5","gr,een"


Read data from console (standard input) and put it into table

\COPY command also can read data from console (standard input) and put it into table. We can use "stdin" as input file.

When we use \COPY command with "stdin", we are asked to type data line by line by hitting ENTER key. Let's type line by line. First line is header line (id,name). In order to tell data is ended, we need to type backslash and dot (\.) at the end of the lines, so that \COPY command knows input data is ended.

=> \COPY colors FROM stdin CSV HEADER
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> id,name
>> 1,red
>> 2,blue
>> 3,yellow
>> 4,pink
>> 5,green
>> \.

=> SELECT * FROM colors;
+----+--------+
| id |  name  |
+----+--------+
|  1 | red    |
|  2 | blue   |
|  3 | yellow |
|  4 | pink   |
|  5 | green  |
+----+--------+
(5 rows)


How to use SQL in \COPY command

We only manipulated data by indicating table_name and file_name or stdin/stdout, but we can also use SQL statement in \COPY command.

We need to use "\COPY (SQL statement) TO [file_name or stdout]" format. SQL statement needs to be quoted by "()" as follows.

Let's run SQL statement in \COPY command. We display results on console as follows.
=> \COPY (SELECT id, name FROM colors WHERE name IN ('red','blue')) TO stdout CSV HEADER
id,name
1,red
2,blue

As long as I tried, we need to provide SQL statement in one line.


How to use Linux command in \COPY

\COPY command has "PROGRAM" options which has ability to run Linux command. By using the option, we can run Linux command. This is very powerful!

In the following example, we run gzip command (file compression command) in \COPY command to compress data. As a result of that, compressed file is created and it contains comma separated table data.
=> \COPY colors TO PROGRAM 'gzip > /tmp/colors.gz' CSV HEADER

$ zcat /tmp/colors.gz
id,name
1,red
2,blue
3,yellow
4,pink
5,green


How to use COPY command in PL/PgSQL function

In this section, we use COPY command (Not \COPY) in PostgreSQL database function (PL/PgSQL).

Since database function runs in PostgreSQL database server, we need to use COPY command instead of \COPY command. \COPY command should be used in psql command-line tool and can use from remote computer, but COPY command is for reading/writing data in PostgreSQL database server. Therefore, we need to use COPY command as PL/PgSQL function runs in PostgreSQL database server.

Let's create the following simple PL/PgSQL function as follows. The function simply write table data to file by using COPY command. COPY command reads colors table and write all data to /tmp/colors.csv file. Finally, the function returns true if we have no issues.
CREATE OR REPLACE FUNCTION generate_csv()
  RETURNS boolean AS
$BODY$
DECLARE
  var_sql text;
BEGIN
  RAISE NOTICE 'Start generating CSV file.';
  var_sql := 'COPY colors TO ''/tmp/colors.csv'' CSV HEADER ;';
  RAISE NOTICE '%', var_sql;
  EXECUTE var_sql;
  RAISE NOTICE 'Generated CSV file.';

  RETURN true;
END;
$BODY$
LANGUAGE plpgsql VOLATILE SECURITY DEFINER;


Please note that we have to create the function as "postgres" super user. Only super user is able to read/write data in file system, so please make sure the function is created and owned by postgres user. As we can see, we successfully write data to CSV file in PL/PgSQL function as follows.
$ psql -U postgres

=# SELECT * FROM generate_csv();
NOTICE:  Start generating CSV file.
NOTICE:  COPY colors TO '/tmp/colors.csv' CSV HEADER ;
NOTICE:  Generated CSV file.
+--------------+
| generate_csv |
+--------------+
| t            |
+--------------+
(1 row)

=# \q

$ cat /tmp/colors.csv 
id,name
1,red
2,blue
3,yellow
4,pink
5,green