You have a very big CSV file, and you want to split it into several files of n lines. Smaller files let you open them with CSV editors without reaching their limits.
If you are not afraid of using the terminal, macOS and Linux come with a simple utility to split files into parts: split. See full documentation on split man page.
split [-l line_count] [-b byte_count] [-n chunk_count] [file [prefix]]
The command split
breaks up files using:
- lines - useful for text-based files
- bytes - for binary files
- number - if you want to split a file into a specific number of parts.
CSV is a format to store structured data using text files. In a CSV file, each line is a data record. To split a CSV file, we'll use split
with the lines
parameter.
split -l 100000 filename.csv
Split
generates files with basic names beginning with x: xaa
, xab
, xac
, etc. See the next section to split your CSV files and rename the generated files.
Using split and renaming generated files
Split
has two interesting parameters. The first is -d to use numeric suffixes (1, 2, 3, etc.) instead of letters. The second is the prefix parameter for the generated files. To define the prefix, just add a string after your CSV filename.
split -l 100000 -d filename.csv file_
Then, loop on all files starting with file_
and rename them to add .csv
.
for i in $(find file_*); do mv $i "$i.csv"; done
Adding CSV header to generated files
You have several generated files like: file_00.csv
, file_01.csv
, etc.
But the CSV header line is on the first file (file_00.csv
) and all other files don't have a CSV header line.
To copy the file_00.csv first line on all other files, run the following command:
for i in $(find . -type f -name "file_*.csv" -not -name "file_00.csv");
do echo -e "$(head -1 file_00.csv)\n$(cat $i)" > $i;
done
What it does is simple:
Step1
Loop on all file that begin with file_ but are not file_00.csv:
find . -type f -name "file_*.csv" -not -name "file_00.csv"
Step2
For each file in the loop, copy the first line of file_00.csv
to it. There is no easy way to prepend a line to a file using the command line. We just use echo
with the first line from file_00.csv and the file content from the current file cat $i
.
To Sum up the whole process
Step 1: Split the CSV file into parts of 100000 lines and prefix the generated files with "file_".
split -l 100000 -d filename.csv file_
Step 2: Add .csv
to all generated files.
for i in $(find file_*); do mv $i "$i.csv"; done
Step 3: Copy header from first generated file at the beginning of the other files.
for i in $(find . -type f -name "file_*.csv" -not -name "file_00.csv");
do echo -e "$(head -1 file_00.csv)\n$(cat $i)" > $i;
done