Few CSV editors can manage large CSV files. Spreadsheet tools like Excel and Google Sheet break before reaching the million of lines. And dedicated tools target data scientists so they are expensive and complex to use.
A free alternative is to use your terminal and manage your big CSV file using the command line.
A CSV file is a text file, with each line being a data record. And each record is made of fields separated by commas (or sometimes by semicolons ";" or tabular keys). Because a CSV file is a text file and each line is a record, you can use Linux/Unix commands for manipulating texts.
The command line might seem scary but they are also very powerful. I'm listed here the command I use to search in big CSV files.
Note: All the commands listed bellow are just reading your files. They don't overwrite or delete your CSV files.
A summary of the commands listed on this page:
- Searching in a CSV file for a single keyword
- Searching in a CSV file for multiple keywords
- How to count the number of matching rows in a big CSV file
Before moving to the commands, open your Terminal. On macOs, you can use Spotlight with keyboard shortcut ⌘ + Space and type Terminal. Then hit "Enter"/"Return" to open the Terminal.
Searching in a big CSV file for a single keyword
To perform a simple search and list all rows that match a specific keyword, use grep.
grep "keyword" filepath.csv
To be case insensitive, add the -i option.
grep -i "keyword" filepath.csv
In my example, I search in a companies dataset the lines matching the keyword airtable.com. I get the two matching rows. One for the company with the website airtable.com (the good one), and another for chinaairtable.com (unexpected).
Search for whole keyword
To avoid unexpected results like that, you can switch to Regular Expression (RegEx) and search for the whole keyword.
By adding the -E option to grep, I tell grep to interpret the pattern as a regular expression. And I add \b before and after my keyword. \b is a RegEx operator that means word boundary.
grep -E -i "\bkeyword\b" filepath.csv
Search terms starting with a keyword
By using the same RegEx operator \b at the beginning of your pattern, you are searching all the lines with a word starting with your keyword.
grep -E -i "\bkeyword" filepath.csv
Search terms ending with a keyword
And by using it at the end of your pattern, you are searching the lines with a word ending with your keyword.
grep -E -i "keyword\b" filepath.csv
Searching in a big CSV file for multiple keywords
When searching for multiple keywords in a CSV file, you can search for one of the keywords or for all the keywords.
Searching for one of several keywords in a big CSV file
Grep with a RegEx is once again the perfect choice. In a RegEx, the symbol | means OR.
grep -i -E "keyword1|keyword2" filepath.csv
And to search for whole keywords, we can add \b for each keyword.
grep -i -E "\bkeyword1\b|\bkeyword2\b" filepath.csv
Searching for all the keywords in a big CSV file
To search for lines in your CSV file that have all the requested keywords, you can use another command: awk. Awk is another command tool to deal with text files. It has more advanced features.
Awk also works with RegEx and interprets text enclosed in slashes (/) as regular expressions.
awk '/keyword1/ && /keyword2/' filepath.csv
Awk processes the file line by line. To be case insensitive, you can convert the line to lower text before searching your keywords:
awk '{lc=tolower($0)}; (lc ~ /keyword1/) && (lc ~ /keyword2/)' filepath.csv
Here, we take the current line $0 and create a variable lc containing the current line in lower case. Then we check for keywords in it.
In the screenshot below, I search in my CSV for a line with the keywords france AND google.
How to count the number of matching rows?
Several commands can be combined. An important command is wc for word count. It does a great job at counting words or lines in a text file.
Use the -l option to count lines in a CSV file or after another command.
For example, to count how many lines match a keyword, combine it with the single keyword search.
grep -i "keyword" filepath.csv | wc -l