Find and Replace is a powerful tool to clean data. When you need to replace a string in your data or to normalize values, Find and Replace is a perfect choice. And with the advanced options, it becomes a tool to remove strings from your texts.
Datablist offers a fast and powerful Find And Replace feature. Use it on any dataset size, from a few items to hundreds of thousands.
Find and Replace workflow
Step 1: Define the pattern and the options
From the Datablist app, open the Find and Replace tool from the "Clean" menu.
The first step is to define the pattern to be looked for. And the property to search.
The "Find" pattern is mandatory. All other inputs are optional. With an empty "Replace with" string, the pattern will be removed from your data.
Select if the search must be performed on a specific property or on all your data.
To learn more about the options, check the Find and Replace Options section.
Step 2: Run the "find" phase
Datablist "Find and Replace" uses a two phases workflow. It first searches for all the occurrences of your entered pattern. And then, after your confirmation, it replaces them.
Click on the "Find" button to run the first phase.
On the search summary, you will find:
- How many occurrences of the pattern have been found
- How many distinct items have the pattern in their property(ies). If the pattern is found twice in a text, it will count for 1 item.
- The occurrences number per property.
- And a preview of the replacement process on a sample of 10 items.
Step 3: Replace matched terms
At any moment, you can change the search options. Change the search pattern or the options until you get the requested preview results.
Then, run the replace phase by clicking "Replace All".
The replacement process is fast! On hundreds of thousands of items, it will take seconds or a few minutes. The more items to edit, the longer.
The items will be updated directly in the data table. Close the drawer or re-open the "Options" section to start a new Find and Replace process.
Find and Replace Options
Match case
By default, searching for "John" will return items that have "john", "John", or "JOHN" in their data. To make your search case-sensitive, and only replace items with "John", check the checkbox "Match case".
Match using regular expression
When using regular expressions, all options except "Match case" are disabled. Please check the Find and Replace using Regular Expression to see examples.
Whole words
Searching for "and" will match "Land", "and", "Andor", etc. To only search for "and", check the "whole words" checkbox. This option can also be used with brackets or parenthesis. Search for (s)
with "Whole words" checked to search and replace all (s)
occurrences but not the ones belonging to words (like street(s)
).
Ignore Accents
Characters with accents are different from their non-accented ancestor. é
is a different character than e
, or ë
. Datablist Find and Replace tool has the option to ignore the differences. Any accented or non-accented character will match any accented or non-accented character.
For example:
- Search for "Ile de Ré" will match "Ile de Re", "Ile de Ré".
- Search for "Ile de Re" will also match "Ile de Re", "Ile de Ré".
Find and Replace using Regular Expression
Regular Expression Cheat Sheet
| ^ | Start of string
| $ | End of string
| \b | Word boundary
| \s | White space (tab, space, new line, etc.)
| \w | Word characters - A word character is a character a-z, A-Z, 0-9, including _ (underscore)
| \d | Matches digits from 0 to 9
| + | n+ - matches any string that contains at least one n
| * | n* - quantifier matches any string that contains zero or more occurrences of n
| ? | n? quantifier matches any string that contains zero or one occurrences of n
Remove url scheme
^(http|https|ftp):?//(www)?(\.)?
And replace with an empty string.
Remove letter at the end
Define the letter you want to remove and add $
to indicate the letter must be at the end.
Here is an example to remove the /
from url.
/$
And replace with an empty string.
Get domain from an email address
^(\w|\.|-|\+)*@
And replace with an empty string.
Remove people name titles
^\s*(mr|mrs|dr|miss|ms|sir|madam|m).?\s
And replace with an empty string.
Remove Company name suffix
,?\s(llc|inc|incorporated|corporation|corp|co|gmbh|ltd).?$
And replace with an empty string.
Keep only the first word of a sentence (or keep the first name from full name)
^(?<first>(\w)*)(.*)
And replace with
$<first>
Remove the first word of a sentence (or keep the last name from full name)
^(?<first>(\w)+)(\s)+(?<last>.+)
And replace with
$<last>
Remove the alias component of email addresses
^(?<first>(\w|-|~)*)\+(\w)*(?<last>(@.*))
And replace with
$<first>$<last>
Check our full guide on email list cleaning.
Remove query params from URLs
To remove query params (?path=string
in a URL such as https://www.google.com?path=string
) from a list of URLS, use the following RegEx.
\?.*$
And replace with an empty string
FAQ
Why some properties are not available to search in?
Find and Replace is available only for Text and Number properties.