Cold outreach and lead generation have become much harder over the last few years, and there is no sign of that changing. The main reason is the rise and wide adoption of advanced AI personalization since 2023, and it will only get more difficult with time. However, several proven techniques still generate leads consistently by following core cold outreach principles, even without the latest AI personalization.
One of them is to always clean your company names.
New LLM tools like ChatGPT simplify company name cleaning. However, processing thousands of names can be expensive. A cost-effective alternative is using ChatGPT to generate a script, which you can run for free on hundreds of thousands of names.
In this article, I'll cover two ways of using AI to clean company names and how they compare:
- How to clean company names using Generative AI (ChatGPT)
- How to clean company names using AI-Generated JavaScript code
- The differences between the two methods and when to use each
Method 1: Clean Company Names with Generative AI
The first method leverages Generative AI (ChatGPT) to clean company names automatically. This approach is ideal for lists under 20,000 records and offers high accuracy, even for international names.
Datablist provides a dedicated Company Name Cleaner template that removes legal forms like LLC, Inc., GmbH, SAS, and more. It uses OpenAI’s API to clean and standardize names in bulk.
Let’s go step by step on how to use this method.
First, import your company names in a CSV or Excel file.
My file contains company names only for demonstration purposes, but you can upload a regular file with multiple properties/columns, and it will work in the same way.
Then select the “Enrich” option.
Then select the Enrichment Templates.
Choose the “Company Name Cleaner”.
Then, edit the prompt and select the column that contains the company names using {{Name}} or /Name.
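To make the placeholder idea concrete, here is a minimal sketch of how a {{Name}} placeholder could be filled in for each row before a prompt is sent. The `fillTemplate` helper, the sample prompt text, and the sample row are hypothetical illustrations. This is not Datablist's actual implementation.

```javascript
// Hypothetical helper: replaces {{Property}} placeholders with values from a row.
// It only illustrates the idea; it is not Datablist's internal code.
function fillTemplate(template, row) {
  return template.replace(/\{\{\s*([^{}]+?)\s*\}\}/g, (match, key) =>
    // Leave the placeholder untouched when the row has no such property.
    Object.prototype.hasOwnProperty.call(row, key) ? String(row[key]) : match
  );
}

// Assumed example prompt and row, for illustration only.
const promptTemplate = "Remove the legal form from this company name: {{Name}}";
const row = { Name: "Acme Robotics GmbH" };

console.log(fillTemplate(promptTemplate, row));
```

Leaving unknown placeholders in place, rather than replacing them with an empty string, makes a misspelled property name easy to spot in the resulting prompt.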
Go to the next step to configure your outputs.
You can either create a new property for your outputs or map it to an existing property in your collection.
In my case, I created a new one.
Datablist also automatically creates a "Run Status" property. This helps you track which names have been processed and the cost of each run.
Now you can configure your run settings by choosing between these options:
- Run it in Async (in the cloud)
- Test on the first 10 items
- Run only on the first 10 items, first 100 items, or configure how many items you want to clean.
I've configured my run settings, and now I can run my enrichment.
This is the last step before we move on to the second method.
I chose to run the enrichment in Async on the first 100 items.
After 30 seconds, I received the cleaned company names.
Here is a video of the Company Name cleaning process
Method 2: Clean Company Names with AI-Generated JavaScript Code
If you need to clean large lists of company names without spending on OpenAI credits, this method is for you. Instead of using AI directly for each name, we use it once to generate a JavaScript script that does the cleaning.
This approach is perfect for bulk processing and ensures consistency across structured lists. Once the script is generated, you can run it for free on hundreds of thousands of company names.
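To give a sense of what such a script does, here is a minimal hand-written sketch. It is not the script that Datablist or ChatGPT would actually generate. It strips a short, assumed list of legal forms, and only when they appear at the end of a name. The `LEGAL_FORMS` list and the `cleanCompanyName` and `escapeRegExp` functions are illustrative names.

```javascript
// Short, assumed sample of legal forms; a real list would be much longer.
const LEGAL_FORMS = [
  "GmbH & Co. KG", "Pty Ltd", "GmbH", "SARL", "SAS",
  "LLC", "Inc", "Corp", "Ltd", "PLC", "BV", "AG", "SA",
];

// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Try longer forms first so multi-word forms are matched as a whole.
const alternatives = [...LEGAL_FORMS]
  .sort((a, b) => b.length - a.length)
  .map(escapeRegExp)
  .join("|");

// Match a legal form at the end of the name, after a space or comma,
// with an optional trailing period (e.g. "Inc."). Case-insensitive.
const SUFFIX_PATTERN = new RegExp(`[\\s,]+(?:${alternatives})\\.?\\s*$`, "i");

function cleanCompanyName(name) {
  const trimmed = String(name).trim();
  // Remove the trailing legal form, then any separator left at the end.
  const cleaned = trimmed.replace(SUFFIX_PATTERN, "").replace(/[\s,]+$/, "");
  // Fall back to the original name if everything was stripped.
  return cleaned.length > 0 ? cleaned : trimmed;
}

console.log(cleanCompanyName("Acme Robotics, Inc."));
console.log(cleanCompanyName("Müller Logistik GmbH & Co. KG"));
console.log(cleanCompanyName("SAS Institute"));
```

Because the pattern is anchored to the end of the name, a legal form that appears at the start, as in "SAS Institute", is left alone. The case-insensitive match is a trade-off: it catches "acme inc" but could also strip a short word that happens to look like a legal form.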
Here’s how to do it step by step.
First, import your company names in Datablist. Use a CSV or Excel file.
My file contains company names only for this demonstration, but you can upload a file with multiple properties/columns. Your file can contain hundreds of thousands of lines. Datablist is good for opening large CSV files.
Select the "Edit" option and then choose the "AI Editing" feature.
Here is the prompt to use. It references the property that contains the company names:
I want you to clean and normalize all the company names.
In order to do that you have to remove all legal forms.
Here are the legal forms you have to remove but only if they are behind the company name
SA, SARL, SAS, SASU, EURL, SNC, SCS, SCIC, SCM, SEL, SELARL, SELAS, SELAFA, SELCA, SEP, GIE, EI, EIRL, AERL, ENO, SCOP, SCIC, SC, SICA, CAE, SARL de famille, SAS de famille, SELURL, SELASU, SELAFAU, SELCAU, SEPU, GIEU, EIU, EIRLU, AERLU, ENOU, SCOPU, SCICU, SCU, SICAU, CAEU, SARL de familleU, SAS de familleU, SELURLU, SELASUU, SELAFAUU, SELCAUU, SEPUU, LLC, Inc., Corp., Co., LLP, LP, PLLC, PA, PC, DBA, S Corp, C Corp, B Corp, Nonprofit, Sole Proprietorship, Partnership, Joint Venture, Cooperative, Trust, Estate, Fund, Association, Society, Union, Syndicate, Consortium, Holdings, Group, Foundation, Institute, Limited, LTD, GP, LP, LLP, LLC, C Corp, S Corp, PC, B Corp, Ltd, PLC, CIC, GbR, OHG, KG, PartG, GmbH, UG, AG, eG, SNC, SCS, SARL, SA, SAS, EURL, Pty Ltd, OPC, VOF, CV, BV, NV, KG, KGaA, JV, GmbH & co. kg, company
Use {{company_name}} as a reference and remove all legal forms.
Note: If you want to reuse this prompt, don’t forget to reference your own property using double curly braces ({{Property}}).
Once the AI has generated the script, you can review your results and improve your prompt if necessary.
When the results look right, click "Run on items" to apply the script to your items.
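When reviewing results, a quick automated spot check can help you find rows that deserve a manual look. The sketch below flags names that still end with a legal form or that were stripped entirely. The `flagSuspiciousRows` helper, the sample rows, and the short legal-form list are hypothetical; nothing here is a Datablist feature.

```javascript
// Hypothetical review helper, not part of Datablist.
// `rows` holds { original, cleaned } pairs from a small test run.
function flagSuspiciousRows(rows, legalForms) {
  const known = new Set(legalForms.map((form) => form.toLowerCase()));
  return rows.filter(({ original, cleaned }) => {
    const words = cleaned.trim().split(/\s+/);
    // Compare the last word, without a trailing period, against the known legal forms.
    const lastWord = words[words.length - 1].replace(/\.$/, "").toLowerCase();
    const stillHasLegalForm = words.length > 1 && known.has(lastWord);
    // An empty result means the script stripped the whole name.
    const strippedEverything = cleaned.trim() === "" && original.trim() !== "";
    return stillHasLegalForm || strippedEverything;
  });
}

// Assumed sample data, for illustration only.
const sampleRows = [
  { original: "Acme Inc.", cleaned: "Acme" },
  { original: "Globex Corp LLC", cleaned: "Globex Corp" },
  { original: "AG", cleaned: "" },
];

const flagged = flagSuspiciousRows(sampleRows, ["Inc", "Corp", "LLC", "AG"]);
console.log(flagged.map((row) => row.original));
```

Rows flagged this way are good candidates for refining the prompt, for example by adding a missing legal form or by asking the script to handle stacked suffixes.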
What is the difference between the two methods?
Generative AI Cleaning (Method 1) uses AI directly to clean company names, while Method 2 (AI-Generated JavaScript) uses AI to create a JavaScript script that performs the cleaning.
Generative AI Cleaning (Method 1) offers higher accuracy, works well for international lists, and is ideal for datasets under 20,000 records. However, it consumes OpenAI credits for each company name.
With AI-generated JavaScript (Method 2), ChatGPT is used once to create a cleaning script, which can then be run for free on large datasets. This method is great for mass processing and avoids ongoing API costs, but it works best for structured lists with predictable formatting.
Both methods automate company name cleaning, but the choice depends on your list size, budget, and accuracy needs.
When should you use Method 1 versus Method 2?
Generative AI Cleaning (Method 1)
- Lists smaller than 20,000 records
- Lists where accuracy matters most
- International lists
AI-Generated JavaScript (Method 2)
- Huge lists
- No budget for OpenAI credits
- Lists that follow the same structure
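The criteria above can be sketched as a small decision helper. The 20,000-record threshold comes from this article, but the order in which the rules are applied is my own assumption, and the `suggestMethod` function and its parameter names are hypothetical.

```javascript
// Threshold taken from the article; the precedence of the rules is an assumption.
const METHOD_1_MAX_RECORDS = 20000;

function suggestMethod({ recordCount, hasCreditBudget }) {
  // No budget for credits, or a list at or above the threshold: use the generated script.
  if (!hasCreditBudget || recordCount >= METHOD_1_MAX_RECORDS) {
    return "Method 2: AI-Generated JavaScript";
  }
  // Otherwise prefer the higher-accuracy option.
  return "Method 1: Generative AI Cleaning";
}

console.log(suggestMethod({ recordCount: 5000, hasCreditBudget: true }));
console.log(suggestMethod({ recordCount: 300000, hasCreditBudget: true }));
```

Treat the output as a starting point only. A large international list, for instance, may still justify Method 1 on a subset of difficult names.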
Frequently Asked Questions (FAQ) About Cleaning Company Names
How do I clean company names automatically for my B2B lead generation?
There are two primary methods available: using a dedicated company name cleaner ChatGPT template (Method 1) or using the AI editing feature (Method 2). The choice depends on your list size, budget, and accuracy requirements. For lists under 20,000 records, running ChatGPT for each name provides higher accuracy, while a generated JavaScript script is ideal for larger datasets because it runs at no additional cost.
What's the best way to remove legal entities from company names at scale?
For large volumes, the most efficient approach is the AI editing feature (Method 2), which processes company names without requiring credits. It removes common legal forms like LLC, Inc., GmbH, and many others while preserving the core company name. For more precise cleaning, especially with international companies, the dedicated company name cleaner (Method 1) offers superior accuracy.
Can AI clean company names from different countries and languages?
Yes, both cleaning methods support international company names. Method 1 is specifically optimized for multi-language support and recognizes legal forms from various countries. Method 2 can be customized through prompt engineering to handle specific language requirements and international legal entity formats.
How long does it take to clean 1000 company names using AI?
Using Method 1 (company name cleaner template), it takes approximately 2 minutes to process 1000 company names in Async mode. Method 2 (AI editing) typically processes the same volume faster but may require additional review time to ensure accuracy. Both methods offer batch processing capabilities to optimize cleaning efficiency.
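For a rough planning figure, the Method 1 timing can be extrapolated to other list sizes. The sketch below assumes processing time scales linearly from the "about 2 minutes per 1000 names" figure quoted above, which may not hold in practice. The `estimateMinutes` function is a hypothetical helper.

```javascript
// Based on the figure quoted in this article: about 2 minutes per 1000 names.
// Assumes linear scaling, which is a simplification.
const MINUTES_PER_1000_NAMES = 2;

function estimateMinutes(nameCount) {
  return (nameCount / 1000) * MINUTES_PER_1000_NAMES;
}

console.log(estimateMinutes(20000)); // 40
```

Under that assumption, a 20,000-record list would take around 40 minutes in Async mode.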