Localization is a must if you want your content to be accessible to an international audience. Machine translation using machine learning offers high-quality translations at a fraction of the cost of professional human translators.
But how do you translate your content without spending your time copy-pasting text into Google Translate and back into your application?
CSV files are perfect for storing structured data and are used by most applications to export your data into a generic format.
Product catalogs can be exported to CSV files. CMS articles can be extracted into flat files. CSV has become the de facto format for sharing generic data between applications.
But you can't translate a CSV file as if it were a simple text file. Some columns hold names or dates and must be kept unchanged. So how do you translate only some of the columns in a CSV file?
In this guide, we’ll go through these steps:
- Import your CSV file into Datablist
- Translate the text in a CSV column into another language
- Export your translated data as CSV or Excel files
Let's dive in.
Step 1: Import your CSV file
In this guide, all the work will be done online with Datablist CSV Viewer and Editor.
I will use a "Book Summaries" CSV file in this example. The demo CSV file is accessible here.
Start by creating a new collection. The collection will hold your CSV data. Click on the + button in the sidebar or the "Start with a CSV/Excel file" shortcut on the home page.
When using the "+" in the sidebar, select "Import CSV/Excel" to go to the CSV importer.
After loading your CSV file, Datablist lists the columns found. Every column is analyzed to detect its data type. On the column with your data to translate, select the "Long Text" data type if not already selected.
With "Long Text", you will have a better user experience managing your texts.
Review your data and import your CSV content into Datablist.
Step 2: Select the items to translate
Datablist offers "Enrichment" to interact with your data. You can enrich Facebook leads or clean email listings directly from your data.
Three translation providers are integrated into Datablist: Google Translate, Deepl Translate, and ChatGPT. You can use one of them, or try them all to find the one best suited to your texts.
Important - Enrichments are only available for registered users. Sign up for free to start translating your CSV files.
Enrichments are available with the "Enrich" menu. Enrichments will run on all your items or on the selected items.
Translate all items
By default, the actions available with the "Enrich" button will run on all of your items. Translation actions will translate all of your listed items.
Select specific items
To translate only some of your items, click a checkbox and drag across the other checkboxes while holding the mouse button down.
Step 3: Translate your CSV file
To list the machine translation providers available on Datablist, filter the enrichments by clicking on the "Translations" section.
Translate a CSV file using Google Translate
Select "Google Translate" in the listed enrichments to open the configuration drawer.
Before translating your CSV, you need to fill out the settings and define the input.
For the settings, you must define:
- Source Language - Optional - If not defined, the source language will be auto-detected. If your texts are in different languages, keep the source language unselected.
- Target Language - Mandatory - Select the target language. Language names are listed with English names. See supported languages.
Then, in "Input Property", select the property from your CSV file that holds the texts to translate. The other columns won't be translated. This is ideal for translating a single CSV column while preserving the others.
In this example, the book summaries are listed in the "Description" property. I select "Description".
Important:
Only Text/Long Text properties are translatable. Other data types are ignored.
Click on "Continue to outputs configuration" to define where the translated texts will be saved.
Here, click on the "+" button to create a new property in your collection.
Notes
Your source property (= column) will not be updated. The output property cannot be the input property as it would overwrite its data. You will be able to manage your properties in the last step to have the desired column names.
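For readers who want to see the same idea outside Datablist, here is a minimal Python sketch of translating one column into a new column while leaving the source column and all other columns untouched. The function names, the column names, and the `fake_translate` placeholder are assumptions made for illustration; a real script would call a translation provider in its place.

```python
import csv
import io
from typing import Callable


def translate_column(
    csv_text: str,
    source_column: str,
    output_column: str,
    translate: Callable[[str], str],
) -> str:
    """Return CSV text with `output_column` added; the source column is kept as-is."""
    if output_column == source_column:
        # Same rule as in the guide: never overwrite the source data.
        raise ValueError("output column must differ from the source column")

    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    fieldnames = list(reader.fieldnames or [])
    if source_column not in fieldnames:
        raise KeyError(f"unknown column: {source_column}")
    if output_column not in fieldnames:
        fieldnames.append(output_column)

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, lineterminator="\n", extrasaction="ignore"
    )
    writer.writeheader()
    for row in reader:
        # Only the chosen column is sent to the translator.
        row[output_column] = translate(row.get(source_column) or "")
        writer.writerow(row)
    return out.getvalue()


def fake_translate(text: str) -> str:
    # Hypothetical placeholder standing in for a real provider call.
    return f"[fr] {text}"


if __name__ == "__main__":
    sample = "Title,Description\nDune,A desert planet story\n"
    print(translate_column(sample, "Description", "Translated Texts", fake_translate))
```

Writing to a separate column keeps the original text available for side-by-side review.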
Once configured, click "Run on first 10 items" to run the translation enrichment on a test batch of 10 items.
This step is very useful to ensure your settings configuration is correct. Scroll the data table to see your translated texts. If the translations are not good, please review the "Source Language" or the input property.
If the translated texts are good for the first ten items, click "Run enrichment on all items" to process the remaining items.
Translate a CSV file using Deepl
Deepl is a machine translation service launched in 2017. Deepl is known for its accuracy compared to Google Translate. Under the hood, Deepl uses a different kind of neural network: convolutional neural networks (CNNs). This technology keeps track of the context as it processes long texts. So, Deepl is better suited to longer texts, and Google Translate to texts made of unrelated sentences.
To use Deepl, follow the same process as Google Translate but select the new Deepl action.
Translate a CSV file using ChatGPT (23 times cheaper!)
ChatGPT is a language model developed by OpenAI. One of its many uses is for translation. It comprehends the nuances of language, allowing it to grasp the context and meaning behind sentences. This contextual awareness enables ChatGPT to produce accurate and contextually appropriate translations.
ChatGPT comes in different models, each with its own pricing. On the 18th of July, 2024, OpenAI released the GPT-4o mini model, a very cost-effective model that returns high-quality translations.
Compared with Google Translate or Deepl, ChatGPT is about 23 times cheaper, and the translations are of almost equivalent quality to Google Translate. Deepl is still first in translation quality, but by a small margin.
I recommend you give it a try on a subset of your items to evaluate the translation quality. If, like me, you find the results accurate, ChatGPT is an economical choice for CSV translations.
Step 4: Review your translations
Translated texts are saved in the "Translated Texts" output property.
Automatic translations are convenient but error-prone. A manual review of your translations is important. Check the translated texts to:
- Find names and domain-specific vocabulary - Google might translate product names or technical words. Note: words beginning with a capital letter are usually not translated.
- Review variable placeholders - If your texts are used for software applications, check variable placeholders that must not be translated.
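The placeholder check above can be partly automated. The sketch below is a minimal Python example that compares the placeholders found in a source text with those found in its translation. The placeholder styles it recognizes (`{name}`, `{{ name }}`, `%s`, `%d`, `%(name)s`) are an assumption; adapt the pattern to whatever syntax your application uses.

```python
import re
from collections import Counter

# Assumed placeholder styles: {{ name }}, {name}, %(name)s, %s, %d.
PLACEHOLDER_RE = re.compile(r"\{\{\s*\w+\s*\}\}|\{\w*\}|%\(\w+\)[sd]|%[sd]")


def missing_placeholders(source: str, translated: str) -> list[str]:
    """Return placeholders that appear in `source` but were lost or altered in `translated`."""
    source_counts = Counter(PLACEHOLDER_RE.findall(source))
    translated_counts = Counter(PLACEHOLDER_RE.findall(translated))
    # Counter subtraction keeps only placeholders that are missing in the translation.
    return sorted((source_counts - translated_counts).elements())


if __name__ == "__main__":
    source = "Hello {name}, you have %d new messages"
    translated = "Bonjour {nom}, vous avez %d nouveaux messages"
    print(missing_placeholders(source, translated))
```

A non-empty result flags a row for manual review. It does not replace reading the translation, and a simple pattern like this one can also match ordinary text such as a percent sign followed directly by a word.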
To quickly read and edit your translated texts, press "Enter" on a cell to enter edit mode.
Or open the details drawer.
Step 5: Export your translations as CSV or Excel files
The final step is to export your data back to a file to reimport it into your application. See our documentation to export your data in CSV or Excel files.
To export your data with the same CSV columns as the imported file: edit your properties to match the original columns.
Click on the properties management button.
Then:
- Rename the source property
- Hide the source property
- Rename the new property with translated text to match the source name
See the video below for a step-by-step guide.
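If you prefer to restore the original column names in a script after exporting, the three steps above can be sketched in Python as follows. This is an illustration only, not how Datablist does it internally; the function name and the column names are assumptions.

```python
import csv
import io


def export_with_original_header(
    csv_text: str, source_column: str, translated_column: str
) -> str:
    """Return CSV text where the translated text takes over the source column's name."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    # Keep the original column order and drop the extra translated column.
    header = [c for c in (reader.fieldnames or []) if c != translated_column]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in reader:
        # The translated text replaces the source text under the original name.
        row[source_column] = row[translated_column]
        writer.writerow([row[c] for c in header])
    return out.getvalue()


if __name__ == "__main__":
    exported = (
        "Title,Description,Translated Texts\n"
        "Dune,A desert planet story,Une histoire de planète désertique\n"
    )
    print(export_with_original_header(exported, "Description", "Translated Texts"))
```

The output has the same header as the imported file, so it can be reimported into the originating application without remapping columns.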
Everything you need to know about CSV file translations
How to get the number of characters to translate?
Datablist provides an easy tool to perform calculations on your data. Use the "Characters Count" calculation to get the sum of characters for a property.
Click on the property column, and select "Perform calculation".
Then, pick "Characters count".
The result is displayed directly in the drawer.
Notes
Calculations run on the current view. If you have filters or selected items, the calculation will run on those instead of all items.
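To cross-check the figure outside Datablist, a rough equivalent can be written in a few lines of Python. This is a sketch under assumptions: the function name is hypothetical, and `len` counts Unicode code points, which may differ slightly from how a given translation provider counts billable characters.

```python
import csv
import io


def count_characters(csv_text: str, column: str) -> int:
    """Sum the number of characters found in one column of a CSV document."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    # Missing values count as zero characters.
    return sum(len(row.get(column) or "") for row in reader)


if __name__ == "__main__":
    sample = "Title,Description\nDune,Sand\nEmma,A match\n"
    print(count_characters(sample, "Description"))
```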
How much do the translation providers cost?
Use the number of characters you got above to get an estimation of the translation cost.
Google Translate and Deepl cost 35 credits per 1000 characters. One credit costs $0.001 (less with higher top-ups).
So, the calculation is:
(NumberCharacters/1000)*35 = TotalCreditsCost
Then,
TotalCreditsCost*0.001= TotalCostDollar
For example, for 300 000 characters:
(300000/1000)*35*0.001 = $10.50
ChatGPT costs 1.5 credits per 1000 characters. With the same calculation, we get for 300 000 characters:
(300000/1000)*1.5*0.001 = $0.45
That is about 23 times cheaper with ChatGPT (10.5/0.45 ≈ 23.3).
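The same arithmetic can be wrapped in a small Python helper to estimate the cost for any character count. The credit rates and the $0.001 credit price are the ones quoted above; the function name is an assumption, and the result is an estimate since the credit price is lower with higher top-ups.

```python
def estimate_cost(
    characters: int,
    credits_per_1000: float,
    dollars_per_credit: float = 0.001,
) -> tuple[float, float]:
    """Return (total credits, total cost in dollars) for a number of characters."""
    credits = characters / 1000 * credits_per_1000
    return credits, credits * dollars_per_credit


if __name__ == "__main__":
    characters = 300_000
    _, google_or_deepl = estimate_cost(characters, credits_per_1000=35)
    _, chatgpt = estimate_cost(characters, credits_per_1000=1.5)
    print(f"Google Translate / Deepl: ${google_or_deepl:.2f}")
    print(f"ChatGPT: ${chatgpt:.2f}")
    print(f"Ratio: {google_or_deepl / chatgpt:.1f}x")
```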
Why use CSV files for bulk translations?
CSV is a format for storing structured data in text files. In a CSV file, each line is a data record, and each record is made of fields separated by commas (or sometimes by semicolons ";" or tabs).
CSV files are a common way to transfer data between applications. Because it relies on text files, the CSV format is a simple way to export database or spreadsheet data. Any text editor can open it but you need a CSV editor to avoid errors when manipulating CSV files.
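Because the separator varies from one export to another, a script that reads CSV files often needs to detect it first. Here is a minimal sketch using `csv.Sniffer` from Python's standard library; the helper name and the list of candidate separators are assumptions, and the sniffer is a heuristic that can raise `csv.Error` on ambiguous samples.

```python
import csv


def detect_delimiter(sample: str, candidates: str = ",;\t") -> str:
    """Guess which of the candidate characters separates the fields in `sample`."""
    # Sniffer inspects the sample and picks the most consistent candidate.
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


if __name__ == "__main__":
    sample = "Title;Description\nDune;A desert planet\nEmma;A matchmaker\n"
    delimiter = detect_delimiter(sample)
    rows = list(csv.reader(sample.splitlines(), delimiter=delimiter))
    print(repr(delimiter), rows[0])
```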
You can export your data in CSV in most applications, translate specific columns, and reimport the translated CSV file back into your application.
Any structured listing with text columns is well suited to CSV.
For example:
- Translating a Product Catalog with text columns such as title and description
- Translating blog articles using structured files with title and article content
- Translating a list of User Reviews
How many translations can I perform for free?
Datablist uses the Google Translate API and Deepl, both of which are paid services. As much as we would like to offer this service for free, we have to set a limit on free translations.
Free users can translate up to 50 items per month. Upgrade to the Standard plan for unlimited translations.
Can I use my own Deepl API key?
If you already have a Deepl account, you can use it directly on Datablist. An option to set your custom Deepl API Key is available for Standard users.
With a custom Deepl API Key, the action doesn't use Datablist credits. The action calls Deepl API on your behalf.
This option is perfect when you have large datasets to translate.
Human translation vs machine translation
Machine translation is powered by automated software that translates source content into target languages. The best automatic translation providers use artificial intelligence and machine learning to offer high-quality translations.
Human translations by professional translators shine on complex texts. When the context is important, the text is long, or the vocabulary is technical or specific, human translators outperform machine translation.
I like hybrid translation, which combines machine translation with human review and rewriting. Hybrid translation is perfect for webpage internationalization.
What languages are supported?
Datablist relies on the Google Translate API, the Deepl API, and the ChatGPT API for automatic translations. The accepted languages for Google Translate are listed on this page. At the time of writing, 111 languages are available. Translations from and to any of the listed languages are possible.
Deepl supports fewer languages. Currently 28 languages are available.