An Impressum is a legally mandated statement in German-speaking countries, required in publications like books, newspapers, websites, and business correspondence. It must include details about the ownership and authorship of the document.
Most websites in German-speaking countries have an Impressum page with their contact information.
This makes Impressum pages a useful source for lead generation. They typically contain valuable data such as:
- Founder/CEO Name
- Job title
- Phone Number
- Email Address
- Postal Address
The information is available, but not in a structured format. Until now, that made it hard to extract automatically. With AI, you can pull this contact information into structured fields.
In this guide, I show you how to use the Datablist AI Agent to scrape a list of websites and extract any information you want from their Impressum pages.
Step 1: Import your list of websites
First, import a CSV/Excel file with the list of websites you want to scrape. Datablist is a lead management tool with superpowers. One of them is the Datablist AI Agent. Our AI Agent understands text and can scrape websites to find relevant data.
To import your CSV file, create an empty collection and click import. Or click on the "Start with a CSV/Excel file" button from the home screen.
Here is my imported file. It contains two columns: the name of the company and its website.
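If your list comes from several sources, it can help to tidy the website column before importing. The following is a minimal Python sketch and not part of Datablist. The column names `Company` and `Website` and the helpers `normalize_website` and `clean_rows` are assumptions chosen for illustration.

```python
import csv
import io
from urllib.parse import urlparse


def normalize_website(raw: str) -> str:
    """Return an https:// URL for the site root, or "" if no host is found."""
    value = raw.strip()
    if not value:
        return ""
    # Add a scheme when missing so urlparse places the host in netloc.
    if "://" not in value:
        value = "https://" + value
    host = urlparse(value).netloc.lower()
    return f"https://{host}" if host else ""


def clean_rows(csv_text: str) -> list[dict[str, str]]:
    """Normalize the Website column and drop blank or duplicate websites."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Hypothetical column names; adjust them to match your own file.
        site = normalize_website(row.get("Website") or "")
        if site and site not in seen:
            seen.add(site)
            company = (row.get("Company") or "").strip()
            cleaned.append({"Company": company, "Website": site})
    return cleaned
```

Calling `clean_rows` on the text of your file returns one row per unique website, which you can write back to a CSV file before importing.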
Step 2: Write a prompt that works
Now, we will ask an AI Agent to visit each website, find the Impressum page link, and then read the Impressum page to extract contact information.
Click on the "Enrich" menu and select "AI Agent".
The prompt must include the website URL to visit and the data we want to extract. We can also add some hints to help the agent find the Impressum page.
Here is my prompt:
Visit {{WEBSITE}} and scrape the impressum page to extract the following information:
- Founder name
- Job Title
- Email Address
- Phone Number
The Impressum page is usually on the URL /impressum
You can ask for more (or fewer) data points. The website URL is defined as a placeholder, so Datablist runs a personalized prompt for each row.
To use a variable for the website, type two opening curly braces ({{) and select the Website property from your collection.
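To make the placeholder idea concrete, here is a small Python sketch of how a `{{WEBSITE}}` template could be filled in for each row. It only illustrates the concept and is not Datablist's implementation. The function `render_prompt` and the case-insensitive matching of column names are assumptions.

```python
import re

PROMPT_TEMPLATE = """Visit {{WEBSITE}} and scrape the impressum page to extract the following information:
- Founder name
- Job Title
- Email Address
- Phone Number
The Impressum page is usually on the URL /impressum"""

# Matches {{NAME}}, allowing optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, row: dict[str, str]) -> str:
    """Replace each {{NAME}} with the value of the matching column in the row."""
    # Assumption: placeholders match column names regardless of case.
    values = {key.upper(): value for key, value in row.items()}

    def substitute(match: re.Match) -> str:
        name = match.group(1).upper()
        if name not in values:
            raise KeyError(f"No column for placeholder {match.group(0)}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


rows = [
    {"Company": "Acme GmbH", "Website": "https://www.acme.example"},
    {"Company": "Beispiel AG", "Website": "https://www.beispiel.example"},
]
# One personalized prompt per row, in the same order as the rows.
prompts = [render_prompt(PROMPT_TEMPLATE, row) for row in rows]
```

Each entry in `prompts` is the same instruction with a different website, which is what "a personalized prompt for each row" means in practice.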
Step 3: Configure expected outputs
After the prompt, we need to configure the expected outputs. The Agent uses the output's name and description along with the prompt to understand the mission.
Here, we have:
- Founder Name - Description: The name of the founder. Empty if not found.
- Job Title - Description: The job title of the founder, when available.
- Email Address - Description: Email address found on the impressum page.
- Phone Number - Description: The company phone number.
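If you process the results in your own code later, it can help to mirror these outputs in a small schema. The Python sketch below is one way to do that. The class `ImpressumContact`, the function `to_contact`, and the shape of the raw result (a dictionary keyed by output name) are assumptions for illustration, not a documented Datablist format.

```python
from dataclasses import dataclass, fields

# Output names and descriptions, as configured in this step.
OUTPUTS = {
    "Founder Name": "The name of the founder. Empty if not found.",
    "Job Title": "The job title of the founder, when available.",
    "Email Address": "Email address found on the impressum page.",
    "Phone Number": "The company phone number.",
}


@dataclass
class ImpressumContact:
    # One field per configured output; "" means the value was not found.
    founder_name: str = ""
    job_title: str = ""
    email_address: str = ""
    phone_number: str = ""


def to_contact(raw: dict) -> ImpressumContact:
    """Map a result keyed by output name onto the dataclass, defaulting to ""."""
    kwargs = {}
    for field in fields(ImpressumContact):
        # founder_name -> "Founder Name"
        label = field.name.replace("_", " ").title()
        value = raw.get(label)
        kwargs[field.name] = str(value).strip() if value else ""
    return ImpressumContact(**kwargs)
```

Treating a missing value as an empty string matches the "Empty if not found" wording in the output descriptions.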
Step 4: Add outputs to the collection
Click "Continue to outputs configuration". The expected outputs configured in the previous step appear here.
Select "+" to add a new property (a column) to your collection for each output.
You can also create a property to see the error message when the agent is not able to perform the task.
Step 5: Run the enrichment
For the last step, click on "Instant run" to run the agent.
The "Error Msg" property shows a message when the agent cannot reach the website or finds no Impressum page.
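Once the run is finished, you may want to separate the rows that worked from the ones that need a second look. Here is a minimal Python sketch, assuming you have the results as CSV text with an "Error Msg" column. The function `split_by_error` and the sample data are hypothetical.

```python
import csv
import io


def split_by_error(csv_text: str, error_column: str = "Error Msg"):
    """Return (succeeded, failed) row lists based on whether the error column has text."""
    succeeded, failed = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        message = (row.get(error_column) or "").strip()
        # A non-empty error message means the agent could not complete the task.
        (failed if message else succeeded).append(row)
    return succeeded, failed


# Made-up sample data, only to show the expected shape of the input.
sample = (
    "Company,Website,Email Address,Error Msg\n"
    "Acme GmbH,https://www.acme.example,info@acme.example,\n"
    "Beispiel AG,https://www.beispiel.example,,No Impressum page found\n"
)
ok_rows, failed_rows = split_by_error(sample)
```

The failed rows can then be checked by hand or run again with extra hints in the prompt.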