Introduction: Why do you need data enrichment?
Data enrichment is both critical and challenging. Having up-to-date and accurate information can make or break your professional networking and business intelligence efforts. Unfortunately, most data enrichment tools rely on stale or incomplete datasets. This problem becomes particularly glaring when trying to find LinkedIn profiles, a vital resource for recruiters, sales professionals, and researchers.
But what if there were a way to avoid these pitfalls? With Airtop, you can harness the power of fresh data and AI-driven automation to build tools that work with precision. In this tutorial, I'll show you how to create a robust LinkedIn profile scraper using TypeScript, Node.js, and the Airtop SDK. By the end of this guide, you'll have a professional-grade application that can:
Read professional profiles from a CSV file.
Automatically search Google for LinkedIn profiles using AI to ensure accurate, up-to-date results.
Save enriched data to a new CSV file.
Prerequisites
Before we begin, ensure you have the following:
Node.js
npm (Node Package Manager)
An Airtop API key. Get one for free here!
Basic understanding of TypeScript and async programming
No time to go through the entire setup? I got you! You can download the entire project here and make the changes described in the README to have it all up and running quickly.
Step 1: Project Setup
Let's start by creating our project structure. We'll build our entire application incrementally, with each section building upon the last.
Initialize the Project
In your terminal of choice, run the following commands to create a project directory, initialize a new Node.js project, and install TypeScript.
These commands also install the Airtop SDK for AI scraping and dotenv to keep our credentials safe.
mkdir linkedin-data-enrichment
cd linkedin-data-enrichment
npm init -y
npm install typescript ts-node @types/node @airtop/sdk dotenv
npx tsc --init
Project Structure
Now, create the following directory structure:
linkedin-data-enrichment/
│
├── src/
│   └── index.ts
├── data/
│   └── profiles.csv
├── .env
├── package.json
└── tsconfig.json
Configuration and Type Definitions
We'll start by adding the imports, configuration constants, and type definitions to src/index.ts. This approach lets us develop the entire application in a single, cohesive file.
import { AirtopClient } from "@airtop/sdk";
import type {
ExternalSessionWithConnectionInfo,
SessionResponse,
WindowId,
WindowIdResponse
} from "@airtop/sdk/api";
import * as fs from 'fs/promises';
import * as path from 'path';
import dotenv from 'dotenv';
const CONFIG = {
  BATCH_SIZE: 1, // Number of profiles handled per browser session
  MAX_RETRIES: 3, // How many times to retry a failed search
  RETRY_DELAY_MS: 1000, // Base delay between retries (multiplied by the attempt number)
  PATHS: {
    INPUT_FILE: 'data/profiles.csv',
    OUTPUT_DIR: 'output',
    OUTPUT_FILE: 'profiles_with_linked_in_profiles.csv'
  }
} as const;
interface UserProfile {
firstName: string;
lastName: string;
email: string;
}
interface ProfileWithQuery extends UserProfile {
query: string;
}
interface ProfileWithLinkedInProfile extends ProfileWithQuery {
linkedInProfile: string;
}
Utility Functions
Next, we'll add the utility functions that load profiles from our CSV file and build the Google search query that will power each lookup:
const generateGoogleSearchQuery = (userProfile: UserProfile): string => {
const query = `${userProfile.firstName} ${userProfile.lastName} ${userProfile.email} linkedin`;
return `https://www.google.com/search?q=${encodeURIComponent(query)}`;
}
const fetchProfilesFromFile = async (): Promise<UserProfile[]> => {
const projectRoot = path.resolve(__dirname, '..');
const filePath = path.join(projectRoot, CONFIG.PATHS.INPUT_FILE);
const data = await fs.readFile(filePath, 'utf8');
const lines = data.split('\n');
return lines
.slice(1)
.filter(line => line.trim())
.map(line => {
const [email, firstName, lastName] = line.split(',');
return {
email: email.trim(),
firstName: firstName.trim(),
lastName: lastName.trim()
};
});
}
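The main function we'll write later calls generateProfilesWithSearchQueries to attach a search query to each profile. A minimal version of that helper, built on generateGoogleSearchQuery above, looks like this:
// Pair each profile with its Google search URL; consumed later by main()
const generateProfilesWithSearchQueries = (profiles: UserProfile[]): ProfileWithQuery[] => {
  return profiles.map(profile => ({
    ...profile,
    query: generateGoogleSearchQuery(profile)
  }));
}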
LinkedIn Profile Search Function
Now we'll implement our core search function with retry logic. This function is the heart of our application: it loads the Google search for a profile in an Airtop browser window and uses AI to filter the results for the LinkedIn URL we're looking for:
const searchForLinkedInProfile = async (
session: ExternalSessionWithConnectionInfo,
window: WindowId,
client: AirtopClient,
profile: ProfileWithQuery
): Promise<string | null> => {
const delay = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));
for (let attempt = 1; attempt <= CONFIG.MAX_RETRIES; attempt++) {
try {
await client.windows.loadUrl(session.id, window.windowId, {
url: profile.query,
});
console.log(`Searching for ${profile.firstName} ${profile.lastName} on LinkedIn`);
const result = await client.windows.pageQuery(session.id, window.windowId, {
prompt: `You are tasked with retrieving a person's LinkedIn profile URL. Please locate the LinkedIn profile for the specified individual and return only the URL.
LinkedIn profile URLs begin with https://www.linkedin.com/in/ so use that to identify the profile. There may also be profiles with country-based subdomains such as https://nl.linkedin.com/in/ that you should accept as well.
If there are multiple links, return the one that most closely matches the profile based on the email domain and the name.
Do not return any other text than the URL.
Do not return any URLs corresponding to posts, which may begin with https://www.linkedin.com/posts/
If you are unable to find the profile, return 'Error'`
});
return result.data.modelResponse;
} catch (error) {
if (attempt === CONFIG.MAX_RETRIES) {
console.error(`Failed to find profile after ${CONFIG.MAX_RETRIES} attempts:`, profile.email, error);
return null;
}
console.warn(`Attempt ${attempt} failed, retrying...`);
await delay(CONFIG.RETRY_DELAY_MS * attempt);
}
}
return null;
}
Batch Processing Functions
We'll add functions to handle batch processing. If there are many contacts to enrich, the profiles can be split into batches: each batch runs in its own browser session, with the profiles inside a batch processed sequentially and the batches themselves running in parallel.
const runSequentialBatch = async (client: AirtopClient, profiles: ProfileWithQuery[], batchIndex: number) => {
console.log(`Running batch ${batchIndex}`);
let session: SessionResponse;
let window: WindowIdResponse;
try {
session = await client.sessions.create();
} catch (error) {
console.error("Error creating session", error);
return [];
}
try {
window = await client.windows.create(session.data.id);
} catch (error) {
console.error("Error creating window", error);
return [];
}
console.log("Created session and window for batch", batchIndex);
const profilesWithLinkedInProfiles: ProfileWithLinkedInProfile[] = [];
for (const profile of profiles) {
const linkedInProfile = await searchForLinkedInProfile(session.data, window.data, client, profile);
if (linkedInProfile) {
const result = {
...profile,
linkedInProfile
}
profilesWithLinkedInProfiles.push(result);
}
}
await client.sessions.terminate(session.data.id);
return profilesWithLinkedInProfiles;
}
const runBatchesInParallel = async (
client: AirtopClient,
batches: ProfileWithQuery[][]
): Promise<ProfileWithLinkedInProfile[]> => {
const promises = batches.map((batch, index) =>
runSequentialBatch(client, batch, index)
);
const results = await Promise.all(promises);
return results.flat();
}
Results Saving Function
Next, we add a function that saves the results collected from each browser session to a new CSV file, creating the output directory if it doesn't already exist.
const saveProfilesToFile = async (
profiles: ProfileWithLinkedInProfile[]
): Promise<void> => {
const projectRoot = path.resolve(__dirname, '..');
const outputDir = path.join(projectRoot, CONFIG.PATHS.OUTPUT_DIR);
const filePath = path.join(outputDir, CONFIG.PATHS.OUTPUT_FILE);
await fs.mkdir(outputDir, { recursive: true });
const csvHeaders = ['email', 'firstName', 'lastName', 'linkedInProfile'];
const csvRows = profiles.map(profile => [
profile.email,
profile.firstName,
profile.lastName,
profile.linkedInProfile,
]);
const csvContent = [
csvHeaders.join(','),
...csvRows.map(row => row.join(','))
].join('\n');
await fs.writeFile(filePath, csvContent);
console.log(`Saved ${profiles.length} profiles to ${filePath}`);
}
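For reference, once a run completes, the generated file in output/profiles_with_linked_in_profiles.csv will look something like this (the LinkedIn URL below is just an illustrative placeholder):
email,firstName,lastName,linkedInProfile
jane.smith@company.com,Jane,Smith,https://www.linkedin.com/in/jane-smith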
Main Execution Function
Finally, we'll create the main execution function that orchestrates everything: it loads the profiles, builds the search queries, splits the work into batches according to our config, runs the batches, and saves the results.
const main = async () => {
console.time('Total Execution Time');
try {
const apiKey = process.env.AIRTOP_API_KEY;
if (!apiKey) {
throw new Error("AIRTOP_API_KEY is not set");
}
const client = new AirtopClient({ apiKey });
const profiles = await fetchProfilesFromFile();
console.log(`Loaded ${profiles.length} profiles`);
const profilesWithQueries = generateProfilesWithSearchQueries(profiles);
const batches: ProfileWithQuery[][] = [];
for (let i = 0; i < profilesWithQueries.length; i += CONFIG.BATCH_SIZE) {
batches.push(profilesWithQueries.slice(i, i + CONFIG.BATCH_SIZE));
}
const profilesWithLinkedInProfiles = await runBatchesInParallel(client, batches);
await saveProfilesToFile(profilesWithLinkedInProfiles);
console.log("\n=== Execution Summary ===");
console.log(`Total profiles processed: ${profiles.length}`);
console.log(`Successful matches: ${profilesWithLinkedInProfiles.length}`);
console.log(`Failed matches: ${profiles.length - profilesWithLinkedInProfiles.length}`);
} catch (error) {
console.error("Application failed:", error);
process.exit(1);
} finally {
console.timeEnd('Total Execution Time');
}
}
dotenv.config();
main().catch(console.error);
Environment Setup
Update the .env file in the project root with the following content:
AIRTOP_API_KEY=your_airtop_api_key_here
You can go here to get your API key for free.
Preparing Input Data
Create data/profiles.csv with the information for the people whose LinkedIn profiles you want to find. Check out our example here if you need a little help:
email,firstName,lastName
john.doe@example.com,John,Doe
jane.smith@company.com,Jane,Smith
Running the Application
Back in your terminal, it's time to get running.
# Install dependencies
npm install
# Run the application
npx ts-node src/index.ts
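Optionally, you can add a start script to the "scripts" section of your package.json so the tool can be launched with npm start (this assumes the rest of your package.json stays as generated by npm init):
"scripts": {
  "start": "ts-node src/index.ts"
}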
A Word on Ethical Technology and Responsible Innovation
The true power of this tool lies not just in its technical capabilities but in its responsible application. Technologists and professionals have a critical responsibility to use automation technologies with integrity. This means respecting platform terms of service, protecting individual privacy, and ensuring that our innovations serve human needs without compromising ethical standards.
Conclusion
Building a LinkedIn data enrichment tool is more than just writing codeβit's about solving real-world professional networking challenges with intelligent, ethical technology. By leveraging TypeScript, Node.js, and the Airtop SDK, we've created a sophisticated solution that transforms manual profile searching into an automated, efficient process. Let us know how you use it and which improvements you make.
Happy coding!