AI-Powered Job Listing Scraper

This template extracts job titles from a list of career pages defined in a Google Sheet. It tries up to three extraction methods per site, from cheapest to most capable, then writes the results to a separate tab in the same Google Sheet.
The workflow is as follows:
  1. Read Input: It starts by reading a list of company career page URLs from a specified tab in your Google Sheet.
  2. Multi-Tier Extraction: For each URL, it attempts to extract job titles using a three-tier strategy to maximize success while minimizing cost:
    • Tier 1 (Fast & Free): It first tries to find job titles by quickly parsing the page's HTML, looking for common patterns in job listings.
    • Tier 2 (AI-Assisted): If the first tier fails, it sends the page's HTML to an AI model for a more intelligent analysis to identify job titles.
    • Tier 3 (Full Browser AI): As a last resort, it loads the page in a cloud browser and uses an AI vision model to analyze the rendered page. This also covers sites that rely heavily on JavaScript.
  3. Data Aggregation: The automation collects all the job titles found across all the websites.
  4. Write Output: Finally, it writes all extracted job titles, along with the company name and source URL, into a table on the output tab of the same Google Sheet. The tab is created if it does not exist and is overwritten on each run.
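The three-tier strategy in step 2 can be pictured as a chain of extractors tried in order of cost. The following Python sketch is only an illustration under stated assumptions, not the template's actual implementation: the function names, the "(m/w/d)" pattern used for Tier 1, and the stub functions standing in for the AI tiers are all hypothetical.

```python
import re
from typing import Callable, List

# A tier takes (url, html) and returns the job titles it found.
Tier = Callable[[str, str], List[str]]

# Hypothetical Tier 1 heuristic: headings or links whose text contains the
# German gender marker "(m/w/d)" are treated as job titles.
JOB_TITLE_PATTERN = re.compile(
    r"<(?:h[1-4]|a)\b[^>]*>([^<]*\(m/w/d\)[^<]*)</(?:h[1-4]|a)>",
    re.IGNORECASE,
)


def tier1_parse_html(url: str, html: str) -> List[str]:
    """Cheap pattern match on the raw HTML; returns [] when nothing matches."""
    return [title.strip() for title in JOB_TITLE_PATTERN.findall(html)]


def tier2_ai_stub(url: str, html: str) -> List[str]:
    """Placeholder for sending the HTML to an AI model (not implemented here)."""
    return []


def tier3_browser_stub(url: str, html: str) -> List[str]:
    """Placeholder for the cloud browser plus vision model (not implemented here)."""
    return []


def extract_job_titles(url: str, html: str, tiers: List[Tier]) -> List[str]:
    """Try each tier in order and stop at the first one that finds titles."""
    for tier in tiers:
        titles = tier(url, html)
        if titles:
            return titles
    return []


if __name__ == "__main__":
    sample = '<h2 class="job">Pflegefachkraft (m/w/d)</h2>'
    tiers = [tier1_parse_html, tier2_ai_stub, tier3_browser_stub]
    print(extract_job_titles("https://example.com/karriere", sample, tiers))
```

Ordering the tiers from cheapest to most expensive means the AI-based steps only run for pages where the simple pattern match finds nothing.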
Usage Ideas
  • Automated Job Board: Aggregate job openings from dozens of competitor or industry-specific career pages into a single spreadsheet.
  • Market Research: Scrape product names, features, and prices from e-commerce sites for competitive analysis.
  • Lead Generation: Extract contact information or company details from lists of business websites.
  • Content Aggregation: Collect articles, blog posts, or press releases from various sources on a specific topic.
  • Real Estate Monitoring: Pull property listings from multiple real estate agency websites into a central database.
Customization Ideas
This template can be adapted in several ways:
  • Specify Your Data Source: Easily connect your own Google Sheet by providing its URL.
  • Organize Your Way: Tell the automation which tab contains your list of websites and where you want the results to be saved. You can also specify which columns in your sheet contain the website name and URL.
  • Control the Scope: Adjust the maximum number of pages the scraper should navigate through on each site, giving you control over the depth of the search.
  • Target Any Content: While pre-configured for German job titles, the AI prompts and search patterns can be easily modified to extract any type of data (like product names, real estate listings, or news headlines) from websites in any language.
  • Customize Your Output: Change the column headers in the final output sheet to match your project's requirements.
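To make the output customization concrete, here is a minimal Python sketch of how the result table could be assembled with configurable column headers. The header names, the dictionary keys, and the function name are assumptions for illustration; the template's real output layout may differ.

```python
from typing import Dict, List, Sequence

# Hypothetical default headers; swap in whatever your project needs.
DEFAULT_HEADERS = ("Company", "Job Title", "Source URL")


def build_output_table(
    listings: List[Dict[str, str]],
    headers: Sequence[str] = DEFAULT_HEADERS,
) -> List[List[str]]:
    """Return a header row followed by one row per extracted job title."""
    if len(headers) != 3:
        raise ValueError("expected exactly three column headers")
    table = [list(headers)]
    for item in listings:
        # Each listing carries the company name, job title, and source URL.
        table.append([item["company"], item["title"], item["url"]])
    return table


if __name__ == "__main__":
    listings = [
        {
            "company": "Example Klinik",
            "title": "Pflegefachkraft (m/w/d)",
            "url": "https://example.com/karriere",
        },
    ]
    for row in build_output_table(listings, headers=("Klinik", "Stelle", "Quelle")):
        print(row)
```

The resulting list of rows has the shape that spreadsheet APIs typically accept for a range write, with the header row first.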
Agent Inputs
Required Parameters
Name | Type | Default | Description
googleSheetUrl | string | None | URL of the Google Sheet containing clinic URLs (input) and where job listings are written (output)
Optional Parameters
Name | Type | Default | Description
inputSheetName | string | AGB Übersicht | Name of the tab containing clinic data (columns A=Status, B=Name, E=URL)
maxPages | number | 5 | Maximum pages to paginate per website. Use -1 to process all available pages.
outputSheetName | string | Job Listings | Name of the tab where job listings are written. Created if missing, overwritten each run.
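The parameters above can be modeled as a small configuration object. The Python sketch below is an illustration only: the class and helper names are hypothetical, and rejecting maxPages values below 1 (other than -1) is an assumption, since only the -1 case is documented. The row helper assumes data rows are passed without a header row and follows the documented column layout (A=Status, B=Name, E=URL).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AgentInputs:
    googleSheetUrl: str                     # required, no default
    inputSheetName: str = "AGB Übersicht"   # tab with the list of sites
    maxPages: int = 5                       # -1 means all available pages
    outputSheetName: str = "Job Listings"   # tab that receives the results


def page_limit(max_pages: int) -> Optional[int]:
    """Translate maxPages into a limit; None stands for 'no limit'."""
    if max_pages == -1:
        return None
    if max_pages < 1:
        # Assumption: other non-positive values are treated as invalid.
        raise ValueError("maxPages must be -1 or a positive integer")
    return max_pages


def read_sites(rows: List[List[str]]) -> List[Tuple[str, str, str]]:
    """Pick Status (column A), Name (column B), and URL (column E) per row."""
    sites = []
    for row in rows:
        padded = row + [""] * (5 - len(row))  # short rows get empty cells
        status, name, url = padded[0], padded[1], padded[4].strip()
        if url:  # skip rows without a URL
            sites.append((status, name, url))
    return sites


if __name__ == "__main__":
    inputs = AgentInputs(googleSheetUrl="https://example.com/your-sheet")
    print(inputs.inputSheetName, page_limit(inputs.maxPages))
```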