Parallel’s data integrations let you enrich datasets with web intelligence without leaving your existing data workflows. Whether you’re working with DataFrames in Python, SQL queries in a data warehouse, or analytics databases, there’s an integration that fits your stack.

How it works

All data integrations follow the same pattern (see the sketch after this list):
  1. Define inputs: Specify which columns contain the data to research (company name, website, etc.)
  2. Define outputs: Describe what information you want to extract (“CEO name”, “Founding year”, etc.)
  3. Choose a processor: Select speed vs thoroughness based on your needs
  4. Get enriched data: Receive structured results with optional citations
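
For example, here is the whole pattern with the Polars integration. This is a minimal sketch: the enrich_dataframe entry point, its parameter names, and the processor value are assumptions for illustration, not the exact API; see the Polars integration guide for the real signature.
import polars as pl

# Hypothetical entry point, shown only to illustrate the pattern.
from parallel_web_tools import enrich_dataframe

df = pl.DataFrame({
    "name": ["Acme Corp", "Globex"],
    "domain": ["acme.com", "globex.com"],
})

enriched = enrich_dataframe(
    df,
    # 1. Inputs: map semantic keys to your column names
    input_columns={"company_name": "name", "website": "domain"},
    # 2. Outputs: describe what to extract in plain language
    output_columns=["CEO name", "Founding year (YYYY format)"],
    # 3. Processor: trade off speed vs. thoroughness (assumed value)
    processor="base",
    # 4. Results come back as structured columns, optionally with citations
)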

Choosing an integration

Integration | Best for | Processing model
Spark | Large-scale distributed processing | UDF with concurrent processing per partition
BigQuery | Google Cloud data warehouses | Remote function with batched API calls
Snowflake | Snowflake data warehouses | Batched UDTF (partition-based)
DuckDB | Local analytics, embedded databases | Batch processing (recommended) or SQL UDF
Polars | Python DataFrame workflows | Batch processing
Supabase | PostgreSQL/Supabase applications | Edge Function

Installation

All Python-based integrations are available via the parallel-web-tools package:
# Install with a specific integration (quotes keep shells like zsh
# from expanding the brackets)
pip install 'parallel-web-tools[polars]'
pip install 'parallel-web-tools[duckdb]'
pip install 'parallel-web-tools[spark]'

# Install with all integrations
pip install 'parallel-web-tools[all]'
For BigQuery and Snowflake, additional deployment steps are required to set up cloud functions and permissions. See the individual integration guides for details.

Common patterns

Input column mapping

All integrations use the same input mapping format—a dictionary where keys describe the data semantically and values reference your actual column names:
input_columns = {
    "company_name": "name",      # "name" is the column in your data
    "website": "domain",         # "domain" is the column in your data
    "headquarters": "location",  # "location" is the column in your data
}
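
Concretely, for a row where name is "Acme Corp", domain is "acme.com", and location is "Austin, TX", this mapping produces a research input along these lines (illustrative values, not the exact wire format):
{"company_name": "Acme Corp", "website": "acme.com", "headquarters": "Austin, TX"}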

Output column descriptions

Describe what you want to extract in plain language. Column names are automatically converted to valid identifiers:
output_columns = [
    "CEO name",                           # → ceo_name
    "Founding year (YYYY format)",        # → founding_year
    "Annual revenue (USD, most recent)",  # → annual_revenue
]
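
The conversion happens automatically, but as a rough mental model based on the examples above it behaves like this sketch (an approximation; the library's actual rule may differ in edge cases):
import re

def to_identifier(description: str) -> str:
    # Approximate mental model only: drop parenthetical qualifiers,
    # lowercase, and collapse non-alphanumerics into underscores.
    base = re.sub(r"\(.*?\)", "", description)
    base = re.sub(r"[^0-9a-z]+", "_", base.strip().lower())
    return base.strip("_")

assert to_identifier("CEO name") == "ceo_name"
assert to_identifier("Founding year (YYYY format)") == "founding_year"
assert to_identifier("Annual revenue (USD, most recent)") == "annual_revenue"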
