Parallel’s data integrations let you enrich datasets with web intelligence without leaving your existing data workflows. Whether you’re working with DataFrames in Python, SQL queries in a data warehouse, or analytics databases, there’s an integration that fits your stack.

How it works

All data integrations follow the same pattern (see the sketch after this list):
  1. Define inputs: Specify which columns contain the data to research (company name, website, etc.)
  2. Define outputs: Describe what information you want to extract (“CEO name”, “Founding year”, etc.)
  3. Choose a processor: Select speed vs thoroughness based on your needs
  4. Get enriched data: Receive structured results with optional citations
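
For example, here is the whole pattern with the Polars integration. This is a minimal sketch: the enrich_dataframe entry point, its parameter names, and the processor value are assumptions for illustration, not the exact API; see the Polars integration guide for the real signature.
import polars as pl

# Hypothetical entry point, shown only to illustrate the pattern.
from parallel_web_tools import enrich_dataframe

df = pl.DataFrame({
    "name": ["Acme Corp", "Globex"],
    "domain": ["acme.com", "globex.com"],
})

enriched = enrich_dataframe(
    df,
    # 1. Inputs: map semantic keys to your column names
    input_columns={"company_name": "name", "website": "domain"},
    # 2. Outputs: describe what to extract in plain language
    output_columns=["CEO name", "Founding year (YYYY format)"],
    # 3. Processor: trade off speed vs. thoroughness (assumed value)
    processor="base",
    # 4. Results come back as structured columns, optionally with citations
)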

Choosing an integration

Integration | Best for | Processing model
Spark | Large-scale distributed processing | UDF with concurrent processing per partition
BigQuery | Google Cloud data warehouses | Remote function with batched API calls
Snowflake | Snowflake data warehouses | Batched UDTF (partition-based)
DuckDB | Local analytics, embedded databases | Batch processing (recommended) or SQL UDF
Polars | Python DataFrame workflows | Batch processing
Supabase | PostgreSQL/Supabase applications | Edge Function

Installation

All Python-based integrations are available via the parallel-web-tools package:
# Install with a specific integration (quotes keep shells like zsh
# from expanding the brackets)
pip install 'parallel-web-tools[polars]'
pip install 'parallel-web-tools[duckdb]'
pip install 'parallel-web-tools[spark]'

# Install with all integrations
pip install 'parallel-web-tools[all]'
For BigQuery and Snowflake, additional deployment steps are required to set up cloud functions and permissions. See the individual integration guides for details.

Common patterns

Input column mapping

All integrations use the same input mapping format—a dictionary where keys describe the data semantically and values reference your actual column names:
input_columns = {
    "company_name": "name",      # "name" is the column in your data
    "website": "domain",         # "domain" is the column in your data
    "headquarters": "location",  # "location" is the column in your data
}
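
Concretely, for a row where name is "Acme Corp", domain is "acme.com", and location is "Austin, TX", this mapping produces a research input along these lines (illustrative values, not the exact wire format):
{"company_name": "Acme Corp", "website": "acme.com", "headquarters": "Austin, TX"}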

Output column descriptions

Describe what you want to extract in plain language. Column names are automatically converted to valid identifiers:
output_columns = [
    "CEO name",                           # → ceo_name
    "Founding year (YYYY format)",        # → founding_year
    "Annual revenue (USD, most recent)",  # → annual_revenue
]
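
The conversion happens automatically, but as a rough mental model based on the examples above it behaves like this sketch (an approximation; the library's actual rule may differ in edge cases):
import re

def to_identifier(description: str) -> str:
    # Approximate mental model only: drop parenthetical qualifiers,
    # lowercase, and collapse non-alphanumerics into underscores.
    base = re.sub(r"\(.*?\)", "", description)
    base = re.sub(r"[^0-9a-z]+", "_", base.strip().lower())
    return base.strip("_")

assert to_identifier("CEO name") == "ceo_name"
assert to_identifier("Founding year (YYYY format)") == "founding_year"
assert to_identifier("Annual revenue (USD, most recent)") == "annual_revenue"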
