🔄 Sync HubSpot Company Records to S3 Using AWS Lambda and Step Functions

In this guide, I will show you how to build an AWS Lambda project that fetches HubSpot company records and saves them to an S3 bucket. When one invocation cannot finish the job, Step Functions calls the function again from where it stopped. This helps avoid Lambda timeout problems.

✅ What This Lambda Function Does

  • Reads OFFSET_MINUTES from environment variables and uses it to compute the last-modified cutoff
  • Gets records in batches (for example, 100 at a time)
  • Stops before hitting the Lambda timeout limit
  • Saves each batch to S3
  • Uses pagination (HubSpot's after cursor, called offset in the code) to get more data
  • Can continue in a new invocation using Step Functions

🗂️ Project Structure

lambda_hubspot_sync/
│
├── handler.py
├── requirements.txt
└── utils.py

🔧 handler.py

This is the main Lambda function.

import os
import time
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

from utils import get_company_records, write_to_s3

BATCH_SIZE = 100
TIME_LIMIT = 840  # Stop at 14 mins (Lambda max is 15 mins)
BUCKET_NAME = os.environ['BUCKET_NAME']

HUBSPOT_TZ = ZoneInfo("UTC")  # Default: UTC+00:00

# How far back to look, in minutes
OFFSET_MINS = int(os.environ['OFFSET_MINUTES'])


def get_last_modified():
    # Time OFFSET_MINS minutes ago in ISO 8601 format (UTC)
    minutes_ago = datetime.now(HUBSPOT_TZ) - timedelta(minutes=OFFSET_MINS)
    return minutes_ago.isoformat()


def lambda_handler(event, context):
    start_time = time.time()
    offset = event.get("offset", 0)
    # Compute the cutoff on every invocation; a module-level value would go
    # stale when Lambda reuses a warm container
    last_modified = get_last_modified()

    while True:
        records, next_offset = get_company_records(last_modified, BATCH_SIZE, offset)

        if not records:
            print("No more records.")
            break

        filename = f"hubspot_companies_batch_{offset}.json"
        write_to_s3(records, BUCKET_NAME, filename)

        if not next_offset:
            break

        offset = next_offset

        # Hand over to the next invocation before Lambda kills this one
        if time.time() - start_time > TIME_LIMIT:
            print("Reached safe time limit.")
            return {
                "status": "incomplete",
                "next_offset": offset
            }

    return {
        "status": "complete"
    }
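
The fixed TIME_LIMIT above only works if the function timeout is set to the full 15 minutes. As a variation, here is a minimal sketch that asks Lambda how much time is left instead. In a real handler you would pass context.get_remaining_time_in_millis as remaining_ms. The names process_batches, fetch_batch, save_batch, and SAFETY_MARGIN_MS are my own placeholders, and the one-minute margin is an assumption.

```python
SAFETY_MARGIN_MS = 60_000  # assumed margin: stop with one minute to spare


def process_batches(fetch_batch, save_batch, remaining_ms, offset=0):
    """Fetch and save batches until the data or the time budget runs out.

    fetch_batch(offset) returns (records, next_offset).
    save_batch(records, offset) stores one batch.
    remaining_ms() returns the milliseconds left in this invocation.
    """
    while True:
        records, next_offset = fetch_batch(offset)
        if not records:
            return {"status": "complete"}

        save_batch(records, offset)
        if not next_offset:
            return {"status": "complete"}

        offset = next_offset
        # Stop early and report where the next invocation should resume
        if remaining_ms() < SAFETY_MARGIN_MS:
            return {"status": "incomplete", "next_offset": offset}
```

Because the fetch and save steps are passed in, the loop can be tested locally with fake functions before it ever touches HubSpot or S3.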

🔧 utils.py

This handles API calls and S3 uploads.

import json
import os

import boto3
import requests

HUBSPOT_API_KEY = os.environ['HUBSPOT_API_KEY']
S3 = boto3.client("s3")

def get_company_records(last_modified, limit, offset):
    url = "https://api.hubapi.com/crm/v3/objects/companies/search"
    headers = {
        "Authorization": f"Bearer {HUBSPOT_API_KEY}",
        "Content-Type": "application/json"
    }

    payload = {
        "filterGroups": [{
            "filters": [{
                # HubSpot's search docs name hs_lastmodifieddate for objects
                # other than contacts (contacts use lastmodifieddate)
                "propertyName": "hs_lastmodifieddate",
                "operator": "GTE",
                "value": last_modified
            }]
        }],
        "limit": limit,
        "after": offset  # paging cursor returned by the previous page
    }

    resp = requests.post(url, headers=headers, json=payload, timeout=30)
    resp.raise_for_status()  # fail loudly on 4xx/5xx instead of reading an error body
    data = resp.json()

    companies = data.get("results", [])
    # No "paging" key means this was the last page
    next_offset = data.get("paging", {}).get("next", {}).get("after")

    return companies, next_offset

def write_to_s3(data, bucket, filename):
    S3.put_object(
        Bucket=bucket,
        Key=filename,
        Body=json.dumps(data, indent=2).encode("utf-8")
    )
    print(f"Wrote batch to {filename}")
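
HubSpot answers with HTTP 429 when you send requests too fast, and a loop that pulls page after page can hit that. Here is a minimal sketch of a retry wrapper with exponential backoff. The name call_with_backoff, the attempt count, and the delays are my own assumptions, not HubSpot recommendations. It works with any callable that returns an object with a status_code attribute, such as a requests.Response.

```python
import time


def call_with_backoff(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out."""
    for attempt in range(max_attempts):
        resp = send()
        if resp.status_code != 429:
            return resp
        # Wait 1s, 2s, 4s, ... before trying again (no wait after the last try)
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return resp


# Possible use inside get_company_records (not executed here):
# resp = call_with_backoff(
#     lambda: requests.post(url, headers=headers, json=payload, timeout=30)
# )
```

The sleep function is a parameter only so that tests can run without really waiting.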

📦 requirements.txt

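These are the Python dependencies. boto3 is already included in the Lambda Python runtime, but requests is not, so it must be packaged with your function.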

boto3
requests

🧪 Lambda Environment Variables

Key              Example Value
BUCKET_NAME      your-s3-bucket-name
HUBSPOT_API_KEY  your-hubspot-app-token
OFFSET_MINUTES   10
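
handler.py passes OFFSET_MINUTES to int(), so the value must be a whole number of minutes, not a timestamp. Here is a minimal sketch of a startup check that fails with a clear message when a variable is missing or malformed. The function name load_config and the returned dictionary keys are my own placeholders.

```python
import os


def load_config(env=os.environ):
    """Read the three settings and raise early if any of them is unusable."""
    missing = [k for k in ("BUCKET_NAME", "HUBSPOT_API_KEY", "OFFSET_MINUTES")
               if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

    raw = env["OFFSET_MINUTES"]
    try:
        offset_minutes = int(raw)
    except ValueError:
        raise RuntimeError(
            f"OFFSET_MINUTES must be a whole number of minutes, got {raw!r}"
        ) from None
    if offset_minutes <= 0:
        raise RuntimeError("OFFSET_MINUTES must be greater than zero")

    return {
        "bucket_name": env["BUCKET_NAME"],
        "hubspot_api_key": env["HUBSPOT_API_KEY"],
        "offset_minutes": offset_minutes,
    }
```

Passing a plain dictionary as env makes the check easy to try locally without touching the real environment.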

🔁 Use Step Functions to Continue the Process

Sometimes one Lambda invocation is not enough. We use AWS Step Functions to call the function again, passing the next_offset it returned as the new offset.

🗺️ Step Function Workflow

{
  "Comment": "State Machine to process HubSpot records in batches",
  "StartAt": "Get HubSpot Data",
  "States": {
    "Get HubSpot Data": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:your-region:your-account-id:function:YourLambdaFunctionName",
      "Parameters": {
        "offset.$": "$.offset"
      },
      "ResultPath": "$.lambdaResult",
      "Next": "Check If More Data"
    },
    "Check If More Data": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.lambdaResult.status",
          "StringEquals": "complete",
          "Next": "End"
        },
        {
          "Variable": "$.lambdaResult.status",
          "StringEquals": "incomplete",
          "Next": "Set Next Offset"
        }
      ]
    },
    "Set Next Offset": {
      "Type": "Pass",
      "Comment": "Copy next_offset into offset so the next call resumes instead of repeating",
      "Parameters": {
        "offset.$": "$.lambdaResult.next_offset"
      },
      "Next": "Get HubSpot Data"
    },
    "End": {
      "Type": "Succeed"
    }
  }
}
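
Before deploying, it can help to try the same loop locally. Here is a minimal sketch that calls a handler again and again, feeding next_offset back in as offset, which is what the workflow is meant to do. The function name run_until_complete and the max_rounds safety cap are my own additions.

```python
def run_until_complete(handler, first_offset=0, max_rounds=50):
    """Call handler repeatedly until it reports status "complete".

    Returns the number of invocations that were needed.
    """
    event = {"offset": first_offset}
    for round_number in range(1, max_rounds + 1):
        result = handler(event, None)  # context is not used by this loop
        if result["status"] == "complete":
            return round_number
        # Same hand-over the state machine performs between invocations
        event = {"offset": result["next_offset"]}
    raise RuntimeError(f"Still incomplete after {max_rounds} rounds")
```

If a handler keeps returning the same next_offset, this loop hits the cap and raises, which is a cheap way to catch an endless loop before it costs you state transitions.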

🧭 How to Set Up

  1. Go to the AWS Step Functions console
  2. Create a state machine and paste the JSON
  3. Give the state machine's role permission to invoke the Lambda function
  4. Start an execution with the input {"offset": 0}. Done!
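
You can also start an execution from code instead of the console. Here is a minimal sketch using the boto3 Step Functions client and its start_execution call. The function name start_sync and the execution naming scheme are my own placeholders, and the ARN in the usage comment is a dummy value you would replace.

```python
import json
import time


def start_sync(sfn_client, state_machine_arn, offset=0):
    """Start one execution with the input the first state reads from $.offset."""
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        name=f"hubspot-sync-{int(time.time())}",  # hypothetical naming scheme
        input=json.dumps({"offset": offset}),
    )
    return response["executionArn"]


# Possible use (needs AWS credentials, not executed here):
# import boto3
# arn = "arn:aws:states:your-region:your-account-id:stateMachine:YourStateMachineName"
# print(start_sync(boto3.client("stepfunctions"), arn))
```

The client is passed in as an argument so the function can be tried with a fake client first.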

⏰ Trigger Step Function with CloudWatch

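You can trigger the state machine every 5 minutes with a CloudWatch Events (now Amazon EventBridge) schedule rule. Set the rule's input to {"offset": 0} so the first state can read $.offset, and use this cron expression: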

cron(0/5 * * * ? *)

You can also start it from an event-based rule instead of a schedule.
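
The schedule and OFFSET_MINUTES have to fit together. Each run only asks for records modified in the last OFFSET_MINUTES, so if that window is shorter than the time between runs, some changes are never picked up. Here is a minimal sketch of that arithmetic. The function names are my own, and it ignores the time a run itself takes.

```python
from datetime import datetime, timedelta, timezone


def lookback_start(run_time, offset_minutes):
    """Earliest modification time a run started at run_time will ask for."""
    return run_time - timedelta(minutes=offset_minutes)


def uncovered_gap(schedule_minutes, offset_minutes):
    """Minutes of changes that fall between two consecutive runs' windows."""
    return max(0, schedule_minutes - offset_minutes)


# Example values: a run at 00:10 UTC with a 10-minute lookback
run = datetime(2024, 1, 1, 0, 10, tzinfo=timezone.utc)
window_start = lookback_start(run, 10)
```

With the 5-minute cron above, an OFFSET_MINUTES of 5 or more leaves no gap. A larger value makes consecutive runs overlap, so the same record can be written more than once.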

🔐 IAM Permissions

Make sure you have these permissions:

Step Functions Role

{
  "Effect": "Allow",
  "Action": "lambda:InvokeFunction",
  "Resource": "arn:aws:lambda:your-region:your-account-id:function:YourLambdaFunctionName"
}

Lambda Role

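{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::your-s3-bucket-name/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}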

✅ Final Thoughts

This solution helps you:

  • Sync data from HubSpot in safe batches
  • Avoid timeout issues
  • Continue processing using Step Functions
  • Automate using CloudWatch rules

Let me know if you want to add error handling, custom filtering, or push this to a CI/CD pipeline next!
