🔄 Sync HubSpot Company Records to S3 Using AWS Lambda and Step Functions

In this guide, I will show you how to build an AWS Lambda project that fetches HubSpot company records and saves them to an S3 bucket. A single Lambda invocation can run for at most 15 minutes, so we use Step Functions to invoke the function again until every batch has been processed.

✅ What This Lambda Function Does

Computes a lastmodifieddate cutoff from the OFFSET_MINUTES environment variable
Gets records in batches (for example, 100 at a time)
Stops before hitting the Lambda timeout limit
Saves each batch to S3
Uses pagination (the `after` cursor, called offset here) to get more data
Can continue using Step Functions

🗂️ Project Structure

lambda_hubspot_sync/
│
├── handler.py
├── requirements.txt
└── utils.py

🔧 `handler.py`

This is the main Lambda function.

import os
import time
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo
from utils import get_company_records, write_to_s3

BATCH_SIZE = 100
TIME_LIMIT = 840  # Stop at 14 mins (Lambda max is 15 mins)
BUCKET_NAME = os.environ['BUCKET_NAME']

HUBSPOT_TZ = ZoneInfo("UTC")  # Work in UTC (UTC+00:00)

# How far back to look, in minutes (must be an integer, e.g. "60")
OFFSET_MINUTES = int(os.environ['OFFSET_MINUTES'])


def get_last_modified():
    """Return the cutoff (now minus OFFSET_MINUTES) as an ISO 8601 UTC string."""
    minutes_ago = datetime.now(HUBSPOT_TZ) - timedelta(minutes=OFFSET_MINUTES)
    return minutes_ago.isoformat()


def lambda_handler(event, context):
    start_time = time.time()
    offset = event.get("offset", 0)
    # Computed per invocation so a warm container does not reuse a stale cutoff
    last_modified = get_last_modified()

    while True:
        records, next_offset = get_company_records(last_modified, BATCH_SIZE, offset)

        if not records:
            print("No more records.")
            break

        filename = f"hubspot_companies_batch_{offset}.json"
        write_to_s3(records, BUCKET_NAME, filename)

        if not next_offset:
            break

        offset = next_offset

        if time.time() - start_time > TIME_LIMIT:
            print("Reached safe time limit.")
            return {
                "status": "incomplete",
                "next_offset": offset
            }

    return {
        "status": "complete"
    }
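
The handler above tracks elapsed time against a fixed `TIME_LIMIT`. A possible alternative, sketched here, is to ask the Lambda context object how much time is left, which keeps working if the function's configured timeout changes. `get_remaining_time_in_millis()` is part of the Lambda Python context object; `should_stop`, `SAFETY_MARGIN_MS`, and `FakeContext` are hypothetical names used only for this illustration.

```python
import time

SAFETY_MARGIN_MS = 60_000  # assumed margin: stop when under 60 s remain


def should_stop(context, margin_ms=SAFETY_MARGIN_MS):
    """Return True when too little time remains to start another batch."""
    return context.get_remaining_time_in_millis() < margin_ms


class FakeContext:
    """Local stand-in for the Lambda context object (testing only)."""

    def __init__(self, seconds_left):
        self._deadline = time.monotonic() + seconds_left

    def get_remaining_time_in_millis(self):
        remaining = max(0.0, self._deadline - time.monotonic())
        return int(remaining * 1000)
```

Inside the loop, `if should_stop(context):` could then replace the `time.time() - start_time > TIME_LIMIT` check.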

🔧 `utils.py`

This handles API calls and S3 uploads.

import os
import json
import boto3
import requests

HUBSPOT_API_KEY = os.environ['HUBSPOT_API_KEY']
S3 = boto3.client("s3")

def get_company_records(last_modified, limit, offset):
    url = "https://api.hubapi.com/crm/v3/objects/companies/search"
    headers = {
        "Authorization": f"Bearer {HUBSPOT_API_KEY}",
        "Content-Type": "application/json"
    }

    payload = {
        "filterGroups": [{
            "filters": [{
                "propertyName": "lastmodifieddate",
                "operator": "GTE",
                "value": last_modified
            }]
        }],
        "limit": limit,
        "after": offset
    }

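    # Fail fast on network stalls and surface HTTP errors (e.g. 401, 429)
    resp = requests.post(url, headers=headers, json=payload, timeout=30)
    resp.raise_for_status()
    data = resp.json()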

    companies = data.get("results", [])
    next_offset = data.get("paging", {}).get("next", {}).get("after")

    return companies, next_offset

def write_to_s3(data, bucket, filename):
    S3.put_object(
        Bucket=bucket,
        Key=filename,
        Body=json.dumps(data, indent=2).encode("utf-8")
    )
    print(f"Wrote batch to {filename}")
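
APIs such as HubSpot's can answer with 429 (Too Many Requests) or a 5xx error when busy. Below is a minimal retry sketch with exponential backoff, assuming those responses are safe to retry. `call_with_retry` and its parameters are hypothetical helpers, not part of `requests` or of the code above; `send` is any zero-argument callable that returns an object with a `status_code` attribute.

```python
import time


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    Retries on 429 and 5xx, waiting base_delay, 2*base_delay, 4*base_delay, ...
    Returns the last response either way, so the caller can still inspect it.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        resp = send()
        retryable = resp.status_code == 429 or resp.status_code >= 500
        if not retryable or attempt == max_attempts:
            return resp
        sleep(base_delay * 2 ** (attempt - 1))
```

In `get_company_records`, the request could be wrapped as `call_with_retry(lambda: requests.post(url, headers=headers, json=payload, timeout=30))`. Keep the total wait well below the Lambda time limit.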

📦 `requirements.txt`

These are the Python packages the function depends on.

boto3
requests

🧪 Lambda Environment Variables

| Key | Example Value |
| --- | --- |
| `BUCKET_NAME` | `your-s3-bucket-name` |
| `HUBSPOT_API_KEY` | `your-hubspot-app-token` |
| `OFFSET_MINUTES` | `60` |
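
Because `handler.py` passes `OFFSET_MINUTES` to `int()`, the value must be a whole number of minutes, not a timestamp. The sketch below shows one way to validate all three variables up front and fail with a readable message. `load_config` is a hypothetical helper, not something the code above already defines.

```python
import os

REQUIRED_VARS = ("BUCKET_NAME", "HUBSPOT_API_KEY", "OFFSET_MINUTES")


def load_config(env=None):
    """Read and validate the Lambda environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    try:
        offset_minutes = int(env["OFFSET_MINUTES"])
    except ValueError:
        raise ValueError(
            f"OFFSET_MINUTES must be a whole number, got {env['OFFSET_MINUTES']!r}"
        ) from None
    if offset_minutes <= 0:
        raise ValueError("OFFSET_MINUTES must be greater than zero")
    return {
        "bucket_name": env["BUCKET_NAME"],
        "hubspot_api_key": env["HUBSPOT_API_KEY"],
        "offset_minutes": offset_minutes,
    }
```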

🔁 Use Step Functions to Continue the Process

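Sometimes one Lambda invocation is not enough to fetch everything. When the handler returns "incomplete", AWS Step Functions invokes it again, starting from the offset it returned. Start each execution with the input {"offset": 0}, because the state machine reads $.offset from its input.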

🗺️ Step Function Workflow

{
  "Comment": "State Machine to process HubSpot records in batches",
  "StartAt": "Get HubSpot Data",
  "States": {
    "Get HubSpot Data": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:your-region:your-account-id:function:YourLambdaFunctionName",
      "Parameters": {
        "offset.$": "$.offset"
      },
      "ResultPath": "$.lambdaResult",
      "Next": "Check If More Data"
    },
    "Check If More Data": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.lambdaResult.status",
          "StringEquals": "complete",
          "Next": "End"
        },
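        {
          "Variable": "$.lambdaResult.status",
          "StringEquals": "incomplete",
          "Next": "Set Next Offset"
        }
      ]
    },
    "Set Next Offset": {
      "Type": "Pass",
      "Parameters": {
        "offset.$": "$.lambdaResult.next_offset"
      },
      "Next": "Get HubSpot Data"
    },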
    "End": {
      "Type": "Succeed"
    }
  }
}
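
The loop this workflow is meant to perform can be sketched locally in Python, which is handy for testing the handler without deploying anything. The key point is that the handler's `next_offset` must be fed back in as `offset` on the next call. `run_until_complete` and `max_iterations` are hypothetical names for this illustration.

```python
def run_until_complete(handler, initial_offset=0, max_iterations=1000):
    """Call handler repeatedly, as the state machine would, until it completes.

    Returns the number of invocations that were needed.
    """
    event = {"offset": initial_offset}
    for iteration in range(1, max_iterations + 1):
        result = handler(event, None)  # context is unused in this sketch
        if result["status"] == "complete":
            return iteration
        # Carry the returned cursor into the next invocation
        event = {"offset": result["next_offset"]}
    raise RuntimeError("handler did not complete within max_iterations")
```

Passing `lambda_handler` from `handler.py` as `handler` would run the real sync, so for a dry run a stub handler is the safer choice.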

🧭 How to Set Up

Go to the AWS Step Functions console
Create a state machine and paste the JSON above
Give the state machine's role permission to invoke the Lambda function
Start an execution with the input {"offset": 0}
Done!

⏰ Trigger Step Function with CloudWatch

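You can trigger the process every 5 minutes with a scheduled rule in Amazon EventBridge (formerly CloudWatch Events). Set OFFSET_MINUTES to at least the schedule interval so that no changes are missed between runs. The rule uses this cron expression: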

cron(0/5 * * * ? *)

Or trigger using events like log messages.
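
If you prefer to create the schedule from code, the sketch below builds the arguments for the boto3 EventBridge calls `put_rule` and `put_targets`. `build_schedule`, the rule name, the target `Id`, and the ARNs are placeholders chosen for this example. The target passes `{"offset": 0}` as input so that each execution starts from the first page.

```python
import json


def build_schedule(rule_name, state_machine_arn, role_arn, every_minutes=5):
    """Return (put_rule kwargs, put_targets kwargs) for a scheduled trigger."""
    rule = {
        "Name": rule_name,
        "ScheduleExpression": f"cron(0/{every_minutes} * * * ? *)",
        "State": "ENABLED",
    }
    targets = {
        "Rule": rule_name,
        "Targets": [
            {
                "Id": "hubspot-sync",  # placeholder target id
                "Arn": state_machine_arn,
                "RoleArn": role_arn,  # role EventBridge assumes to start executions
                "Input": json.dumps({"offset": 0}),
            }
        ],
    }
    return rule, targets


# Usage (requires AWS credentials, so it is left commented out):
# import boto3
# events = boto3.client("events")
# rule, targets = build_schedule("hubspot-sync-schedule", "<state-machine-arn>", "<role-arn>")
# events.put_rule(**rule)
# events.put_targets(**targets)
```

The role passed as `RoleArn` would need `states:StartExecution` on the state machine.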

🔐 IAM Permissions

Make sure you have these permissions:

Step Functions Role

{
  "Effect": "Allow",
  "Action": "lambda:InvokeFunction",
  "Resource": "arn:aws:lambda:your-region:your-account-id:function:YourLambdaFunctionName"
}

Lambda Role

{
  "Effect": "Allow",
  "Action": [
    "s3:PutObject",
    "logs:CreateLogGroup",
    "logs:CreateLogStream",
    "logs:PutLogEvents"
  ],
  "Resource": "*"
}
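
A `Resource` of `*` is broader than the function needs. As a rough sketch, the helper below generates a Lambda role policy scoped to one bucket and one log group, following the usual layout of the basic Lambda execution policy. `lambda_role_policy` and its arguments are hypothetical, and the function only needs S3 and CloudWatch Logs access because it never invokes another Lambda.

```python
import json


def lambda_role_policy(bucket, region, account_id, function_name):
    """Build a least-privilege policy document for the sync function's role."""
    logs_prefix = f"arn:aws:logs:{region}:{account_id}"
    log_group = f"{logs_prefix}:log-group:/aws/lambda/{function_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Effect": "Allow",
                "Action": "logs:CreateLogGroup",
                "Resource": f"{logs_prefix}:*",
            },
            {
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": f"{log_group}:*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own bucket, region, and account id
    policy = lambda_role_policy(
        "your-s3-bucket-name", "us-east-1", "123456789012", "YourLambdaFunctionName"
    )
    print(json.dumps(policy, indent=2))
```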

✅ Final Thoughts

This solution helps you:

Sync data from HubSpot in safe batches
Avoid timeout issues
Continue processing using Step Functions
Automate using CloudWatch rules

Let me know if you want to add error handling, custom filtering, or push this to a CI/CD pipeline next!