Developer guide
Shopify Bulk Operations in Rails
A Rails implementation guide for Shopify Bulk Operations covering job orchestration, JSONL downloads, polling versus webhooks, and the service boundaries that make large syncs maintainable.
Why Bulk Operations belong in the backend
Shopify Bulk Operations are not a clever frontend optimization. They are a durable backend workflow. That distinction matters because the moment you stop treating them like “just another API call”, your architecture gets better and your on-call future self becomes slightly less cursed.
Shopify explicitly recommends offline access tokens for bulk query work because online tokens expire after 24 hours, while bulk work can continue far longer. Shopify also states that direct API access from admin UI extensions does not support bulk operations. In practice, that pushes the responsibility to your backend, which is exactly where it belongs anyway.
“Subscribing to the webhook topic is recommended over polling.”
The practical rule
Bulk Operations are a state machine, not a request. Model them like durable work and Rails becomes a great fit.
The backend responsibilities are the real story:
- store the offline token and API version per shop
- submit and track operation IDs
- handle retries and partial failure
- download result files before signed URLs expire
- stream JSONL safely instead of loading it all into memory
- resume processing from checkpoints after crashes or deploys
- write metrics and operator-visible status
None of that belongs in a controller action that hopes for the best. Hope is not a queueing strategy. It is barely a strategy for ordering lunch.
When Bulk Operations beat normal pagination in Rails
Bulk queries are not always the right tool. They are the right tool when the data volume or operational shape makes synchronous pagination painful.
| Approach | Best for | Main advantage | Main caution |
|---|---|---|---|
| Normal GraphQL query with cursors | interactive screens, small exports, short jobs | simple control flow and better immediate error feedback | slow or fragile for very large datasets |
| Bulk query | catalog syncs, audit jobs, historical backfills, full reindexing | Shopify executes the heavy read asynchronously and returns JSONL | you must build tracking, download, and processing pipeline code |
| Bulk mutation | large imports or repeated write operations | far fewer client-side throttling concerns | line order is not a safe dependency and input size has limits |
As a rule of thumb, Bulk Operations become attractive when one or more of these are true:
- you need a full-shop sync, not just “the next page”
- the job is okay being asynchronous
- you need better operational durability than a long loop of paginated requests
- the output is naturally processed in batches or streams
- the work belongs in a background job and not in a request-response cycle
Shopify’s bulk query model also has structural rules. A bulk query must include at least one connection field. Shopify describes bulk query execution as a single top-level field query, supports up to five connections, and allows nesting to a maximum depth of two. That matters because you should design queries specifically for bulk export, not blindly copy whatever your admin screen already asks for.
Another important nuance: bulk execution itself is not charged the same way as the equivalent synchronous query body. You still pay the normal cost for the mutation or status request you send, but Shopify runs the heavy bulk execution asynchronously on their side. That is why Bulk Operations feel weirdly cheap to start and weirdly expensive to ignore once the JSONL truck arrives at your backend.
A Rails bulk-sync model that survives reality
The clean implementation is to promote bulk work to a first-class model in your database.
Do not hide everything inside one giant service object called SyncEverythingService.
That file name is usually a cry for help.
A durable record gives you:
- a stable place to store the Shopify bulk operation ID
- explicit states and timestamps
- checkpoint metadata
- operator-visible progress
- a resume point after worker crashes
- a clean boundary between orchestration and domain processing
# db/schema.rb excerpt
create_table :shopify_bulk_syncs do |t|
t.references :shop, null: false, foreign_key: true
t.string :kind, null: false # query | mutation
t.string :purpose, null: false # products_full_sync | orders_backfill | product_import
t.string :state, null: false # queued | submitting | running | downloading | processing | completed | failed | canceled
t.string :shopify_bulk_operation_id
t.string :api_version, null: false
t.text :query_text
t.text :mutation_text
t.string :staged_upload_path
t.text :result_url
t.text :partial_data_url
t.string :error_code
t.bigint :object_count, default: 0, null: false
t.bigint :root_object_count, default: 0, null: false
t.bigint :processed_lines_count, default: 0, null: false
t.bigint :last_processed_line_number, default: 0, null: false
t.jsonb :metadata, default: {}, null: false
t.datetime :submitted_at
t.datetime :started_at
t.datetime :completed_at
t.datetime :failed_at
t.timestamps
end
add_index :shopify_bulk_syncs, [:shop_id, :state]
add_index :shopify_bulk_syncs, :shopify_bulk_operation_id, unique: true

I like a state machine that looks roughly like this:
| State | What it means | Owner |
|---|---|---|
| queued | record exists, not yet submitted to Shopify | scheduler or controller |
| submitting | worker is creating the operation | submit job |
| running | Shopify accepted it and is executing | poller or webhook handler |
| downloading | result URL resolved, file download started | download job |
| processing | JSONL is being streamed into handlers | processor job |
| completed | all lines processed and post-work finished | finisher job |
| failed | Shopify failure or app-side processing failure | any stage |
| canceled | canceled by app or Shopify | operator or platform |
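That table can be encoded as a small transition map so each job refuses illegal jumps instead of clobbering state. A minimal sketch (module and method names are hypothetical, and the failed → processing edge covers the partial-data salvage case discussed later):

```ruby
# Allowed next states for a sync record; anything not listed is an illegal jump.
module ShopifyBulkOps
  module StateMachine
    TRANSITIONS = {
      "queued"      => ["submitting", "canceled"],
      "submitting"  => ["running", "failed"],
      "running"     => ["downloading", "failed", "canceled"],
      "downloading" => ["processing", "failed"],
      "processing"  => ["completed", "failed"],
      "failed"      => ["processing"], # salvaging partial data re-enters processing
      "completed"   => [],
      "canceled"    => []
    }.freeze

    def self.transition_allowed?(from, to)
      TRANSITIONS.fetch(from, []).include?(to)
    end
  end
end
```

Checking this guard inside each job, before every state write, turns silent state clobbering into a loud, debuggable failure.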
The important design move is separation of concerns:
- submitter creates the bulk operation
- tracker refreshes status from Shopify
- downloader resolves and fetches the signed file URL
- processor streams JSONL into domain-specific handlers
- finisher writes metrics and schedules follow-up work
That split looks verbose when the project is small. Then the first worker crash happens, the result URL expires, and suddenly verbosity starts looking suspiciously like good taste.
Running bulk queries from Rails
The query path is usually the first one teams implement. The good news is that Shopify makes the submission itself straightforward. The bad news is that the submission itself is not the hard part.
Start with a thin GraphQL client wrapper that knows the shop domain, access token, and API version. Keep it boring. Boring infrastructure is underrated.
# app/services/shopify_admin_graphql/client.rb
require "net/http"
require "json"

module ShopifyAdminGraphql
class Client
def initialize(shop:)
@shop = shop
end
def post(query:, variables: {})
uri = URI("https://#{@shop.shopify_domain}/admin/api/#{@shop.admin_api_version}/graphql.json")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
request = Net::HTTP::Post.new(uri.request_uri)
request["Content-Type"] = "application/json"
request["X-Shopify-Access-Token"] = @shop.offline_access_token
request.body = JSON.dump({ query: query, variables: variables })
response = http.request(request)
# Check the status before parsing: Shopify error bodies are not always JSON.
raise "Shopify GraphQL HTTP #{response.code}: #{response.body}" unless response.is_a?(Net::HTTPSuccess)
JSON.parse(response.body)
end
end
end

Then wrap the actual bulk query submission in an application service that records the returned operation ID and handles Shopify user errors cleanly.
# app/services/shopify_bulk_ops/start_query.rb
module ShopifyBulkOps
class StartQuery
MUTATION = <<~GRAPHQL
mutation StartBulkQuery($query: String!) {
bulkOperationRunQuery(query: $query) {
bulkOperation {
id
status
}
userErrors {
field
message
}
}
}
GRAPHQL
def initialize(sync:)
@sync = sync
@shop = sync.shop
@client = ShopifyAdminGraphql::Client.new(shop: @shop)
end
def call
@sync.with_lock do
@sync.update!(state: "submitting", submitted_at: Time.current)
response = @client.post(
query: MUTATION,
variables: { query: @sync.query_text }
)
payload = response.dig("data", "bulkOperationRunQuery")
user_errors = payload.fetch("userErrors")
if user_errors.any?
@sync.update!(
state: "failed",
metadata: @sync.metadata.merge("submit_errors" => user_errors),
failed_at: Time.current
)
return
end
bulk_operation = payload.fetch("bulkOperation")
@sync.update!(
shopify_bulk_operation_id: bulk_operation.fetch("id"),
state: "running",
started_at: Time.current
)
end
end
end
end

A good query design rule is to make bulk queries purpose-built for downstream processing. Do not ask for everything merely because storage is cheap. Storage is cheap right up until you are parsing ten million lines you never needed.
Ask for stable IDs first. Then ask for fields your downstream handlers actually use. For nested data, remember that bulk output is flattened into JSONL and child objects are linked back to parents with __parentId.
PRODUCTS_BULK_QUERY = <<~GRAPHQL
{
products {
edges {
node {
id
title
updatedAt
vendor
status
variants {
edges {
node {
id
sku
price
updatedAt
}
}
}
}
}
}
}
GRAPHQL

Also, do one boring but important thing before you bulk anything: run a normal GraphQL query first. Shopify explicitly recommends this because bulk failures give worse feedback than synchronous query failures. That sounds dull until you save yourself an hour of debugging an ACCESS_DENIED caused by one field you forgot needed a scope.
Running bulk mutations from Rails
Bulk mutations are where teams go from “nice, this is powerful” to “why is there a staged upload in the middle of my otherwise respectable Tuesday?”
The flow is different from bulk queries:
- create a JSONL file where each line contains variables for one mutation execution
- call stagedUploadsCreate to get an upload target
- upload the JSONL file to the signed URL Shopify provides
- call bulkOperationRunMutation with your mutation string and staged upload path
- track completion and process the output JSONL
There are several non-obvious restrictions here. Shopify’s current docs say that bulk mutations can use any GraphQL Admin API mutation except the bulk-operation mutations themselves, the mutation is limited to one connection field, and the input JSONL file cannot exceed 100MB. Shopify also warns that the GraphQL Admin API does not serially process the JSONL lines, so line order is not a dependency you are allowed to romanticize.
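Those limits are cheap to enforce before touching the network. A pre-flight sketch (the method name and error messages are illustrative; 100MB is the cap Shopify documents):

```ruby
require "json"

MAX_BULK_INPUT_BYTES = 100 * 1024 * 1024 # Shopify's documented JSONL input cap

# Validates a bulk mutation input file: under the size cap, and every line
# parses as a standalone JSON object. Raises on the first violation.
def validate_bulk_mutation_input!(path)
  size = File.size(path)
  raise "JSONL input is #{size} bytes, over the 100MB cap" if size > MAX_BULK_INPUT_BYTES

  File.foreach(path).with_index(1) do |line, number|
    parsed = JSON.parse(line) # raises JSON::ParserError on malformed lines
    raise "line #{number} is not a JSON object" unless parsed.is_a?(Hash)
  end
  true
end
```

Running this in the job that builds the file means a bad input fails in seconds, not after a staged upload and a rejected bulk operation.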
# app/services/shopify_bulk_ops/start_mutation.rb
module ShopifyBulkOps
class StartMutation
STAGED_UPLOADS_CREATE = <<~GRAPHQL
mutation {
stagedUploadsCreate(input: [{
resource: BULK_MUTATION_VARIABLES,
filename: "bulk_op_vars.jsonl",
mimeType: "text/jsonl",
httpMethod: POST
}]) {
stagedTargets {
url
resourceUrl
parameters {
name
value
}
}
userErrors {
field
message
}
}
}
GRAPHQL
RUN_MUTATION = <<~GRAPHQL
mutation RunBulkMutation($mutation: String!, $stagedUploadPath: String!) {
bulkOperationRunMutation(
mutation: $mutation,
stagedUploadPath: $stagedUploadPath
) {
bulkOperation {
id
status
}
userErrors {
field
message
}
}
}
GRAPHQL
def initialize(sync:, jsonl_path:)
@sync = sync
@shop = sync.shop
@jsonl_path = jsonl_path
@client = ShopifyAdminGraphql::Client.new(shop: @shop)
end
def call
staged_target = create_staged_upload!
upload_jsonl!(staged_target)
staged_upload_path = extract_staged_upload_path(staged_target)
response = @client.post(
query: RUN_MUTATION,
variables: {
mutation: @sync.mutation_text,
stagedUploadPath: staged_upload_path
}
)
payload = response.dig("data", "bulkOperationRunMutation")
user_errors = payload.fetch("userErrors")
if user_errors.any?
@sync.update!(
state: "failed",
metadata: @sync.metadata.merge("submit_errors" => user_errors),
failed_at: Time.current
)
return
end
@sync.update!(
staged_upload_path: staged_upload_path,
shopify_bulk_operation_id: payload.dig("bulkOperation", "id"),
state: "running",
started_at: Time.current
)
end
private
def create_staged_upload!
response = @client.post(query: STAGED_UPLOADS_CREATE)
payload = response.dig("data", "stagedUploadsCreate")
raise "staged upload failed: #{payload['userErrors'].inspect}" if payload["userErrors"].any?
payload.fetch("stagedTargets").first
end
# Requires the multipart-post gem for UploadIO and Net::HTTP::Post::Multipart.
def upload_jsonl!(target)
uri = URI(target.fetch("url"))
form_data = target.fetch("parameters").map { |p| [p.fetch("name"), p.fetch("value")] }
file = UploadIO.new(File.open(@jsonl_path), "text/jsonl", File.basename(@jsonl_path))
request = Net::HTTP::Post::Multipart.new(uri.path, Hash[form_data].merge("file" => file))
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
response = http.request(request)
# Net::HTTPSuccess already covers 201 Created, which storage targets commonly return.
raise "JSONL upload failed: #{response.code} #{response.body}" unless response.is_a?(Net::HTTPSuccess)
end
# Shopify expects the staged upload path, not the full signed URL.
def extract_staged_upload_path(target)
URI(target.fetch("resourceUrl")).path.sub(%r{^/}, "")
end
end
end

Your JSONL generator should be deterministic and boring. One line, one mutation input, stable formatting, no magical side effects.
File.open(path, "w") do |file|
products.each do |product|
line = {
input: {
id: product.shopify_gid,
title: product.title,
vendor: product.vendor
}
}
file.puts(JSON.generate(line))
end
end

The critical mental model is this: a bulk mutation is not an ordered script. It is a bag of mutation inputs Shopify will process asynchronously. If one line depends on the result of another, split the work into phases. Do not rely on file order. Shopify already told you not to. Believe them. They wrote the thing.
How to process JSONL results safely
JSONL handling is where good Bulk Operation implementations separate themselves from “works on my laptop” demos. Shopify’s docs explicitly show the right instinct here: stream the file line by line. Do not slurp the entire thing into memory unless your goal is to benchmark how fast a worker can become a smoke signal.
Shopify’s bulk result format is JSON Lines. Each line is an independent JSON object. For nested connections, children are emitted as separate lines after their parents and linked through __parentId. That means you should think in terms of a stream processor, not a giant object-graph recreation pass, unless you truly need that.
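For example, a product with one variant arrives as two independent lines, and only __parentId ties them together (the GIDs and field values below are made up for illustration):

```ruby
require "json"

# Two lines as they might appear in a bulk result file. Note that the
# variant is not nested inside the product; it only points back to it.
jsonl = <<~LINES
  {"id":"gid://shopify/Product/111","title":"Mug","vendor":"Acme"}
  {"id":"gid://shopify/ProductVariant/222","sku":"MUG-1","__parentId":"gid://shopify/Product/111"}
LINES

product, variant = jsonl.each_line.map { |line| JSON.parse(line) }
variant["__parentId"] == product["id"] # the only parent-child link you get
```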
# app/services/shopify_bulk_ops/jsonl_processor.rb
module ShopifyBulkOps
class JsonlProcessor
def initialize(sync:, io:, handler:)
@sync = sync
@io = io
@handler = handler
end
def call
line_number = 0
@io.each_line do |line|
line_number += 1
next if line_number <= @sync.last_processed_line_number
payload = JSON.parse(line)
@handler.call(payload)
if (line_number % 1_000).zero?
@sync.update!(
processed_lines_count: line_number,
last_processed_line_number: line_number
)
end
end
@sync.update!(
processed_lines_count: line_number,
last_processed_line_number: line_number
)
end
end
end

The handler should be domain-specific, not generic. Generic processors sound elegant until every resource type needs special casing and your “generic” code turns into a museum of regrettable conditionals.
# app/services/shopify_bulk_ops/handlers/product_export_handler.rb
module ShopifyBulkOps
module Handlers
class ProductExportHandler
def call(payload)
case payload["id"]
when /\Agid:\/\/shopify\/Product\//
upsert_product(payload)
when /\Agid:\/\/shopify\/ProductVariant\//
upsert_variant(payload)
else
Rails.logger.info("Ignoring unsupported bulk payload: #{payload['id']}")
end
end
private
def upsert_product(payload)
Product.upsert(
{
shopify_gid: payload.fetch("id"),
title: payload["title"],
vendor: payload["vendor"],
status: payload["status"],
updated_at: Time.current
},
unique_by: :shopify_gid
)
end
def upsert_variant(payload)
Variant.upsert(
{
shopify_gid: payload.fetch("id"),
product_shopify_gid: payload["__parentId"],
sku: payload["sku"],
price: payload["price"],
updated_at: Time.current
},
unique_by: :shopify_gid
)
end
end
end
end

A few practical rules make JSONL processing reliable:
- upsert by stable identifiers such as Shopify GIDs
- checkpoint frequently enough that restart cost is acceptable
- keep each line handler idempotent
- avoid one giant transaction for the entire file
- track processing metrics separately from Shopify execution metrics
- handle partialDataUrl as a legitimate recovery path, not an afterthought
Also remember that Shopify’s signed result URLs expire after seven days. That is generous if your worker starts quickly and surprisingly short if a job sits forgotten behind a backlog, a deploy bug, and two meetings that should have been emails.
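Given that seven-day window, it is worth failing fast when a stale sync finally dequeues, instead of surfacing a confusing 403 from the storage provider mid-download. A sketch (the safety margin and method name are assumptions, not part of the article's codebase):

```ruby
RESULT_URL_TTL_SECONDS = 7 * 24 * 60 * 60 # signed result URLs last seven days

# True when a result URL minted at `completed_at` is still worth fetching.
# The margin keeps long downloads from racing the expiry.
def result_url_usable?(completed_at, now: Time.now, margin_seconds: 3600)
  (now - completed_at) < (RESULT_URL_TTL_SECONDS - margin_seconds)
end
```

A download job can check this guard first and move the sync straight to a failed state with a clear error, rather than retrying a dead URL.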
Polling versus webhook completion
Both polling and webhook-driven completion are valid. They are not equally good forever.
Polling is fine when:
- you are building the first version
- you have low volume
- you want a simpler control flow to prove the pipeline end to end
Webhooks are better when:
- you care about wasted API calls
- you run many bulk jobs
- you want fast, event-driven wakeups
- you want the tracker job to do less busy waiting
Shopify’s docs explicitly recommend subscribing to bulk_operations/finish instead of polling because it limits redundant API calls. The webhook fires when an operation completes, fails, or is canceled. One subtle but important detail: in the webhook payload, status and error_code are lowercase, so your enum mapping should not assume the uppercase values you see in GraphQL responses.
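A small normalizer at the webhook boundary keeps the rest of the app speaking one casing convention. A sketch (the method name is illustrative; the field names come from the bulk_operations/finish payload):

```ruby
# Webhook payloads report "completed" / "failed" / "canceled" in lowercase,
# while GraphQL status enums are uppercase. Normalize once, at the edge.
def normalize_finish_webhook(payload)
  {
    status: payload.fetch("status").to_s.upcase,
    error_code: payload["error_code"]&.to_s&.upcase # nil stays nil
  }
end
```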
# app/jobs/shopify_bulk_ops/poll_status_job.rb
module ShopifyBulkOps
class PollStatusJob < ApplicationJob
queue_as :integrations
QUERY = <<~GRAPHQL
query BulkOperationById($id: ID!) {
bulkOperation(id: $id) {
id
status
errorCode
objectCount
rootObjectCount
url
partialDataUrl
completedAt
}
}
GRAPHQL
def perform(sync_id)
sync = ShopifyBulkSync.find(sync_id)
return unless sync.state == "running"
client = ShopifyAdminGraphql::Client.new(shop: sync.shop)
response = client.post(query: QUERY, variables: { id: sync.shopify_bulk_operation_id })
op = response.dig("data", "bulkOperation")
raise "bulkOperation not found" if op.nil?
sync.update!(
object_count: op["objectCount"].to_i,
root_object_count: op["rootObjectCount"].to_i,
result_url: op["url"],
partial_data_url: op["partialDataUrl"],
error_code: op["errorCode"]
)
case op["status"]
when "CREATED", "RUNNING", "CANCELING"
self.class.set(wait: 1.minute).perform_later(sync.id)
when "COMPLETED"
sync.update!(state: "downloading", completed_at: Time.current)
ShopifyBulkOps::DownloadAndProcessJob.perform_later(sync.id)
when "FAILED"
sync.update!(state: "failed", failed_at: Time.current)
ShopifyBulkOps::DownloadAndProcessJob.perform_later(sync.id) if sync.partial_data_url.present?
when "CANCELED"
sync.update!(state: "canceled")
else
raise "Unhandled bulk status: #{op['status']}"
end
end
end
end

On API version 2026-01 and later, Shopify also supports up to five concurrent bulk query operations and up to five concurrent bulk mutation operations per shop. That means your scheduler should stop assuming “one active bulk job per shop”. Model concurrency explicitly. A simple way is to count currently running operations per shop and kind before submitting new work.
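That counting guard can stay simple. A sketch over plain hashes (in a real app this would be a scoped database count; the limit of five reflects the 2026-01 docs, and the method name is illustrative):

```ruby
MAX_ACTIVE_BULK_OPS = 5                        # per shop, per kind, on 2026-01+
ACTIVE_STATES = ["submitting", "running"].freeze

# syncs: the shop's bulk sync records, here as hashes with :kind and :state.
def can_submit_bulk_op?(syncs, kind:)
  active = syncs.count { |s| s[:kind] == kind && ACTIVE_STATES.include?(s[:state]) }
  active < MAX_ACTIVE_BULK_OPS
end
```

Counting "submitting" as active closes the race where two workers both decide there is room and both submit.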
Shopify also introduced bulkOperations for listing and filtering operations and bulkOperation(id:) for direct lookup, replacing the older currentBulkOperation pattern for modern API versions. In other words, your operator tooling can finally be a little less 2024.
Idempotency checkpoints and resume strategy
The hard production problem is not “can Rails start a bulk operation?” The hard problem is “can Rails survive a crash halfway through a 4.8 million line result file and resume without creating a small database crime scene?”
The answer is idempotency plus checkpoints.
1. Make line processing idempotent
Every processed line should be safe to replay. In practice that means upserts, dedupe keys, or domain rules that let you say “already handled, move on” without manual cleanup.
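One concrete shape for “already handled, move on” is a dedupe key per line. A toy sketch using an in-memory set where a real app would lean on a unique index or upsert (the class name is hypothetical):

```ruby
require "set"

# Replays are harmless: each payload's stable GID is the dedupe key, so
# reprocessing the same lines after a crash applies each effect exactly once.
class IdempotentApplier
  attr_reader :applied

  def initialize
    @seen = Set.new
    @applied = []
  end

  def apply(payload)
    return false unless @seen.add?(payload.fetch("id")) # Set#add? is nil on repeats
    @applied << payload
    true
  end
end
```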
2. Checkpoint often enough to be useful
Store at least the last processed line number and domain counters. Checkpoint every few hundred or thousand lines depending on handler cost. Too frequent and you add overhead. Too rare and restart pain becomes theatrical.
3. Separate Shopify execution state from app processing state
Shopify can say the operation is COMPLETED while your app is still very much
not completed. That is why the app-side state machine needs separate states like
downloading and processing.
4. Persist enough metadata to debug later
Keep the operation ID, query or mutation text, purpose, API version, error code, counts, and timestamps. This is boring until a merchant says “your sync missed products yesterday” and you would quite like to answer with something better than a haunted stare.
5. Treat partial results as first-class
Shopify exposes partialDataUrl when an operation fails but still produced incomplete data. That is not merely a curiosity. It is often enough to salvage useful work, especially for large exports where partial progress still has value.
# app/jobs/shopify_bulk_ops/download_and_process_job.rb
require "open-uri" # URI.open below needs the explicit require

module ShopifyBulkOps
class DownloadAndProcessJob < ApplicationJob
queue_as :integrations
def perform(sync_id)
sync = ShopifyBulkSync.find(sync_id)
sync.update!(state: "processing")
url = sync.result_url.presence || sync.partial_data_url
raise "No result URL available" if url.blank?
Tempfile.create(["shopify-bulk", ".jsonl"]) do |file|
URI.open(url) do |remote_io|
IO.copy_stream(remote_io, file)
end
file.rewind
handler = ShopifyBulkOps::Handlers::ProductExportHandler.new
processor = ShopifyBulkOps::JsonlProcessor.new(sync: sync, io: file, handler: handler)
processor.call
end
sync.update!(state: "completed", completed_at: Time.current)
rescue => e
# sync can be nil if the lookup itself raised; &. avoids masking that error
sync&.update!(
state: "failed",
metadata: sync.metadata.merge(
"processing_error" => {
"class" => e.class.name,
"message" => e.message
}
),
failed_at: Time.current
)
raise
end
end
end

The last detail people often miss is download timing. Because Shopify’s result URLs expire after seven days, queue starvation is now a data-loss risk. If your app lets bulk jobs sit around unprocessed for a week, that is no longer a bulk pipeline. That is decorative optimism.
Common mistakes that make bulk syncs miserable
Treating submission as the whole feature. Starting the operation is the easy 5 percent. Everything after that is the real system.
Using online tokens. Shopify recommends offline tokens for long-running bulk work for a reason.
Building one giant background job. When a single job does submit, monitor, download, parse, and apply, partial failure becomes annoying to recover from.
Reading the whole JSONL file into memory. Shopify’s own docs show the streaming pattern because whole-file reads scale badly.
Ignoring __parentId. Nested data is flattened. If your handler ignores the parent link, you will recreate an object graph badly and then resent the universe.

Assuming line order matters for bulk mutations. Shopify explicitly says the JSONL input is not serially processed.
Not handling partial results. Failed does not always mean useless.
Polling forever at a tiny interval. That is fine in dev. In production it becomes self-inflicted API noise.
Not upgrading operator tooling for 2026-01+. If your app still thinks one current operation tells the whole story, your visibility model is already behind.
Skipping a normal test query first. Synchronous GraphQL errors are usually easier to reason about than bulk failures.
The broad implementation pattern to standardize on is:
- store bulk work in a first-class model
- submit through a dedicated service
- track status with webhook-first, polling as fallback
- download signed results quickly
- stream JSONL into idempotent handlers
- checkpoint progress
- finish with metrics and follow-up jobs
That architecture is not fancy. It is better. Fancy is often just the stage before a future refactor with sad git history.
Sources and further reading
Shopify Dev: Perform bulk operations with the GraphQL Admin API
Shopify Dev: Bulk import data with the GraphQL Admin API
Shopify Dev: BulkOperation object reference
Shopify Dev: bulkOperations query reference
Shopify Dev: currentBulkOperation reference
Shopify Dev: bulk_operations/finish webhook payload reference
Shopify Dev: API limits
FAQ
When should I use Bulk Operations instead of pagination?
Use bulk queries when the dataset is large enough that normal cursor pagination becomes slow, fragile, or operationally expensive. If you only need a small slice of data for an interactive request, normal GraphQL queries are usually simpler.
Should a Rails app poll or use webhooks for completion?
Webhooks are the better default once the app is in production. Polling is fine for a first implementation, but webhook-driven completion reduces redundant API calls and gives cleaner job wakeups.
Can I run Bulk Operations directly from an admin UI extension?
No. Shopify’s direct API access for admin UI extensions does not support bulk operations, so the extension should call your backend and let Rails initiate and manage the work.
Related resources
Keep exploring the playbook
Shopify Admin GraphQL patterns in Rails
Production patterns for using the Shopify Admin GraphQL API from Rails, including service boundaries, pagination strategy, throttling, partial failure handling, and when to switch to bulk operations.
When to use Bulk Operations instead of pagination in Shopify apps
A decision guide for Shopify developers choosing between synchronous GraphQL pagination and Bulk Operations, with practical thresholds based on workload shape rather than generic 'large dataset' advice.
Calling a Rails API from a Shopify customer account extension
A practical guide to calling Rails endpoints from a Shopify Customer Account UI Extension, including session-token verification, endpoint design, and the requests that should not go through your backend at all.