
Shopify Bulk Operations in Rails

A Rails implementation guide for Shopify Bulk Operations covering job orchestration, JSONL downloads, polling versus webhooks, and the service boundaries that make large syncs maintainable.

Updated March 12, 2026
Editorial note: This guide assumes your Rails app already stores offline Admin API access tokens per shop and runs background jobs such as Sidekiq, Solid Queue, or GoodJob.

Why Bulk Operations belong in the backend

Shopify Bulk Operations are not a clever frontend optimization. They are a durable backend workflow. That distinction matters because the moment you stop treating them like “just another API call”, your architecture gets better and your on-call future self becomes slightly less cursed.

Shopify explicitly recommends offline access tokens for bulk query work because online tokens expire after 24 hours, while bulk work can continue far longer. Shopify also states that direct API access from admin UI extensions does not support bulk operations. In practice, that pushes the responsibility to your backend, which is exactly where it belongs anyway.

“Subscribing to the webhook topic is recommended over polling.”

The practical rule

Bulk Operations are a state machine, not a request. Model them like durable work and Rails becomes a great fit.

The backend responsibilities are the real story:

  • store the offline token and API version per shop
  • submit and track operation IDs
  • handle retries and partial failure
  • download result files before signed URLs expire
  • stream JSONL safely instead of loading it all into memory
  • resume processing from checkpoints after crashes or deploys
  • write metrics and operator-visible status

None of that belongs in a controller action that hopes for the best. Hope is not a queueing strategy. It is barely a strategy for ordering lunch.

When Bulk Operations beat normal pagination in Rails

Bulk queries are not always the right tool. They are the right tool when the data volume or operational shape makes synchronous pagination painful.

| Approach | Best for | Main advantage | Main caution |
| --- | --- | --- | --- |
| Normal GraphQL query with cursors | interactive screens, small exports, short jobs | simple control flow and better immediate error feedback | slow or fragile for very large datasets |
| Bulk query | catalog syncs, audit jobs, historical backfills, full reindexing | Shopify executes the heavy read asynchronously and returns JSONL | you must build tracking, download, and processing pipeline code |
| Bulk mutation | large imports or repeated write operations | far fewer client-side throttling concerns | line order is not a safe dependency and input size has limits |

As a rule of thumb, Bulk Operations become attractive when one or more of these are true:

  • you need a full-shop sync, not just “the next page”
  • the job is okay being asynchronous
  • you need better operational durability than a long loop of paginated requests
  • the output is naturally processed in batches or streams
  • the work belongs in a background job and not in a request-response cycle

Shopify’s bulk query model also has structural rules. A bulk query must include at least one connection field, is limited to a single top-level field, supports at most five connections in total, and allows connection nesting to a maximum depth of two. That matters because you should design queries specifically for bulk export, not blindly copy whatever your admin screen already asks for.

Another important nuance: bulk execution itself is not charged the same way as the equivalent synchronous query body. You still pay the normal cost for the mutation or status request you send, but Shopify runs the heavy bulk execution asynchronously on their side. That is why Bulk Operations feel weirdly cheap to start and weirdly expensive to ignore once the JSONL truck arrives at your backend.

A Rails bulk-sync model that survives reality

The clean implementation is to promote bulk work to a first-class model in your database. Do not hide everything inside one giant service object called SyncEverythingService. That file name is usually a cry for help.

A durable record gives you:

  • a stable place to store the Shopify bulk operation ID
  • explicit states and timestamps
  • checkpoint metadata
  • operator-visible progress
  • a resume point after worker crashes
  • a clean boundary between orchestration and domain processing
# db/schema.rb excerpt
create_table :shopify_bulk_syncs do |t|
  t.references :shop, null: false, foreign_key: true
 
  t.string  :kind, null: false              # query | mutation
  t.string  :purpose, null: false           # products_full_sync | orders_backfill | product_import
  t.string  :state, null: false             # queued | submitting | running | downloading | processing | completed | failed | canceled
 
  t.string  :shopify_bulk_operation_id
  t.string  :api_version, null: false
 
  t.text    :query_text
  t.text    :mutation_text
  t.string  :staged_upload_path
 
  t.text    :result_url
  t.text    :partial_data_url
  t.string  :error_code
 
  t.bigint  :object_count, default: 0, null: false
  t.bigint  :root_object_count, default: 0, null: false
  t.bigint  :processed_lines_count, default: 0, null: false
  t.bigint  :last_processed_line_number, default: 0, null: false
 
  t.jsonb   :metadata, default: {}, null: false
  t.datetime :submitted_at
  t.datetime :started_at
  t.datetime :completed_at
  t.datetime :failed_at
 
  t.timestamps
end
 
add_index :shopify_bulk_syncs, [:shop_id, :state]
add_index :shopify_bulk_syncs, :shopify_bulk_operation_id, unique: true

I like a state machine that looks roughly like this:

| State | What it means | Owner |
| --- | --- | --- |
| queued | record exists, not yet submitted to Shopify | scheduler or controller |
| submitting | worker is creating the operation | submit job |
| running | Shopify accepted it and is executing | poller or webhook handler |
| downloading | result URL resolved, file download started | download job |
| processing | JSONL is being streamed into handlers | processor job |
| completed | all lines processed and post-work finished | finisher job |
| failed | Shopify failure or app-side processing failure | any stage |
| canceled | canceled by app or Shopify | operator or platform |
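The states in this table can be sketched as an explicit transition map. This is a hypothetical plain-Ruby sketch (the `Transitions` module and `allowed?` method are assumptions, not part of the codebase above); in the Rails model it would back a guard such as `transition_to!` that refuses illegal moves instead of silently writing whatever string a job hands it.

```ruby
# Hypothetical transition map mirroring the states table above.
# Terminal states have no outgoing transitions.
module ShopifyBulkOps
  module Transitions
    MAP = {
      "queued"      => %w[submitting failed canceled],
      "submitting"  => %w[running failed canceled],
      "running"     => %w[downloading failed canceled],
      "downloading" => %w[processing failed],
      "processing"  => %w[completed failed],
      "completed"   => [],
      "failed"      => [],
      "canceled"    => []
    }.freeze

    # Returns true when moving from one state to another is legal.
    def self.allowed?(from, to)
      MAP.fetch(from, []).include?(to)
    end
  end
end
```

A model-level `transition_to!` built on this map turns a silent data bug into a loud exception the moment a job tries something like completed → running.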

The important design move is separation of concerns:

  • submitter creates the bulk operation
  • tracker refreshes status from Shopify
  • downloader resolves and fetches the signed file URL
  • processor streams JSONL into domain-specific handlers
  • finisher writes metrics and schedules follow-up work

That split looks verbose when the project is small. Then the first worker crash happens, the result URL expires, and suddenly verbosity starts looking suspiciously like good taste.

Running bulk queries from Rails

The query path is usually the first one teams implement. The good news is that Shopify makes the submission itself straightforward. The bad news is that the submission itself is not the hard part.

Start with a thin GraphQL client wrapper that knows the shop domain, access token, and API version. Keep it boring. Boring infrastructure is underrated.

# app/services/shopify_admin_graphql/client.rb
require "net/http"
require "json"

module ShopifyAdminGraphql
  class Client
    def initialize(shop:)
      @shop = shop
    end

    def post(query:, variables: {})
      uri = URI("https://#{@shop.shopify_domain}/admin/api/#{@shop.admin_api_version}/graphql.json")

      http = Net::HTTP.new(uri.host, uri.port)
      http.use_ssl = true

      request = Net::HTTP::Post.new(uri.request_uri)
      request["Content-Type"] = "application/json"
      request["X-Shopify-Access-Token"] = @shop.offline_access_token
      request.body = JSON.dump({ query: query, variables: variables })

      response = http.request(request)

      # Check the HTTP status before parsing: error bodies are not always JSON.
      raise "Shopify GraphQL HTTP #{response.code}: #{response.body}" unless response.is_a?(Net::HTTPSuccess)

      JSON.parse(response.body)
    end
  end
end

Then wrap the actual bulk query submission in an application service that records the returned operation ID and handles Shopify user errors cleanly.

# app/services/shopify_bulk_ops/start_query.rb
module ShopifyBulkOps
  class StartQuery
    MUTATION = <<~GRAPHQL
      mutation StartBulkQuery($query: String!) {
        bulkOperationRunQuery(query: $query) {
          bulkOperation {
            id
            status
          }
          userErrors {
            field
            message
          }
        }
      }
    GRAPHQL
 
    def initialize(sync:)
      @sync = sync
      @shop = sync.shop
      @client = ShopifyAdminGraphql::Client.new(shop: @shop)
    end
 
    def call
      @sync.with_lock do
        @sync.update!(state: "submitting", submitted_at: Time.current)
 
        response = @client.post(
          query: MUTATION,
          variables: { query: @sync.query_text }
        )
 
        payload = response.dig("data", "bulkOperationRunQuery")
        raise "bulkOperationRunQuery returned no payload: #{response["errors"].inspect}" if payload.nil?

        user_errors = payload.fetch("userErrors")
 
        if user_errors.any?
          @sync.update!(
            state: "failed",
            metadata: @sync.metadata.merge("submit_errors" => user_errors),
            failed_at: Time.current
          )
          return
        end
 
        bulk_operation = payload.fetch("bulkOperation")
 
        @sync.update!(
          shopify_bulk_operation_id: bulk_operation.fetch("id"),
          state: "running",
          started_at: Time.current
        )
      end
    end
  end
end

A good query design rule is to make bulk queries purpose-built for downstream processing. Do not ask for everything merely because storage is cheap. Storage is cheap right up until you are parsing ten million lines you never needed.

Ask for stable IDs first. Then ask for fields your downstream handlers actually use. For nested data, remember that bulk output is flattened into JSONL and child objects are linked back to parents with __parentId.

PRODUCTS_BULK_QUERY = <<~GRAPHQL
  {
    products {
      edges {
        node {
          id
          title
          updatedAt
          vendor
          status
          variants {
            edges {
              node {
                id
                sku
                price
                updatedAt
              }
            }
          }
        }
      }
    }
  }
GRAPHQL
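For a query like the one above, the result file flattens products and variants into independent lines, with each variant pointing back via __parentId. A minimal sketch of the linking logic (the sample lines, `SAMPLE` constant, and `group_by_parent` helper are illustrative, not real Shopify output):

```ruby
require "json"
require "stringio"

# Illustrative JSONL lines shaped like Shopify's flattened bulk output.
# The IDs and values are made up; real result files can be millions of lines.
SAMPLE = <<~JSONL
  {"id":"gid://shopify/Product/1","title":"Mug","vendor":"Acme"}
  {"id":"gid://shopify/ProductVariant/11","sku":"MUG-S","__parentId":"gid://shopify/Product/1"}
  {"id":"gid://shopify/ProductVariant/12","sku":"MUG-L","__parentId":"gid://shopify/Product/1"}
JSONL

# Group child lines under their parents. This relies on the documented
# ordering: a child line appears after the parent it references.
def group_by_parent(io)
  parents = {}
  io.each_line do |line|
    obj = JSON.parse(line)
    if obj["__parentId"]
      (parents.fetch(obj["__parentId"])["children"] ||= []) << obj
    else
      parents[obj["id"]] = obj
    end
  end
  parents
end
```

In production you would not accumulate the whole hash in memory like this; the point is only that the parent link, not file position, is the relationship.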

Also, do one boring but important thing before you bulk anything: run a normal GraphQL query first. Shopify explicitly recommends this because bulk failures give worse feedback than synchronous query failures. That sounds dull until you save yourself an hour of debugging an ACCESS_DENIED caused by one field you forgot needed a scope.

Running bulk mutations from Rails

Bulk mutations are where teams go from “nice, this is powerful” to “why is there a staged upload in the middle of my otherwise respectable Tuesday?”

The flow is different from bulk queries:

  1. create a JSONL file where each line contains variables for one mutation execution
  2. call stagedUploadsCreate to get an upload target
  3. upload the JSONL file to the signed URL Shopify provides
  4. call bulkOperationRunMutation with your mutation string and staged upload path
  5. track completion and process the output JSONL

There are several non-obvious restrictions here. Shopify’s current docs say that bulk mutations can use any GraphQL Admin API mutation except the bulk-operation mutations themselves, the mutation is limited to one connection field, and the input JSONL file cannot exceed 100MB. Shopify also warns that the GraphQL Admin API does not serially process the JSONL lines, so line order is not a dependency you are allowed to romanticize.
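Those limits are worth enforcing before you upload anything, not after Shopify rejects the run. A hedged pre-flight sketch, assuming the 100MB cap stated above (the method name is made up):

```ruby
require "json"

# Documented input cap for bulk mutation JSONL files, per the limit above.
MAX_INPUT_BYTES = 100 * 1024 * 1024

# Hypothetical pre-flight check: fail fast on oversized or malformed input
# before paying for a staged upload round trip.
def validate_bulk_mutation_input!(path)
  size = File.size(path)
  if size > MAX_INPUT_BYTES
    raise ArgumentError, "JSONL input is #{size} bytes, over the #{MAX_INPUT_BYTES}-byte cap"
  end

  # Every line must be an independent, parseable JSON object.
  File.foreach(path).with_index(1) do |line, number|
    JSON.parse(line)
  rescue JSON::ParserError => e
    raise ArgumentError, "line #{number} is not valid JSON: #{e.message}"
  end
end
```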

# app/services/shopify_bulk_ops/start_mutation.rb
require "net/http/post/multipart" # UploadIO and Net::HTTP::Post::Multipart come from the multipart-post gem

module ShopifyBulkOps
  class StartMutation
    STAGED_UPLOADS_CREATE = <<~GRAPHQL
      mutation {
        stagedUploadsCreate(input: [{
          resource: BULK_MUTATION_VARIABLES,
          filename: "bulk_op_vars.jsonl",
          mimeType: "text/jsonl",
          httpMethod: POST
        }]) {
          stagedTargets {
            url
            resourceUrl
            parameters {
              name
              value
            }
          }
          userErrors {
            field
            message
          }
        }
      }
    GRAPHQL
 
    RUN_MUTATION = <<~GRAPHQL
      mutation RunBulkMutation($mutation: String!, $stagedUploadPath: String!) {
        bulkOperationRunMutation(
          mutation: $mutation,
          stagedUploadPath: $stagedUploadPath
        ) {
          bulkOperation {
            id
            status
          }
          userErrors {
            field
            message
          }
        }
      }
    GRAPHQL
 
    def initialize(sync:, jsonl_path:)
      @sync = sync
      @shop = sync.shop
      @jsonl_path = jsonl_path
      @client = ShopifyAdminGraphql::Client.new(shop: @shop)
    end
 
    def call
      staged_target = create_staged_upload!
      upload_jsonl!(staged_target)
      staged_upload_path = extract_staged_upload_path(staged_target)
 
      response = @client.post(
        query: RUN_MUTATION,
        variables: {
          mutation: @sync.mutation_text,
          stagedUploadPath: staged_upload_path
        }
      )
 
      payload = response.dig("data", "bulkOperationRunMutation")
      user_errors = payload.fetch("userErrors")
 
      if user_errors.any?
        @sync.update!(
          state: "failed",
          metadata: @sync.metadata.merge("submit_errors" => user_errors),
          failed_at: Time.current
        )
        return
      end
 
      @sync.update!(
        staged_upload_path: staged_upload_path,
        shopify_bulk_operation_id: payload.dig("bulkOperation", "id"),
        state: "running",
        started_at: Time.current
      )
    end
 
    private
 
    def create_staged_upload!
      response = @client.post(query: STAGED_UPLOADS_CREATE)
      payload = response.dig("data", "stagedUploadsCreate")
 
      raise "staged upload failed: #{payload['userErrors'].inspect}" if payload["userErrors"].any?
 
      payload.fetch("stagedTargets").first
    end
 
    def upload_jsonl!(target)
      uri = URI(target.fetch("url"))
      form_data = target.fetch("parameters").to_h { |p| [p.fetch("name"), p.fetch("value")] }

      File.open(@jsonl_path) do |jsonl|
        file = UploadIO.new(jsonl, "text/jsonl", File.basename(@jsonl_path))
        request = Net::HTTP::Post::Multipart.new(uri.request_uri, form_data.merge("file" => file))

        http = Net::HTTP.new(uri.host, uri.port)
        http.use_ssl = true
        response = http.request(request)

        # 201 Created is already a Net::HTTPSuccess subclass, so one check suffices.
        raise "JSONL upload failed: #{response.code} #{response.body}" unless response.is_a?(Net::HTTPSuccess)
      end
    end
 
    # Shopify expects the staged upload path, not the full signed URL.
    def extract_staged_upload_path(target)
      URI(target.fetch("resourceUrl")).path.sub(%r{^/}, "")
    end
  end
end

Your JSONL generator should be deterministic and boring. One line, one mutation input, stable formatting, no magical side effects.

File.open(path, "w") do |file|
  products.each do |product|
    line = {
      input: {
        id: product.shopify_gid,
        title: product.title,
        vendor: product.vendor
      }
    }
 
    file.puts(JSON.generate(line))
  end
end

The critical mental model is this: a bulk mutation is not an ordered script. It is a bag of mutation inputs Shopify will process asynchronously. If one line depends on the result of another, split the work into phases. Do not rely on file order. Shopify already told you not to. Believe them. They wrote the thing.

How to process JSONL results safely

JSONL handling is where good Bulk Operation implementations separate themselves from “works on my laptop” demos. Shopify’s docs explicitly show the right instinct here: stream the file line by line. Do not slurp the entire thing into memory unless your goal is to benchmark how fast a worker can become a smoke signal.

Shopify’s bulk result format is JSON Lines. Each line is an independent JSON object. For nested connections, children are emitted as separate lines after their parents and linked through __parentId. That means you should think in terms of a stream processor, not a giant object graph recreation pass unless you truly need that.

# app/services/shopify_bulk_ops/jsonl_processor.rb
module ShopifyBulkOps
  class JsonlProcessor
    def initialize(sync:, io:, handler:)
      @sync = sync
      @io = io
      @handler = handler
    end
 
    def call
      line_number = 0
 
      @io.each_line do |line|
        line_number += 1
        next if line_number <= @sync.last_processed_line_number
 
        payload = JSON.parse(line)
        @handler.call(payload)
 
        if (line_number % 1_000).zero?
          @sync.update!(
            processed_lines_count: line_number,
            last_processed_line_number: line_number
          )
        end
      end
 
      @sync.update!(
        processed_lines_count: line_number,
        last_processed_line_number: line_number
      )
    end
  end
end

The handler should be domain-specific, not generic. Generic processors sound elegant until every resource type needs special casing and your “generic” code turns into a museum of regrettable conditionals.

# app/services/shopify_bulk_ops/handlers/product_export_handler.rb
module ShopifyBulkOps
  module Handlers
    class ProductExportHandler
      def call(payload)
        case payload["id"]
        when /\Agid:\/\/shopify\/Product\//
          upsert_product(payload)
        when /\Agid:\/\/shopify\/ProductVariant\//
          upsert_variant(payload)
        else
          Rails.logger.info("Ignoring unsupported bulk payload: #{payload['id']}")
        end
      end
 
      private
 
      def upsert_product(payload)
        Product.upsert(
          {
            shopify_gid: payload.fetch("id"),
            title: payload["title"],
            vendor: payload["vendor"],
            status: payload["status"],
            updated_at: Time.current
          },
          unique_by: :shopify_gid
        )
      end
 
      def upsert_variant(payload)
        Variant.upsert(
          {
            shopify_gid: payload.fetch("id"),
            product_shopify_gid: payload["__parentId"],
            sku: payload["sku"],
            price: payload["price"],
            updated_at: Time.current
          },
          unique_by: :shopify_gid
        )
      end
    end
  end
end

A few practical rules make JSONL processing reliable:

  • upsert by stable identifiers such as Shopify GIDs
  • checkpoint frequently enough that restart cost is acceptable
  • keep each line handler idempotent
  • avoid one giant transaction for the entire file
  • track processing metrics separately from Shopify execution metrics
  • handle partialDataUrl as a legitimate recovery path, not an afterthought

Also remember that Shopify’s signed result URLs expire after seven days. That is generous if your worker starts quickly and surprisingly short if a job sits forgotten behind a backlog, a deploy bug, and two meetings that should have been emails.
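A tiny freshness guard makes that expiry explicit instead of leaving it as tribal knowledge. This sketch assumes the documented seven-day window; the method name is made up, and in Rails `completed_at` would come off the sync record:

```ruby
# Signed result URLs expire seven days after the operation completes.
RESULT_URL_TTL_SECONDS = 7 * 24 * 60 * 60

# Hypothetical guard: returns true when the signed URL should be
# presumed dead and the operation re-run instead of re-downloaded.
def result_url_expired?(completed_at, now: Time.now)
  (now - completed_at) > RESULT_URL_TTL_SECONDS
end
```

Checking this before enqueueing a download job turns "mysterious 403 from a signed URL" into an explicit, operator-visible failure mode.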

Polling versus webhook completion

Both polling and webhook-driven completion are valid. They are not equally good forever.

Polling is fine when:

  • you are building the first version
  • you have low volume
  • you want a simpler control flow to prove the pipeline end to end

Webhooks are better when:

  • you care about wasted API calls
  • you run many bulk jobs
  • you want fast, event-driven wakeups
  • you want the tracker job to do less busy waiting

Shopify’s docs explicitly recommend subscribing to bulk_operations/finish instead of polling because it limits redundant API calls. The webhook fires when an operation completes, fails, or is canceled. One subtle but important detail: in the webhook payload, status and error_code are lowercase, so your enum mapping should not assume the uppercase values you see in GraphQL responses.
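A small normalizer keeps that case asymmetry in one place instead of scattering `.upcase` calls through your handlers. The key names here follow Shopify's documented bulk_operations/finish payload, but verify them against your API version; the method name itself is an assumption:

```ruby
# Sketch: normalize a bulk_operations/finish webhook payload so the rest
# of the app only ever sees the uppercase GraphQL-style enum values.
def normalize_bulk_finish_payload(payload)
  {
    operation_id: payload["admin_graphql_api_id"],
    status: payload["status"].to_s.upcase,      # webhook sends lowercase, GraphQL returns uppercase
    error_code: payload["error_code"]&.upcase   # nil when the operation succeeded
  }
end
```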

# app/jobs/shopify_bulk_ops/poll_status_job.rb
module ShopifyBulkOps
  class PollStatusJob < ApplicationJob
    queue_as :integrations
 
    QUERY = <<~GRAPHQL
      query BulkOperationById($id: ID!) {
        bulkOperation(id: $id) {
          id
          status
          errorCode
          objectCount
          rootObjectCount
          url
          partialDataUrl
          completedAt
        }
      }
    GRAPHQL
 
    def perform(sync_id)
      sync = ShopifyBulkSync.find(sync_id)
      return unless sync.state == "running"
 
      client = ShopifyAdminGraphql::Client.new(shop: sync.shop)
      response = client.post(query: QUERY, variables: { id: sync.shopify_bulk_operation_id })
 
      op = response.dig("data", "bulkOperation")
      raise "bulkOperation not found" if op.nil?
 
      sync.update!(
        object_count: op["objectCount"].to_i,
        root_object_count: op["rootObjectCount"].to_i,
        result_url: op["url"],
        partial_data_url: op["partialDataUrl"],
        error_code: op["errorCode"]
      )
 
      case op["status"]
      when "CREATED", "RUNNING", "CANCELING"
        self.class.set(wait: 1.minute).perform_later(sync.id)
      when "COMPLETED"
        sync.update!(state: "downloading", completed_at: Time.current)
        ShopifyBulkOps::DownloadAndProcessJob.perform_later(sync.id)
      when "FAILED", "EXPIRED"
        sync.update!(state: "failed", failed_at: Time.current)
        ShopifyBulkOps::DownloadAndProcessJob.perform_later(sync.id) if sync.partial_data_url.present?
      when "CANCELED"
        sync.update!(state: "canceled")
      else
        raise "Unhandled bulk status: #{op['status']}"
      end
    end
  end
end

On API version 2026-01 and later, Shopify also supports up to five concurrent bulk query operations and up to five concurrent bulk mutation operations per shop. That means your scheduler should stop assuming “one active bulk job per shop”. Model concurrency explicitly. A simple way is to count currently running operations per shop and kind before submitting new work.
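That counting gate can be sketched in a few lines. This is a plain-Ruby illustration of the idea, not production code: the hash shape and `ACTIVE_STATES` list are assumptions, and in Rails this would be a scoped COUNT on shopify_bulk_syncs inside the same transaction that flips the record to submitting.

```ruby
# 2026-01+ allows five concurrent bulk queries and five concurrent
# bulk mutations per shop, counted separately by kind.
MAX_CONCURRENT_PER_KIND = 5

# Only states where Shopify may still be executing count against the limit;
# downloading/processing are app-side states after Shopify has finished.
ACTIVE_STATES = %w[submitting running].freeze

def can_submit?(syncs, shop_id:, kind:)
  active = syncs.count do |s|
    s[:shop_id] == shop_id && s[:kind] == kind && ACTIVE_STATES.include?(s[:state])
  end
  active < MAX_CONCURRENT_PER_KIND
end
```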

Shopify also introduced bulkOperations for listing and filtering operations and bulkOperation(id:) for direct lookup, replacing the older currentBulkOperation pattern for modern API versions. In other words, your operator tooling can finally be a little less 2024.

Idempotency checkpoints and resume strategy

The hard production problem is not “can Rails start a bulk operation?” The hard problem is “can Rails survive a crash halfway through a 4.8 million line result file and resume without creating a small database crime scene?”

The answer is idempotency plus checkpoints.

1. Make line processing idempotent

Every processed line should be safe to replay. In practice that means upserts, dedupe keys, or domain rules that let you say “already handled, move on” without manual cleanup.

2. Checkpoint often enough to be useful

Store at least the last processed line number and domain counters. Checkpoint every few hundred or thousand lines depending on handler cost. Too frequent and you add overhead. Too rare and restart pain becomes theatrical.

3. Separate Shopify execution state from app processing state

Shopify can say the operation is COMPLETED while your app is still very much not completed. That is why the app-side state machine needs separate states like downloading and processing.

4. Persist enough metadata to debug later

Keep the operation ID, query or mutation text, purpose, API version, error code, counts, and timestamps. This is boring until a merchant says “your sync missed products yesterday” and you would quite like to answer with something better than a haunted stare.

5. Treat partial results as first-class

Shopify exposes partialDataUrl when an operation fails but still produced incomplete data. That is not merely a curiosity. It is often enough to salvage useful work, especially for large exports where partial progress still has value.

# app/jobs/shopify_bulk_ops/download_and_process_job.rb
require "open-uri" # provides URI.open for streaming the signed result URL

module ShopifyBulkOps
  class DownloadAndProcessJob < ApplicationJob
    queue_as :integrations
 
    def perform(sync_id)
      sync = ShopifyBulkSync.find(sync_id)
      sync.update!(state: "processing")
 
      url = sync.result_url.presence || sync.partial_data_url
      raise "No result URL available" if url.blank?
 
      Tempfile.create(["shopify-bulk", ".jsonl"]) do |file|
        URI.open(url) do |remote_io|
          IO.copy_stream(remote_io, file)
        end
 
        file.rewind
 
        handler = ShopifyBulkOps::Handlers::ProductExportHandler.new
        processor = ShopifyBulkOps::JsonlProcessor.new(sync: sync, io: file, handler: handler)
        processor.call
      end
 
      sync.update!(state: "completed", completed_at: Time.current)
    rescue => e
      sync.update!(
        state: "failed",
        metadata: sync.metadata.merge(
          "processing_error" => {
            "class" => e.class.name,
            "message" => e.message
          }
        ),
        failed_at: Time.current
      )
      raise
    end
  end
end

The last detail people often miss is download timing. Because Shopify’s result URLs expire after seven days, queue starvation is now a data-loss risk. If your app lets bulk jobs sit around unprocessed for a week, that is no longer a bulk pipeline. That is decorative optimism.

Common mistakes that make bulk syncs miserable

  • Treating submission as the whole feature. Starting the operation is the easy 5 percent. Everything after that is the real system.

  • Using online tokens. Shopify recommends offline tokens for long-running bulk work for a reason.

  • Building one giant background job. When a single job does submit, monitor, download, parse, and apply, partial failure becomes annoying to recover from.

  • Reading the whole JSONL file into memory. Shopify’s own docs show the streaming pattern because whole-file reads scale badly.

  • Ignoring __parentId. Nested data is flattened. If your handler ignores the parent link, you will recreate an object graph badly and then resent the universe.

  • Assuming line order matters for bulk mutations. Shopify explicitly says the JSONL input is not serially processed.

  • Not handling partial results. Failed does not always mean useless.

  • Polling forever at a tiny interval. That is fine in dev. In production it becomes self-inflicted API noise.

  • Not upgrading operator tooling for 2026-01+. If your app still thinks one current operation tells the whole story, your visibility model is already behind.

  • Skipping a normal test query first. Synchronous GraphQL errors are usually easier to reason about than bulk failures.

The broad implementation pattern to standardize on is:

  1. store bulk work in a first-class model
  2. submit through a dedicated service
  3. track status with webhook-first, polling as fallback
  4. download signed results quickly
  5. stream JSONL into idempotent handlers
  6. checkpoint progress
  7. finish with metrics and follow-up jobs

That architecture is not fancy. It is better. Fancy is often just the stage before a future refactor with sad git history.

FAQ

When should I use Bulk Operations instead of pagination?

Use bulk queries when the dataset is large enough that normal cursor pagination becomes slow, fragile, or operationally expensive. If you only need a small slice of data for an interactive request, normal GraphQL queries are usually simpler.

Should a Rails app poll or use webhooks for completion?

Webhooks are the better default once the app is in production. Polling is fine for a first implementation, but webhook-driven completion reduces redundant API calls and gives cleaner job wakeups.

Can I run Bulk Operations directly from an admin UI extension?

No. Shopify’s direct API access for admin UI extensions does not support bulk operations, so the extension should call your backend and let Rails initiate and manage the work.
