
When to use Bulk Operations instead of pagination in Shopify apps

A decision guide for Shopify developers choosing between synchronous GraphQL pagination and Bulk Operations, with practical thresholds based on workload shape rather than generic 'large dataset' advice.

Updated March 12, 2026
Editorial note: This guide is intentionally opinionated. Shopify documents the platform constraints. The thresholds and decision rules here are app-team heuristics built on top of those constraints.

What this decision is really about

Most teams frame this as a size question. That framing is too shallow.

The real question is whether your query is serving a person or serving a pipeline. If a merchant is waiting on a table, picker, dashboard, or search result, pagination is usually the native shape of the problem. If your app is trying to ingest, export, reconcile, backfill, or resync most of a dataset, then you are no longer doing UI work. You are doing systems work, and systems work should stop pretending it is a next-page button.

Shopify’s GraphQL Admin API is cursor-paginated, with PageInfo and a maximum of 250 resources per page. It is also cost-based, which means every page request consumes query cost and competes with the rest of your app’s traffic in the cost bucket Shopify tracks per app, per shop. Shopify explicitly recommends bulk operations for querying and fetching large amounts of data rather than trying to stretch single queries forever.

“To query and fetch large amounts of data, you should use bulk operations instead of single queries.”

The working model

Use pagination when the query serves a UI. Use bulk when the query serves a pipeline. Record count matters, but workload shape matters more.

Put differently, the deciding signal is not “is this catalog kind of big?” The deciding signal is “am I making page-by-page requests only because I need the whole thing anyway?” Once the answer becomes yes, pagination turns into polite technical debt. It still works. It is still valid. It is also increasingly ridiculous.

Use pagination for bounded interactive work

Pagination is the right default when a human is waiting and the list is naturally bounded. Merchant-facing tables, search results, setup pickers, admin dashboards, and “show me the latest 20 things” screens should almost always stay paginated.

That is not just because pagination is familiar. It is because it aligns with the user experience. The first page arrives quickly. The query shape is narrow. The app only pays cost for data it is actually showing. Cursor-based navigation is stable, and Shopify’s PageInfo model exists specifically for this kind of incremental traversal.

Pagination is also a strong fit for bounded operational jobs, especially when you are not traversing the whole connection. A common example is “pull recently updated orders every few minutes” or “walk a recent time window until there are no more changes.” If your job normally touches tens or a few hundreds of records, stores a durable checkpoint, and can resume from the last cursor or timestamp, bulk may be unnecessary ceremony.
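A minimal sketch of that checkpointed window, assuming the app persists a per-shop sync timestamp. The Checkpoint struct and the overlap policy are illustrative; the filter string follows Shopify's documented search syntax for the orders connection.

```ruby
require "time"

# Hypothetical checkpoint record; a real app would persist this per shop.
Checkpoint = Struct.new(:last_synced_at)

# Build the search filter for the next incremental pass. The small
# overlap guards against updates that land right on the window boundary.
def incremental_orders_filter(checkpoint, overlap_seconds: 60)
  since = checkpoint.last_synced_at - overlap_seconds
  "updated_at:>'#{since.utc.iso8601}'"
end
```

Feed the returned string into the connection's query argument, and advance the checkpoint only after the page walk completes so a crash mid-run stays resumable.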

When pagination is still the grown-up choice

  • A person needs the first screenful now, not a file later.
  • You only need a filtered working set, not the entire connection.
  • Your sync logic is incremental, narrow, and resumable.
  • Your query cost is predictable and low enough that throttling is rare.
  • Your nested data needs are shallow enough to keep orchestration simple.

The hidden superpower of pagination is not that it scales forever. It does not. Its superpower is that it is honest about partial retrieval. If the user needs page one, fetching page one is not a compromise. It is the job.

// Good pagination use case:
// render a merchant-facing "recent orders" table
 
query RecentOrders($after: String) {
  orders(first: 50, after: $after, sortKey: CREATED_AT, reverse: true) {
    nodes {
      id
      name
      displayFinancialStatus
      createdAt
      currentTotalPriceSet {
        shopMoney {
          amount
          currencyCode
        }
      }
    }
    pageInfo {
      hasNextPage
      endCursor
    }
  }
}

Nobody wants to click “Orders” and then wait for your app to launch an asynchronous export pipeline, write a JSONL file, parse it, hydrate a staging table, and finally admit that yes, order #1042 still exists. That is not architecture. That is performance art.

Use bulk for system-driven large reads

Bulk Operations are for work where the system needs most or all of a connection and where asynchronous delivery is acceptable. Initial imports, full resyncs, catalog exports, historical backfills, one-off migrations, denormalized warehouse feeds, and “rebuild the whole local mirror” jobs are classic bulk workloads.

Shopify’s bulk query flow exists precisely for this pattern. You submit a bulkOperationRunQuery, Shopify executes the query asynchronously, and the result is made available as JSONL. Shopify recommends webhooks over polling to detect completion, recommends offline access tokens because long-running jobs can outlive online tokens, and documents JSONL streaming because bulk results are intentionally designed for large-file processing rather than nested in-memory responses.

Bulk also changes the economic shape of the job. You stop paying the coordination tax of repeated page fetches, repeated throttle checks, repeated retry edges, and repeated “where was I?” bookkeeping across a long traversal. The work still exists, but it moves from client-side orchestration into Shopify’s asynchronous execution model.

“Subscribing to the webhook topic is recommended over polling.”
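Wiring that up is a single mutation. A sketch of the subscription, assuming the callback URL is your own endpoint (webhookSubscriptionCreate and the BULK_OPERATIONS_FINISH topic are the documented GraphQL pieces; the URL here is a placeholder):

```graphql
mutation SubscribeToBulkFinish {
  webhookSubscriptionCreate(
    topic: BULK_OPERATIONS_FINISH
    webhookSubscription: {
      callbackUrl: "https://example.com/webhooks/shopify/bulk_operations_finish"
      format: JSON
    }
  ) {
    webhookSubscription {
      id
    }
    userErrors {
      field
      message
    }
  }
}
```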

Bulk is usually the right call when

  • The app needs nearly all records, not just the first page or two.
  • The job is background work and no human is waiting on it.
  • You are repeatedly traversing an entire connection with pagination.
  • You need nested data where page-by-page fan-out becomes brittle.
  • The workload repeats across many shops, so operational efficiency matters.

Shopify’s current bulk-query guidance matters here. In API versions 2026-01 and higher, Shopify documents support for up to five concurrent bulk query operations per shop. Bulk query results remain available for seven days after completion. The bulk query must include at least one connection field, supports up to five connections, and has a maximum nesting depth of two levels. If the query does not complete within 10 days, Shopify marks it as failed.

That combination tells you what bulk is and what it is not. It is a powerful export mechanism for large reads. It is not a magical replacement for all query design, and it is not an excuse to submit a monster query assembled by a caffeinated raccoon.

mutation RunBulkProductsExport {
  bulkOperationRunQuery(
    query: """
    {
      products(query: "status:active") {
        edges {
          node {
            id
            title
            updatedAt
            vendor
            variants {
              edges {
                node {
                  id
                  sku
                  price
                  updatedAt
                }
              }
            }
          }
        }
      }
    }
    """
  ) {
    bulkOperation {
      id
      status
    }
    userErrors {
      field
      message
    }
  }
}

If that workload were implemented as paginated reads, you would be coordinating page traversal for products, coping with cost and throttling over time, and probably adding extra logic for nested variant hydration anyway. That is the moment to stop fighting the platform and let the bulk pipeline do its job.

Decision signals that matter more than record count

Teams love arguing about whether 5,000 records is “large.” That argument is fun in the same way a sand-filled shoe is fun. It produces motion, but not progress.

Record count matters, but it is not the best first discriminator. The better signals are about interaction model, retrieval coverage, orchestration cost, and repeatability.

  • Who is waiting? Leans pagination: a person in a UI. Leans bulk: the system in the background. Why it matters: async file generation is wrong for interactive paths and excellent for pipelines.
  • Coverage needed. Leans pagination: a small filtered subset. Leans bulk: most or all records. Why it matters: whole-dataset work magnifies pagination coordination cost.
  • Retry shape. Leans pagination: cheap to retry a page. Leans bulk: cheaper to rerun a durable background job. Why it matters: bulk turns long traversals into one job lifecycle instead of hundreds of request lifecycles.
  • Nested data. Leans pagination: shallow and bounded. Leans bulk: deep enough to cause fan-out pain. Why it matters: nested pagination orchestration gets fragile fast.
  • Frequency across shops. Leans pagination: occasional or ad hoc. Leans bulk: repeated fleet-wide workload. Why it matters: small inefficiencies become real infrastructure costs when multiplied across merchants.
  • Throttle pressure. Leans pagination: rare and manageable. Leans bulk: a constant companion, like a sad little metronome. Why it matters: heavy repeated page traversal burns cost budget that bulk avoids.

Heuristics that work well in real app teams

These are not Shopify hard limits. They are practical operator heuristics:

  • Stay with pagination if the job usually completes in a few pages and exists primarily to support a UI or a narrow incremental sync.

  • Strongly consider bulk if your job is expected to walk dozens of pages or more, especially when it does so just to end up with a local full copy anyway.

  • Switch to bulk early when the job needs parent and child records at scale, such as products plus variants or orders plus line items.

  • Switch to bulk when “resume from the next cursor” has become a mini-subsystem with throttling, retry, idempotency, and checkpoint logic that exists only because you are still paginating the universe.

Shopify’s documented limits reinforce this. Single GraphQL queries are still bounded by requested cost, including a hard maximum single-query cost of 1,000 points, while bulk operations are specifically designed for large reads and are documented as not being subject to those single-query max cost limits or the usual rate-limit model for single queries.
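That cost metadata is actionable, not decorative. A sketch of throttle-aware pacing built on the extensions.cost payload Shopify returns with GraphQL responses (the throttleStatus field names follow the documented response shape; the pacing policy itself is an app-side heuristic):

```ruby
# Given the "cost" hash from a response's extensions, estimate how long
# to wait before a query of known cost will fit back in the bucket.
def seconds_until_affordable(cost, next_query_cost)
  throttle = cost.fetch("throttleStatus")
  deficit = next_query_cost - throttle.fetch("currentlyAvailable")
  return 0.0 if deficit <= 0

  # restoreRate is the number of cost points restored per second.
  (deficit / throttle.fetch("restoreRate").to_f).ceil(2)
end
```

A paginated sync that routinely reports a nonzero wait here is exactly the kind of traversal the smell test below is trying to catch.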

# Smell test:
# if this loop exists only because the app needs "everything",
# it is often a bulk candidate.
 
cursor = nil
 
loop do
  result = client.query(query: PRODUCTS_PAGE_QUERY, variables: { after: cursor })
  nodes = result.data.products.nodes
 
  break if nodes.empty?
 
  nodes.each { |product| upsert_product(product) }
 
  page_info = result.data.products.pageInfo
  break unless page_info.hasNextPage
 
  cursor = page_info.endCursor
end

There is nothing wrong with this loop when the workload is genuinely page-shaped. There is everything wrong with it when the app runs it across every shop every night and then acts surprised that the queue smells like throttle debt.

Concrete workloads and the right choice

Abstract rules help. Concrete examples help more.

  • Merchant UI showing the latest 25 orders: pagination. The user needs the first result immediately and only a bounded subset is required.
  • Typeahead product picker during setup: pagination. It is interactive, filtered, and should retrieve only what the user can act on.
  • Initial catalog import for a newly installed app: bulk. The app needs most of the dataset and the job is background-oriented.
  • Nightly rebuild of a local product mirror: bulk. Whole-dataset synchronization is pipeline work, not UI work.
  • Incremental sync of orders updated in the last 10 minutes: usually pagination. A bounded recent window with durable checkpoints often does not need bulk.
  • One-time historical backfill of all orders since 2022: bulk. Page-by-page traversal adds coordination cost without user-facing benefit.
  • Export of products and variants to a warehouse: bulk. Nested, large, and asynchronous by nature.
  • Admin screen with a filtered list of failed jobs: pagination. Bounded list, immediate feedback, simple retrieval path.

Notice what is missing from that table: a magic record-count cutoff. That is deliberate. A shop with only 2,000 products can still justify bulk if you need a complete daily export with nested variants. Meanwhile, a shop with 200,000 orders can still justify pagination for a screen that only shows the latest 20.

This is why “use bulk for large datasets” is technically true but strategically lazy. The actual decision is about job shape and operational economics.

A useful threshold ladder

  1. UI-first path: default to pagination until you have very strong evidence otherwise.

  2. Incremental background sync: start with pagination if the changed set is naturally small and resumable.

  3. Whole-dataset or nested export: start with bulk instead of first building a paginated crawler you will later regret.

  4. Fleet-wide recurring traversal: bias toward bulk earlier, because per-shop inefficiency multiplies hard.
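The ladder can be collapsed into a first-pass routing function. This is a heuristic sketch, not a Shopify API; every signal name here is illustrative:

```ruby
# Route a workload to a retrieval mode using the ladder above.
# Returns :pagination or :bulk; treat the result as a default, not a verdict.
def retrieval_mode(ui_facing:, coverage:, nested_fanout:, fleet_wide:)
  return :pagination if ui_facing                          # rung 1: UI-first path
  return :bulk if coverage == :whole_dataset               # rung 3: full export
  return :bulk if nested_fanout                            # rung 3: nested export
  return :bulk if fleet_wide && coverage == :most_records  # rung 4: fleet-wide
  :pagination                                              # rung 2: incremental sync
end
```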

Rails patterns for both paths

The implementation pattern should mirror the workload. Do not hide an architectural mismatch inside a service object with a confident name and a thousand-yard stare.

Pattern A: paginated service for bounded reads

class Shopify::RecentOrdersPage
  QUERY = <<~GRAPHQL
    query($after: String) {
      orders(first: 50, after: $after, sortKey: CREATED_AT, reverse: true) {
        nodes {
          id
          name
          createdAt
          displayFinancialStatus
        }
        pageInfo {
          hasNextPage
          endCursor
        }
      }
    }
  GRAPHQL
 
  def initialize(shop:)
    @shop = shop
    @client = ShopifyClient.for(shop)
  end
 
  def call(after: nil)
    response = @client.query(query: QUERY, variables: { after: after })
 
    {
      orders: response.data.orders.nodes,
      page_info: response.data.orders.pageInfo,
      cost: response.extensions&.dig("cost")
    }
  end
end

This kind of service is great for screens, bounded jobs, and incremental windows. Keep it simple. Surface cursor information and cost metadata. Do not quietly make it iterate through 800 pages “for convenience.” Convenience is how good abstractions become crimes.

Pattern B: bulk kickoff plus durable ingestion pipeline

class Shopify::StartProductsBulkExport
  MUTATION = <<~GRAPHQL
    mutation($query: String!) {
      bulkOperationRunQuery(query: $query) {
        bulkOperation {
          id
          status
        }
        userErrors {
          field
          message
        }
      }
    }
  GRAPHQL
 
  BULK_QUERY = <<~GRAPHQL
    {
      products(query: "status:active") {
        edges {
          node {
            id
            title
            updatedAt
            vendor
            variants {
              edges {
                node {
                  id
                  sku
                  price
                  updatedAt
                }
              }
            }
          }
        }
      }
    }
  GRAPHQL
 
  def initialize(shop:)
    @shop = shop
    @client = ShopifyClient.for(shop) # use offline token
  end
 
  def call!
    response = @client.query(query: MUTATION, variables: { query: BULK_QUERY })
    payload = response.data.bulkOperationRunQuery
 
    raise payload.userErrors.map(&:message).join(", ") if payload.userErrors.any?
 
    BulkSyncRun.create!(
      shop: @shop,
      shopify_bulk_operation_id: payload.bulkOperation.id,
      status: payload.bulkOperation.status.downcase
    )
  end
end

The important thing here is not the mutation itself. It is the boundary. Starting a bulk operation should create a durable local run record. Treat the Shopify bulk operation as an external async job with your own lifecycle, not as a floating promise you hope to remember.
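With a durable run record in place, a missed finish webhook becomes recoverable rather than fatal. A sketch of a poll-as-fallback guard, assuming a periodic sweeper job calls it per run (the status names match Shopify's documented BulkOperation statuses; the staleness threshold is an app-side choice):

```ruby
# Terminal BulkOperation statuses; runs in these states never need polling.
TERMINAL_STATUSES = %w[COMPLETED FAILED CANCELED EXPIRED].freeze

# True when a non-terminal run has gone unchecked long enough that a
# one-off status query is warranted (e.g. the finish webhook was missed).
def needs_status_check?(status:, last_checked_at:, now:, stale_after: 300)
  return false if TERMINAL_STATUSES.include?(status.to_s.upcase)

  (now - last_checked_at) >= stale_after
end
```

Runs flagged by this guard would then get a single status lookup against Shopify, keeping webhooks as the primary signal and polling as the safety net.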

Pattern C: finish webhook plus streaming JSONL import

require "json"
require "open-uri"
 
class Webhooks::Shopify::BulkOperationsFinishController < ApplicationController
  skip_before_action :verify_authenticity_token
  # NOTE: verify the X-Shopify-Hmac-Sha256 header against the raw
  # request body before trusting this payload in production.
 
  def create
    payload = JSON.parse(request.raw_post)
 
    run = BulkSyncRun.find_by!(
      shop_id: shop.id,
      shopify_bulk_operation_id: payload.fetch("admin_graphql_api_id")
    )
 
    if payload["status"] == "completed"
      ProcessBulkJsonlJob.perform_later(run.id, payload["url"])
      run.update!(status: "completed")
    else
      run.update!(
        status: payload["status"],
        error_code: payload["error_code"]
      )
    end
 
    head :ok
  end
 
  private
 
  def shop
    @shop ||= Shop.find_by!(shopify_domain: request.headers["X-Shopify-Shop-Domain"])
  end
end
 
class ProcessBulkJsonlJob < ApplicationJob
  queue_as :default

  # The bulk_operations/finish webhook payload carries the operation's
  # GID and status but not the download URL, so when no URL is passed
  # in we fetch it by querying the BulkOperation node directly.
  RESULT_URL_QUERY = <<~GRAPHQL
    query($id: ID!) {
      node(id: $id) {
        ... on BulkOperation {
          url
        }
      }
    }
  GRAPHQL

  def perform(run_id, url = nil)
    run = BulkSyncRun.find(run_id)
    url ||= fetch_result_url(run)

    URI.open(url) do |io|
      io.each_line do |line|
        row = JSON.parse(line)

        case row["id"]
        when /\Agid:\/\/shopify\/Product\//
          upsert_product!(run.shop, row)
        when /\Agid:\/\/shopify\/ProductVariant\//
          upsert_variant!(run.shop, row)
        end
      end
    end

    run.update!(ingested_at: Time.current, status: "ingested")
  end

  private

  def fetch_result_url(run)
    response = ShopifyClient.for(run.shop).query(
      query: RESULT_URL_QUERY,
      variables: { id: run.shopify_bulk_operation_id }
    )
    response.data.node.url
  end
end

This is the bulk mental model in Rails: submit, persist local run state, react to finish, stream JSONL, and make ingestion idempotent. Shopify documents JSONL specifically so clients can parse line by line instead of loading the entire file into memory. For nested connections, the JSONL output includes __parentId so children can be related back to their parent records during reconstruction.
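A small sketch of that reconstruction, assuming each child line carries __parentId as documented (the grouping helper and the sample GIDs in the usage below are illustrative):

```ruby
require "json"

# Group flattened bulk JSONL lines into parent GID => child rows, using
# the __parentId field Shopify adds to nested-connection output.
def group_children_by_parent(jsonl)
  grouped = {}
  jsonl.each_line do |line|
    row = JSON.parse(line)
    if (parent_id = row["__parentId"])
      (grouped[parent_id] ||= []) << row
    else
      grouped[row["id"]] ||= []   # parent row: ensure a bucket exists
    end
  end
  grouped
end
```

The same pass can stream from the downloaded file handle instead of a string, which keeps memory flat even for very large exports.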

The backend rule

Bulk operations deserve first-class job orchestration. Do not tuck them inside a controller action that returns 200 and a prayer.

Common mistakes that waste weeks

  • Using bulk for interactive screens. If a user is waiting, the file-based async model is usually the wrong UX shape.

  • Using pagination for full imports because it was easier to prototype. Prototype debt becomes queue debt very quickly.

  • Ignoring query cost metadata. Shopify returns cost and throttle information with GraphQL responses. If your paginated sync is constantly near the limit, the system is telling you something.

  • Polling forever. Shopify recommends subscribing to the bulk_operations/finish webhook instead of leaning on redundant status checks.

  • Using online tokens for long-running bulk jobs. Shopify explicitly advises offline access tokens because online tokens can expire before the operation completes.

  • Reading the whole JSONL file into memory. The docs practically wave a large red flag at this. Stream it line by line.

  • Skipping normal-query validation before bulk. Shopify notes that query errors are easier to understand when you test the query normally first.

  • Treating nested bulk output as if it were normal nested JSON. It is not. Parent and child nodes are flattened into JSONL lines, with parent linkage handled through __parentId.

The most expensive mistake is not a syntax error. It is building the wrong retrieval model for the job and then compensating with retries, caches, checkpoints, throttling logic, side queues, and increasingly aggressive optimism.

A practical rule of thumb for app teams

Use pagination when the query is serving a UI. Use bulk when the query is serving a pipeline.

That rule is not mathematically perfect, but it is operationally strong. It prevents the two most common bad decisions:

  • making users wait on an async export system, and
  • making background jobs behave like next-page navigation.

A second rule helps when the first one feels too abstract:

If your Rails job paginates through an entire connection, throttles repeatedly, persists most or all records locally, and then repeats that behavior across many shops, the job is already asking for bulk. It may not be asking politely, but it is asking.

Shopify gives you both tools because they solve different shapes of work. Pagination is for bounded retrieval with immediate feedback. Bulk is for asynchronous, large-scale extraction. Mature Shopify apps use both, and they do not confuse them.

Sources and further reading

Sources checked on March 12, 2026. Bulk-query concurrency guidance changed in API version 2026-01, so older articles and code samples may still repeat the previous one-query-at-a-time rule.

FAQ

Should I switch to bulk just because a connection has many records?

Not automatically. Switch when the app needs most or all of the dataset as background work. A large connection shown interactively can still belong on pagination.

Is bulk faster than pagination?

For whole-dataset and nested export workloads, usually yes in total system effort. For the first result a user sees on screen, no. Bulk is asynchronous and therefore wrong for human-waiting paths.

Can bulk replace incremental syncs?

Not always. Incremental syncs over a small recent window can still fit pagination well, especially when you have a durable cursor or timestamp checkpoint.

What usually forces the switch in production?

Repeated full traversals, high query cost, constant throttling, brittle nested pagination loops, and jobs that exist for the system rather than for a person sitting in front of a screen.

Related resources

Shopify Admin GraphQL patterns in Rails: production patterns for using the Shopify Admin GraphQL API from Rails, including service boundaries, pagination strategy, throttling, partial failure handling, and when to switch to bulk operations.

Shopify Bulk Operations in Rails: a Rails implementation guide for Shopify Bulk Operations covering job orchestration, JSONL downloads, polling versus webhooks, and the service boundaries that make large syncs maintainable.