A repository of bitesize articles, tips & tricks
(in both English and French) curated by Mirego’s team.

PostgreSQL bulk insert

I was working on a feature that needs to insert in bulk a bunch of processed data into a PostgreSQL database for a given organization.

This looked quite simple, here's how I did it:

  • Delete the previous data for the given organization
  • Use insert_all/3 bulk insert the data into the PostgresSQL database
  • Wrap that into a single transaction

Here what it looked like:

defp persist_data(data, organization_ids) do
  Repo.transaction(fn ->
    Repo.delete_all(
      from(
        data in Data,
        where: data.organization_id in ^organization_ids
      )
    )

    Repo.insert_all(Data, data, placeholders: %{inserted_at: DateTime.utc_now()})
  end)
end

Everything was working fine, until I ended up facing this error:

** (Postgrex.QueryError) postgresql protocol can not handle 187001 parameters, the maximum is 65535
    (ecto_sql 3.7.0) lib/ecto/adapters/sql.ex:756: Ecto.Adapters.SQL.raise_sql_call_error/1
    (ecto_sql 3.7.0) lib/ecto/adapters/sql.ex:663: Ecto.Adapters.SQL.insert_all/9
    (ecto 3.7.1) lib/ecto/repo/schema.ex:58: Ecto.Repo.Schema.do_insert_all/7
    (ecto_sql 3.7.0) lib/ecto/adapters/sql.ex:1013: anonymous fn/3 in Ecto.Adapters.SQL.checkout_or_transaction/4
    (db_connection 2.4.0) lib/db_connection.ex:1512: DBConnection.run_transact

It looks like the PostgreSQL protocol has a limit of 65 535 [int16] parameters that it can handle. Before we continue, I forgot to mention that the data that I'm processing come from an external source, therefore it can be of any size! With that new information, I remedy the problem with chunking the data and inserting it in batch, all that again wrapped in a single transaction:

@batch_size 1000

defp persist_data(data, organization_ids) do
  Repo.transaction(fn ->
    Repo.delete_all(
      from(
        data in Data,
        where: data.organization_id in ^organization_ids
      )
    )

    data
    |> Enum.chunk_every(@batch_size)
    |> Enum.each(fn data ->
      Repo.insert_all(Data, data, placeholders: %{inserted_at: DateTime.utc_now()})
    end)
  end)
end

P.S. If you ever face performance problems batching your insert into your PostgreSQL database, you might be intereseted by this post talking about the unnest PostgreSQL function.