I was working on a feature that needed to bulk insert a bunch of processed data into a PostgreSQL database for a given organization.
This looked quite simple; here's how I did it:
- delete the existing data for the given organizations
- use insert_all/3 to bulk insert the new data into the PostgreSQL database

Here's what it looked like:
defp persist_data(data, organization_ids) do
  Repo.transaction(fn ->
    # Remove the previous data for the given organizations
    Repo.delete_all(
      from(
        data in Data,
        where: data.organization_id in ^organization_ids
      )
    )

    # Bulk insert the freshly processed data in one go
    Repo.insert_all(Data, data, placeholders: %{inserted_at: DateTime.utc_now()})
  end)
end
Everything was working fine until I ran into this error:
** (Postgrex.QueryError) postgresql protocol can not handle 187001 parameters, the maximum is 65535
(ecto_sql 3.7.0) lib/ecto/adapters/sql.ex:756: Ecto.Adapters.SQL.raise_sql_call_error/1
(ecto_sql 3.7.0) lib/ecto/adapters/sql.ex:663: Ecto.Adapters.SQL.insert_all/9
(ecto 3.7.1) lib/ecto/repo/schema.ex:58: Ecto.Repo.Schema.do_insert_all/7
(ecto_sql 3.7.0) lib/ecto/adapters/sql.ex:1013: anonymous fn/3 in Ecto.Adapters.SQL.checkout_or_transaction/4
(db_connection 2.4.0) lib/db_connection.ex:1512: DBConnection.run_transact
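Where does a number like 187,001 come from? With insert_all/3, every row binds one parameter per column, and each placeholder adds a single extra parameter shared by all rows. A back-of-the-envelope illustration (the row and column counts below are made up, not my actual schema):

# Each row binds one parameter per column; a placeholder adds one shared parameter.
# Hypothetical figures that happen to reproduce the number in the error above:
rows = 17_000
columns = 11
rows * columns + 1
#=> 187_001 — far more than a single query can carry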
It looks like the PostgreSQL protocol caps the number of bind parameters a single query can carry at 65,535 (the count is sent as a 16-bit integer). Before we continue, I forgot to mention that the data I'm processing comes from an external source, so it can be of any size! With that new information, I fixed the problem by chunking the data and inserting it in batches, all of it still wrapped in a single transaction:
@batch_size 1_000

defp persist_data(data, organization_ids) do
  Repo.transaction(fn ->
    # Remove the previous data for the given organizations
    Repo.delete_all(
      from(
        data in Data,
        where: data.organization_id in ^organization_ids
      )
    )

    # Insert the new data in batches to stay under the parameter limit
    data
    |> Enum.chunk_every(@batch_size)
    |> Enum.each(fn batch ->
      Repo.insert_all(Data, batch, placeholders: %{inserted_at: DateTime.utc_now()})
    end)
  end)
end
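Since the real ceiling is on parameters rather than rows, a batch size derived from the schema's column count is a bit more robust than a hard-coded 1000. A minimal sketch, assuming Data is a plain Ecto schema:

# Derive the batch size from the number of fields so that
# rows_per_batch * columns stays below the 65_535 parameter limit.
fields_count = length(Data.__schema__(:fields))
batch_size = div(65_535, fields_count)

data
|> Enum.chunk_every(batch_size)
|> Enum.each(fn batch ->
  Repo.insert_all(Data, batch, placeholders: %{inserted_at: DateTime.utc_now()})
end)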
P.S. If you ever face performance problems batching your inserts into your PostgreSQL database, you might be interested in this post about the unnest PostgreSQL function.
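For a taste of what that looks like, here is a minimal, hypothetical sketch of an unnest-based insert — the column names (organization_id, payload) are invented for the example, not taken from my actual schema. Each column is sent as a single array parameter, so the parameter count no longer grows with the number of rows:

# One array parameter per column instead of one parameter per value.
Repo.query!(
  """
  INSERT INTO data (organization_id, payload, inserted_at)
  SELECT org_id, payload, now()
  FROM unnest($1::bigint[], $2::text[]) AS u(org_id, payload)
  """,
  [Enum.map(rows, & &1.organization_id), Enum.map(rows, & &1.payload)]
)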