We had a feature in a product that required us to insert and update lots of data on a user action. Here is how we transformed 1000 SQL queries into 1.
The way the code is structured is simple: an operations table and a translations table. When the user does an action, we create an operation record and then process it to project changes into the translations table.
operation = Repo.insert!(%Operation{action: "new", key: "A", text: "Foo"})
[new_translation] = ProcessOperation.process([operation])
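For context, a minimal sketch of the two schemas could look like this; the exact fields are assumptions, not the real ones from the product:

defmodule Operation do
  use Ecto.Schema

  # Assumed minimal schema: the real table carries more metadata.
  schema "operations" do
    field(:action, :string)
    field(:key, :string)
    field(:text, :string)
    field(:translation_id, :integer)
  end
end

defmodule Translation do
  use Ecto.Schema

  # Assumed minimal schema as well.
  schema "translations" do
    field(:key, :string)
    field(:text, :string)
    field(:reviewed, :boolean, default: false)
  end
end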
The user action is typically a file upload:
{
"greeting": "Hello, {{username}}",
"success": "This is a success!",
"error": "Oops, something went wrong"
}
This generates a list of operations. The first optimization was to generate a single INSERT INTO statement for all the operations. So for the file above, we create 3 operation records in a single SQL query.
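In practice, that first batching could be sketched like this; the parsing and attribute names are simplified assumptions:

# Parsed JSON from the uploaded file
entries = %{
  "greeting" => "Hello, {{username}}",
  "success" => "This is a success!",
  "error" => "Oops, something went wrong"
}

# Build one attributes map per key and insert them all in a single query
operations = Enum.map(entries, fn {key, text} -> %{action: "new", key: key, text: text} end)
Repo.insert_all(Operation, operations)
#> INSERT INTO operations (action, key, text) VALUES (...), (...), (...)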
But to process the operations, we simply ran a series of SQL queries, because each one could be an insert, an update, a delete, etc.
operation_1 = Repo.insert!(%Operation{action: "new", key: "A", text: "Foo"})
operation_2 = Repo.insert!(%Operation{action: "new", key: "B", text: "Bar"})
ProcessOperation.process([operation_1, operation_2])
#> INSERT INTO translations VALUES ('A', 'Foo')
#> INSERT INTO translations VALUES ('B', 'Bar')
The first step was to optimize the inserts. INSERT INTO is well supported by Ecto, so it was trivial to change the way ProcessOperation handles operations. Instead of executing a SQL query inside the same loop that processes the operations, we return instructions.
operation_1 = Repo.insert!(%Operation{action: "new", key: "A", text: "Foo"})
operation_2 = Repo.insert!(%Operation{action: "new", key: "B", text: "Bar"})
ProcessOperation.process([operation_1, operation_2])
#> [
# {:insert, Translation, %{key: "A", text: "Foo"}},
# {:insert, Translation, %{key: "B", text: "Bar"}}
# ]
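A stripped-down version of that mapping could look like this; the real ProcessOperation handles many more cases:

defmodule ProcessOperation do
  # Map each operation to an instruction tuple instead of
  # running a query inside the loop.
  def process(operations) do
    Enum.map(operations, &to_instruction/1)
  end

  defp to_instruction(%Operation{action: "new", key: key, text: text}) do
    {:insert, Translation, %{key: key, text: text}}
  end

  # ... one clause per supported action
end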
With the list of instructions mapped, we reduce them by schema and batch the final SQL statements.
Repo.insert_all(Translation, [%{key: "A", text: "Foo"}, %{key: "B", text: "Bar"}])
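The grouping itself can be sketched with Enum.group_by; the variable names here are illustrative:

# Group the :insert instructions by schema,
# then issue one insert_all per schema.
instructions
|> Enum.filter(&match?({:insert, _schema, _attrs}, &1))
|> Enum.group_by(
  fn {:insert, schema, _attrs} -> schema end,
  fn {:insert, _schema, attrs} -> attrs end
)
|> Enum.each(fn {schema, rows} -> Repo.insert_all(schema, rows) end)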
The next step was to optimize the UPDATE statements, which is also trivial with Ecto. The "simple" approach was to update a list of records with the same attributes.
operation_1 = Repo.insert!(%Operation{action: "mark_as_reviewed", translation_id: 1})
operation_2 = Repo.insert!(%Operation{action: "mark_as_reviewed", translation_id: 2})
ProcessOperation.process([operation_1, operation_2])
#> [
# {:update, Translation, 1, %{reviewed: true}},
# {:update, Translation, 2, %{reviewed: true}}
# ]
With the list of instructions, like for the INSERT, we can group them by schema (and attributes), aggregate the ids, and build our final SQL statement.
Repo.update_all(
from(t in Translation, where: t.id in ^ids),
[set: [reviewed: true]]
)
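Aggregating the ids can be sketched the same way; this is illustrative, not the exact code from the pull request (it assumes Ecto.Query is imported):

# Group the :update instructions by {schema, attrs}, collect the ids,
# then issue one UPDATE per group.
instructions
|> Enum.filter(&match?({:update, _schema, _id, _attrs}, &1))
|> Enum.group_by(
  fn {:update, schema, _id, attrs} -> {schema, attrs} end,
  fn {:update, _schema, id, _attrs} -> id end
)
|> Enum.each(fn {{schema, attrs}, ids} ->
  Repo.update_all(from(r in schema, where: r.id in ^ids), set: Enum.to_list(attrs))
end)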
This is the tricky part: batching update operations where each operation sets its own values.
operation_1 = Repo.insert!(%Operation{action: "update", text: "Baz", translation_id: 1})
operation_2 = Repo.insert!(%Operation{action: "update", text: "Aux", translation_id: 2})
ProcessOperation.process([operation_1, operation_2])
#> [
# {:update_dynamic, Translation, 1, %{text: "Baz"}},
# {:update_dynamic, Translation, 2, %{text: "Aux"}}
# ]
The generated SQL statement is not supported natively by Ecto. Using two tricks found on the Elixir forum (UPDATE with a join, and unnest), we can do this:
ids = [1, 2]
texts = ["Baz", "Aux"]

Repo.update_all(
  from(
    t in Translation,
    join: values in fragment(
      "SELECT * FROM unnest(?::integer[], ?::text[]) AS t(id, text)",
      ^ids,
      ^texts
    ),
    on: values.id == t.id,
    update: [set: [text: values.text]]
  ),
  []
)
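Feeding the :update_dynamic instructions into that query is then a matter of unzipping the values into two parallel lists. A sketch, assuming only the text column is updated dynamically:

# Collect the {id, text} pairs from the :update_dynamic instructions
# and unzip them into the parallel lists expected by unnest.
{ids, texts} =
  instructions
  |> Enum.filter(&match?({:update_dynamic, Translation, _id, _attrs}, &1))
  |> Enum.map(fn {:update_dynamic, Translation, id, %{text: text}} -> {id, text} end)
  |> Enum.unzip()

# ids and texts then feed the UPDATE ... FROM unnest(...) query above.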
By separating the evaluation of an operation from its execution in SQL, we were able to batch a series of statements into a single SQL query. The concept was heavily influenced by the way Absinthe resolves fields, with its tagged tuples (:ok, :async, :middleware, etc.). By delaying the "final" execution until the end of the chain, we can do pretty interesting stuff :)
Check out the complete (and a bit more complicated) code in this pull request.