A repository of bitesize articles, tips & tricks
(in both English and French) curated by Mirego’s team.

The fine line between DRY and unnecessary abstraction

We recently had to write a one-off script to update some data in a database. A simple “find and replace” in multiple columns across multiple tables.

We ultimately chose to do something like this:

def process! do
  update_value(User, :first_name)
  update_value(User, :last_name)
  update_value(Post, :title)
  update_value(Comment, :body)
  update_value(Article, :content)
  update_value(Article, :excerpt)
end

def update_value(schema, field, old_thing, new_thing) do
  schema
  |> Repo.all()
  |> Enum.each(fn
    value = Map.get(row, field)
    new_value = String.replace(value, old_thing, new_thing)

    row
    |> Changeset.change()
    |> Changeset.put_change(field, new_value)
    |> Repo.update!()
  end)
end

One might take a look at this code and say “Well, that’s not very DRY code, is it?” Shouldn’t we map out our schemas and fields in a nice list like so:

[
  %{schema: User, fields: [:first_name, :last_name]},
  %{schema: Post, fields: [:title]},
  %{schema: Comment, fields: [:body]},
  %{schema: Article, fields: [:content, :excerpt]}
]

and then iterate over it to make sure we don’t repeat ourselves?

@list [
  %{schema: User, fields: [:first_name, :last_name]},
  %{schema: Post, fields: [:title]},
  %{schema: Comment, fields: [:body]},
  %{schema: Article, fields: [:content, :excerpt]}
]

def process! do
  Enum.each(@list, fn %{schema: schema, fields: fields} ->
    schema
    |> Repo.all()
    |> Enum.each(fn
      changeset = Changeset.change(row)

      changeset = Enum.reduce(fields, changeset, fn field, acc ->
        value = Map.get(row, field)
        new_value = String.replace(value, old_thing, new_thing)

        Changeset.put_change(changeset, field, new_value)
      end)

      Repo.update!(changeset)
    end)
  end)
end

I’m not a big fan of the second one.

The complexity we needed to add feels unnecessary to me, especially for a one-off script where we don’t care about processing two fields in the same schema separately. It feels like an unecessary extra abstraction.

I think the first one strikes just the right balance of DRY:

  • The heavy lifting code is not repeated, it’s encapsulated in a nice function.
  • The other code is simple enough so it doesn’t feel dirty or hard to maintain.

But I guess these things are always a bit subjective 🙂