Parallel Processing in Elixir

Published April 13, 2020

Reading time: 1 minute.

Part of my work involves making sure content we publish fits our style. I wanted to automate parts of that process by identifying phrases and words we don’t want to see.

I wrote a scanner in Elixir, and my first version wasn’t that efficient:

  # iterate over the data. For each article, scan badwords
  # return new list with additional 'words' field
  defp find_badwords(data) do
    data
    |> Enum.map(fn article ->
      words = determine(article[:content])
      %{title: article[:title], content: article[:content], words: words}
    end)
  end

This iterates over each article, finds the bad words by calling the determine function, and returns a new data structure containing the title, the content, and the bad words, so the rest of the program can work with them.

When I ran it on a set of 1000 pieces of content, it took around 20 seconds. That's certainly faster than what I could do manually, but I still thought I could do better. After all, one of Elixir's strengths is that you can make code concurrent and parallel without too many changes.
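If you want to check timings like these yourself, Erlang's :timer.tc is a quick way to do it. This is my own sketch, not part of the original program; it assumes a find_badwords/1 function and a data list like the ones described here:

```elixir
# Measure how long find_badwords/1 takes on a data set.
# :timer.tc returns {microseconds, result}.
{micros, _result} = :timer.tc(fn -> find_badwords(data) end)
IO.puts("took #{micros / 1_000_000} seconds")
```

Wrapping the call in an anonymous function keeps the measurement focused on just the work you care about.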

Using Task.async and Task.await, I rewrote the function so it looked like this:

  # iterate over the data. For each article, scan badwords
  # return new list with additional 'words' field
  defp find_badwords(data) do
    data
    |> Enum.map(fn article ->
      Task.async(fn ->
        words = determine(article[:content])
        %{title: article[:title], content: article[:content], words: words}
      end)
    end)
    |> Enum.map(&Task.await/1)
  end

This time, each call to determine is wrapped in an async task. The first Enum.map returns a list of tasks, and the second collects each result with Task.await.

When I ran the code again, the process took only five seconds. That's a pretty good optimization for the amount of work needed.
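One thing to keep in mind: the version above spawns one task per article, which is fine for a thousand items but could get heavy for much larger data sets. Elixir's Task.async_stream caps concurrency (by default at the number of online schedulers). This is a sketch of an alternative, not the code from this post, and it assumes the same determine/1 function:

```elixir
# Alternative sketch: Task.async_stream limits how many tasks run at
# once, which keeps memory and scheduler load bounded on large inputs.
defp find_badwords(data) do
  data
  |> Task.async_stream(fn article ->
    words = determine(article[:content])
    %{title: article[:title], content: article[:content], words: words}
  end)
  # Each element comes back as {:ok, result}, so unwrap it.
  |> Enum.map(fn {:ok, result} -> result end)
end
```

Task.async_stream also preserves the input order by default, so the results line up with the original articles just like the Enum.map versions do.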


I don't have comments enabled on this site, but I'd love to talk with you about this article on Twitter. Follow me and say hi.