Parallel Processing in Elixir
Part of my work involves making sure content we publish fits our style. I wanted to automate parts of that process by identifying phrases and words we don’t want to see.
I wrote a scanner in Elixir, and my first version wasn’t that efficient:
# Iterate over the data. For each article, scan for bad words and
# return a new list with an additional :words field.
defp find_badwords(data) do
  data
  |> Enum.map(fn article ->
    words = determine(article[:content])
    %{title: article[:title], content: article[:content], words: words}
  end)
end
This iterates over each article, finds the bad words by calling the determine function, and returns a new data structure with the title, content, and the bad words so the rest of the program can do what it needs to do with them.
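The determine function itself isn't shown here. As a rough sketch of what it might look like, assuming the flagged terms live in a flat word list (the words below are placeholders, not our actual style guide):

# Hypothetical sketch of determine/1. The word list is made up;
# a real scanner would load the style guide's actual terms.
@badwords ~w(utilize leverage synergy)

defp determine(content) do
  content
  |> String.downcase()
  |> String.split(~r/\W+/u, trim: true)
  |> Enum.filter(&(&1 in @badwords))
  |> Enum.uniq()
end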
When I ran it on a set of 1000 pieces of content, it took around 20 seconds. That's certainly faster than I could manage manually, but I still thought I could do better. After all, one of Elixir's strengths is that you can make code concurrent and parallel without many changes.
Using Task.async and Task.await, I rewrote the function so it looked like this:
# Iterate over the data. For each article, scan for bad words and
# return a new list with an additional :words field.
defp find_badwords(data) do
  data
  |> Enum.map(fn article ->
    Task.async(fn ->
      words = determine(article[:content])
      %{title: article[:title], content: article[:content], words: words}
    end)
  end)
  |> Enum.map(&Task.await/1)
end
This time, the calls to determine are wrapped in an async task. The tasks are rolled up into a list and collected by Task.await.
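One caveat worth knowing: Task.await/2 uses a default timeout of 5,000 ms per task, so if determine ever takes longer than that for a single article, the call raises. The final line of the pipeline can pass an explicit timeout instead (the 30_000 below is an arbitrary example value, not a recommendation):

# Same collection step, but with an explicit timeout instead of
# the 5_000 ms default that Task.await/2 uses.
|> Enum.map(&Task.await(&1, 30_000))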
When I ran the code again, the process took only five seconds, roughly a 4x speedup. That's a pretty good optimization for the amount of work needed.
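One variant worth noting, though it isn't the version I benchmarked above: spawning one task per article means 1000 concurrent tasks, which is fine here but may be more than you want on a much larger data set. Task.async_stream/3 does the same fan-out while capping how many tasks run at once (the number of schedulers, by default). A minimal sketch:

# Sketch of the same function using Task.async_stream, which limits
# concurrent tasks to System.schedulers_online() by default and
# yields {:ok, result} tuples to unwrap.
defp find_badwords(data) do
  data
  |> Task.async_stream(fn article ->
    words = determine(article[:content])
    %{title: article[:title], content: article[:content], words: words}
  end)
  |> Enum.map(fn {:ok, result} -> result end)
end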
Thanks for reading
I don't have comments enabled on this site, but I'd love to talk with you about this article on BlueSky, Mastodon, Twitter, or LinkedIn. Follow me there and say hi.