How Jobandtalent runs web apps

Job&Talent Engineering
11 min read · Jan 3, 2023


Learn how we handle HTTP services in such varied web development technologies as Ruby, Elixir, Python and Node.js!

Jobandtalent uses Ruby, Elixir, Python and Node.js as its tech stack for web development.

Written by Francisco Manuel Gonzalez

How does Jobandtalent run its web apps?

The Jobandtalent product is a polyglot microservice environment with:

  • Ruby,
  • Elixir,
  • Python,
  • and Node.

One of the duties of the Backend DevEx team is implementing and maintaining libraries in different languages used across the vast number of JT’s services.

In this blog post, I want to show you how Jobandtalent runs the HTTP services in each technology.

When it comes to serving HTTP requests, each technology works differently (threads, forking, an event loop…), and below you will find out how we do it.

Background

This blog post comes from a presentation shared at our bi-weekly backend meeting, a place where each team shares something interesting they have done from a technical perspective.

You can find various teams covering topics like Kafka, Avro schemas, Elasticsearch query expansion, or an entity component system.

In this case, it was the turn of the Backend DevEx team, a team that exists to optimize the Developer Experience (DevEx/DX) of the Backend folk at Jobandtalent.

Caution! The scope of this article

A disclaimer first: this blog post stays on the surface of all the strategies you will visit. Each of them could fill a dedicated blog post, talk, or even a book, if you go deep enough.

This post will go over the four technologies (languages and frameworks) that Jobandtalent uses. We will see how each of them behaves with the actual HTTP application configuration.

For each technology, one endpoint sleeps for one second at the application layer to simulate a slow endpoint, and another sleeps at the database layer to simulate a slow query via SELECT pg_sleep(1);.

Bear in mind that the sample applications built for this article run in production mode, but on a macOS machine using Docker Desktop. This always adds overhead, and the execution times might not be 100% precise. They are, however, good enough for the purposes of this blog post.
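To make the setup concrete, here is a minimal sketch of such a sleep endpoint, written in Python with only the standard library. This is not one of the real test applications (each stack uses its own framework), and the route and parameter names are purely illustrative:

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

class SleepHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        seconds = float(query.get("seconds", ["1"])[0])
        # Application-layer sleep, simulating a slow IO operation.
        # The db_sleep variant would instead run `SELECT pg_sleep(1);`
        # against PostgreSQL to move the wait into the database.
        time.sleep(seconds)
        body = f"slept {seconds}s".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

server = ThreadingHTTPServer(("127.0.0.1", 0), SleepHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

start = time.monotonic()
body = urllib.request.urlopen(
    f"http://127.0.0.1:{port}/sleep?seconds=0.2"
).read().decode()
elapsed = time.monotonic() - start
server.shutdown()
```

The interesting part is not the server itself but how each runtime below behaves when 100 of these slow requests arrive at once.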

Ruby on Rails

Ruby defines itself as ‘a dynamic, open-source programming language with a focus on simplicity and productivity. It has an elegant syntax that is natural to read and easy to write’.

It has different implementations, like CRuby, JRuby, or TruffleRuby.

At Jobandtalent we use CRuby, which means that an application is subject to the Global VM Lock (GVL).

For Ruby applications, Jobandtalent uses Ruby on Rails, the most famous web framework written in Ruby.

To run the Ruby on Rails applications, we use Puma. According to its README, it can be defined as ‘a simple, fast, multi-threaded and highly parallel HTTP 1.1 server for Ruby/Rack applications’.

This gem can be configured to run in threaded mode or in cluster mode.

The threaded mode is one Ruby process that starts a thread per request (internally, it is a bit more complicated than that, but bear with me) and the thread ends when the request ends. You can learn more about the Puma internals in these awesome slides.

The cluster mode starts several processes, and each process behaves like the threaded mode.

At Jobandtalent, since we start one process per container, we use the threaded mode.

Gray: waiting for the GVL, black: running

It is a pretty standard way to work. You can configure Puma to start a limited number of threads, and we do that at Jobandtalent. This means that, for example, if you set the RAILS_MAX_THREADS env var to 5, Puma will be able to process 5 requests at the same time; if for some reason you get more, the rest will wait in Puma’s internal queue.

White: waiting in the Puma internal queue, gray: waiting for the GVL, black: running CPU-bound code

Protecting your application is a good thing. Even Active Record (the ORM used by Rails) has a database connection pool, and Puma gives you an extra layer of protection at the HTTP application layer. If, for example, internal services send a storm of requests to your API, Puma is able to protect the rest of the system behind it.
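The effect of a bounded pool plus a queue can be sketched generically. The snippet below is purely illustrative Python (Puma does the equivalent with Ruby threads): 20 “requests” of 0.1 seconds each go through 5 workers, so they finish in roughly four waves rather than all at once.

```python
import time
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 5   # the analogue of RAILS_MAX_THREADS

def handle_request(_):
    time.sleep(0.1)  # simulated IO wait
    return "ok"

start = time.monotonic()
# 20 concurrent requests, but only 5 run at a time; the other
# 15 wait in the executor's queue, like Puma's internal queue.
with ThreadPoolExecutor(max_workers=MAX_THREADS) as pool:
    results = list(pool.map(handle_request, range(20)))
elapsed = time.monotonic() - start
print(f"{len(results)} requests in {elapsed:.2f}s")
```

Every request succeeds; the excess ones simply wait their turn, which is exactly the contention you will see in the benchmark numbers below.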

In this scenario, we have one process dealing with different threads. You will need to bear this in mind if, for some reason, you share resources at the virtual machine level (class variables, for example). Each thread shares the main Ruby process’s memory.

The Ruby on Rails framework has been thread-safe since version 2.2. At the time of writing, Rails is at 7.0.4, which means you just need to watch out for your own code and the libraries that you use.

Let’s see it in action.

If you run quick and simple load testing with ab, you will see how Puma does its job. Let’s send 100 requests, all concurrent, to an endpoint that just does a one-second sleep to simulate an IO operation.

ab -n 100 -c 100 http://localhost:3000/sleep?seconds=1

Requests per second 4.71 [#/sec] (mean)
Time per request 21292.402 [ms] (mean)
Failed requests 0

It shows a lot of contention, but no errors so far: just Puma accumulating the requests and making the rest wait. With 5 threads and 100 one-second requests, you would expect roughly 100 / 5 = 20 seconds per request on average, which matches the ~21-second mean.

If you repeat the same experiment but doing the request to an endpoint that does a one-second sleep in the database via SQL, you will see the following results:

ab -n 100 -c 100 http://localhost:3000/db_sleep?seconds=1

Requests per second 4.66 [#/sec] (mean)
Time per request 21452.850 [ms] (mean)
Failed requests 0

The results are the same: contention, and no errors so far. Active Record would also protect the database with its connection pool, but in this case, Puma has done it first.

It is tempting to increase the number of threads in order to reduce contention and allow Puma to serve more requests. However, this is discouraged: every thread comes at a price in the CRuby VM. Want to go deeper? It is explained here.

Now, let’s check the next technology, which is…

Elixir and Phoenix

For the ones that do not know Elixir, it defines itself as ‘a dynamic, functional language for building scalable and maintainable applications’.

Elixir runs on the Erlang VM, known for creating low-latency, distributed and fault-tolerant systems.

This means that Elixir is able to run code in parallel. It is not limited like the CRuby virtual machine, for example. To do so, it relies on the concept of a process. Do not confuse it with operating system processes, though: the BEAM has a mind-blowingly clever lightweight implementation of its own, and OTP builds on top of it.

Every process holds its own memory, has its own garbage collector, and runs one instruction at a time. However, you can start a vast number of processes, and by vast I mean millions.

Hence, you can run code in parallel using all your CPUs, and without race conditions, since data in Elixir is immutable.

SMP stands for Symmetric Multiprocessing. The image is totally inspired by the one shown in the algodaily.com article, just using a different color style for consistency.

Take a breath now 🙂 If you want to know more, there is a lot of literature about this on the Internet: blog posts, talks, and books. In my personal opinion, OTP and Elixir are among the most exciting things I have learned in the last ten years.

Phoenix, Cowboy and Ecto

While Elixir is the language, Phoenix is the web framework we use at Jobandtalent. To serve web requests, Cowboy is used. Cowboy defines itself as ‘a small, fast, and modern HTTP server for Erlang/OTP’.

Yes, Erlang. Elixir is fully interoperable with the Erlang ecosystem.

Cowboy is not like Puma. For example, Cowboy does not set a limit on the number of requests it can handle: each request gets its own BEAM process and is handled from there. This means that we have lost the protection of the web server; however, Ecto, the database library used with Phoenix, comes with a connection pool.

Let’s see it in action, using the same approach, ab, and a couple of endpoints with a sleep and a sleep in a SQL query.

ab -n 100 -c 100 http://localhost:4000/sleep?seconds=1

Requests per second 95.54 [#/sec] (mean)
Time per request 1071.512 [ms] (mean)
Failed requests 0

There is no visible overhead from starting that many processes to handle the IO operation.

Let’s do the same with the SQL sleep now.

ab -n 100 -c 100 http://localhost:4000/db_sleep?seconds=1

Failed requests 66

This makes sense: we have a very slow endpoint receiving 100 requests at the same time, and the database connection pool has a checkout timeout. Several requests return an error because they cannot get a connection to the database.
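That failure mode can be sketched with a semaphore standing in for the connection pool (Python for illustration; the pool size, query time, and checkout timeout are made-up values, not Ecto’s defaults). With 10 connections, a 0.2-second query, and a 0.3-second checkout timeout, only the first couple of waves get a connection and the rest time out:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 10          # connections in the pool
QUERY_TIME = 0.2        # the slow query (pg_sleep stand-in)
CHECKOUT_TIMEOUT = 0.3  # how long a request waits for a connection

pool = threading.Semaphore(POOL_SIZE)

def handle_request(_):
    # Try to check out a connection; give up after the timeout,
    # which is what a pool checkout timeout does for queued callers.
    if not pool.acquire(timeout=CHECKOUT_TIMEOUT):
        return "error"
    try:
        time.sleep(QUERY_TIME)
        return "ok"
    finally:
        pool.release()

with ThreadPoolExecutor(max_workers=100) as ex:
    results = list(ex.map(handle_request, range(100)))
print(results.count("ok"), "ok /", results.count("error"), "errors")
```

The web server happily accepted all 100 requests; it is the pool, one layer down, that turned the overload into errors.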

If for the sake of the experiment we increase the connection pool to deal with the known number of 100, we’ll see the following results:

Requests per second 94.02 [#/sec] (mean)
Time per request 1110.931 [ms] (mean)
Failed requests 0

It is virtually the same time as with a simple IO sleep.

It is tempting to increase the number of database connections in order to handle more requests; however, this comes at a database performance price. You also need to be very cautious here, since the connection pool is per container, and the number of containers can vary (Kubernetes starting more containers due to a traffic spike). If there are other services, like a background job system or a message consumer, they might use the database as well.

Next station, Python!

Python and Flask

Another technology that supports Jobandtalent’s products is Python.

It defines itself as ‘a programming language that lets you work quickly and integrate systems more effectively’.

Jobandtalent uses CPython; there are other implementations, like Jython. CPython is limited in terms of running code concurrently due to the GIL: in each Python process, you are able to run just one single thing at a time.

Jobandtalent uses Flask as a framework to build the applications. Flask defines itself in pypi as ‘a lightweight WSGI web application framework. It is designed to make getting started quick and easy, with the ability to scale up to complex applications’.

Flask is a framework that lets you choose whatever tooling you want for the database, the view layer, and the rest of the tools that you will need, which is a different approach than Django’s.

Gunicorn and the process-based configuration

And to serve the requests, Jobandtalent uses Gunicorn.

Gunicorn allows several configurations. We use the process-based one (in this case, process means an operating system process). You can read about all the different configurations in the Gunicorn documentation; it can also use threads or (kind of) an event-loop strategy.

The process-based configuration means that when you start Gunicorn, it creates a master process that forks a set of worker processes up front. Each worker is an operating system process that handles one request at a time; when a worker exits or is recycled, the master forks a new one.

You can limit the maximum number of worker processes to control the number of requests that you accept concurrently.

Similar to what we saw with Puma earlier, Jobandtalent limits it to a maximum of four workers.
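Gunicorn reads its configuration from a plain Python file. A minimal sketch of such a cap could look like this (the values are illustrative, not our production settings):

```python
# gunicorn.conf.py
workers = 4            # maximum number of forked worker processes
worker_class = "sync"  # one request per worker at a time
timeout = 30           # seconds before a stuck worker is killed and replaced
```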

In Unix, when you fork a process, the child gets a copy of the parent’s memory. Since it is a Python process, that means copying the whole virtual machine; however, the operating system is smart and, to avoid the overhead of copying everything, uses a copy-on-write strategy.

The image is totally inspired by the one shown in geeksforgeeks.org article, just using a different color style for consistency.
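A bare-bones pre-fork can be sketched with os.fork (Unix only; this is a toy illustration, not Gunicorn’s code): the master forks a fixed set of workers, each starting as a copy-on-write copy of the master’s memory, and they report back over a pipe.

```python
import os

NUM_WORKERS = 4

read_fd, write_fd = os.pipe()
children = []
for i in range(NUM_WORKERS):
    pid = os.fork()
    if pid == 0:
        # Child: a copy-on-write copy of the master's memory.
        os.close(read_fd)
        os.write(write_fd, f"worker {i} pid {os.getpid()}\n".encode())
        os._exit(0)
    children.append(pid)

os.close(write_fd)          # the master only reads
for pid in children:
    os.waitpid(pid, 0)      # reap the workers

output = b""
while True:
    chunk = os.read(read_fd, 4096)
    if not chunk:
        break
    output += chunk
os.close(read_fd)
reports = output.decode().splitlines()
print(len(reports), "workers reported")
```

Each child gets its own PID but starts from the same memory image, which is exactly what makes pre-forking cheap despite the size of the Python VM.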

Let’s see it in action, using the same approach: ab, and a couple of endpoints with a sleep and a sleep in a SQL query.

ab -n 100 -c 100 http://localhost:8080/sleep?seconds=1

Requests per second 3.65 [#/sec] (mean)
Time per request 27456.754 [ms] (mean)
Failed requests 0

The numbers are lower than in the case of Ruby on Rails because the concurrency, capped at four processes, is lower than what we set with Ruby on Rails: with 4 workers and 100 one-second requests, you would expect roughly 25 seconds per request on average, which matches the ~27-second mean. Let’s do the same with the SQL sleep now.

ab -n 100 -c 100 http://localhost:8080/db_sleep?seconds=1

Requests per second 3.51 [#/sec] (mean)
Time per request 29482.322 [ms] (mean)
Failed requests 0

Similar results as before! Gunicorn protects the system when a storm of HTTP requests comes in.

The tempting thing to do in order to increase the throughput per container is to increase the maximum number of processes. It is, however, discouraged. Find your limits in the Gunicorn documentation about workers.

From the docs: ‘DO NOT scale the number of workers to the number of clients you expect to have. Gunicorn should only need 4–12 worker processes to handle hundreds or thousands of requests per second’.

Next stop: Node.js!

Node and Express

At Jobandtalent, we also use Node.

According to the docs, ‘as an asynchronous event-driven JavaScript runtime, Node.js is designed to build scalable network applications. In the ‘hello world’ example, many connections can be handled concurrently. Upon each connection, the callback is fired, but if there is no work to be done, Node.js will sleep.’

Node uses an event loop, a very smart reactive implementation for non-blocking I/O operations. It is really well explained in the Node documentation.

This is the basis for handling thousands of requests without running code in parallel.

Node starts a process with one main thread (well, there are more threads internally, but they are not meant to be used by the app directly).

You are responsible for watching out for the code that you run. If a function takes a long time to finish, nothing else will be executed until it does: there is no thread pool, BEAM scheduler, or UNIX process scheduler to balance the execution. You can always yield work back to the event loop using promises or callbacks.

The image is totally inspired by the one shown on the nodejs.org page, just using a different color style for consistency.
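The single-threaded event loop can be sketched with Python’s asyncio (an analogy only; Node’s loop is libuv, not asyncio). Ten waits that yield to the loop finish together, while ten blocking calls serialize everything:

```python
import asyncio
import time

async def yielding_wait(seconds):
    await asyncio.sleep(seconds)  # yields; the loop keeps serving others

async def blocking_wait(seconds):
    time.sleep(seconds)           # never yields; the whole loop stalls

async def main():
    start = time.monotonic()
    await asyncio.gather(*(yielding_wait(0.1) for _ in range(10)))
    concurrent = time.monotonic() - start

    start = time.monotonic()
    await asyncio.gather(*(blocking_wait(0.1) for _ in range(10)))
    serialized = time.monotonic() - start
    return concurrent, serialized

concurrent, serialized = asyncio.run(main())
print(f"yielding: {concurrent:.2f}s, blocking: {serialized:.2f}s")
```

The first batch takes about one wait’s worth of time; the second takes the sum of all ten, which is what a long-running function does to a Node app.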

As a web framework, Jobandtalent uses Express.

Express defines itself as ‘a minimal and flexible Node.js web application framework that provides a robust set of features for web and mobile applications’. This framework is similar to Flask in the sense that it is very thin, and quite unlike Ruby on Rails or Phoenix, where you get an ORM, a view layer, and so on; with Express, you have to choose your own tools.

To serve the requests, there is no need for anything in the middle. You start the app directly with the node command. No Puma, no Cowboy, no Gunicorn. Just Node.

Let’s see it in action, using the same approach: ab, and a couple of endpoints with a sleep and a sleep in a SQL query.

ab -n 100 -c 100 http://localhost:9229/sleep?seconds=1

Requests per second 44.10 [#/sec] (mean)
Time per request 1489.98 [ms] (mean)
Failed requests 0

Good numbers here! We can see how it shines in terms of non-blocking IO.

Let’s do the same with the SQL sleep now.

ab -v 3 -n 100 -c 100 http://localhost:9229/db_sleep?seconds=1

Requests per second 42.67 [#/sec] (mean)
Time per request 2252.86 [ms] (mean)
Failed requests 0

Waiting for the DB is almost the same as waiting for any other IO.

What is this blog post trying to show?

It shows four different technologies, and four different ways to face a similar problem: waiting for IO while dealing with requests.

A fork-based strategy, OTP, Threads, and the event loop in action. Jobandtalent uses all four of them.

It was a good journey reading about all of them and learning from them!

Well, there is an elephant in the room. Which approach is the best?

You tell us!

Some links:

