More Insights concerning The Bio CBD Vape Oil

Vaping of bio CBD oil is a smokeless, squat-temperature strategy for breathing in CBD through vaporization. There are conceivable repayment of vaping CBD relying upon the individual, their specific…

Smartphone

独家优惠奖金 100% 高达 1 BTC + 180 免费旋转




Concurrency models for Cloud applications

With the advent of cloud computing and the elasticity introduced, we see the demand of more scalable/concurrent applications growing. Majority of the times we find those applications are i/o intensive applications, meaning that they need to capture network packets and with some business logic applied transfer them to DB.

So if we tend to classify, the workload characteristic is majorly network oriented and involves network sockets. So then the question comes to mind as to what is the right concurrency model for such applications and why.

The concepts we discuss today are agnostic of the cloud platforms you run them on (CloudFoundy,K8s,Mesos etc)

We see atleast 3 models of concurrency prevalent

1. Thread per connection — This is the model mostly used by Java, where each connection is modelled as a Thread on the operating system. With 1:1 mapping this hits the scaling limits soon as number of connections are then bound to the OS resources like memory (thread resources like stack etc).

2. Event loop — This model advocated by Nodejs where one Thread runs in an event loop and listens on events on the different sockets (each socket being a File descriptor). Events vary from connection events to data events on specific sockets. Nodejs uses libuv behind the scene which is based on the epoll mechanism in Linux Kernel

3. Userspace Threads — This model is what golang uses and has threads in userspace called goroutines. Then they have a scheduler in userspace which maps m such userspace threads to n OS threads(processes).

So if we see above the approaches 2 and 3 of handling concurrency are better placed as they allow less threads to do more work by multiplexing a huge number of connections over less number of threads.

So the question is that how do they achieve this? This is primarily the intent of this blog, as to cover the under the hoods of such a concurrency model and try to understand what makes an event loop scalable.
To understand that one has to understand the underlying mechanism of eventloop and which is basically the epoll mechanism in linux. One also has to understand the nuances of what is blocking and what is non blocking i/o.

So lets start with blocking and non-blocking I/O. Everything in Linux can be treated as a file. So a socket, IPC Primitives like Pipes,Queues, Devices can all be treated as files. Whenever a file is opened, the kernel returns back a handle to userspace in form of a file descriptor. So a process having a socket means it has a file descriptor handle, over which it interacts with the socket.

The file descriptor can be opened in blocking and non blocking mode. The difference is that in case of blocking if there is no data either to be received or if the buffers are full and process cannot put data for transmission, the process is put to sleep by kernel. In case of non-blocking the behavior is different and in case of no data or buffer full, the system call of read/write returns with EWOULDBLOCK and EAGAIN error. This means there is no blocking and the process can retry the read/writes.

So in non blocking mode one way for a process is to keep checking if it can read/write , but as the number of fds(say number of open sockets) increase , this easily becomes a non scalable solution.

Luckily Linux provides some system calls like select and poll, with which we can register what fds we want to check on for data readiness. These select and poll themselves block till the time they see if FDS are ready. The problem with them is that everytime we need to check what fds are ready, we need to pass the list of fds to these calls, and they iterate over the list and check internally for readiness. This also leads to scaling issues. To mitigate this, Linux came up with epoll, which is an object created in Kernel and which internally checks the readiness statuses of the fds which process asks it to keep an eye on.

Epoll provides a few system calls

There is one more nuance with epoll in terms of delivery of the events. They are

So as we read above, to write scaling apps, one has to probably understand the guts of the system, to make a judicious call as to whether for a specific workload type, it makes sense to choose something like an event loop and within that also whether the software we use is using select, poll or epoll like mechanisms. The goal of the blog was to understand conceptually as to what are the pros and cons of each of the above approaches and how does one choose the right scaling/concurrency model for their apps. Since majorly we see the workloads are about network i/o it makes lot of sense to evaluate the event based concurrency models. This also allows to use the resources better, rather then directly looking for horizontal scale without even using the existing resources on single node better.

Another aspect which is worth a mention is about i/o multiplexing. What we just read about in terms of epoll, it acts as an i/o multiplexer. Now for i/o multiplexing there are two variations to it

So i/o can be defined as blocking, non blocking from the context of the file descriptors. It can be classified as synchronous and async based on responsibilities. So one can say that a synchronous non blocking i/o would be classified under reactor pattern where in the handler will handle the actual reads and writes till it gets EWOULDBLOCK or EGAIN and again depending on edge and level triggers. In case of async non blocking, the i/o is done by OS and the i/o completion is what the process is notified about.

What the blog also explains is that the term blocking is majorly used in context of how a file descriptor is opened. So a blocking file descriptor with multiplexing mechanisms like epoll and in level triggered mode scales equally as a non blocking one in same mode. But for Edge triggered notification delivery, with the epoll like multiplexer, the fds have to be in non blocking mode for scale. So understanding this nuance is important when one designs applications for cloud.

In next blog would like to touch base on some of the possible glitches of epoll and primarily in context of using file descriptions, rather then file descriptors as part of the active and ready list. Also would like to see if we can write a small event loop program using epoll.

Updated with example of epoll usage

Disclaimer : The views expressed above are personal and not of the company I work for.

Add a comment

Related posts:

Detectando cambios con Zapier

Hace unos meses tuve la necesidad de conocer en qué momentos cambia el contenido de un servicio de datos concreto. Mi primer impulso fue el de hacer un script que se ejecutase periódicamente…

Funny Things Happen To Us On The Way To Old Age

As we get older funny things happen to us — okay some things aren’t so funny but they happen whether we want them to or not. Well, maybe they’re a little funny. Think about your parents and…

How to pass the PMP exam at first attempt

Through this article, I recommend you the PMP exam preparation resources, books, online courses and tips that eligible for 2021 changes.