Distributed Systems: Modernization
Note: DS refers to Distributed Systems wherever it is used.
This is a modern take on distributed systems. You can read my other articles in this series here.
In the first few stories, we discussed what a distributed system is and some of the concepts (replication, sharding, etc.) that form the crux of DS. Distributed systems are not new. They have been around for a long time, but the infrastructure underneath that makes DS exist and work keeps evolving.
Today there are applications (websites, apps, gaming rooms) that are live 24/7 and need to respond instantly. The number of users, the amount of data, and the volume of API calls force these systems to scale, to serve different geographies and time zones, and to keep latency low. Hence almost any system a client interacts with is, in the backend, some sort of distributed system with its supporting infrastructure: multiple applications running on different machines (along with their replicas), all communicating with each other. The following diagram might help to paint a picture.
What we see above is a client interacting with a server that is replicated across different machines, and a master server that may also serve requests itself and routes other requests to those machines. Depending on the client, the application it talks to, and the context, the server handling the request may be called a BROKER, a CONTROLLER, a LOAD BALANCER, etc., but it is still a machine doing some form of processing. The client does not know the internal details of how the backend works, such as how the workload is distributed across the actual machines. This is how distributed systems have worked for a long time, or at least what their infrastructure closely resembled.
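To make the "client does not see the replicas" idea concrete, here is a minimal sketch of a front server that spreads requests across replicas in round-robin order. The class name `RoundRobinBalancer` and the machine names are hypothetical, and round-robin is just one simple policy; real load balancers and brokers often weigh health checks, load, or request keys as well.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical front server (broker/load balancer) that hides replicas from the client."""

    def __init__(self, replicas):
        if not replicas:
            raise ValueError("need at least one replica")
        # cycle() repeats the replica list forever: machine-1, machine-2, ..., machine-1, ...
        self._replicas = cycle(replicas)

    def handle(self, request):
        # Pick the next replica in turn; the client only ever talks to this object.
        replica = next(self._replicas)
        return f"{replica} handled {request}"


lb = RoundRobinBalancer(["machine-1", "machine-2", "machine-3"])
for i in range(4):
    print(lb.handle(f"req-{i}"))
```

The client calls `handle` on a single entry point and never learns which of the three machines did the work, which is exactly the property described above.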
Now let's see what has changed with the adoption of modern technologies.
One of the pioneers in lowering infrastructure costs and enabling finer-grained control is Docker. You can read more about Docker and Kubernetes if you are not familiar with these technologies. From the client's perspective, the interaction stays largely the same.
The client sends a request to a server/broker, which may reroute it to one of many servers/replicas capable of handling it, using configuration read from some sort of distributed database.
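A minimal sketch of that rerouting step, assuming the broker has already read a routing table from its configuration store. Here the store is just a hypothetical Python dict called `ROUTING_CONFIG` standing in for something like etcd or ZooKeeper, and the broker picks a replica by hashing a request key. Key-based routing is one possible choice, not necessarily what any given broker does.

```python
import hashlib

# Hypothetical stand-in for configuration held in a distributed database.
ROUTING_CONFIG = {
    "orders": ["orders-replica-0", "orders-replica-1", "orders-replica-2"],
    "users": ["users-replica-0", "users-replica-1"],
}


def route(app: str, key: str, config: dict = ROUTING_CONFIG) -> str:
    """Pick a replica for `app` by hashing the request key.

    A stable hash (SHA-256 rather than Python's per-process randomized hash())
    means the same key lands on the same replica while the config is unchanged.
    """
    replicas = config.get(app)
    if not replicas:
        raise KeyError(f"no replicas configured for {app!r}")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


print(route("orders", "customer-42"))
```

One caveat with plain modulo hashing: adding or removing a replica changes `len(replicas)` and reshuffles most keys, which is why systems that care about this tend to use consistent hashing instead.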
What changed is the infrastructure underneath. In the traditional world, a “node” (a physical machine) was generally dedicated to one use case or one application, such as hosting a DB replica or serving web requests, and there was hardly ever a one-to-many mapping. For example, if a DB setup had a master node (1 machine here) and 3 replica nodes (3 machines here), those physical machines would generally be used only for the DB and would not run any other applications. Different applications also need access to certain root directories to set home/config/context, etc., and need some isolation from each other. Not to mention that other applications would steal crucial resources that the core application (the DB) might need.
Do you see the problem with the above setup? There need to be individual physical machines of different sizes catering to different requirements, and yet we cannot fully (100%) utilize the compute and storage of these machines, because each one caters to a single use case.
Enter Kubernetes (K8s). Kubernetes lets you run and orchestrate isolated containers that have their own network, storage, and root filesystem. A container behaves much like a complete machine, but it is still a “soft machine” (containers share the host's kernel), so you can run numerous containers as long as the underlying machine still has resources to spare. You get additional advantages: you can scale applications up or down, and reuse freed resources for other applications by spinning up their containers, which reduces infrastructure cost. This is happening with infrastructure everywhere: companies run their compute on EC2 instances hosted on large AWS machines and save the cost and hassle of managing and maintaining physical nodes.
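The utilization gain can be illustrated with a toy first-fit placement: each container declares the CPU and memory it needs, and we put it on the first machine that still has room. This is a simplified sketch, not the actual Kubernetes scheduler (which filters and scores nodes with many more rules). The workload, the 8-core/32 GiB node size, and the "one container per app per node" spread rule are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu: float  # total cores on the (hypothetical) machine
    mem: float  # total memory in GiB
    placed: list = field(default_factory=list)  # (app, cpu, mem) tuples

    def can_host(self, app, cpu, mem):
        used_cpu = sum(p[1] for p in self.placed)
        used_mem = sum(p[2] for p in self.placed)
        # Simple spread rule: at most one container of the same app per node,
        # so replicas of one app do not all fail together with a single machine.
        no_duplicate = all(p[0] != app for p in self.placed)
        return no_duplicate and used_cpu + cpu <= self.cpu and used_mem + mem <= self.mem


def first_fit(requests, node_cpu, node_mem):
    """Place each (app, cpu, mem) request on the first node that can host it."""
    nodes = []
    for app, cpu, mem in requests:
        if cpu > node_cpu or mem > node_mem:
            raise ValueError(f"{app} does not fit on any node")
        target = next((n for n in nodes if n.can_host(app, cpu, mem)), None)
        if target is None:
            # No existing node has room: add a new machine to the pool.
            target = Node(f"node-{len(nodes)}", node_cpu, node_mem)
            nodes.append(target)
        target.placed.append((app, cpu, mem))
    return nodes


# Hypothetical workload: 4 DB containers, 3 web containers, 1 cache container.
requests = [("db", 2, 8)] * 4 + [("web", 1, 2)] * 3 + [("cache", 1, 4)]
nodes = first_fit(requests, node_cpu=8, node_mem=32)

total_cpu = sum(cpu for _, cpu, _ in requests)
print("one machine per container:", len(requests))
print("shared nodes:", len(nodes))
print(f"CPU utilization with one machine each: {total_cpu / (len(requests) * 8):.1%}")
print(f"CPU utilization with shared nodes:     {total_cpu / (len(nodes) * 8):.1%}")
```

With these made-up numbers the eight containers fit on four shared nodes instead of eight dedicated machines, which roughly doubles CPU utilization. That is the same effect as in the paragraph above, just on a small scale.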
Now the client still interacts with a domain name (read: Ingress). An Ingress is essentially a set of rules that tells the cluster how to route incoming requests to specific services, that is, groups of containers (K8s resources), inside the cluster. Depending on the application or architecture, the client request can then be served by any of the containers (C1, C2, C3) that belong to a particular application, which may share a database (read: StatefulSet).
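Here is a rough sketch of what Ingress-style routing does, written in plain Python rather than as a real Kubernetes manifest. A request's host and path are matched against rules, the longest matching path prefix wins, and any container of the selected service may answer. The host `shop.example.com`, the service names, and the rotation through containers are hypothetical. Prefixes are compared element by element on `/`, in the spirit of the Ingress `Prefix` path type.

```python
from itertools import cycle

# Hypothetical Ingress-like rules: (host, path prefix, backend service).
INGRESS_RULES = [
    ("shop.example.com", "/api", "orders-service"),
    ("shop.example.com", "/", "web-service"),
]

# Hypothetical services, each backed by interchangeable containers.
SERVICE_PODS = {
    "orders-service": cycle(["C1", "C2", "C3"]),
    "web-service": cycle(["C11", "C12"]),
}


def path_elements(path):
    # "/api/orders" -> ["api", "orders"]; "/" -> []
    return [part for part in path.split("/") if part]


def match_service(host, path, rules=INGRESS_RULES):
    """Return the service of the longest matching prefix rule for this host, or None."""
    best, best_len = None, -1
    for rule_host, prefix, service in rules:
        elems = path_elements(prefix)
        # Element-wise comparison: "/api" matches "/api/orders" but not "/apis".
        if rule_host == host and path_elements(path)[:len(elems)] == elems:
            if len(elems) > best_len:
                best, best_len = service, len(elems)
    return best


def handle(host, path):
    service = match_service(host, path)
    if service is None:
        return "404: no rule matches"
    # Any container of the matched service may serve the request; rotate through them.
    return f"{service} -> {next(SERVICE_PODS[service])}"


print(handle("shop.example.com", "/api/orders"))
print(handle("shop.example.com", "/index.html"))
```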
We also see more containers (C11, C12, C33, and C44) belonging to other applications, which serve requests depending on the resource group they belong to.
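In Kubernetes, the "resource group" a container belongs to is commonly expressed with labels, and a Service selects its containers (pods) by label. The sketch below mimics an equality-based label selector over a hypothetical set of pods reusing the container names from the text. The label keys and values (`app`, `tier`, `orders`, `reports`) are assumptions made for illustration.

```python
# Hypothetical pods with labels, loosely modelled on Kubernetes pod metadata.
PODS = {
    "C1": {"app": "orders", "tier": "backend"},
    "C2": {"app": "orders", "tier": "backend"},
    "C3": {"app": "orders", "tier": "backend"},
    "C11": {"app": "web", "tier": "frontend"},
    "C12": {"app": "web", "tier": "frontend"},
    "C33": {"app": "reports", "tier": "backend"},
    "C44": {"app": "reports", "tier": "backend"},
}


def select(selector, pods=PODS):
    """Return pod names whose labels contain every key/value pair in the selector."""
    return sorted(
        name
        for name, labels in pods.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )


print(select({"app": "orders"}))
print(select({"tier": "backend"}))
```

Because membership is decided by labels rather than by fixed machine addresses, containers can be added, removed, or moved between nodes and still be picked up by the right group automatically.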
So, all in all, it is still a distributed system, and all the concepts we discussed in earlier blogs still apply (sharding, replication, distributed storage, and distributed computing). But the infrastructure they run on and the way DS interactions happen have changed considerably from the traditional approach.
** More on this Later **