Solving mystery of the service discovery with Azure ACS DCOS – Part 1

I been currently working with Azure Container service, and was working with Mesosphere DCOS, Marathon and Mesos to design insanely scalable architecture of 10000 of nodes.

There is a nice tutorial on the Azure website, where it how to deploy the app on the DCOS cluster with Marathon Load balancer.

If you are new to DCOS, Marathon and Mesos, I recommend you to read my previous post which gives you the peep into Docker cluster world.

This post is Level 300, deep dive for people, who need to understand how does service discovery works in Mesosphere DCOS ecosystem.

What is Service Discovery?

Service discovery allows network communication between services.  In Mesos space, containers are known as services. So Service discovery would be knowing well known address of other containers running in the cluster.

There is another post, which I wrote on the Service discovery explained in layman’s term. Check it out.

In DCOS Mesos, this happen in two ways


Mesos DNS

Mesos-DNS is a stateless DNS server for Mesos. Contributed to open source by Mesosphere, it provides service discovery in datacenters or cloud environments managed by Mesos.

What Mesos DNS offer?

Mesos-DNS offers a service discovery system purposely built for Mesos. It allows applications and services running on Mesos to find each other with DNS, similarly to how services discover each other throughout the Internet. Applications launched by Marathon or Aurora are assigned names like search.marathon.mesos or log-aggregator.aurora.mesos. Mesos-DNS translates these names to the IP address and port on the machine currently running each application. To connect to an application in the Mesos datacenter, all you need to know is its name. Every time a connection is initiated, the DNS translation will point to the right machine in the datacenter.


How does it work?


Mesos-DNS periodically queries the Mesos master and retrieves the state of all running applications for all frameworks. It uses the latest state to generate DNS records that associate application names to machine IP addresses and ports. Mesos-DNS operates as the primary DNS server for the datacenter. It receives DNS requests from all machines, translates the names for Mesos applications, and forwards requests for external names, such as, to other DNS servers. The configuration of Mesos-DNS is minimal. You simply point it to the Mesos masters at launch. Frameworks do not need to communicate with Mesos-DNS at all. As the state of applications is updated by the Mesos master, the corresponding DNS records are automatically updated as well.
Mesos-DNS is simple and stateless. Unlike Consul and SkyDNS, it does not require consensus mechanisms, persistent storage, or a replicated log. This is possible because Mesos-DNS does not implement heartbeats, health monitoring, or lifetime management for applications. This functionality is already available by the Mesos master, slaves, and frameworks. Mesos-DNS builds on it by periodically retrieving the datacenter state from the master. Mesos-DNS can be made fault-tolerant by launching with a framework like Marathon, that can monitor application health and re-launch it on failures.
Mesos-DNS defines the DNS top-level domain .mesos for Mesos tasks that are running on DC/OS. Tasks and services are discovered by looking up A and, optionally, SRV records within this Mesos domain. To enumerate all the DNS records that Mesos-DNS will respond to, take a look at the DNS naming documentation

What is Marathon-LB

Marathon-LB is tool provide for containers launch via Marathon App in Mesos. LB stands for Load Balancer, which helps to dynamically add or removing containers from the load balancer running on various Mesos slaves

How Marathon-LB works?

Marathon-lb is based on HAProxy, a rapid proxy and load balancer. Real magic happens when Marathon-lb subscribes to Marathon’s event bus and updates the HAProxy configuration in real time.

It means any new container instantiates, it will add those new containers to load balancer pool automatically in a fraction of second and restart HAProxy with almost zero downtime to route traffic to new containers. Same goes, when container dies out.

Below is the architecture for Marathon-LB

Marathon Load Balancer

Marathon Load Balancer

You can read the nice documentation of Marathon-LB at Mesosphere blog.

Here below, is how Marathon-LB looks on Marathon Web UI Marathon Web UI

Next post would be more interesting about mystery solving of Azure ACS load balance app.


1 thought on “Solving mystery of the service discovery with Azure ACS DCOS – Part 1

  1. Pingback: Solving mystery of the service discovery with Azure ACS DCOS – Part 2 | Technical Insanity

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s