Topics Discussed:
- Linux OS fundamentals
- Why do we need containers?
- Building blocks for creating a container
- Basics of container networking
- Demo on creating a container from scratch
- Demo on container networking
- Routing
- Bridge
- Promiscuous mode
Linux OS fundamentals:
The session began by covering the fundamentals of the Linux operating system. Linux has two primary components: user space and kernel space. Applications run in user space and usually reach the kernel through libraries such as the C library. Whenever an application, or a library acting on its behalf, needs the kernel to do something (open a file, create a process, send a packet), it goes through the system call interface; these requests are called system calls.
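To make the user-space/kernel-space boundary concrete, here is a small Python sketch, assuming a Linux host. Functions such as os.getpid() and os.uname() are thin wrappers over the getpid and uname system calls, and calling getpid through the C library with ctypes ends up in the same system call, so both paths return the same value.

```python
import ctypes
import os

# CDLL(None) exposes the symbols already loaded into this process,
# which on Linux includes the C library.
libc = ctypes.CDLL(None)


def pid_via_libc() -> int:
    """Application -> C library wrapper -> getpid system call."""
    return libc.getpid()


def pid_via_os_module() -> int:
    """Python's os module -> getpid system call."""
    return os.getpid()


def hostname_via_uname() -> str:
    """os.uname() wraps the uname system call; nodename is the hostname,
    the value a UTS namespace lets a container change independently."""
    return os.uname().nodename


if __name__ == "__main__":
    print("pid via libc:", pid_via_libc())
    print("pid via os  :", pid_via_os_module())
    print("hostname    :", hostname_via_uname())
```

Tracing the script with strace (for example `strace -e trace=getpid,uname python3 <script>`) should show the underlying system calls.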
Why do we need containers?
- Process isolation
- Resource usage restriction
- Dependency management
- Lifecycle management
We can run applications directly on a system, where they run as processes. However, if multiple teams work on different applications, for example one team managing the mail application and another managing the web application, running them on the same system can cause problems. If one application has a bug or consumes excessive memory, it can degrade the other applications and even the entire system, potentially causing out-of-memory situations. A CPU-hungry application can starve the others in the same way. Additionally, on a system without isolation, even a normal user can view all running processes with the ‘ps’ command, which is a security risk.
These challenges highlight the need for containers. Containers provide process isolation, which enhances security and prevents applications from interfering with each other, and resource limits, which keep one application from starving the rest. Containers also package applications together with their dependencies into a single, self-contained unit called a container image, which simplifies dependency management and makes deployments consistent and reliable. Finally, containers can easily be started, stopped, updated, rolled back, and replicated, which gives flexibility in managing the application lifecycle.
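A quick way to see the visibility problem is to list what ps itself reads. This Python sketch (Linux only) walks /proc, which is where ps gets its process list. Run as an ordinary user on a host without PID isolation, it typically lists processes of every user and application on the machine, unless /proc is mounted with the hidepid option.

```python
import os


def visible_processes() -> dict[int, str]:
    """Return {pid: command name} for every process this user can see in /proc.

    This is the same data `ps` reads; without a PID namespace it covers the
    whole host, not just the caller's own application.
    """
    procs: dict[int, str] = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/comm") as f:
                procs[int(entry)] = f.read().strip()
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited mid-scan, or /proc restricts access
    return procs


if __name__ == "__main__":
    procs = visible_processes()
    print(f"{len(procs)} processes visible")
    for pid, name in sorted(procs.items())[:10]:
        print(pid, name)
```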
The Building Blocks for creating a container
- Namespaces
- Cgroups
- Linux capabilities
Namespaces: Think of namespaces as virtual compartments that isolate different parts of a system. Each compartment (namespace) keeps processes and resources separate from one another. The kernel provides several kinds of namespaces (mount, UTS, IPC, PID, network, user, cgroup, and time). One way to picture a namespace is as a virtual kernel inside the main kernel: every container still shares the single host kernel, but each sees only its own view of a resource. For example, the UTS namespace lets a container have its own hostname without affecting the host system. Similarly, a process that needs to talk to the outside world must be connected to a network, but once it is isolated in its own network namespace it can no longer see the host's network interfaces; the network namespace gives the container its own set of network interfaces instead. Together, namespaces ensure that containers operate in their own isolated environments.
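Each namespace a process belongs to is exposed as a symbolic link under /proc/<pid>/ns, and two processes share a namespace exactly when the links point to the same identifier. A Python sketch, assuming a Linux host:

```python
import os

# Namespace types present on most current kernels ("time" needs Linux 5.6+, so it is left out).
NAMESPACE_TYPES = ["mnt", "uts", "ipc", "pid", "net", "user", "cgroup"]


def namespaces_of(pid="self") -> dict[str, str]:
    """Map namespace type -> identifier, e.g. {'uts': 'uts:[4026531838]', ...}."""
    result = {}
    for ns in NAMESPACE_TYPES:
        try:
            result[ns] = os.readlink(f"/proc/{pid}/ns/{ns}")
        except (FileNotFoundError, PermissionError):
            continue  # missing on older kernels, or another user's process
    return result


def same_namespace(pid_a, pid_b, ns: str) -> bool:
    """True if both processes are in the same namespace of type `ns`."""
    a = namespaces_of(pid_a).get(ns)
    b = namespaces_of(pid_b).get(ns)
    return a is not None and a == b


if __name__ == "__main__":
    for ns, ident in namespaces_of().items():
        print(f"{ns:7} {ident}")
```

Starting a shell with `unshare --uts --net sh` (as root) and comparing /proc/<PID>/ns/uts of that shell with the host's would show a different identifier for the UTS and network namespaces only.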
Cgroups: Resource usage is controlled through cgroups (control groups). We can attach processes to a cgroup. Suppose a system has 20 GB of memory and we create a cgroup that is allowed to use only 12 GB: all processes attached to that cgroup can then use at most 12 GB combined, not more. CPU can be assigned in the same way, even in fractions of a CPU. In Kubernetes, the CPU and memory requests and limits we set on containers are ultimately translated (by the kubelet and the container runtime) into cgroup settings. Cgroups ensure fair sharing and efficient resource usage. The purposes of cgroups are:
- Resource Limiting: We can limit the resource usage of the container from the host system.
- Prioritizing: Containers with higher priorities will receive more resources compared to those with lower priorities.
- Accounting: Cgroups provide mechanisms for tracking and monitoring resource usage by containers. You can view resource utilization statistics and metrics for individual containers or groups of containers.
- Control/freeze: Cgroups can also freeze (pause) all processes in a group and resume them later, for example around batch jobs.
However, we are not currently using cgroups to freeze containers.
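To make the translation into cgroup settings concrete, here is a Python sketch of the arithmetic, assuming cgroup v2 file names (memory.max, cpu.max) and the default 100 ms CFS period. It only builds the file contents; applying them would mean writing these values (as root) into a cgroup directory under /sys/fs/cgroup and then writing PIDs into that directory's cgroup.procs. The request-to-cpu.shares mapping (1024 shares per CPU, cgroup v1) is the conventional one; details can vary by runtime and cgroup version.

```python
CFS_PERIOD_US = 100_000  # default CFS period (100 ms), assumed here


def memory_max(gib: float) -> str:
    """memory.max value in bytes for a limit given in GiB."""
    return str(int(gib * 1024**3))


def cpu_max(millicores: int, period_us: int = CFS_PERIOD_US) -> str:
    """cpu.max value 'quota period' for a CPU limit in millicores.

    500m (half a CPU) -> the group may run 50 ms out of every 100 ms.
    """
    quota_us = millicores * period_us // 1000
    return f"{quota_us} {period_us}"


def cpu_shares_v1(millicores: int) -> int:
    """cgroup v1 cpu.shares for a CPU request: 1024 per full CPU, minimum 2."""
    return max(2, millicores * 1024 // 1000)


def cgroup_files(mem_limit_gib: float, cpu_limit_millicores: int) -> dict[str, str]:
    """Contents to write into a (hypothetical) cgroup v2 directory."""
    return {
        "memory.max": memory_max(mem_limit_gib),
        "cpu.max": cpu_max(cpu_limit_millicores),
    }


if __name__ == "__main__":
    # The example from the session (a 12 GB cap), plus half a CPU.
    for name, value in cgroup_files(12, 500).items():
        print(f"{name}: {value}")
    # memory.max: 12884901888
    # cpu.max: 50000 100000
```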
Capabilities: Capabilities define the privileges, or powers, that a process or container has. By default, containers get only a limited set of capabilities for security reasons. Privileged containers, however, get all capabilities and can do almost anything on the system, which is why running privileged containers is not recommended. Capabilities let us restrict what a container is allowed to do, which helps maintain security and prevents unwanted actions.
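Capabilities are visible per process as hexadecimal bitmasks in /proc/<pid>/status (CapEff is the effective set). This Python sketch decodes such a mask into names using the kernel's capability numbering (from linux/capability.h, up to CAP_CHECKPOINT_RESTORE). The sample mask 00000000a80425fb is the one commonly reported inside a default, non-privileged Docker container; a privileged container has every supported bit set.

```python
# Index = capability bit number, as defined in linux/capability.h.
CAPABILITY_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(hex_mask: str) -> list[str]:
    """Turn a mask such as '00000000a80425fb' into capability names."""
    mask = int(hex_mask, 16)
    names = []
    for bit in range(mask.bit_length()):
        if (mask >> bit) & 1:
            # Bits newer than this table are reported by number.
            names.append(CAPABILITY_NAMES[bit] if bit < len(CAPABILITY_NAMES) else f"CAP_{bit}")
    return names


def effective_caps(pid="self") -> list[str]:
    """Decode the CapEff line of /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("CapEff:"):
                return decode_caps(line.split()[1])
    return []


if __name__ == "__main__":
    print(decode_caps("00000000a80425fb"))
    print("this process:", effective_caps())
```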
Demo: Container from scratch:
In the demo session, a container was created from scratch using the “unshare” command. For the exercise, two virtual nodes (VMs) were set up and a container was created on each. The unshare command creates the namespaces required to isolate the container. Before running unshare, we need to prepare a root file system for the container, download the busybox binary into it, and create symlinks for the busybox commands on both nodes. These symlinks do not work yet, because the directory holding them is not the root directory, so the paths they point to are resolved against the host's root. Once the container is started with the current directory as its root directory, isolated from the host with namespaces, the symlinks work. The chroot command is what makes that directory the root directory. The steps followed are listed in the GitHub link below.
https://github.com/ansilh/Talks/blob/main/devops-malayalam-25may2023/bridging.md
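Why the symlinks only work after chroot can be reproduced without root privileges. This Python sketch builds a tiny root file system in a temporary directory, with a placeholder file standing in for the downloaded busybox binary and a hypothetical subset of command symlinks pointing at the absolute path /bin/busybox (the real steps are in the linked repository). resolve_in_root() is a simplified, hypothetical helper that mimics how such a link resolves once the directory has become / via chroot; from the host, the same link points at the host's /bin/busybox instead.

```python
import os
import tempfile

# Hypothetical subset of busybox commands; the real list is much longer.
APPLETS = ["sh", "ls", "ps", "ip", "hostname", "mount"]


def build_rootfs(root: str) -> None:
    """Create a minimal root file system layout under `root`."""
    for d in ("bin", "proc", "sys", "etc", "tmp"):
        os.makedirs(os.path.join(root, d), exist_ok=True)
    # Placeholder standing in for the downloaded static busybox binary.
    with open(os.path.join(root, "bin", "busybox"), "wb") as f:
        f.write(b"placeholder, not a real binary")
    for applet in APPLETS:
        link = os.path.join(root, "bin", applet)
        if not os.path.lexists(link):
            # Absolute target: correct only once `root` has become "/" via chroot.
            os.symlink("/bin/busybox", link)


def resolve_in_root(root: str, path: str) -> str:
    """Simplified view of how `path` resolves after chroot(root).

    Follows a single symlink level; absolute targets are re-anchored at `root`,
    which is what chroot effectively does for every path lookup.
    """
    host_path = os.path.join(root, path.lstrip("/"))
    if not os.path.islink(host_path):
        return host_path
    target = os.readlink(host_path)
    if os.path.isabs(target):
        return os.path.join(root, target.lstrip("/"))
    return os.path.join(os.path.dirname(host_path), target)


if __name__ == "__main__":
    root = tempfile.mkdtemp(prefix="rootfs-")
    build_rootfs(root)
    sh = os.path.join(root, "bin", "sh")
    print("link target       :", os.readlink(sh))
    print("from the host     :", os.path.realpath(sh))  # somewhere on the host, not inside root
    print("after chroot(root):", resolve_in_root(root, "/bin/sh"))
```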
The discussion on container networking started with the veth pair: two virtual interfaces joined by a virtual cable form a veth pair. One end of the pair is placed inside the container's namespace, and the other end stays on the host system.
The networking demo started by creating a veth pair with the ip link command:
#ip link add vethlocal type veth peer name vethNS
Here vethlocal and vethNS are the two virtual interfaces. The vethNS end needs to be connected to the container, that is, moved into the container's network namespace. To do that, we first get the PID of the container's shell process from the host system and then move vethNS into that process's namespace:
>>ps -ef | grep '/bin/sh'
>>ip link set vethNS netns <PID>
Initially, running the “ip a” command inside the container shows no interfaces other than loopback, because the container has its own network namespace. Once vethNS has been moved in, “ip a” shows it from inside the container as well. Next, assign an IP address to the interface from inside the container:
>>ip addr add 10.5.19.10/24 dev vethNS
Then bring up the loopback interface and vethNS:
>>ip link set lo up
>>ip link set dev vethNS up
On the host system, bring up the vethlocal interface as well:
>>ip link set dev vethlocal up
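The same sequence can be scripted from the host. This Python sketch assumes iproute2 and util-linux's nsenter are installed: instead of typing the container-side commands in the container's shell, it runs them from the host with `nsenter -t <PID> -n`, which enters only that process's network namespace. The default names and address are the ones used in the demo; the PID in the usage line is a placeholder. By default it only prints the commands (dry run); executing them requires root.

```python
import shlex
import subprocess


def veth_setup_commands(container_pid: int, address: str = "10.5.19.10/24",
                        host_if: str = "vethlocal", ns_if: str = "vethNS") -> list[list[str]]:
    """The ip commands from the demo, in order, all issued from the host.

    Container-side commands are prefixed with nsenter so they execute inside
    the container's network namespace.
    """
    inside = ["nsenter", "-t", str(container_pid), "-n"]
    return [
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", ns_if],
        ["ip", "link", "set", ns_if, "netns", str(container_pid)],
        inside + ["ip", "addr", "add", address, "dev", ns_if],
        inside + ["ip", "link", "set", "lo", "up"],
        inside + ["ip", "link", "set", "dev", ns_if, "up"],
        ["ip", "link", "set", "dev", host_if, "up"],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False (needs root)."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # 12345 is a placeholder: use the PID found with ps -ef | grep '/bin/sh'.
    run(veth_setup_commands(12345))
```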
To route packets to the outside network, we need to create a virtual bridge. (A bridge acts like a virtual layer-2 switch: it simply forwards Ethernet frames between the interfaces attached to it, without doing any IP routing itself.)
There are several ways to connect containers to the outside network, such as routing, bridging, and VXLAN; this session covered routing and bridging.
Routing:
In the demo so far, we have created a veth pair, and with this configuration packets from the container can now reach vethlocal. From vethlocal, however, they still need to be sent out through enp0s3 (shown as dotted lines in the figure), and we create that link with a virtual bridge. Routing needs a gateway, so we first create a virtual bridge, assign an IP address to it, and use that address as the gateway. We then connect vethlocal to the bridge, so packets can travel from the container to the bridge and then outside. The commands for creating the bridge and assigning its IP address are in the link below. The same steps, with a different subnet, are performed on node 2 as well.
After that, we set up a default gateway in the containers, pointing to the bridge's IP address.
https://github.com/ansilh/Talks/blob/main/devops-malayalam-25may2023/routing.md
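What the routing setup achieves can be modelled as a longest-prefix-match route lookup. In this Python sketch, only the node 1 container subnet 10.5.19.0/24 comes from the demo; the bridge address 10.5.19.1, the node 2 container subnet 10.5.20.0/24, and the 192.168.56.x node and router addresses are hypothetical. The route on node 1 towards node 2's container subnet is likewise an assumption about how the two nodes reach each other's containers.

```python
import ipaddress

# (prefix, next hop or None for directly connected, outgoing interface)
# Only 10.5.19.0/24 comes from the demo; every other value is hypothetical.
CONTAINER_ROUTES = [
    ("10.5.19.0/24", None, "vethNS"),              # container's own subnet
    ("0.0.0.0/0", "10.5.19.1", "vethNS"),          # default gateway = bridge IP
]
NODE1_ROUTES = [
    ("10.5.19.0/24", None, "cbr0"),                # local containers behind the bridge
    ("192.168.56.0/24", None, "enp0s3"),           # network shared by node 1 and node 2
    ("10.5.20.0/24", "192.168.56.12", "enp0s3"),   # node 2's containers, via node 2
    ("0.0.0.0/0", "192.168.56.1", "enp0s3"),       # everything else, via a router
]


def lookup(routes, destination: str):
    """Return (next hop, interface) by longest-prefix match.

    Simplified: no metrics, no policy routing. For a directly connected
    route the next hop is the destination itself.
    """
    dst = ipaddress.ip_address(destination)
    best = None
    for prefix, via, dev in routes:
        net = ipaddress.ip_network(prefix)
        if dst in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, via, dev)
    if best is None:
        raise LookupError(f"no route to {destination}")
    _, via, dev = best
    return (via or destination, dev)


if __name__ == "__main__":
    print(lookup(CONTAINER_ROUTES, "10.5.20.7"))  # container -> bridge gateway
    print(lookup(NODE1_ROUTES, "10.5.20.7"))      # node 1 -> node 2
```

On real hosts the equivalent entries would be added with commands such as `ip route add default via 10.5.19.1` inside the container and `ip route add 10.5.20.0/24 via 192.168.56.12` on node 1 (addresses hypothetical, as above).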
For Linux to forward packets between interfaces, packet forwarding must also be enabled in the kernel on both nodes.
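Forwarding is controlled by a kernel parameter exposed at /proc/sys/net/ipv4/ip_forward (the same setting as sysctl net.ipv4.ip_forward). A small Python sketch to check and enable it on a Linux node; writing requires root, and the change does not survive a reboot unless it is also added to the sysctl configuration.

```python
IP_FORWARD = "/proc/sys/net/ipv4/ip_forward"  # same knob as sysctl net.ipv4.ip_forward


def ip_forward_enabled(path: str = IP_FORWARD) -> bool:
    """True if the kernel forwards IPv4 packets between interfaces."""
    with open(path) as f:
        return f.read().strip() == "1"


def enable_ip_forward(path: str = IP_FORWARD) -> None:
    """Equivalent to `sysctl -w net.ipv4.ip_forward=1`; requires root, not persistent."""
    with open(path, "w") as f:
        f.write("1\n")


if __name__ == "__main__":
    print("ip_forward enabled:", ip_forward_enabled())
```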
Bridging:
https://github.com/ansilh/Talks/blob/main/devops-malayalam-25may2023/brid
In this setup, a physical interface is additionally connected to the bridge. Packets then move directly through this physical interface, so no routing is needed. Here too, the container is started with the unshare command, and the other steps for creating the veth pair are followed on both nodes.
As shown in the figure, we first create a veth pair, with the vethNS end moved into the container and vethlocal connected to the bridge. Here, instead of connecting cbr0 to enp0s3, the bridge is connected directly to a free physical interface, enp0s8.
There is no need for a gateway in this case, because both nodes are in the same network and bridging works within that subnet. The limitation is that a single subnet has a limited number of IP addresses, so only that many containers can be created. Routing, on the other hand, can connect different subnets as well, so it allows a larger IP space.
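The IP-space limitation is simple arithmetic. This Python sketch counts the addresses left for containers in an IPv4 subnet after setting some aside; the subnets and the number of reserved addresses in the usage lines are hypothetical (for bridging, one shared /24 with two node IPs and one router reserved; for routing, one /24 per node with one address per node taken by the bridge gateway).

```python
import ipaddress


def container_capacity(subnet: str, reserved: int) -> int:
    """Addresses left for containers in an IPv4 `subnet` after setting aside
    `reserved` addresses (node IPs, bridge/gateway, routers, ...).

    The network and broadcast addresses are not usable, except in /31 and /32
    subnets, which are counted as fully usable here.
    """
    net = ipaddress.IPv4Network(subnet)
    usable = net.num_addresses - 2 if net.prefixlen < 31 else net.num_addresses
    return max(0, usable - reserved)


if __name__ == "__main__":
    print("bridged, one shared /24:", container_capacity("192.168.56.0/24", reserved=3))   # 251
    print("routed, two /24 subnets:", 2 * container_capacity("10.5.19.0/24", reserved=1))  # 506
```

Adding a node to the routed setup adds another subnet, and therefore more container addresses, whereas the bridged setup stays capped by its single subnet.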
Promiscuous Mode:
By default, a network interface on Linux does not pass packets that are not destined for it up to the kernel. This is a property of the network interface: the NIC checks the destination MAC address in each packet and passes the packet to the kernel only if the MAC address matches its own (or is the broadcast address). Such packets are not even visible in tcpdump, because they are filtered out at the network interface before entering the system.
Promiscuous mode is a network interface mode that allows a network device to capture and receive all network traffic on a network segment, regardless of the destination MAC (Media Access Control) address. In promiscuous mode, the network interface card (NIC) does not filter or discard packets based on their destination address. This was explained in the session because a container running on the host has its own MAC address, different from the host interface's, so packets destined for the container can be dropped at the NIC if promiscuous mode is not enabled during setup. The drawback is that the kernel then has to process all packets. In this setup, bridging works only if promiscuous mode is enabled on the interface.
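The filtering decision can be modelled in a few lines of Python. This is a simplified, hypothetical model (real NICs also accept multicast addresses they have subscribed to), plus a helper that reads the IFF_PROMISC flag (0x100) that Linux exposes in /sys/class/net/<interface>/flags. The MAC addresses in the usage lines are made up. On a real host, promiscuous mode would be switched on with `ip link set dev enp0s8 promisc on`.

```python
IFF_PROMISC = 0x100  # interface flag from linux/if.h
BROADCAST = "ff:ff:ff:ff:ff:ff"


def nic_accepts(dst_mac: str, own_mac: str, promiscuous: bool) -> bool:
    """Simplified receive filter: own MAC, broadcast, or anything when promiscuous.

    Multicast group handling is omitted.
    """
    dst = dst_mac.lower()
    return promiscuous or dst == BROADCAST or dst == own_mac.lower()


def is_promiscuous(ifname: str) -> bool:
    """Read the interface flags the kernel exposes in sysfs (hex, e.g. 0x1103)."""
    with open(f"/sys/class/net/{ifname}/flags") as f:
        return bool(int(f.read().strip(), 16) & IFF_PROMISC)


if __name__ == "__main__":
    host_mac = "08:00:27:aa:bb:cc"       # hypothetical host NIC
    container_mac = "02:42:0a:05:13:0a"  # hypothetical container veth
    print(nic_accepts(container_mac, host_mac, promiscuous=False))  # False: dropped at the NIC
    print(nic_accepts(container_mac, host_mac, promiscuous=True))   # True: passed to the kernel
    print("lo promiscuous:", is_promiscuous("lo"))
```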
Extra Info:
⇒ strace: shows what happens in the background when we execute a command, including every system call involved.
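⇒ ARP packets are exchanged before the actual ICMP packets reach the destination, because the sender must first resolve the destination's MAC address; keeping this in mind makes network troubleshooting easier. For example, to watch ARP traffic on the bridge:
>>tcpdump -nnni cbr0 arp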
Before sending to an IP address on the local network, the source needs to know the destination's MAC address, which it learns through ARP. An ARP request is broadcast to all devices on the network, and only the device with the matching address responds. ARP also has a cache with a timeout: once an address is resolved it is cached, which avoids sending redundant requests for the same destination.
>>arp -nnv
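The caching behaviour can be sketched as a toy model in Python. It is not how the kernel's neighbour table works internally (Linux tracks reachability states rather than a single fixed timeout; `arp -nnv` shows that real table). `network` stands in for the hosts that would answer a broadcast, and the addresses and 30-second timeout in the usage are hypothetical.

```python
import time
from typing import Callable, Optional


class ArpCache:
    """Toy model of ARP resolution with entries that expire after `timeout` seconds."""

    def __init__(self, network: dict[str, str], timeout: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.network = network          # hosts that would answer a broadcast: IP -> MAC
        self.timeout = timeout
        self.clock = clock
        self.entries: dict[str, tuple[str, float]] = {}
        self.requests_sent = 0          # number of "who has <ip>?" broadcasts

    def resolve(self, ip: str) -> Optional[str]:
        now = self.clock()
        cached = self.entries.get(ip)
        if cached is not None and now - cached[1] < self.timeout:
            return cached[0]            # cache hit: nothing is sent on the wire
        self.requests_sent += 1         # broadcast to every device on the segment
        mac = self.network.get(ip)      # only the owner of `ip` replies
        if mac is not None:
            self.entries[ip] = (mac, now)
        return mac


if __name__ == "__main__":
    cache = ArpCache({"10.5.19.20": "02:00:00:00:00:14"}, timeout=30)
    cache.resolve("10.5.19.20")
    cache.resolve("10.5.19.20")
    print("broadcasts sent:", cache.requests_sent)  # 1: the second lookup was a cache hit
```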