
TikTok Interview Preparation

HR Interview

Q: Why did you apply, and what is your current status? A: I'm currently working as a Site Reliability Engineer at Shopee, and have been for around two years. Before taking the SRE role, I spent my coding career as a full-stack engineer. On the frontend I have experience building with React and Vue, mostly in TypeScript these days. On the backend I mainly use two languages: Golang and Python.

Technical Skill

Q: What is SRE? A: SRE is essentially applying software engineering skills to keep systems running smoothly and reliably. Beyond that, I think an SRE is also responsible for maintaining a service's SLA metrics and for acting as the first line of response when an incident happens, until it is resolved.

Q: Have you ever handled an incident before? How do you handle it? Is there an SOP? A: During my time at Shopee I encountered multiple incidents, from small ones to a big one that caused server downtime, but my team always maintains an SOP for any incident that happens. In short, our SOP focuses on service recovery first. For example, suppose an incident causes one of my services' success rate to drop below 95% for more than five minutes. The usual first step is to check Grafana for anomalous metrics; roughly 80% of incidents can be detected there. After identifying the anomalous metrics, I go to both the log platform and Jaeger to trace which endpoint is affected and whether the issue is causing downtime. For root cause identification, the usual culprits are middleware failures (DB, cache, MQ), third-party failures (fraud service, game platform), or insufficient compute resources causing OOM kills.
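To make the detection step concrete, here is a minimal Python sketch of the "success rate below 95% for more than 5 minutes" alert condition from the answer above. This is not real alerting tooling (in practice this would be a Grafana/Prometheus alert rule); it simply assumes one success-rate sample per minute:

```python
def breached(samples, threshold=0.95, window=5):
    """Return True if the success rate stayed below `threshold`
    for the last `window` consecutive samples (one per minute)."""
    recent = list(samples)[-window:]
    return len(recent) == window and all(s < threshold for s in recent)

# Five straight minutes under 95% -> page the on-call
print(breached([0.99, 0.94, 0.93, 0.92, 0.91, 0.90]))  # True
# A brief dip does not fire the alert
print(breached([0.99, 0.93, 0.99, 0.99, 0.99, 0.99]))  # False
```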

Q: Design a monitoring system for TikTok A: Okay, let's break this down. A monitoring system for TikTok needs to cover a lot of ground, given the scale and complexity. Here's a possible design:

  1. Metrics Collection:

    • Infrastructure Metrics: CPU usage, memory utilization, disk I/O, network traffic for all servers (using tools like Prometheus with node_exporter).
    • Application Metrics:
      • Request latency, error rates, request volume for all APIs (using Prometheus and application-level instrumentation).
      • Database query performance (query execution time, number of queries, slow query logs).
      • Cache hit/miss ratios (Redis, Memcached).
      • Background job processing time and error rates.
    • User Experience Metrics:
      • Video upload time, video playback start time, buffering rate (collected from the client-side).
      • Login success/failure rates.
      • Search query latency and success rates.
      • Feed load times.
    • Business Metrics:
      • Active users (DAU/MAU).
      • Video views, likes, shares, comments.
      • Ad impressions, click-through rates.
  2. Data Aggregation and Storage:

    • Use a time-series database like Prometheus or VictoriaMetrics to store metrics.
    • Aggregate logs using tools like Elasticsearch, Loki, or Graylog.
    • Consider using a distributed tracing system like Jaeger or Zipkin to trace requests across services.
  3. Visualization and Alerting:

    • Use Grafana to create dashboards for visualizing metrics.
    • Set up alerts in Prometheus Alertmanager or Grafana to notify on-call engineers of issues. Alerts should be based on SLOs (Service Level Objectives). Examples:
      • API error rate > 1%
      • Video playback start time > 2 seconds
      • 99th percentile latency > 500ms
  4. Specific Monitoring Points:

    • Content Delivery Network (CDN): Monitor CDN performance (cache hit ratio, latency, bandwidth).
    • Recommendation System: Monitor the performance of the recommendation algorithms (click-through rate, engagement metrics).
    • Search Infrastructure: Monitor search query latency and relevance.
    • Live Streaming: Monitor live streaming quality (buffering, latency).
    • Security: Monitor for suspicious activity (failed login attempts, unusual traffic patterns).
  5. Tools:

    • Prometheus: For collecting and storing metrics.
    • Grafana: For visualization and dashboards.
    • Alertmanager: For alerting.
    • Elasticsearch/Loki/Graylog: For log aggregation and analysis.
    • Jaeger/Zipkin: For distributed tracing.
    • Kafka: For streaming data.
  6. Key Metrics to Watch:

    • Error Rates: Track error rates for all APIs and services.
    • Latency: Monitor request latency at various points in the system.
    • Saturation: Monitor resource utilization (CPU, memory, disk) to identify potential bottlenecks.
    • Traffic: Monitor network traffic to detect anomalies.

This is a high-level overview, of course. The specific implementation would depend on TikTok's architecture and specific requirements.
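As an illustration of the application metrics and key signals above, here is a toy in-process metrics store in Python. It is a sketch only: a real deployment would use prometheus_client plus node_exporter as mentioned, and the percentile calculation here is deliberately naive:

```python
from collections import defaultdict

class Metrics:
    """Toy metrics store for the signals above: traffic, errors, latency."""
    def __init__(self):
        self.counters = defaultdict(int)  # traffic and error counts
        self.latencies = []               # per-request latency samples (seconds)

    def observe_request(self, latency_s, ok=True):
        self.counters["requests_total"] += 1
        if not ok:
            self.counters["errors_total"] += 1
        self.latencies.append(latency_s)

    def error_rate(self):
        total = self.counters["requests_total"]
        return self.counters["errors_total"] / total if total else 0.0

    def p99_latency(self):
        """Naive nearest-rank p99; real systems use histograms/quantile sketches."""
        s = sorted(self.latencies)
        return s[min(len(s) - 1, int(len(s) * 0.99))] if s else 0.0

m = Metrics()
m.observe_request(0.120)
m.observe_request(0.450, ok=False)
# An SLO alert like "API error rate > 1%" is then just a comparison:
print(m.error_rate() > 0.01)  # True
```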

OS

Q: What is the OSI Model? A: The OSI (Open Systems Interconnection) model is a conceptual framework that describes how data travels across a network. It divides the process into seven layers, each with specific functions:

  1. Physical Layer: Deals with the physical cables or wireless signals. It transmits raw data bits.
  2. Data Link Layer: Handles error-free transmission between two directly connected nodes. It uses MAC addresses.
  3. Network Layer: Determines the best path for data packets to travel from source to destination. IP addresses are used here.
  4. Transport Layer: Provides reliable or unreliable data delivery between processes. TCP and UDP operate at this layer.
  5. Session Layer: Manages connections between applications.
  6. Presentation Layer: Handles data formatting, encryption, and decryption.
  7. Application Layer: Provides network services to applications, like HTTP, DNS, and SMTP.

Network Concepts

Q: Describe the structure of URL A: A URL (Uniform Resource Locator) is a string of characters that provides a standardized way to locate a resource on the web. It's composed of several parts:

  1. Scheme: Indicates the protocol used to access the resource (e.g., http, https, ftp).
  2. Authority: Contains the following components:
    • User Info (optional): Username and password (e.g., username:password@). Discouraged for security reasons.
    • Host: The domain name or IP address of the server hosting the resource (e.g., www.example.com).
    • Port (optional): The port number used to connect to the server (e.g., :8080). If not specified, the default port for the scheme is used (e.g., 80 for HTTP, 443 for HTTPS).
  3. Path: Specifies the location of the resource on the server (e.g., /path/to/resource.html).
  4. Query (optional): Contains parameters passed to the server (e.g., ?param1=value1&param2=value2).
  5. Fragment (optional): Identifies a specific part or section within the resource (e.g., #section-name). Used by the browser to jump to that section.

Example: https://www.example.com:8080/path/to/resource.html?param1=value1&param2=value2#section-name
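The breakdown above can be verified with Python's standard-library URL parser, run against the same example URL:

```python
from urllib.parse import urlsplit, parse_qs

url = "https://www.example.com:8080/path/to/resource.html?param1=value1&param2=value2#section-name"
parts = urlsplit(url)

print(parts.scheme)           # https            (1. Scheme)
print(parts.hostname)         # www.example.com  (2. Host)
print(parts.port)             # 8080             (2. Port)
print(parts.path)             # /path/to/resource.html  (3. Path)
print(parse_qs(parts.query))  # {'param1': ['value1'], 'param2': ['value2']}  (4. Query)
print(parts.fragment)         # section-name     (5. Fragment)
```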

Q: HTTP & HTTPS, TCP/IP Protocols A:

HTTP (Hypertext Transfer Protocol) and HTTPS (HTTP Secure) are protocols used for transferring data over the internet. TCP/IP (Transmission Control Protocol/Internet Protocol) is a suite of communication protocols used to interconnect network devices on the internet. Let's break them down:

HTTP:

  • Definition: HTTP is the foundation of data communication on the web. It's an application-layer protocol that defines how messages are formatted and transmitted.
  • Functionality: It enables communication between web browsers and web servers. When you type a URL into your browser, it sends an HTTP request to the server. The server then responds with the requested data, which the browser displays.
  • Port: By default, HTTP uses port 80.
  • Security: HTTP is not encrypted, meaning data transmitted over HTTP can be intercepted and read by attackers.

HTTPS:

  • Definition: HTTPS is the secure version of HTTP. It adds a layer of security by encrypting the data transmitted between the browser and the server.
  • Functionality: HTTPS uses SSL/TLS (Secure Sockets Layer/Transport Layer Security) to encrypt the communication. This ensures that even if the data is intercepted, it cannot be read without the decryption key.
  • Port: By default, HTTPS uses port 443.
  • Security: HTTPS provides confidentiality, integrity, and authentication, making it much more secure than HTTP.

TCP/IP:

  • Definition: TCP/IP is a suite of protocols that govern how data is transmitted over the internet. It's the fundamental protocol suite that enables communication between devices on a network.
  • Layers: The TCP/IP model consists of four layers:
    1. Application Layer: Provides network services to applications (e.g., HTTP, FTP, SMTP).
    2. Transport Layer: Provides reliable or unreliable data delivery between processes (e.g., TCP, UDP).
    3. Internet Layer: Handles addressing and routing of data packets (e.g., IP).
    4. Network Access Layer: Handles physical transmission of data (e.g., Ethernet, Wi-Fi).
  • TCP (Transmission Control Protocol):
    • Provides reliable, ordered, and error-checked delivery of data.
    • Uses a three-way handshake to establish a connection and a four-way handshake to terminate it.
    • Suitable for applications that require reliable data transmission (e.g., web browsing, email).
  • UDP (User Datagram Protocol):
    • Provides unreliable, unordered delivery of data.
    • Does not establish a connection before transmitting data.
    • Suitable for applications that can tolerate some data loss (e.g., streaming, online gaming).
  • IP (Internet Protocol):
    • Handles addressing and routing of data packets between networks.
    • Each device on the internet is assigned a unique IP address.

In summary, HTTP and HTTPS are application-layer protocols used for web communication, while TCP/IP is a suite of protocols that governs how data is transmitted over the internet. HTTPS provides a secure version of HTTP by encrypting the data using SSL/TLS. TCP provides reliable data transmission, while UDP provides unreliable data transmission.
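The TCP-vs-UDP contrast can be demonstrated with Python's standard socket module. The TCP connect() below is what triggers the kernel's three-way handshake, while the UDP client simply fires a datagram with no connection setup (the loopback addresses and messages are arbitrary):

```python
import socket
import threading

# TCP: connection-oriented, reliable byte stream (SOCK_STREAM)
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))  # port 0 = let the OS pick a free port
srv.listen(1)
port = srv.getsockname()[1]

def serve():
    conn, _ = srv.accept()
    conn.sendall(conn.recv(1024))  # echo back whatever arrives
    conn.close()

t = threading.Thread(target=serve)
t.start()

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("127.0.0.1", port))  # kernel performs SYN / SYN-ACK / ACK here
cli.sendall(b"hello over tcp")
tcp_reply = cli.recv(1024)
cli.close()
t.join()
srv.close()

# UDP: connectionless datagrams, no handshake, no delivery guarantee (SOCK_DGRAM)
udp_srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_srv.bind(("127.0.0.1", 0))
udp_port = udp_srv.getsockname()[1]

udp_cli = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_cli.sendto(b"hello over udp", ("127.0.0.1", udp_port))  # no connect() needed
udp_reply, _ = udp_srv.recvfrom(1024)
udp_srv.close()
udp_cli.close()
```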

Q: Explain the three-way handshake (connection establishment) and the four-way handshake (connection termination) in TCP A:

The Transmission Control Protocol (TCP) uses a three-way handshake to establish a connection between a client and a server, and a four-way handshake to terminate the connection.

Three-Way Handshake (Connection Establishment):

  1. SYN (Synchronize):
    • The client sends a SYN packet to the server, indicating that it wants to establish a connection.
    • This packet includes the client's initial sequence number (ISN), which is a random number used to track the data flow.
  2. SYN-ACK (Synchronize-Acknowledge):
    • The server responds with a SYN-ACK packet, acknowledging the client's SYN packet and also indicating its own initial sequence number (ISN).
    • This packet includes both the server's SYN and an acknowledgment of the client's SYN (ACK).
  3. ACK (Acknowledge):
    • The client sends an ACK packet back to the server, acknowledging the server's SYN-ACK packet.
    • This completes the three-way handshake, and the TCP connection is established.

Four-Way Handshake (Connection Termination):

  1. FIN (Finish):
    • The client sends a FIN packet to the server, indicating that it wants to close the connection.
    • This means the client has no more data to send.
  2. ACK (Acknowledge):
    • The server sends an ACK packet back to the client, acknowledging the client's FIN packet.
    • This indicates that the server has received the request to close the connection.
  3. FIN (Finish):
    • The server sends a FIN packet to the client, indicating that it also wants to close the connection.
    • This means the server has no more data to send.
  4. ACK (Acknowledge):
    • The client sends an ACK packet back to the server, acknowledging the server's FIN packet.
    • This completes the four-way handshake, and the TCP connection is terminated.
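A toy model of the sequence and acknowledgement numbers exchanged during the three-way handshake helps make the ISN bookkeeping concrete. The ISNs below are made up; real ones are randomized by the kernel:

```python
# Hypothetical initial sequence numbers (real ISNs are random)
client_isn, server_isn = 1000, 5000

syn     = {"flags": "SYN",     "seq": client_isn}
syn_ack = {"flags": "SYN-ACK", "seq": server_isn, "ack": syn["seq"] + 1}
ack     = {"flags": "ACK",     "seq": client_isn + 1, "ack": syn_ack["seq"] + 1}

# Each ACK number is "the next byte I expect", i.e. the peer's seq + 1
print(syn_ack["ack"])  # 1001 (acknowledges the client's SYN)
print(ack["ack"])      # 5001 (acknowledges the server's SYN)
```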

Q: What happens when you visit a website? A:

Okay, let's break down what happens when you visit a website, step by step:

  1. Enter URL: You type a URL (e.g., www.example.com) into your web browser's address bar and press Enter.

  2. DNS Resolution:

    • The browser needs to find the IP address associated with the domain name (www.example.com).
    • It first checks its local cache, then the operating system's cache.
    • If not found, it queries a DNS (Domain Name System) server, starting with a recursive DNS resolver (usually provided by your ISP).
    • The resolver may query root servers, top-level domain (TLD) servers (e.g., .com), and finally the authoritative name server for example.com to get the IP address.
    • The DNS resolver returns the IP address to your browser.
  3. TCP Connection:

    • The browser initiates a TCP connection with the web server at the obtained IP address, typically on port 80 (for HTTP) or 443 (for HTTPS).
    • This involves a three-way handshake:
      • SYN: The browser sends a SYN (synchronize) packet to the server.
      • SYN-ACK: The server responds with a SYN-ACK (synchronize-acknowledge) packet.
      • ACK: The browser sends an ACK (acknowledge) packet to the server.
  4. HTTP/HTTPS Request:

    • Once the TCP connection is established, the browser sends an HTTP or HTTPS request to the server.
    • For example:
      GET /index.html HTTP/1.1
      Host: www.example.com
      User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...
    • If it's an HTTPS request, the browser and server negotiate a secure connection using SSL/TLS before sending the HTTP request. This involves exchanging certificates and establishing encryption keys.
  5. Server Processing:

    • The web server receives the request and processes it.
    • It may retrieve the requested resource (e.g., index.html) from its file system or generate it dynamically.
    • The server may also interact with databases or other backend systems to fulfill the request.
  6. HTTP/HTTPS Response:

    • The server sends an HTTP or HTTPS response back to the browser.

    • For example:

      HTTP/1.1 200 OK
      Content-Type: text/html
      Content-Length: 1234

      <!DOCTYPE html>
      <html>
      <head>
      <title>Example Website</title>
      </head>
      <body>
      <h1>Hello, world!</h1>
      </body>
      </html>
    • The response includes a status code (e.g., 200 OK), headers (e.g., Content-Type), and the content of the requested resource (e.g., HTML).

  7. Browser Rendering:

    • The browser receives the HTTP/HTTPS response and starts rendering the web page.
    • It parses the HTML, CSS, and JavaScript code.
    • It fetches any additional resources (e.g., images, stylesheets, scripts) specified in the HTML.
    • It executes the JavaScript code, which may modify the DOM (Document Object Model) and make further requests to the server.
    • Finally, the browser displays the rendered web page to you.
  8. Connection Closure (Optional):

    • After the response has been sent, the TCP connection may be closed.
    • In HTTP/1.1, the connection may be kept alive for subsequent requests (persistent connection).
    • In HTTP/2 and HTTP/3, connections are typically persistent and multiplexed.
    • The connection can be closed by either the client or the server, using a four-way handshake (FIN, ACK, FIN, ACK).
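The request/response steps above (minus DNS and TLS) can be reproduced end to end with Python's standard library, by running a throwaway HTTP server on loopback and fetching a page from it:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<!DOCTYPE html><html><body><h1>Hello, world!</h1></body></html>"

class Handler(BaseHTTPRequestHandler):
    # Step 5: server processing -- serve the requested resource
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Steps 3-4: TCP connect (three-way handshake) and send the GET request
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/index.html", headers={"User-Agent": "demo/1.0"})

# Step 6: read the response (status code, headers, body)
resp = conn.getresponse()
status, body = resp.status, resp.read()
conn.close()      # step 8: connection closure
server.shutdown()

print(status)  # 200
```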

Linux Operations

Q: On a scale of 1-10, how would you rate your skill at operating Linux? A: To be honest, I wouldn't rate myself as an expert in Linux operations or sysadmin work. I try to solve issues case by case and use whatever commands are needed to debug them. I also often use the navi cheatsheet tool to support daily troubleshooting.

Q: General commands -> situational conditions on how to use them while debugging A: My guidelines for troubleshooting:

  1. When troubleshooting anything, first ensure you have the metrics that you will use to monitor/validate the error.
  2. Create hypothesis.
  3. Prove the hypothesis by observing the metrics.
  4. Create the solution.

Here's how to use these commands in different situations:

  • CPU Performance Issues:

    • top: Use top to get a real-time view of CPU usage. Look for processes with high CPU utilization. If you see high %iowait, it indicates that the CPU is waiting for I/O operations.
      • Usecase: Identifying a runaway process consuming excessive CPU, leading to system slowdown.
    • vmstat 1: Use vmstat to check CPU usage (us, sy, id, wa, st). High wa (I/O wait) suggests I/O bottlenecks. Also, check r (runnable processes) to see if the CPU is overloaded.
      • Usecase: Detecting a sustained period of high CPU utilization across the system, indicating a need for capacity planning or optimization.
    • pidstat -u 1 -p ALL: Use pidstat to identify which processes are consuming the most CPU.
      • Usecase: Drilling down to specific processes responsible for high CPU usage identified by top or vmstat.
    • perf top: Use perf top to identify the functions that are consuming the most CPU time. Requires installation of the perf tool.
      • Usecase: Identifying hot spots in the code that are causing high CPU usage.
    • Example Scenario: A service is running slowly. Use top to identify a process consuming high CPU. Then, use pidstat to confirm and investigate further with profiling tools like perf top to identify the specific functions causing the bottleneck.
  • Memory Performance Issues:

    • free -m: Use free -m to check total, used, free, buff/cache memory. If available memory is low, the system is under memory pressure.
      • Usecase: Quickly assessing the overall memory usage of the system and identifying potential memory exhaustion.
    • vmstat 1: Use vmstat to monitor memory usage (swpd, free, buff, cache, si, so). High si and so values indicate swapping, which slows down the system.
      • Usecase: Detecting excessive swapping, indicating that the system is running out of physical memory.
    • cachestat: Use cachestat to monitor the Linux page cache.
      • Usecase: Analyzing the effectiveness of the page cache and identifying potential I/O bottlenecks.
    • cachetop: Use cachetop to identify processes that are heavily using the page cache.
      • Usecase: Identifying processes that are heavily reading from or writing to disk.
    • pmap -x <pid>: Use pmap -x to show the memory map of a process, including the size of each memory region.
      • Usecase: Identifying memory leaks or excessive memory allocation by a specific process.
    • memleak: Use memleak to detect memory leaks in C/C++ programs.
      • Usecase: Detecting memory leaks in C/C++ applications.
    • Example Scenario: An application is crashing with out-of-memory errors. Use free -m to check memory usage. If memory is low, use top or pidstat to identify memory-hogging processes. If swapping is high, investigate memory leaks using pmap -x or optimize memory usage.
  • I/O Performance Issues:

    • vmstat 1: Use vmstat to check I/O wait (wa) and disk I/O (bi, bo). High wa and significant bi/bo values indicate I/O bottlenecks.
      • Usecase: Detecting a system-wide I/O bottleneck.
    • pidstat -d 1 -p ALL: Use pidstat to identify processes performing heavy I/O.
      • Usecase: Identifying specific processes that are contributing to the I/O bottleneck.
    • iostat -xz 1: Use iostat -xz to get detailed I/O statistics for each disk, including utilization, queue length, and service time.
      • Usecase: Identifying specific disks that are experiencing high I/O load.
    • iotop: Use iotop to display real-time I/O usage by process. Requires installation of the iotop tool.
      • Usecase: Identifying processes that are performing a lot of I/O operations.
    • Example Scenario: A database query is running slowly. Use vmstat to check I/O wait. Then, use pidstat to identify the database process and investigate slow queries. Use iostat to check the disk I/O and iotop to identify the processes that generate high I/O.
  • Network Performance Issues:

    • netstat -antp: Use netstat -antp to display active network connections and listening ports.
      • Usecase: Identifying established connections and the processes associated with them.
    • ss -s: Use ss -s to display network socket statistics.
      • Usecase: Getting a summary of network usage, including the number of TCP and UDP sockets in different states.
    • tcpdump -i <interface> -n -s 0: Use tcpdump to capture network traffic on a specific interface. Requires root privileges and installation of the tcpdump tool.
      • Usecase: Analyzing network traffic to identify bottlenecks or errors.
    • Example Scenario: A web server is experiencing slow response times. Use netstat to check the number of established connections. Use ss -s to check the socket statistics. Use tcpdump to capture network traffic and analyze the packets.
  • General Debugging:

    • top: Provides a dynamic real-time view of running processes, CPU usage, memory usage, and more.
      • Situational Use: Quickly identify resource-intensive processes.
    • vmstat: Reports virtual memory statistics, including CPU usage, memory usage, swapping, and I/O.
      • Situational Use: Diagnose overall system performance bottlenecks.
    • pidstat: Provides detailed statistics for processes, including CPU usage, memory usage, I/O, and more.
      • Situational Use: Pinpoint specific processes causing resource issues.
    • free: Displays the amount of free and used memory in the system.
      • Situational Use: Check overall memory availability and usage.
    • cachestat: Shows statistics of the Linux page cache.
      • Situational Use: Analyze the effectiveness of the disk cache.
    • cachetop: Real-time display of kernel cache usage.
      • Situational Use: Identify processes that are heavily using the page cache.
    • memleak: Detect memory leaks in C/C++ programs.
      • Situational Use: Find memory leaks.
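As an example of what tools like free read under the hood, the numbers come from /proc/meminfo. Below is a small parser sketch, run here against an illustrative sample string rather than the live file so it works anywhere; the "used" formula is the classic approximation (modern procps free also subtracts SReclaimable):

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key:  value kB' lines into a dict of kB values."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key] = int(fields[0])
    return info

# Illustrative sample; on a real Linux box you'd read open("/proc/meminfo").read()
sample = """MemTotal:       16384000 kB
MemFree:         2048000 kB
MemAvailable:    8192000 kB
Buffers:          512000 kB
Cached:          4096000 kB"""

mem = parse_meminfo(sample)
# Classic "used" approximation, as reported by free
used_kb = mem["MemTotal"] - mem["MemFree"] - mem["Buffers"] - mem["Cached"]
print(used_kb)             # 9728000
print(mem["MemAvailable"]) # 8192000 -- the number to watch for memory pressure
```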

Docker/Container/VM

Q: Difference between Container and VM? A: VMs (Virtual Machines) are like having separate, complete computer systems running on your machine. Each VM has its own operating system, kernel, and resources allocated from the host machine. Containers, on the other hand, are more lightweight. They share the host OS kernel but have their own isolated user space, libraries, and dependencies. So, VMs are heavier and provide more isolation, while containers are lighter and more efficient in terms of resource usage.

Q: How are layers created in Docker Images? A: Okay, here's the breakdown of how Docker image layers are created, step by step:

  1. Dockerfile: It all starts with a Dockerfile. This file is like a recipe for building your image. It contains a series of instructions.
  2. Instructions: Each instruction in the Dockerfile adds a new layer to the image.
  3. Base Image: The first instruction usually specifies a base image (e.g., FROM ubuntu:latest). This is the foundation layer.
  4. Commands: Instructions like RUN, COPY, ADD each create a new layer.
    • RUN executes commands inside the container (e.g., installing software).
    • COPY copies files from your computer into the container.
    • ADD is similar to COPY but can also extract archives.
  5. Layer Caching: Docker caches each layer. If a layer hasn't changed, Docker reuses it from the cache, which makes builds faster.
  6. Union File System: Docker uses a union file system (like AUFS or OverlayFS) to combine these layers into a single image. Each layer is read-only, except the top layer, which is read-write.
  7. Image Size: The final image is the sum of all the layers.
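The union-filesystem behaviour in step 6 can be modeled with Python's ChainMap: each dict below stands in for one read-only layer mapping paths to file contents, and upper layers shadow lower ones the way OverlayFS does (the paths and contents are invented for illustration):

```python
from collections import ChainMap

# One dict per image layer, bottom to top
base_layer = {"/etc/os-release": "ubuntu", "/usr/bin/sh": "shell"}          # FROM
run_layer  = {"/usr/bin/python3": "interpreter"}                            # RUN
copy_layer = {"/app/main.py": "print('hi')", "/usr/bin/sh": "patched shell"}  # COPY (shadows sh)

# The union view: first map wins on lookup, i.e. top layer shadows lower ones
merged_view = ChainMap(copy_layer, run_layer, base_layer)

print(merged_view["/usr/bin/sh"])   # patched shell (top layer wins)
print(merged_view["/etc/os-release"])  # ubuntu (falls through to the base layer)
```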

Q: Explain detail about kubernetes components A: Kubernetes is composed of several key components that work together to manage and orchestrate containers. Here's a breakdown of the main ones:

  • Control Plane Components: These components make global decisions about the cluster and detect and respond to cluster events.

    • kube-apiserver: The API server is the front end for the Kubernetes control plane. It exposes the Kubernetes API, which is used by all other components to interact with the cluster.
    • etcd: etcd is a consistent and highly-available key-value store used as Kubernetes' backing store for all cluster data.
    • kube-scheduler: The scheduler watches for newly created Pods with no assigned node, and selects a node for them to run on.
    • kube-controller-manager: The controller manager runs controller processes. Logically, each controller is a separate process, but to reduce complexity, they are all compiled into a single binary and run in a single process. These controllers include:
      • Node Controller: Responsible for noticing and responding when nodes go down.
      • Replication Controller: Maintains the desired number of Pods for each replication controller object.
      • Endpoint Controller: Populates the Endpoints object (that is, joins Services & Pods).
      • Service Account & Token Controllers: Create default accounts and API access tokens for new namespaces.
    • cloud-controller-manager: The cloud controller manager runs controllers that interact with the underlying cloud provider. This is cloud-provider-specific controller logic.
  • Node Components: These components run on each node and maintain the running Pods and provide the Kubernetes runtime environment.

    • kubelet: An agent that runs on each node in the cluster. It receives instructions from the API server and manages the containers running in a Pod.
    • kube-proxy: kube-proxy is a network proxy that runs on each node in the cluster. It implements part of the Kubernetes Service concept.
    • Container Runtime: The container runtime is the software that is responsible for running containers. Kubernetes supports several container runtimes: Docker, containerd, CRI-O, and any other CRI (Container Runtime Interface) compliant runtime.

Q: Describe the process of creating new pods in kubernetes (How the request goes to each component) A: Okay, here's a simplified step-by-step explanation of how a new Pod gets created in Kubernetes:

  1. User Request: The process starts when a user (or a system) sends a request to create a Pod. This request is typically made using kubectl, the Kubernetes command-line tool, or through the Kubernetes API. The request includes a YAML or JSON file that describes the desired state of the Pod (e.g., which container image to use, resource requirements, etc.).

  2. API Server: The request first reaches the kube-apiserver, which is the central management component of the Kubernetes control plane. The API server authenticates and authorizes the request. If the request is valid, the API server stores the desired state of the Pod in etcd, the cluster's distributed key-value store.

  3. Scheduler: The kube-scheduler component watches the API server for new Pods that have not yet been assigned to a node. When it sees a new Pod, the scheduler tries to find the best node to run the Pod on, based on factors like resource availability, node affinity, and other constraints.

  4. Kubelet: Once the scheduler has selected a node, it updates the Pod's definition in the API server with the selected node. The kubelet component, which runs on each node, watches the API server for Pods that have been assigned to its node. When the kubelet sees a new Pod assigned to its node, it takes over.

  5. Container Runtime: The kubelet instructs the container runtime (e.g., Docker, containerd) to pull the required container image(s) from the specified registry (if they are not already present on the node) and then to create and start the containers defined in the Pod specification.

  6. Pod Running: The container runtime creates and starts the containers. The kubelet continuously monitors the health of the Pod and its containers, reporting status back to the API server. If a container crashes or the Pod becomes unhealthy, the kubelet will attempt to restart it, based on the Pod's restart policy.

  7. kube-proxy: The kube-proxy component, also running on each node, ensures that network traffic to the Pod is properly routed. If the Pod is part of a Service, kube-proxy configures the node's networking rules to forward traffic to the Pod.
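The control-loop shape of this flow can be sketched in a few lines of Python. This is a toy: the dicts stand in for cluster state held in etcd behind the API server, and the "scheduler" just picks the node with the most free CPU, whereas the real kube-scheduler weighs many more factors (affinity, taints, spreading, and so on):

```python
# Desired state, as stored via the API server (step 2)
pods = [
    {"name": "web-1", "node": None, "phase": "Pending"},
    {"name": "web-2", "node": None, "phase": "Pending"},
]
nodes = {"node-a": {"free_cpu": 2}, "node-b": {"free_cpu": 1}}

def schedule(pods, nodes):
    """Step 3: bind every unassigned pod to the node with the most free CPU."""
    for pod in pods:
        if pod["node"] is None:
            best = max(nodes, key=lambda n: nodes[n]["free_cpu"])
            pod["node"] = best
            nodes[best]["free_cpu"] -= 1

def kubelet_sync(pods):
    """Steps 4-6: each node's kubelet starts containers for pods bound to it."""
    for pod in pods:
        if pod["node"] is not None and pod["phase"] == "Pending":
            pod["phase"] = "Running"

schedule(pods, nodes)
kubelet_sync(pods)
print(pods)  # every pod now has a node and is Running
```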

Q: How does scaling work? How do you optimize a cluster and achieve HA? A: Okay, let's break down how scaling works in Kubernetes, focusing on achieving High Availability (HA) and cluster optimization:

1. Scaling:

  • Horizontal Pod Autoscaling (HPA): HPA automatically adjusts the number of Pod replicas in a deployment or replication controller based on observed CPU utilization, memory consumption, or custom metrics. To use HPA, you define a target metric value (e.g., 70% CPU utilization). HPA controller continuously monitors the metrics and adjusts the number of replicas to maintain the target value. kubectl autoscale deployment <deployment_name> --cpu-percent=70 --min=1 --max=10
  • Vertical Pod Autoscaling (VPA): VPA automatically adjusts the CPU and memory requests/limits of your pods to right-size them. It can either recommend values or automatically update the pods in place. VPA can help improve resource utilization and prevent pods from being scheduled on nodes that don't have enough resources.
  • Cluster Autoscaler: Cluster Autoscaler automatically adjusts the size of the Kubernetes cluster by adding or removing nodes based on the resource requests of pending pods. If there are pods that cannot be scheduled due to insufficient resources, the Cluster Autoscaler will provision new nodes. If nodes are underutilized, the Cluster Autoscaler will evict pods and remove the nodes.
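For HPA specifically, the core scaling rule Kubernetes documents is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A direct translation, using the 70% CPU target from the example above:

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric):
    """HPA scaling rule: desired = ceil(current * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_metric / target_metric)

# 4 replicas running at 90% CPU against a 70% target -> scale out to 6
print(desired_replicas(4, 90, 70))   # 6
# 10 replicas at 35% CPU against a 70% target -> scale in to 5
print(desired_replicas(10, 35, 70))  # 5
```

In practice the controller also clamps the result to the configured --min/--max bounds and applies a tolerance band so tiny metric wobbles don't cause replica churn.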

2. High Availability (HA):

  • Replication: Running multiple replicas of your pods across different nodes ensures that your application remains available even if one or more nodes fail. Deployments and ReplicaSets are used to manage the desired number of replicas.
  • Multi-Zone/Multi-Region Clusters: Distributing your Kubernetes nodes across multiple availability zones (within a region) or multiple regions provides resilience against zone-level or region-level failures. Kubernetes can automatically reschedule pods from a failed zone to a healthy zone.
  • etcd Backup and Recovery: Regularly backing up the etcd cluster is crucial for disaster recovery. In case of etcd failure, you can restore from a backup to bring the cluster back to a consistent state.
  • Control Plane HA: For production environments, it's essential to run multiple control plane nodes (API servers, schedulers, controllers) to ensure that the control plane itself is highly available. A load balancer is used to distribute traffic across the API servers.
  • Pod Disruption Budgets (PDBs): PDBs allow you to specify the minimum number of replicas that must be available during voluntary disruptions, such as node maintenance or upgrades. This prevents disruptions from causing application downtime.

3. Optimization:

  • Resource Requests and Limits: Properly configuring resource requests and limits for your pods is essential for efficient resource utilization and preventing resource contention. Requests specify the minimum amount of resources a pod requires, while limits specify the maximum amount of resources a pod can use.
  • Node Affinity and Taints/Tolerations: Node affinity allows you to constrain which nodes your pods can be scheduled on, based on labels on the nodes. Taints and tolerations are used to repel pods from certain nodes, or to allow specific pods to be scheduled on those nodes. These mechanisms can be used to optimize resource utilization by scheduling pods with specific requirements on nodes that have the appropriate resources.
  • Resource Quotas: Resource quotas limit the total amount of resources that can be consumed by a namespace. This can prevent a single namespace from consuming all of the cluster's resources.
  • Limit Ranges: Limit ranges specify default resource requests and limits for pods in a namespace. This can help ensure that all pods in a namespace have appropriate resource configurations.

Processes & Thread

Q: What are latency & throughput? A: Latency is the time a single request takes to travel from the client to the server and back (the round-trip time per request). Throughput is the amount of work the system completes per unit of time, e.g. requests per second or bytes per second. They are related but distinct: a system can have low per-request latency yet low throughput, or high throughput while individual requests are slow.
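A quick way to see the difference is to time a simulated request handler: latency is the elapsed time divided by the request count, throughput is the request count divided by the elapsed time (the 1 ms sleep below is an arbitrary stand-in for real work):

```python
import time

def handle_request():
    time.sleep(0.001)  # pretend each request takes ~1 ms of work

n = 50
start = time.perf_counter()
for _ in range(n):
    handle_request()
elapsed = time.perf_counter() - start

avg_latency_ms = elapsed / n * 1000  # time per request
throughput_rps = n / elapsed         # requests completed per second

print(f"latency ~{avg_latency_ms:.2f} ms/request, throughput ~{throughput_rps:.0f} req/s")
```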

Coding Algorithm

Q: Binary Search some Dynamic Programming with substring A:

Q: https://leetcode.com/problems/best-time-to-buy-and-sell-stock/description/ A: Okay, so the goal is to find the maximum profit you can make by buying a stock on one day and selling it on a later day. Here's how you can do it:

  1. Keep Track of the Minimum Price: Start by assuming the first day has the minimum price.
  2. Iterate Through the Prices: Go through the prices day by day.
  3. Update Minimum Price: If you find a price lower than the current minimum, update the minimum price.
  4. Calculate Profit: For each day, calculate the potential profit by subtracting the minimum price from the current day's price.
  5. Track Maximum Profit: Keep track of the maximum profit you've seen so far.
  6. Return Maximum Profit: After going through all the prices, return the maximum profit.

In essence, you're trying to buy low and sell high.
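The steps above translate directly into a one-pass O(n) solution; a Python sketch:

```python
def max_profit(prices):
    """LeetCode 121: track the cheapest price seen so far and the best
    profit from selling at today's price. O(n) time, O(1) space."""
    min_price = float("inf")
    best = 0
    for price in prices:
        min_price = min(min_price, price)      # steps 1 & 3: lowest price so far
        best = max(best, price - min_price)    # steps 4 & 5: best sell today
    return best                                # step 6

print(max_profit([7, 1, 5, 3, 6, 4]))  # 5  (buy at 1, sell at 6)
print(max_profit([7, 6, 4, 3, 1]))     # 0  (prices only fall; never buy)
```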