ByteByteGo Technical Interview Prep Kit
We’re launching the all-in-one interview prep kit and making all of our books available on the ByteByteGo website.
What's included:
System Design Interview
Coding Interview Patterns
Object-Oriented Design Interview
How to Write a Good Resume
Behavioral Interview (coming soon)
Machine Learning System Design Interview
Generative AI System Design Interview
Mobile System Design Interview
And more to come
This week’s system design refresher:
The Lifecycle of a Kubernetes Pod
CI/CD Pipeline Explained
The Open Source RAG Stack
Some of the Most Popular Versioning Strategies
Production-Ready Data Science Book
The Testing Pyramid
SPONSOR US
The Lifecycle of a Kubernetes Pod
The Pod manifest is submitted to the API server and stored in etcd.
The scheduler selects a node for the Pod based on available resources and affinity rules, then binds the Pod to that node.
The kubelet prepares the Pod by creating its network namespace, assigning an IP, mounting volumes, and pulling images if needed.
Containers move from Waiting to Running, while the kubelet runs liveness and readiness probes to monitor their health.
Kubernetes tracks the Pod’s high-level phase from Pending to Running to Succeeded/Failed/Unknown.
Upon termination, Kubernetes sends SIGTERM to the individual containers in the Pod, followed by SIGKILL if they have not exited by the end of the grace period.
After termination, the resources are cleaned up, and the Pod details are removed from etcd.
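To make the phase tracking above concrete, here is a minimal sketch that derives a Pod's high-level phase from its container states. It is a simplified model, not Kubernetes code: the `ContainerStatus` class and `pod_phase` function are hypothetical names, and the sketch ignores restart policy and the Unknown phase (which is reported when the node cannot be reached).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContainerStatus:
    # One of "waiting", "running", or "terminated".
    state: str
    # Exit code; only meaningful when state == "terminated".
    exit_code: Optional[int] = None


def pod_phase(scheduled: bool, containers: List[ContainerStatus]) -> str:
    """Approximate a Pod's phase from its container states (simplified)."""
    if not scheduled or not containers:
        return "Pending"
    if all(c.state == "terminated" for c in containers):
        # Every container has exited: success only if all exit codes are zero.
        if all(c.exit_code == 0 for c in containers):
            return "Succeeded"
        return "Failed"
    if any(c.state == "running" for c in containers):
        return "Running"
    # Bound to a node, but containers are still being set up (e.g. pulling images).
    return "Pending"


print(pod_phase(True, [ContainerStatus("running"), ContainerStatus("waiting")]))
print(pod_phase(True, [ContainerStatus("terminated", 0), ContainerStatus("terminated", 1)]))
```

The first call models a Pod with one container already running, and the second models a Pod whose containers have all exited with one non-zero exit code.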
Over to you: What else will you add to understand the Kubernetes Pod Lifecycle?
CI/CD Pipeline Explained
A CI/CD pipeline is a tool that automates the process of building, testing, and deploying software.
It integrates the different stages of the software development lifecycle, including code creation and revision, testing, and deployment, into a single, cohesive workflow.
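As a rough sketch of that single workflow, the snippet below runs named stages in order and stops at the first failure, so a broken test stage prevents deployment. The `run_pipeline` function and the lambda stages are hypothetical stand-ins; a real pipeline would invoke a build tool, a test runner, and a deployment tool instead.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure. Returns a result log."""
    log = []
    for name, step in stages:
        ok = step()
        log.append(f"{name}: {'ok' if ok else 'failed'}")
        if not ok:
            break  # later stages (such as deploy) never run after a failure
    return log


# Hypothetical stages for illustration only.
stages = [
    ("build", lambda: True),
    ("test", lambda: False),  # simulate a failing test suite
    ("deploy", lambda: True),
]

print(run_pipeline(stages))  # ['build: ok', 'test: failed']
```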
The diagram below illustrates some of the tools that are commonly used.
The Open Source RAG Stack
Frontend Frameworks: Used to build user interfaces for RAG apps. Some tools that can help are Next.js, Vue.js, SvelteKit, and Streamlit.
LLM Frameworks: High-level orchestration of LLM pipelines, prompts, and chains. This includes tools such as LangChain, LlamaIndex, Haystack, HuggingFace, and Semantic Kernel.
LLMs: Generate final responses from the retrieved context. Some open-source options include Llama, Mistral, Gemma, Phi-2, DeepSeek, and Qwen.
Retrieval and Ranking: Retrieves relevant chunks and ranks them based on relevance. Tools like Meta FAISS, Haystack Retrievers, Weaviate Hybrid Search, Elasticsearch kNN, and JinaAI Rerankers can help.
Vector Database: Stores and enables similarity search over vector embeddings. Common options include Weaviate, Milvus, Postgres pgvector, and Chroma, along with Pinecone (a managed service rather than an open-source project).
Embedding Model: Converts text and other data into vector representations using ML. Some open-source tools include HuggingFace Transformers, LLMWare, Nomic, Sentence Transformers, JinaAI, and Cognita.
Ingest/Data Processing: Extracts, cleanses, and prepares raw data for indexing and retrieval. Tools like Kubeflow, Apache Airflow, Apache NiFi, LangChain Document Loaders, Haystack Pipelines, and OpenSearch can help.
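To show how these layers fit together, here is a toy sketch of the embed, retrieve, and prompt steps using only the Python standard library. Everything in it is an assumption made for illustration: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database, and the final LLM call is left out.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Rank chunks by similarity to the query and keep the top k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: List[str]) -> str:
    """Assemble the prompt an LLM would receive (the LLM call itself is omitted)."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


# The "vector database" here is just a list of text chunks.
chunks = [
    "The kubelet pulls container images and mounts volumes.",
    "Semantic Versioning uses the MAJOR.MINOR.PATCH format.",
    "Unit tests are fast and isolated.",
]
question = "What format does Semantic Versioning use?"
print(build_prompt(question, retrieve(question, chunks, k=1)))
```

In a real stack, each function would be replaced by one of the tools listed above, but the flow of data stays the same: embed the query, find the closest chunks, and pass them to the LLM as context.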
Over to you: Which other open-source tool will you add to the list?
What are some of the most popular versioning strategies?
Versioning helps developers communicate release changes clearly, whether it’s for software packages, APIs, or operating systems.
Semantic Versioning (SemVer): A versioning scheme that conveys the scope of changes in a release using the format MAJOR.MINOR.PATCH: MAJOR for breaking changes, MINOR for backward-compatible features, and PATCH for backward-compatible fixes.
Calendar Versioning (CalVer): A scheme that uses the release date (year and month) as the version number to indicate when the release occurred.
Sequential Versioning: A simple numbering approach where versions increase in sequence without encoding compatibility details.
API Versioning: A method of embedding the version in an API’s URL path (for example, /v1/users) to manage and separate breaking changes.
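The strategies above can be sketched in a few lines. This is a minimal illustration, not a full implementation: the function names are hypothetical, the SemVer parser ignores pre-release and build tags, and the `YYYY.MM` layout is just one of several CalVer formats that projects use.

```python
from datetime import date
from typing import Tuple


def parse_semver(version: str) -> Tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into integers (pre-release tags not handled)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump(version: str, change: str) -> str:
    """Return the next version for a 'major', 'minor', or 'patch' change."""
    major, minor, patch = parse_semver(version)
    if change == "major":  # breaking change
        return f"{major + 1}.0.0"
    if change == "minor":  # backward-compatible feature
        return f"{major}.{minor + 1}.0"
    if change == "patch":  # backward-compatible fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")


def calver(released: date) -> str:
    """One possible CalVer layout (YYYY.MM)."""
    return f"{released.year}.{released.month:02d}"


def versioned_path(major: int, resource: str) -> str:
    """Embed the API version in the URL path."""
    return f"/v{major}/{resource}"


print(bump("1.4.2", "minor"))                          # 1.5.0
print(parse_semver("1.10.0") > parse_semver("1.9.3"))  # True
print(calver(date(2025, 8, 1)))                        # 2025.08
print(versioned_path(2, "users"))                      # /v2/users
```

Parsing into integers matters for comparisons: as plain strings, "1.10.0" would sort before "1.9.3".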
Over to you: Which versioning strategy do you prefer, and why?
Production-Ready Data Science Book
My friend Khuyen Tran wrote a book called Production-Ready Data Science. If you’re looking for a book that bridges the gap between quick prototypes and robust solutions, check it out.
The book features practical examples and clear explanations that will help you master techniques for:
Transforming messy notebooks into organized, maintainable code,
Creating reproducible environments across teams and deployments,
Writing modular, reusable, and testable Python code,
Implementing robust data validation and error handling,
Among others.
Check out the GitHub repository containing practical implementations of every concept discussed in the book here.
The Testing Pyramid
Testing is the backbone of reliable software. The Testing Pyramid is a widely accepted strategy for structuring tests into three key layers:
Unit Tests: These are the foundation of the pyramid. Unit tests are fast, isolated, and low-cost to write and maintain. They test individual functions, methods, or components.
Integration Tests: These tests validate interactions between components, such as APIs, databases, and external services. They are slower than unit tests and require more setup.
E2E Tests: These simulate real user flows from start to finish across the full system. They are expensive to write and maintain and tend to be slow to execute.
As you go up the pyramid, the cost of test development, execution, and maintenance increases.
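As a small illustration of the bottom two layers, the sketch below pairs a unit test for a pure function with an integration-style test that exercises a real (in-memory) SQLite database. The `apply_discount` and `save_order` functions are hypothetical examples, and an E2E test is omitted because it would need a full running system.

```python
import sqlite3
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Pure function: easy to cover with a fast, isolated unit test."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


def save_order(conn: sqlite3.Connection, item: str, price: float) -> int:
    """Touches a database, so testing it is an integration concern."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (item TEXT, price REAL)")
    cur = conn.execute("INSERT INTO orders VALUES (?, ?)", (item, price))
    conn.commit()
    return cur.lastrowid


class UnitTests(unittest.TestCase):
    def test_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_invalid_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(100.0, 150)


class IntegrationTests(unittest.TestCase):
    def test_order_is_persisted(self):
        # Uses a real SQLite engine, but keeps the database in memory.
        conn = sqlite3.connect(":memory:")
        save_order(conn, "book", apply_discount(40.0, 10))
        rows = conn.execute("SELECT item, price FROM orders").fetchall()
        self.assertEqual(rows, [("book", 36.0)])
        conn.close()
```

Saved to a file, these tests can be run with `python -m unittest`. The integration test needs more setup (a connection and a schema) than the unit tests, which mirrors the cost increase described above.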
Over to you: Which layer do you find most valuable in your testing strategy, and why?
SPONSOR US
Get your product in front of more than 1,000,000 tech professionals.
Our newsletter puts your products and services directly in front of an audience that matters: hundreds of thousands of engineering leaders and senior engineers who influence significant tech decisions and big purchases.
Space Fills Up Fast - Reserve Today
Ad spots typically sell out about 4 weeks in advance. To ensure your ad reaches this influential audience, reserve your space now by emailing sponsorship@bytebytego.com.