Google's software engineering practices have evolved to manage its enormous scale. However, the underlying principles driving these practices are valuable and transferable to organizations of any size. This isn't about blindly copying Google, but about understanding the why behind its methods and adapting the what to your context. While I'll draw from my own experience at Google, this article builds from the excellent open-source book "Software Engineering at Google" (SWEG), as well as other publicly available resources, to provide a pragmatic and nuanced perspective.
1. Cultivating a comprehensive testing culture
Google's "Beyoncé Rule" ("If you liked it, you should have put a test on it") is catchy, but the real takeaway is a deep commitment to developer-driven automated testing. This isn't just about catching bugs but about enabling change.[1] As SWEG states, automated tests allow software to change. They provide the confidence to refactor, upgrade dependencies, and add new features without fear of breaking existing functionality. See the chapter on Testing Overview for more details.
Test everything that matters: Focus on critical features and core business logic. Don't aim for 100% coverage overnight; start with the most important parts and expand gradually. This aligns with the Pareto Principle – 20% of the code often accounts for 80% of the value (and risk).
Test-driven development (TDD) or test-along: Writing tests before or alongside code clarifies requirements and encourages better design. It forces you to think about how the code will be used and what constitutes correct behavior.
Automated test execution: Continuous Integration (CI) is crucial. Every code change should trigger automated tests. Even small teams can use tools like GitHub Actions, Jenkins, CircleCI, or GitLab CI to automate this process.
Balance unit and integration tests: Unit tests provide fast feedback on individual components. Integration tests verify that components work together correctly. End-to-end (E2E) tests, while valuable, are typically slower and more brittle, so use them sparingly (SWEG suggests a mix of 80% unit tests and 20% broader-scoped tests in the chapter on Unit Testing).
The test pyramid: The Test Pyramid illustrates the general principle that fast, and therefore numerous, unit tests should form the base of your testing, with fewer UI tests at the top, as they are more brittle and time-consuming.
Practical considerations for smaller teams:
Start small: Begin with unit tests for critical functions. Introduce integration tests as the system grows.
Focus on testing areas prone to change or where bugs would have a high impact.
Mocking frameworks can isolate units for testing, but overuse can lead to brittle tests that don't reflect real-world interactions.
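As a minimal illustration of the points above, here is a sketch of a fast unit test that uses a mock to isolate the unit from an external dependency. The rate service and the function under test are hypothetical, invented for this example; the pattern, not the names, is the takeaway:

```python
import unittest
from unittest.mock import Mock

def total_in_usd(amounts_eur, rate_service):
    """Convert a list of EUR amounts to a USD total via a rate lookup."""
    rate = rate_service.get_rate("EUR", "USD")
    return round(sum(amounts_eur) * rate, 2)

class TotalInUsdTest(unittest.TestCase):
    def test_converts_with_mocked_rate(self):
        # Mock the external rate service so the test is fast and
        # deterministic -- no network, no flaky real-world rates.
        service = Mock()
        service.get_rate.return_value = 1.10
        self.assertEqual(total_in_usd([10.0, 20.0], service), 33.0)
        # Verify the unit interacted with its dependency as expected.
        service.get_rate.assert_called_once_with("EUR", "USD")
```

Note the caveat from the bullet above still applies: the mock only verifies the interaction you told it to expect, so pair tests like this with a few integration tests against the real service.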
2. Establishing effective code review processes
Google's code review process is a cornerstone of their engineering culture.[2] It's a mechanism for knowledge sharing, mentorship, and maintaining code quality.[2] The SWEG chapter on Code Review provides extensive detail.
Pre-merge reviews: All code changes should be reviewed before merging into the main codebase. This is non-negotiable.
Small changes: This is perhaps the most crucial and universally applicable principle. Smaller changes are easier to review, understand, and test. They reduce cognitive load for the reviewer and minimize the risk of introducing large, complex bugs. SWEG and other sources strongly emphasize this. Studies have even shown that automated partitioning of reviews leads to fewer errors.
Code review as learning: Reviews should be constructive and educational. They're an opportunity for senior engineers to mentor junior engineers and for everyone to learn from each other.
Timeliness: Google often aims for reviews within 24 hours. Prompt reviews keep the development process flowing.
Practical considerations for smaller teams:
Establish clear guidelines: Document expectations for reviewers and authors. Google's own code review guidelines are publicly available.
Rotate reviewers: Ensure different team members review code to spread knowledge and prevent bottlenecks.
Don't be afraid to say "No": Reviewers should feel empowered to reject changes that don't meet quality standards.
3. Prioritizing comprehensive documentation
Documentation is often treated as an afterthought, but at Google, it's considered as important as the code itself. The key is to keep documentation alive and close to the code. The SWEG chapter on Documentation dives into this.
Document as you develop: Write documentation alongside the code, not as a separate task. This ensures it stays up-to-date.
Focus on the "why": Explain the reasoning behind design decisions, not just the what. This is crucial for future maintainers (including your future self).
Keep documentation close to code: Store documentation in the same repository as the code, ideally in formats like Markdown.
Treat documentation like code: Review, test, and maintain documentation with the same rigor as code. Outdated documentation is worse than no documentation.
Types of documentation: SWEG highlights the importance of different types of documentation: reference documentation, design docs, tutorials, conceptual documentation, and landing pages.
Practical considerations for smaller teams:
Use a wiki or shared document system: Google Docs, Notion or Confluence can be effective for collaborative documentation.
Require documentation updates as part of code changes.
Schedule time to review and update documentation, removing outdated or irrelevant information.
Even for smaller projects, writing a brief design doc before starting major work can save time and prevent costly mistakes.
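One lightweight way to "treat documentation like code" is executable documentation. The sketch below uses Python's built-in doctest module, which runs the examples embedded in a docstring as tests; the function itself is hypothetical, chosen only to keep the example small:

```python
import doctest

def normalize_email(raw: str) -> str:
    """Lower-case and trim an email address.

    The example below is living documentation: if the behavior ever
    drifts, this doc "test" fails, forcing docs and code to stay in sync.

    >>> normalize_email("  Alice@Example.COM ")
    'alice@example.com'
    """
    return raw.strip().lower()

if __name__ == "__main__":
    # Execute every >>> example in this module's docstrings.
    failures, _ = doctest.testmod()
    assert failures == 0
```

Running doctests in CI gives you a cheap guarantee that at least the examples in your reference documentation never go stale.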
4. Fostering a culture of knowledge sharing[3][4]
Google fosters a culture of knowledge sharing to prevent silos and accelerate learning.[2] This is discussed in SWEG's chapter on Knowledge Sharing.
Design docs and reviews: Design documents are detailed proposals for significant changes. They capture the rationale, trade-offs, and implementation details. Design doc reviews provide early feedback and ensure alignment.
Postmortems: Blameless postmortems are crucial for learning from incidents. They focus on identifying systemic causes and preventing recurrence, not assigning blame.
Internal tech talks and training: Regularly sharing knowledge through presentations, workshops, or brown bag lunches helps disseminate expertise.
"g3doc" (Google's internal documentation system): While you won't have access to g3doc, the principle of a centralized, searchable knowledge base is valuable.
Practical considerations for smaller teams:
Encourage documentation of decisions: Record key decisions and their rationale in a shared location.
Hold regular knowledge-sharing sessions: Even informal presentations or discussions can be beneficial.
Promote a culture of asking questions: Create an environment where it's safe to ask questions and seek help.
Blameless culture: When someone makes a mistake, focus on what the team can learn from it rather than on assigning blame.
5. Implementing disciplined dependency management
Google's monorepo approach is somewhat unique, but their underlying principles for managing dependencies are universally applicable. SWEG has a dedicated chapter on Dependency Management.
Be deliberate: Understand the full impact (including security, licensing, and maintenance) before adding a new dependency.
Version carefully: Use semantic versioning (SemVer) and explicit dependency declarations (e.g., package.json, requirements.txt, Gemfile).
Consider long-term maintenance: Dependencies add ongoing overhead. Choose well-maintained, actively developed libraries.
Use tooling: Dependency management tools help improve software quality and minimize exposure to vulnerabilities and bugs.
Practical considerations for smaller teams:
Regularly audit dependencies: Check for security vulnerabilities and outdated packages. Tools like npm audit or Dependabot can help.
Vendor dependencies (when necessary): For critical dependencies, consider vendoring to ensure availability and control.
6. Utilizing progressive rollouts and feature flags
Google's approach to releasing changes incrementally minimizes risk and allows for rapid iteration. While not directly covered as a standalone chapter, these concepts are interwoven with testing and deployment practices discussed throughout SWEG.
Progressive rollouts: Release changes to a small percentage of users initially, then gradually increase exposure while monitoring key metrics.
Feature flags (feature toggles): Control feature availability through configuration, not code changes. This allows you to enable or disable features without deploying new code.
Monitoring: Closely track metrics (performance, errors, user behavior) during rollouts to catch issues early.
Practical considerations for smaller teams:
Use a feature flag service: LaunchDarkly, Split, or other services provide feature flag management capabilities.
Start with simple rollouts: Begin with internal users or a small percentage of external users.
Automate rollbacks: Make it easy to roll back a feature if problems arise.
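The flag-plus-percentage-rollout idea above can be sketched in a few lines. The hashing scheme below is a common illustrative approach, not any specific vendor's API, and the flag name is made up:

```python
import hashlib

def rollout_bucket(flag_name: str, user_id: str) -> int:
    """Deterministically map a (flag, user) pair to a bucket in 0-99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100

def is_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """Enable the flag for roughly rollout_percent% of users, stably."""
    return rollout_bucket(flag_name, user_id) < rollout_percent

# Because bucketing is deterministic per user, raising the percentage
# from 5 to 50 only ever adds users to the rollout -- nobody flips
# back and forth between enabled and disabled as you ramp up.
enabled = is_enabled("new_checkout", "user-42", rollout_percent=5)
```

A dedicated service adds targeting rules, audit logs, and kill switches on top of this, but the core mechanic is this simple.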
7. Refining incident response procedures
Google's approach to incident response is widely discussed and emphasizes learning and improvement. This aligns with the principles of blameless postmortems discussed in the context of knowledge sharing.
Clear response roles: Designate incident commanders and communicators.
Document everything: Keep detailed logs of what happened, decisions made, and actions taken.
Blameless postmortems: Focus on systems and processes, not individuals. The goal is to understand why the incident occurred and how to prevent it from happening again.
Follow through on action items: Ensure that lessons learned lead to concrete improvements.
Practical considerations for smaller teams:
Establish an on-call rotation: Even small teams should have a process for handling production issues.
Use a shared incident log: A simple document or spreadsheet can be sufficient for tracking incidents.
Hold regular postmortem meetings: Even informal discussions can be valuable for identifying areas for improvement.
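A shared incident log does not need special tooling. As one possible sketch, here is a minimal append-only log in Python; the field names are illustrative, not a prescribed format, and a spreadsheet with the same columns works just as well:

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class IncidentEntry:
    title: str
    severity: str                      # e.g. "SEV-1" through "SEV-3"
    commander: str                     # who coordinated the response
    started_at: str                    # ISO-8601 timestamp
    timeline: list = field(default_factory=list)      # events and decisions, in order
    action_items: list = field(default_factory=list)  # follow-ups from the postmortem

def log_incident(entry: IncidentEntry, path: str = "incidents.jsonl") -> None:
    """Append one incident as a JSON line to the shared log file."""
    with open(path, "a") as f:
        f.write(json.dumps(asdict(entry)) + "\n")
```

Whatever the medium, the essentials are the same: who ran the response, what happened when, and which action items must be followed through.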
8. Strategically managing technical debt
Technical debt is unavoidable, but it needs to be managed proactively. SWEG addresses this in the context of code health and maintenance.
Make technical debt visible: Track and quantify debt explicitly (e.g., using issue trackers or dedicated tools).
Allocate time for maintenance: Don't just focus on new features. Regularly dedicate time to refactoring and addressing technical debt.[2]
Refactor incrementally: Improve code quality gradually rather than through massive, risky rewrites.
Balance short-term and long-term: Consider both immediate business needs and long-term engineering health.
Practical considerations for smaller teams:
Static analysis tools can help identify areas of technical debt.
Prioritize debt reduction: Focus on addressing debt in areas that are frequently changed or cause the most problems.
Include debt reduction in sprint planning: Allocate time for technical debt tasks alongside feature work.
9. Prioritizing psychological safety in teams
Google's research (Project Aristotle) identified psychological safety as the most important factor in team effectiveness. This is foundational to Google's culture and is discussed in the SWEG chapter on How to Work Well on Teams.
Blameless culture: Focus on learning, not punishing.
Encourage speaking up: Value dissenting opinions and questions.
Admit mistakes: Leaders should model vulnerability and admit their own mistakes.
Structured feedback processes: Regular, constructive feedback helps people improve.
Practical considerations for smaller teams:
Create a safe space for team members to share ideas and concerns.
Regularly ask for feedback and be receptive to it.
Demonstrate vulnerability and a willingness to learn from mistakes.
Psychological safety significantly impacts a team's success; monitor it proactively.
I cover this topic in more depth in “Leading Effective Engineering Teams”.
10. Leveraging AI in software engineering: Google's approach
Google is heavily invested in applying AI across its teams, including software engineering. While much of this work is internal, some key areas and takeaways are emerging:
AI-Assisted code completion and generation: Tools can provide intelligent code suggestions, autocomplete entire code blocks, and even generate code from natural language descriptions. This aims to boost developer productivity and reduce boilerplate.
Automated code review and bug detection: AI is being used to analyze code for potential bugs, security vulnerabilities, and style violations. This augments human code reviewers, catching issues early and improving code quality.
Test case generation and optimization: AI can help generate test cases, prioritize tests based on risk, and identify gaps in test coverage.
Refactoring and code transformation: AI tools can assist with large-scale refactoring tasks, identifying code smells, and suggesting improvements to code structure and design.
Practical considerations for smaller teams:
AI as a tool, not a replacement: AI should be viewed as a powerful tool to augment human developers, not replace them. Human judgment, creativity, and domain expertise remain essential.
Focus on augmentation, not automation: The most effective applications of AI in software engineering currently assist developers rather than fully automate tasks, though fuller automation is becoming more feasible for prototypes and MVPs.
Data quality and bias: AI models are trained on data, and the quality and biases in that data can significantly impact results. It's crucial to be aware of potential biases and limitations.
Conclusion
Not every Google practice is suitable for every organization. The key is to understand the principles behind these practices and adapt them to your specific context.
Start small, focus on the most impactful changes, and iterate. Remember that even Google didn't implement all these practices at once – they evolved over time.
The goal is to build a strong engineering culture that values quality, learning, and continuous improvement. These practices, at their core, are about building sustainable, maintainable systems and empowering engineers to do their best work.
Another principle worth adding: the number of checks and the launch-process timeline depend on the criticality of the change (Google's Launch/Ariane system). A data migration or a privacy change gets far more checks and consideration than a simple UI change.