# About

Name: Facets.cloud
Description: Platform Engineering made easy for large and complex infra setups. Set Ops guardrails and enable developer self-service with Facets' Internal Developer Platform.
URL: https://blog.facets.cloud

# Navigation Menu

- Get a Demo: https://www.facets.cloud/demo
- Home: https://blog.facets.cloud/

# Blog Posts

## Why Terraform Automation Tools Fall Short - And What Comes Next

Author: Rohit Raveendran
Published: 2025-05-08
Category: Tech Articles
Meta Title: Facets vs. Terraform Automation Platforms
Meta Description: Exploring the gap between Terraform workflow automation and true self-service infrastructure, with insights on governance and collaboration solutions.
Tags: env0, terraform enterprise, self service infrastructure, developer self service, Terraform
URL: https://blog.facets.cloud/terraform-alternative

Tools like Terraform Enterprise and env0 helped us automate IaC workflows. They solved for execution, state management, and guardrails. But for platform engineers and DevOps leads, a deeper problem remains: Infrastructure as Code is hard to maintain, and nearly impossible to scale across teams without bottlenecks.

The reason? These tools are built with the assumption that only a core group of **Terraform practitioners, aka infrastructure/platform engineers,** write, maintain, and execute the IaC. But in reality, for operations to scale, Terraform practitioners, product developers, and security teams must collaborate effectively to build and maintain their organization’s infrastructure. This requires careful abstraction and isolation.

Problem: Workflow Automation ≠ Self-Service Infrastructure
----------------------------------------------------------

Let’s be clear: remote state, plan approvals, and policy-as-code are useful — but they’re table stakes now. Tools like Terraform Enterprise and env0 offer pipeline-based orchestration, but they stop short of solving the deeper operational bottlenecks. Teams still face challenges in:

* Effectively collaborating over a single Terraform project
* Dependency management across Terraform projects
* Ensuring that environments do not drift apart
* Terraform sprawl — with modules copied, tweaked, and forgotten across teams
* So-called “self-service” that still expects product teams to know how to write or wire Terraform modules

Why? Because everything still operates at the same low-level abstraction. Platform engineers are responsible for the how — and product teams are still dragged into it. Every environment, every service, every tweak requires a human manually stitching Terraform together.

You can automate Terraform workflows — but if every team still has to think in Terraform, debug modules, and manage state, it’s not really self-service. It’s delegation dressed as automation.

### Facets.cloud: A Higher-Level Abstraction for IaC

Facets.cloud takes a different route. It provides a **type-safe, declarative model over Terraform.**

* **Platform teams define how it’s built**: using typed, versioned modules
* **Product teams define what they need**: a database, a service, a cluster
* **Facets generates Terraform to provision** using the same modules written by the platform team.

![Terraform vs](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/untitled-3-1746697439881-compressed.png)

It’s not just a workflow engine. It’s an **abstraction layer that enables safe, scalable self-service**.
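To make the what-versus-how split concrete, here is a small, hypothetical sketch: a typed intent that a product team could declare, and a platform-owned generator that turns it into a call to the platform team's Terraform module. The type names, fields, and module source are invented for illustration; they are not Facets' actual blueprint or module format.

```typescript
// Hypothetical illustration only: type names, fields, and the generated
// module call are invented for this example, not Facets' real format.

/** What a product team declares: intent, not Terraform. */
interface DatabaseIntent {
  kind: "postgres";
  name: string;
  version: "14" | "15";
  sizeGb: number;
  highAvailability: boolean;
}

/** What the platform team owns: how an intent becomes Terraform. */
function renderTerraform(intent: DatabaseIntent, environment: string): string {
  // The module source and inputs are fixed by the platform team,
  // so every generated database follows the same guardrails.
  return `
module "${intent.name}_${environment}" {
  source            = "git::https://example.com/platform/terraform-postgres.git?ref=v1.4.0"
  name              = "${intent.name}"
  engine_version    = "${intent.version}"
  allocated_storage = ${intent.sizeGb}
  multi_az          = ${intent.highAvailability}
  environment       = "${environment}"
}`.trim();
}

// A developer asks for "a database"; the platform decides everything else.
const ordersDb: DatabaseIntent = {
  kind: "postgres",
  name: "orders",
  version: "15",
  sizeGb: 100,
  highAvailability: true,
};

console.log(renderTerraform(ordersDb, "staging"));
```

The contract is the point: developers only ever touch the typed intent, while module sources, versions, and defaults stay under platform control.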
3 Real Problems Facets Solves
-----------------------------

### **1\. Collaboration Without Collisions**

Terraform workflow automation tools don’t stop teams from stepping on each other’s toes. With Facets:

* Project-specific Terraform is replaced by declarative blueprints, ensuring each environment has a clear, [centralized source of truth](https://readme.facets.cloud/docs/basic-concepts#blueprint:~:text=a%20Project%20documentation.-,Blueprint,-Blueprints%20are%20declarative).
* Every module invocation is isolated by design—teams can safely mutate infrastructure without risking downstream breakage, while dependencies are still respected.
* Facets supports [selective releases](https://readme.facets.cloud/reference/trigger-a-hotfix-release), allowing teams to promote changes module-by-module—enabling safe, independent deployments across teams and environments.

### **2\. Self-Service for Developers**

Other platforms offer “templates”—but they still expect developers to write or understand Terraform. With Facets:

* Developers choose intents (e.g. “web service”, “database”), not low-level resources or IaC.
* Infrastructure is provisioned automatically with policies, constraints, and best practices already enforced.
* No Terraform knowledge required—platform teams define the how, so [developers can focus on the what](https://www.facets.cloud/articles/what-is-a-developer-self-service-platform).

### **3\. Governance by Design**

Most tools layer policies on top of Terraform. Facets builds governance into the core workflow. With Facets:

* Platform teams define standards at the module level, ensuring every provisioned resource is compliant by default.
* Inputs are typed, constrained, and validated—reducing errors before they happen.
* Drift can’t sneak in—because changes only happen through approved blueprints and workflows.

This isn’t just policy-as-code. It’s platform-as-code—built for scale and safety. **Facets doesn’t replace Terraform expertise — it makes it scale. Your team writes it once, and every other team benefits from it safely.**

Don’t Just Automate Terraform. Abstract It.
-------------------------------------------

Most IaC platforms help you run Terraform faster, safer, and in sequence. But they still expect every team to write, reuse, and wire modules on their own.

**Facets flips the model:**

Platform teams define how infrastructure is built — once — as typed, versioned modules with guardrails. Developers declare what they need. Facets generates the Terraform.

Infrastructure scales like code: through safe reuse, strict contracts, and zero duplication. Platform engineers ship modules. Developers consume them — no tickets, no drift, no guesswork.

![Terraform alternative](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1746696352144-compressed.png)
Facets vs. Terraform Platforms
------------------------------

| # | Challenge | Facets.cloud | env0 | TFE (Terraform Enterprise) |
|---|-----------|--------------|------|----------------------------|
| 1 | **Writing modular IaC with the right abstraction** | ✅ Yes — type-safe module outputs, abstraction boundaries enforced by platform | ❌ No — up to teams to design | ❌ No — up to teams to design |
| 2 | **State management and automated execution** | ✅ Yes — built-in orchestration, remote state, retries, drift detection | ✅ Yes | ✅ Yes |
| 3 | **Isolation across environments** | ✅ Yes — environments are first-class with overrides and policies | ✅ Yes | ✅ Yes |
| 4 | **Rollout from lower to higher environments** | ✅ Yes — native environment promotion workflows | ⚠️ Partial — manual/pipeline-based | ⚠️ Partial — requires scripting |
| 5 | **Isolation between logical resources to enable collaboration** | ✅ Yes — dependency graph, per-resource visibility, and RBAC | ⚠️ Partial — via modules/repos | ⚠️ Partial — via workspace structure |
| 6 | **Well-defined dev workflow for IaC developers** | ✅ Yes — versioned automation, testing flow, enforced interfaces | ⚠️ Partial — VCS and hooks | ✅ Yes — VCS, Sentinel policies |
| 7 | **Self-service for product teams with IaC safeguards** | ✅ Yes — product teams define needs via UI, platform enforces how via typed automation | ❌ No — product teams must write IaC | ❌ No — requires IaC + Sentinel |
| 8 | **No need for project-specific automation** | ✅ Yes — IaC is generated from high-level blueprint intent | ❌ No — per-project pipelines required | ❌ No — config and state per project |
| 9 | **Higher-level abstraction on top of Terraform** | ✅ Yes — typed modules, structured inputs/outputs, architectural modeling | ❌ No | ❌ No |

Your Platform Is a Product. Build It That Way.
----------------------------------------------

If you're responsible for making infrastructure reusable, safe, and scalable — you're not just writing scripts. You're building a platform that other teams depend on.

Terraform gave us automation. Facets gives you modularity, control, and developer enablement by design.

* You define the building blocks — typed, versioned, and guardrailed.
* Developers self-serve with confidence, without tickets, without drift.

[Facets.cloud](https://www.facets.cloud/) doesn’t just run Terraform. It turns your platform team into product engineers — and your infrastructure into a service anyone can use.

---

This blog is powered by Superblog. Visit https://superblog.ai to know more.

---

## Acceldata’s Journey to High-Velocity Enterprise Software Delivery Through Developer Empowerment

Author: Facets.cloud
Published: 2025-04-11
Category: Updates
Meta Title: How to measure developer productivity - an Acceldata blueprint
Meta Description: This blog takes you through how Acceldata improved enterprise software delivery by focusing on developer productivity, removing SDLC bottlenecks, and enabling self-service and automation. The team reached a true developer flow state without compromising on speed or quality.
Tags: Acceldata, developer velocity, software delivery, developer speed, developer flow state
URL: https://blog.facets.cloud/how-to-measure-developer-productivity-acceldata-blueprint

![](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/blogs-acceldata-webinar-1744375814645-compressed.png)

In enterprise software delivery, **speed** has long been treated as a tradeoff, something to be sacrificed in favor of reliability, security, and stability. Releases were expected to take time: weeks or even months.
Well, the folks at Acceldata, who handle data observability for giants like global telecoms and major banks (yeah, high stakes!), decided that wasn't good enough anymore.

They looked around at modern blitzscaled enterprise products like Snowflake and Databricks and realized that enterprise expectations have changed over the last decade. Customers now expect to receive innovative features continuously. In fact, **_Speed_** _is the_ **_only moat_** _to build a successful modern enterprise software company._

So they asked the hard question: **“Are our developers productive, or can we ship faster?”**

That’s where Acceldata’s journey began.

### **Setting the Stage**

Acceldata had been around for 6 years. Their R&D team is 90 people – devs, QA, ops, project managers. And their tech stack? A bit of everything: Scala, Kotlin, Java, Go, Python, and TypeScript running on Kubernetes across AWS and GCP. We're talking over 3 million lines of code. Complex stuff.

Despite all that, they managed to ship **30 major releases in just 6 months.** That kind of pace is unusual for an enterprise software company. But it wasn’t just brute force or hiring more engineers; rather, it was the result of a deliberate focus on removing friction from the developer experience.

In a recent webinar with Facets Co-founder and CEO Pravanjan Choudhury, Acceldata CTO Ashwin Rajeeva shared the thinking behind their approach to developer productivity. His message was clear:

> “It's not about tracking more metrics, it's about solving real, day-to-day bottlenecks that waste engineers’ time.”

**Framing the Right Questions**
-------------------------------

Like many teams, Acceldata evaluated industry benchmarks: DORA metrics, the SPACE framework, as well as engineering KPIs. There’s no shortage of frameworks out there – each promising to quantify developer productivity.

But here’s the problem: these models often assume a mature, stable process and baseline that doesn’t reflect the chaos or ambition of a growth-stage company. And with hundreds of metrics to choose from, it quickly becomes overwhelming. Worse, they don’t reflect how work actually happens.

So the team went back to basics. What does developer productivity actually _mean_?

Because, let’s be honest, developer productivity is hard to measure. How do you measure the productivity of knowledge workers? A developer may be more productive in 4 focused hours than in 12 distracted hours. So coding hours, lines of code, and similar metrics are just report cards; they give no practical direction for improving velocity.

The real question was: **Where is developer time being wasted?**

![Developer productivity](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/b64-1744373787173-compressed.png)

And from there, they started asking the kind of questions that anyone running a software team should be asking:

* Can _any_ developer create a build on their own—and how long does it take?
* Can they test code locally, without waiting?
* Is quality assurance automated and accessible?
* Are developers waiting for shared QA environments to become free?
* And how many manual steps still stand between a commit and production?

Ashwin's view was that answering “yes” to these questions is a better proxy for productivity than any single metric. These questions formed the blueprint for how Acceldata approached improvement. Once they started answering these, the conversation shifted from improving metrics to eliminating waste.
They also noticed another pattern: things were piling up at the tail end of the SDLC. Builds, tests, and approvals all clustered toward release time, requiring rework and dragging down velocity.

Acceldata then laid down the principles and designed systems and processes to improve developer productivity.

**Building the Right Systems with the Right Principles**
---------------------------------------------------------

Acceldata’s principles revolved around close observations of developer friction and wastage:

* Availability of compute cannot be a bottleneck to velocity
* Automate everything
* Everyone can, should, and must be able to deploy – Dev, QE, and even PMs and the documentation team
* No manual intervention from commit to production

With these principles in mind, Acceldata built internal platforms, namely KAASMOS and Sentinel, to provide developer autonomy and reduce friction.

With **KAASMOS**, any engineer could spin up a full-stack Kubernetes environment on demand—no Ops ticket, no waiting. This required engineering effort, for example, getting an environment to boot up in under 5 minutes. Many companies shy away from developer environments assuming they are cost-prohibitive. However, with the right engineering investments, Acceldata could run 80 environments at a time at a monthly cost under $5,000. As Ashwin notes:

> “CPU is cheap, context switching is not!”

With **Sentinel**, everything from builds and tests to security scans and releases moved into a single, self-serve platform. Everyone can see what is happening and what is blocked. Many failures that used to be detected late in the SDLC and sent teams scrambling are now caught early.

**Achieving Developer Flow State**
----------------------------------

This is where things really changed. With the internal platforms in place, developers moved with purpose. They were no longer dependent on Ops or waiting on approvals. They had the autonomy to build, test, and ship on their own terms. They could control their own workflow—from the moment they wrote code to the moment it went live.

Automation took over the manual, repetitive tasks that used to steal hours of focus. Developers weren’t stuck doing Ops work anymore. They were doing _their_ work.

And security? It wasn’t an afterthought anymore. It was just part of the flow. Every build came with integrated scans and checks, so issues were caught early when they were easier (and cheaper) to fix.

Over time, something even bigger started to shift: **the culture**. Because the tools were built around how developers actually work, the workflows made sense. And because everything from tests to releases to sign-offs lived in one place, collaboration became way easier. Everyone could see what was happening.

**Lessons for Engineering Leaders**
-----------------------------------

Acceldata’s story is a blueprint for any team looking to increase velocity in a demanding, enterprise-grade environment. A few core takeaways:

* **Start with common sense.** Focus on eliminating obvious bottlenecks before layering in complex metrics.
* **Invest in foundational infrastructure.** Reliable, on-demand environments are table stakes for speed. Often, compute is cheaper than developer wastage.
* **Build for your process.** Tools should reflect how your teams actually work, not how a vendor thinks they should.
* **Reduce developer waste.** TicketOps, manual testing, and waiting for environments are all forms of productivity debt.
* **Keep communication tight.** When introducing new systems, clear rationale and visibility are key to adoption.
**Final Word** -------------- Acceldata didn’t achieve enterprise-grade delivery velocity by cutting corners. They did it by investing in the right foundations, platforms, processes, and principles, that empowered their teams to do great work at speed. Their experience shows that even in highly regulated, high-stakes domains, it’s possible to move fast and maintain quality, if you remove the friction and build for your developers. Watch the full conversation with Acceldata’s CTO, where he breaks down the exact tools, decisions, and mindset shifts that made this transformation possible. **👉 \[[Access the on-demand webinar now](https://www.facets.cloud/webinars/acceldata-developer-productivity)\] and see what it really takes to build for speed and scale.** --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## How MPL Migrated 130 Services from AWS to GCP Without Sacrificing Feature Velocity Author: Facets.cloud Published: 2025-04-07 Category: Updates Meta Title: MPL's AWS to GCP cloud migration Meta Description: Discover how Mobile Premier League (MPL) executed a high-stakes cloud migration from AWS to GCP while improving developer efficiency and cutting infrastructure costs by 20%. Learn strategies for incremental migration, SDLC transformation, and cross-cloud orchestration using Facets.Cloud. Tags: Mobile premier league, Cloud migration, AWS to GCP, developer self service URL: https://blog.facets.cloud/mpl-aws-to-gcp-cloud-migration ![aws to gcp cloud migration](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/blogs-mpl-webinar-1-1744375909796-compressed.png) The Unavoidable Migration Challenge ----------------------------------- For [Mobile Premier League (MPL)](https://www.facets.cloud/case-study/ggx)—a gaming platform serving 120+ million users across 34 geographies with 15,000-20,000 concurrent users per game—cloud migration was not a decision taken lightly. As Kaustubh Bhoyar, SVP of Engineering at MPL, had a stark admission: > _"Don't do migrations unless it's absolutely required."_ This wasn't just theory. MPL had already optimised their AWS infrastructure costs by over **50%** through targeted optimisations by their SRE team. ### **💡 CTO Insight:** Exhaust all in-platform optimisation options before considering migration. Assign a small task force to identify cost-saving opportunities—focus on right-sizing resources, reserved instances, and storage optimisation. However, in 2023, regulatory changes increased operational costs, pushing MPL to cut down on cloud spending. After exhausting all traditional cloud optimization avenues, they needed a long-term partner and found GCP to be the best fit due to a favorable pricing agreement and strategic alignment. But migration came with risks: * **Downtime would cost millions in GMV**—business continuity was critical. * **Feature development couldn't stop**—"all hands on deck" wasn't an option. * **Incomplete service documentation**—130+ microservices with unknown dependencies. * **An already stretched SRE team**—just **8 engineers** to execute the migration. **Strategic Migration: Turning Migration into an Opportunity to Clear SDLC Debt** --------------------------------------------------------------------------------- Like most high-growth startups, MPL focused on rapid product development, accumulating software delivery lifecycle (SDLC) debt along the way. 
They realised that if they could address SDLC inefficiencies alongside migration, they could land on GCP not just cost-effectively, but with significantly improved developer efficiency—delivering higher ROI in the long run. ### **Key SDLC Challenges Before Migration:** * **Time-consuming Releases:** Handovers between QA, Dev, and SRE teams reduced feature velocity. * **Ops Burnout:** SRE teams spent more time managing releases than on engineering excellence.  * **Toolchain Fragmentation:** Multiple tools (Terraform, Harness, Security, Observability) lacked centralised management. ### **CTO Insight** 💡 Use migration as an opportunity for transformation. Identify and prioritize 3-5 key bottlenecks in delivery workflows and design migration. Design your migration plan to address these issues concurrently with the infrastructure transition. **Executing the Migration Without Sacrificing Feature Velocity** ---------------------------------------------------------------- To ensure smooth migration, MPL formed a task force and partnered with key platform providers: ### **1️⃣ Central Migration Team** *  Managed infrastructure, security, and database migrations. *  Handled Kafka clusters and core platform services. *  Configured cross-cloud connectivity. ### **2️⃣ GCP Professional Services** *  Provided cloud-specific expertise. *  Helped define network architecture & project structure. *  Conducted knowledge transfer sessions. ### **3️⃣ Platform Orchestrator (Facets.Cloud)** Initially, MPL considered building its own Terraform modules but realized they needed an advanced platform to streamline provisioning, automation, and developer self-service. * **Terraform-based orchestration** for provisioning GCP environments and configuring all operational tool-chains.  * **Config-switching**—easily migrate or roll back services between AWS & GCP; in a self-service manner  * **Self-serve release management** for developers, without depending on SRE teams ### **CTO Insight:** 💡 Before migrating, select a platform that supports orchestrating cloud and toolchains. Standardize configuration management to ensure consistency across environments. > _As Kaustubh put it: "If a platform exists, why build it in-house?"_ **Technical Implementation: Architecting a Reliable Migration** --------------------------------------------------------------- ### **1️⃣ Dedicated Cross-Cloud Connectivity** The cornerstone of MPL's migration architecture was dual (active-active) dedicated fiber connections between AWS Mumbai and GCP Mumbai: *  Sub-2ms latency—equivalent to intra-cloud AZ latency. *  Primary link for service-to-service communication. *  Secondary link for database replication & failover. ### **CTO Insight:** 💡 **Invest in dedicated cross-cloud connectivity**—it minimizes risk and ensures a seamless incremental migration. As Kaustubh described it as "just like running a service in another availability zone," giving the team confidence to proceed with an incremental approach. ### **2️⃣ Service Discovery & Traffic Routing** *  Dual ingress controllers (AWS & GCP). * ZooKeeper-based service discovery for cross-cloud compatibility. *  Hot-reload configurations for real-time updates. * Configuration-based traffic control for seamless rollbacks. ➡️ This de-risked the migratio**n** by allowing services to be toggled between clouds with minimal disruptions. ### **3️⃣ Migrating Service "Islands" Together** Instead of migrating services in any order, MPL: * **Mapped dependencies** to identify tightly coupled services. 
* **Grouped interdependent services** into "islands."  * **Migrated islands together** to minimise cross-cloud chattiness. ➡️ Result: Reduced latency and ensured a smooth user experience. ### **4️⃣ Database Migration Strategies** For different services, MPL used different strategies: * **Low-criticality services:** Brief (2-3 min) downtime during off-peak hours. * **High-criticality services:** Real-time replication, and validation before cutover.  * **Limited database interaction:** Migrated services first while keeping databases on AWS. ➡️ Dedicated fiber lines ensured stable cross-cloud DB communication during migration. ### **5️⃣ Overcoming Technical Challenges** The migration wasn't without challenges. Kaustubh identified several technical hurdles they overcame: * **IP exhaustion** in Kubernetes. * **GCP’s 50 DBs per project limit**—requiring architectural adjustments. * **Migration pauses during IPL season** due to traffic spikes. ➡️ These challenges added one month to the timeline but were handled within buffer limits. **The Results: More Than Just Cost Savings** -------------------------------------------- * 20% infrastructure cost savings achieved in 7 months. * Release velocity increased to 8-10 services per week. * QA teams now handle releases independently via self-service. * Performance testing environments provisioned in minutes (instead of days). * Better observability & resource utilization. * SREs focus on engineering excellence, not ops grunt work. **As Kaustubh summed it up:** > _"Was it challenging? Absolutely. Was it worth it? Without a doubt—our engineering operations are running far more efficiently now."_ **Conclusion: The Strategic Value of a Well-Executed Migration** ---------------------------------------------------------------- MPL’s journey shows that cloud migration is not just about infrastructure—it’s about: ✅ Driving efficiency across development and operations. ✅ Improving SDLC and release processes. ✅ Reducing operational burdens on engineering teams. ✅ Future-proofing scalability while cutting costs. For CTOs and engineering leaders, the lesson is clear: if migration is necessary, leverage it as a transformation opportunity, not just a lift-and-shift project. Facets' Take on MPL's Cloud Migration ------------------------------------- MPL’s incremental migration approach, enabled by Facets' orchestration, allowed them to maintain business continuity while fundamentally improving engineering operations. Their focus on developer empowerment, cross-cloud connectivity, and automation helped them achieve long-term efficiencies—proving that a well-planned migration can be a game-changer. 📌 This article is based on insights shared by Kaustubh, SVP of Engineering at Mobile Premier League, during a [Facets.Cloud webinar](https://www.facets.cloud/webinars/mpl-cloud-migration-aws-to-gcp) on cloud migration strategies. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Mastering Cloud Migration: Converting Challenges into Strategic Advantages Author: Facets.cloud Published: 2025-04-05 Category: Updates Meta Title: Porter's large-scale cloud migration from AWS to GCP Meta Description: Learn how Porter successfully executed large-scale AWS to GCP cloud migration for their logistics platform. Discover phased migration strategies, database innovations, and automation techniques that transformed 100+ microservices while maintaining business continuity for 15M+ customer base. 
Tags: AWS, PORTER, GCP, Cloud migration URL: https://blog.facets.cloud/large-scale-cloud-migration-from-aws-to-gcp ![Large scale Cloud migration from AWS to GCP](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/blogs-porter-webinar-1745240778257-compressed.png) In an exclusive webinar hosted by Facets Cloud, we explored transformative insights from a successful large-scale cloud migration with Ankit, VP of Engineering at Porter. This article captures key insights from their strategic journey from AWS to GCP, highlighting how unexpected challenges led to organizational evolution and technical innovation. About Porter: ------------- Porter, one of India's top logistics companies, operates in 22 cities and two international markets. Serving 11.5 crore customers with 7.5 lakh driver partners, they offer services from two-wheeler deliveries to house movers. Their 250+ engineers manage a complex tech stack including Kotlin, Ruby on Rails, Kafka, and MySQL. Featured Tech Leader: --------------------- VP of Engineering at Porter, Ankit leads a 125+ engineering team after building mission-critical systems at Amazon Pay and Myntra. His expertise spans from scaling distributed systems to transforming engineering organizations. **What You'll Learn:-**  ------------------------ Strategic approaches to [cloud migration](https://www.facets.cloud/cloud-migration) at scale * How losing a cloud service partner became a catalyst for self-reliance * Why conventional migration approaches sometimes need to be challenged  * How to balance team autonomy with migration speed - Critical technical decisions that shaped the migration's success  * Practical insights for leaders planning similar migrations * Post-migration optimization strategies **Why This Matters:-** ---------------------- For technology leaders planning cloud migrations, the journey often seems straightforward: engage partners, follow established playbooks, and execute.  However, real-world migrations rarely follow the textbook approach. This story offers valuable insights for organizations facing similar challenges, especially those dealing with: * Legacy systems that are built over time. * Serving real-time critical workloads. * Complex organizational dynamics  * Ambitious modernization goals  * Cost optimization imperatives Migration Context And The Story After: -------------------------------------- When Porter, a rapidly growing logistics platform, initiated their cloud migration, the stakes were significant. With millions of users depending on their services and hundreds of microservices to migrate, every decision carried weight. Their journey began with familiar objectives—infrastructure cost optimization and software delivery lifecycle modernization—but evolved into a masterclass in building self-reliant capabilities. >  "The higher the skin in the game, the higher the commitment and better the results." Technical Landscape and Challenges: Porter's technical landscape reflected the complexity familiar to many growing organizations: a decade-old Ruby on Rails monolith, gradually surrounded by over 100 microservices written in Kotlin. Their data layer handled millions of queries per second, with a MySQL database that had grown to hundreds of terabytes. The system was stable but showed the architectural decisions of different eras—exactly the kind of complexity that made their cloud partners nervous. Expectations Vs. 
Reality : -------------------------- The migration started with a traditional approach: engaging recommended cloud service partners, following standard patterns, and planning for single-phase execution. However, two significant challenges emerged: their asset footprint had grown 40% during the planning phase alone, and their initial partner's estimated budget proved insufficient for their scale. >  "At some point it felt like a crazy call, but it turned out to be a brave call." And The Pivot : Instead of seeking another full-service partner, Porter made a strategic decision to own the migration while hiring individual developers from partner companies. This approach required rapidly building expertise in: * Kubernetes orchestration * Infrastructure as Code with Terraform * Modern CI/CD practices with ArgoCD * Cloud-native architecture patterns The Power of Phased Migration: ------------------------------ Despite cloud providers advocating for single-phase migrations, Porter chose a three-phase approach that proved transformative: 1\. Learning Phase: Starting with newer, less critical services 2\. Momentum Phase: Covering moderate-scale applications 3\. Core Migration: Handling high-traffic critical services >  "What took nine days in the first batch was accomplished in half a day by the final batch." Some Technical Deep Dives : --------------------------- ### Database Migration Innovation Porter developed a multi-region migration strategy that significantly reduced their migration window. Their approach focused on regional proximity and internal network optimizations, maintaining data consistency while minimizing transfer times. ### Automation Evolution One of the most striking transformations was their cutover process evolution. What began as high-stress events requiring manual coordination across war rooms evolved into automated, monitored processes. "When I enter a room during our final cutover and ask 'What's going on?', the team is yawning. Everything is automated now—just scripts executing with a click of a button. Compare this to our first migration where everyone was tense, focused, and executing manual steps every few minutes. That's when you know your automation is working—when your cutovers become boring." **Lessons That Stayed With Porter** ----------------------------------- The most valuable insights from our journey weren't about technical tools or cloud features—they were about organizational capability and control: * **Own Your Journey**: While partners can provide expertise, maintaining strategic control internally leads to better outcomes. * **Invest in Learning**: What seems like slow progress initially accelerates dramatically as team capability grows. * **Embrace Iteration**: Each phase should make the next one better. Perfect processes aren't built, they're evolved. * **Automate Progressively**: Start with manual steps to understand the process, then automate incrementally. Platform Engineering Perspective: ------------------------------------ At Facets Cloud, we've observed that successful cloud migrations share common elements with effective platform engineering practices. 
Porter's journey exemplifies how organizations can use migration as an opportunity to build robust platform engineering capabilities: * Standardizing infrastructure provisioning * Implementing developer self-service * Establishing automated compliance * Building reusable components Looking Ahead:  --------------- For organizations planning [cloud migrations](https://www.facets.cloud/articles/dynamic-cloud-interoperability-redefining-cloud-agnosticism), Porter's journey offers valuable insights into balancing partner expertise with internal capability building. Their experience demonstrates that successful cloud migration isn't just about moving workloads—it's about building lasting technical capabilities and organizational resilience. **A Bit About Facets.Cloud:** ----------------------------- Planning your cloud migration journey? Explore how platform engineering can accelerate your cloud transformation.  Connect with [Pravanjan](https://www.linkedin.com/in/pravanjan) to discuss your platform engineering challenges, or check out [Facets Cloud](https://facets.cloud) to see how our platform engineering solution can help standardize your cloud infrastructure and empower your development teams. \-- \*This article is part of Facets Cloud's technical leadership series, where we explore real-world platform engineering and cloud transformation journeys.\* --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## How We Scaled GitHub App Integration for Per-customer deployments Author: Anuj Hydrabadi Published: 2025-04-02 Category: Tech Articles Meta Title: How To Scale GitHub App Integration for Per-customer deployments Meta Description: Facets solved GitHub App integration challenges in a multi-tenant architecture using a centralized callback service. Discover solutions for GitHub callback URL limits, enterprise GitHub support, and secure token management across isolated control plane instances. Tags: github, multi-tenant environments, github app integrations, cross control plane service, github enterprise URL: https://blog.facets.cloud/scaling-github-app-integration-for-per-customer-deployments At Facets, we recently overhauled our GitHub integration strategy for our control plane product. Unlike traditional multi-tenant SaaS platforms that share a single backend, our architecture provides each customer with a dedicated control plane instance, deployed under a unique domain. While this design ensures strong isolation and flexibility, it posed challenges when integrating with GitHub, specifically: 1. **Limited Callback URLs**: GitHub Apps only support up to 10 callback URLs, which becomes problematic when each customer has a dedicated control plane instance with its own domain. 2. **Missing State in Approval Flows**: When non-owners initiate installations requiring approval, GitHub doesn't preserve the state parameter in the approval callback, making it difficult to route back to the originating control plane. This is a [well-documented limitation](https://github.com/orgs/community/discussions/42351) in the GitHub community. 3. **Enterprise GitHub Support**: Supporting both GitHub.com and self-hosted GitHub Enterprise instances requires handling different API endpoints, authentication flows, and routing mechanisms. This blog post details how we designed a lightweight "cross-control-plane" service that acts as an intermediary between GitHub and individual control plane instances, solving these challenges while maintaining our multi-tenant security boundaries. 
The Problem with Personal Access Tokens
---------------------------------------

Initially, we relied on Personal Access Tokens (PATs) for GitHub integration. While this approach worked, it came with significant limitations:

* PATs are tied to individual user accounts rather than organizations
* They have indefinite lifespans unless manually rotated
* They are less secure
* GitHub's [official guidance](https://docs.github.com/en/apps/creating-github-apps/about-creating-github-apps/deciding-when-to-build-a-github-app#github-apps-can-act-independently-of-or-on-behalf-of-a-user) recommends GitHub Apps for production integration scenarios

This approach created **real operational issues**—for example, one of our customers lost GitHub access when the engineer who set up their integration left the company. Their **PAT expired without warning**, breaking their automation.

Since implementing GitHub Apps, we've eliminated common PAT-related issues:

* No more unexpected token expirations
* No disruptions when the PAT creator leaves the company
* No functionality breakage due to insufficient permissions

The Challenge: GitHub Apps in a Multi-Tenant Environment
--------------------------------------------------------

GitHub Apps offer significant advantages—granular permissions, installation-based access, and automated token rotation. However, implementing GitHub Apps in our multi-tenant architecture presented unique challenges:

**The Callback URL Limitation**: When registering a GitHub App, you can specify up to 10 callback URLs. With each customer having a dedicated control plane instance, this limit becomes problematic at scale. Consider our customer control plane URLs:

* **cust1.facetsapp.cloud**
* **facets.cust2.com**
* **cust3.console.facets.cloud**

... and many more. The callback URL is essential as it receives the installation ID required to complete the app installation process.

Our Solution: Cross-Control-Plane Service
-----------------------------------------

We designed a centralised, lightweight service (which we call the "cross-control-plane" service) that acts as an intermediary between GitHub and individual control plane instances.

The cross-control-plane service has two primary responsibilities:

1. Accept callbacks from GitHub and redirect them to the appropriate control plane
2. Maintain a small MongoDB database to temporarily store installation requests

### Integration Flow

Below is a simplified diagram of our integration flow, showing both standard and enterprise GitHub scenarios:

![standard and enterprise GitHub scenarios](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/mermaid-diagram-2025-04-02-140927-1743585143955-compressed.png)

### Standard Flow (Owner Initiates)

**1\. Installation Initiation**: When a user who is an organisation owner initiates GitHub App installation from their control plane (e.g., **cust1.facetsapp.cloud**), we:

* Generate a unique state parameter containing the originating control plane URL
* Direct the user to GitHub's app installation flow with this state parameter

**2\. Callback Handling**: GitHub redirects to our central callback endpoint with:

https://cross-control-plane.facets.cloud/github/callback?installation_id=58587767&setup_action=install&code=94251652e13379fded53&state=xyz

**3\. Redirection**: Our cross-control-plane service:

* Unpacks the state parameter to identify the originating control plane
* Forwards the installation ID to the correct control plane

This simple flow works when the user initiating the installation has sufficient permissions.
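To make the redirect step concrete, here is a minimal sketch of what such a callback handler could look like, assuming an Express-style Node service and a state parameter that simply carries the base64url-encoded URL of the originating control plane. The names, the allow-list, and the encoding are illustrative assumptions for this post, not the exact production implementation.

```typescript
import express from "express";

const app = express();

// Illustrative allow-list of control planes we manage (hypothetical; a real
// service would look these up from configuration or a database).
const KNOWN_CONTROL_PLANES = new Set([
  "https://cust1.facetsapp.cloud",
  "https://facets.cust2.com",
  "https://cust3.console.facets.cloud",
]);

app.get("/github/callback", (req, res) => {
  const { installation_id, setup_action, state } = req.query as Record<string, string | undefined>;

  if (!state) {
    // Approval flows can arrive without state; those are handled via the
    // MongoDB lookup described in the approval-flow section below.
    return res.status(400).send("Missing state parameter");
  }

  // Assumption for this sketch: state is the base64url-encoded origin URL.
  const originUrl = Buffer.from(state, "base64url").toString("utf8");

  if (!KNOWN_CONTROL_PLANES.has(originUrl)) {
    return res.status(400).send("Unknown control plane");
  }

  // Forward the installation details to the control plane that started the flow.
  const target = new URL("/github/installation-callback", originUrl);
  if (installation_id) target.searchParams.set("installation_id", installation_id);
  if (setup_action) target.searchParams.set("setup_action", setup_action);

  return res.redirect(target.toString());
});

app.listen(3000);
```

The useful property is that the GitHub App needs only this single callback URL no matter how many control plane instances exist; everything customer-specific lives in the state parameter or, for approval flows, in the temporary MongoDB record.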
### Approval Flow (Non-Owner Initiates)

For organisations where a non-owner initiates the installation (e.g., from **cust3.console.facets.cloud**):

**1\. Initial Request**: GitHub sends a request callback:

https://cross-control-plane.facets.cloud/github/callback?code=94251652e13379fded53&setup_action=request&state=def

**2\. Database Storage**: Our cross-control-plane service:

* Exchanges the code to get the GitHub user ID
* Calls the GitHub API to find the installation request
* Stores a record in MongoDB:

      {
        targetId: "12345",  // GitHub organization ID
        custId: "default",  // "default" for github.com
        state: "def"        // Original state parameter
      }

**3\. Approval Callback**: Later, when an organization owner approves the installation:

https://cross-control-plane.facets.cloud/github/callback?code=aafa9de8250d49d080de&installation_id=58587767&setup_action=install

**4\. State Recovery**: We fetch the installation details to find the target organisation ID, then:

* Look up the state in MongoDB using the composite key (targetId, custId)
* Redirect to the original control plane with the recovered state and installation ID
* Remove the record from MongoDB

The database schema uses a composite unique key of **(targetId, custId)** to handle duplicate requests.

Enterprise GitHub Considerations
--------------------------------

For customers using self-hosted GitHub Enterprise instances, like **github.cust2.com**, we implemented additional workflows:

#### Path-Based Routing

We use path-based routing in our cross-control-plane service:

* **/github/callback** - For github.com installations
* **/github/callback/cust2** - For the enterprise GitHub at **github.cust2.com**

Example enterprise GitHub callbacks:

https://cross-control-plane.facets.cloud/github/callback/cust2?code=88776655e13379fdaabb&setup_action=request&state=ghi

https://cross-control-plane.facets.cloud/github/callback/cust2?code=ccdd9de8250d49d0effa&installation_id=98765432&setup_action=install

In our database, we store:

    {
      targetId: "67890",  // Enterprise GitHub organisation ID
      custId: "cust2",    // Enterprise customer identifier
      state: "ghi"        // Original state parameter
    }

Implementation Challenges
-------------------------

During implementation, we encountered several significant challenges:

### 1\. Managing the Approval Flow

The most difficult aspect was handling the organization approval flow correctly. Specific challenges included:

* **Duplicate Requests**: Multiple installation requests might be initiated for the same organization (same target ID). We needed to implement logic to handle these duplicates and maintain a single entity in our database per organization.
* **Correlation Without State**: Designing a reliable method to correlate approval callbacks with the original requests without having the state parameter required careful API usage and database design. This limitation is well-documented and remains an open issue in the GitHub community. You can follow the ongoing discussion about this challenge in [this GitHub community thread](https://github.com/orgs/community/discussions/42351).

### 2\. Enterprise GitHub Setup

Integrating with enterprise GitHub instances presented unique challenges:

* Each enterprise instance requires separate configuration
* Routing callbacks correctly without explicit enterprise identifiers
* Managing multiple sets of credentials (client IDs, secrets, and private keys)
* Testing complex flows across various GitHub Enterprise versions

Token Management and Security
-----------------------------

Our control plane instances use the installation ID to:

* Generate installation access tokens (valid for one hour)
* Cache these tokens for 59 minutes to avoid rate limits
* Use these tokens to access the customer's blueprint repositories

Additional security measures include:

* One-time webhook tokens to prevent replay attacks
* OAuth user verification to ensure the installation was performed by the expected user
* Correlation of GitHub user IDs with installation requests
* Automatic token rotation
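For the token handling described above, a simplified sketch could look like the following, using the `@octokit/auth-app` package. The cache shape and function names are ours for illustration; only the 59-minute TTL mirrors the behaviour described in this post.

```typescript
import { createAppAuth } from "@octokit/auth-app";

// Hypothetical wiring; the real control plane loads these differently.
const appId = Number(process.env.GITHUB_APP_ID);
const privateKey = process.env.GITHUB_APP_PRIVATE_KEY ?? "";

interface CachedToken {
  token: string;
  fetchedAt: number;
}

// Cache tokens per installation for 59 minutes, just under GitHub's
// one-hour validity, so we never hand out an expired token.
const TTL_MS = 59 * 60 * 1000;
const cache = new Map<number, CachedToken>();

export async function getInstallationToken(installationId: number): Promise<string> {
  const cached = cache.get(installationId);
  if (cached && Date.now() - cached.fetchedAt < TTL_MS) {
    return cached.token;
  }

  // Exchange the app credentials + installation ID for a short-lived token.
  const auth = createAppAuth({ appId, privateKey, installationId });
  const { token } = await auth({ type: "installation" });

  cache.set(installationId, { token, fetchedAt: Date.now() });
  return token;
}

// Usage: read the customer's blueprint repository with the cached token.
// const token = await getInstallationToken(58587767);
// await fetch("https://api.github.com/repos/acme/blueprints", {
//   headers: { authorization: `Bearer ${token}` },
// });
```

Refreshing just under the one-hour window keeps GitHub API calls (and rate-limit pressure) low without ever serving a stale token.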
Key Tips for GitHub App Integration in Multi-Tenant Environments
----------------------------------------------------------------

* Does your app support GitHub Enterprise Server? If so, once the customer creates the app in their server from the manifest, make sure you store the private key and client secret securely.
* Not getting the **code** param in the callback? Check the "Request user authorization (OAuth) during installation" setting in the app configuration.
* A user can cancel an installation request, and there is no callback or webhook from GitHub for this, so if they re-request after canceling you may get duplicates in your DB - make sure you handle this!

Conclusion
----------

Implementing GitHub Apps in a multi-tenant architecture required creative solutions to address GitHub's callback limitations. Our cross-control-plane service approach provides a scalable and secure way to integrate with GitHub while maintaining isolation between customer instances.

The lightweight design of our cross-control-plane service—focusing solely on callback handling and temporary request storage—allows us to maintain a clean separation of concerns while solving the multi-tenant callback challenge.

By sharing our approach, we hope to help other development teams facing similar challenges with multi-tenant architectures and GitHub integration.

---

This blog is powered by Superblog. Visit https://superblog.ai to know more.

---

## Why Environment Lifecycle Management is Broken – And How to Fix It?

Author: Rohit Raveendran
Published: 2025-03-31
Category: Tech Articles
Meta Title: Environment management with CRUD Operations
Meta Description: A practical take on fixing environment lifecycle management. Learn how developers can instantly Create, Read, Update, and Delete cloud environments with full automation, clear visibility, and zero Ops bottlenecks.
Tags: CRUD Operations, environment lifecycle, environment management, Environment CRUD
URL: https://blog.facets.cloud/environment-management-and-crud-operations

Companies often think of cloud environments as assets they've built, but in reality, they should be commodities—something that can be **created**, **updated**, or **deleted** at any time.

**What do I mean by that?**

When environments are treated as fixed investments, teams hesitate to spin up new ones, seeing them as too much effort for little utility. But when environments are easily provisioned and managed, they become enablers rather than burdens.
A system that makes the cloud environment lifecycle effortless unlocks:

* Scaling across regions – Easy launch and no maintenance overhead when you expand your business to new regions or clouds.
* Performance testing on demand – No waiting, just spin up, test, and tear down.
* Feature validation without constraints – Test changes in isolation, without impacting shared environments.
* Fast disaster recovery – Recreate known-good states instead of firefighting broken ones.
* Lower MTTR (Mean Time To Recovery) – Detect configuration drifts easily and roll out fixes in minutes.

The real asset isn't the environment itself. It's the centralized information that makes creating and maintaining environments a repeatable process.

**Why Do "Environments" Matter More?**
--------------------------------------

**Scenario 1:** Modern agile teams often work on large features in parallel and require dedicated test beds for functional and performance testing. A developer wraps up their feature and says, _"Can I have an isolated environment to test this?"_ But what they really mean goes beyond just pushing code. It means ensuring that **everything—code, dependencies, infrastructure, configurations, and monitoring—**is in place and working together seamlessly. The question is, how long will this take?

**Scenario 2:** There is a cloud outage, and management asks, “_Can we switch over to a different cloud region?_” Even if you have a copy of your latest data in another region, how long will it take to launch a fully functional environment and switch over?

In cloud-native development, infrastructure is described as Kubernetes clusters, databases, and load balancers. But for developers, infrastructure alone isn’t enough. What they truly work with are environments—staging, UAT, production, and preview—where their code runs end-to-end. These environments aren’t just collections of infrastructure components; they also include deployments, security policies, observability, and the context necessary to ensure applications function as expected.

In the next section, we’ll introduce how environment lifecycle management is broken—and what needs to change.

**Exploring Alternatives for Environment Lifecycle Management**
---------------------------------------------------------------

Instead of explaining environments as just another infrastructure challenge, let's apply a familiar **CRUD (Create, Read, Update, Delete) analogy**, i.e., how environments are created, monitored, updated, and eventually retired. Let’s walk through each stage.

![CRUD Operations](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/b64-1743419648825-compressed.png)

1\. Create: Why does spinning up an environment take so long?
-------------------------------------------------------------

Imagine a developer who wants to launch a new performance testing environment, similar to the regular QA environment, with a higher autoscaling configuration. Ideally, they’d just press a button, and everything—code, dependencies, infrastructure, security policies, and monitoring—would come together seamlessly.

### **Traditional Approach:**

Creating an environment takes days or even weeks. It often requires:

* Multiple teams to handle different components.
* A long runbook of manual steps: setting up networking, base infrastructure through IaC, secrets, credentials, deployment pipelines, databases, and domains. Over time, the runbook itself becomes stale and outdated.

#### **What Needs to Change?**

* Declarative, not procedural: rather than a procedural runbook, environments should be declaratively defined:
  * Requirements specified by the developers, e.g., a Postgres
  * What builds to select, e.g., the master branch
  * Which configurations to override, e.g., autoscaling limits
* These definitions should live in a single source of truth, inheriting standard configurations while overriding specific characteristics where necessary (see the sketch below).
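As a thought experiment, here is a minimal sketch of what such a declarative definition could look like, written as typed TypeScript objects. The field names and the merge helper are invented for illustration and are not Facets' blueprint schema.

```typescript
// Illustrative only: field names and structure are hypothetical,
// not an actual blueprint or environment schema.

interface EnvironmentDefinition {
  requirements: { postgres?: { version: string } };
  build: { branch: string };
  autoscaling: { minReplicas: number; maxReplicas: number };
}

// The standard configuration every environment inherits.
const baseEnvironment: EnvironmentDefinition = {
  requirements: { postgres: { version: "15" } },
  build: { branch: "master" },
  autoscaling: { minReplicas: 2, maxReplicas: 4 },
};

// A performance-testing environment only declares what differs:
// same code and dependencies, higher autoscaling limits.
const perfTestOverrides: Partial<EnvironmentDefinition> = {
  autoscaling: { minReplicas: 4, maxReplicas: 20 },
};

function withOverrides(
  base: EnvironmentDefinition,
  overrides: Partial<EnvironmentDefinition>
): EnvironmentDefinition {
  // A shallow merge is enough for this sketch; real systems merge per key.
  return { ...base, ...overrides };
}

const perfTestEnvironment = withOverrides(baseEnvironment, perfTestOverrides);
console.log(JSON.stringify(perfTestEnvironment, null, 2));
```

Because the definition, not a runbook, is the source of truth, the same base can be stamped out for QA, performance testing, or a new region, with only the overrides differing.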
### **Facets Approach:**

Facets unifies infrastructure, code, and configuration into cohesive blueprints, eliminating silos, ensuring a [single source of truth](https://www.facets.cloud/features#Infrastructure-Config:~:text=Single%20Source%20of%20Truth%20\(Blueprints\)), and enabling environment-specific overrides for [consistency across environments](https://www.facets.cloud/features#Infrastructure-Config:~:text=Zero%20Drift%20Between%20Environments).

![Environment CRUD](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1743419651182-compressed.png)

**2\. Read: Why is information about an environment scattered?**
----------------------------------------------------------------

A common frustration in software teams is **not knowing what’s running where**. A developer investigating a performance issue might ask:

* _"Are all production environments affected, or just one?"_
* _"When was the last release deployed?"_
* _"Did Ops modify any configurations?"_

### **Traditional Approach:**

* Information is scattered across tools like logs, dashboards, and release management, forcing teams to dig for answers, and it may not be consistently tagged.
* Configuration changes lack visibility, making debugging slow and frustrating.
* Cloud resources are tagged with environments, but are all environment assets (configurations, change sets, metrics) tagged just as consistently?

#### **What Needs to Change?**

* **Explicit lineage:** Every environment should have real-time traceability from teams to resources and configurations.
* **Changesets:** e.g., what changes have been pushed to an environment in the last 6 hours?
* **Configurations:** e.g., how does one environment's configuration differ from others?
* **Alerts and more:** what are the open alerts in my production environment?
* **Cost:** e.g., what is the cost of my QA environment?
* Developers should have **instant access** to environment data through a **single, centralised view**.

### **Facets Approach:**

By consolidating all environment insights into a single view, we let teams make faster, more informed decisions about their infrastructure. Real-time visibility across resource utilisation and performance metrics enables proactive management, helping teams identify and address potential issues before they impact operations.

![CRUD](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1743419652692-compressed.png)

**3\. Update: How do I push changes consistently?**
---------------------------------------------------

Updating an environment should be seamless, yet today it’s a high-stakes, fragmented process, spanning owners of various pipelines for code, configuration, infrastructure, database, monitoring, alerting, and tagging: as many as 32 different categories of pipelines may modify an environment.
### **Traditional Approach:** * Code, infrastructure, and [configurations](https://blog.facets.cloud/why-traditional-configuration-management-challenges/) have separate pipelines, creating unnecessary inter-pipeline dependencies. * Updating an environment means triggering multiple pipelines in a specific order. #### **What Needs to Change?** * A single environment orchestrator: * Orchestrates multiple pipelines automatically. * Ensures eventual consistency across environments. * Eliminates the need for ticket-based infra changes. * A system where developers can deploy, provision resources, and update configurations without depending on Ops. ### **Facets Approach:** Managing changes becomes simpler when every update - from code deployments to [configuration changes](https://www.facets.cloud/features#Infrastructure-Config:~:text=Configuration%20Management%20as%20Code) - follows one consistent process. With built-in pipelines, updates are handled seamlessly, ensuring consistency and minimising disruptions. Once an environment is launched, users can scale resources, modify configurations, or deploy new application versions within the same streamlined workflow. The platform ensures that all changes are applied consistently across environments, reducing the risk of errors. ![CRUD](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-2-1743420333190-compressed.png) ![CRUD](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-1-1743420357777-compressed.png) ![CRUD](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-3-1743420409534-compressed.png) **4\. Delete: Why Do Old Environments Linger?** ----------------------------------------------- Ever heard someone say, _"Do we still need that test environment?"_—only to realise it’s been running for months, racking up cloud costs? ### **Traditional Approach:** * Cleaning up environments is manual and risky. * No automated cleanup policies, leading to resource sprawl. * Teams hesitate to delete environments, fearing accidental loss of critical data. #### **What Needs to Change?** * Automated cleanup policies to define expiration rules for non-production environments. * Guardrails against accidental deletions, ensuring clear ownership and alerts before termination. * Dynamic environment hibernation, adjusting infrastructure usage based on demand. ### **Facets Approach:** When an environment is no longer needed, Facets makes it easy to destroy all resources and delete the cluster. This ensures cost efficiency and prevents resource sprawl. ![undefined](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1743419659329-compressed.png) **Conclusion: A New Approach to Environment Management** -------------------------------------------------------- Traditional environment management is slow, fragmented, and resource-intensive. Developers deal with inconsistent environments, DevOps teams are bogged down by manual configurations, and organizations end up wasting time and cloud costs on inefficiencies. #### **What Needs to Change?** * **Environments should be commodities, not assets** – treating environments as first-class entities ensures they are consistently available and reliable. * **Automation should replace manual intervention** – reducing reliance on runbooks, ad-hoc scripts speeds up deployment cycles. 
* **Visibility should be real-time and centralized** – developers should have access to clear lineage tracking, monitoring, and configuration changes. * **Environment cleanup should be proactive** – automated policies for hibernation and deprovisioning ensure cloud costs remain under control. And here are the benefits that you can expect - * **Drift-Free Environments:** Every environment stays consistent and error-free, with environment-specific overrides applied without drift. * **Reducing Complexity:** No more Terraform sprawl, manual Helm charts, and brittle runbooks. Automation eliminates repetitive tasks and ensures a reliable setup. * **Saving Time:** What once took weeks or months now happens in minutes, allowing DevOps to focus on high-value work. * **Lowering Costs:** Automated cleanup and optimisation prevent cloud waste, keeping costs under control. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Our Journey to Making Facets.Cloud Enterprise Ready Author: Pravanjan Choudhury Published: 2025-02-18 Category: Tech Articles Tags: facets.cloud, rbac management, enterprise readiness, change management, audit logs URL: https://blog.facets.cloud/a-guide-to-become-enterprise-ready We recently shared a detailed [LinkedIn post](https://www.linkedin.com/posts/pravanjan_enterprise-readiness-is-about-making-your-activity-7295320754965958656-kuuk?utm_source=share&utm_medium=member_desktop) about our complete journey in implementing all 12 recommendations from the [EnterpriseReady.io](http://enterpriseready.io) framework. While that post covers our comprehensive transformation, we wanted to focus here on a few key transformations: some of which didn’t just help us meet enterprise expectations; they also became a competitive advantage, addressing gaps in the market that enterprises faced. When we started adapting Facets for enterprise use, we expected many changes would simply be checking boxes— standard requirements like SSO and role-based access. What we didn't expect was finding significant needs that enterprises were struggling with. Read this blog further to understand what were the most impactful transformations that became market advantages for us: Key Transformations That Made the Difference -------------------------------------------- ### Product Assortment: From One-Size-Fits-All to Flexible Components The traditional enterprise-ready approach simply suggests offering different product tiers. But as we worked with enterprises, we discovered a deeper problem: teams within the same organisation had vastly different needs and maturity levels. Some lacked automation entirely, while others had sophisticated systems in place. **Our edge:** We transformed this challenge into an advantage by creating a modular platform that lets enterprises have both standardisation and flexibility: * Teams can use our pre-built automation or bring their own * Enterprises maintain central governance while teams keep their preferred workflows * The API-first approach lets teams integrate with existing developer portals The impact has been substantial: our customers now standardise cloud automation without forcing one-size-fits-all solutions on their teams. One of our customers (A Fortune 50 company) has replaced multiple tools with our flexible platform, reducing costs while improving developer satisfaction. 
![Enterprise readiness](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/frame-1-1739943220472-compressed.png) Audit Logs Enterprises need clear visibility into who makes changes and when, not just for security and compliance but also for operational transparency. We discovered central IT teams cobbled together information from 20+ tools for audits and compliance and needed a single view for all the changes across their development toolchain. **Our edge:** We evolved basic audit logging into a unified governance solution that captures and correlates changes across the entire software delivery lifecycle. By creating a single source of truth for all infrastructure and environment changes, we've enabled seamless ITSM integrations that automate compliance workflows. Our structured data approach makes audits remarkably efficient. ![Audit logs](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-22-1739854476625-compressed.png) Deployment Options Enterprises wanted modern cloud tools but couldn't use multi-tenant SaaS due to security requirements. Most vendors offered clunky "on-prem" versions that felt like afterthoughts. From the start, we built Facets to be self-hosted inside enterprise environments, ensuring data security. But as we expanded, customers needed more flexibility, including: * Multi-cloud support across AWS, Azure, and GCP. * High-availability and backup configurations for resilience. * Deployment models that integrate seamlessly with enterprise security policies. **Our edge:** We've developed secure support workflows that respect their security boundaries without compromising on service quality. Facets became a solution not just for enterprises but also for SaaS companies looking to distribute their own software in secure environments. ![AWS, GCP, Azure](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1739854194374-compressed.png) ### Role based access control As organisations scale, it's harder to control the granular permissions across cloud tools, creating unmanageable IAM structures that can’t map to their complex organisational hierarchies. Companies then need to create hundreds of custom roles, and teams waste countless hours translating organisational structures into tool-specific permissions. **Our edge:** Our Context-Aware Access Control system fundamentally reimagines how permissions work. Instead of creating technical roles, permissions automatically align with organisational structure. A user's access seamlessly adapts based on context – they might be a viewer in production environments but have developer access in staging. The system understands organisational hierarchy, making it natural to implement concepts like "team lead" or "department manager" without creating special roles for each case. ![User management in Facets](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1739854195506-compressed.png) ### Change Management Enterprise tools often push updates without considering complex production environments, leading to disruptions and upgrade anxiety. We turned this pain point into an opportunity for innovation. **Our edge:** Our change management system provides each enterprise with dedicated sandbox environments that mirror our staging branch. Teams can catch integration issues early, long before they affect production. 
Critical updates, particularly for components like Kubernetes, are controlled through intuitive UI prompts that let administrators test thoroughly in lower environments first. The system includes sophisticated rollback capabilities that enterprises rarely find in cloud tools. Looking Forward --------------- While we've made significant progress, we see this as just the beginning. Enterprise needs continue to evolve, and we're committed to staying ahead. We're currently exploring AI-powered analytics, expanding our integration ecosystem, and building automation that makes complex enterprise governance feel effortless. For those on a similar journey, remember: enterprise-readiness isn't just about checking boxes – it's about understanding deep organisational needs and turning them into opportunities for innovation. Want to learn more about our enterprise journey or share your experiences? Let's connect! Visit [Facets.cloud](https://www.facets.cloud/) or [book a demo here](https://www.facets.cloud/demo). --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Part 3: Why You Should Use IaC to Deploy Your Applications Author: Anshul Sao Published: 2025-02-14 Category: Tech Articles Meta Title: Why You Should Use IaC for Application Deployments Meta Description: Traditional CI/CD pipelines rely on hidden manual processes, making deployments fragile. Learn how Infrastructure as Code (IaC) can automate infra, pipelines, and configurations—eliminating manual steps and ensuring true automation. Tags: Infrastructure as Code (IaC), secrets management, Terraform, ci/cd, everything as code URL: https://blog.facets.cloud/why-you-should-use-iac-to-deploy-your-applications Automation is supposed to reduce manual effort, right? But if you zoom out a little, you’ll notice something ironic—humans are still deeply embedded in "**automated**" CI/CD processes. **The Traditional Approach: Separate CI/CD Pipelines** ------------------------------------------------------ Let’s talk about how application deployments are typically handled today. We create separate Continuous Deployment (CD) pipelines that assume infrastructure is already in place. These pipelines—whether using [ArgoCD](https://argoproj.github.io/), [Helm](https://helm.sh/), or [Kubernetes](https://kubernetes.io/) manifests—deploy applications on top of an existing cluster. Sounds great, but here’s where the problems start: ### **Who Creates These Pipelines?** Each new application needs its CD pipeline. And who writes these? Humans. * Option 1: Developers manually copy self-serve templates (if available). * Option 2: Ops teams create them on request (time-consuming). **Problem**: Every new service requires a new pipeline. Every new environment requires an even bigger effort, as multiple pipelines must be created from scratch. ### **Ordering & Secrets Management Is Manual** These pipelines work under the assumption that infrastructure exists. So, when a new environment is needed: 1. Someone provisions infrastructure. 2. Someone manually copies values and secrets. 3. Pipelines use these shared or individual secrets for deployments. If you zoom in, this seems fine—everything runs in the right order. But when you zoom out, you see the hidden inefficiencies: * Humans are responsible for manually fulfilling infrastructure outputs and passing them to pipelines. 
* Configuration management issues persist, as seen in [Part 2](https://blog.facets.cloud/embracing-infrastructure-as-code-iac-for-secure-and-efficient-configuration-management/). * Only a handful of “heroes” in the team know the end-to-end process, making scaling difficult. ### **“But My Pipelines Are Automated!”—Is It?** At this point, some people might say: "My pipelines aren’t manual! I have a script for it. It runs on a Git commit—it's GitOps! " And yes, that’s great. But here’s the catch: * These scripts are still imperative and sequential. They execute step-by-step instructions in a set order, just like a human would. * They are fragile. If something breaks midway (e.g., a missing secret, a resource that isn't ready yet), the entire process halts. * They require human intervention when things go wrong. Someone has to debug, re-run, or fix the missing piece. * They are not self-healing. If an environment drifts from the expected state, the script won’t detect and fix it automatically—it needs to be manually rerun. Essentially, these non-declarative, big sequential scripts are as fragile as humans. They are still built on assumptions: ✔️ Infra already exists. ✔️ Secrets are already available. ✔️ Things will execute in the right order. When reality doesn’t match these assumptions, things break. Unlike Terraform, which continuously enforces state and corrects drift, these scripts just execute once and hope everything works. If something changes later, they won’t fix it on their own. **What If All of This Was Just a Terraform Project?** ----------------------------------------------------- Now, imagine an alternative: **[Everything as Code (EaC)](https://docs.aws.amazon.com/wellarchitected/latest/devops-guidance/everything-as-code.html)**—where your entire infrastructure, configuration, and deployment pipelines are defined in a single declarative system. ### **No More Manual Ordering or Wiring** * Terraform automatically manages dependencies and ordering between infrastructure, CD pipelines, and configuration management. * Outputs from one module flow into another—no need for manual fulfilment. * Infrastructure changes trigger application deployments automatically. * 3 way diff and state management further aids the self healing nature of it. ### **One IaC Trigger for Everything** Instead of triggering multiple pipelines manually or maintaining conventions to ensure order, you: * **Run Terraform** → it provisions infrastructure, generates configurations, and even sets up CD pipelines. * **Deploy with ArgoCD or Helm** → but provisioned through Terraform, not manually set up. Result: Adding a new service is as simple as invoking a Terraform module with a few details. New environment? One-click. No extra wiring is required. ### **CD Pipelines Become Dynamic & Declarative** Your pipelines don’t need to be hand-written per application. Instead, Terraform provisions a templated pipeline that dynamically picks up: * Java version * Image URL * Helm chart values * Any other app-specific details **Terraform at Scale: It Shouldn’t Be Scary** --------------------------------------------- People often fear large Terraform projects, thinking they become unwieldy. But the goal is not to treat Terraform as a fragile thing that must be run once in a blue moon. Infrastructure should be continuously managed—Terraform should be run multiple times a day without fear. 
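As a rough, hypothetical sketch of the "one IaC trigger for everything" idea above (the chart repository, chart name, value keys, and the stand-in database endpoint are illustrative, not a prescribed setup), deploying an application becomes just another declared resource whose inputs are wired from the same Terraform project:

```hcl
# Minimal sketch: the application deployment is a Terraform-managed Helm release,
# and its app-specific details (image, runtime version, wiring to other outputs)
# arrive as values. All names here are illustrative.

terraform {
  required_providers {
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.12"
    }
  }
}

provider "helm" {
  kubernetes {
    config_path = "~/.kube/config" # illustrative; any supported auth works
  }
}

variable "payments_image_tag" {
  type        = string
  description = "Build to deploy, e.g. the tag CI produced for the master branch"
}

locals {
  # Stand-in for an output of another module in the same project,
  # e.g. module.payments_db.endpoint -- no manual copy-pasting of values.
  db_endpoint = "postgres.internal.example.com"
}

resource "helm_release" "payments" {
  name             = "payments"
  repository       = "https://charts.example.com" # hypothetical chart repo
  chart            = "generic-web-service"        # hypothetical templated chart
  namespace        = "apps"
  create_namespace = true

  set {
    name  = "image.repository"
    value = "registry.example.com/payments"
  }
  set {
    name  = "image.tag"
    value = var.payments_image_tag
  }
  set {
    name  = "javaVersion"
    value = "17"
  }
  set {
    name  = "env.DB_HOST"
    value = local.db_endpoint
  }
}
```

Adding a second service is another such block (or another invocation of a shared module), and a new environment is the same project applied with different inputs; there is no separate, hand-written pipeline to create first.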
Making large-scale Terraform manageable is why we built [Facets](https://www.facets.cloud/), and why tools like [Terragrunt](https://terragrunt.gruntwork.io/) exist.

### **Conclusion**

[IaC](https://www.facets.cloud/articles/guide-iac-with-terraform) shouldn’t just stop at infrastructure. It should extend to deployments, configurations, and pipelines, making them fully declarative.

✔️ No more manually created pipelines.
✔️ No more manual secret management.
✔️ No more rigid infrastructure assumptions.

Instead, a single Terraform project provisions everything—infra, pipelines, config management, and even deployment strategies. This doesn’t mean ditching tools like ArgoCD—it means making them a part of your IaC strategy rather than treating them as separate, manual processes.

One-click to deploy a new service. One-click to spin up a new environment. No hidden humans in the automation loop.

And that’s what true automation looks like.

--- This blog is powered by Superblog. Visit https://superblog.ai to know more. ---

## Rethinking IAM: How Facets Lineage-Based RBAC Transforms Cloud Security

Author: Facets.cloud
Published: 2025-02-10
Category: Tech Articles
Tags: cloud security, rbac management
URL: https://blog.facets.cloud/rethinking-iam-how-facets-lineage-based-rbac-transforms-cloud-security-cm6yqs7lk002kfnwt08fa0dxw

For modern enterprises, Identity and Access Management (IAM) is no longer just about authentication—it’s about precision, compliance, and usability at scale. Security teams grapple with two major challenges:

1. The explosion of IAM roles and variations that must be managed across different teams and services.
2. Disconnected IAM models across tools like cloud providers and release management systems, creating inconsistencies and operational inefficiencies.

At Facets, we’ve built a Role-Based Access Control (RBAC) system that prioritizes lineage-aware permissions, ensuring that every access decision is rooted in a structured mapping of Teams > Project > Environment > Resource. This approach provides security teams with simplified visibility and control, eliminating ambiguity in access rights while streamlining governance across cloud environments and the software development lifecycle (SDLC).

**Traditional IAM vs. Facets RBAC: Why Lineage Matters**
--------------------------------------------------------

IAM solutions like AWS IAM, Azure AD, or Okta often rely on static, service-specific policies, which fail to provide the necessary context to manage cloud resources effectively. Managing access to virtual machines, databases, serverless functions, and Kubernetes clusters typically requires multiple overlapping policies, leading to role explosion, manual audits, and security blind spots.

Facets solves this by embedding lineage-aware permissions at the core of access management, ensuring that security teams always know the precise scope of every user’s access. Here’s how:

| Challenge in Traditional IAM | How Facets Solves It |
| --- | --- |
| **Siloed Permissions Per Service** – AWS IAM, Azure RBAC, and GCP IAM require separate policies for EC2, S3, Lambda, etc., leading to fragmented governance. | **Context-Based RBAC** – Permissions are tied to Teams > Project > Environment > Resource, ensuring access is always linked to operational needs rather than broad service roles. |
| **No Environment-Aware Context** – Roles like S3-Writer apply globally, ignoring environment boundaries (e.g., Prod vs. Dev). | **Environment-Scoped Permissions** – Ensure that s3:DeleteObject permissions apply exclusively to Development environment buckets, preventing accidental deletions in Production. |
| **Manual Multi-Cloud Audits** – Teams waste hours correlating permissions across AWS, Azure, and GCP consoles. | **Cross-Cloud Access Analyzer** – Provides structured visibility into permissions across all environments, mapped to the appropriate project and operational scope. |
| **Static Permissions** – Cloud IAM roles lack time-bound or approval-based access, forcing risky "standing" privileges (persistent, excessive permissions). | **Automated Temporary Access** – Assign context-driven, time-limited permissions for cloud resources (e.g., 2-hour access to Prod RDS) via Jira/Slack, ensuring least privilege at all times. |
| **Role Explosion** – Hundreds of roles for services like Lambda, EC2, and S3 create unmanageable sprawl. | **Resource Groups + Contextual Inheritance** – Group cloud resources based on business use cases while ensuring permissions remain tied to operational needs rather than arbitrary roles. |

**Why Facets Wins: Lineage-Based Access Governance**
----------------------------------------------------

| Feature | Traditional IAM | Facets RBAC |
| --- | --- | --- |
| Scope | Siloed (per cloud service or K8s) | Lineage-aware policies tied to Project, Environment, and Resource. |
| Role Granularity | Broad service-level roles (e.g., EC2-Admin) | Action + Environment-Level Control (e.g., ec2:StartInstance allowed only in Dev). |
| Multi-Cloud Governance | Manual role replication across AWS/Azure/GCP | Consistent, structured, lineage-driven governance across all cloud environments. |
| Temporary Access | Standing privileges or manual revocation | Context-driven, automated temporary access with enforced expiration. |
| Audits | Export logs from each cloud console | Cross-cloud/K8s audit trails, mapped to project and environment for better risk assessment. |

### **Real-World Impact: Why Lineage Matters More Than Just Role Management**

**🚀 Use Case 1: Least Privilege for Serverless & Databases**

A fintech company needed to restrict DeleteTable permissions in DynamoDB to Prod environments only. With AWS IAM, this required separate roles for Dev and Prod. Facets solved this with a single DB-Admin role, scoped to environments, ensuring clear access context and reducing the number of roles.

**🔄 Use Case 2: Multi-Cloud Resource Groups**

A SaaS startup using AWS Lambda and Azure Functions struggled with role duplication. Facets’ lineage-aware RBAC allowed them to define Backend-Team access in alignment with project and operational structures, cutting policy management time.

**🔐 Use Case 3: Auto-Revoking Cloud Credentials**

A security team eliminated standing access to Prod EC2 instances by enforcing contextual approvals via Jira-based workflows, ensuring time-limited access only when needed and reducing breach risk.

> ### **Security Teams' Perspective**
>
> “One thing that I really like was that we can edit the prebuilt, predefined roles. If I feel they’re too privileged or not privileged enough, I can always go into the code, make changes, and that’s something I really like.”
>
> “The recent change that you released about assigning multiple user groups—I think that is a big relief because before that, our IAM was a mess. At some point, you always mess up someone's permissions unknowingly.”
>
> “We have a lot of granularity in our RBAC, controlling each and every action and entity. A few days ago, someone performed a full release and things broke down.
Because of the granularity, we were able to remove just the full release portion while leaving developers free to work.” > > — Vandan Rohatgi, Security Engineer, MPL ### **Why This Matters for Enterprises** ✅ Agility – Developers self-serve cloud resource access without waiting for ops teams, with clear contextual guardrails.  ✅ Risk Reduction – Permissions are always tied to environmental and project-specific constraints, reducing security gaps. ✅ Cost Savings – 60–80% fewer roles/policies to manage, with structured access context baked into every permission. Facets is not just another IAM tool - it simplifies security design for the entire organization, enabling security teams to enforce access governance at an organizational level rather than trying to reconcile fragmented IAM models across different tools. ### **The Future of IAM: Why Lineage-First Access is the Key to Security** Facets’ RBAC isn’t about ticking compliance boxes - it’s about ensuring that every access decision is rooted in an operational context. By aligning cloud and Kubernetes permissions under a structured Project > Environment > Resource model, Facets eliminates silos, sprawl, and security gaps of traditional IAM. For enterprises like MPL, the result is clear: 🚀 Stronger Security – Every permission is explicitly scoped to its operational context.  ⚡ Faster DevOps – Self-service access is safe because context-driven constraints are always in place.  📉 Less Overhead – 80% fewer roles to manage, with built-in access governance. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## The Bottleneck with Terraform Locks and Deployment Queues Author: Facets.cloud Published: 2025-01-08 Category: Tech Articles Tags: RELEASE MANAGEMENT, Terraform URL: https://blog.facets.cloud/the-bottleneck-with-terraform-locks-and-deployment-queues-cm5nv1jzn004x11gpx2wzf9mt ![](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image3-1738650229239-compressed.png) Deployments, when managed with [Infrastructure as Code(IaC)](https://blog.facets.cloud/guide-iac-with-terraform/), face a fundamental challenge: they cannot be performed concurrently or in parallel due to Terraform state locks. These locks, while ensuring consistency by preventing simultaneous modifications to shared state, also create bottlenecks that slow down deployment processes. This article explores the technical challenges of managing parallel deployments with IaC, why splitting Terraform projects isn’t scalable ([as detailed in our dedicated blog](https://blog.facets.cloud/why-a-big-unified-terraform-project-is-the-way-to-seamless-operations-cm5nuobib004u11gpq2tcbsv0/)), and the innovations developed to address these challenges. The Problem: Terraform Locks and Deployment Queues -------------------------------------------------- ### Sequential Deployments Terraform’s state locking mechanism ensures that only one operation can modify the infrastructure state at a time. While this guarantees consistency, it introduces significant delays: State Locking: [Terraform](https://www.terraform.io/) locks its remote state file (e.g., in AWS S3) during operations to prevent concurrent modifications. Queue Formation: If multiple teams or services need to deploy updates, they must wait in a sequential queue for the lock to be released. Impact on Productivity: Developers and operations teams face delays, reducing agility in high-frequency deployment environments. 
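For concreteness, here is a minimal sketch of the backend configuration behind this behavior (bucket, table, and key names are illustrative). With remote state and a lock table, only one operation can hold the lock at a time, so releases against the same state serialize:

```hcl
# Minimal sketch: S3 remote state with DynamoDB-based locking.
# While one `terraform apply` holds the lock, every other plan or apply
# against this state waits (or fails with a lock error). Names are illustrative.
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state"       # hypothetical state bucket
    key            = "platform/terraform.tfstate" # one state file for the whole project
    region         = "us-east-1"
    dynamodb_table = "acme-terraform-locks"       # lock table: one writer at a time
    encrypt        = true
  }
}
```

Options like `-lock-timeout` only change how long a queued run waits for the lock; they do not remove the serialization itself.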
### Example Scenario Team A updates Service X, locking the Terraform state. Team B, needing to update Service Y, is blocked until Team A’s operation completes. As the number of services grows, this sequential process increases deployment times exponentially. Splitting Terraform into smaller projects is often proposed as a solution to avoid these queues. However, as detailed in [Why a Unified Terraform Project is the Way to Seamless Operations](https://blog.facets.cloud/why-a-big-unified-terraform-project-is-the-way-to-seamless-operations-cm5nuobib004u11gpq2tcbsv0/), such approaches introduce their own challenges, including fragmented state management and complex dependency orchestration. Facets' Technical Innovation: Breaking the Queue ------------------------------------------------ ​[Facets](https://www.facets.cloud/) tackled this issue head-on by introducing Parallel Releases (Parallel Terraform applies), a feature that redefines how deployments interact with Terraform’s locking mechanism and shared infrastructure. This innovation allows multiple releases to occur concurrently without compromising consistency or safety. Key Innovations in Parallel Releases ------------------------------------ ![](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image2-1738650257935-compressed.png) ### Remote State Plan Generation **How It Works:** Deployment plans are generated directly from the remote state with the lock disabled. This ensures plans are accurate and consistent without locking the state. **Purpose:** Avoids conflicts during plan generation while maintaining visibility into the current state. ### Scoped Plan Validation **How It Works:** Deployment plans are validated to ensure they only target specific services, such as those managed by Helm charts. **Purpose:** Prevents unintended changes to shared infrastructure. **Fallback:** If validation fails, deployments revert to using the locked remote state for added safety to ensure consistency. ### Service-Specific Isolation **How It Works:** Releases are isolated to individual services by running a Terraform-targeted operation. This ensures that changes are scoped specifically to the desired service without affecting shared infrastructure or unrelated services. **Purpose:** Prevents resource conflicts and ensures parallel operations are safe and consistent. ### Post-Release State Sync **How It Works:** State syncing is deferred to scheduled maintenance windows, ensuring remote states are updated without relying on the local state. **Purpose:** Decouples operational state changes from deployment-specific workflows. ### Selective Locking **How It Works:** Locks are retained only for shared infrastructure components—service-specific configurations bypass locks, enabling parallelism. **Purpose:** Balances safety for critical resources with flexibility for service-level updates. ### Helm Integration for Rollbacks **How It Works:** Helm provides robust rollback mechanisms for service configurations. **Purpose:** Enables quick recovery from failures without affecting other services or releases. Benefits of Parallel Releases ----------------------------- ### **Reduced Deployment Time:** Developers can deploy multiple services simultaneously, eliminating queues and accelerating the application of infrastructure changes. ### **Improved Scalability:** Teams can handle a growing number of microservices without introducing deployment bottlenecks. 
### **Enhanced Reliability:** Helm’s rollback capabilities ensure safe and consistent deployments, even during failures. ### **Better Developer Experience:** By removing the need to wait for queued releases, developers can focus on delivering value rather than managing deployment conflicts. Adopting Scoped Locking: Key Takeaways -------------------------------------- Based on our learnings from addressing Terraform state lock bottlenecks and improving deployment workflows, here’s how teams can implement a similar strategy: ### **Modular Plan Generation Without Global Locks:** Generate deployment plans from the remote Terraform state with locks disabled. This prevents unnecessary contention and allows teams to proceed with plan generation concurrently. ### **Scoped Plan Validation:** Validate the Terraform plan to ensure it only targets the intended modules (e.g., services deployed as Helm charts) and does not affect shared infrastructure. If validation fails, revert to using the locked remote state. ### **Local State for Apply Operations:** Always perform terraform apply using a local copy of the Terraform state whenever locks are disabled. This prevents race conditions during simultaneous updates. ### **State Syncing During Maintenance:** Defer syncing of the remote state until maintenance operations by running terraform refresh during scheduled intervals. ### Terraform: Retain locks only for shared infrastructure components while enabling module-specific updates to proceed independently. Conclusion ---------- Facets’ Parallel Releases demonstrate a cutting-edge solution to overcoming deployment bottlenecks caused by Terraform state locks and shared dependencies. Organizations can achieve faster, safer, and more scalable deployments by leveraging scoped operations, selective locking, and Helm integrations. For teams struggling with Terraform queues, these innovations offer a blueprint to unlock efficiency and agility in their workflows while maintaining the consistency Terraform is known for. Written By - [Ishan Kalra](https://www.linkedin.com/in/ishaankalra16/)​ --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Why a Big, Unified Terraform Project is the Way to Seamless Operations Author: Anshul Sao Published: 2025-01-08 Category: Tech Articles Tags: multi environment management, orchestration, Terraform URL: https://blog.facets.cloud/why-a-big-unified-terraform-project-is-the-way-to-seamless-operations-cm5nuobib004u11gpq2tcbsv0 ![terraform](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image3-1738647864065-compressed.png) As Infrastructure as Code (IaC) practices mature, organizations often face a pivotal decision: Should they maintain a single, unified Terraform project or split it into smaller, modularized configurations? While splitting Terraform into smaller projects can seem like a practical solution to address immediate challenges, such as reducing state lock contention, it often introduces significant operational complexities that hinder scalability and reliability. In this blog, we will explore why maintaining a unified Terraform project remains the gold standard for seamless, drift-free operations and why alternatives, though appealing at first, fail to scale effectively. **The Advantages of a Unified Terraform Project** ------------------------------------------------- ### **1\. 
Comprehensive Dependency Management** ![terraform](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image1-1738647899729-compressed.gif) Terraform’s built-in Directed Acyclic Graph (DAG) ensures that all resource dependencies are automatically resolved. This means that resources are created, updated, or destroyed in the correct order, without requiring manual intervention or external orchestration. Example: When deploying a database and a dependent application, Terraform ensures the database is fully provisioned before applying the application’s configuration. In contrast, splitting these resources into separate projects requires custom scripting or tools like Terragrunt to enforce the correct sequence. ### **2\. Drift-Free Infrastructure** ![drift free infrastructure](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image7-1738647928531-compressed.gif) In a unified Terraform project, the state file serves as the single source of truth for the entire infrastructure. By consolidating state management, teams reduce the likelihood of infrastructure drift, where the actual state of resources diverges from the declared configuration. Unified projects allow for comprehensive planning and validation, ensuring that no changes are inadvertently omitted. In fragmented setups, multiple state files increase the risk of drift, as changes in one project may not be reflected in dependent projects. ### **3\. Simplified Orchestration** ![terraform orchestration](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image6-1738647962162-compressed.gif) **​**A unified project leverages Terraform’s built-in orchestration capabilities to manage dependencies, reducing the need for external tools like Terragrunt or custom CI/CD scripts. This simplicity minimizes operational overhead and accelerates the deployment process. With a single terraform plan or terraform apply, teams can preview and apply changes across the entire infrastructure. Fragmented setups require running multiple plans and applies, often with manual intervention to manage outputs and dependencies. ### **4\. Comprehensive Planning and Validation** ![terraform](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image2-1738647988512-compressed.gif) **​**A single Terraform project provides a holistic view of the infrastructure, enabling teams to preview all changes before applying them. This level of visibility is critical for assessing the impact of updates and avoiding unintended consequences. Example: Adding a new VPC or modifying a shared resource like a load balancer can be evaluated in the context of the entire infrastructure. In modularized setups, plans must be generated individually for each project, making it difficult to assess the overall impact of changes. ### **5\. Lower Risk of Partial Failures** ![terraform](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image4-1738648018261-compressed.gif) Terraform’s unified state ensures that failures during an apply are isolated to specific resources, while the overall state remains consistent. This reduces the risk of cascading errors that can arise in fragmented setups. Example: If provisioning a new S3 bucket fails, other resources in the same project are unaffected, and the state remains valid. In contrast, fragmented setups risk creating inconsistencies if dependencies are applied out of order or fail midway. ### **6\. 
Simplified Multi-Environment Management** ![Multi-Environment Management](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image5-1738648197754-compressed.gif) A unified Terraform project makes it easier to manage multiple environments (e.g., development, staging, production) by leveraging workspaces or parameterized modules. Example: Using Terraform’s workspace feature, teams can deploy identical configurations across environments while maintaining separate state files for each environment. In fragmented setups, managing environments requires replicating and synchronizing configurations across multiple projects, increasing the risk of inconsistencies. Why Alternatives Don’t Scale ---------------------------- ### **1\. Complex Dependency Management** Splitting Terraform into smaller projects forces teams to manage dependencies manually. Tools like Terragrunt can help, but they add an extra layer of complexity and require constant maintenance. Example: Consider a scenario where a VPC created in one project needs to pass its subnets to an application module in another project. With Terragrunt, you must define explicit dependencies and manage outputs between projects. This setup requires meticulous orchestration to ensure the VPC is updated before the application module is applied. Terragrunt’s dependency model is helpful but not automatic. Each dependency must be declared explicitly, and managing these interconnections can quickly become overwhelming as infrastructure grows. ### **2\. Fragmented State Management** With multiple state files, teams lose the ability to view the entire infrastructure as a single entity. This fragmentation makes it difficult to detect and resolve conflicts or inconsistencies across projects. Example: Imagine updating an IAM role in one project that is used by multiple other projects. If those projects do not reference the updated state, the changes may not propagate, leading to drift and potentially broken dependencies. ### **3\. Increased Operational Overhead** Managing multiple Terraform projects requires additional tooling, scripts, and processes to ensure consistency. This overhead grows exponentially as the infrastructure scales. Example: In a microservices architecture with dozens of services, each service might require its own Terraform project. Teams must then maintain scripts to orchestrate updates, ensuring that interdependent services are applied in the correct sequence. Debugging failures becomes more challenging when issues span multiple state files and outputs. ### **4\. Scaling Challenges** As infrastructure complexity increases, the limitations of fragmented setups become more apparent. Teams face bottlenecks in coordinating updates, managing dependencies, and ensuring consistent state across projects. Example: A company deploying a global Kubernetes cluster may split networking, compute, and storage into separate projects. Updating a shared resource like a DNS record or load balancer requires coordinated applies across all these projects. Without strict controls, this process becomes error-prone and difficult to scale. ### **5\. Terragrunt-Specific Challenges** While Terragrunt offers features like dependency management and parallelization, it introduces challenges of its own: Inconsistent Planning: Terragrunt can only plan individual projects after their dependencies are executed. This means there is no way to generate a unified plan showing all changes, which limits visibility and increases the risk of conflicts. 
Manual Error Recovery: If a dependent project fails, teams must manually resolve the issue and reapply, potentially delaying deployments. Learning Curve: Terragrunt’s syntax and structure add complexity, requiring additional knowledge and maintenance effort. Conclusion ---------- While modularizing Terraform into smaller projects may offer short-term benefits, such as reducing state lock contention, it introduces significant long-term challenges that hinder scalability and reliability. A unified Terraform project, on the other hand, leverages Terraform’s strengths—comprehensive dependency management, holistic state, and built-in orchestration—to deliver seamless, drift-free operations. Moreover, managing multiple environments becomes significantly easier with a unified project, as teams can reuse configurations and isolate state files using workspaces. For organizations seeking to scale their IaC practices while maintaining operational simplicity, a unified approach remains the cornerstone of effective infrastructure management. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Part 2: Embracing Infrastructure as Code (IaC) for Secure and Efficient Configuration Management Author: Anshul Sao Published: 2025-01-03 Category: Tech Articles Meta Title: Infrastructure as Code (IaC) for Secure and Efficient Configuration Management Meta Description: Explore how Infrastructure as Code (IaC) can address challenges associated with traditional configuration and secret management by automating configurations and securely managing secrets across environments. Tags: Infrastructure as Code (IaC), kubernetes secrets, secrets management, CONFIG MANAGEMENT URL: https://blog.facets.cloud/embracing-infrastructure-as-code-iac-for-secure-and-efficient-configuration-management In the [first part of this series](https://blog.facets.cloud/why-traditional-configuration-management-challenges/), we explored the challenges associated with traditional configuration and secret management, including manual updates, scattered secrets, and security vulnerabilities. In this part, we'll explore how Infrastructure as Code (IaC) can address these challenges by automating configurations and securely managing secrets across environments. Why Secrets Management Matters Secrets are sensitive pieces of information required for applications to function correctly. They typically fall into two categories: 1. **External Secrets**: Credentials and keys needed to access external resources or services, such as third-party API keys for services, observability systems, etc. 2. **Internal Secrets**: Sensitive data used within the application's ecosystem, like database passwords, inter-service auth tokens, etc. ​Mismanagement of these secrets can lead to security vulnerabilities, operational inefficiencies, and deployment inconsistencies. How IaC Addresses Secret Management Challenges ---------------------------------------------- ### **Full Context Awareness** [IaC](https://blog.facets.cloud/guide-iac-with-terraform/) tools define and manage infrastructure resources, generating most internal secret values. By integrating secret management into the IaC workflow:​ * Secrets are created and consumed within the infrastructure's context, preventing misuse outside the code. * IaC ensures the fulfilment of secrets for applications without any manual intervention. This is especially relevant for the secrets that are created by IaC itself, like Database Passwords, IAM roles, etc. 
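As a minimal sketch of the point above (the names, and the choice of AWS Secrets Manager over Vault or Kubernetes Secrets, are purely illustrative), an internal secret such as a database password can be generated, stored, and consumed entirely inside the IaC, so no human ever handles the value:

```hcl
# Minimal sketch: Terraform generates the password, stores it in a secret
# manager, and wires it into the database it created. Names are illustrative.

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.6"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

resource "random_password" "db" {
  length  = 32
  special = false
}

resource "aws_secretsmanager_secret" "db" {
  name = "orders/dev/db-password" # hypothetical naming convention
}

resource "aws_secretsmanager_secret_version" "db" {
  secret_id     = aws_secretsmanager_secret.db.id
  secret_string = random_password.db.result
}

resource "aws_db_instance" "orders" {
  identifier          = "orders-dev"
  engine              = "postgres"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  username            = "orders"
  password            = random_password.db.result # consumed in-context, never copied by hand
  skip_final_snapshot = true
}
```

At deploy time the application receives the value from the secret store (for example, as an environment variable or mounted file), which is exactly the decoupling discussed below.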
### **Consistent Secret Provisioning** By defining infrastructure and its dependencies, IaC can standardize secret management across environments: * Automated fulfilment of secrets such as database passwords and IAM roles. * Elimination of manual interventions during deployment. ### **Integration with Secret Stores** IaC integrates with secret management systems like AWS Secrets Manager, HashiCorp Vault, and Kubernetes Secrets to securely store and retrieve sensitive information. This ensures a centralized, secure repository for all secrets. ​**Centralized External Secret Fulfilment** Leveraging IaC for external secret management offers significant benefits: * **Single Source of Truth**: Secrets are defined and managed in one place, making it easier to maintain and audit them. * **Environment-Specific Customization**: ​While maintaining a centralized definition, IaC allows for environment-specific values (e.g., different API keys for development, staging, and production environments). * **Isolation**: ​IaC ensures that secrets are isolated and only accessible by authorized applications, preventing cross-environment access. **Decoupling Secrets from Applications** ---------------------------------------- In traditional setups, applications often directly access secrets, which poses security risks and increases complexity, as each application should have access to a central secret store and write code to retrieve value. With secrets management through IaC, we can decouple secrets from application code by: * **Injecting Secrets at Deployment Time**: Secrets are provided to applications during the deployment process, eliminating the need for applications to fetch or manage secrets themselves. * **Using Standardized Access Methods**: Applications access secrets through environment variables or mounted files, IaC is responsible for fulfilling the right value at the runtime from the secret store.  Templated Configuration Files with IaC -------------------------------------- ### **Challenges with Traditional Configuration Management** Applications often rely on separate configuration files for each environment, which: * Increases the risk of errors. * Requires coordination between teams to ensure correct values. ### **IaC Solution: Templated Configurations** IaC allows for the creation of environment-agnostic configuration templates, filled with actual values during deployment. Benefits include: * **Simplified Management:** One template serves all environments. * **Eliminated Configuration Drift:** IaC ensures consistent values across environments. ### **IaC’s Role in Template Fulfilment** During deployment, IaC processes templates by: * **Resolving Placeholders:** IaC replaces placeholders with the actual secrets and configuration values appropriate for the target environment. * **Ensuring Integrity:** By controlling the template processing, IaC ensures that only valid and authorized values are injected, enhancing security. * **Automating Wiring:** The configurations are then provisioned to the appropriate services as environment variables or config maps from  Kubernetes Secrets. ### Enhancing Developer Experience with IaC * **Secret Key Catalogs**: IaC can maintain a catalog of available Secret Keys and their descriptions, serving as a reference for developers. * **Validation Tools**: Before deployment, IaC can validate that all placeholders in the templates have corresponding values, preventing runtime errors due to missing configurations. 
* **Developer Support**: Integration with development tools can provide autocomplete suggestions and syntax checking, improving productivity and reducing mistakes.

### **Before and After: Example of Templatizing Configuration Files**

**Before templatization (separate config files per environment):**

```yaml
# dev-config.yaml
api_key: dev-12345
host: dev.example.com

# prod-config.yaml
api_key: prod-67890
host: prod.example.com
```

**After templatization (one template for all environments):**

```yaml
# config-template.yaml
api_key: {{API_KEY}}
host: {{HOST}}
```

**Benefits of Adopting IaC for Configuration and Secrets Management**
---------------------------------------------------------------------

### **Enhanced Security**

* **Reduced Exposure of Secrets**: By removing hard-coded secrets from configurations and avoiding manual handovers, the risk of accidental exposure is minimized.
* **Centralized Access Control**: IaC allows for fine-grained control over who can access and modify external secrets, supporting compliance and auditing requirements.
* **Automated Secret Rotation**: IaC can facilitate the regular rotation of secrets (database passwords, for example), enhancing security without imposing additional workload on developers.

### **Increased Operational Efficiency**

* **Automation of Repetitive Tasks**: Routine tasks like provisioning resources and injecting secrets are automated, freeing up time for more strategic activities.
* **Consistency and Reliability**: Automated processes reduce human error, leading to more reliable deployments and consistent environments.
* **Faster Deployment Cycles**: Streamlined processes enable quicker deployments, allowing teams to move fast.

### **Improved Scalability and Flexibility**

* **Easier Scaling**: New infrastructure and environments can be provisioned programmatically, with configurations and secrets managed by IaC. (Remember the last time you had to launch feature environments: how many configs did you have to change, especially with a large number of microservices?)
* **Multi-Environment Support**: Managing multiple environments becomes more straightforward, as configurations and secrets are no longer fulfilled manually and are guaranteed to stay consistent.
* **Cloud-Agnostic Deployments**: IaC can abstract underlying cloud provider specifics, enabling more effortless migration between different cloud platforms, if needed.

### **Better Collaboration and Transparency**

* **Version Control Integration**: IaC code can be stored in version control systems, providing a history of changes and facilitating collaboration among team members.
* **Clear Separation of Concerns**: Developers can focus on application logic while IaC manages infrastructure and configurations, without explicit dependencies between the two, which enhances productivity.

**Conclusion**
--------------

By embracing Infrastructure as Code for configuration and secrets management, organizations can overcome the challenges of traditional methods. IaC provides a robust framework for:

* **Centralizing Secrets Management**: Ensuring that secrets are managed consistently and securely across all environments.
* **Automating Configuration Management**: Reducing manual effort, minimizing errors, and speeding up deployment cycles.
* **Enhancing Security and Compliance**: Implementing best practices for secret handling, access control, and auditing.
* **Improving Scalability and Flexibility**: Facilitating easy scaling and adaptation to changing business needs.
In the next part of this series, we'll explore why you should use IaC to deploy your applications, delving into how IaC can further streamline deployment processes, enhance consistency, and support modern development practices. **Stay Tuned for Part 3: Why You Should Use IaC to Deploy Your Applications** --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Facets.cloud and Spotify Backstage: A Unified Vision for Platform Engineering Author: Anshul Sao Published: 2025-01-03 Category: Tech Articles Meta Title: Facets Vs Spotify Backstage Meta Description: Backstage and Facets.cloud work together to simplify platform engineering. Learn how Facets.cloud builds on Backstage’s strengths, offering a unified solution for efficient, scalable, and standardized software delivery. Tags: spotify backstage, facets.cloud, platform engineering URL: https://blog.facets.cloud/spotify-backstage-vs-facets-cloud As platform engineering evolves, tools like Backstage and Facets.cloud are emerging as indispensable for organizations seeking efficiency, standardization, and developer satisfaction. Despite their shared goal of streamlining software delivery, these tools are not competitors but complementary forces—each excelling in its domain. This article dives into how Facets.cloud builds on Backstage’s strengths and provides unique value to platform engineering teams. The Philosophy Behind Backstage and Facets.cloud ------------------------------------------------ ​[Backstage](https://backstage.io/), developed by [Spotify](https://open.spotify.com/), addresses a critical pain point in modern development: the sprawl of tools, services, and documentation. It brings them together in a centralized portal, giving developers a single pane of glass to manage their workflows. Backstage is the _frontend_ of platform engineering, empowering developers with visibility and consistency across their environment. ​[Facets.cloud](https://www.facets.cloud/), on the other hand, focuses on the _backend_ of [platform engineering](https://blog.facets.cloud/handbook-to-platform-engineering-journey/). It simplifies complex workflows by integrating application, infrastructure, and configuration management into cohesive systems. Platform engineers can create reusable automation components that developers consume effortlessly, reducing manual coordination and increasing efficiency. Furthermore, it ensures consistency across products, environments, and clouds by centralizing organizational knowledge, [preventing drift](https://blog.facets.cloud/comprehensive-approach-to-maintaining-a-drift-free-infrastructure/), and [standardizing processes](https://blog.facets.cloud/standardization-in-security-cost-compliance-and-observability/). Facets.cloud integrates seamlessly with Backstage but also stands alone, offering an inbuilt portal for organizations that may not have adopted a dedicated developer portal. 
High-Level Overview: What Each Tool Brings
------------------------------------------

| Aspect | **Backstage** | Facets.cloud |
| --- | --- | --- |
| Core Purpose | Centralized developer portal | Infrastructure orchestration |
| Primary Users | Developers, Team Leads | Platform Engineering Teams |
| Key Functionality | Aggregates tools, docs, and workflows | Integrates workflows, automates configurations, and ensures standardization |
| Integration Capabilities | Extends via plugins | Integrates with portals like Backstage |
| Standalone Capabilities | Requires additional backend tooling | Includes a developer portal option |

Backstage simplifies the [developer experience](https://www.facets.cloud/articles/developer-experience-dev-ex) by organizing services and tools, while Facets.cloud ensures those services are provisioned and managed seamlessly in the background. This alignment creates a powerful synergy.

Backstage + Facets.cloud: Better Together
-----------------------------------------

When used in tandem, Backstage and Facets.cloud deliver a complete platform engineering solution. Here’s how they align:

1. **Backstage as the Developer Gateway:**
   * Provides visibility into services, their owners, and their current state.
   * Integrates with custom and sometimes non-standard tooling to provide a uniform view for developers.
2. **Facets.cloud as the Operational Engine:**
   * Automates workflows triggered by Backstage’s templates, ensuring deterministic and automated environment setups.
   * Reduces complexity by allowing platform engineers to build reusable automation components while developers leverage them without needing detailed infrastructure knowledge.
   * Standardizes software delivery across multiple environments and products, ensuring operational consistency and efficiency.

This pairing ensures [developers can focus on innovation](https://blog.facets.cloud/transforming-developer-productivity-with-platform-engineering/), while platform engineers maintain operational excellence.

How Facets.cloud Stands Out
---------------------------

Facets.cloud’s unique value lies in its ability to:

* **Simplify Workflows**: Integrates application, infrastructure, and configuration pipelines into a unified system, enabling one-click environment setups.
* **Enhance Collaboration**: Allows platform engineers to define reusable automation components, reducing duplication and enabling developers to focus on their core tasks.
* **Ensure Consistency**: Centralizes alerts, configurations, and infrastructure details, eliminating dependency on individual expertise and preventing drift.
* **Offer a Built-In Portal**: Provides organizations without a developer portal like Backstage with a cohesive interface to manage workflows.
* **Seamlessly Integrate**: Acts as a backend orchestration layer for Backstage or other portals, ensuring backend processes are efficient and automated.

These features make Facets.cloud not just a tool but a comprehensive solution for organizations at varying levels of platform engineering maturity.

Architecting for the Future: Key Considerations
-----------------------------------------------

| **Question** | Backstage's role | Facets.cloud's role |
| --- | --- | --- |
| How do we simplify workflows? | Aggregates tools, docs, and templates | Automates deployment and provisioning with standardized processes |
| How do we ensure compliance? | Promotes organizational best practices | Embeds compliance into reusable automation rules |
| How do we scale operations? | Standardizes developer access | Enables multi-cloud, multi-product, and multi-environment setups to operate without maintaining project-specific automations |

For architects, the message is clear: Backstage and Facets.cloud are not an either/or choice. Instead, they represent two essential pillars of a robust platform engineering strategy.

![Facets vs Spotify backstage](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1735887483486-compressed.png)

### Conclusion: A Holistic Approach to Platform Engineering

Backstage and Facets.cloud excel in their respective domains, addressing distinct yet interconnected needs. Backstage empowers developers with an intuitive, centralized interface, while Facets.cloud ensures the infrastructure and processes behind that interface are automated, scalable, and reliable. Together, they create an ecosystem where developers can thrive and platform engineers can innovate. By integrating cohesive workflows, fostering collaboration through reusable automation, and centralizing organizational knowledge, Facets.cloud amplifies the efficiency of platform engineering.

Organizations looking to embrace platform engineering need not choose between these tools. By leveraging both, they can unlock unparalleled efficiency, collaboration, and satisfaction across their teams. Facets.cloud’s ability to stand alone or integrate seamlessly ensures that every organization, regardless of its starting point, can benefit from this holistic approach to modern software delivery.

--- This blog is powered by Superblog. Visit https://superblog.ai to know more. ---

## InnerSourcing in Platform Engineering: An inclusive framework for Platform Adoption

Author: Pravanjan Choudhury
Published: 2024-12-11
Category: Tech Articles
Meta Title: InnerSourcing in Platform Engineering: An inclusive framework for Platform Adoption
Meta Description: This article discusses the benefits of applying InnerSourcing principles in platform engineering and infrastructure automation and how it can help organizations overcome the challenges of rigid, top-down platform initiatives.
Tags: platform engineering meetup, inner source, innersourcing, open source
URL: https://blog.facets.cloud/innersourcing-in-platform-engineering-golden-paths

At a recent Platform Engineering Conference, I had the privilege of attending an inspiring session led by technology leaders from a major bank. They shared their transformation journey: migrating from siloed cloud automation across teams to a central platform. This internal platform successfully broke free from team-specific automations, enabling the modernization of hundreds of applications within an impressively short time frame.

When asked how they managed to unify disparate teams—each deeply attached to their own custom-built frameworks—their response was as insightful as it was effective: they made the internal platform **open for contributions** from power developers across the organization. By doing so, they empowered developers to feel included and valued, allowing them to contribute their innovations to the central platform. This simple approach fostered a sense of ownership and collaboration, which not only dismantled silos but also enriched the platform with a diverse array of well-tested, robust components.

This illustrates the potential of **innersourcing** in platform engineering and infrastructure automation.
Advantage of Inner Sourcing --------------------------- Innersourcing—the practice of applying open-source principles within an organization—has already gained interest in how software development is conducted internally in large organizations, reducing duplicated efforts and fostering shared ownership. These same principles can be effectively applied to **software delivery** and [platform engineering](https://blog.facets.cloud/handbook-to-platform-engineering-journey). By adopting innersourcing practices in platform engineering, organizations can break down silos and encourage contributions from various teams, enhancing adoption of central platform initiatives. Hard Platforms End with Hard Landing ------------------------------------ In our conversations with multiple large enterprises, we’ve seen several failed initiatives - a recurring theme: platform engineering initiatives that start strong but fizzle out after limited adoption. These efforts often stall, not due to technical shortcomings but because of rigid platforms and inflexible structures that fail to resonate with developers. Of course, a platform needs to be opinionated—after all, that’s what makes it a platform. It has to offer guarantees and enforce certain standards, and some aspects must be closely guarded. But when a platform lacks extensibility—or worse, transparency—it’s almost guaranteed to face friction from the very developers it’s supposed to serve. The “just bring your code, and the platform will handle everything” approach sounds great in theory, but it rarely works in large enterprises. [Developers](https://blog.facets.cloud/developer-experience-dev-ex/) want to be involved, to have their [unique needs](https://blog.facets.cloud/transforming-developer-productivity-with-platform-engineering/) addressed, and to feel like valued contributors rather than just users of a monolithic system. Here are some of the developer voices that need to be addressed - **Transparency**: Are the platform’s paved roads transparent enough for me to understand what’s happening under the hood? **Custom Needs**: What if the paved road created by the platform engineering team doesn’t meet my specific project requirements? **Contribution**: I’ve developed well-architected components for my project. Can I contribute these to the central platform? These concerns underline the need for an inclusive approach for platform builders driving a central initiative for an entire organization. Enabling InnerSourcing for Platform Success ------------------------------------------- ![InnerSourcing in platform engineering](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/inner-sourcing-in-platform-engineering-1733919803218-compressed.gif) Innersourcing can transform how organizations build and adopt platforms. 1. **Raises the quality bar for the Platform**: Keeping Innersourcing as a requirement enforces platform builders to design and develop in an open, collaborative environment. This naturally raises the bar for quality, ensuring that all components are well-architected and extensible, capable of adapting to diverse needs across the organization. 2. **Inclusive Approach to Drive Change**: By encouraging contributions from teams across the organization, innersourcing fosters collaboration and inclusivity, empowering teams to shape the platform collectively and more importantly advocate for adoption. 3. 
**Avoids Platform Obsolescence**: Continuous contributions ensure the platform evolves with changing needs, preventing it from becoming outdated or irrelevant over time. Here’s how platform engineers and I&O leaders can leverage it: **Clarity on the Core and the Context:** Clearly communicate the key idea and guarantees of the platform. Outline what the platform aims to achieve, the boundaries of its core responsibilities, and which parts are open for contributions. This sets expectations and helps developers understand where and how they can add value. **Open for Contributions**: Allow power developers to contribute to the central platform, much like open-source development. This fosters collaboration and inclusivity. **Review & Preview Process**: Establish a transparent and well-documented process for reviewing pull requests (PRs) to the central platform. This ensures quality while keeping the door open for meaningful contributions. Empower teams to preview new module versions and test them in isolation. This helps identify and resolve issues early, minimizing disruption during adoption. **Controlled Rollouts**: Provide mechanisms for selective rollouts or forced releases. Allow teams to pin module versions or roll back to stable ones, ensuring they can move at their own pace without compromising on reliability. **Transparency and Metrics**: Offer project owners access to detailed logs and metrics for the modules they use. This visibility helps teams understand module behavior and troubleshoot issues, building trust in the platform. By adopting these practices, organizations can create platforms that are not just tools but thriving ecosystems. Innersourcing transforms the traditional top-down model of platform development into a collaborative, iterative process—one where every team has a stake in the platform’s success. Facets.Cloud and InnerSourcing ![INNERSOURCING IN FACETS](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/inner-sourcing-in-platform-engineering-1734322991399-compressed.gif) Facets.Cloud empowers organizations to move away from project-specific automation and embrace [standardization](https://blog.facets.cloud/standardization-in-security-cost-compliance-and-observability/) through its **platform-as-a-product** model. By incorporating innersourcing principles, Facets enables power users of the organizations to build on its platform-as-a-service for catering to team-specific needs. Here’s how Facets has adopted and operationalized these principles: ### **Provide Clarity on the Core and the Context**:  Facets delivers clear guarantees by managing critical responsibilities like generating project-specific [Infrastructure as Code (IaC)](https://blog.facets.cloud/guide-iac-with-terraform/), maintaining automation state management, and ensuring cloud portability. At the same time, it avoids rigidity by offering guidelines for writing project-agnostic automation code rather than enforcing strict adherence to standard modules. This balance allows teams to innovate while relying on the platform’s core strengths. ### **Open for Contributions**:  Platform engineers and power developers can leverage reference modules provided by Facets to create their own custom modules for unique use cases. These modules can then be registered with the platform, which takes over the controls of their execution during appropriate invocations. 
The developers of our customers have used this flexibility to integrate custom cloud solutions, toolchains, built reusable modules, Slack bots, and other creative enhancements. ### **Review & Preview Process**:  Contributed modules are stored in a Git repository, enabling Platform Engineers to review and approve qualified changes. Once approved, these changes can be tested in a **preview mode**, allowing selective testing of new versions without disrupting ongoing operations. ### **Controlled Rollouts**:  Facets provides project owners with the ability to control rollout speeds and mitigate risks. They can choose to adopt the latest version of a module or pin specific versions to maintain stability. This approach minimizes the blast radius of potential issues during module updates. ### **Building Transparency**:  Transparency is built into the core of the Facets platform. Project owners can access details about the modules being used in their projects, view execution logs, and even examine the generated code. This visibility ensures developers and teams have a deep understanding of how the platform and its innersourced modules are functioning, fostering trust and collaboration. Conclusion ---------- Innersourcing for platform engineering may not be a one-size-fits-all solution. For smaller organizations, a purely opinionated platform might suffice, offering simplicity and efficiency that aligns with their scale and needs. However, for larger enterprises and platforms aiming for longevity, innersourcing provides a compelling avenue to explore. By fostering collaboration, transparency, and adaptability, innersourcing equips platform engineers to build solutions that not only meet immediate demands but also evolve with the organization, ensuring sustained relevance and success. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Implementing GitOps at Scale: From Standards to Strategy Author: Rohit Raveendran Published: 2024-12-02 Category: Tech Articles Meta Title: Avoid Implementing GitOps: Common Pitfalls with Best Practices and Strategies Meta Description: Explore GitOps principles, OpenGitOps standards, and actionable strategies for successful implementation. Learn how to overcome challenges, avoid common pitfalls, and scale GitOps effectively with Facets. Tags: facets.cloud, OpenGitOps, gitops URL: https://blog.facets.cloud/implementing-gitops-at-scale-from-standards-to-strategies GitOps has transformed the way organizations manage infrastructure and application delivery, providing a framework for consistency, automation, and reliability. However, achieving success with GitOps requires adopting its principles correctly and designing practical strategies to avoid common pitfalls that can lead to inefficiencies and errors. In this blog, we explore GitOps, examine the [OpenGitOps standards](https://opengitops.dev/), and share practical advice to help organizations avoid implementation failures. How GitOps Became Popular: Parallels with Code Management --------------------------------------------------------- Imagine managing your entire infrastructure and applications the same way developers manage code—using a single, reliable system that tracks every change, lets you roll back when needed, and ensures everything is always in sync. That’s the promise of GitOps. GitOps extends the familiar principles of Git-based code management to infrastructure and software delivery. 
By declaring the desired state of systems in Git and using automation to ensure the real-world state matches, GitOps provides a powerful way to simplify operations. Developers loved Git for its ability to version, audit, and roll back changes easily—and now, with GitOps, the same benefits are applied to infrastructure. At its core, GitOps delivers on four key promises: * **A single source of truth**: All infrastructure and application configurations live in Git, just like your code. * **Version control for infrastructure**: Every change is logged, making it easy to track who did what and when. * **Automatic reconciliation**: If something drifts out of place, it’s automatically brought back to the desired state. * **Rollback safety nets**: When things go wrong, you can instantly revert to a stable, known good state. The familiarity of Git, combined with the simplicity of automation, made GitOps an instant favourite for teams trying to streamline their delivery processes. But, like all great things, GitOps comes with its challenges. Many organizations jump on the GitOps bandwagon without fully understanding how to adapt its practices or implement them effectively. As GitOps gained popularity, so did the need for thoughtful, tailored systems to make it work smoothly in diverse and complex environments. And that’s where the real story begins: not just adopting GitOps but doing it right. Common Mistakes in GitOps Implementation ---------------------------------------- Modern cloud environments are a labyrinth of complexity. Consider the sheer types of artifacts/concerns that make up a typical application ecosystem—services, cloud resources, virtual machines, PaaS components, monitoring setups, alerting systems, database schemas, routing rules, secrets, and variables, to name just a few. In theory, GitOps can and should be applied to all these types of artifacts/concerns. However, in practice, many organizations fall short, leading to fragmented or incomplete implementations. * **Misunderstood Principles:** GitOps is more than just automating deployments—it’s a system that ensures critical change management with strong guarantees. Often, teams are not equipped with the right principles to build a GitOps system. * **Disjointed Systems**: Infrastructure, application configurations, and delivery pipelines exist in silos, making it hard to track how changes in one area affect others negating the power of GitOps * **Inconsistent States**: Environments become inconsistent because there’s no unified source of truth across concerns. This leads to configuration drift, unpredictable behaviors, and manual firefighting. * **Manual Wiring**: Teams spend significant time wiring various GitOps components together manually, increasing the risk of errors. * **Limited Visibility**: With so many changes flowing through GitOps, without any centralized view, debugging and incident recovery become time-consuming and error-prone. For example, consider a team that manages application code with GitOps but infrastructure with [IaC tools](https://blog.facets.cloud/top-8-infrastructure-as-code-iac-tools/), and configurations manually. This fragmented approach undermines the benefits of GitOps by introducing complexity and inefficiencies. Advice for Building Your Own GitOps Systems ------------------------------------------- ### 1\. 
Adopt the right standards: OpenGitOps

The OpenGitOps initiative, established by the GitOps Working Group, provides a clear and universal set of principles for adopting and scaling GitOps practices.

1. **Declarative**: Systems must express their desired state in a declarative format.
2. **Versioned and Immutable**: Desired states must be stored in version-controlled repositories with immutability.
3. **Pulled Automatically**: Software agents must automatically fetch desired states from the source of truth.
4. **Continuously Reconciled**: Systems must continuously monitor and align actual states with desired states.

While the OpenGitOps standards provide a clear framework for implementing GitOps, they primarily address the foundational principles—what GitOps should look like in theory. In practice, GitOps is a way of working that requires organizations to build the right “system” around these principles. This involves answering critical operational questions, such as what to store in Git, how to manage complex configurations, and how to build a robust reconciliation system to keep environments aligned.

### 2\. Implement the right way: avoid common errors

#### **a. Introduce Abstractions That Translate Into Implementations**

A successful GitOps system relies on abstractions that simplify complex representations for developers and operators. These shared abstractions foster ease of use and collaboration. These abstractions should:

* Define what needs to be done (the "desired state") rather than how to do it.
* Allow platform engineering teams to create higher-level constructs (e.g., "services" or "databases") that are translated into underlying implementations like Kubernetes manifests, Terraform modules, or configuration files.
* Spare developers the cognitive load of low-level details while still giving power users full flexibility.

#### **b. Leverage Declarative Definitions for All Concerns**

The declarations in your GitOps system should encompass all operational concerns to achieve consistency and easy rollback. The declarative definitions should span:

* Infrastructure as Code ([IaC](https://blog.facets.cloud/guide-iac-with-terraform/)): Define infrastructure components (e.g., servers, storage, databases) using declarative definitions.
* Kubernetes Manifests: Manage containerized workloads with YAML files that specify Deployments, Services, and other Kubernetes objects.
* Similarly, [CI/CD](https://blog.facets.cloud/kubernetes-cicd-explained/), [Configuration Management](https://blog.facets.cloud/why-traditional-configuration-management-challenges/), IAM, image wiring, and any other concern that completes the environment definition.

#### **c. Ensure a Way to Wire Things Together**

A GitOps system must provide a mechanism to wire components together declaratively. This capability to interconnect resources makes it possible to maintain a cohesive, functioning system where changes propagate across dependencies automatically. For instance:

* A database connection string output from Terraform can be wired into a Kubernetes ConfigMap for an application.
* An IAM role provisioned for a service can be referenced in a CI/CD pipeline configuration.
* Outputs from IaC tools can be fed as environment variables into runtime configurations.

#### **d. Combine Unified Outputs With Flexible Execution**

A well-designed GitOps system translates declarative definitions into various execution models, such as:

* IaC Execution: Automating the provisioning of infrastructure resources like VMs, databases, or networking.
* Kubernetes Manifests Creation: Generating YAML definitions for containerized workloads and applying them to clusters. * Orchestrate Image Wirings, CI/CD, Tool configuration through your preferred tools #### **e. Focus on Extensibility** Your GitOps system should support extensions and customizations to meet evolving needs and make it future-proof. Teams should be able to: * Add new resource types or configurations easily. * Integrate or replace external systems like cloud providers, CI/CD pipelines, or monitoring tools without touching every layer * Customize workflows while maintaining adherence to GitOps principles. How Facets Help Organizations Achieve True GitOps ------------------------------------------------- ![](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/truegitops-final-1733133184627-compressed.gif) Click the image to expand Facets.Cloud makes adopting GitOps straightforward by embedding its principles directly into the platform. Instead of requiring organizations to figure out the details themselves for complex multi-project setups, Facets builds the OpenGitOps standards into its core and provides a ready-to-use framework for achieving true GitOps. ### Adapting OpenGitOps standards without reinventing the wheel **1\. Declarative Blueprints** _“A system managed by GitOps must have its desired state expressed declaratively.”_ At the heart of Facets is the **[Blueprint](https://readme.facets.cloud/docs/blueprint)**, a declarative representation of your entire architecture. Blueprints define: * Resources like services, databases, and caches. * Relationships and dependencies between components. * Any declarations that need to be centrally managed like a database schema, dashboards, alerts Blueprints are stored in JSON, providing a single source of truth for infrastructure, code, and configurations. ![Declarative Blueprints](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1733115156788-compressed.png) ![Blueprints stored in JSON](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1733115158832-compressed.png) **2\. Versioned and Immutable Histories** _“Desired state is stored in a way that enforces immutability, versioning and retains a complete version history”_ Blueprints in Facets are stored in Git repositories, ensuring: * **Version Control**: Every change is tracked, offering traceability and rollback capabilities * **Immutability**: States cannot be altered once committed without creating a new commit. This fosters trust, transparency, and security across environments and ensures traceability of releases and rollbacks. ![Versioned and Immutable Histories](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1733115161396-compressed.png) **3\. Automated State Synchronization** _“Software agents automatically pull the desired state declarations from the source.”_ Facets Platform Orchestrator automates the reconciliation process by: 1. Pulling changes to the Blueprints from Git repositories. 2. Applying environment-specific overrides 3. Generating and applying Infrastructure-as-Code (IaC) scripts to manage resource additions or modifications 4. 
Ensuring downstream tools like monitoring, and alerting are correctly configured ![Automated State Synchronization](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/marketings-video-nov-25-2024-veed-1733137652829-compressed.gif) Facets Orchestrator ensures that the automation and tool configuration state matches the declared state for every environment in the reconciliation process. **4\. Continuous Reconciliation** _“Software agents continuously observe the actual system state and attempt to apply the desired state.”_ Facets continuously monitors and reconciles the state of environments through: * **Scheduled Updates**: Regular synchronizations ensure environments remain consistent. * **On-Demand Reconciliations**: Teams can trigger updates for specific resources or entire environments. ![Continuous Reconciliation](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1733115167007-compressed.png) Beyond OpenGitOps: Facets’ Unique Enhancements for complex setups ----------------------------------------------------------------- While the OpenGitOps standards lay a solid foundation, implementing GitOps in large organizations with complex setups is far from straightforward. It requires adapting principles to diverse scenarios, supporting unique workflows, and building robust systems for seamless adoption. Facets takes a step further, offering enhancements designed specifically to address the challenges of scaling GitOps in intricate environments. 1. **Lowering Barrier of Entry**: Facets provides pre-built templates and workflows to lower the barrier to entry for GitOps. This guided setup ensures best practices are followed from day one. 2. **Insights on Reconciliation**: Dashboards provide visibility into resource states, pending changes, and reconciliation logs. Teams can easily identify which commit IDs are live, and what is pending per environment and resolve discrepancies 3. **Controlled Reconciliation**: For actions that could disrupt systems, such as database deletion or syncing information back to Blueprints, Facets enforces manual approval processes to ensure changes are deliberate and carefully considered. Additionally, it offers controlled conflict resolution for scenarios like accidental manual changes or emergency fixes, helping teams handle exigencies without compromising system integrity. 4. **Extensibility**: Teams can define custom resource types and seamlessly integrate them into Blueprints, expanding GitOps capabilities beyond standard practices. For example, Facets enables GitOps-driven database management and the distribution of standardized dashboards, ensuring the platform adapts to unique workflows and organizational requirements. 5. **Intuitive UI-Driven Workflows**: A user-friendly interface allows non-experts to visualize the connection between declarations and their real-world manifestations, streamlining access to critical information. The platform also offers tailored interfaces for different personas, such as developers, operations teams, and architects, combined with role-based access control to reduce cognitive load and ensure focused, relevant interactions. Conclusion ---------- As modern software systems grow increasingly complex, so does modeling them well in GitOps. We’ve observed many organizations using GitOps as a catch-all term, often without a clear understanding of its principles. This poor implementation results in several critical issues: 1. 
**Configuration Drift and Inconsistencies**: Without a unified approach, environments can become unpredictable, leading to misaligned configurations and drift that is costly to manage.
2. **Increased Risk of Human Error**: Manual interventions and ad-hoc changes bypass GitOps workflows, introducing errors that can lead to downtime, outages, or security vulnerabilities.
3. **Prolonged Recovery Times**: Without clear versioning and rollback capabilities, recovery from incidents becomes slower and more expensive, increasing operational risks.
4. **Deployment Bottlenecks**: Slow, risk-laden deployments undermine team velocity and create a culture of fear around releasing new changes.

The cost of not implementing GitOps correctly goes beyond operational inefficiencies—it impacts business agility, team morale, and customer trust.

Transform your GitOps journey with [Facets](https://www.facets.cloud/)—start today.

---

This blog is powered by Superblog. Visit https://superblog.ai to know more.

---

## Part 1: Why Traditional Configuration Management Falls Short and What to Do About It

Author: Anshul Sao
Published: 2024-11-25
Category: Tech Articles
Meta Title: Why Traditional Configuration Management Fails and How to Improve It - Part 1
Meta Description: Traditional configuration management creates headaches for developers with high maintenance, security risks, and inefficiencies. Find out why it falls short and what can be done to fix it.
Tags: CONFIG MANAGEMENT, secret management
URL: https://blog.facets.cloud/why-traditional-configuration-management-challenges

Configuration and secret management is crucial for deploying applications securely and reliably. Traditional methods often involve manual updates, [scattered secrets](https://blog.facets.cloud/blue-green-deployments), and environment-specific files. These practices increase maintenance overhead, elevate security vulnerabilities, and hinder scalability.

As we delve deeper into environment management strategies across various companies, these challenges consistently emerge as recurring pain points, hindering efficient software delivery and secure operations—particularly as teams, projects, or environments scale. This part of our series explores the common challenges developers and DevOps engineers face with traditional configuration and secrets management.

![Traditional configuration](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/mindmap-taditional-configuration-1732630762507-compressed.png)

Manual Configuration Updates and Environment-Specific Files
------------------------------------------------------------

#### The Burden on Engineers

Managing configurations manually across multiple environments is time-consuming and error-prone. Each environment—development, staging, production—typically requires its own set of configuration files. This approach leads to:

* **Increased Maintenance Effort**: Updating configurations in multiple places whenever a change is needed requires extra effort and diligence.
* **Risk of Inconsistencies**: Manual edits can introduce discrepancies between environments, causing unexpected behavior.
* **Delayed Deployments**: Time spent coordinating and verifying configurations slows down the release cycle.

#### **Example Scenario**

Suppose you have an application that requires specific database connection strings and API keys for each environment. Managing these settings manually means any update requires editing multiple configuration files, increasing the chance of errors.
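To make that maintenance burden concrete, here is a minimal Python sketch of a drift check over per-environment config files. It is illustrative only: the file paths and key names are hypothetical, not taken from any real project, and the point is simply that every such safeguard is yet another thing a human has to write and keep in sync by hand.

```python
import json
from pathlib import Path

# Hypothetical per-environment config files -- one copy per environment,
# which is exactly the duplication that drifts over time.
ENV_FILES = {
    "development": Path("config/development.json"),
    "staging": Path("config/staging.json"),
    "production": Path("config/production.json"),
}

# Keys every environment is expected to define (illustrative names).
REQUIRED_KEYS = {"database_url", "payments_api_key", "log_level"}

def find_config_drift() -> dict:
    """Report required keys that are missing from some environment files."""
    drift = {}
    for env, path in ENV_FILES.items():
        keys = set(json.loads(path.read_text()).keys())
        missing = REQUIRED_KEYS - keys
        if missing:
            drift[env] = sorted(missing)
    return drift

if __name__ == "__main__":
    for env, missing in find_config_drift().items():
        print(f"{env}: missing {', '.join(missing)}")
```

Even a check like this only catches missing keys after the fact; the next part of this series looks at removing the duplication itself.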
Scattered Secrets and Security Risks ------------------------------------ #### Challenges with Secret Management Even when using secrets management tools, traditional practices often involve applications accessing secrets directly. This introduces several issues: * **Insecure Access Patterns**: Applications fetching secrets directly can expose them if not handled properly. * **Coordination Overhead**: Ensuring that the secret identifiers in the code match the actual secrets stored is error-prone. * **Complex Secret Management**: Rotating and updating secrets across environments becomes an administrative burden. ### Coordination Challenges Aligning the secrets used in code with those stored in secret management systems can be difficult: * **Manual Synchronisation**: Developers need to update code references when secrets change or new ones are added. * **Environment Discrepancies**: Different environments may have different secrets, requiring careful coordination. * **Limited Visibility**: Tracking which secrets are used and where can be challenging without a centralised system. Impact on Security and Efficiency --------------------------------- #### Security Vulnerabilities * **Exposure of Sensitive Data**: Vulnerabilities in the application can potentially expose secrets. * **Inconsistent Access Controls**: Direct access may bypass centralized policies. * **Audit Difficulties**: Monitoring which applications access specific secrets becomes more complex. * **Hard-Coded Credential**s: Embedding secrets in configuration files or code increases the risk of exposure. * **Inadequate Rotation Practices**: Rotating secrets regularly is challenging without automated processes. #### Operational Inefficiencies * **Increased Workload**: Engineers spend significant time managing secrets instead of focusing on development. * **Error-Prone Processes**: Manual creation and distribution of secrets can lead to misconfigurations. * **Scalability Issues**: As the number of applications and environments grows, managing secrets becomes more complex. * **Delayed Response to Threats**: Manual processes slow down the ability to respond to new requirements or security threats. * **Resource Drain**: Time spent on manual configuration and secret management distracts developers from development efforts. Conclusion ---------- In summary, traditional configuration and [secret management](https://www.redhat.com/en/topics/devops/what-is-secrets-management) pose the following challenges: * High Maintenance Overhead: Manual updates and environment-specific files increase workload and risk of errors. * Security Risks: Scattered secrets and direct application access expose systems to vulnerabilities. * Coordination Complexity: Synchronising secrets between code and secret stores across environments is labor-intensive. In the next part of this series, we'll explore how adopting [Infrastructure as Code (IaC)](https://blog.facets.cloud/top-8-infrastructure-as-code-iac-tools/) can address these challenges by automating configurations and securely managing secrets per environment. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. 
--- ## Understanding Developer Experience (DevEx) and Why It Matters Author: Kirti Krishan Published: 2024-11-24 Category: Blogs Meta Title: Understanding Developer Experience (DevEx) and Why It Matters | Facets Meta Description: Explore the critical role of Developer Experience (DevEx) in enhancing productivity, innovation, and satisfaction among software development teams Tags: Developer experience, devex URL: https://blog.facets.cloud/developer-experience-dev-ex DevEx, or developer experience, isn’t just another industry fad.  A study by Forrester found that 74% of professionals believe improving developer experience (DevEx) drives productivity.  Developer experience refers to developers' overall experience interacting with tools, processes, and environments used to build software.  But what exactly is DevEx, and why does it matter? Let’s jump right in to understand this better.  What is developer experience (DevEx)? ------------------------------------- Developer experience encompasses everything from development environments and toolchains to company culture and work processes that influence a developer's daily activities. Greg Mondello, director of product at GitHub, states: "Software development capacity is often the limiting factor for innovation. Improvements to software development effectiveness are inherently valuable." DevEx aims to create an environment where developers can: 1. Work efficiently 2. Stay focused 3. Produce high-quality code with minimal friction Key components of Developer Experience include: 1. Systems and technology 2. Processes and workflows 3. Culture and work environment [Idan Gazit](https://github.com/idan), senior director of research at GitHub, offers an insightful analogy: "Building software is like having a giant house of cards in our brains. Tiny distractions can knock it over in an instant. DevEx is about how we contend with that house of cards." This analogy captures the delicate balance developers must maintain while working on complex projects. A positive DevEx helps developers maintain that balance, allowing them to stay in their flow state and work more effectively. Why does developer experience matter? ------------------------------------- Let’s explore some of the key reasons why it makes sense to start implementing some form of DevEx in your business.  #### Increased productivity and efficiency Well-designed developer experience significantly boosts productivity and efficiency. When developers have access to the right tools, streamlined processes, and a supportive work environment, they can focus on writing code and solving problems rather than wrestling with infrastructure issues or navigating complex workflows. In the same Forrester Survey, [82% of respondents](https://5890440.fs1.hubspotusercontent-na1.net/hubfs/5890440/Humanitec-Forrester-Opportunity-Snapshot-Platform-Engineering.pdf) believe improved DevEx can increase customer satisfaction over the long run. The correlation likely stems from the direct link between developer productivity and the ability to deliver high-quality software quickly. #### Higher job satisfaction and reduced burnout Developers enjoying a positive experience at work are more likely to be satisfied with their jobs and less prone to burnout. Good DevEx reduces frustrations, eliminates unnecessary roadblocks, and provides developers with a sense of autonomy and purpose in their work. 
Over [63% of developers](https://wac-cdn-bfldr.atlassian.com/K3MHR9G8/as/w62hhp9vhcb8696mg9m944/CSD-7616_Document_ADO_State_of_DevEx_Survey-FINAL) said that developer experience is very important when deciding whether to stay in the job or not.  The same report found that [86% of leaders](https://wac-cdn-bfldr.atlassian.com/K3MHR9G8/as/w62hhp9vhcb8696mg9m944/CSD-7616_Document_ADO_State_of_DevEx_Survey-FINAL) believe attracting and retaining the best talent would require improving the developer experience first.  #### Improved code quality and innovation Good user experience is a result of good development practices and systems. So, only focusing on the end result, without fixing what’s broken in-house can lead to a disaster. ** ![undefined](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1729576370216-compressed.png) ** That’s why you need to create an environment that supports your developers’ creativity and problem-solving skills so they can write higher-quality code and more innovative solutions—thus creating better software for the end users. With DevEx prioritized, you give your developers a way to experiment, iterate quickly, and learn from mistakes without fear of repercussions. #### Better collaboration and teamwork Good DevEx doesn’t only help individual developers on the team.  When you work towards a better developer experience, you are also implementing better toolchains in the team, improving how people collaborate, and creating systems for ideas to flow.  This culture of innovation can encourage people to work together and collaborate on different projects.  #### Faster time to market DevEx also helps shorten the time to market, according to [77% of surveyed developers](https://5890440.fs1.hubspotusercontent-na1.net/hubfs/5890440/Humanitec-Forrester-Opportunity-Snapshot-Platform-Engineering.pdf). This generally happens because good DevEx also means working towards eliminating bottlenecks and inefficiencies that can slow down the development process. ### What is good DevEx? Good developer experience requires thinking holistically about not just one or two aspects but addressing all aspects of a developer's work life.  #### Tools and technology Most devex improvements start at the toolchains. Let’s look at what we need here: 1. Efficient development environments: You want to provide developers with fast IDEs and computers and any new tools that can help improve their development environments. 2. Integrated toolchains: You should aim to create a seamless integration between various development tools. This includes connecting version control systems, CI/CD pipelines, and code editors, allowing developers to work smoothly across different stages of the development process. 3. AI-powered code completion tools: Add advanced AI tools like GitHub Copilot to your development environment. These tools can suggest code snippets and help automate repetitive tasks, potentially boosting developer productivity. 4. Performance monitoring and debugging tools: Equip your team with user-friendly tools that make it simple to identify and resolve issues quickly.  5. Cloud-based development environments: Consider adopting platforms that help developers work from any location while maintaining consistent development spaces, promoting flexibility and standardization across your team.  
You can implement [Facets.cloud](https://facets.cloud/) to automate infrastructure management and provisioning while allowing the Ops teams to create the guardrails required for developers to self-provision infrastructure without overusing resources. #### Work environment and culture Once you have settled the toolchains, we’ll move to the company culture and work environment—two important factors that help developers feel at ease while doing their job. Let’s look at a few elements that you can improve: 1. Clear communication channels: Set up effective communication tools and practices so developers can collaborate with team members and stakeholders, helping everyone be on the same page. 2. Collaborative atmosphere: Create an environment encouraging knowledge sharing, promote practices like pair programming, and create mentorship opportunities. The more knowledge spreads across the team, the faster the learning. 3. Continuous learning opportunities: Provide access to training resources and conferences and give developers time for self-improvement and to contribute to open-source projects or just hobby projects.  4. Psychological safety: Build a culture where developers feel safe taking risks and encourage sharing ideas and admitting mistakes without fear of negative consequences. 5. Work-life balance: Implement policies supporting flexible working hours and remote work options and respect personal time to prevent burnout and maintain long-term productivity. #### Workflows And finally, we need to put this all together and create a workflow that doesn’t block any team or individual from completing their assigned work as efficiently as possible. Here are a few things that you can start with: 1. Efficient onboarding: Design a well-structured onboarding process. New developers quickly get up to speed with the codebase, tools, and team practices, reducing time to productivity. 2. Agile development practices: Adopt agile methodologies promoting iterative development. Encourage frequent feedback and adaptability to change, helping teams respond quickly to new requirements or challenges. 3. Code review processes: Establish clear guidelines and tools for code reviews. Constructive feedback and knowledge sharing improve code quality and team skills. 4. Automated testing and quality assurance: Implement robust automated testing frameworks. Issues get caught early and the burden of manual testing reduces, improving overall code reliability. 5. Deployment automation: Set up streamlined CI/CD pipelines. Automated build, test, and deployment processes reduce manual errors and speed up release cycles. 6. Documentation and knowledge management: Maintain comprehensive, up-to-date documentation and knowledge bases. Developers can easily find information and solve problems independently, reducing bottlenecks. ### Common challenges in improving the developer experience The key elements that make up a positive developer experience seem easy on paper, but there are quite a few challenges that you need to address: 1. Fragmented toolchains: Developers often work with a wide array of tools that need to be integrated better, and changing them can lead to context switching and reduced productivity. Find ways to streamline your toolchain to reduce these pain points. 2. Developer environments: Setting up and maintaining consistent development environments challenges even seasoned pros. Developers might struggle to quickly create self-serve environments for testing code. 
And oftentimes difficulties in replicating production conditions locally can lead to the frustrating "it works on my machine" problem. Look for ways to standardize and automate your environment setup process. 3. Excessive context switching: Frequent interruptions or developers juggling multiple tasks simultaneously can disrupt a developer's flow state and decrease overall productivity. Try to help the team set up dedicated blocks of uninterrupted time for deep work and minimize unnecessary context shifts. 4. High ticket-ops: Frequent reliance on ops teams to create or modify development environments slows your developers down and creates bottlenecks. Instead, focusing on self-service options and automation in environment management can help increase autonomy and productivity across the team. 5. Inadequate documentation: Poor or outdated documentation forces developers to waste time searching for information or reinventing solutions to problems someone on your team may have already solved. Prioritize and incentivize the creation and maintenance of clear, up-to-date documentation to save future headaches. 6. Slow feedback loops: Long waits for code reviews, builds, or deployments kill momentum and cause frustration. These delays compound when you face obstacles in setting up or modifying your development environments. Try to find opportunities to automate and speed up your feedback cycles. 7. Technical debt and legacy systems: A lot of companies stick to legacy tech because it’s expensive to rebuild things. These outdated or poorly maintained codebases can be frustrating and time-consuming for new developers. If your team is working on such systems, try to move away from them one step at a time to avoid downtimes and the high upfront costs. 8. Lack of autonomy: Overly restrictive policies or micromanagement block creativity and also reduce job satisfaction since developers now feel caged inside a set of rules. Instead, try to create an open culture that gives everyone enough freedom to work without relying too much on other teams. 9. Insufficient resources: Limited access to necessary hardware, software licenses, or computing resources can hinder a developer's ability to work effectively. Of course, this comes down to the budgets assigned to each project—however, if lack of resources is the obstacle, try to advocate for the tools and resources needed for better execution. 10. Poor work-life balance: High-pressure environments with unrealistic deadlines or expectations of constant availability can lead to burnout. Take inspiration from larger tech companies that require developers to spend time on personal projects instead of spending every minute on the job—this helps the companies over the long term with happier developers and more creative solutions to old problems.  ### Best practices for enhancing developer experience Now, you know what makes up good developer experience and what the challenges are. Let’s look at the best practices and things that you can do to improve the developer experience.  #### Automate what you can  Automation is probably the easiest way to start improving the developer experience. CI/CD pipelines automate build, test, and deployment processes, reducing manual work and speeding up development cycles.  You can implement scripts, bots, and AI-powered tools to handle routine tasks like code formatting, dependency updates, and basic code reviews.  
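As one small, concrete example of the kind of routine task worth scripting, the sketch below asks pip for outdated dependencies and prints a summary. It is a minimal illustration, not a prescribed tool: how you schedule it (a CI job, a bot, a weekly reminder) and which package manager you target will vary by team.

```python
import json
import subprocess

def outdated_packages() -> list[dict]:
    """Return pip's list of outdated packages in the current environment."""
    result = subprocess.run(
        ["pip", "list", "--outdated", "--format=json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

if __name__ == "__main__":
    pkgs = outdated_packages()
    if not pkgs:
        print("All dependencies are up to date.")
    for pkg in pkgs:
        # Each entry includes the installed and latest versions.
        print(f"{pkg['name']}: {pkg['version']} -> {pkg['latest_version']}")
```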
Along with that, giving developers self-service infrastructure also helps them manage their own development environments and resources. If you want to implement self-service infrastructure in your organization, the easiest way would be to use a platform like [Facets.cloud](https://facets.cloud/).  With Facets, your Ops team can create the guardrails and infrastructure provisioning limits, while developers can directly create new instances as and when required without waiting for tickets to be resolved.  You can also streamline the code review processes through well-chosen tools and practices, which can make code reviews more efficient and valuable for everyone involved. #### Create continuous feedback loops You also need to create some sort of real-time collaboration feature that enables code sharing and pair programming, boosting knowledge transfer.  To further make collaboration useful, gather regular developer feedback through surveys, retrospectives, and open channels.  The insights you’ll gain from here can help pinpoint areas for improvement—showing developers how their work impacts the bigger picture can increase motivation. #### Improve support and documentation DevEx isn’t only for the existing developers but also for the new ones.  To make onboarding them easier and faster, you want to have clear documentation for all codebases, APIs, and internal processes, which in turn forms the foundation of excellent support.  The knowledge management systems you use for storing the documentation can further be used with LLMs to become a single source of truth for all your developers.  A question to the LLM that knows answers from your knowledgebase can make it easier for not only onboarding new developers but even enhancing customer support. To make this sustainable, you can even gamify knowledge sharing through points and scoreboards, giving your team members a reason to add to the documentation. #### Optimize the development environment The pandemic gave us a new way of working—a more flexible way we hadn’t thought of before. In fact, only [4% of developers](https://www.terminal.io/state-of-remote-engineering) want to go to the office 5 days a week. ** ![undefined](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1729576373302-compressed.png) ** So, DevEx isn’t restricted to just in-office teams. It extends to individual developers, no matter the locations.  Start with providing each member powerful hardware for an efficient development setup—ensuring that the hardware is never the bottleneck for productivity. This hardware becomes even more powerful when paired with the perfect set of collaboration tools.  If you haven’t already, implement containerization technologies for more efficiency and better resource usage.  #### Don’t let technical debt get out of hand Technical debt is the sort of mess no developer wants to enter into. If your company forms a reputation for having a lot of backlogs or legacy technology, it discourages people from joining in.  ** ![undefined](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1729576378690-compressed.png) ** Source: [xkcd](https://xkcd.com/) Now, you can’t clear debt overnight. It will take time.  But the easiest way is to understand that it takes time and schedule time. Consider refactoring old code, finding ways to upgrade legacy technologies one by one, and any other changes that will ensure long-term viability of your code.  
_**Also Read:**_ [_**10 Best Software Release Management Tools to Streamline Your Deployment**_](https://blog.facets.cloud/10-best-software-release-management-tools/)​ ### Measuring and improving developer experience Alright, so you have started implementing tools and technologies to enhance DevEx, are working on improving the company culture and also implementing the best practices.  How do you know that it’s all working? Let’s look at the key metrics that you can use to see if you’re headed in the right direction. The quickest way is to look at the [DORA metrics](https://cloud.google.com/blog/products/devops-sre/using-the-four-keys-to-measure-your-devops-performance) to track progress. * Deployment Frequency: Measures how often an organization successfully releases to production. Higher frequency often indicates better DevEx. * Lead Time for Changes: Tracks the time from code commit to code running in production. Shorter lead times suggest more efficient processes. * Time to Restore Service: Measures how long it takes to restore service after an incident. Quick recovery times reflect robust systems and practices. * Change Failure Rate: Represents the percentage of changes that lead to failures in production. Lower rates indicate higher quality and stability. Then, combine these metrics with developer satisfaction surveys, provide a comprehensive view of your development environment's health.  You can also track additional DevEx-specific measurements such as: * Code review turnaround time: Faster reviews often lead to quicker iterations and higher developer satisfaction. * Build and test success rates: Higher success rates typically indicate better code quality and more stable development environments. * Time spent on different activities: Analyzing how developers allocate their time between coding, meetings, and debugging can help identify areas for efficiency improvements. * Onboarding time: Shorter onboarding times often indicate better documentation and support systems. If you can, find a way to automate regular tracking of these metrics so you get a clear view of how things are improving within the team. Our goal isn't just to improve numbers, but to create an environment where developers can work more efficiently and enjoyably. ### What does the future of developer experience look like? Developer workflows are improving rapidly. What was relevant just one year ago seems like a decade old today.  For instance, the simple code completion tools we used earlier are now replaced with LLM-powered code completion tools. And these tools not only complete the line you were writing but also the rest of the function.  With AI in the picture now, things are moving at speeds we simply haven’t seen before.  And as technologists, our job is to stay on the lookout for any tech that can make the developer’s life easier. Things that can help them become self-reliant, require less external help that can bottleneck progress, and basically unblock them completely.  That said, if you want to unblock your CI/CD pipelines and automate infrastructure provisioning and much more, try [Facets.cloud](https://facets.cloud/) today! --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Cloud-spend Optimization in SaaS: 7 Overlooked and Underused Strategies Author: Pravanjan Choudhury Published: 2024-11-10 Category: Blogs Meta Title: 7 Strategies for Cloud-Cost Optimization in SaaS Meta Description: How can SaaS startups optimize cloud costs without compromising innovation? 
This article explores 7 underused strategies to reduce cloud spend. Tags: cloud cost management, cloud cost optimization , cloud spend optimization URL: https://blog.facets.cloud/managing-cloud-spend-in-saas-7-overlooked-opinions ![graphical representation of cloud spend optimization in saas](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/dalle-2024-01-23-15-1706004803330-compressed.png) $100,000 - for FREE. That’s what you get when you sign up for the AWS startup program. Microsoft will see and raise at $150,000 and GCP will go all-in at $350,000. That’s at par with the seed round funding of quite a few startups.  It is not a joke! These cloud platforms really want you to succeed. They want you to scale, and be a cheerleader in that journey. Until, they can penalize you for your success. As your start-up begins to grow, the cost of being on cloud, which felt like a never-ending grant once, quickly becomes the reason why your Head of Finance doesn’t want to hang-out with you anymore.  It pinches you too, but you haven’t laid the foundations of managing cloud costs, and now you want to look at it at a later date. That date is now.  > For analytics and data processing apps, your cloud spend typically accounts for about 15% of the revenue. For your regular Saas apps, the spend should be below 7%. So you should read this article if your cloud spend is above these benchmarks. We are going to talk about some of the overlooked (and yet super-effective) opinions and frameworks to keep the cloud costs in check. Ready to take off? (pardon the pun). 1\. Agreeing to a Single Metric, a North Star --------------------------------------------- The [cloud cost](https://blog.facets.cloud/cloud-cost-optimization-efficiency-by-design/) affects all the teams - Tech, Finance, and Sales. It's vital to have a common key measure, or a "North Star" metric, that everyone in the company understands and uses to make decisions. The North Star metric should reflect what's most important to your business and could be something like the cost for each user, each user action, or each page view. For simplicity, we'll talk about the 'cost of a single transaction'. Let me share an example from my own experience, a CRM (Customer Relationship Management) SaaS company. In this case, 'retail transactions' were used as the key metric. However, instead of just calculating the cloud cost based on total transactions, it was divided into two categories: a) **active cost** and b) **holding cost**. **Active Cost per Transaction:** This is the cost for each transaction as it happens in real time. For example, it might be $0.1 for every transaction. **Holding Cost per Transaction:** This is the cost for storing transaction data for future analysis, like $0.001 for each transaction. This method was effective because the company managed new transactions (for immediate processing) and old transactions (for analysis later) in different systems. This approach had several benefits: **For Sales Teams:** They could price new accounts by considering the expected number of monthly transactions and how long transaction data needed to be retained for analysis. **For Finance Teams:** They could calculate the gross margins per account more accurately. **For Tech Teams:** They had specific goals (or OKRs) to work on improving the system to reduce both the active and holding costs per transaction. This gave everyone a clear idea of the costs associated with each transaction, leading to better decision-making. 
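The arithmetic behind the North Star metric is simple enough to sketch. The numbers below are purely illustrative, and the split of the monthly bill into processing versus storage spend is an assumption you would replace with your own cost-allocation tags.

```python
def cost_per_transaction(processing_bill: float, storage_bill: float,
                         monthly_transactions: int, retained_transactions: int) -> dict:
    """Split the monthly cloud bill into active and holding cost per transaction.

    processing_bill: spend attributed to real-time transaction processing
    storage_bill: spend attributed to retaining historical transactions for analysis
    """
    return {
        "active_cost_per_txn": processing_bill / monthly_transactions,
        "holding_cost_per_txn": storage_bill / retained_transactions,
    }

# Illustrative figures: $60k of processing spend over 600k monthly transactions
# gives ~$0.10 active cost; $5k of storage spend over 5M retained transactions
# gives ~$0.001 holding cost, matching the example above.
print(cost_per_transaction(60_000, 5_000, 600_000, 5_000_000))
```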
You would argue that tech teams will be hesitant to engage in these discussions if their systems aren't directly linked to transactions. And that's okay. An approximate model that roughly matches the cost model is often good enough, and it can include various factors as long as they align with the overall cost model. _**Also Read:**_ [_**10 Best Software Release Management Tools to Streamline Your Deployment**_](https://blog.facets.cloud/10-best-software-release-management-tools/)​ 2\. Private Cloud? Why not! --------------------------- In SaaS, the standard approach is to use multi-tenant architectures, where a single deployment of the software serves multiple customers. This setup is popular because it offers agility and cost-effectiveness. However, there are compelling reasons to consider private cloud deployments, especially for certain types of clients and situations. ### Reasons for Considering Private Cloud Deployments **Handling High-Volume Accounts:** When you begin to attract bigger clients with high transaction volumes, mixing these high-load workloads with lower-volume accounts in the same cloud environment can lead to inefficiencies. For instance, a large customer may need more resources (over-provisioning), faster response times, and a more robust disaster recovery strategy. This can increase costs and complexity for all clients sharing the same environment. **Cost of Cloud Services:** The cost of cloud services typically rises with the number of transactions. Although you can try to optimize resource utilization, you can't entirely prevent costs from scaling with usage due to the pricing structure of cloud providers. This aspect becomes critical when your pricing model as a SaaS provider differs from that of the cloud providers. For example, if you offer volume discounts or enterprise deals, there could be a mismatch between what you charge your customers and what you pay your cloud provider, potentially reducing your gross margins. **Offloading Cloud Spend to Customers:** By deploying a private cloud on a customer's account, SaaS startups can offload the cloud expenditure to the customer. This approach can be more attractive to high-end customers who expect a certain level of service and performance. It also aligns the customer's expenditure with their actual usage, potentially leading to more efficient and satisfactory outcomes. **Data Security and Compliance:** For many large companies, data security and regulatory compliance are critical concerns. Deploying on a private cloud can give them the assurance that their data remains within their controlled environment. This can be a significant selling point for early-stage SaaS startups trying to compete with larger enterprises. Challenges of Private Cloud Deployment Private cloud deployments have been less popular due to the higher maintenance costs and complexity they bring, especially for tech teams managing multiple environments. However, modern DevOps tools and practices have evolved significantly, making the management of multiple environments more feasible. Automation, containerization, and [infrastructure as code](https://www.facets.cloud/no-code-infrastructure-automation) can help manage these complexities much better now. 3\. Be at a ‘Striking Distance’ from Cloud Optionality ------------------------------------------------------ Cloud optionality for SaaS startups is about maintaining flexibility in their cloud strategy. 
It's about being prepared to switch providers if needed, choosing services that allow for easy migration, and staying adaptable to take advantage of the best offerings from cloud providers.

### Reasons for Considering Cloud Optionality

* **Customer Preferences:** Customers might favor certain cloud providers for their unique features, existing setup, or data laws.
* **Regional Cost Differences:** Some regions offer lower cloud costs, important for startups managing expenses.
* **Potential Partnerships:** Flexibility for future cloud provider partnerships can bring financial and support benefits.
* **Avoiding Single Provider Dependence:** Diversifying providers prevents lock-in, improves negotiation power, and prepares for unexpected changes.

### What Cloud Optionality Means

**Not Necessarily Multi-Cloud Deployment**: Cloud optionality doesn't imply deploying on multiple clouds simultaneously. Instead, it means being prepared and capable of switching to another cloud provider if needed.

**Maintaining “Striking Distance”**: This approach involves being ready to switch clouds without significant delays or disruptions. It requires a good understanding of the different cloud environments and ensuring that the application architecture supports such flexibility.

**Using “Blue-Collar” Services**: Startups should prefer managed services that have equivalents in other clouds. For example, using Amazon Aurora (PostgreSQL-compatible) offers the benefits of a managed service while retaining compatibility with [PostgreSQL](https://blog.facets.cloud/k8s-postgresql-operator/), making it easier to switch clouds if necessary.

The approach requires careful planning and understanding of the cloud landscape but can offer significant advantages in terms of cost, performance, and strategic positioning.

**_Also Read:_** [**_10 Best Cloud Infrastructure Automation Tools_**](https://blog.facets.cloud/cloud-infrastructure-automation-tools/)

4\. Decentralized Governance of Cloud Costs
-------------------------------------------

Decentralized cost governance in SaaS startups is an approach that, though less common, can be more effective over time compared to centralized cost governance. Although centralized cost governance can provide immediate results, decentralized governance offers a more sustainable approach by involving developers directly in cost management, thus fostering a culture of cost awareness and responsibility across the organization.

**Let’s take a comparative look:**

| Parameter | Decentralized Cost Governance | Centralized War Room-Based Cost Governance |
| --- | --- | --- |
| Approach | Distributes cost management responsibilities across developer teams. | Concentrates cost management in a dedicated expert team. |
| Cost Visibility | Each development team has visibility and accountability for their specific costs. | Cost visibility is limited to the central team, with less transparency for other teams. |
| Responsibility | Developers are directly responsible for the costs of the systems they manage. | A smaller group of experts is responsible, leading to less ownership among other teams. |
| Cost Attribution | Fine-grained, with specific costs attributed to respective services. | Broader, with costs managed at a higher, more centralized level. |
| Involvement in Decision-Making | Developers are empowered to make cost-related decisions for their systems. | Decision-making is typically top-down, with less input from individual developers or teams. |
| Metric Inclusion in Sprints | Cost is treated as a key metric alongside performance and availability in development cycles. | Cost management is often separate from regular development sprints. |
| Long-Term Sustainability | More sustainable due to widespread ownership and continuous involvement. | Less sustainable as it depends heavily on a small group of experts. |
| Optimization Opportunities | Greater opportunity for cost optimization due to direct developer involvement. | Optimization may be limited to the expertise and capacity of the central team. |
| Cultural Impact | Fosters a culture of cost-awareness and responsibility across the organization. | Can lead to a disconnect between cost management and other organizational functions. |
| Suitability | Better suited for organizations aiming for long-term, sustainable cost management. | More effective for short-term or immediate cost control needs. |

**5\. A Culture of Service Resource Requirements Documentation**
-----------------------------------------------------------------

While teams often use tools to analyze cloud bills and identify resource-intensive services, they frequently miss an important step: formally documenting the specific resource requirements of each service. This oversight can lead to missed opportunities for cost optimization.

**Here’s a condensed version of what this approach involves:**

* Determine if a service must always be running, or if it can be scheduled as a job.
* Assess whether the service can handle restarts, to balance the use of on-demand and spot instances.
* Define how the application should scale according to varying demands.
* Define what resources are needed for each service.

This approach gives you a better grip on what cloud services are actually costing, going beyond what you see in the basic billing reports, and points you toward smarter ways to save money. Sure, it takes a bit of work at the start to document everything, but it pays off. In the long run, you end up saving money and running things more smoothly. It's a smart move for any organization that wants to get better at managing cloud costs and improve its financial health.

**6\. Focus on Optimizing Low-Risk, Non-Production Environments**
------------------------------------------------------------------

In SaaS startups, it's common to charge the costs of non-production environments to tech teams and the costs of production to the company's main financial records (P&Ls). Production costs are generally larger and more closely tied to the business, so when trying to cut them, you need to be careful because there's more at stake. Cutting costs in non-production areas, on the other hand, is less risky and a good place to start. Here are some easy ways to do this:

1. **Go for Spot Instances:** Use spot instances for non-production tasks to cut down costs significantly.
2. **Control Resource Usage:** Put limits on how many resources your non-production setups can use.
3. **Separate Testing Areas:** Have different places for different kinds of testing, like functional checks or load testing.
4. **Shut Down When Not Needed:** Turn off non-production setups when nobody's using them, like on weekends (see the sketch after this section).

It’s also important to keep a budget-friendly local development environment. For a typical SaaS company spending about $1 million on the cloud, non-production environments can account for up to 20% of all costs. With smart steps, this can often be brought down to less than 8%. By starting with non-production areas, SaaS startups can save money quickly without disturbing their main production systems, improving their financials without taking big risks.
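To illustrate the "shut down when not needed" idea from the list above, here is a minimal Python sketch using boto3 that stops EC2 instances tagged as non-production on weekends. The `environment=non-prod` tag, the weekend-only schedule, and the lack of pagination are simplifying assumptions; adapt them to however your environments are actually labelled and scheduled:

```python
# Minimal sketch: stop non-production EC2 instances on weekends.
# Assumes instances carry an "environment=non-prod" tag; adjust to your tagging scheme.
import datetime

import boto3

ec2 = boto3.client("ec2")


def stop_non_prod_instances() -> None:
    """Find running instances tagged as non-prod and stop them."""
    response = ec2.describe_instances(
        Filters=[
            {"Name": "tag:environment", "Values": ["non-prod"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    instance_ids = [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)
        print(f"Stopped {len(instance_ids)} non-production instances")


if __name__ == "__main__":
    # Run this from a scheduler (cron, a Lambda on a schedule, etc.); act only on weekends.
    if datetime.datetime.now().weekday() >= 5:  # Saturday or Sunday
        stop_non_prod_instances()
```

A matching start script (or a start schedule) brings the environments back on Monday morning; Kubernetes-based setups can get the same effect by scaling non-production deployments down to zero replicas.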
_**Also Read:**_ [_**7 Best Internal Developer Platforms (IDPs)**_](https://blog.facets.cloud/7-best-internal-developer-platforms-idps-to-consider/)

**7\. Applying Budget Constraints on Experimental Data Workloads**
-------------------------------------------------------------------

In recent years, SaaS startups have greatly benefited from the big data trend, using advanced tools like Apache Spark and Snowflake for fast data processing. These technologies, more resource-intensive than traditional data warehouses, have enabled diverse queries, leading to higher cloud costs, often over half of total expenses. While reverting to older systems isn't ideal, finding a balance is crucial.

Commonly, data products start with trial queries in notebooks and are then productized if the return on investment (ROI) justifies it. We propose that once the budget for experimental data workloads is hit, cloud cost should also be a factor in deciding which experiments to productize next. The experimentation phase is the costliest; productization is where cost-efficiency gets engineered in. By budgeting wisely and deciding when to convert experiments to products, companies can manage costs. The aim is to create efficient reports, dashboards, or datasets, reducing the need for expensive queries. Keeping experimental spending within limits, like 20% of the total budget, is essential.

Overall, while advanced data processing has benefited SaaS startups, managing and optimizing cloud spending in this big data era is equally crucial. A strategic approach to productization, mindful of cloud costs and resource use, ensures a sustainable balance.

**Final Thoughts**

The cloud isn’t going anywhere. Neither are the bills. But we need to find ways to keep the costs in check, or we defeat the purpose of moving to the cloud, at least when it comes to overall costs. Let me know your thoughts and the practices you might be implementing at your organization for better cloud cost optimization.

---

This blog is powered by Superblog. Visit https://superblog.ai to know more.

---

## How to Automate Your DevOps Pipeline Using Platform Engineering

Author: Kirti Krishan
Published: 2024-10-30
Category: Blogs
Meta Title: How to Automate Your DevOps Pipeline Using Platform Engineering | Facets
Meta Description: Discover how platform engineering can streamline DevOps pipeline automation for faster deployments, improved productivity, and enhanced reliability.
Tags: platform engineering
URL: https://blog.facets.cloud/automate-devops-pipeline-platform-engineering

What if your entire software development and delivery process could be automated? That's the magic of DevOps pipeline automation.

Think about your current process - from sketching out product features, to writing code, running tests, and finally pushing to production. Each step probably involves manual handoffs, waiting periods, and the occasional human error. Sure, you’ll move forward eventually, but not at the speed you need.

By smoothly integrating development and operations processes, automated DevOps pipelines are becoming the backbone of efficient software delivery: they reduce manual errors, accelerate deployments, and foster better team collaboration. That makes them essential for organizations striving to stay competitive today, when speed and agility are the need of the hour for delivering high-quality software.

What is Platform Engineering in DevOps?
--------------------------------------- Platform Engineering, another fancy word for DevOps? Not quite. It is an evolution that first redefines the organization's internal DevOps frameworks, then accelerates its path to achieving goals.  Platform engineering in DevOps is all about creating a robust and scalable infrastructure that empowers development teams to thrive. By aligning platform engineering with DevOps principles, organizations can leverage reusable automation blueprints and seamlessly integrate CI/CD pipelines. This approach allows developers to automate routine tasks, freeing them up to concentrate on innovation rather than being bogged down by infrastructure concerns. Key Steps to Automate Your DevOps Pipeline with Platform Engineering -------------------------------------------------------------------- Facets provides essential tools for platform engineering, enabling DevOps teams to automate infrastructure management, CI/CD processes, and multi-cloud environments. To effectively automate your DevOps pipeline using platform engineering, consider the following steps: ### Step 1: Identify Manual Processes The initial phase in DevOps pipeline automation involves spotting manual processes that create bottlenecks in development. Focus on repetitive tasks such as code testing, integration, infrastructure provisioning, and deployment - these are perfect candidates for automation. Begin by asking your team: What repeated tasks are we doing manually that could be automated without losing value? Once identified, these processes can be matched with automation tools that handle them efficiently. ### Step 2: Leverage Infrastructure as Code (IaC) A fundamental element in automating your pipeline is adopting Infrastructure as Code (IaC). IaC enables you to manage and provision infrastructure through code instead of manual configuration. Using tools like Terraform or AWS CloudFormation, you can automate infrastructure setup and management across your development, staging, and production environments.  Facets streamlines DevOps automation by providing no-code infrastructure blueprints. Rather than writing extensive code for infrastructure management, DevOps teams can utilize pre-built blueprints that simplify configuration and deployment. This method eliminates manual intervention and maintains consistency across environments. ### Step 3: Implement CI/CD Pipelines The heart of DevOps pipeline automation lies in CI/CD pipelines. Continuous Integration (CI) automatically integrates code changes into a shared repository, while Continuous Deployment (CD) automatically deploys tested code to production. Tools like Jenkins, CircleCI, and GitLab CI automate these pipelines, ensuring new code is automatically tested and deployed without manual intervention. Facets enhances CI/CD pipeline automation through Kubernetes integration for orchestration, allowing teams to concentrate on coding rather than managing deployment complexities. Whether deploying to AWS, Azure, or Google Cloud, Facets handles the orchestration, scaling, and configuration needed to maintain smooth application operation. ### Step 4: Integrate Automation Tools Selecting appropriate tools is essential for comprehensive DevOps automation. Industry-standard tools like Kubernetes for container orchestration, Jenkins for CI/CD pipelines, and Terraform for infrastructure automation are key components in automating DevOps pipelines. However, managing these tools across multi-cloud environments presents its challenges.  
Facets offers a platform that streamlines the integration of these tools into your DevOps pipeline, delivering pre-built automation solutions that boost deployment speed, improve developer productivity, and ensure operational efficiency.

Tools for DevOps Automation
---------------------------

Modern organizations rely heavily on DevOps automation tools to stay competitive. These essential tools enhance development workflows, strengthen system reliability, and boost operational efficiency throughout the software lifecycle. Here are three core tools that power modern DevOps automation and enable robust, automated workflows.

### 1\. Kubernetes for Container Orchestration

At the heart of container management lies Kubernetes, an orchestration platform that streamlines the deployment, scaling, and operation of containerized applications. Its ability to maintain application performance across diverse environments while handling load distribution and failover scenarios makes it essential for modern cloud-native application management.

Key Benefits:

* Automatic scaling of containers based on demand
* Built-in load balancing and traffic routing
* Zero-downtime deployments with rolling updates

### 2\. Jenkins for CI/CD Pipelines

A cornerstone in CI/CD automation, Jenkins empowers development teams to streamline their software delivery workflow. By automating build processes, test execution, and deployment procedures, it minimizes human error and accelerates delivery timelines. Its extensive compatibility with various tools enhances its versatility in automation workflows.

Key Benefits:

* Rich plugin ecosystem for seamless tool integration
* Parallel execution of build and test processes
* Customizable pipelines with code-based configuration

### 3\. Terraform for Infrastructure Automation

As a powerful Infrastructure as Code solution, Terraform automates cloud infrastructure provisioning and maintenance across multiple providers. By enabling teams to codify their infrastructure specifications, it ensures consistent and repeatable environment setup processes. This approach significantly reduces manual infrastructure management overhead, supporting key platform engineering objectives.

Key Benefits:

* Multi-cloud infrastructure management from a single tool
* Version control for infrastructure changes
* State tracking for complex resource dependencies

Benefits of Automating DevOps Pipelines with Platform Engineering
------------------------------------------------------------------

Platform engineering combined with DevOps automation transforms how organizations deliver software. Here's a detailed look at the key benefits this integration brings:

### Reduced Time-to-Deployment

Think of this as getting your software updates out faster.
Here's what happens when you automate: * Eliminating Manual Bottlenecks: Tasks that once took hours or days can be completed in minutes through automation * Enabling Parallel Processing: Multiple deployment stages can run simultaneously, reducing overall pipeline execution time * Streamlining Approvals: Automated governance checks and approval workflows reduce administrative delays * Standardizing Deployment Procedures: Pre-configured deployment templates ensure consistent and swift deployments across environments ### Enhanced Developer Productivity This means your developers can spend more time building features instead of dealing with tedious tasks: * Self-service Infrastructure: Developers can provision resources on-demand without waiting for operations teams * Automated Testing and Validation: Continuous testing throughout the pipeline catches issues early * Standardized Development Environments: Consistent environments across the team eliminate "works on my machine" problems * Reduced Context Switching: Developers spend less time on operational tasks and more time writing code * Automated Code Quality Checks: Built-in code analysis and security scanning reduce technical debt ### Increased Reliability and Consistency Automation through platform engineering ensures: * Predictable Outcomes: Every deployment follows the same tested and verified process * Error Reduction: Automated processes eliminate human errors in routine tasks * Reliable Rollback Procedures: Quick recovery from issues through automated rollback mechanisms * Consistent Security Practices: Automated security scanning and compliance checks in every deployment * Audit Trail: Comprehensive logging and monitoring of all pipeline activities Challenges in DevOps Pipeline Automation ---------------------------------------- While automating DevOps pipelines offers benefits like faster deployments and enhanced productivity, there are still several challenges that organizations face. Understanding these hurdles is the first step to overcoming them.  ### 1\. Integrating Legacy Systems  A fundamental obstacle in DevOps automation involves connecting older systems with modern automated workflows. Organizations frequently depend on traditional monolithic architectures that resist seamless integration with container environments and CI/CD pipelines.  These legacy platforms often require hands-on management, complicating automation efforts without substantial architectural modifications. Solution: Success lies in implementing a gradual hybrid strategy for automation adoption. Creating API wrappers or microservices around legacy systems can facilitate their integration into contemporary pipelines. Solutions like Facets enable partial automation of infrastructure and CI/CD processes, even when working with traditional backend systems. ### 2\. Managing Complex Toolchains Managing extensive DevOps toolsets presents another significant challenge. Teams often utilize varied solutions for source control, CI/CD, system monitoring, and infrastructure oversight, resulting in knowledge isolation and increased maintenance complexity. Solution: Tool standardization and consolidation offers a path forward. Facets streamlines this process by providing ready-made connections to common DevOps tools including Kubernetes, Jenkins, and Terraform, minimizing the challenges of managing separate systems. Its unified dashboard simplifies tool management across pipeline stages, streamlining operational workflows. ### 3\. 
Ensuring Security in Automated Pipelines As organizations automate their pipelines, security maintenance becomes increasingly critical. While automation enhances efficiency, it may introduce security gaps, particularly when safety checks are overlooked or poorly integrated. Automated systems also expand potential vulnerability points across code repositories, build environments, and deployment platforms. Solution: Incorporating DevSecOps principles into pipeline design ensures security automation alongside deployment processes. Facets implements security controls and access restrictions to maintain safety standards during automation. Embedding security scanning and verification within CI/CD workflows helps maintain security without sacrificing automation benefits. Wrapping Up ----------- How can software development teams keep up? DevOps pipeline automation offers a way forward by simplifying repetitive tasks like code testing and deployment. This shift allows developers to focus more on innovation and building new features. By reducing manual work, automation also minimizes errors and speeds up delivery, helping companies meet fast-changing demands. At the same time, platform engineering sets the stage for this efficiency. It ensures that all tools and technologies work together seamlessly, creating a stable environment where automated tasks can thrive. When combined, DevOps automation and platform engineering empower development teams to deliver high-quality software faster and adapt to market changes.​ --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Managing Kubernetes Clusters: Why, How, and Best Practices Author: Ayush Verma Published: 2024-10-28 Meta Title: Managing Kubernetes Clusters: Why, How, and Best Practices | Facets Meta Description: Learn the essentials of Kubernetes cluster management, including why it's critical, how to streamline it, and best practices for optimal performance. URL: https://blog.facets.cloud/managing-kubernetes-clusters It's 3 AM.  Your phone buzzes with an urgent alert.  Your company's flagship app is down.  Customers are flooding social media with complaints. Your boss is texting you in all caps.  Welcome to Kubernetes cluster management. If you're a developer or DevOps engineer, you've likely experienced similar heart-pounding moments.  Kubernetes (K8s) has changed how we deploy and scale applications. But with great power comes great responsibility—and a fair share of headaches. This guide will walk you through the ins and outs of K8s cluster management.  We'll explore why it's important, how to manage Kubernetes clusters effectively, and the best practices to save your sanity (and your sleep).  Why even manage the Kubernetes architecture? -------------------------------------------- Let's start with a hard truth: neglecting your Kubernetes clusters is like ignoring the check engine light on your car. It will seem fine for a while, but eventually, you'll end up stranded on the side of the road. Proper cluster management is the backbone of a healthy K8s ecosystem.  It's about optimizing performance, enhancing security, and preparing for growth. To put it into perspective, here’s a scenario. Imagine that you own an ecommerce site that’s gearing up for Black Friday.  Traffic is expected to spike 10x.  Without proper cluster management, you're essentially trying to funnel a flood through a garden hose. On the big day, your site crashes, sales drop, and your CEO is giving you the death stare in the emergency meeting. 
With effective cluster management, you're ready. Your clusters scale automatically to handle the surge. Customers make their purchases. Your CEO is now looking at you like you're the second coming of Steve Jobs. This is what good K8s management does. It helps your business scale online.  Let's break down why it matters: 1. Operational efficiency: Well-managed clusters run smoothly, use resources efficiently, and respond quickly to changes. This translates to faster deployments, reduced downtime, and happier developers. 2. Scalability: The ability to scale is K8s' superpower. But without proper management, it's like having a supercar with no fuel. Good management practices ensure your clusters can grow (or shrink) on demand, handling traffic spikes without breaking a sweat. 3. Security: Properly managed clusters have strong firewalls, the latest intrusion prevention systems (IPS), and secure communication channels. They protect your data, your users, and your reputation. 4. Cost control: Cloud resources aren't free, and poorly managed clusters can burn through your budget faster than a teenager with their first credit card. Effective management helps optimize resource usage, so you're not paying for idle capacity. 5. Compliance: For many industries, compliance is a legal requirement. Good cluster management helps ensure you're meeting regulatory standards, avoiding hefty fines and legal headaches in the future. 6. Performance optimization: Well-managed clusters perform better, providing a smooth user experience that keeps customers coming back. Now that we understand the 'why', let's dive into the 'how'.  But first, we need to understand what we're dealing with. Understanding the Kubernetes cluster architecture ------------------------------------------------- The Kubernetes architecture can seem as complex as a Rube Goldberg machine.  ** ![undefined](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1730697028965-compressed.png) ** [Rube Goldberg Machine](https://kolaszek.medium.com/rube-goldberg-machine-calculations-381db05667ef) But you need to understand it to be able to efficiently manage the clusters.  Let's break it down into tiny components that make up the architecture. At its core, a K8s cluster is like a miniature data center. It has a control plane (the brain) and worker nodes (the muscle). The control plane makes global decisions about the cluster, while the worker nodes run your applications. ** ![undefined](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1730697030097-compressed.png) ** [Source](https://kubernetes.io/docs/concepts/overview/components/) The control plane consists of several key components: 1. [API server](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/): This is the front door to your K8s cluster. All commands, internal and external, go through here. It's like the maitre d' at a fancy restaurant, directing traffic and ensuring everything runs smoothly. 2. [etcd](https://etcd.io/): Think of this as the cluster's memory. It's a distributed key-value store that holds all the critical information about the cluster state. Without etcd, your cluster would have the memory of a goldfish. 3. [Scheduler](https://kubernetes.io/docs/concepts/scheduling-eviction/kube-scheduler/): This component is like an expert Tetris player. It decides which node should run which pod, considering factors like resource requirements and constraints. 4. 
[Controller Manager](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/): Imagine a team of supervisors, each responsible for a different aspect of the cluster state. That's the Controller Manager. It ensures that the actual state of the cluster matches the desired state. The worker nodes, on the other hand, are where the real action happens. They run your applications inside containers, grouped into pods.  Each worker node has its components: 1. [Kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/): This is the primary node agent. It's like a diligent worker, ensuring that containers are running in a pod. 2. [Kube-proxy](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/): Think of this as the cluster's traffic cop. It maintains network rules on nodes, allowing network communication to your pods from inside or outside the cluster. 3. [Container Runtime](https://kubernetes.io/docs/setup/production-environment/container-runtimes/): This is the software responsible for running containers. Docker is a popular choice, but K8s supports other runtimes too. Now that we understand the architecture, let's explore some best practices for keeping this complex system running smoothly. How to best manage Kubernetes clusters? --------------------------------------- Managing a K8s cluster is like conducting an orchestra. Each section needs to work in harmony for the performance to be successful. Here are some key practices to keep your K8s symphony in tune: ### Efficient Resource Allocation Resource management in K8s is a delicate balancing act. Allocate too little, and your applications starve. Allocate too much, and you're wasting resources (and money). * Start by setting appropriate resource requests and limits for your containers. This is like giving each musician in your orchestra the right amount of sheet music—not too much, not too little. * Use [namespace resource quotas](https://kubernetes.io/docs/concepts/policy/resource-quotas/) to prevent resource hogging. It's like ensuring one overzealous violin section doesn't drown out the rest of the orchestra. * Implement [pod priority and preemption](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/). This ensures your critical applications (your first chair players) always have the resources they need, even if it means bumping less important ones. ### Automated Scaling and Monitoring In the world of K8s, change is the only constant. Your cluster needs to adapt to varying loads automatically. * Set up the [Horizontal Pod Autoscaler (HPA)](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) to adjust the number of pods based on CPU utilization or custom metrics. It's like having an assistant conductor who can bring in more musicians during complex passages. * Implement [cluster autoscaling](https://kubernetes.io/docs/concepts/cluster-administration/cluster-autoscaling/) to automatically adjust the number of nodes. This ensures you always have enough infrastructure to handle your workloads without wasting resources. * Use tools like Prometheus and Grafana for monitoring and alerting. They're like having a team of sound engineers constantly monitoring the quality of your performance. ### Security Management with RBAC Security in K8s is not a set-it-and-forget-it affair. It requires constant vigilance. 
* Implement [Role-Based Access Control (RBAC)](https://kubernetes.io/docs/reference/access-authn-authz/rbac/) to limit who can do what in your cluster. It's like having different levels of backstage passes at a concert. * Use network policies to control traffic flow between pods. This prevents unauthorized communication, like ensuring your brass section isn't secretly communicating with the percussion during a performance. * Regularly update and [patch your K8s components](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_patch/). Cyberthreats evolve quickly, and yesterday's security measures might not cut it today. These best practices already set you up for success. But you will still face challenges.  Let's look at some common ones and how to tackle them. Common Challenges in K8s Cluster Management ------------------------------------------- Even seasoned K8s conductors face their share of discord. Here are some common challenges and how to address them: * Configuration drift: Your configurations will drift apart as teams deploy changes across multiple environments. You can prevent this by implementing GitOps workflows, where Git serves as your single source of truth. When you connect tools like Flux to your repositories, they automatically sync cluster states with your defined configurations, eliminating manual drift corrections. * Resource optimization: Pods consume resources unpredictably, making capacity planning difficult. Set up the [Vertical Pod Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler) to adjust CPU and memory requests based on actual usage patterns. Check your resource metrics weekly and create alerts for unexpected spikes. You'll want to track both peak usage times and idle periods to optimize your resource allocation strategy. * Multi-tenancy: Multiple teams sharing one cluster often interfere with each other's workloads. Start by creating strict namespace boundaries and enforce them with resource quotas. Add network policies to control communication between workloads. When teams need stronger isolation, [deploy virtual clusters](https://www.vcluster.com/docs/get-started); they provide dedicated control planes while sharing your underlying infrastructure. * Networking complexity: Kubernetes networking becomes more complex with each new service you add. You'll benefit from implementing a service mesh like [Istio](https://istio.io/). Beyond basic connectivity, it handles encryption, access control, and detailed traffic analysis. You can then control service-to-service communication through high-level policies instead of low-level network rules. * Monitoring and troubleshooting: Finding issues in distributed systems requires careful observation. Set up distributed tracing with [Jaeger](https://www.jaegertracing.io/) to watch requests flow through your services. Send all component logs to a central store for quick searching. Create dashboards that show service health at a glance, and add alerts for common failure patterns you've encountered. * Upgrade management: Kubernetes upgrades carry risk but provide critical security fixes and features. Start each upgrade on a test cluster that mirrors your production setup. Move a small portion of traffic to upgraded nodes first, watching for problems. And whatever changes you make, keep detailed notes and verify your rollback procedure works before touching production systems. 
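Before reaching for heavier tooling, a quick audit helps you see how exposed you are to the resource-optimization and drift problems above. Here is a minimal Python sketch using the official `kubernetes` client that flags containers running without CPU/memory requests or limits; it assumes a working kubeconfig with read access to the cluster:

```python
# Minimal sketch: flag containers running without resource requests or limits.
# Assumes a local kubeconfig with read access to the cluster.
from kubernetes import client, config


def audit_missing_resources() -> None:
    config.load_kube_config()  # use load_incluster_config() when running inside a pod
    v1 = client.CoreV1Api()

    for pod in v1.list_pod_for_all_namespaces().items:
        for container in pod.spec.containers:
            resources = container.resources
            if not resources.requests or not resources.limits:
                print(
                    f"{pod.metadata.namespace}/{pod.metadata.name} "
                    f"({container.name}): missing requests or limits"
                )


if __name__ == "__main__":
    audit_missing_resources()
```

Containers that show up here are the first candidates for the requests, limits, and namespace quotas discussed in the best practices above.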
Tools and Solutions for Optimized K8s Management ------------------------------------------------ You'll find that managing Kubernetes becomes much simpler when you use the right tools. Let’s look at some tools that can save you countless hours and help you manage your clusters more effectively. 1. [Kubernetes Dashboard](https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/) serves as your central command center. Open the web interface and see your entire cluster at a glance. Monitor deployments, check pod health, and make quick adjustments without touching the command line. 2. [Facets.cloud](https://facets.cloud/) simplifies platform engineering for your team. You can create and manage environments across multiple clouds with a single click. The built-in self-service capabilities and standardized automation help you boost productivity and unblock your developers who would otherwise be waiting on Ops tickets being actioned. Facets also helps you build architecture blueprints through drag-and-drop interfaces instead of writing YAML files. And all of this combined with comprehensive monitoring for your applications with integrated observability tools. 3. [Helm](https://helm.sh/) makes deploying applications feel like installing apps on your phone. Pick from thousands of pre-made charts to deploy databases, monitoring tools, and complete application stacks. You'll skip hours of writing YAML files and focus on running your applications instead. 4. [Prometheus and Grafana](https://prometheus.io/docs/visualization/grafana/) work together to give you complete visibility into your cluster's health. Prometheus pulls metrics from your applications and infrastructure, while Grafana turns those numbers into clear, actionable insights. Set up alerts to catch problems before users notice them. 5. [Istio](https://istio.io/) adds smart networking features to your cluster without changing your code. Route traffic between services, enforce security policies, and track how services communicate. You'll gain deep insights into your application's behavior and catch networking issues early. 6. [Rancher](https://www.rancher.com/) lets you manage multiple clusters as easily as one. Control access, deploy applications, and monitor health across your entire Kubernetes fleet from a single screen. You'll maintain consistency across environments and reduce management overhead. 7. [Lens](https://k8slens.dev/) brings the convenience of an IDE to Kubernetes management. Connect to your clusters, edit resources, and troubleshoot issues through an interface that feels familiar to developers. You'll reduce the learning curve for team members new to Kubernetes. 8. [Kustomize](https://kustomize.io/) helps you maintain different versions of your configurations without duplicating files. Define a base configuration and apply environment-specific changes on top. You'll keep your configurations clean and maintainable as your applications grow. These tools can significantly simplify your K8s management tasks. They address many of the common challenges we discussed earlier, from configuration management to monitoring and security. Build K8s clusters that scale with your business ------------------------------------------------ Your teams want to move fast and ship code. Give them the tools to succeed. Managing Kubernetes clusters may seem overwhelming at first, but with the right practices and tools, you'll build resilient systems that grow with your needs.  
Start small, implement the necessary monitoring, and slowly add automation as your requirements evolve.  For teams looking to accelerate this journey, [Facets](https://facets.cloud/) can help you standardize operations and empower developers with self-service capabilities, letting you focus on what matters most—building great applications.  [Try Facets today!](https://www.facets.cloud/signup) --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Building an Internal Developer Platform: The Harsh Reality Behind the Promise Author: Anshul Sao Published: 2024-10-21 Category: Blogs Meta Title: Building an Internal Developer Platform Meta Description: Is your internal developer platform slowing down deployment velocity? See why buying an extensible IDP accelerates innovation and boosts productivity compared to complex in-house platforms. Tags: Internal Developer Platform, platform engineering URL: https://blog.facets.cloud/building-an-internal-developer-platform ![An illustrative image of the challenges in building an IDP ](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/dalle-2024-01-04-11-1704371557057-compressed.png) Contents * [Why Are Businesses Building IDPs?](#why-are-businesses-building-idps) * [9 Reasons Building an IDP Isn’t The Best Solution for Businesses](#9-reasons-building-an-idp-isnt-the-best-solution-for-businesses) * [4\. Limited External Integrations](#4-limited-external-integrations) * [8\. Constant Updates and Maintenance](#8-constant-updates-and-maintenance) * [9\. Difficulty Retaining Specialized Expertise](#9-difficulty-retaining-specialized-expertise) * [What to do Instead of Building an IDP?](#what-to-do-instead-of-building-an-idp) * FAQs * [FAQs](#faqs) Many companies are enticed by the idea of building an [internal developer platform](https://blog.facets.cloud/in-house-or-third-party-internal-developer-platform/) (IDP)—a software that combines all the systems developers use into a unified experience. ​[IDPs](https://blog.facets.cloud/internal-developer-platforms-the-secret-weapon-for-developer-productivity/) provide developers with the infrastructure, tools, and workflows required to build, deploy, and monitor applications efficiently and at scale. However, building an IDP often fails to live up to the vision. Companies invest substantial time and money into developing IDPs, only to realize it isn’t the optimum use of their resources. This article will examine the gap between the promise and harsh reality of building internal developer platforms and why buying an extensible IDP is more practical. Why Are Businesses Building IDPs? --------------------------------- Why do companies pursue building internal developer platforms in the first place? At first glance, the potential advantages seem compelling: * **Automation** \- IDPs promise to automate repetitive processes like environment setup, configuration, and tear-down. This saves engineering teams time and speeds up workflows. * **Accelerated release cycles** - With an automated platform for testing and deployment, companies expect to deploy updates and new features faster. * **Enhanced security** - Companies believe purpose-built internal platforms will provide tighter security controls and safeguards than third-party solutions. * **Customization** \- Building a platform in-house allows companies to customize it to their specific needs and workflows. * **Control** \- Companies like owning the end-to-end process rather than relying on third-party vendors. 
Building a platform in-house provides greater control.
* **Cost savings** - Some companies believe internal platforms will save money compared to licensing and managing third-party services.

These potential benefits align with key business goals: faster time to market, increased efficiency, and cost savings. However, the reality of executing on these goals via an internal platform rarely matches the expectation. Let’s understand the actual costs of building an IDP vs. buying one.

### 9 Reasons Building an IDP Isn’t The Best Solution for Businesses

![A mindmap of why you shouldn't build an internal developer platform](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1704371409033-compressed.png)

Developing an internal developer platform requires a significant upfront investment. The software, engineering resources, and other costs add up quickly. Ongoing maintenance and operation costs are also higher than expected. Let’s look at some of the actual costs behind building an IDP.

### **1\. Complexities of IDPs**

Internal developer platforms promise to reduce complexity for engineering teams by providing a unified and automated environment. But in reality, building and managing an IDP introduces daunting complexity, which often backfires.

Customizing, integrating, and operating complex infrastructure components like Kubernetes, service meshes, ingress controllers, and CI/CD pipelines is incredibly challenging. There are countless moving parts and dependencies that have to work together seamlessly. Rather than accelerating releases, an over-engineered internal platform can impede productivity as engineers waste time fighting fires related to the underlying infrastructure.

_The bottom line: building an IDP from scratch is a complicated undertaking. If you take it up with limited resources, the slow updates and high upkeep can negate the developer productivity benefits of an IDP and add to resource wastage._

### **2\. Technical Debt Accumulation**

Internal developer platforms built in-house slowly accumulate "technical debt" over time. This debt eventually makes the platform unstable, insecure, and expensive.

* For example, **outdated third-party libraries or frameworks** used in the platform become security liabilities needing urgent upgrades. But constant upgrade projects drain engineering resources.
* Other debt comes from engineers **taking shortcuts or bypassing best practices** to match developer agility. This piles up as debt requiring future refactoring or a complete system overhaul.
* **Insufficient documentation** \- Inadequate or outdated documentation of an in-house IDP leads to misinterpretation of its features and capabilities.

This slow accumulation of technical debt starts minor but eventually compounds into significant issues. And the platform team needs to constantly firefight to keep the platform operational.

### **3\. Time Sink of IDPs**

Developing IDPs requires extensive time investment that prevents focus on core products. For example, [Treebo](https://www.facets.cloud/case-study/treebo) found that developers wasted time maintaining complex in-house deployment tools, which drove lengthy production troubleshooting cycles and consumed ~80% of ops time.

_"The biggest challenge for the Ops team was that they would spend 70-80% of their time solving production issues or helping the development teams debug; this was frustrating for both teams. The team's turnaround times would inevitably increase because of this."
- Kadam, Co-Founder & CTO, Treebo._

Rather than expediting releases, building an IDP can seriously impede engineering productivity and velocity if attention is diverted from core product development.

### **4\. Limited External Integrations**

Custom-built internal platforms often struggle to integrate smoothly with external tools and systems. This contrasts with commercial platforms designed from the ground up for extensibility.

* Internal platforms use **isolated architectures** that conflict with existing systems. There's no consideration for standard integration patterns.
* The platforms **aren't designed for extensibility**. The focus is only on internal use cases.
* It takes specialized skills to build well-documented, stable APIs enabling external integrations. Many internal teams lack these skills.
* **Mismatched version dependencies** multiply integration complexity. Commercial platforms actively support integration within rich partner ecosystems.

Integrating once isn't enough, either: an IDP sits on top of many tool integrations, and constantly upgrading those integrations as each underlying tool releases new versions is a daunting task.

The point is: poor integrations cripple adoption. Engineering teams default to previous tools incompatible with the new platform, and the platform fails to reach critical mass.

### 5\. Security Risk of DIY Platforms

IDPs integrate with sensitive tools in the business-critical path (cloud accounts, release management, and more), so any breach can be catastrophic, and mitigating security risk is paramount. Yet ensuring enterprise-grade security with internally built developer platforms is remarkably challenging. Without rigorous controls and processes, DIY platforms can expose organizations to breaches and outages.

Unlike commercial solutions designed for security, DIY platforms often lack things like:

* Granular access controls
* Change & upgrade management
* Hardened architectures
* Audit logs
* Automated security testing

Internal developer platforms frequently fail to meet enterprise infrastructure's stringent security and availability standards, because security isn't woven into the platform from the start.

### **6\. Governance and Compliance Challenges**

Implementing robust governance processes presents difficulties for many companies using homegrown IDPs. For example:

* **Access controls** - Ensuring developers have appropriate permissions and limiting access is complex to manage in DIY platforms.
* **Policy enforcement** - Enforcing security, operational, or regulatory policies programmatically across all systems connected to the IDP is challenging.
* **Auditability** \- Many IDPs lack tools to monitor access, changes, and logs needed for audits.
* **Compliance** \- Adhering to regulations like HIPAA or PCI is difficult without proper compliance-oriented controls built into the IDP.

As companies scale, these governance issues become exponentially more challenging.

### 7\. Ignoring Post-Deployment Management

Many internally built IDPs focus heavily on deployment automation while ignoring critical post-deployment aspects that impact developer productivity and system reliability.

**IDPs built in-house often lack the following:**

* Observability into application and infrastructure performance via logging, monitoring, and metrics.
* Troubleshooting capabilities like search and analytics to quickly resolve issues.
* Incident response features like automated alerting and runbooks.
* Performance management to track and optimize applications. * Cost optimization through real-time visibility and anomaly detection. As a result, developers fly blind after deploying new versions, leading to more application downtime and incidents. In contrast, purpose-built vendor platforms integrate deployment automation with robust operations capabilities out-of-the-box. Companies must carefully evaluate post-deployment functionality if choosing to build IDPs internally versus leveraging commercial solutions designed for production operations. ### 8\. Constant Updates and Maintenance Maintaining and constantly updating an internally built developer platform **significantly adds to the total ownership cost** over time. Consider factors like: * **Platform upgrades and migration** - Major upgrades to core components like Kubernetes or migration to new infrastructure often require substantial engineering effort for an internal platform. Commercial platforms handle upgrades seamlessly behind the scenes. * **New feature development** - Adding new capabilities to meet emerging needs involves dedicating precious engineering resources. Vendor platforms deliver continuous innovation through new feature releases. * **Bug fixes and patches** - Bugs and security vulnerabilities need rapid fixes and patches to avoid disruptions. DIY platforms lack processes for swift resolution. * **Custom integrations** - As new tools and technologies emerge, custom integrations must be built and maintained on the internal platform. * **Technical debt servicing** - The accumulating technical debt makes the platform unstable and expensive. Refactoring and rearchitecting are constant chores. * **Opportunity cost** - The ongoing cost of maintaining the platform soaks up resources better spent on developing core products and innovations. The long-term maintenance and update costs compound over time into a massive total cost of ownership for internal developer platforms. ### 9\. Difficulty Retaining Specialized Expertise Running an internal platform requires specialized expertise in infrastructure automation, APIs, security, integrations, and other complex domains. These skills are scarce, expensive, and challenging to retain in the long term. Constant recruiting, training, and transferring tribal knowledge as experts leave can become an impossible burden. Dependence on irreplaceable personnel is an Achilles heel. In contrast, external vendors concentrate deep expertise across large teams to continually advance their platforms. This makes stability and continuity impossible for internal platform teams to match. So, retaining specialized skills and knowledge is challenging for internally built and managed developer platforms. What to do Instead of Building an IDP? -------------------------------------- Rather than investing months or years in building an internal platform, off-the-shelf solutions provide a faster path to boosting developer productivity and innovation velocity. Prebuilt developer platforms like Facets give you the following: * **Accelerated time-to-market**: Get a unified automation platform out of the box instead of waiting months for custom builds. * **Improved focus**: Developers stay focused on building products vs. maintaining platforms. * **Optimized environments**: Built-in best practices for security, reliability, and cloud optimization. * **Increased collaboration**: Platforms like Facets connect siloed teams for better visibility. * **Future-proof foundation**: Continually enhanced platform vs. 
stagnant homegrown code. * **Reduced costs**: Avoid expensive hiring for specialized platform skills. Purplle evaluated solutions like [Facets](https://facets.cloud/) which provided environment automation, collaborative workflows between teams, and cloud optimization. This allowed Purplle to [reduce time-to-market by 25X](https://www.facets.cloud/case-study/purplle) and cut cloud costs by 70%. Building an internal developer platform in-house is challenging—steep complexity, soaring costs, and frustrating delays. There's a better way. [Facets](https://www.facets.cloud/) provides a unified, self-serve infrastructure automation platform designed by DevOps experts. It integrates deployment, configuration management, observability, and more into one solution. With Facets, you get reusable blueprints to launch multiple identical environments with a click. It centralizes visibility so everyone works from a single source of truth. Facets also free up your Ops team from repetitive tasks so they can focus on innovation. Companies like [Purplle](https://www.facets.cloud/), [Treebo](https://www.facets.cloud/case-study/treebo), and [Capillary](https://www.facets.cloud/case-study/capillary-technologies) transformed delivery using Facets to increase developer productivity by 20%, reduce ops tickets by 95%, and accelerate time-to-market by 25X. Shift gears on innovation and start shipping value faster to customers with Facets. See for yourself—[get a demo today](https://www.facets.cloud/demo/). ### FAQs #### What are some signs that an internal developer platform is failing? Indicators include engineers complaining about platform reliability issues, rampant bugs, steep learning curve, lack of use among developers, platform-related delays in deployments or releases, and escalating financial costs. #### How long does building an internal developer platform typically take? For most companies, [18-24 months](https://platformengineering.org/talks-library/how-to-build-an-idp-that-does-not-suck) is a typical timeline from initial planning through a minimal viable product launch. However, it often takes 12 additional months to work out bugs, stabilize the platform, and gain adoption across the engineering org. #### Is building an internal developer platform worth the effort and cost? In most cases, no. The risks, complexity, and costs often outweigh potential benefits compared to proven third-party solutions like [Facets](https://facets.cloud/) designed specifically for developer workflows. It allows extensibility so your IDP, even though not built internally, adjusts to your existing workflows instead of your team adjusting to a new workflow. #### What are the most important things companies overlook when evaluating internal developer platforms? Two critical factors often underestimated are the complexity of supporting a reliable, scalable IDP long-term and the substantial opportunity cost of engineering resources spent building the platform versus building products. Companies also often need to pay more attention to the advantages of purpose-built solutions from third-party vendors compared to DIY options. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## What is a Dev Environment? Understanding Dev and Test Environments Author: Kirti Krishan Published: 2024-10-09 Category: Blogs Meta Title: What is Dev Environment - Understanding Dev and Test Environments | Facets Meta Description: Explore the differences between development and test environments. 
Learn why these environments are crucial for software development and best practices for managing them.
Tags: Developer environments, devops experience
URL: https://blog.facets.cloud/what-is-dev-environment-test-environment

You click a button in GitLab. Seconds later, a pipeline runs on your merged code—it tests and deploys your changes to the next step in the pipeline. This process relies on complex development and test environments that remain mostly hidden from view.

However, developer environments form the backbone of your software development process. Dev environments can lead to [a 30% reduction in customer-detected defects](https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/yes-you-can-measure-software-developer-productivity), according to a McKinsey survey. They provide controlled spaces for you to write code, experiment with features, and thoroughly test applications before release.

Tools like Bitbucket and GitLab abstract away many of the details. But as a developer, it always helps to understand what the environments are, what they consist of, the types of environments, and more. Let’s get started, shall we?

**First, what are environments?**
---------------------------------

Environments in software development refer to distinct stages that applications go through during their lifecycle. Think of environments as separate workspaces for your application. While environments typically mirror the production setup as closely as practical, each one serves a specific purpose in your development process. You'll write and test code in one environment, then move it to another for further testing, and finally to a production environment for your users.

This separation ensures that various aspects of an application, such as code, configuration, and data, are managed appropriately for different purposes. You'll find and fix problems faster, leading to smoother releases and happier users. This structured method gives you confidence in your code at each stage of development.

**What are the different types of development environments?**
--------------------------------------------------------------

![undefined](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1728467001231-compressed.png)

Let's explore the various environments that play a pivotal role in the software development lifecycle:

#### Development Environment

You [set up a dev environment](https://blog.facets.cloud/ideal-development-environment-for-optimal-software-development/) to write code efficiently, perform unit testing, and verify basic functionality before moving to the testing stage. In the development environment, you:

* Work mostly locally, but may have centralized servers to sync code
* Collaborate with developers to create new functionality and implement it in existing code
* Compile locally and test if things are working as expected
* Do peer reviews or over-the-shoulder reviews to ensure you’re not missing something

Dev environments give you the freedom to make mistakes, try new approaches, and refine your code without worrying about breaking production systems or inconveniencing users. Try to make sure your development environment mirrors your production environment as closely as possible to avoid bugs surfacing only in production.

### **Test Environment**

The test environment is where the real detective work happens.
In the testing environment, you:

* Run different types of tests, including unit tests, integration tests, and user acceptance tests
* Ensure the application meets expected standards and operates as intended
* Simulate various user scenarios to identify potential issues
* Validate the application's behavior under different conditions
* Verify the compatibility of different components and systems

The test environment provides a controlled space for rigorous quality assurance. It allows you to identify and address issues before they reach end-users, improving the overall reliability and performance of your software.

### **Staging Environment**

This environment is generally an exact replica of the production setup, including server configurations, databases, and network settings. In the staging environment, you:

* Simulate real-world usage scenarios to identify performance bottlenecks
* Uncover compatibility issues between different components
* Detect any last-minute glitches or unexpected behaviors
* Test the deployment process to ensure smooth transitions
* Validate the application's performance under realistic conditions

The staging environment provides you with a more accurate assessment of how your application will behave once released. It serves as the final checkpoint before your software reaches end-users, allowing you to address any remaining issues and fine-tune your application's performance.

### **Production Environment**

This setup represents the final destination in your software journey, where real users interact with your application. In the production environment, you focus on:

* Optimizing for performance, reliability, and security
* Continuously monitoring application behavior and user interactions
* Implementing regular updates and patches to address emerging issues
* Responding quickly to unexpected problems to minimize downtime
* Scaling resources to meet changing user demands

The production environment directly impacts your users and business operations. Any issues occurring here can have significant consequences, making it super important to maintain stability and responsiveness. You'll need to balance the need for new features and improvements with the imperative of maintaining a stable, reliable system for your users.

Let's look at two of the most important environments in the development stage now—the dev and test environments.

**What is a development environment?**
--------------------------------------

![undefined](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1728467002625-compressed.png)

As we discussed before, a development environment is where the development happens. It's generally local, but some companies that work on sensitive information may want to set up their dev environments on a virtual server where everyone can directly code and compile on the server.

This is a workspace for developers to design, code, debug, and test software without affecting the live or production environment.
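For teams that go the virtual-server route, that shared dev box can itself be described as code rather than configured by hand. Below is a minimal, illustrative Terraform sketch of what such a definition might look like; the provider region, AMI ID, and instance size are placeholder values, not recommendations.

```hcl
# Illustrative only: a single shared development server, described as code.
# The region, AMI ID, and instance type are placeholders.
provider "aws" {
  region = "ap-south-1"
}

resource "aws_instance" "shared_dev_server" {
  ami           = "ami-0abcdef1234567890" # placeholder AMI ID
  instance_type = "t3.large"              # sized for several concurrent developers

  tags = {
    Environment = "dev"
    Purpose     = "shared-development-workspace"
  }
}
```

Defining the dev box this way keeps it reproducible: if it drifts or breaks, you can recreate the same workspace instead of rebuilding it by hand.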
### **What does a development environment include?**

A typical development environment includes:

* An Integrated Development Environment (IDE) like Visual Studio Code or IntelliJ IDEA
* Version control systems such as Git for tracking changes
* Local servers and databases for running and testing the application
* Containerization technologies like Docker for consistent and scalable setups
* Cloud-based platforms like AWS or Azure for flexible and collaborative environments

These components have to work seamlessly and be well organized for developers to code effectively. That means, for local setups, providing good-quality hardware to the developers, and for virtual setups, hosting higher-tier servers with more resources to handle multiple developers working on the same IDE/workspace at the same time.

**What is a test environment?**
-------------------------------

![undefined](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1728467003925-compressed.png)

A test environment is the next stage in the development phase. Once a developer has successfully implemented a feature in the dev environment, compiled it, and tested it locally, it needs to be QA'd by a tester on the team. So, the developer clicks a button on GitLab or Bitbucket or pushes code on GitHub to the test branch.

Some part of the compilation testing is also done automatically, preventing any base-level errors from entering the testing queue.

Once it hits the test environment, someone on the QA team gets assigned to the PR and will perform thorough testing on the implemented feature.

### **What's included in the test environment?**

A test environment typically includes:

* Software/servers configured to replicate the production environment
* Test data and test beds for specific scenarios
* Automated testing tools like Selenium or JUnit with scripts ready to perform the required tests
* Continuous Integration/Continuous Deployment (CI/CD) pipelines for automated testing
* Network configurations and devices for comprehensive testing

The more thorough you can make this process, the better it is for your overall development pipelines. For instance, if you layer the test environment with three levels of review, you reduce the chances of finding bugs in staging or production.

Of course, this may be difficult for smaller companies and startups, but it certainly can be quite efficient once implemented.

**How do you create development and test environments?**
--------------------------------------------------------

Most companies you work with will already have dev environments with a proper tech stack handling the CI/CD pipelines and deployments.

But if you join as a founding engineer or have to implement these pipelines for a new project in your company, this knowledge will come in handy.

Here, I'll be using [Facets](https://facets.cloud/) to create dev environments quickly and easily. Facets is a no-code infrastructure design and automation platform that helps you set up environments across different cloud providers through its simple, friendly interface. You design your application architecture, configure resources, and deploy environments all in one place.

Here are the steps:

1. Once you've logged in, go to the dashboard and [select from the Blueprint templates](https://readme.facets.cloud/v1.2/docs/create-your-first-blueprint) for your application.
[Blueprints](https://readme.facets.cloud/v1.2/docs/blueprint) serve as pre-configured templates that define your environment's structure and components.
2. Open the Environments tab from your selected Blueprint and click the prompt to create your first environment.
3. Next, pick your preferred cloud provider, such as AWS, Azure, or Google Cloud. Facets is cloud-agnostic and writes Terraform behind the scenes, so you can easily work with multiple cloud environments and also transition from one cloud provider to another without spending months on the transition.
4. Give your environment a unique name. Select your desired release stream and region to ensure proper setup.
5. Decide whether to provision new infrastructure or use existing resources. If you opt for existing infrastructure, specify the base environment and namespace.
6. [Connect your cloud account to the Facets](https://readme.facets.cloud/v1.2/docs/integrating-cloud-accounts) control plane or choose a pre-configured cloud account, which allows Facets to manage resources on your behalf.
7. Fine-tune your environment by setting advanced options like time zone, environment type, CIDR range, availability zone, instance types, VPC ID, and request limit ratio.

Once all the above is configured, click "Create" to start the environment creation process. Facets.cloud will provision the necessary resources and set up your development environment based on your configuration.

![undefined](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1728467005318-compressed.png)

![undefined](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1728467006677-compressed.png)

Facets streamlines environment creation and ensures consistency across your development, testing, and production environments. The best part is that Facets makes it simple not only to create and manage environments but also to have different environments on different cloud providers and work between them if required.

**There are a few additional benefits that Facets offers:**

1. Facets helps you easily track changes and roll back to previous configurations when needed with version-controlled environments.
2. You can work together with team members on environment configurations, share templates, and collaborate seamlessly.
3. You can monitor and optimize your cloud spending across different environments with Facets' cost insights.
4. It also helps you scale your environments up or down based on your application's needs, ensuring optimal resource use.

You can significantly reduce the time and effort required to create and manage development environments by incorporating Facets.cloud into your workflow. This allows your team to focus on writing quality code and delivering features faster, boosting productivity and accelerating your software development lifecycle.

### **Wrapping up**

You've now explored development and test environments, recognizing their role in software development. These stages help you and your team transform ideas into polished products.

After all, creating software is more than just writing code. You have to create experiences that delight users and retain them over time. And for that, you have to understand and optimize each environment—development, test, staging, and production—to enable smooth transitions from idea to production.

Now, managing these environments can be quite overwhelming, especially if you add the complexity of different teams, different cloud environments, and more.
But [Facets](https://facets.cloud/quick-cloud-deployments/) simplifies the setup and integration with minimal effort. You can visualize your architecture, launch pre-configured environments, and integrate them as your code progresses. All while optimizing costs and maintaining consistency across deployments.

[Book a demo with Facets](https://www.facets.cloud/demo) today and see how you can transform your development processes into an efficient system for your entire team's success.

---

This blog is powered by Superblog. Visit https://superblog.ai to know more.

---

## What are Self-Service Infrastructure Management Platforms? Needs & Benefits...

Author: Anshul Sao Published: 2024-10-02 Category: Blogs Meta Title: What are Self-Service Infrastructure Management Platforms? Needs & Benefits... Meta Description: This article will explore self-service infrastructure for developers and application owners. We'll see how it works, why it's becoming essential, and how it can accelerate and simplify your workflows. Tags: devops networking event, self service infrastructure, developer autonomy URL: https://blog.facets.cloud/what-are-self-service-infrastructure-management-platforms

Imagine deploying new applications in minutes. Controlling infrastructure to meet your needs on demand. Provisioning resources as quickly as ordering from a menu—no waiting for Ops teams.

That's the promise of self-service infrastructure. It puts control back in the hands of developers, removing the complexity of traditional infrastructure management.

With self-service infrastructure, your teams can spin up and manage the required resources through a simple interface. No more tickets or delays. No more oversized or underutilized resources. Just the right-sized infrastructure, when and how you need it.

This article will explore self-service infrastructure for developers and application owners. We'll see how it works, why it's becoming essential, and how it can accelerate and simplify your workflows. But before we jump to that, let's answer a pressing question — what are the challenges with a traditional infrastructure model?

The Challenges with Traditional Infrastructure
----------------------------------------------

![a logical map highlighting the challenges of infrastructure management](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/6527c38fc8ae9155077aebddfti8vxgphq-ak8kxzsmqtqzcphchfjpcgqjftlc0d5fsau4agd9hghb70ei1qcy5eaqharklf0bhh-etmu4bjndagugc4ankia6deb68jmxadg93iw0fdzo5tkbegdckdqmdng-ddorimj4-xm-1701865735252-original.png)

Even with the move to the cloud, retaining legacy approaches to infrastructure management can add to [developer toil](https://tanzu.vmware.com/developer/learningpaths/developer-toil/). Cloud resources that are provisioned and managed solely by Ops teams instead of being [self-service](https://www.facets.cloud/developer-self-service) can lead to a few persistent pain points:

### 1\. Complexity and Delays

Cloud environments with discrete tooling and processes still add delays to Dev workflows. Developers must wait for Ops to manually configure and update cloud servers, databases, networks, storage, and policies. This slows down deployments, feature releases, and provisioning. It also hampers developer velocity and slows time-to-market.

### 2\. Dependence on Ops

Dependence on Ops teams to handle infrastructure also leads to bottlenecks. Developers can't get the cloud resources they need without self-service access and APIs.
Even worse, the lack of cloud automation reduces cross-functional collaboration between Dev and Ops, making the maintenance of security, compliance, and governance controls more difficult. ### 3\. Challenges in Scaling Apart from the dependence on Ops, managing discrete cloud components in unintegrated ways can increase costs and limit the scalability of your processes.  This disparity makes it hard to scale your products and infrastructure with rapidly changing business conditions. If your organization is dealing with this, it might be time to think about a new way of doing things. By simplifying and using modern tools, you can make everything flow more smoothly, just like clearing the road so you can drive straight to your destination. What is Self-Service Infrastructure, and How Does It Help? ---------------------------------------------------------- ![a mind map of what exactly is self service infrastructure and how does it help the devops professionals](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/6527c38f742b8bf1e7901fd1g5wa1v2i36wo4iklytohyw4xqk3ybvnbq1taihvkiik3c8ginplujyjilq2ty7lsgardojxsevfx3ea848nv2v0v-7beirenvvtbxvatfbi1vkl0gpylyaayvcgyc1m0dy17pel2uhttdk8feesum6w-1701865736334-original.png) Self-service infrastructure puts infrastructure management directly into your developers’ hands. Instead of relying on Ops teams, developers can independently provision and manage infrastructure resources as required through pre-defined templates and Ops guardrails. ### **Self-service makes development more streamlined by eliminating bottlenecks** The self-service model enables greater efficiency, agility, and autonomy. Organizations can respond faster to changing market dynamics and customer needs. Infrastructure management becomes more straightforward and responsive when shifting from fragmented traditional systems to a more unified self-service approach.  And to help with this transition, platforms like [Facets.Cloud](https://www.facets.cloud/developer-self-service), AWS Service Catalog, Google Cloud Deployment Manager, and Azure DevOps make adopting these new self-service processes easy.  Benefits of Self-Service Infrastructure --------------------------------------- We want developers to be productive and efficient. And self-service infrastructure helps make that happen. Let’s take a look at the benefits of implementing self-service infra in your organization:  ### Organizational Autonomy and Efficiency ![Representation of Facet's version control to achieve developer autonomy and efficiency ](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/6527c38fcc216fbbb81fa487hcshiewkvqrb8iarkk5okps7svjvyizjgphsg428x-wbtewgpipxwgqb8yxf9raovfq8eyrtgfarp0mbsbaojvad1agcdep8pnz9ctyobyd5rrrnbhninxepbio88rfbuie1e8go1wmnr3pns6wak-1701865736862-original.png) Self-service infrastructure platforms improve development team autonomy—no more waiting for resources. Dev teams can allocate or modify resources based on the preset conditions in the [single source of truth](https://readme.facets.cloud/docs/blueprint) (SSOT) platform. Knowledge sharing also improves, and critical information is not confined to a few individuals.  ### Cost Optimization and Technology Standardization With self-service platforms like [Facets](https://facets.cloud/), your [cloud costs](https://blog.facets.cloud/cloud-cost-optimization-efficiency-by-design/) are optimized across the organization from Day 1. 
Standardizing the tech stack company-wide can help your organization [avoid drift and confusion](https://www.facets.cloud/blog/a-comprehensive-approach-to-maintaining-a-drift-free-infrastructure) from redundant or unsupported technologies within different teams.

But there's another benefit to company-wide adoption.

You extract more value from your tools by using their enterprise licenses for company-wide use. Additionally, you can set specific budgets for the tools and vendors approved within your unified stack, so you know exactly how much you will pay each month.

### Enhanced Security and Control

Self-service solutions offer organizations multiple ways to boost security and maintain tight control. When you standardize hardened templates across your organization, you ensure the infrastructure is secured — the approved tools and toolchains are tested to work well within the organizational tech setup with minimal modifications.

Vulnerabilities are mitigated by default since every approved image or tool has passed various security checks before deployment. As your systems become more standardized, you gain greater control over security and the tools used in the organization.

### Enforcing Best Practices

Best practices are standardized policies that all teams within an organization must follow. With a shift to platform engineering, these best practices are [built by-design instead of by-audit](https://blog.facets.cloud/shifting-the-devops-paradigm-from-by-audit-to-by-design/).

The platform team defines these best practices based on industry standards and internal governance. They cover areas like:

* **Infrastructure security** - Mandating secure templates, authentication, encryption
* **Resiliency** - Replication, backup, disaster recovery
* **Compliance** - Adhering to regulations like HIPAA, PCI, GDPR
* **Efficiency** - Leveraging automation, infrastructure-as-code
* **Cost optimization** - Right-sizing, tagging, usage monitoring
* **Operational excellence** - Monitoring, alerting, documentation

Teams build better systems following proven guidelines. Knowledge is shared instead of siloed. Adherence ensures smooth operations and auditing.

### Scalability and Reusability

Self-service infrastructure platforms streamline provisioning and management by making the infrastructure declarative and templating it for reusability. Solutions like Facets use a Blueprint-driven approach to capture the interdependencies between services, acting as a Single Source of Truth.

This allows developers to spend less time provisioning and managing resources. And because of the reusable nature of templates, any developer on the team can use them to manage resources as per their requirements.

This not only reduces the burden on the Ops teams but also has a positive impact on developer productivity. At [Facets](https://facets.cloud/), we've observed an [average 20% boost in productivity](https://facets.cloud/). This means the same work gets done more efficiently by fewer developers, or the team can take on even more.

Top 3 Best Self-Service Infrastructure Management Platforms 2023
----------------------------------------------------------------

But these are all concepts — let's look at some of the best tools that you can use to implement self-service infrastructure in your organization.

### 1\. Facets
![Facet's home page image](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/6527c393b6a85371c1814a81bnapwwwctzby2f4hokoixdmulmnkawbahig9mol9toglt7srwzcrxmhlnlryrggkhkg0ajd2y1fpi6n8pwlgamgrp5nwoq6gnpcieep6mw1k9mi6ppvnxa78rxbbm7ws1-xpe8pcn1sxsj6zkg6vngm-1701865738005-original.png)

[Facets](https://facets.cloud) is a self-serve infrastructure management platform built on platform engineering principles. It integrates 32 categories of toolchains—helping you automate Ops and enable developers to self-serve their deployment needs. It enables developers to allocate resources, modify configurations, and more based on predefined rules and Ops guardrails.

**Key Benefits**

* A single pane of glass for developers and Ops teams to collaborate efficiently
* Automates provisioning and other manual tasks to speed up processes
* Provides architecture visualization for complete visibility into services
* Baked-in observability and security principles
* Standardizes workflows for consistency across teams
* Designed to be extensible and reusable
* Works with all major cloud providers: AWS, Google Cloud, and Microsoft Azure

**Impact**

* Increases developer productivity [by 20%](https://www.facets.cloud/case-study/capillary-technologies)
* Reduces cloud costs by 28%
* Frees up [80% of Ops time](https://www.facets.cloud/case-study/treebo) from grunt work

### 2\. AWS Service Catalog Overview

![AWS Service Catalog Overview](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/6527c38f08829608b76496d4fsskhihj8jzx8hyhmucfxobfw3ttz347z-3vdw8smzkjhq-c4-go1a5mjeeymyoexh-p12ka9g7oiphidrxrotho4chw6tbn78eqero9h1tw8gasv6v5eic0-j5smgdvswyot6rvgodyxp2awb0sve-1701865739110-original.png)

AWS Service Catalog

[AWS Service Catalog](https://aws.amazon.com/servicecatalog/) is an Amazon Web Services (AWS) cloud management and governance service for creating, sharing, organizing, and governing curated Infrastructure as Code (IaC) templates.

It allows centralized management of cloud resources to achieve governance at scale for IaC templates written in CloudFormation or Terraform. It also helps meet compliance requirements while ensuring quick deployment of necessary cloud resources.

**Key Features**

* Enables quick deployment of approved, self-service cloud resources, improving agility and governance across multiple accounts.
* Integrates with ServiceNow and Jira Service Management to streamline workflows.
* Provides automated access to SageMaker machine learning notebooks to speed innovation.
* Allows scaling and controlling permissions for resource access in multi-account AWS environments.
* Deploys baseline networking and security tools for new AWS accounts to ensure consistent governance.
* Builds and governs scalable, automated CI/CD solutions to track all AWS application resources.

**Use Cases**

* Automating access to machine learning notebooks
* Applying access controls across accounts
* Provisioning resources for new AWS accounts
* Accelerating CI/CD pipelines

### 3\. Google Cloud Deployment Manager
![Google Cloud Deployment Manager](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/6527c38f342cba81406eef6dwd0xsqajcyx1joq-6znkfusipywqnimu4joaifxtnbiehq9mf154qfi4snhfbfchwzdng7qzx37i9wlch4kdrqqshxch7vr9ltrsvr7bm6juxx0tbc2gv7mihh2xouuohyiy3sifuh3vex-npnq5gqo-1701865739711-original.png)

Google Cloud Deployment Manager

[Google Cloud Deployment Manager](https://cloud.google.com/deployment-manager/docs) is an infrastructure deployment service that automates creating and managing Google Cloud resources.

**Key Features**

* Automated creation and management of Google Cloud resources
* Flexible templates and configurations for customized deployments
* Integration with Google Cloud services like Cloud Storage, Compute Engine, Cloud SQL
* Reusable templates for efficiency and consistency
* Training and tutorials for learning Deployment Manager

**Use Cases**

* Deploying clusters on Compute Engine
* Migrating web apps and databases to Google Cloud
* Following best practices for startups using Google Cloud

Considering Self-Service Infrastructure for Your Organization?
--------------------------------------------------------------

When time-to-market is everything, self-service is no longer a luxury but a necessity. You want your developers to do their best work with no bottlenecks. Like ordering from a menu, the proper infrastructure should be available on demand.

> Self-service empowers developers to independently provision and manage resources, reducing delays and fostering innovation.

Among the leading platforms, Facets offers a unified interface for developers and operations. Facets helps you visualize your infrastructure, automate environment provisioning, standardize workflows, and simplify platform engineering for complex cloud environments.

The future of infrastructure management is here. It's time to unlock the power of your developers and embrace the change.

[Book a demo with Facets](https://www.facets.cloud/demo) **_and take the next step towards streamlining your organization._**

---

This blog is powered by Superblog. Visit https://superblog.ai to know more.

---

## Kubernetes CI/CD Pipelines Explained—Tools & Best Practices

Author: Ayush Verma Published: 2024-10-01 Category: Blogs Meta Title: Kubernetes CI/CD Pipelines Explained - Tools & Best Practices | Facets Meta Description: Kubernetes CI/CD pipelines guide. Tools, best practices, setup, deployment, and optimization. Streamline Kubernetes deployments with top CI/CD tools. Tags: Kubernetes, Cloud Automation URL: https://blog.facets.cloud/kubernetes-cicd-explained

Think of a symphony orchestra getting ready for a big show. The conductor lifts their baton, and immediately, the musicians start playing together perfectly. Every instrument brings to life a smooth, beautiful piece of music, even though they're all different.

Kubernetes CI/CD pipelines operate similarly. They orchestrate various tools and processes to streamline software delivery. Like a conductor guiding musicians, Kubernetes guides the pushing of code changes, testing, and deployment.

Over [60% of organizations](https://www.statista.com/statistics/1233945/kubernetes-adoption-level-organization/) now use Kubernetes, showing how well it manages constant code updates, tests, and deployments. Effective CI/CD pipelines also help automate everything from code commit to production release.
They catch issues early, ensure high quality, and speed up delivery—resulting in [25% faster lead times and 50% fewer failures](https://www.antino.com/blog/what-is-ci-cd) compared to organizations without CI/CD practices.

If you're new to continuous integration with Kubernetes, this article is for you. We'll learn how Kubernetes brings all the tools in your toolchain together to create a fast, reliable software delivery pipeline. Let's get started!

Key Components of a Kubernetes CI/CD Pipeline
---------------------------------------------

A Kubernetes CI/CD pipeline automates software delivery using several vital components. Let's examine how Kubernetes pipeline components work together.

### Containers and Container Registries

Containers power Kubernetes CI/CD pipelines, with the market for application container technologies projected to grow [from $4.95 billion in 2023 to $48.44 billion in 2031](https://www.skyquestt.com/report/application-container-market), a CAGR of 33%.

As stated by [OpenSource](https://opensource.com/article/18/2/how-kubernetes-became-solution-migrating-legacy-applications), "What makes Kubernetes so incredible is its implementation of Google's own experience with Borg. Nothing beats the scale of Google. Borg launches more than 2-billion containers per week, an average of 3,300 per second. At its peak, it's many, many more. Kubernetes was born in a cauldron of fire, battle-tested and ready for massive workloads".

Containers pack apps and dependencies portably, keeping things consistent across environments. Docker's made containers easy to use, so more developers build and deploy with them.

Container registries hold and organize images. Developers send container images to these central stores, then use them for deployment. You'll find options like Docker Hub, Google Container Registry, and Amazon ECR.

### Configuration Management and Version Control Systems (VCS)

Developers tackle Kubernetes configuration challenges using tools like Helm and Kustomize. These tools streamline app configuration creation and management across multiple environments and applications.

Git is the backbone of CI/CD pipelines, with [87% of developers](https://www.jetbrains.com/lp/devecosystem-2023/devtools/) regularly using it as their version control system. Version control systems store code and configuration files centrally, fostering team collaboration. When developers push changes, these systems trigger CI/CD pipelines, kicking off build, test, and deployment processes.

### Security Testing and Monitoring

Actively test and monitor security to strengthen your Kubernetes CI/CD pipeline. You'll catch vulnerabilities sooner by merging security testing into your process. And you can also scan container images using tools like [Trivy](https://github.com/aquasecurity/trivy) or [Clair](https://github.com/quay/clair). This could help block risky images before they reach production.

To keep a close eye on your Kubernetes apps, consider using Prometheus and Grafana to monitor your application health and performance. Many developers are moving in this direction—over [50% now](https://www.jetbrains.com/lp/devecosystem-2023/devtools/) run their main CI/CD tools on cloud instances they or their company manage.

Setting Up Your First Kubernetes CI/CD Pipeline
-----------------------------------------------

To set up your first Kubernetes CI/CD pipeline, we'll walk you through the process and cover the key components.
Begin by installing these tools:

* Kubernetes Cluster: Try Minikube or Kind for testing. For production, you might want to check GKE, EKS, or AKS.
* Container Runtime: Docker is a solid choice, but don't overlook containerd or CRI-O.
* CI/CD Tool: Give Jenkins, GitLab CI/CD, or Azure DevOps a try. Pick one that clicks with your workflow.
* Config Management: You'll love how Helm and Kustomize streamline your Kubernetes setups. You could also consider using Facets, which manages this configuration layer for you.

### Create and Manage Kubernetes Clusters

Set up the control plane and worker nodes to create your Kubernetes cluster. The control plane manages the cluster's state, while worker nodes run applications. GKE, EKS, and AKS simplify cluster creation and management by handling infrastructure complexities.

You can use kubectl to interact with your running cluster. This command-line tool lets you deploy apps, manage resources, and monitor your cluster's state.

Alternatively, you could try [Facets](https://facets.cloud/)—a no-code infrastructure automation tool that writes all the Terraform code for you, while you simply drag and drop customized modules to build your automation setups.

Optimize Your Kubernetes CI/CD Pipelines
----------------------------------------

You can boost your [Kubernetes CI/CD pipelines](https://www.facets.cloud/open-source-tools-categories/ci-cd)' effectiveness and reliability by following proven practices. Consider these approaches to enhance your pipeline performance:

### Implement GitOps for Deployment Control

GitOps streamlines Kubernetes deployment management using Git as the primary source of truth, with [64% of developers](https://www.jetbrains.com/lp/devecosystem-2023/team-tools/#ci_tools) preferring to work with version control systems directly from their IDE in 2023.

![undefined](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1723010679575-compressed.png)

Start by defining your cluster's desired state in Git repositories and make all changes through repository updates. This method offers version control, auditability, and easy rollbacks.

Tools like [Argo CD](https://www.facets.cloud/open-source-tools/argocd) and [Flux](https://www.facets.cloud/open-source-tools/fluxcd-flux2) apply the GitOps pattern, syncing your Kubernetes cluster with the Git-defined state. You'll gain better control over your deployments and simplify your workflow.

### Roll Back Fast When Things Go Wrong

You also need to prepare for production issues, even after thorough testing. Quick recovery from failures cuts downtime. You can always use [Kubernetes' built-in rollback](https://kubernetes.io/docs/tasks/manage-daemon/rollback-daemon-set/) features through deployment objects to get your applications back to working condition in case of failures.

Also, remember to version your container images. Apply deployment strategies like rolling updates or blue/green deployments. You'll easily revert to previous versions when needed. Define your rollback process clearly. Test it often to keep it working well.

### Secure Your Setup

Make security a core part of your Kubernetes CI/CD pipeline. Here are a few security measures to guard your apps and data:

* Image Scanning: Check your container images often for vulnerabilities with tools like Trivy or Clair. You'll catch and fix security issues before deploying to production.
* [Role-Based Access Control (RBAC):](https://blog.facets.cloud/transform-your-devops-access-control-with-facets-rbac-management/) Apply Kubernetes RBAC to set and enforce access rules for cluster resources. This gives users and apps the right permissions while limiting access. * [Secrets Management](https://readme.facets.cloud/v1.2/docs/configuration-management-for-services): Keep sensitive data like API keys and database passwords in Kubernetes Secrets. Encrypt them at rest and restrict access based on need. Popular Tools for Kubernetes CI/CD ---------------------------------- You can use several tools to streamline and automate your Kubernetes CI/CD pipelines. Consider these popular options: ** ![undefined](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1723010681669-compressed.png) ** ### Jenkins and Jenkins X [Jenkins](https://www.facets.cloud/open-source-tools/jenkins-x) is the top CI tool used with Kubernetes ([54% adoption rate](https://www.jetbrains.com/lp/devecosystem-2023/team-tools/#ci_tools)). This open-source automation server powers many CI/CD pipelines. You can simply Install the Kubernetes plugin to deploy apps to your clusters. Want an even smoother experience? Jenkins X builds on Jenkins to offer a Kubernetes-native CI/CD solution. It sets up and manages pipelines with built-in GitOps, preview environments, and automated promotions. ### GitLab CI/CD This all-in-one DevOps platform integrates tightly with Kubernetes. Define and run pipelines right from your Git repos using simple YAML syntax. You'll also get auto-scaling runners, a container registry, and Kubernetes cluster management tools. ### Argo CD For declarative, GitOps-style continuous delivery on Kubernetes, try Argo CD. It follows the GitOps pattern, defining your apps' desired state in Git repos. Argo CD watches these repos and syncs your Kubernetes cluster with the defined state in Git. You'll find a web-based UI for visualizing and managing app deployments, plus features like automatic resource pruning and role-based access control. Challenges and Solutions in Kubernetes CI/CD -------------------------------------------- Kubernetes CI/CD pipelines bring great value, but also face hurdles. Let's look at some common problems and fixes. ### Scale and Manage Resources Effectively As you add more apps and pipelines, you'll need to manage resources and ensure scalability. Use Kubernetes' built-in tools like the [Horizontal Pod Autoscaler (HPA)](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) to scale apps based on demand.  For CI/CD infrastructure scaling, you can also work with Kubernetes-native tools such as Jenkins X or managed CI/CD services. These options will help you handle the underlying infrastructure management more easily. ### Test Automatically and Monitor Continuously You must test automatically in your CI/CD pipeline. Testing in Kubernetes can be tricky due to its distributed nature. Design and implement comprehensive test suites covering various scenarios: unit tests, integration tests, and end-to-end tests. Keep a close eye on your apps running in Kubernetes. Use tools like Prometheus and Grafana for powerful monitoring and alerting. Collect metrics and logs from your Kubernetes clusters and apps to see how your system behaves. This approach will help you spot and fix issues before they become problems. Want to Simplify Kubernetes CI/CD for Better Software Delivery? 
--------------------------------------------------------------- Kubernetes CI/CD pipelines transform software delivery, but they can be complex to manage. Facets offers a solution, especially for teams with limited Ops expertise. This no-code platform lets you build quality infrastructure easily. Deploy apps quickly, avoiding infrastructure headaches. Facets.cloud handles Terraform generation, Kubernetes management, and works across major cloud providers. Create architecture blueprints visually, launch pre-configured environments fast, and streamline operations. Built on open-source tools, Facets gives you flexibility without vendor lock-in. Suitable for startups and enterprises, it scales with your business. Users report fewer issues, better uptime, and faster releases. [Try Facets free today](https://facets.cloud/).  Simplify your infrastructure management and focus on delivering great software. Stop wrestling with complexities and start shipping code faster with Facets. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Platform Engineering Meetup Like No Other Author: Facets.cloud Published: 2024-08-12 Category: Updates Tags: Developer experience, facets.cloud, platform engineering meetup, platform engineering, Loft Labs, devops experience URL: https://blog.facets.cloud/platform-engineering-meetup-loft-labs-facets-cloud ![Platform Meetup - Loft labs & facets.cloud](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/platform-engineering-meetup-preview-1724142256273-compressed.png) Recently, [Facets.Cloud](https://www.linkedin.com/company/facets-cloud/) co-partnered with [Loft Labs](https://www.linkedin.com/company/loft-sh/) to host an event that brought together 200+ DevOps enthusiasts, innovators, and tech enthusiasts under one roof– the **Platform Meetup** at the Facets.Cloud office, in HSR layout, Bengaluru, the heart of India's startup capital. If you couldn't attend, don't worry; this blog is your backstage pass to the exciting insights from the event. The day started off with our participants enjoying some delicious breakfast before getting ready to indulge in the speaker sessions and discussions around Platform Engineering. As the crowd settled in, [Hrittik Roy](https://www.linkedin.com/in/hrittikhere?miniProfileUrn=urn%3Ali%3Afs_miniProfile%3AACoAADFBchYBwDSgIGFCr9fZ3UfT4oPyvCQG3TI&lipi=urn%3Ali%3Apage%3Ad_flagship3_search_srp_all%3By41wTv37RbCwWPpTXPyNQg%3D%3D), Platform Advocate at Loft Labs, took the stage to kick off the event with a warm welcome and introduction to the event. Read ahead for more valuable insights from the meetup. Speaker Spotlight ----------------- ### Multitenancy in Kubernetes Era ![Saiyam Pathak, Principal DevRel, Loft Labs](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1723465674598-compressed.png) The first speaker, [Saiyam Pathak](https://www.linkedin.com/in/saiyampathak?miniProfileUrn=urn%3Ali%3Afs_miniProfile%3AACoAAA2m_eoB-5RvYVA60tVysLlp0fluTsaa5Pc&lipi=urn%3Ali%3Apage%3Ad_flagship3_search_srp_all%3Bk52FXwLmSOacKib8qBYvmw%3D%3D), Principal Developer Advocate, Loft Labs, dove into the intricacies of Multitenancy in Kubernetes Era. His keynote presentation was a masterclass in cloud-native technologies, complete with a live vCluster demo. It sounds complicated, but he made it easy to understand. 
### Role of Platform Engineering in Observability

![CTO & Co-Founder at SigNoz](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1723465680648-compressed.png)

CTO & Co-Founder at [SigNoz](https://www.linkedin.com/company/signozio/), [Ankit Nayan](https://www.linkedin.com/in/ankitnayan/), took the stage to explore the critical role of Platform Engineering in Observability. He presented example policies to enforce standardization across teams for correlating metrics, traces, and logs. He also delved into the challenges and strategies for managing the reliability and costs of self-hosted observability setups.

### Road to Platform Engineering

![undefined](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/b64-1723465692697-compressed.png)

Co-Founder and CTO, Facets.Cloud, [Anshul Sao](https://www.linkedin.com/in/anshul-sao-a132498/), continued the momentum with a practical guide on how to get started with platform engineering, offering actionable strategies for implementation. Anshul's talk bridged the gap between theory and practice, giving attendees a clear roadmap for their platform engineering journey. Anshul also showcased how cloud visualisation can be achieved through AI, adding an AI flavour to the meetup.

### Lightning Sessions

![undefined](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1723465696774-compressed.png)

The event featured a series of lightning sessions led by DevOps experts. [Seema Saharan](https://www.linkedin.com/in/seemasaharan/), a Site Reliability Engineer at Autodesk, shared insights on starting a Platform Engineering journey with the Crossplane project. [Harsh Thakur](https://www.linkedin.com/in/harsh-thakur-499096158/) introduced fresh perspectives on the future of unified package management in DevOps. [Vrushali Raut](https://www.linkedin.com/in/vrushali-raut-87bb298a/) wrapped up the lineup with strategies for overcoming deployment challenges from code to Kubernetes, ensuring smoother releases.

Mixing and Mingling
-------------------

The day wasn't all serious tech talk, though. A high-energy quiz session had participants competing for cool swag. Meanwhile, a Polaroid stand set up by Loft Labs became a hotspot for attendees eager to capture memories of the day. As the event wound down with pizzas and drinks, the conversations continued, ideas flowed, and new connections were forged.

![Platform Engineering meetup- Facets.cloud and Loft Labs](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1723466043156-compressed.png)

DevOps Experience is the Key to Better Dev Experience
------------------------------------------------------

As the Platform Meetup demonstrated, enhancing the DevOps experience is not just beneficial—it's essential for fostering a better developer experience. By adopting Platform Engineering, organizations can transform their DevOps teams into product builders who design and deliver tools that empower developers. This shift allows developers to focus on innovation, free from the burden of manual tasks like environment creation. Ultimately, when DevOps teams are equipped with the right tools and mindset, they can offer developers a seamless and efficient experience, leading to greater productivity and success across the board.
Platform Engineering is not just about building infrastructure; it’s about building a better experience for everyone involved in the development process.​ Here’s what the attendees had to say about the event ---------------------------------------------------- ![Platform Engineering meetup- Facets.cloud and Loft Labs](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1723465745367-compressed.png) ![Platform Engineering meetup- Facets.cloud and Loft Labs](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1723465750314-compressed.png) ![Platform Engineering meetup- Facets.cloud and Loft Labs](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1723465754730-compressed.png) ![Platform Engineering meetup- Facets.cloud and Loft Labs](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1723465759717-compressed.png) ![Platform Engineering meetup- Facets.cloud and Loft Labs](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1723465765983-compressed.png) ![Platform Engineering meetup- Facets.cloud and Loft Labs](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1723465774513-compressed.png) ![Platform Engineering meetup- Facets.cloud and Loft Labs](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1723465780627-compressed.png) ![Platform Engineering meetup- Facets.cloud and Loft Labs](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1723465787841-compressed.png) Wrapping up! ------------ This was more than just an event, it was a glimpse into the future of Platform Engineering. With the diverse range of topics covered by our esteemed speakers, one thing is clear: [Platform Engineering](https://blog.facets.cloud/handbook-to-platform-engineering-journey) is not just a trend, but a fundamental shift in how we approach technology infrastructure and development. It's an exciting time to be in tech, and events like these remind us of the power of community, knowledge sharing, and collective innovation. ** ![Platform Engineering meetup- Facets.cloud and Loft Labs](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1723466172805-compressed.png) ** Join Facets Community to be part of more such events. [Join Facets Community](https://join.slack.com/t/facetscommunity/shared_invite/zt-29jnodnbk-aDGrEHVk8glCnUU5n_UWOg)​ --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Welcome to the Facets Blog! Author: Kirti Krishan Published: 2024-08-01 URL: https://blog.facets.cloud/tech-blog-clzb07slz002lxl5yxvv5xjkm Discover how Facets.cloud empowers DevOps teams and platform engineers with tools for self-serve environments, streamlined Kubernetes management, and custom module creation. Here, we share our experiential learnings to help you drive innovation and operational excellence in your organization. [Tech Blogs](https://blog.facets.cloud/posts/category/tech-blog/1/)[Product News](https://blog.facets.cloud/posts/category/product-news/1/) ​ --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Facets.Cloud mentioned in the Gartner® Hype Cycle™ for Platform Engineering, 2024 report! 
Author: Pravanjan Choudhury Published: 2024-08-01 Category: Updates Meta Title: Facets.Cloud mentioned in the Gartner® Hype Cycle™ for Platform Engineering, 2024 report! Meta Description: Facets.Cloud mentioned in the Gartner® Hype Cycle™ for Platform Engineering, 2024 report. Learn how self-service environment management improves productivity, reduces cloud costs, and streamlines DevOps. Discover implementation strategies and real customer success stories. Tags: facets.cloud, platform engineering, Gartner, Gartner hype cycle URL: https://blog.facets.cloud/facets-cloud-in-gartner-hype-cycle-for-platform-engineering-2024

Why Self-Service Matters in Environment Management?
---------------------------------------------------

Consider a development team working on a new approach to improve the performance of a couple of their systems. They wish to test their new architectural changes, consisting of a new branch and a new database/cache, in a feature environment without affecting the regular testing.

Can they set it up without talking to the Platform team? How long will it take? How fast will the team be able to iterate configuration changes? How similar will the setup be to the other environments? Can they promote these changes to regular testing environments with complete confidence? And finally, how many such projects will the Platform team be able to support?

Minimizing developers' wait time is one of the key reasons for promoting self-service environment management. The benefits, however, don't stop there –

> "There are three ways organizations can realize business value from self-service environment management tools. First, they help improve developer experience and minimize wait times on other teams, leading to overall business agility. Second, this approach helps codify governance policies as part of environment creation, addressing security, cost, and operations concerns upfront. Third, they lower the barrier for continuous testing of functional and non-functional requirements, leading to improved reliability." – say Manjunath Bhat and Bill Blosen in the [report](https://www.gartner.com/en/documents/5519995)

How to achieve Self-service Environment Management
--------------------------------------------------

Self-service environment management requires Platform Teams to provide "Environment-as-a-service" for developers. Theoretically, one may believe that investing in infrastructure automation will eventually morph into self-service. Practically, it doesn't.

Here is why: different facets of an environment are split across automation code (Terraform), release management systems, observability tools, access management systems, and such. Centralizing all information accurately, mutating it over time, and using it to derive new environments requires special consideration from Platform engineers.

Here are 3 key considerations for Platform Engineers:

1. Productize infrastructure automation so that developers can use it to manage the lifecycle of the environments
2. Provide a developer experience layer (UI or CLI-based) so it can be used effectively
3. Oversee the effectiveness through a governance framework

How does Facets.Cloud help?
---------------------------

Facets.Cloud provides a developer-operated, centrally governed platform for [self-service](https://www.facets.cloud/developer-self-service) environment management.
The Platform team adds capabilities in the form of automation to the platform but never stays in the critical path of environment lifecycle management. Developers, through the self-serve portal, manage their environments.

### Key Features:

**Define**: Centralize and mutate all aspects of environments (Blueprint) in a Git repository.

**Automate**: Link Facets automation pack or bring your existing automation.

**Launch**: Create consistent environments of various types: Dev, QA, Load Test, and Feature Test. Launch environments on any cloud.

**Environment lifecycle management**: Manage environment lifecycles from a single interface.

**Shift-left debugging**: Give your developers shift-left environment operations, e.g., AI-driven Kubernetes management.

**Customer Experiences**

> "_There is a_ [video](https://www.youtube.com/watch?v=4GK1NDTWbkY) _from Spotify, that talks about aligned autonomy in an ideal engineering team. This means giving power to the teams to manage their entire software lifecycle from the get-go, where they control the development, deployment, performance, and infrastructure. But with this, alignment to the guardrails is also very important. One thing that really struck me about Facets was how beautifully it was fitting into this particular vision_," said Suyash, CTO, Purplle.com. [Link](https://www.facets.cloud/case-study/purplle) to the case study.

Before Facets, the platform team at Purplle used to write automation suites, but they were unable to make them available to the larger developer teams. Hence, these automation suites quickly turned into tribal knowledge. With scale, every team would struggle to take features to production in time due to the unavailability of environments.

Purplle now uses Facets for cost-effective developer-managed sandbox environments for accelerated testing. This has resulted in 25X faster go-lives of feature launches while spending 70% less on non-production environments.

[Talk to us](https://www.facets.cloud/start-free-trial)

**Additional Resources**

Explore our resources to understand the benefits of self-serve environment management.

**Blog Posts:**

* [What are Self-Service Infrastructure Management Platforms?](https://blog.facets.cloud/what-are-self-service-infrastructure-management-platforms/)
* [What is a Developer Self-Service Platform and Why Does it Matter?](https://blog.facets.cloud/what-is-a-developer-self-service-platform/)

**Case Studies:**

* [Case Studies](https://www.facets.cloud/case-study)

**Documentation:**

* [Documentation](https://www.facets.cloud/documentation)

**Gartner Disclaimer**

Gartner, Hype Cycle for Platform Engineering, 2024, by Manjunath Bhat, Bill Blosen, 19 June 2024

GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally, and HYPE CYCLE is a registered trademark of Gartner, Inc. and/or its affiliates and are used herein with permission. All rights reserved. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

---

This blog is powered by Superblog.
Visit https://superblog.ai to know more.

---

## A Beginner's Guide to Infrastructure as Code (IaC) with Terraform

Author: Pravanjan Choudhury Published: 2024-07-22 Category: Blogs Meta Title: Go-to Guide: Infrastructure as Code with Terraform Meta Description: Learn the basics of Infrastructure as Code (IaC) with Terraform in this beginner-friendly guide. Explore setup, key concepts, and best practices for efficient infrastructure management Tags: Infrastructure as Code (IaC), Terraform URL: https://blog.facets.cloud/guide-iac-with-terraform

When I first started to code, I remember staring at my computer screen, fingers hovering over the keyboard as I had to update a load balancer configuration—a seemingly simple change to the infrastructure. But even so, I was anxious. One mistake or typo could take down our system. Managing infrastructure shouldn't feel so risky. There had to be a better way.

Infrastructure as Code (IaC), with tools like Terraform, changed this. IaC allows you to manage and set up infrastructure using machine-readable files instead of touching hardware or using configuration tools. You write code to automate the process, and this code replaces manual infrastructure setup.

In fact, the [2023 State of DevOps report](https://services.google.com/fh/files/misc/2023_final_report_sodr.pdf) found that organizations using IaC achieve 30% higher organizational performance. And 68% of respondents saw faster development cycles after adopting platform engineering practices, which often include IaC.

Now, you might be reading this because you've had a similar experience as I did. Maybe you were tired of manual configuration errors. Perhaps your company is growing, and you want to scale your infrastructure management. This guide will help you. We'll explore the basics of Infrastructure as Code (IaC) with Terraform and help you gain a solid conceptual understanding. But first, what exactly is Terraform and IaC?

What Is IaC and Terraform?
--------------------------

Infrastructure as Code (IaC) is a dramatic shift in how developers manage and provision computing resources. IaC treats infrastructure configuration like software, so instead of manually setting up physical servers, networks, and other IT components, you define the setup in code. This code becomes a single source of truth for your infrastructure, helping you automate deployments, ensure consistency across the board, and scale your setup easily with no infrastructure drift.

But then, what's Terraform and how does it relate to IaC? Terraform is an open-source tool created by HashiCorp that brings IaC to life. It uses a declarative configuration language (HCL) that helps you define your infrastructure as code. You specify what you want—like a cluster of web servers and a load balancer—and Terraform figures out how to make it happen, whether you are using AWS, Azure, Google Cloud, or a mix of providers.

Terraform understands the current state of your infrastructure and makes only the necessary changes to achieve your desired state. This means you can update your infrastructure over time, adding new components or modifying existing ones. Is there any benefit to using Terraform?

Benefits of Using Terraform for IaC
-----------------------------------

Terraform is like a Swiss Army knife for infrastructure management. Its versatility and ease of use have made it the go-to tool for DevOps teams. Here are some of the key advantages of using Terraform:
1. Development velocity: Platform engineering practices, especially IaC with Terraform, accelerate development speed. Organizations using platform engineering for over three years see even greater improvements. [53% of these teams](https://services.google.com/fh/files/misc/2023_final_report_sodr.pdf) report that the speed has improved “a great deal” compared to only 35% of newer adopters. 2. Improved security: Terraform integrates security directly into the infrastructure from the outset. Over [55% of users](https://www.puppet.com/system/files/2020-State-of-DevOps-Report.pdf) confirm that IaC practices strengthen overall infrastructure security. Also, with full security integration using Terraform, you can remediate critical vulnerabilities within a day [45% of the time](https://www.puppet.com/system/files/2020-State-of-DevOps-Report.pdf), compared to only 25% for those with low integration. 3. Better efficiency: [59% of respondents](https://www.puppet.com/system/files/2020-State-of-DevOps-Report.pdf) stated improved efficiency and productivity as a direct benefit of platform engineering practices, which often incorporate IaC tools like Terraform. This aligns with Terraform's ability to automate repetitive tasks and streamline infrastructure management. 4. Multi-cloud deployment: Platform teams want to go cloud-native, with [40%](https://www.puppet.com/system/files/2020-State-of-DevOps-Report.pdf) citing it as their key goal. And managing infrastructure across multiple cloud providers is easy with Terraform. Public cloud infrastructure increases flexibility by [22%](https://services.google.com/fh/files/misc/2023_final_report_sodr.pdf) compared to localized setups. Terraform's multi-cloud capabilities help organizations maximize this advantage across different providers. 5. Support for self-service capabilities: Terraform aligns well with the trend towards self-service platforms. [Highly evolved DevOps organizations](https://www.puppet.com/system/files/2020-State-of-DevOps-Report.pdf) tend to offer a wide variety of self-service capabilities, including infrastructure provisioning, and Terraform becomes the foundation of that setup. 6. Standardization and reduced duplication: Terraform's modular approach and reusable configurations contribute to increased standardization. [53% of developers](https://www.puppet.com/system/files/report-puppet-sodor-2023-platform-engineering.pdf) state that standardization reduced duplication of work in their workplaces. 7. State management and collaboration: Terraform's state management capabilities support the collaborative nature of modern DevOps practices. [63% of organizations](https://www.puppet.com/system/files/2020-State-of-DevOps-Report.pdf) have at least one [self-service internal platform](https://blog.facets.cloud/what-is-a-developer-self-service-platform/), which often relies on tools like Terraform for infrastructure management. These examples highlight just a few of Terraform's many benefits. As you begin using it, you will discover even more ways it can optimize your infrastructure management processes. Getting Started with Terraform ------------------------------ Let’s now jump into the steps to set up Terraform so you can get started implementing it. ### Installing Terraform ![Installing Terraform](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1721628403711-compressed.png) You need Terraform on your computer before you can use it. Downloading Terraform is easy because it lives in a single file.
1. Go to the official Terraform website ([https://www.terraform.io/downloads.html](https://www.terraform.io/downloads.html)) and get the right package. Make sure it works with your operating system. 2. Download the package, then extract its contents and put them in a directory of your choice. 3. You will want to access Terraform easily. Add the directory with the Terraform contents to your system's PATH. This lets you run Terraform from anywhere using your terminal. 4. Open your terminal. Type "terraform". If you were successful, you will see Terraform's help information. That’s it, Terraform is now installed on your computer. ### Setting Up Your First Terraform Project With Terraform installed, you can begin your first project. Here's how to set up a basic Terraform configuration: 1. Make a new folder for your project. 2. Create a new file with a ".tf" extension inside the folder (for example, "main.tf"). This file will hold your Terraform code. Let's make a simple configuration that creates an AWS EC2 instance:

```hcl
provider "aws" {
  region = "us-west-2"
}

resource "aws_instance" "example" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"
}
```

3. Go to your project directory in your terminal and run "terraform init". This prepares your Terraform files and downloads the necessary plugins. 4. Run "terraform plan" to view the changes Terraform will make. This command only shows you the potential changes; it doesn't apply them yet. 5. If the plan looks correct, run "terraform apply" to create the resources defined in your configuration. Terraform will show you the plan again and ask for your confirmation. Congratulations! You used Terraform to provision your first resource. While this example is basic, it shows the fundamental workflow of using Terraform. Understanding Terraform Basics ------------------------------ Let’s now explore the fundamentals of Terraform one by one—starting with what configuration files actually are. ### Configuration Files Terraform configurations are written in HashiCorp Configuration Language (HCL) and describe exactly how you want your infrastructure to look. They detail the resources you want, their settings, and how they connect. A typical Terraform configuration usually consists of one or more .tf files within a directory. When you use Terraform commands, it reads all these files. It then builds a dependency graph based on the resources you've defined. Here’s a simple example of a Terraform configuration file:

```hcl
provider "aws" {
  region = "us-west-2"
}

resource "aws_instance" "facets_default_server" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"
}
```

This tells Terraform to use the AWS provider. It sets the default region to "us-west-2" and creates an AWS EC2 instance. This instance uses the specified AMI and instance type. Terraform configurations are declarative, so you describe the desired end state and Terraform figures out how to make it happen. This differs from imperative programming where you specify every step. This declarative approach offers several advantages: * Idempotency: Applying the same configuration multiple times always yields the same result. This makes your infrastructure predictable and reduces inconsistencies over time. * Parallelization: Terraform understands the end goal. This allows it to determine which resources it can create, modify, or destroy concurrently. The result is faster infrastructure operations. * Dry runs: You can use the terraform plan command to preview changes before applying them. This helps you catch potential problems early.
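To see how a configuration can be parameterized before we walk through the individual block types below, here is a minimal sketch that extends the EC2 example above with an input variable and an output. The variable and output names are illustrative, not part of the original example:

```hcl
# Illustrative sketch: parameterize the instance type and expose the public IP.
variable "instance_type" {
  description = "EC2 instance type to launch"
  type        = string
  default     = "t2.micro"
}

resource "aws_instance" "example" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = var.instance_type # value comes from the variable above
}

output "instance_public_ip" {
  description = "Public IP address of the example instance"
  value       = aws_instance.example.public_ip
}
```

Running terraform apply -var="instance_type=t3.micro" would then launch a different instance size without editing the resource block.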
Terraform configurations can include several block types: * Provider blocks: These define which providers your configuration will use (e.g., AWS, Azure, GCP). You also specify any global settings for those providers here. * Resource blocks: These define the individual pieces of your infrastructure, like EC2 instances, VPCs, or DNS records. * Data blocks: Use these to fetch information from providers for use in your configuration. An example is querying for the latest AMI. * Variable blocks: These let you parameterize your configuration, making it more reusable and adaptable. * Output blocks: Use these to export values from your configuration. Other Terraform configurations or external tools can then use them. As your infrastructure becomes more complex, you'll likely use multiple .tf files, which helps organize different parts of your configuration. Terraform automatically reads and combines all these files into a single configuration. ### State Management Terraform keeps track of the resources it creates in a state file (usually terraform.tfstate). Think of this file as the single source of truth for your infrastructure. When you run terraform apply, Terraform follows these steps: 1. It reads the current state file to understand existing resources. 2. It compares the current state to the desired state you defined in your configuration files. 3. It figures out what changes are needed to reach the desired state. 4. It executes those changes and updates the state file along the way. This ensures Terraform always knows the current state of your infrastructure, and it can then make precise, incremental changes as needed. The state file is necessary for Terraform to work correctly, so make sure that you store it safely and take backups at regular intervals in case of loss. Terraform supports various backends: * S3: Stores the state file in an Amazon S3 bucket. * Azure Blob Storage: Stores the state file in an Azure Blob Storage container. * GCS: Stores the state file in a Google Cloud Storage bucket. * Terraform Cloud: HashiCorp's managed service for storing state files and running Terraform operations. Here's how to configure an S3 backend in Terraform:

```hcl
terraform {
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "directory-path/to/the/key"
    region = "us-east-1"
  }
}
```

With this configuration, Terraform will store its state file in the “my-terraform-state” S3 bucket, under the key “directory-path/to/the/key”. ### Terraform Plan and Apply Lifecycle The terraform plan and terraform apply commands let you preview and execute changes to your infrastructure safely and predictably. #### Terraform Plan Unless you disable it, Terraform first refreshes its understanding of the current state when you run terraform plan. It then determines the necessary actions to achieve the desired state you defined in your configuration files. Do note that this command doesn't make any actual changes. It simply shows you what Terraform will do. You can review the plan before anything happens, helping you spot potential errors or unwanted modifications. Here’s an example output from terraform plan:

```
$ terraform plan

  # aws_instance.example will be created
  + resource "aws_instance" "example" {
      + ami                         = "ami-0c55b159cbfafe1f0"
      + arn                         = (known after apply)
      + associate_public_ip_address = (known after apply)
      + availability_zone           = (known after apply)
      ...
    }

Plan: 1 to add, 0 to change, 0 to destroy.
```
This plan shows that Terraform will create a new AWS EC2 instance with the specified AMI. The + indicates this is a new resource. #### Terraform Apply Once you've reviewed the plan and are happy with the proposed changes, use the terraform apply command. This command executes the plan and applies the changes. By default, terraform apply asks you to confirm you want to proceed. You can skip this prompt using the -auto-approve flag. Just remember to be absolutely sure about the outcome before using this flag. Here's an example of applying a Terraform configuration:

```
$ terraform apply

aws_instance.example: Creating...
aws_instance.example: Still creating... [10s elapsed]
aws_instance.example: Creation complete after 15s [id=i-01234as39a123df0]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
```

After the apply completes, your infrastructure will match the state described in your configuration files. One of the key benefits of Terraform's plan and apply lifecycle is its focus on safe and predictable changes. Reviewing the plan before applying helps you catch potential issues early, as Terraform tracks your infrastructure's state and makes only the necessary modifications. This contrasts with some other infrastructure management tools. They might require you to specify your entire infrastructure every time, leading to potential inconsistencies over time. These concepts—configuration files, state management, and the plan and apply lifecycle—form the foundation of working with Terraform. Everything you build will be based on these basics, and the core workflow remains constant: * Write declarative configuration files. * Let Terraform manage the state. * Use plan and apply to manage your infrastructure safely and predictably over time. How to Use Terraform Modules? ----------------------------- Terraform modules are a powerful way to organize and reuse your infrastructure code. Let's explore how to create and implement these valuable components. ### Creating and Using Modules As your infrastructure becomes more complex, managing it with a single Terraform configuration can feel overwhelming. This is where modules simplify things. A Terraform module is essentially a set of Terraform configuration files grouped within a single directory. These modules become your building blocks: they let you create reusable components, organize your code better, and manage parts of your infrastructure as self-contained units. Think of a typical module structure like this:

```
├── main.tf
├── variables.tf
├── outputs.tf
└── modules/
    ├── vpc/
    │   ├── main.tf
    │   ├── variables.tf
    │   └── outputs.tf
    ├── ec2/
    │   ├── main.tf
    │   ├── variables.tf
    │   └── outputs.tf
    └── rds/
        ├── main.tf
        ├── variables.tf
        └── outputs.tf
```

Don’t worry if you don’t fully understand this right now. Just look at it as a sample directory structure of how modules and Terraform configuration files are stored. In this setup, you have a root module defined by the main.tf, variables.tf, and outputs.tf files in the main directory. The modules directory houses subdirectories for each of your infrastructure components, like VPC, EC2, and RDS. Each of these subdirectories represents a separate module. You can use a module within another Terraform configuration. Use a module block to do this:

```hcl
module "vpc" {
  source = "./modules/vpc"

  # Other module configuration
}
```

The source argument points to the module's location. Here, it's a relative path on the same machine, but it could also be a remote source like a Git repository or the Terraform Registry.
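As a rough sketch of what such a module's interface might look like (the variable, resource, and output below are hypothetical and only meant to illustrate the pattern), the vpc module could expose a single input and a single output:

```hcl
# modules/vpc/variables.tf: inputs the module expects (hypothetical)
variable "cidr_block" {
  description = "CIDR range for the VPC"
  type        = string
}

# modules/vpc/main.tf: resources the module manages
resource "aws_vpc" "this" {
  cidr_block = var.cidr_block
}

# modules/vpc/outputs.tf: values the module exposes to its caller
output "vpc_id" {
  description = "ID of the created VPC"
  value       = aws_vpc.this.id
}
```

The calling configuration would then pass cidr_block inside its module "vpc" block and reference module.vpc.vpc_id wherever the VPC's ID is needed.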
### Benefits of Modules Modules offer several advantages for your Terraform configurations: * Reusability: Modules let you package and reuse common configurations across your projects. This saves you time and minimizes repetitive code. * Encapsulation: They hide complexity by allowing you to manage related resources as a single unit. * Organization: Breaking down your configuration into modules makes your code cleaner and easier to understand. * Versioning: You can version your modules. This lets you safely update and iterate on your infrastructure over time. * Sharing: Modules can be shared within your team or publicly. This fosters collaboration and helps standardize infrastructure deployments. The Terraform Registry ([https://registry.terraform.io/](https://registry.terraform.io/)) offers a vast collection of pre-built modules for common infrastructure elements, and these existing modules can significantly speed up your infrastructure development. Best Practices for IaC with Terraform ------------------------------------- Now that you understand the basics of Terraform for IaC, let's look at the best practices for implementing IaC with Terraform. ### Adopt a GitOps Workflow for Version Control and Collaboration Managing Terraform configurations effectively means adopting a GitOps workflow. This approach makes Git the single source of truth for your infrastructure and applications, just like your code. You store your Terraform configurations in a Git repository to benefit from the version control and collaboration features. This approach lets you manage infrastructure code like application code. You can use pull requests, code reviews, and automated testing. Here is an example of a typical GitOps workflow with Terraform: ![GitOps workflow with Terraform](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1721628405335-compressed.png) [Source](https://www.microtica.com/blog/3-steps-to-developing-a-successful-gitops-model) 1. Developers make a new branch for their infrastructure changes. They then open a pull request. 2. The CI/CD pipeline automatically runs terraform plan. This previews the changes. 3. After review and approval, the team merges the changes into the main branch. 4. The CI/CD pipeline then runs terraform apply. This applies the changes to the infrastructure. This workflow ensures that all changes to your infrastructure are tracked and reviewed. It also helps make sure that you apply changes in a controlled way—you will always have an audit trail. ### Use a Remote Backend for State Management Terraform state keeps track of the resources Terraform creates. It also makes sure that Terraform can update or destroy those resources. The state is stored locally by default using a file named terraform.tfstate. This works fine for one person. However, it can cause problems for teams. If multiple people run Terraform at the same time, they can overwrite each other's changes and corrupt the state file. To avoid these issues, use a remote backend to store your state. A remote backend stores the state file in a shared location, such as Amazon S3, Azure Blob Storage, or any other cloud setup you already have. Remote backends often provide locking mechanisms that prevent concurrent modifications and protect the integrity of your state file.
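As a minimal sketch of what this can look like (the bucket and table names are placeholders), the S3 backend from earlier can be extended with encryption and a DynamoDB table for state locking:

```hcl
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"             # shared state bucket (placeholder name)
    key            = "prod/network/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true                             # encrypt the state file at rest
    dynamodb_table = "terraform-state-locks"          # DynamoDB table used for state locking
  }
}
```

With this in place, Terraform acquires a lock before any operation that modifies state, so two teammates cannot apply conflicting changes to the same state at once.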
### Use Modules for Reusability and Maintainability Terraform modules are a great way to package and reuse common configurations. Creating reusable Terraform code reduces duplication, promotes consistency, and makes your configurations easier to maintain. A well-designed module should: 1. Have a clear purpose. 2. Encapsulate related resources and configurations. 3. Expose a clear interface (inputs and outputs). 4. Be independently testable. 5. Be versioned and stored in a separate repository. When creating modules, follow naming and structure conventions. HashiCorp recommends a standard module structure. This includes main.tf, variables.tf, and outputs.tf files. ### Use a Consistent Naming Convention Try to establish a consistent naming convention for your Terraform resources. This makes them easier to maintain and understand. Here are a few pointers for good naming: 1. Be descriptive and meaningful. 2. Include information about the resource's purpose, environment, and/or team ownership. 3. Use a consistent format and separator (e.g., hyphens or underscores). 4. Be compatible with the naming restrictions of your cloud provider. Here’s an example naming convention: {project}-{environment}-{resource-type}-{purpose}. For instance: myapp-prod-ec2-webserver, myapp-staging-rds-database. Maintaining a consistent naming standard across the team also makes it easier to understand each resource's purpose. It helps you identify resources belonging to a specific project or environment and avoid naming collisions when there is a large team working on the same projects. Common Challenges and Solutions ------------------------------- While Terraform is a powerful tool, it’s not without its challenges. Here are some common issues you may encounter and how to address them: 1. State drift: Your infrastructure's actual state should always match your Terraform state file. When they don't align, that's called drift. It often happens when someone manually changes your infrastructure outside of Terraform. Remember to regularly run terraform plan and terraform apply to [find and fix drift](https://blog.facets.cloud/comprehensive-approach-to-maintaining-a-drift-free-infrastructure/). 2. Collaboration conflicts: Working with a team on the same Terraform configuration can lead to conflicts. To avoid headaches, use a remote backend like Terraform Cloud or AWS S3 with DynamoDB. Locking helps, but there is no substitute for clear communication and coordination within your team. 3. Complex dependencies: As your infrastructure grows, dependencies between resources can become complex and hard to manage. Terraform has a built-in dependency graph to help you understand and visualize these dependencies. 4. Performance issues: Large infrastructures can make Terraform operations slow and cumbersome. One way to speed things up is to break your configuration into smaller modules. You can also use terraform plan with the -target flag to focus on specific resources. 5. Provider limitations: Terraform supports many providers. However, not all provider features may be available. Read the documentation for the providers you are using. Make sure you understand any limitations. The Next Step in Simplified Infrastructure ------------------------------------------ Implementing IaC will be difficult at first. Remember your first time learning to code? The syntax seemed strange, and the ideas were hard to grasp. Algorithms didn’t make sense.
But your confidence grew with every line of code you wrote and every bug you fixed. Learning Terraform is going to be quite similar. It's just another way to interact with your infrastructure, and like learning any language, you’ll get better as you write more code. But this will take time, both for you and for everyone on your team who interacts with infrastructure. What if you could implement IaC without the difficulty of learning it from scratch? That is where [Facets](https://facets.cloud/) comes in. Facets acts like an expert you can access anytime to guide you through the cloud. You can easily model your architecture, launch environments, and handle daily tasks with its no-code platform. Facets simplifies infrastructure management and frees you to focus on what's important—creating and deploying great software. Want to experience no-code infrastructure automation to speed up your development workflows? [Try Facets for free](https://www.facets.cloud/start-free-trial)! --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Transforming Developer Productivity with Platform Engineering Author: Pravanjan Choudhury Published: 2024-06-06 Category: Blogs Meta Title: Transform Developer Productivity with Platform Engineering Meta Description: Discover how platform engineering can transform developer productivity. Learn strategies and tools to enhance efficiency. Tags: Developer Productivity, platform engineering, developer self service URL: https://blog.facets.cloud/transforming-developer-productivity-with-platform-engineering Platform engineering is a game-changer for improving workflows by using automation, setting standards, and fostering teamwork. It's not just about making things more efficient; it's a whole new way of working that brings teams closer to a future of better productivity with less hassle. At its heart, platform engineering understands that being productive is about more than just doing a lot of work. It focuses on making sure the software is not only high-quality and fast-paced but also secure and optimal, following the key ideas behind DevOps. It offers developers a strong yet adaptable structure, saving them from the boredom of repetitive tasks. By promoting a do-it-yourself attitude, it allows for quick and well-informed decisions and actions, all while keeping a high standard of quality and safety. Let's take a closer look at how platform engineering boosts developer productivity. The points below highlight the big changes it brings to tech development: 1\. Platform-izing Automation ----------------------------- By platform-izing automation of standard tasks like code merging, testing, and deployments, developers can dedicate more time to complex, creative challenges. This efficiency reduces project timelines. For example, setting up [Continuous Integration and Continuous Delivery (CI/CD) pipelines](https://www.redhat.com/en/topics/devops/what-cicd-pipeline) automates code releases, slashing manual effort and reducing errors. Key areas where automation makes a difference include: * **Delivery pipelines:** These automate the integration and delivery process, reducing the developers' workload and speeding up the feature rollouts. * **Self-service portals:** Developers gain instant access to the tools and resources they need, eliminating waiting times.
2\. Empowering Developers with Self-Service ------------------------------------------- [Self-service](https://blog.facets.cloud/what-is-a-developer-self-service-platform/) portals are a game-changer. They allow developers to quickly access resources, tools, and environments without being held up by IT or operations. This autonomy supports a dynamic development pace. These portals are generally backed by developer-friendly platforms that further ease developer operations. Kubernetes is a standout example, enabling developers to deploy applications or manage resources independently. 3\. Simplifying Workflows ------------------------- Making information easy to access and standardizing workflows can greatly reduce the mental load on developers. Efforts to streamline include: ### 3a. Centralizing resources: Documentation, libraries, and APIs are all kept in one place, making them easy to find and use. ### 3b. Predefined CI/CD pipelines: These come with security and testing protocols built in, steering development smoothly without the need for constant decision-making. 4\. **Standardizing Development Environments** ---------------------------------------------- Creating uniform development environments and tools across the board eliminates inconsistencies, saving developers considerable time otherwise spent on setup and problem-solving. Key Improvements: * **Quicker Project Setup:** Standardization cuts down environment setup time from 4 hours to just 30 minutes. * **Faster Onboarding for New Developers:** What used to take a few weeks now takes only a day, streamlining the integration of new team members into projects. **Comparison of project setup times before and after standardization:**

| Criteria | Pre-Standardization | Post-Standardization |
| --- | --- | --- |
| Environment Setup | 4 hours | 30 minutes |
| Onboarding New Developers | 2 weeks | 1 day |

![Environment setup time - pre-standardization and post-standardization](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1717662477562-compressed.png) ![Time to onboard new developers - pre-standardization and post-standardization](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1717662262425-compressed.png) 5\. Enhancing Rapid Prototyping and Deployment ---------------------------------------------- A unified platform also supports quicker prototyping and deployment, which is vital for agile development practices. A platform necessarily limits developers to a few vetted options so that it can provide the required guarantees. Utilizing Docker containers, for example, ensures that applications behave consistently across all environments, significantly reducing deployment time and facilitating a smoother development cycle. 6\. Boosting Teamwork and Quality --------------------------------- Platform engineering brings together tools that improve how teams work together and guarantee top-notch results: ### **6a. Better Collaboration:** By integrating tools like Git for tracking changes and Slack for talking things out, working together becomes smoother. Version control lets multiple people tweak code at the same time without stepping on each other's toes, and chat tools help solve problems and make decisions fast. Integrating tools and practices to improve teamwork and product standards is vital. Strategies include: * Integrating version control systems (e.g., Git) with real-time communication platforms (e.g., Slack) to streamline workflow and enhance team communication.
* Integrating [audit logs](https://blog.facets.cloud/product-updates/april-2024/) with CI/CD processes to ensure releases are faster while remaining fully tracked. ### **6b. Quality and Safety First:** Platform engineering brings in a development process that includes automatic tests and security checks which can’t be bypassed, making sure products are strong and safe right from the start. For instance, tools that check code as it's being written can enforce best practices well before deployment, catching problems before they grow. 7\. Easing the Mental Load -------------------------- By centralizing resources and simplifying compliance, platform engineering lessens the mental strain on developers, freeing them up to focus on new ideas: ### **7a. One Place for Everything:** Having one spot for all the documentation, libraries, and APIs makes finding information easy, saves time, and cuts down on mistakes. When everything's in one place, developers are always in the loop with the latest updates and can share tools and code more effectively, ensuring consistency across projects. ### **7b. Ready-to-Use Tools:** Providing a set of standard tools and processes that follow best practices helps developers avoid decision overload and makes development smoother. For example, using pre-made CI/CD pipelines that already include security and testing steps means projects automatically meet the organization's standards, saving developers the hassle of setting everything up themselves. By tackling these key issues, platform engineering not only boosts individual developers' productivity but also enhances the performance and output of the whole team. This strategy creates a workspace where innovation isn't bogged down by needless tasks and duplication, common in traditional development methods. **Paving the Way for Tomorrow's Software** ------------------------------------------ The benefits of platform engineering are clear. Projects get off the ground quicker, new team members become productive faster, and updates are rolled out more smoothly and frequently. These aren't just minor improvements; they're game-changers that can significantly boost a team's performance and the quality of their work. [Platform engineering](https://blog.facets.cloud/handbook-to-platform-engineering-journey/) is not just a passing trend. It’s a new way of working that addresses the challenges of modern software development head-on, helping teams adapt quickly to changes and meet their goals more effectively. It's evident that adopting platform engineering is essential for any organization that wants to stay ahead of the curve. For those ready to take on the challenge, the rewards include not just greater productivity and better software but also a more vibrant and innovative development culture. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Embracing the Future: AI and No-Code Solutions’ role in Platform Engineering Author: Pravanjan Choudhury Published: 2024-05-30 Category: Blogs Meta Title: AI and No-Code Solutions’ role in Platform Engineering Meta Description: Let's look into how AI and no-code are transforming platform engineering, its impact on developers and businesses, and how it navigates today's tech complexities.
Tags: platform engineering, Artificial Intelligence, AI, No-Code platforms URL: https://blog.facets.cloud/ai-and-no-code-solutions-role-in-platform-engineering ![AI and No-code platforms](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/screenshot-2024-05-30-at-12-1717053655412-compressed.png) Companies are shifting towards custom platforms, yet the advantages of no-code solutions are too big to overlook. This change is boosted by recent AI advancements, leading to smarter tools, better integrations, and more efficient development. At this pivotal moment, platform engineering offers both exciting opportunities and challenges. It's changing the way developers work and pushing companies to update their digital strategies to remain competitive. AI's role in platform engineering enhances its capabilities, promising a future where software development is quicker, more effective, and better suited to the fast-paced digital world. Let's dive into [platform engineering](https://blog.facets.cloud/platform-engineering-which-side-of-the-fence-should-you-jump/), exploring its impact on developers and businesses, and how it navigates today's tech complexities. This exploration is about welcoming change and spearheading a move towards a more flexible and quick-to-respond tech future. **Unpacking the Rise of Platform Engineering** ---------------------------------------------- Platform engineering is transforming software development by combining traditional coding with modern operational methods. This approach is tailored to match the fast-moving pace of current tech advancements. Let's delve into how this change is [reshaping the tech development landscape](https://www.cigniti.com/blog/rise-of-platform-engineering-gartner-hype-cycle-des/#:~:text=Gartner%20predicts%20that%20by%202026,more%20promise%20for%20platform%20engineering.). ### **Addressing Issues Early with Shift-Left** At its core, platform engineering uses a forward-thinking strategy called [shift-left](https://devopedia.org/shift-left). This approach aims to detect and solve problems early in the development cycle. It enhances efficiency and avoids future complications, leading to a smoother development process. ### **Guiding Development with the Golden Path** The [golden path](https://cloud.google.com/blog/products/application-development/golden-paths-for-engineering-execution-consistency?utm_source=the+new+stack&utm_medium=referral&utm_content=inline-mention&utm_campaign=tns+platform) offers developers a clear, flexible route through the software development lifecycle. It supports innovation within a structured framework, enabling developers to try out new ideas while following established best practices. Platform engineering is setting a new standard in software development and operations. It prioritizes early problem-solving, guides developers through the development process, speeds up software updates, and demystifies cloud technology. This approach is paving the way for a more streamlined and innovative future in tech. **Navigating the Next Wave of Platform Engineering** ---------------------------------------------------- The world of platform engineering is rapidly changing, shifting from the old ways to a future filled with innovation. This shift is driven by several key trends that are making technology development more streamlined and inclusive. **Custom Platforms Lead the Way:** In the fast-moving tech world, being quick is key. 
Top organizations are crafting their own custom platforms tailored to their unique needs and processes. These aren't just sets of tools but complete ecosystems designed for better workflow. They empower developers to manage their projects from start to finish, speeding up tasks and boosting independence. **Unified Frameworks Make Integration Easier:** With so many different tools and services out there, smooth integration is critical. Tools like Backstage offer unified systems that simplify and standardize operations. This means easier work for developers, fewer delays, and more efficiency. **No-Code Platforms Open Doors:** The rise of [no-code platforms](https://powerapps.microsoft.com/en-us/low-code-platform/) is perhaps the biggest game-changer. These platforms allow more people to take part in platform engineering, even without extensive coding skills. Thanks to user-friendly interfaces and simple components, no-code platforms enable quick development, inviting a wider group to innovate. As platform engineering evolves, it's embracing flexibility, better integration, and wider participation. These trends are setting new standards for tech development and paving the way for a tech world that's more collaborative, efficient, and open to all. **AI: The New Frontier in Platform Engineering** ------------------------------------------------ Artificial intelligence (AI), especially generative AI, is transforming platform engineering, making technology development more efficient and connected. Let’s see how AI is shaping the future of this field. **The Automation Revolution:** Imagine less manual scripting because AI can automate many tasks. This move towards automation will make processes faster and reduce the time it takes to launch new projects. **Easier Integrations:** Generative AI can simplify the complex task of integrating different systems. It aims to provide easy ways to merge technologies, lightening the load for developers. **Better Observability:** AI could revolutionize monitoring tools, offering real-time insights and advice. This proactive approach helps keep systems running smoothly and addresses problems before they escalate. **Streamlined Debugging:** AI, with its access to vast data, can help developers debug more efficiently by quickly identifying issues and their causes. This means less time troubleshooting and more time creating. **Infrastructure Design Made Simple:** AI can also guide infrastructure design, recommending best practices and configurations that meet developers’ needs, making setup easier. As platform engineering embraces AI, it's turning into a more intuitive and user-friendly field. Operations teams are adapting, focusing on improving tools and systems like product managers. This shift involves retraining teams, monitoring performance closely, and being ready to change strategies. Yet, it's essential to balance AI’s suggestions with human judgment, especially for critical decisions. The move towards AI-enhanced platform engineering is exciting, opening up new possibilities for smoother and more effective tech environments. **Embracing the Future** ------------------------ AI and automation are transforming platform engineering, leading to a shift towards more efficient software development. This change is here and now, paving the way for a future where developers spend more time innovating and less on routine tasks. AI is making big strides, easing integration efforts, enhancing monitoring tools, assisting in debugging, and managing infrastructure setup. 
This evolution is about working smarter, not harder. However, moving into this AI-driven era needs preparation. We must consider our roles more like product managers, focusing on training and adapting to these shifts. While AI's role in platform engineering brings excitement, we must tread carefully. Using AI wisely, particularly in critical functions, is essential. Balancing AI's capabilities with human oversight is crucial. Are we ready for AI's influence in platform engineering? Our response will determine the future of software development, affecting developers' work globally. Adapting offers challenges and opportunities, aiming to revolutionize how we create, launch, and maintain software. Looking forward, adopting AI in platform engineering is about setting new standards for innovation and efficiency in the tech world. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Security by Design & Default: Weaving Security into the DNA of your Software Author: Facets.cloud Published: 2024-05-21 Category: Blogs Meta Description: Build secure software from the start with Security-by-Design and Security-by-Default principles. Learn strategies for embedding security into every stage. Tags: security by design, security, security by default URL: https://blog.facets.cloud/security-by-design-and-security-by-default ![Security by Design and security by default control](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/screenshot-2024-05-21-at-3-1716286739077-compressed.png) Imagine building something secure from the beginning, instead of patching holes later. That's Security-by-Design. It's more than just a method; it makes security a fundamental part of creating technology products. This idea fits well with DevOps, a modern development practice, by promoting a proactive approach to security. It emphasizes including security at every stage of development. DevOps is an excellent framework for integrating security throughout the software development and deployment process. This way, security becomes a key element right from the start, not just something tacked on at the end. Similarly, platform engineering benefits from adopting a Security-by-Default approach, which builds security into systems from the beginning and helps to mitigate risks early. There are challenges, of course. We need to change how we think, what we do, and the tools we use to make security a natural part of development. But if we do this right, security won't slow us down. It will actually make things better and faster. The aim is to create an environment where security is not viewed as an obstacle but as a driver of innovation and efficiency in the development and deployment of digital solutions. Two Schools of Thought: Security by Design and Security by Default Controls --------------------------------------------------------------------------- There are two schools of thought for implementing security in technology. They share the same goal, but their approaches differ. **Security by Default**: [Security by Default](https://owasp.org/www-project-proactive-controls/v4/en/c5-secure-by-default) Controls ensure software and systems start with the most secure settings.
This makes it easier for users to stay secure, avoids mistakes in setup, and strengthens overall security. The trick is finding the right balance: strong security shouldn't make things too difficult or frustrating to use. **Security by Design**: [Security by Design](https://www.spiceworks.com/it-security/cyber-risk-management/articles/what-is-security-by-design/) weaves security throughout the entire software development process. This not only makes the software stronger, but also helps follow regulations and makes everyone think about security. However, it can require a change in how things are done and might take longer and cost more upfront. Both strategies are important. Each approach plays a crucial role in building secure systems, focusing on proactive risk management and secure user experiences, respectively. Here’s a more detailed comparison between the two:

| Aspect | Security by Design | Security by Default Controls |
| --- | --- | --- |
| Definition | A proactive approach that integrates security considerations into every stage of the software development lifecycle. | A principle that ensures security settings are configured to the most secure defaults throughout the software or system. |
| Focus | Embedding security in the design, development, and deployment processes. | Ensuring that the default configurations are the most secure to prevent unauthorized access. |
| Objective | To anticipate and mitigate security risks early in the development process, making security an integral part of the solution. | To provide users with a system that is secure by default, requiring minimal security configurations by the end user. |
| Benefits | Reduces potential vulnerabilities and security risks, facilitates regulatory [compliance](https://thenewstack.io/cloud-native-security-hasnt-solved-compliance-challenges/), and promotes a culture of security awareness. | Simplifies the process of maintaining security for users, reduces the risk of configuration errors, and enhances overall security posture. |
| Challenges | Requires a shift in organizational mindset towards prioritizing security, may increase initial development time and cost. | Balancing security with usability, ensuring default settings do not restrict necessary functionality or user experience. |

### **The right way? Both, by default and by design** Integrating Security by Design and Security by Default Controls into [DevOps and platform engineering](https://blog.facets.cloud/next-in-devops-user-centric-platform-engineering-approach/) has become essential, not just optional. This approach embeds security deep within the development process, ensuring that it isn't just added as an afterthought but is a foundational component from the very beginning. By weaving security into every phase, from initial design through to deployment, these strategies ensure that security measures evolve as an integral part of the development lifecycle. Both strategies come with toolkits. Security by Design stresses planning ahead for security risks, like having a fire extinguisher handy. Security by Default Controls focuses on the safest settings by default, so users don't need to tinker for protection. By seamlessly integrating these approaches with DevOps and platform engineering, organizations gain a double win: stronger defenses and smoother operations. This security-first mindset is essential for building rock-solid digital solutions that can handle any threat out there.
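To make the "safest settings by default" idea concrete, here is a small, purely illustrative Terraform sketch; the resource and bucket names are hypothetical, and the article itself is tool-agnostic. It shows a storage bucket whose configuration bakes encryption in and blocks public access, so consumers get secure behavior without tuning anything:

```hcl
# Hypothetical secure-by-default storage bucket (illustrative names only).
resource "aws_s3_bucket" "app_data" {
  bucket = "myapp-prod-app-data"
}

# Encryption at rest is always on; callers cannot opt out.
resource "aws_s3_bucket_server_side_encryption_configuration" "app_data" {
  bucket = aws_s3_bucket.app_data.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

# All public access is blocked unless a reviewed exception changes it.
resource "aws_s3_bucket_public_access_block" "app_data" {
  bucket                  = aws_s3_bucket.app_data.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```

Security by Design would then show up earlier still: in the threat modeling and review gates that decide such defaults before any resource is written.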
**Strategies for Embedding Security Principles into SDLC** * **Collaborative Culture:** Building a security-aware culture is foundational. Regular security training for all team members not only raises awareness but also empowers each individual to take ownership of security within their roles. * **Shift-Left Security:** By integrating [security](https://www.fortinet.com/resources/cyberglossary/shift-left-security#:~:text=1.,software%20and%20application%20development%20phase.) early in the SDLC, teams can identify and mitigate vulnerabilities sooner. Implementing automated [security scanning tools](https://spectralops.io/blog/top-10-ci-cd-security-tools/) in the CI/CD pipeline ensures continuous security checks, making security a part of the daily workflow. * **Automated Security Tools:** Automation is key to keeping up with fast-paced development cycles. Automated tools provide continuous feedback on code quality and security risks, streamlining the remediation process. **Cultural Shifts:** * **From Siloed to Integrated Teams:** Breaking down the silos between development, operations, and security teams encourages a more holistic approach to security, where responsibilities are shared, and communication is open. * **Continuous Learning and Improvement:** Security is an ever-evolving field. Adopting a culture of continuous learning and regular retrospectives ensures that teams stay updated on the latest threats and best practices, adapting their strategies accordingly. The Future is Secure: Building on Innovation, Not Vulnerability --------------------------------------------------------------- The old way of building software – patching [security](https://thenewstack.io/security/) holes after the fact – is a recipe for disaster. By embracing Security-by-Design and Security-by-Default Controls, we can build a future where security is a cornerstone, not an afterthought. Imagine it: development that prioritizes security from the very beginning, leading to more robust and resilient digital solutions. Not only does it strengthen defenses, but it also streamlines operations. Let's build a future where innovation thrives on a foundation of security. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Product Updates: Audit Logs, UI Changes, Artifact Pinning, and more! Author: Facets.cloud Published: 2024-05-15 Category: Product News Meta Title: Facets Product Updates - April 2024 Tags: facets.cloud, product updates URL: https://blog.facets.cloud/product-updates/april-2024 In our continuous effort to enhance [Facets](https://www.facets.cloud/), we have been working to introduce new features and improvements. This month, we are introducing Audit Logs, artifact enhancements, new integrations, and a UI revamp that makes your Facets experience better. Let's dive into the details: **Audit Logs** -------------- Transparency and accountability are paramount in any organization. Our new Audit Logs feature provides this in Facets. You can now view the audit logs of every action performed across every page in your Facets Control Plane. For every change, the system captures and records the details of who made the change, where they made it, and what it was. Navigate to the Audit Logs page from the main nav bar to check the logs. ![Audit Logs in Facets](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1715777599932-compressed.png) Audit Logs This detailed audit trail can help you in security, compliance, troubleshooting, and deep audits.
For example, if one of your applications experiences intermittent downtime or performance degradation, the audit logs provide a detailed trail to identify the root cause. You can review all recent actions performed on the application's resources like deployments, services, or configurations within the relevant timeframe. The logs capture user details, resource overrides, and environment changes that may have contributed to the issues. This comprehensive information allows teams to reconstruct the events leading up to the problem quickly, pinpoint potential causes like recent deployments or configuration changes, and expedite the troubleshooting process. Please note that this feature is currently in beta, and you can anticipate further enhancements. **Directly Pin Artifacts for a Service** -------------------------------------- Devs and Ops teams often require the ability to use specific versions of container images for various reasons: 1. To isolate application code issues while testing and debugging new features or fixing bugs. 2. To use the same image versions while reproducing customer-reported problems. 3. To decouple from their CI/CD pipelines and use image versions not directly integrated with their CI systems for manual overrides and hotfixes. To address this, we’ve added the ability to directly incorporate images from your Container Registries into the service. ![Artifact pinning](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1715777600912-compressed.png) Pin artifacts for a specific service **Secure Boot on Kubernetes Clusters managed by Facets** -------------------------------------------------------- Secure Boot, a critical security feature, is now available on Kubernetes clusters managed by Facets. It enhances system security by verifying the digital signatures of boot components on cluster nodes. This ensures that only authorized firmware and signed kernel modules are loaded during the boot process, safeguarding devices from malware and unauthorized access. With Secure Boot enabled, users can protect their environments from potential threats, ensuring only verified software components are executed during the boot process. **AWS Data Lifecycle Manager (DLM) Integration** ------------------------------------------------ AWS Data Lifecycle Manager (DLM) helps automate the management of data and storage resources across various AWS services. It is designed to simplify creating, retaining, and deleting backups, snapshots, and other data objects based on user-defined policies and schedules. This integration enables volume snapshotting, cross-region copying, and retention management, all from a single configuration setting, enhancing your data backup processes. **Support for ARM Architecture** -------------------------------- Facets now supports ARM architecture, including Graviton and other ARM-based nodes. This lets you deploy and manage workloads on ARM-based servers, which can be more power-efficient, cost-effective, and optimized for certain tasks. **ACL Support for Legacy S3** ----------------------------- Managing access controls for your S3 buckets just got easier with our new ACL support for S3. You can now establish public access to your buckets and their respective objects. **UI Revamp** ------------- We are excited to announce a significant update in this release, focusing on enhancing your user experience based on your valuable feedback.
Complete UI Navigation Makeover: * All Blueprint and Environment tabs previously found on the side panel are now located as tabs within their respective pages. ![Navigation](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1715777602083-compressed.gif) Navigation changes * The navigation now only displays the Blueprint context you are actively working on, ensuring clearer navigation and a focused approach to your workflows. * The breadcrumb now shows the context of the selected Blueprint, Environment, Resource, and Resource Type. New 'Settings' Tab: * A comprehensive view across all entities is now available under a new 'Settings' tab. ![settings](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1715777603374-compressed.gif) Settings tab We are committed to continuously improving the Facets platform and delivering features that enhance your experience. Stay tuned for more exciting updates in the coming months! --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## A Blueprint for Standardization in Security, Cost, Compliance & Observability Author: Pravanjan Choudhury Published: 2024-05-03 Category: Blogs Meta Title: Engineering Standardization: Blueprint for Security, Cost & Compliance Meta Description: Understand the challenges and principles for standardizing engineering processes across teams for better security, cost management, compliance and observability. Tags: security & compliance, standardization URL: https://blog.facets.cloud/standardization-in-security-cost-compliance-and-observability ![standardization](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/standard-quality-control-concept-m-1-1715254614736-compressed.jpg) ​In complex engineering organizations where teams work on diverse products, standardization is key. By focusing on security, cost management, compliance, and system oversight, standardization aligns teams with different objectives. At its heart, [standardization](https://thenewstack.io/bringing-harmony-to-chaos-a-dive-into-standardization/) strengthens the system and fosters resilience. It's like creating a common language for the company, easing cooperation and preventing conflicts. The goal is to weave these standards seamlessly into daily workflows for greater flexibility and consistency. However, getting everyone on the same page isn't straightforward. Different ways of doing things can lead to a mix-up. As we look into this more, it's important to spot these challenges early and figure out how to overcome them. Complexities of Standardization: Key Challenges and Considerations ------------------------------------------------------------------ The challenges in standardization stem from different areas, including how teams work together and the pace of tech advancements. Let's dive deeper into these issues for a clearer picture. ### 1\. Diverse Team Priorities and Practices **Alignment Hurdles in Large Organizations**: Aligning numerous engineering teams, each with distinct methodologies, tools, and objectives, poses a substantial challenge in sizable firms. **Resistance to Change**: Teams accustomed to their workflows may see new standards as disruptive, resisting changes that they perceive as unnecessary. **Reinventing the Wheel**: A lack of standardization can prompt teams to develop their solutions for problems already solved within the organization, leading to wasted resources and inconsistencies. ### 2\. 
Knowledge Transfer and Documentation **Rapid Technological Evolution**: The swift pace of change in technology complicates the maintenance of current standards and best practices, making documentation a continuous challenge. **Varied Interpretations**: Sole reliance on documents for standards can result in different teams applying these guidelines in slightly varied ways, leading to inconsistencies. **Engagement with Documentation**: Dense, lengthy documents can deter teams from thorough engagement, often resulting in skimmed readings or complete avoidance. ### 3\. Resource and Communication Constraints **Cost Implications**: The adoption of new standards or tools entails significant expenses, from the acquisition of new resources to the indirect costs associated with training and adaptation periods. **Challenges in Communication**: Ensuring that each team member, especially in large or geographically dispersed teams, comprehends and adopts new standards necessitates effective communication strategies. **Collaboration Across Teams**: For teams working on interconnected projects, the lack of effective communication and collaboration channels can result in misalignments or redundancies, complicating the project ecosystem further. ### 4\. Balancing Rigidity and Flexibility Finding an equilibrium that avoids the extremes of stifling innovation with too much rigidity or fostering confusion with too much flexibility is crucial for effective [standardization](https://cloudnow.medium.com/standards-vs-standardization-in-devops-the-fine-line-between-streamlining-processes-and-hindering-d08daff01130). The Fundamental Principles of Software Standardization ------------------------------------------------------ Following our exploration of the challenges associated with a lack of standardization, it becomes crucial to highlight the principles that pave the way for alignment and coherence across teams. These principles act as a compass, guiding teams towards a shared objective. Below, we delve into the core principles essential for achieving software standardization. ### **1\. Prioritizing Artifacts Over Documentation** **Why It Matters**: Shifting focus from heavy reliance on documentation towards actionable artifacts streamlines processes and encourages a practical approach. **How to Implement**: Emphasize the creation and use of tangible outputs, such as standard dashboards, compliance rules, or backup policies. These artifacts offer teams concrete tools to work with, reducing the need to navigate through extensive documentation. ### **2\. Validating Early in the Shipping Process** **Why It Matters**: Early validation of artifacts within the software development lifecycle helps prevent errors from advancing to later stages, where they are harder and costlier to fix. **How to Implement**: Embed standardized checks and protocols early in the design and development stages to catch and correct mistakes promptly. ### **3\. Establishing Golden Paths** **Why It Matters**: Providing a clear, efficient guideline for teams ensures adherence to best practices, streamlining workflow and reducing variability. **How to Implement**: Define and document optimal processes for common tasks and challenges, creating a reference that ensures consistent application of best practices. ### **4\. Balancing Global Standards with Flexibility** **Why It Matters**: Although global standards provide a solid structure, recognizing the unique needs of individual teams is essential. 
**How to Implement**: Implement international standards while granting teams the flexibility to adjust and customize based on the specific demands of their projects, marrying best practices with practical applicability. ### **5\. Crafting Platform Guarantees** **Why It Matters**: Platforms without clear boundaries can lead to misuse. Establishing minimalistic interfaces with guaranteed outcomes promotes efficiency. **How to Implement**: [Design platforms](https://blog.facets.cloud/building-an-internal-developer-platform/) with straightforward interfaces that offer guarantees on outcomes, aligning with [](https://blog.facets.cloud/improve-developer-experience-with-specialized-dev-environments)[developer expectations](https://blog.facets.cloud/ideal-development-environment-for-optimal-software-development/) without overwhelming them with too many options. ### **6\. Focusing on Developer-Centric Information** **Why It Matters**: Excessive information can overwhelm and distract. Providing developers with relevant, timely data enhances productivity. **How to Implement**: Utilize smart systems to filter and present information that is directly relevant to developers, minimizing distractions and focusing on actionable insights. Challenge v/s Principle ----------------------- ![Standardization - Challenge vs Principle](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/screenshot-2024-05-07-at-2-1715072750506-compressed.png) The Power of Uniformity in Tech Development ------------------------------------------- Standardization plays a vital role in technology, enhancing system performance and [security](https://thenewstack.io/security/) while minimizing errors. It aligns teams across the board, leading to more dependable systems. For software development, maintaining consistency, especially in non-functional requirements, is essential.  Standardization boosts efficiency, especially for organizations juggling multiple projects. It goes beyond mere guidelines, forming the backbone of productive development practices. By reducing problems and improving results, standardization is key to thriving in the ever-changing tech world. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Before first steps: Handbook to the Platform Engineering Journey Author: Pravanjan Choudhury Published: 2024-04-19 Category: Blogs Meta Title: Handbook to the Platform Engineering Journey Meta Description: From transitioning to customizing solutions, this blog provides actionable advice to help teams navigate this transformative journey to platform engineering Tags: Developer experience, platform engineering URL: https://blog.facets.cloud/handbook-to-platform-engineering-journey Making the switch to [platform engineering](https://blog.facets.cloud/platform-engineering-a-guide/) requires a big change in the way we assume the roles in the Ops function. We need to focus on how to actually make the move, providing a clear guide for this new journey. We aim to offer straightforward, actionable advice for teams looking to shift to platform engineering roles. The goal is to make the path to becoming a platform engineer clear and doable, offering insights and advice to make this complex transition easier. We're not just talking about changing jobs; we're talking about a deep change in how we think and work in software development. 
Handbook to the Platform Engineering journey -------------------------------------------- Switching to platform engineering isn't just about changing job titles—it's a complete transformation in responsibilities, skills, and mindsets. This guide outlines how to smoothly transition from traditional operations to innovative platform engineering. **Changing mindset from maintenance to innovation** The essence of platform engineering is to treat developers as your primary customers, focusing on their needs as if creating a product specifically for them. This approach shifts from simply managing infrastructure, as seen in DevOps, to developing Infrastructure as a Product. This change is vital; your role isn't just to maintain systems but to create tools that boost developers' productivity and creativity. **Identifying problems and customizing solutions** Begin by examining your current operations to identify and resolve bottlenecks. This crucial step clears the way for innovation by fixing problems that hinder or interrupt workflows. Addressing these issues directly paves the path to a successful platform engineering approach. **Balancing Control and Innovation** Finding the right balance between governance and creative freedom is key. Too much control can stifle innovation, while too much freedom can cause inconsistency and governance issues. Aim for a middle ground that maintains order and compliance but also allows for innovation and experimentation. **Comprehensive Resources for learning and support** Like any product, your platform must come with detailed documentation and training materials. These resources are crucial, enabling users to understand and navigate the platform confidently and efficiently.

**Summarizing the guidelines**

| Aspect | Objective |
| --- | --- |
| Mindset Shift | Adopt a product-centric view, focusing on developers as customers. |
| Identifying Bottlenecks | Locate and address workflow inefficiencies to pave the way for platform engineering adoption. |
| Customization vs. Best Practices | Balance unique organizational needs with industry standards to foster innovation without reinventing the wheel. |
| Control vs. Flexibility | Strike a balance between governance and creative freedom, ensuring a conducive environment for innovation. |
| Educational Support | Provide comprehensive documentation and training to enable user proficiency and platform adoption. |

Platform Engineering for Startups and Scale-ups ----------------------------------------------- The path to platform engineering differs greatly between startups and large enterprises, with each needing a distinct strategy for a smooth transition. Here’s a guide on how both can achieve this with precision and insight. ### Startups: Embracing Agility and Team Feedback For startups, agility is crucial. A key aspect of agility is gathering direct feedback from developers to ensure the platform meets genuine needs and encourages everyone's input. Startups prioritize documenting processes and automating routine tasks, which helps dedicate more time to innovation. They track their progress by measuring developer satisfaction through the Net Promoter Score (NPS) and their ability to quickly move from concept to execution (sprint velocity). Key Advantages for Startups: * **Fast Feedback and Adoption**: Without old systems to hold them back, startups can quickly take in feedback and try out new ideas. * **Focus on Essentials**: Making sure everything's written down and automating simple tasks helps streamline their work.
* **Keeping Track**: Using metrics like developer NPS and sprint velocity lets them see how well the transition is going and where to adjust. ### **Scaled-up Enterprises: Steady Evolution and Strategic Resourcing** Larger companies take a more cautious approach to moving towards platform engineering, often starting with small trial projects. The challenge for them is to [standardize processes](https://blog.facets.cloud/standardization-in-security-cost-compliance-and-observability/) without stifling the unique needs of different teams. They rely on their operations teams to find and fix any issues, and they invest in training developers for the new ways of working. To see if they're on the right path, they conduct regular checks and watch key indicators, like how many large projects are successfully completed. Key Focus Areas for Larger Companies: * **Trial Projects**: Small, manageable projects are used to test out new methods. * **Balancing Needs**: They work to standardize while still allowing for team-specific adjustments. * **Leveraging Experience**: Operations teams are crucial for spotting and solving problems. * **Training for the Future**: They make sure their developers are ready for new challenges. * **Measuring Success**: Through audits and looking at key performance indicators ([KPIs](https://blog.facets.cloud/key-kpis-for-platform-engineering-success)), like the success rate of big projects, they can tell if the changes are working.

**Transition Strategies at a Glance** -------------------------------------

| Aspect | Startups | Scaled-up Companies |
| --- | --- | --- |
| **Adoption Strategy** | Swift adoption, leveraging the absence of legacy systems. | Gradual implementation, navigating around legacy systems. |
| **Team Involvement** | Direct involvement of developers in decisions, thanks to smaller team sizes. | Operations teams highlight inefficiencies; standardized practices are essential for cohesion across larger teams. |
| **Flexibility & Iteration** | A broad scope for trial and error to discover what works best. | Pilot projects serve as test beds for new methodologies. |
| **Resource Management** | Focus on automation to allocate human resources to more complex challenges. | Upskilling and retraining become priorities to bring existing teams up to speed. |
| **Evaluation Metrics** | Developer NPS and sprint velocity are key for measuring progress and making adjustments. | Regular audits and specific KPIs, such as the rate of successful deployments, are critical for assessing the transition. |

Taking the next steps towards platform engineering -------------------------------------------------- Moving to platform engineering isn't just about updating technology; it's a complete re-evaluation of how a company operates. This shift breaks down old barriers and fosters a culture of teamwork and creativity. For both large and small companies, this change enhances operations and aligns them more closely with their objectives, allowing them to stay ahead in a fast-evolving tech landscape. Platform engineering is key for any business looking to improve its efficiency and stay competitive. It requires careful planning, flexibility, and a commitment to continuous improvement. Embracing this change opens the door to greater scalability, smoother operations, and a robust, flexible development ecosystem. Ready to embark on this transformative journey? --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Product Updates: PR Workflow for Blueprint Changes, GCP Enhancements, and more!
Author: Anshul Sao Published: 2024-04-11 Category: Product News Tags: product updates URL: https://blog.facets.cloud/product-updates/march-2024 In this post, we will go over everything we have been working on in March 2024. Read on to know more. ### Major Updates ### PR Workflow for Blueprint Changes The PR Workflow for Blueprint changes is similar to how teams raise and merge PRs for code changes. It ensures that all changes made to the blueprint are reviewed before they go live in the environments, promoting collaboration and accountability. Now, you can manage blueprint changes through version control and confidently make modifications, knowing that you can roll back to previous states if needed. This reduces the risk of introducing errors or unintended consequences in the infrastructure configuration. This workflow also aligns with industry best practices for managing cloud infrastructure. ![PR workflow for blueprint changes](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/bp-changes-1712850091109-compressed.png) You can create a separate branch to make blueprint changes, and then raise a PR to merge these changes into the master branch (i.e., the current Blueprint). Once the PR is merged into the master branch, the Blueprint is updated with your changes. You can also check Git actions and PRs from the Facets UI, and create PRs after making the required changes. ### Connect your VCS and Cloud Accounts to Facets in under 20 seconds Setting up VCS and Cloud account integrations with third-party services has always been difficult. So we took a first-principles approach to reimagine what it could look like if we wanted to do it in the shortest time possible. ![](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/accounts-1712890536083-compressed.gif) With the revamped setup, you can now integrate your VCS and Cloud accounts with Facets in under 20 seconds. This simplifies the intricate steps involved in setting up new Cloud and VCS accounts. ### JumpCloud Integration for OAuth Facets provides the flexibility to integrate with your existing OAuth systems so that you don’t have to set up IAM permissions again. In addition to Google OAuth, Azure AD, Okta, OneLogin, and Generic SSO, we’ve now added JumpCloud SSO. [JumpCloud](https://jumpcloud.com/) is an open directory platform that unifies user identity and access across infrastructure, security, and resources. General updates --------------- ### GCP Enhancements For Google Cloud, we have added support for three important features that are harder to set up manually. **Shared VPC**: In large organizations with multiple teams or projects, a Shared VPC allows for centralized network management and resource sharing across these teams or projects. This simplifies network administration, ensures consistent security policies, and optimizes resource utilization. It also helps in managing multi-tier applications distributed across different projects by allowing seamless communication between the components while maintaining network policies. While creating a GCP environment, users can now specify the Shared VPC configuration in the Advanced Settings section. **Support for multiple availability zones:** Deploying critical applications and services across multiple availability zones is crucial for maintaining business continuity and minimizing downtime in the event of a disaster or outage.
By distributing resources across separate availability zones, organizations can ensure that if one zone experiences a failure, the application or service can continue running in the other zones, minimizing disruption to end-users and business operations. ![GCP enhancements](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/gcp-1712889963755-compressed.png) **Node Auto-provisioning in GKE:** Node auto-provisioning actively manages the node pools within the cluster to ensure efficient resource utilization and scalability. It monitors the resource usage of pods running in the cluster and adjusts the size of node pools accordingly. It’s beneficial in various scenarios like zero downtime, cost optimization, handling bursty workloads, multi-tenancy, disaster recovery, and high availability. ### **ConfigMaps as env variables** Facets now supports adding ConfigMaps as environment variables in service modules. ConfigMaps are used to store non-sensitive configuration data. Users can add ConfigMaps as environment variables while adding or editing the configuration of a service module. Once added, users only need to update the ConfigMap when something changes, rather than updating the configuration of every service individually. [Read more about ConfigMaps as environment variables here.](https://kubernetes.io/docs/concepts/configuration/configmap/) ### VCS Account Expiry Notification PAT tokens can expire or be revoked for GitHub, GitLab, or Bitbucket. This causes users to lose access to critical functions on Facets like performing a release, adding new resources, viewing resources on the blueprint designer, managing secrets & variables at the blueprint level, and more. * On the Account Management page, accounts that have expired or are due to expire within 7 days are highlighted via icons. ![](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/token-expiry-1712851079760-compressed.png) * If a linked VCS account expires, the “Sync with Git” option will be automatically disabled. * All these enhancements help identify whether an error is caused by PAT token expiry or by other causes. ### Custom CIDR ranges with WireGuard VPN By default, you have access to resources within the VPC CIDR range when connected to the WireGuard VPN client. However, you can now add custom CIDR ranges from the Environment Settings page, providing additional flexibility and control over your network access. ### Optional On-demand Fallback in AWS and GCP Environments Earlier in Facets, on-demand nodes were automatically scaled up for AWS and GCP Environments to tackle Spot interruptions. With this enhancement, you now have the option to disable the On-demand Fallback feature from the Facets Control Plane. If you choose to turn off On-demand Fallback and spot instances are unavailable, on-demand nodes will not scale up. You can use this option for development, staging, or QA environments, as well as special workloads that don't need continuous uptime, helping you optimize cloud costs. This also improves budget predictability and helps align resource allocation with strategic objectives. Enhancements ------------ ### Upgrade to Kubernetes 1.27 (GKE) We have upgraded the Google Kubernetes Engine (GKE) to version 1.27. For more information, refer to the [GKE Release Notes](https://cloud.google.com/kubernetes-engine/docs/release-notes#current_versions). ### Release Time Optimization We have made enhancements to the release times in Facets.
This results in reduced wait times and a seamless user experience. ### Resource Flavor Selection ![](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/screenshot-2024-04-11-at-9-1712851494569-compressed.png) Previously, users had to manually edit resource JSON files to change resource flavors. Now, users can conveniently select the desired resource flavor when creating the resource, simplifying the configuration process. For example, right when you’re creating a database, you can select a Cloud SQL instance in GCP, an RDS or Aurora instance in AWS, or a flexible\_server in Azure. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## A Primer to Key KPIs in Platform Engineering Author: Facets.cloud Published: 2024-04-11 Category: Blogs Meta Title: Key KPIs for Measuring Platform Engineering Success Meta Description: Delve deep into the key leading and lagging indicators to effectively evaluate the impact of platform engineering initiatives on operational efficiency, developer experience, and cloud infrastructure. Tags: Developer experience, Developer Productivity, platform engineering URL: https://blog.facets.cloud/key-kpis-for-platform-engineering-success ![KPIs to Platform engineering](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/pikasotexttoimagecomputer-with-trends-and-charts-1715238953353-compressed.jpeg) [Platform engineering](https://blog.facets.cloud/platform-engineering-a-guide/) revolutionizes how technology meets specific needs, improving how developers work and making operations smoother. It connects and refines tech parts, essential for companies aiming at efficiency and innovation, pushing them ahead in their industries. Yet, seeing platform engineering's real effects can be tricky. It's like piecing together a complex puzzle where immediate results aren't always visible, complicating the assessment of its impact on developer happiness and operational efficiency. Some benefits might take time to show, requiring patience and attention. To tackle these evaluation challenges, we use two key concepts: leading and lagging indicators. Leading indicators predict future trends, acting as early alerts, while lagging indicators confirm these trends after the event, helping us understand the long-term benefits. * **Leading Indicators**: These serve as our early warning system. Similar to the dashboard indicators in a vehicle, they quickly alert us to potential issues, allowing us to make adjustments before minor issues escalate into major problems. They provide a snapshot of current operations, enabling us to act swiftly and effectively. * **Lagging Indicators**: These come into play after we've made changes, offering a delayed reflection of those actions. They help us understand the full impact of our initiatives, confirming whether the adjustments we've made are delivering the desired long-term benefits. Now, let's delve deeper into the specific indicators we track to gauge the [success of our platform engineering](https://thenewstack.io/6-patterns-for-platform-engineering-success/) initiatives: 1\. **Reduction in Operational Tasks** -------------------------------------- * **Description**: This indicator measures the decrease in manual, repetitive tasks that often bog down the operations team. * **Significance**: By reducing these tasks, we free up our team to concentrate on more valuable activities, such as enhancing system performance and service quality.
This shift not only boosts creativity within our IT infrastructure but also enables our organization to swiftly adapt to new challenges. * **Leading Indicators**: * **Decrease in Operational Tickets**: We track the reduction of manual operations tickets created by development teams, indicating smoother workflows post platform engineering implementation. * **Shorter Environment Provisioning Times**: We measure how much quicker new testing or business environments can be launched, thanks to automation and self-service capabilities. * **Fewer Operational Incidents**: We evaluate the decline in infrastructure failures and the speed of recovery, reflecting an overall improvement in system reliability. 2\. **Enhanced Developer Experience and Productivity** ------------------------------------------------------ * **Description**: This focuses on improving the satisfaction and efficiency of our developers. * **Significance**: Enhancing the [developer experience](https://blog.facets.cloud/ideal-development-environment-for-optimal-software-development/) leads to faster onboarding of new team members, improved collaboration across teams, and ultimately, quicker and more reliable software deliveries. These improvements play a critical role in driving our organization forward. * **Leading Indicators**: * **Accelerated Onboarding**: We assess how platform engineering reduces the time required for new developers to become fully productive, thanks to [standardized processes across teams](https://blog.facets.cloud/standardization-in-security-cost-compliance-and-observability/). * **Increased Deployment Frequency**: We monitor the rate of software releases, which signals a boost in confidence and autonomy among developers regarding their work. * **Optimized Time on Internal Developer Platforms ([IDP](https://blog.facets.cloud/internal-developer-platforms-the-secret-weapon-for-developer-productivity/))**: We gauge the quality and quantity of time developers spend on the IDP, looking for improvements in usability and efficiency. 3\. **Improved Cloud Infrastructure** ------------------------------------- * **Description**: This indicator focuses on the stability, security, and efficiency of our cloud services. * **Significance**: A strong cloud infrastructure is essential for any organization that relies on cloud technology. Improved compliance with best practices in cloud services reduces vulnerabilities and operational risks, which is crucial for maintaining high service availability and data security. This, in turn, strengthens trust among stakeholders and customers. * **Leading Indicators**: * **Cost Efficiency**: We track reductions in costs, especially in non-production environments, as an early indicator of broader financial benefits. * **Streamlined Upgrades**: We monitor the time spent on system upgrades, aiming for quicker transitions to newer, more secure versions. * **Reduced Compliance Issues**: We keep an eye on the frequency of non-compliances detected in security scans, such as audits or penetration tests, as a measure of our improved cloud security posture. Through meticulous tracking of these leading and lagging indicators, we are able to effectively evaluate the impact of our [platform engineering](https://blog.facets.cloud/next-in-devops-a-user-centric-platform-engineering-approach/) efforts.
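As a simple illustration of how a couple of these leading indicators might be tracked, here is a minimal Python sketch. The ticket counts and provisioning times below are hypothetical sample numbers, not measurements from Facets or any real organization; the only point is that once the raw data is collected, the indicators themselves are cheap to compute.

```python
from statistics import mean

# Hypothetical monthly counts of manual ops tickets raised by dev teams,
# before and after a platform engineering rollout (illustrative only).
tickets_before = [142, 150, 138, 161]
tickets_after = [97, 84, 76, 70]

# Hypothetical environment provisioning times, in minutes.
provisioning_before = [240, 210, 195]
provisioning_after = [35, 28, 30]

def pct_reduction(before, after):
    """Percentage drop in the average value of a metric (positive = improvement)."""
    return (mean(before) - mean(after)) / mean(before) * 100

print(f"Operational tickets reduced by {pct_reduction(tickets_before, tickets_after):.0f}%")
print(f"Provisioning time reduced by {pct_reduction(provisioning_before, provisioning_after):.0f}%")
```

However the numbers are gathered, the value comes from tracking them meticulously over time and reading them alongside the lagging indicators.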
This ensures that our operations are not just efficient but are also conducive to creating a positive experience for our developers and a robust cloud infrastructure. ![KPIs to Platform Engineering](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/screenshot-2024-05-09-at-3-1715249983059-compressed.png) Gauging the Future Impact of Platform Engineering ---------------------------------------------------- At its heart, platform engineering is all about ongoing improvement. By closely watching our metrics, we make sure we're moving in the right direction. Leading indicators act as a preview, allowing us to adjust early on. Lagging indicators, on the other hand, show us how far we've come in improving operations and making developers happier. Platform engineering isn't just a method; it's a journey towards lasting success in technology. It emphasizes staying focused, striving for excellence, and adapting to a more agile world. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Platform Engineering: Which side of the fence should you jump? Author: Pravanjan Choudhury Published: 2024-04-04 Category: Tech Articles Meta Title: Platform Engineering: Is It Right for Your Organisation? Meta Description: Is platform engineering right for your organization? Understand the key benefits and factors to consider before adopting platform engineering. Tags: devops, platform engineering URL: https://blog.facets.cloud/platform-engineering-which-side-of-the-fence-should-you-jump ![platform engineering](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/screenshot-2024-05-09-at-4-1715252169810-compressed.png) [Platform engineering](https://blog.facets.cloud/next-in-devops-a-user-centric-platform-engineering-approach/) is about creating tools and services that help developers write high-quality code easily, without worrying about infrastructure. It speeds up projects, makes developers' jobs easier, and ensures consistency in teams.  While platform engineering offers solutions to many digital challenges businesses face today, adopting it requires careful thought. Organizations must assess their software development capabilities and readiness for change. In this blog, we'll dive into platform engineering, urging you to consider [whether your organization is ready for it](https://blog.facets.cloud/ops-roles-evolve-into-platform-engineers/). Are you set to benefit from platform engineering, or do you have more critical goals to achieve first? Let's explore the benefits and key considerations for making a well-informed decision. Who should SKIP platform engineering? ------------------------------------- **Startups seeking product-market fit** should aim for simplicity. During this crucial stage, quick prototyping and fast changes are key. Complex platform engineering could slow things down. Simple, cost-effective ways to deliver applications, or even no-code options, are often better choices. **Enterprises using off-the-shelf software** should also think twice. These organizations usually juggle legacy systems, packaged software, and custom solutions. Since vendors of third-party software like ERP systems often provide deployment and management guidance, adding platform engineering might complicate things unnecessarily, increasing both complexity and costs. Some **companies have products that hardly change**. 
For them, adding the speed and flexibility of platform engineering is unnecessary—like putting a race car engine in a horse carriage. If your product updates are infrequent and slow, you likely won't benefit from the extra agility platform engineering offers. Then there are **companies that stick to the traditional waterfall model for software development**. For these businesses, the agile and iterative approach of platform engineering might seem incompatible. If the structured, step-by-step process of the waterfall model has always worked for you, you might not see the value in adopting platform engineering. Who should say YES to platform engineering? ------------------------------------------- **Speed Wins in Tech-Rich Fields** In the fast-paced worlds of consumer tech, gaming, or SaaS, being quick to market is everything. Getting a new feature out first can skyrocket a company to the top, attracting users and boosting profits. Platform engineering is the secret sauce to this speed, making it easier and faster to launch new stuff. Here, being fast isn’t just nice—it’s essential to staying ahead. **SaaS and the Importance of Behind-the-Scenes Details** If you’re running a SaaS company, things like how much the service costs to run, keeping data safe, and following rules are super important. If these aren’t handled well, especially as the company grows, it can lead to big problems. Platform engineering helps manage these tricky parts, making sure customers are happy and the company makes more money. It’s all about being quick and efficient. **Everyone on the Same Page: Big Companies Need Standardization** Big companies often struggle with making sure all their tech teams are doing things the same way. When every team has its own playbook, things can get messy. Platform engineering brings everyone together, offering a [standard set of rules](https://blog.facets.cloud/standardization-in-security-cost-compliance-and-observability/) that works no matter where a team is or what it’s working on. **Filling the Skills Gap** Finding people who know how to build and manage cloud systems is tough. But what if your current developers could do it all? Platform engineering makes this possible by simplifying the tricky parts of cloud work and getting rid of repetitive tasks. This means you don’t have to hunt for hard-to-find specialists, and your team becomes more skilled and flexible. In sum, for certain businesses, platform engineering is not just another tech trend. It’s a crucial strategy for staying quick, efficient, and united. It’s about making sure your team can do more, and do it better, without constantly searching for more people to hire. ### Ready to jump the fence? Before jumping into [platform engineering](https://blog.facets.cloud/platform-engineering-a-guide/), take a moment to see the bigger picture. It's easy to follow the crowd, attracted by the promise of faster and more streamlined operations. However, it's essential to consider whether it's the right choice for your organization. Not every company needs the latest and fastest technology; sometimes, reliability and mastering the basics are more crucial. Evaluate where your company currently stands. Do your objectives, existing technology, and team capabilities align with what platform engineering can offer? If they do, platform engineering could significantly accelerate your projects. If not, you might benefit more from improving your current setup or investing in your team's skills development. 
Understanding your company’s specific needs and growth strategies is key. In conclusion, choosing [platform engineering](https://thenewstack.io/platform-engineering-yes-no-a-guide-to-making-the-call/) should be more than just following a trend. It's about thoroughly assessing its fit with your company's goals and plans. Whether you decide to adopt it or focus on enhancing your existing processes and team skills, the goal is to make a deliberate choice. Informed decisions are what set you on a path to genuine growth and success in the tech landscape. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Efficiency by Design: Your Secret to Cloud Cost Optimization Author: Rohit Raveendran Published: 2024-03-27 Category: Blogs Meta Title: Efficiency by Design: Your Secret to Cloud Cost Optimization Meta Description: Overcome endless cloud cost rises with a proactive, design-first approach. Leverage platform engineering for smarter, more efficient cloud management and cost visibility. Tags: platform engineering, cloud cost optimization URL: https://blog.facets.cloud/cloud-cost-optimization-efficiency-by-design ![Cloud Cost Optimization](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/7005266-1715252546976-compressed.jpg) Endlessly rising [cloud costs](https://blog.facets.cloud/cloud-cost-optimization-by-design/) are a problem many businesses know all too well. Trying to cut these costs with standard audits and tools often leads to only short-term relief, without solving the underlying issue. Think of audits as regular health checks—they're helpful, but they don't cure the fundamental problems causing costs to spiral. It's time for a change in tactics. The solution lies in a proactive, ground-up approach to managing cloud costs. Instead of looking for quick fixes, the aim is to build an efficient system from the start, focusing on preventing problems before they arise. Platform engineering shines here, offering smarter, more efficient cloud management techniques. It's about navigating through cloud costs with greater accuracy and planning. Over the past decade, the cloud has made it easier for developers to deploy and scale applications, thanks to giants like [AWS](https://aws.amazon.com/), [Azure](https://azure.microsoft.com/en-in) and [GCP](https://cloud.google.com/). However, as businesses grow, cloud expenses can surge, sometimes eating up to 50% of total revenue. For most SaaS applications, crossing the 7%-10% mark prompts a re-evaluation of cloud spending, leading to efforts to optimize and manage costs. Fortunately, new tools and platforms are emerging to address these challenges. Solutions from cloud providers and other platforms like Zesty, Apptio, and Harness are making it easier to see and optimize costs. This piece explores how adopting a design-first approach can transform cost visibility and management in the cloud, offering a strategic way to tackle cloud expenses effectively. The Band-Aids: Attempted Approaches to Addressing Cloud Costs ---------------------------------------------------------- When organizations see their cloud costs climbing, they typically turn to different strategies to get them under control. But these methods are usually reactive, aimed at cutting back on current expenses rather than stopping high costs before they start. Let's simplify these common tactics:

| Strategy | Description |
| --- | --- |
| **Audit** | Utilizes continuous audit cycles to identify and address areas of excessive spending. |
| **Manual Oversight** | A centralized team reviews cost dashboards to pinpoint and inform departments of overspending. |
| **Project Tracker** | Implements a tracking system to monitor cost reduction efforts and keep stakeholders informed. |
| **Tools & Anomaly Detection** | Employs specialized tools for better cost analysis and anomaly detection, some allowing automated responses. |
| **Ops Team Responsibility** | Assigns the operations team the task of managing costs, adding to their already heavy workload. |

Although these strategies can provide quick relief by trimming unnecessary costs, they don't create a cost-effective system from the start. This results in a pattern of temporary solutions rather than a lasting fix. While traditional ways of managing cloud costs are useful, they highlight the importance of moving towards a more proactive approach. By adopting platform engineering principles early on, organizations can lay the groundwork for enduring cost efficiency. This shifts the focus from short-term fixes to a long-term strategy centered on prevention and sustainability. The New Era of Platform Engineering ----------------------------------- Optimizing cloud costs from the start is a forward-looking strategy that reduces cloud expenses efficiently as your cloud usage grows. This approach avoids the need for frequent, complex adjustments. By incorporating platform engineering, companies can combine cutting-edge cloud development with smart spending. Platform engineering leads the way in smart cloud cost management. It provides a systematic way to handle cloud expenses effectively and sustainably. Mixing platform engineering methods with cost-saving tactics allows businesses to balance innovation with spending wisely in their cloud activities. Here’s how the integration of platform engineering principles enhances cloud optimization by design: ### **Reducing Unnecessary Costs** This tackles the issue of resources that remain idle, resulting in needless expenditure. * Employ automation tools, such as [Facets](https://www.facets.cloud/), to efficiently identify and eliminate these unused resources. Enforcing smart usage policies for resources like S3 buckets can also significantly cut down on waste. ### **Effective Tagging for Cost Management** The practice of using tags to systematically track and manage cloud spending improves both transparency and accountability. * Encourage developers to adopt thorough tagging practices, enhancing their awareness of costs. Facets, for example, can assist by providing precise tagging at the micro-service level, offering detailed visibility into the financial impact of each project. ### **Optimizing Non-production Environments** This addresses the high costs associated with development and testing environments, which are crucial for operations but often a source of inflated cloud bills. * Implement strategies like creating ephemeral clusters for targeted testing, optimizing resource use through application compaction, and leveraging spot instances to meet variable demands, all of which can lead to substantial savings. Facets plays a pivotal role in streamlining these tactics by simplifying resource management in non-production settings. ### **Using Spot Instances Wisely** Spot instances come at a much lower cost than traditional on-demand instances, but with added unpredictability. * Carefully incorporate spot instances into your cloud infrastructure, weighing the financial benefits against potential operational risks. Facets can guide this process by providing mechanisms to ease the transition and manage the balance between cost-efficiency and system reliability.
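To make the spot-versus-on-demand trade-off concrete, here is a rough back-of-the-envelope sketch in Python. All of the rates, discounts, and fallback percentages are hypothetical placeholders, not figures from Facets or any cloud provider; the point is only that even with a modest on-demand fallback share, a largely spot-backed node pool can cut the compute bill substantially.

```python
def blended_monthly_cost(on_demand_rate, spot_discount, fallback_fraction, node_hours):
    """Rough blended cost when most capacity runs on spot instances.

    on_demand_rate    -- hourly on-demand price for the instance type
    spot_discount     -- e.g. 0.70 means spot is ~70% cheaper than on-demand
    fallback_fraction -- share of node-hours served by on-demand nodes
                         because spot capacity was interrupted
    node_hours        -- total node-hours consumed in the period
    """
    spot_rate = on_demand_rate * (1 - spot_discount)
    spot_hours = node_hours * (1 - fallback_fraction)
    on_demand_hours = node_hours * fallback_fraction
    return spot_hours * spot_rate + on_demand_hours * on_demand_rate

# Illustrative numbers only: a $0.10/hr instance type, ~70% spot discount,
# 10% of hours falling back to on-demand, 10,000 node-hours per month.
blended = blended_monthly_cost(0.10, 0.70, 0.10, 10_000)
all_on_demand = 0.10 * 10_000
print(f"Blended: ${blended:,.0f} vs all on-demand: ${all_on_demand:,.0f}")
```

A sketch like this is obviously a simplification; it ignores interruption-handling overhead and workload-specific constraints, which is exactly why non-production environments and interruption-tolerant workloads are usually the first candidates for spot capacity.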
This strategy encourages a culture where developers are conscious of their cloud usage and its financial implications. By embracing platform engineering principles, businesses can achieve a cost-optimized cloud environment that supports both innovation and cost savings, ensuring sustainable cloud operations in the long run. Unlocking Sustainable Cloud Efficiency -------------------------------------- Moving from quick, temporary fixes to thoughtful, proactive planning is crucial for better cloud cost management. Central to this change is platform engineering, which encourages designing cost-effective cloud setups from the beginning. This approach helps address current spending issues and supports efficient cloud usage in the long run. The key to effective cloud cost management is forward planning and the use of the right tools. By using [Facets.cloud](https://www.facets.cloud/) and adopting platform engineering principles, businesses can build powerful, yet cost-efficient, cloud solutions. This strategy ensures that companies can leverage cloud technology's advantages without overspending, leading the way to a future where innovation and affordability are aligned. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Crafting the Ideal Development Environment for Optimal Software Development Author: Anshul Sao Published: 2024-03-21 Category: Tech Articles Meta Title: Improve Dev Experience with Specialized Environments Meta Description: This blog post guides you through developer environment setup strategies - shared development, multiple isolated, local environments - balancing cloud costs, team collaboration, and coding flexibility for efficient software development. Tags: Developer experience, Developer environments URL: https://blog.facets.cloud/ideal-development-environment-for-optimal-software-development Creating excellent software begins with a crucial yet often unseen phase: repeated tests in local environments. This stage allows teams to test their projects in a risk-free zone, safeguarding shared environments like integration test environments or production environments. Our discussions with over 200 companies revealed a gap: while many have test and quality assurance setups, Developer Playground Environments are less common. These playgrounds are essential as they offer developers a secure space to freely experiment with new ideas without affecting either the shared test environments or the live product. Our findings highlight two key needs: * **Personal or Team-Specific Environments:** Developers and their teams want their own private spaces to test and refine new or experimental features freely. These personal sandboxes would enable quick innovation and idea improvement without limitations. * **Feature-Specific Environments:** There's a high demand for environments designed around specific features being developed. These settings would allow for the parallel testing of various features, enabling teams to independently assess and tweak them without affecting each other's work. The Developer Playground stands as a bridge between theory and practice, a sandbox for innovation that is distinct from where final products are crafted. This separation fosters a culture of creativity and continuous improvement, resulting in software that is both functional and cutting-edge.
However, establishing these playgrounds for complex or extensive projects on local systems can be challenging, underscoring the need for scalable and flexible solutions. Recognizing developer needs is critical as inadequate development environments can lead to project delays, diminished quality, and increased costs. We will examine four strategies to create these environments, evaluating their costs, advantages, and optimal applications. This ensures developers have the necessary tools for creativity and efficiency, aiming for the highest quality in software development. Options for Setting Up Developer Environments --------------------------------------------- Organizations facing common development environment challenges have several options to mitigate these issues and manage costs. They can choose from or combine the following approaches: * **Shared Developer Environment with Telepresence** * Use Case: Ideal for debugging integration issues and implementing patches. It eliminates the need for local dependency runs. * Key Features: * Common cloud-based environment for all developers. * Telepresence allows developers to reroute shared environment traffic to their local setup for fast editing and deployment. * Considerations: * Challenging for testing significant feature overhauls with backward incompatibilities. * ​[Cloud costs](https://blog.facets.cloud/cloud-cost-optimization-by-design/) are controlled since one environment supports multiple developers. ** ![undefined](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1711012790050-compressed.png) ** * **Multiple Isolated Environments** * Use Case: Best suited for parallel testing of large or complex feature rollouts. * Key Features: * Developers or small feature teams can create fully isolated cloud environments for independent testing. * Considerations: * Developers need the ability to select and set up the necessary subset of services. * Potential for increased costs due to multiple environments. ** ![undefined](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1711012792290-compressed.png) ** * **Local Environments** * Use Case: Recommended during early development stages with fewer services and simpler databases. * Key Features: * Developers replicate a production-like environment locally using deployment mechanisms or Vagrant boxes. * Necessary services and databases are bundled; remote or mocked cloud services for those that can't be bundled. * Considerations: * Setup can be complex but offers a more immersive development experience. * Resource limitations on local machines can hinder productivity. * Regular updates are necessary to prevent drift from cloud environments. ** ![undefined](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1711012794375-compressed.png) ** * **Partially Isolated Environments** * Use Case: Useful for testing stateless microservices in development by multiple teams. * Key Features: * Shared cloud environment utilizing request-header manipulation (e.g., Envoy) for version control of services. * Utilizes a sidecar pattern for isolating services under development. * Considerations: * Header-based routing can complicate system-wide implementation. * Not suitable for all service types, especially those using shared resources or requiring cache invalidation. 
![undefined](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1711012796539-compressed.png) Each of these environments offers unique benefits and challenges, allowing organizations to tailor their development infrastructure to meet specific needs while managing costs and maintaining productivity, summarized in the table below. ### **Comparison of Developer Environment Strategies**

| Factor/Environment | Shared Development Environment | Multiple Isolated Environments | Partially Isolated Environments | Local Environments |
| --- | --- | --- | --- | --- |
| Drift Management | Simple | Simple | Difficult | Most Difficult |
| Number of Environments | One per Team | One per Developer | One per Team | One per Developer |
| Setup Complexity | Complex | Easy | Complex | Easiest |
| Cloud Cost | Moderate ($$) | High ($$$$) | Moderate ($$) | Low ($) |

Setting up the right environments for developers is all about finding the right balance. Shared and Multiple Isolated Environments make syncing with production easy, saving time and effort. However, keeping Partially Isolated and Local Environments consistent requires more work. Deciding between shared or personal spaces impacts teamwork and costs. Shared environments boost team collaboration but may restrict solo experiments. Personal cloud spaces offer freedom but can be costly, while local setups are affordable and simple to start but need regular updates to be effective. The ideal choice hinges on a company's priorities—whether it's fostering teamwork, saving money, or providing developers with their own workspaces. Finding the Right-Fit Developer Environment Strategies ------------------------------------------------------ Exploring different strategies for developer environments reveals there's no one-size-fits-all answer. Options range from cost-effective Shared Development Environments, which save money, to Multiple Isolated Environments that offer developers more freedom. Local Environments are budget-friendly and easy to set up but need constant updates to stay current. Partially Isolated Environments offer a middle ground, providing some privacy without the high costs of complete separation. Ultimately, picking the right environment goes beyond just tech; it's about matching the team's workflow and the company's budget. It involves carefully balancing developers' independence with the team's cohesion. By understanding each option's advantages and drawbacks, organizations can craft a development space that fosters both innovation and efficiency. This ensures teams have what they need to create exceptional software that delights users. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Mastering Cloud Flexibility with Dynamic Cloud Interoperability (DCI) Author: Rohit Raveendran Published: 2024-03-14 Category: Tech Articles Meta Description: Leverage Dynamic Cloud Interoperability for seamless cloud migration between AWS, Azure, GCP - automated documentation, zero downtime, consistent environments, enabling agile multi-cloud strategies. Tags: cloud interoperability, cloud strategy URL: https://blog.facets.cloud/mastering-cloud-flexibility-with-dynamic-cloud-interoperability ![Dynamic Cloud Interoperability](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/57838-1715255128814-compressed.jpg) [Platform engineering](https://blog.facets.cloud/next-in-devops-a-user-centric-platform-engineering-approach/) is revolutionizing the way companies create and manage software.
It's making them nimble, ready to adapt to market shifts, and primed to leverage cloud technology. This strategy is key in navigating the intricate digital environment and seizing fresh opportunities. Cloud portability has become essential, allowing organizations to easily switch between cloud providers or use [multiple clouds](https://www.facets.cloud/multi-cloud-infrastructure?source=blog). This flexibility is crucial for meeting customer needs, entering new markets, adhering to legal rules, and cutting [costs](https://blog.facets.cloud/cloud-cost-optimization-by-design/)—all key to staying competitive. Many companies start by partnering with one cloud provider, deeply integrating their system to use specific features and improve efficiency. However, this can lead to a dependency on that single provider, making it hard to switch to another provider or use additional cloud services later on. Being locked into one vendor can limit a company's growth and its ability to adapt to new tech changes. Cloud portability is vital for avoiding these issues, giving businesses the ability to stay flexible and responsive. By focusing on adaptability, companies can remain competitive and ready for new technological advances. **The Challenges of Cloud Portability** --------------------------------------- As companies expand, the idea of cloud portability—switching between cloud providers— becomes appealing due to its advantages. However, switching between cloud providers can present several challenges. Here are the main obstacles encountered during these transitions: * **Outdated Documentation**: The journey begins with needing current documentation, which is often missing. This discrepancy makes starting tough, as the existing setup might not align with what's documented. * **Learning New Skills**: Transitioning to a different cloud requires the team to learn new things, taking time away from regular duties. This learning period can lead to setups that aren't as efficient, due to unfamiliarity with the new environment. * **Rewriting Automation**: Scripts designed for one cloud usually need changes to work on another, a task that can get complicated. * **Pausing Development**: Updating automation for the new cloud can slow down or even halt development, potentially causing project delays. * **Managing Environment Drifts**: Moving from one cloud setup to another can create confusion and inconsistency, particularly as applications evolve. * **Training Teams**: Getting used to a new cloud means mastering new tools and practices, which can temporarily lower productivity and prolong the transition phase. Preparing for and understanding these challenges is crucial for a smooth shift to using multiple clouds, allowing businesses to fully benefit from cloud portability without the drawbacks. How Facets Simplifies Cloud Switching with Dynamic Cloud Interoperability ------------------------------------------------------------------------- Dynamic Cloud Interoperability ([DCI](https://blog.facets.cloud/dynamic-cloud-interoperability-redefining-cloud-agnosticism/)) is a breakthrough approach to easing [cloud migration](https://www.facets.cloud/cloud-migration?source=blog) challenges. It makes switching between cloud services like [AWS](https://aws.amazon.com/), [Azure](https://azure.microsoft.com/en-in), or [GCP](https://cloud.google.com/) much easier. It adds a layer that simplifies moving services, such as databases, between these platforms. 
This way, you can use different cloud services without redoing your application work, getting the best from each cloud without the usual difficulties. ### DCI makes cloud migration easier by: * **Clearer Documentation**: DCI provides easy-to-understand documentation for any cloud, simplifying the migration process. * **Less Learning Required**: It automatically incorporates best practices, so teams don't have to learn everything anew for each cloud. * **Automatic Automation Updates**: DCI updates your automation setups for you, removing the need for manual changes during cloud transitions. * **No Development Pauses**: Development can continue without interruption, as DCI ensures continuous delivery across different clouds. * **Stable Environments**: It maintains consistency in your cloud environments, preventing issues that arise from moving between clouds. * **Easier for Developers**: DCI's user-friendly interface means developers don't need a lot of retraining to adapt to new clouds. Best Practices for Using Cloud Services with DCI ------------------------------------------------ When using cloud services with Dynamic Cloud Interoperability (DCI), we follow a simple approach to deciding on our cloud service usage, categorizing choices into "blue cloud" for straightforward decisions and "gray cloud" for more complex ones.  For standard cloud services, we recommend sticking to basic services, such as AWS's Aurora, to facilitate easy migration without relying heavily on special features—this is our "blue cloud" approach. However, for "gray cloud" scenarios, using unique features may complicate transitions to other clouds.  When it comes to popular cloud features, opting for widely-used services like AWS S3, along with cloud-agnostic tools, ensures ease of use across different platforms. This falls under "blue cloud" as well. But, deeply integrating these services without flexibility falls into the "grey cloud" category, as it can hinder migrations.  Lastly, for unique cloud capabilities, we suggest wrapping special features in a flexible manner, such as using S3 select, to keep your migration options open and ensure seamless adaptation to other clouds when necessary. Finding the Balance in Cloud Strategy ------------------------------------- ** ![Dynamic Cloud Interoperability](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1710420795297-compressed.png) ** Being cloud-agnostic is about quickly adapting to new clouds, not just using many or switching without reason. It allows you to plan ahead without investing in tools for other clouds immediately. You can still enjoy each cloud's unique perks by managing your setup wisely. A good cloud strategy leverages the best each cloud has to offer while being ready to switch if needed. It involves choosing widely used services, adding flexible layers, simplifying processes, and employing automation tools. This approach ensures your cloud setup is robust yet adaptable to changes. In essence, smart cloud usage means using the best features of each provider while staying ready to change as needed. This strategy ensures you benefit from the cloud today and keeps you open to future opportunities. Achieving this balance is crucial for creating a cloud environment that's both powerful and flexible, prepared for the future. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. 
--- ## Enhancing Software Development with Integrated Observability by Design in SDLC Author: Anshul Sao Published: 2024-03-06 Category: Blogs Meta Description: Learn how integrating observability components into every phase of the Software Development Life Cycle (SDLC) enhances software quality, reliability, and performance, while fostering a culture of continuous improvement and innovation. Tags: Observability, SDLC URL: https://blog.facets.cloud/enhancing-software-development-with-integrated-observability-by-design-in-sdlc Observability is key to keeping systems running smoothly, especially in software development. It gives developers deep insights, allowing them to solve problems early, much like a doctor diagnoses an illness by its symptoms. Observability keeps digital systems resilient and efficient by monitoring signs that indicate their health. Leaders in the field, such as Newrelic, Datadog, and standards like OpenTelemetry, have transformed how monitoring is done. They offer sophisticated tools that capture everything from code behavior to overall infrastructure health. These tools help developers collect detailed data, store it efficiently, and use advanced visuals to understand it better. This not only makes diagnosing problems easier but also helps improve system performance and reliability. The process involves three key steps: adding measurement tools to the system, storing data in a way that's easy to manage, and using visual tools for a clear view of the system's state. This methodical approach allows developers to keep a close eye on software, analyze data effectively, and make decisions that enhance stability and performance. After Ops teams set things up, developers personalize observability with dashboards, alerts, and debugging tools. However, several challenges persist: * **Quality Assurance:** It's crucial to rigorously test observability components to ensure they're reliable. * **Deployment Consistency:** It's important to make sure that no data points or alerts are overlooked in various deployments. * **Organizational Standardization:** There's a need for a uniform observability approach to avoid differences based on personal or team preferences. * **Component Discoverability:** Establishing a central place for easy access to observability tools is essential for promoting efficiency and teamwork. Tackling these issues is key to enhancing the effectiveness of observability, thereby improving system stability and operational performance. ** Spotting Observability Gaps and Blind Spots** Identifying weaknesses is key to a robust observability framework. Signs of trouble include unreliable alerts, difficulties with analyzing incidents, and ongoing issues within teams. Fixing these problems enhances both system management and reliability. These challenges often reveal deeper issues: * **Unreliable Alerts:** When our alert system is quiet, it might not be a sign of stability but a warning of missing data or oversight. Properly identifying what signals to track is vital to ensure alerts truly represent the system's condition. * **Unclear RCA:** Struggling to find the root cause after an incident suggests we might be missing crucial data or not have enough tools in place. If investigations frequently hit a dead end, it's a sign that standardization of observability components might need a boost. * **Recurrence of similar issues:** When the same problems keep happening across different teams, it points to a gap in how we share and apply knowledge. 
This suggests a need to improve how we ensure that once an issue is fixed, it stays fixed.

Our analysis of the outlined challenges has led us to an insightful conclusion. The difficulties we face stem not from the tools at our disposal but from the struggle to maintain consistent configurations across deployments. Recognizing this, we've shifted our focus towards a more streamlined approach. We now prioritize the delivery of observability artifacts over the traditional method of manual configuration. This change in strategy is designed to bypass the inherent complexities of setting up each tool individually. By doing so, we ensure that our observability framework is both consistent and dependable, enhancing the overall reliability of our systems.

Embrace the Future: By-Design Observability
--------------------------------------------

**How do we do it at Facets?**

At [Facets](https://www.facets.cloud/), we meticulously shape our processes to ensure consistent success throughout the Software Development Life Cycle (SDLC). Our strategy places a strong emphasis on observability, treating it as essential to deployment, just as critical as release artifacts. This principle guarantees that observability is woven into the fabric of our development process from the start.

* **Integration of Observability Components:**
  * **Across All Development Phases:** Embedding these elements at each stage ensures seamless accessibility and deployment.
  * **Boosting Reliability and Efficiency:** Our systems benefit from increased reliability and efficiency, reflecting our dedication to quality.

Our approach reflects a deep commitment to maintaining and enhancing software health proactively. By integrating observability components as fundamental elements of the deployment process, we:

* **Ensure Smooth Deployment:** Observability tools are readily available for easy integration.
* **Showcase Our Commitment to Excellence:** This method demonstrates our ongoing dedication to the health and performance of our software from the outset.

### Let's see how at each stage of the SDLC we can add observability:

**Plan Phase:** We start by setting clear Service Level Objectives (SLOs) and Service Level Agreements (SLAs) to outline performance goals. Product teams also define key business metrics to track how well a new feature is being adopted and performing. Technical leads go deeper, pinpointing specific metrics like API or database performance for a detailed analysis of the feature's success.

**Develop Phase:** We embrace "metric discovery," a game-changing feature from the Open Metrics project, making it easy to automatically find and collect metrics in a unified way. Applications need to include metadata for this, like in Helm charts, to simplify metric setup. We also integrate visualizations and alerts, using tools like [Grafana](https://blog.facets.cloud/simplifying-log-management-with-grafana-loki-and-facets/) for dashboards and Prometheus for alerts, making observability proactive from the start.
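To make the Develop Phase concrete, here is a minimal sketch of what "metrics defined alongside the application" can look like, using Python's prometheus_client library. The service name, metric names, and port are illustrative assumptions rather than anything prescribed by Facets or OpenMetrics; in practice the Helm chart (or equivalent metadata) would advertise this endpoint so metric discovery can scrape it automatically.

```python
# A minimal instrumentation sketch using the prometheus_client library.
# Metric names, labels, and the port are illustrative.
from prometheus_client import Counter, Histogram, start_http_server
import random
import time

REQUESTS = Counter(
    "orders_requests_total",
    "Total requests handled by the orders service",
    ["status"],
)
LATENCY = Histogram(
    "orders_request_duration_seconds",
    "Request latency of the orders service in seconds",
)

def handle_request() -> None:
    """Simulate one request and record the signals an SLO would be built on."""
    with LATENCY.time():                       # observe duration
        time.sleep(random.uniform(0.01, 0.1))  # stand-in for real work
    status = "200" if random.random() > 0.05 else "500"
    REQUESTS.labels(status=status).inc()       # count successes vs. errors

if __name__ == "__main__":
    start_http_server(8000)  # exposes /metrics for the scraper to discover
    while True:
        handle_request()
```

Dashboards and alert rules can then be versioned next to this code and rolled out with the release, which is the pattern the Deploy Phase below builds on.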
**Continuous Integration Phase:** This phase focuses on ensuring that metrics, dashboards, and alerts align with our high standards. By embedding observability standards into the CI process, like requiring specific metrics for GRPC applications, we ensure a consistent and integrated approach across all developments.

**Deploy Phase:** We centralize the rollout of metrics, dashboards, and alerts just like code to maintain consistency across [environments](https://blog.facets.cloud/improve-dev-experience-with-specialized-environments/). We avoid configuring alerts and dashboards directly, as that doesn't guarantee consistency. This ongoing process ensures that any updates or enhancements are uniformly applied, keeping our observability framework accurate and effective.

**Operate Phase:** Constant monitoring allows us to ensure our observability tools accurately reflect system performance and adhere to benchmarks. This ongoing analysis feeds valuable insights back into our planning and development, creating a cycle of continuous improvement. This not only boosts system reliability but also keeps our [observability](https://blog.facets.cloud/standardization-in-security-cost-compliance-and-observability) practices up to date with system changes.

| SDLC Phase | Observability Actions |
| --- | --- |
| Plan | Define SLOs/SLAs & Business Metrics |
| Develop | Configure Metric Discovery & Define Metrics, Dashboards, and Alerts |
| Continuous Integration | Review & Refine Metrics, Dashboards, and Alerts |
| Deploy | Automatic Rollouts of Metrics, Dashboards, and Alerts to Environments |
| Operate | Analyze Metrics, Generate Feedback & Address Incidents |

**Crafting the Future of Observability**
----------------------------------------

Integrating observability into the SDLC from the start represents a forward-thinking change in software development. Instead of adding observability later, this method includes it from the beginning. This ensures that teams can use valuable insights throughout development to improve software quality and durability.

This approach brings foresight and creativity into the development process. Just as an artist imagines the finished artwork before starting, developers can foresee and prepare for future challenges. Monitoring and analysis become key parts of development, helping continuously improve applications.

Additionally, this method encourages ongoing learning and development. Each project learns from the last, leading to better and more innovative solutions. By adopting this mindset, teams make software that not only meets today's needs but is also ready for tomorrow's challenges, pushing technology forward with each update.

---

This blog is powered by Superblog. Visit https://superblog.ai to know more.

---

## DevOps Evolution: From By (Reactive) Audits to By (Proactive) Design

Author: Pravanjan Choudhury
Published: 2024-02-22
Category: Tech Articles
Meta Title: DevOps Evolution: From By-Audits to By-Design
Meta Description: Transition to proactive 'By-Design' for efficient, secure, and cost-effective cloud operations, ensuring strategic planning and best practices from the start for lasting success in cloud development.
Tags: devops, Proactive DevOps, DevOps Impementation
URL: https://blog.facets.cloud/devops-evolution-from-by-reactive-audits-to-by-proactive-design

DevOps has often been a step behind, jumping into action only when problems pop up. This method, while useful in some situations, tends to leave DevOps teams in a constant state of firefighting. Even today, the focus remains on fixing issues as they occur, a cycle of endless troubleshooting and tweaking. However, there's a growing need for DevOps to evolve, to anticipate issues before they happen, moving beyond the traditional fix-it-when-it-breaks mindset. Enter [platform engineering](https://blog.facets.cloud/next-in-devops-a-user-centric-platform-engineering-approach/), a forward-looking strategy that promises more adaptability and insight, marking a significant shift in how DevOps operates.
In working with various companies, we've noticed a common trend of hefty cloud service bills, often hitting millions of dollars, leading to unnecessary overspending. Companies are now looking for ways to cut these costs, which, when added to security, compliance, and monitoring expenses, can significantly inflate the budget. The reliance on cloud services over the last decade has made this a widespread problem. Here's a thought: why not address these issues head-on before they escalate? Adopting a proactive stance could greatly reduce inefficiencies, minimizing the endless cycle of audits and cost-cutting. This approach not only makes operations smoother but also supports a smarter, more sustainable future for cloud computing and DevOps. We've identified two main strategies: the reactive 'By audit' and the proactive 'By design'. As we explore further, we'll see how platform engineering offers a promising way for DevOps to refine its approach, signaling a major transformation in our technological landscape. **The ‘By-Audit’ approach** --------------------------- The 'By-Audit' approach is a reactive method used in important tech areas like [cloud cost management](https://readme.facets.cloud/docs/cloud-cost-explorer), compliance, security, disaster recovery, and monitoring. This method often results in repeated work and inefficiency. We will break down each aspect to understand this better: * **Cloud Cost** * Current Approach: Centralized teams analyze cost reports daily or weekly, then assign and track tasks across engineering teams. * Challenge: This is effort-intensive and, while it trims some excess, it doesn't fundamentally improve efficiency. * **Compliance** * Current Approach: Quarterly audits identify non-compliant teams, who are then tasked with rectification. * Challenge: There's no assurance that the issues won't recur. * **Security** * Current Approach: Regular reports are generated to spot potential misconfigurations. * Challenge: The high number of false alerts not only overwhelms but often causes the root causes to be overlooked. Misconfigurations can originate from multiple sources, adding to the complexity. * **Disaster Recovery** * Current Approach: Frequent disaster recovery drills stem from low confidence in backup systems or recovery playbooks. * Challenge: With rapidly evolving systems, static recovery playbooks become obsolete, indicating a deeper uncertainty and leading to repetitive drills rather than addressing the root issue. * **Monitoring** * Current Approach: Alerts and dashboards are often configured from scratch in [monitoring](https://blog.facets.cloud/simplifying-log-management-with-grafana-loki-and-facets/) tools. * Challenge: This can be overwhelming. For instance, a lack of [alerts](https://blog.facets.cloud/staying-proactive-with-alerts-notifications/) doesn't necessarily mean everything is functioning correctly; it could indicate misconfigured alerts or incomplete coverage. These areas are vital for delivering software successfully but are often treated as secondary concerns rather than core elements of the design process. Imagine the improvement and foresight if these processes were part of a proactive, well-planned strategy from the start. This idea introduces the potential of moving from a 'By-Audit' to a 'By-Design' approach in DevOps, embedding anticipation and efficiency into our tech practices. 
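For a sense of what the reactive pattern above looks like day to day, here is a hedged sketch of a "by-audit" security check: a script run after the fact to flag S3 buckets without default encryption. It uses Python and boto3; the reporting format and follow-up workflow are illustrative assumptions, not a prescribed Facets mechanism.

```python
# A sketch of a reactive, "by-audit" check: scan existing S3 buckets and report
# the ones without default encryption, so someone can file follow-up tickets.
# Assumes AWS credentials are already configured for boto3.
import boto3
from botocore.exceptions import ClientError

def buckets_without_default_encryption() -> list[str]:
    s3 = boto3.client("s3")
    flagged = []
    for bucket in s3.list_buckets()["Buckets"]:
        name = bucket["Name"]
        try:
            s3.get_bucket_encryption(Bucket=name)  # raises if no default encryption
        except ClientError:
            # Typically ServerSideEncryptionConfigurationNotFoundError; treated
            # here simply as "no default encryption configured".
            flagged.append(name)
    return flagged

if __name__ == "__main__":
    for name in buckets_without_default_encryption():
        print(f"[audit] bucket '{name}' has no default encryption - needs follow-up")
```

Everything in this script happens after the misconfiguration already exists. The 'By-Design' approach described next aims to make such scans largely redundant by enforcing the rule at provisioning time.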
​**Embracing the 'By-Design' Approach** --------------------------------------- The 'By-Design' approach emphasizes planning and establishing proven methods from the start. It focuses on creating 'golden paths'—well-tested operational practices that guarantee compliance, security, and other needs are met efficiently. This method involves incorporating best practices and standard procedures early on, making sure everything is set up correctly from the beginning. Take the process of setting up new credentials, for example. Instead of a casual, as-needed approach, 'By-Design' insists on a formal, predefined method for credential requests. This ensures a clear distinction between the purpose of the credentials and how they're implemented, allowing for straightforward validation of both aspects. This systematic approach eliminates the need for constant re-checks. There should be clear rules for how credentials are created, isolated, stored, and updated, and these rules should be applied consistently. By integrating these rules into the initial request process, it's possible to enforce them automatically, reducing the need for manual checks. Starting these practices early in the software development life cycle (SDLC) reduces the need for ongoing audits. Organizations that implement the 'By-Design' philosophy naturally meet compliance, security, and efficiency standards, avoiding the hassle of detailed audits. This strategy does more than just make developers' jobs easier; it leads to significant, lasting improvements across the organization. By embedding critical considerations like cost, security, and compliance into the foundation of processes, it cultivates a culture of proactive planning and foresight. ![By-Audits v/s By-design approach](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/10-2-1708594538055-compressed.jpeg) The table highlights the shift from traditional DevOps, which often reacts to problems, to the proactive 'By-Design' approach that integrates key operations from the start. By planning for compliance, security, disaster recovery, observability, and cost efficiency early in the development process, organizations can streamline operations, enhance security, and save costs. 'By-Design' is a strategic move towards built-in system integrity and excellence in operations. ### The Advantages and Challenges of Both Approaches: ![Challenges with by-design and by-audit approaches to DevOps](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/11-1708594616123-compressed.png) The future belongs to the By Design approach -------------------------------------------- Looking ahead at the next decade of cloud development, the 'By Design' approach stands out as crucial. It integrates best practices and strict standards directly into our cloud infrastructure, creating systems that are efficient, secure, and resilient. This proactive strategy emphasizes optimization and compliance from the start, rather than treating them as afterthoughts. While the 'By Audit' method has effectively solved many issues, the growing cloud environment and increasing need for audit tools call for a 'By Design' strategy for a lasting solution. This method focuses on building a strong cloud ecosystem from the beginning. As cloud computing continues to grow, adopting a 'By Design' approach will lead the way, ensuring we reach our highest potential. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. 
---

## Navigating High-Capacity Log Management: Insights from the 'Loki at Scale' Webinar

Author: Rohit Raveendran
Published: 2024-02-15
Category: Tech Articles
Meta Title: The Guide to 'Loki at Scale' Log Management
Meta Description: Capillary Technologies optimized operations with Loki, addressing high-volume logging challenges for efficiency and growth in DevOps.
Tags: Grafana Loki, Loki integration, Loki and Facets
URL: https://blog.facets.cloud/navigating-high-capacity-log-management-insights-from-the-loki-at-scale-webinar

The struggle of high-volume logging is real, and in response to the widespread interest in this topic we recently hosted a webinar titled 'Loki at Scale: Navigating High-Volume Logging Challenges.' Our guest for the event was [Sreejith S.](https://www.linkedin.com/in/sreejith-s-26b5a8163/), DevOps Lead at [Capillary Technologies](https://www.capillarytech.com/). Managing approximately 1.5 TB of logs per cluster daily, Sreejith played a critical role in overseeing the successful Loki adoption at Capillary, in close collaboration with the [Facets](https://www.facets.cloud/our-story) team.

Representing Facets were [Rohit Raveendran](https://www.linkedin.com/in/rohit-raveendran-1529b1131), Co-founder and VP Engineering, and [Pramodh Ayyappan](https://www.linkedin.com/in/pramodhayyappan), DevOps Tech Lead. Together, they shared their learnings from implementing Loki.

The Challenge - Prelude to Loki Adoption
----------------------------------------

Capillary Technologies was facing challenges with its log management solution. Their log volume was over 1.5 TB of logs per day, and they had to retain these for over a year for compliance and daily operations. Compared to this massive daily volume, search queries were almost non-existent – only a couple of hundred searches a day.

Adding to this challenge was the need for dedicated engineering manpower just to manage these logs. Their old solution was simple and reliable, but over time it turned out to be expensive and hard to scale. With high volume and a low search ratio, their ROI was poor. They were sitting on a data goldmine but were not able to leverage it for any business metrics and alerts.

![Diagram of traditional log management process](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1707976867930-compressed.png)

Log Management Process Pre-Loki

They needed a solution wherein a similar investment would yield higher returns and add to the overall efficiency of the team. The team embarked on an evaluative journey, sifting through options like ELK, Parseable, and New Relic, and eventually zeroed in on [Loki](https://grafana.com/oss/loki/). The decision hinged on Loki's scalability, cost-effectiveness, ROI, and seamless Grafana integration – all of which aligned perfectly with Capillary's needs.

![illustrative image for Loki architecture and its several benefits](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1707976871677-compressed.png)

Core Differentiator: Loki's Architecture
----------------------------------------

What sets Loki apart is that it adopts a minimal indexing strategy, indexing only the labels and not the entire log content. This approach not only boosts query speeds but also significantly reduces storage requirements. Inspired by Prometheus, Loki's design was scalable and robust, making it a perfect choice for the large-scale logging challenges at Capillary.
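To make the label-versus-content distinction concrete, here is a minimal sketch of shipping a log line through Loki's push API. It is written in Python with the requests library; the Loki URL, label names, and log line are illustrative assumptions, and in practice an agent such as Promtail would do this rather than application code.

```python
# A minimal sketch of Loki's push API (POST /loki/api/v1/push), assuming a local
# Loki endpoint. Only the small label set below is indexed; the log line itself
# is compressed and stored, which is what keeps Loki's index small.
import json
import time
import requests

LOKI_URL = "http://localhost:3100/loki/api/v1/push"  # assumed endpoint

def push_log(line: str) -> None:
    payload = {
        "streams": [
            {
                # Indexed labels: keep these few and low-cardinality.
                "stream": {"app": "checkout", "env": "prod", "level": "error"},
                # Values are [<unix time in nanoseconds as a string>, <log line>].
                "values": [[str(time.time_ns()), line]],
            }
        ]
    }
    resp = requests.post(
        LOKI_URL,
        data=json.dumps(payload),
        headers={"Content-Type": "application/json"},
    )
    resp.raise_for_status()

if __name__ == "__main__":
    push_log("payment gateway timeout for order 8675309")
```

Queries then filter by those labels first (for example, a LogQL selector like `{app="checkout", level="error"}`) and only scan the stored content that matches, which is why a small, deliberate label set matters at Capillary's volumes.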
The four key differentiators were:

![](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1707976875947-compressed.png)

Implementing Loki - A Meticulous Strategy
-----------------------------------------

Although Loki had all the right things that Sreejith and his team were looking for, the implementation was not that straightforward. The team took a methodical approach when moving from their legacy system to a modern Loki architecture.

**The approach is twofold:** a technical strategy that advocates for phased deployment, comprehensive load testing, and continuous performance monitoring for optimization; and a change management strategy that prioritizes extensive training, accessible documentation, and an active feedback loop.

**Technical Strategy:**

* **Phased Deployment:** There was a need for extensive testing and adjustments to the existing approach. A step-by-step rollout was implemented to ensure that Loki's integration did not impact any of the existing operations, and the team was able to tune the performance under different scenarios.
* **Load Testing:** The team conducted rigorous load testing to understand how Loki performs under different stress levels. These insights were critical for fine-tuning system configurations and ensuring scalability and reliability when fully deployed.
* **Monitoring and Optimization:** The team implemented continuous monitoring to track Loki's performance and resource usage in real time. They utilized these metrics to optimize configurations, improving efficiency and reducing costs.

**Managing Change in Developer Processes:**

* **Comprehensive Training:** A comprehensive training program was developed covering Loki's architecture, features, and best practices. Hands-on sessions were conducted to help developers become familiar with the new system.
* **Documentation and Support:** The developers were provided with detailed documentation and support channels. This ensured that they had access to the information and assistance they needed, facilitating a smoother transition and integration into their workflows.
* **Feedback Loop:** The team established a feedback loop with developers to gather insights on Loki's implementation challenges and successes. This feedback loop proved invaluable for the continuous improvement and adaptation of both the technical strategy and developer support processes.

**Overcoming Implementation Challenges**

Transitioning to Loki brought its own set of challenges, each demanding specific resolutions. Issues like rate limiting, ingestor overloads, and S3 API rate limits surfaced during the implementation. The team tackled these through adjustments such as modifying ingestion rates and stream sizes and optimizing query performance.

**Rate Limiting and Ingestor Overloads:** Adjusting ingestion rates and stream sizes was key to managing the load efficiently.
This strategic calibration ensured the system could handle high volumes of data without data integrity or latency being compromised, allowing for smooth processing by ingestors. **S3 API Rate Limits**: Addressing this involved implementing caching strategies and query sharding, which effectively reduced the number of calls to the S3 API. This approach not only diminished latency but also significantly improved the system's overall responsiveness to queries. **Query Performance:** Enhancements in query performance were achieved through optimizing how queries were processed and managed. By introducing more efficient data retrieval techniques and optimizing the indexing strategy, we were able to significantly speed up query times, providing faster access to logs. **Collaborative Optimization:** The partnership between Facets and Capillary allowed for a tailor-made optimization of Loki's setup. This co-development effort focused on creating a configuration that specifically addressed the challenges faced, leading to a more efficient, scalable logging solution perfectly suited to the operational requirements. [Watch the Webinar](https://www.facets.cloud/webinar/loki-at-scale-navigating-high-volume-logging-challenges) ### The Business Impact of Implementing Loki Adopting Loki marked a significant improvement in log management and also their efficiency in troubleshooting. Developers gained faster access to logs, accelerating issue resolution. The Loki architecture made it simple and easy to extract actionable business metrics from logs, thereby enhancing overall data analytics capabilities. From the many business impacts, here are a few that stand out:  **More Efficient Log Management**: Loki made log management more efficient while handling large volumes of log data, reducing the time and effort required for log processing and management. **Accelerated Issue Resolution:** Developers saw a significant reduction in the time taken to access logs. Faster access meant quicker identification and resolution of issues, leading to reduced downtime and enhanced system reliability. **Enhanced Analytical Capabilities:**  By extracting valuable metrics from logs, Loki has empowered Capillary Technologies to delve deeper into their data analytics, offering a clearer understanding of system performance, user behavior, and potential areas for optimization. **Scalability and Flexibility:** The adoption of Loki brought scalability and flexibility to Capillary's logging infrastructure. This flexibility is crucial in managing varying log volumes and ensures that the system can scale up efficiently as the organization grows. **Cost-Effectiveness:** The minimalist indexing strategy, coupled with efficient storage, translates to lower storage costs, making it a financially viable solution for large-scale log management.​ This has set a new benchmark in their log management approach, driving operational excellence and supporting business growth. Check out some of the key takeaways from the implementation: ![5 key takeaways from implementing Loki architecture ](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/building-a-successful-business-3-1707978457860-compressed.png) **In Conclusion:** In summary, the "Loki at Scale" webinar was not just an exploration of a tool but a broader narrative on overcoming the complexities of high-volume logging. One size doesn't fit all, and extensive domain knowledge is required to tackle the challenges. 
For professionals in DevOps and [platform engineering](https://blog.facets.cloud/next-in-devops-a-user-centric-platform-engineering-approach/), embracing tools like Loki isn't just about keeping pace with technology—it's about leveraging it to drive operational excellence and business growth. If you'd like to watch the full webinar and get the in-depth details, here's a link for you.  [Watch Full Webinar](https://www.facets.cloud/webinar/loki-at-scale-navigating-high-volume-logging-challenges) --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Stop firefighting with your DevOps Implementation. Here’s a 10-point checklist Author: Pravanjan Choudhury Published: 2024-02-13 Category: Blogs Meta Description: Optimize your DevOps implementation with a 10-point checklist. Address issues, streamline processes, and enhance collaboration for efficient software delivery. Tags: DevOps Challenges, DevOps Impementation URL: https://blog.facets.cloud/stop-firefighting-with-your-devops-implementation Stop firefighting with your DevOps Implementation. Here’s a 10-point checklist to make sure. Contents * [Warning signs to gauge the completeness of your DevOps setup](#warning-signs-to-gauge-the-completeness-of-your-devops-setup) * [What to Keep in Mind While Streamlining your DevOps Implementation?](#what-to-keep-in-mind-while-streamlining-your-devops-implementation) * [So, how do you stack up?](#so-how-do-you-stack-up) In the course of building out [Facets](https://www.facets.cloud/?source=blog), we’ve talked to over 300 organizations, and very few can say they have fully streamlined [DevOps processes](https://blog.facets.cloud/how-to-clear-your-devops-backlog/). How does your organization compare? Are your planned stories being overshadowed by an ever-growing backlog filled with urgent, unplanned stories? The challenge with DevOps is that often, systematic issues aren't obvious until they start negatively affecting your software delivery. Warning signs to gauge the completeness of your DevOps setup ------------------------------------------------------------ ![a mind map highlighting the warning signs in devops setup](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/image-cp-1707819164695-compressed.png) ### 1\. Stability Issues in Cloud Environments * Frequent Outages: Regular SLA breaches often stem from misconfigurations in the cloud environment, indicating a need for more thorough automation in the CI/CD pipeline. * Poor Release Confidence: Excessive manual interventions to ensure stability suggest underlying inconsistencies in the deployment process. * Difficulty in Creating New Environments: Challenges in launching new environments point to a lack of centralized knowledge and documentation, increasing the risk of issues in production. ### 2\. Productivity Challenges * Low Developer Productivity: Frequent blocks during collaboration with DevOps teams, such as during code releases or configuration changes, indicate productivity issues. * DevOps Team Burnout: High pressure and the need for constant adaptation can lead to burnout, affecting both individual well-being and organizational efficiency. * Over-Reliance on Ticket-Based Operations: Excessive dependency on ticket systems for communication between developers and DevOps teams can cause delays and bottlenecks in the SDLC. ### 3\. 
Organizational Risks Due to Inadequate DevOps Practices * **Incomplete Security Posture**: Last-minute fixes during compliance audits reveal vulnerabilities, necessitating a more robust DevOps strategy with better security mechanisms. * **Bloated Cloud Costs**: Frequent cost audits indicate an incomplete approach to DevOps, highlighting the need for a design-first, cost-optimized strategy. * **Business Continuity Risks**: The discrepancy between claimed and actual disaster recovery capabilities exposes organizations to significant risks, especially during cloud provider outages. If you notice any of these signs in your organization (which I can bet, you will) your DevOps setup needs to be audited for seamless implementation. The primary goal of DevOps is to improve collaboration and communication among development and operations teams, resulting in the increase of speed and efficiency of software delivery, and an improvement in the quality and reliability of software products and services. Since the function by definition involves collaboration across teams there are multiple aspects that need to be addressed while streamlining DevOps implementation. What to Keep in Mind While Streamlining Your DevOps Implementation? ------------------------------------------------------------------- Here’s a 10-point checklist across multiple tenets of a successful DevOps implementation. Something that you can use as guiding principles for your organization as well. **1\. Align Business Goals Across Teams:** Create a unified strategy by aligning the objectives and priorities of various departments, ensuring that development, operations, and business teams work cohesively towards common goals. **2\. Understand and Manage Software Lifecycle:** Maintain a clear understanding of the various stages involved in software development, from inception to deployment and maintenance, and manage these stages effectively to optimize the software lifecycle. **3\. Accelerate from Concept to Product:** Focus on streamlining the process from initial idea generation to the delivery of a functional product, reducing the time it takes to turn concepts into usable software. **4\. Continuously Improve Processes:** Engage in ongoing evaluation and enhancement of work processes, employing strategies such as agile methodologies to increase efficiency, effectiveness, and adaptability in the workflow. **5\. Provide Robust Development Environments:** Equip developers with environments that closely mirror the actual production setting, enabling accurate testing and smoother transitions from development to production, thereby reducing deployment risks. **6\. Implement Scalable Cloud Solutions and Microservices Architecture:** Embrace cloud computing solutions for scalability and flexibility. Adopt a microservices architecture to break down the application into smaller, independently deployable services, each running in its own process and communicating with lightweight mechanisms. This approach allows for easier scalability, quicker deployment, and more effective fault isolation. **7\. Integrate Automated Testing and Continuous Integration/Delivery:** Incorporate automated testing into the development process for consistent and reliable results; utilize continuous integration (CI) to merge code changes regularly, and continuous delivery (CD) to automate the deployment process, ensuring that software is always in a release-ready state. **8\. 
Emphasize Monitoring and System Health:** Implement comprehensive monitoring tools to continuously track system health, performance, and potential issues, enabling proactive identification and resolution of problems. **9\. Adopt Infrastructure-as-Code and Containerization:** Manage and provision [infrastructure using code](https://www.facets.cloud/no-code-infrastructure-automation?source=blog) for consistent, repeatable environments, and leverage container technology for efficient, scalable, and isolated application deployment. **10\. Document, Track, and Manage Systematically:** Maintain detailed documentation of all operational procedures, changes, and configurations. Utilize a change management system to systematically track and audit modifications across the DevOps process. Employ checklists to ensure thorough process adherence and consistency, aiding in maintaining high standards of quality and reliability. ### So, how do you stack up? A thorough audit of your DevOps setup is crucial if you identify any warning signs of inefficiency or instability. The checklist offers a strategic framework to enhance collaboration, streamline processes, and improve software delivery. By implementing these principles, you can achieve more efficient, reliable, and high-quality software development, overcoming common obstacles in DevOps practices. The approach not only addresses technical aspects but also encourages a culture of continuous improvement and cross-functional teamwork. Here’s to better implemented DevOps, all across! --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Facets’ Kubernetes Copilot: AI-driven debugging superpowers for your team Author: Anshul Sao Published: 2024-02-07 Category: Product News Meta Title: Facets’ Kubernetes Copilot Meta Description: Kubernetes Copilot, enhanced with K8sGPT and conversational AI, not only elevates your debugging experience but also empowers you to excel in the complex cloud-native landscape. Tags: infrastructure management , facets.cloud, Developer Productivity , self service infrastructure, Kubernetes URL: https://blog.facets.cloud/kubernetes-copilot Picture this: It's a typical Thursday afternoon. You've been working intensely and are about to step out for a much-needed coffee break. However, a service stopped working and you're halted by a daunting task: interacting with multiple Kubernetes commands to check logs, events, and related services.  This complex process can be time-consuming, often involving navigating through a maze of UIs and intricate Kubernetes commands. But what if you could handle this with just a conversation with your Kubernetes Co-pilot? Imagine focusing on the problem at hand rather than getting bogged down in technical complexities. ![Kubernetes Troubleshooting](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/debugging-kubernetes-1707386304796-compressed.png) That’s why we built “**Kubernetes Co-pilot**”. It leverages the power of [K8sgpt](https://k8sgpt.ai/) and conversational AI to transform how you interact with Kubernetes clusters. This AI-driven interface simplifies Kubernetes management and troubleshooting. It's like having a conversational assistant that understands the ins and outs of your Kubernetes environment, enabling you to command and control with ease. 
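For context on what the Copilot abstracts away, here is a hedged sketch of the manual triage it replaces — checking pods, recent events, and logs by hand with the official Kubernetes Python client. The namespace, label selector, and service name are illustrative assumptions, and this is not how Facets implements the Copilot internally; it simply shows the busywork that a conversational request like "why is checkout failing?" collapses.

```python
# A rough sketch of manual Kubernetes triage using the official Python client.
# Namespace and label selector are illustrative assumptions.
from kubernetes import client, config

def triage(namespace: str = "prod", selector: str = "app=checkout") -> None:
    config.load_kube_config()  # or load_incluster_config() when running in-cluster
    v1 = client.CoreV1Api()

    # 1. Which pods are unhealthy?
    pods = v1.list_namespaced_pod(namespace, label_selector=selector).items
    failing = [p for p in pods if p.status.phase not in ("Running", "Succeeded")]
    for pod in failing:
        print(f"pod {pod.metadata.name} is {pod.status.phase}")

        # 2. What do recent events say about it?
        events = v1.list_namespaced_event(
            namespace, field_selector=f"involvedObject.name={pod.metadata.name}"
        ).items
        for event in events[-5:]:
            print(f"  event: {event.reason} - {event.message}")

        # 3. What do the pod's own logs show?
        logs = v1.read_namespaced_pod_log(pod.metadata.name, namespace, tail_lines=20)
        print("  last log lines:\n  " + logs.replace("\n", "\n  "))

if __name__ == "__main__":
    triage()
```

With the Copilot, that whole loop becomes a single question, with K8sgpt's analysis supplying the likely cause and a suggested fix.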
A New, AI-driven Debugging Experience ------------------------------------- ![Analyzing clusters and solving Kubernetes errors with Kubernetes Copilot](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/k8s-copilot-1707387025933-original.gif) Analyzing clusters and solving Kubernetes errors with Kubernetes Copilot Kubernetes Copilot is an AI-driven interface that streamlines Kubernetes management and troubleshooting. It uses K8sgpt's ability to scan Kubernetes clusters, pull out relevant information, diagnose issues, and present errors and their step-by-step solutions in plain English.  Conversational AI to make Kubernetes monitoring easier ------------------------------------------------------ **Log Management**: It eliminates the need to manually query logs. One quick request like “Show me the top 10 logs for service X” will do the work for you. **Event Monitoring:** Kubernetes Copilot provides real-time insights into the health and performance of your Kubernetes environment by staying on top of your cluster’s events. **Resource and Networking Mastery using conversational AI:**  From resource utilization to complex networking queries, Kubernetes Copilot offers clear, concise answers and suggestions, turning complicated tasks into straightforward conversations. K8sgpt: The Diagnostic Doctor for Kubernetes -------------------------------------------- K8sgpt excels in general diagnostics, offering a comprehensive approach to identifying and resolving issues within your Kubernetes clusters. **Proactive Problem-Solving:** K8sgpt and Kubernetes Co-pilot work together to proactively identify and address potential cluster issues before they escalate, safeguarding the integrity of your environments. **Contextual Cluster Analysis:** With just a click, K8sgpt diagnoses your clusters, identifying potential errors and their causes. The analysis is tailored to your specific environment, ensuring relevant and accurate solutions. **Smart Recommendations:** Beyond diagnosing issues, K8sgpt offers step-by-step solutions, complete with optimizations and best practices customized for your setup. Conclusion ---------- Kubernetes Copilot, enhanced with K8sGPT and conversational AI, not only elevates your debugging experience but also empowers you to excel in the complex cloud-native landscape.  ### Note: 1. **[K8sgpt is licensed under Apache License 2.0](https://github.com/k8sgpt-ai/k8sgpt/blob/main/LICENSE)**  2. **K8sgpt was created by [Alex Jones](https://github.com/AlexsJones)**  ​ --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## DevOps Debt is real. Here’s how to clear your DevOps Backlog Author: Pravanjan Choudhury Published: 2024-02-02 Category: Blogs Meta Title: How to clear your DevOps Backlog/Debt? Meta Description: Prioritize your product development process by understanding the importance of addressing the DevOps backlog. Learn why companies are struggling with cloud spending based on the HashiCorp 2023 State of Cloud Strategy Survey. Tags: devops debt, Proactive DevOps, DevOps Backlog URL: https://blog.facets.cloud/how-to-clear-your-devops-backlog ​​ Contents ​[Understanding the DevOps Backlog](#h_4048183482673835) [Why DevOps Backlogs Grow](#h_3489418391975543)​ ​[The Overlooked DevOps Stories](#h_33277659308341745) [Addressing the DevOps Backlog](#h_8346375768408019)​ ​When we start up, the first goal is to get a decent product out to the users. 
Once the first iteration of the product is out, the next priority is to build it up into the vision, with multiple features, and polished UI. And the pursuit never ends. It’s when the product hits PMF and sees scale, we notice the tech debt of the product, and we address it as well. However, we don’t even talk about the DevOps debt, or the [DevOps backlog](https://devops.com/a-practical-guide-to-mitigating-devops-backlog/), until it becomes a bottleneck for business progress. > According to the [HashiCorp 2023 State of Cloud Strategy Survey](https://www.hashicorp.com/state-of-the-cloud), 94% of companies are still wasting money in the cloud. Understanding the DevOps Backlog -------------------------------- Every skilled product manager knows the importance of prioritizing the most crucial stories from the extensive and seemingly endless list of product ideas. In the constant inflow of requests for new features, upgrades, additional modules, and improved functionality, the emphasis often shifts towards completing the final product rather than focusing on the development process itself. This shift in focus results in the neglect of the development process, leading to an accumulation of unfinished tasks, commonly referred to as the DevOps backlog. This backlog is a collection of user stories, bug fixes, technical tasks, and other essential work needed for the ongoing support and maintenance of a product. It's a dynamic list, constantly evolving with the addition of new tasks and tickets. Why DevOps Backlogs Grow ------------------------ The reasons for an expanding DevOps backlog are multifaceted: 1. **Limited DevOps Resources:** Often, large development teams depend on a small group of DevOps specialists, leading to bottlenecks. 2. **Evolving Architectures:** Continuous introduction of new services and tools results in an ever-growing backlog. 3. **Changing Business Requirements:** Adjustments in business needs, especially in sectors like SaaS or healthcare, introduce complexities in maintaining and complying with various regional or industry-specific environments. 4. **Ticket Overflow:** A lack of direct access to DevOps toolchains for developers often results in an overwhelming number of tickets, further straining the DevOps team. ### The Overlooked DevOps Stories In reality, most companies don't formally capture DevOps stories with the same level of diligence that's followed to capture feature stories. This became evident to us when we talked to over 200 companies. We wondered if it was just a short-term problem, but to our surprise, even late-stage product companies have a large DevOps backlog! As the business needs grow, complex architectures come in, and regulations change, the backlog keeps growing. **But what exactly are DevOps stories? ** They are essential narratives that track deployments, application data, and key performance indicators (KPIs). These stories might include: 1. **Resource Provisioning and Configuration:** Managing the setup and adjustment of necessary resources and infrastructure. 2. **Application Lifecycle Management:** Utilizing tools and metrics to oversee the application lifecycle, often involving the creation of toolchains for automation. 3. **Release Management and Strategies:** Planning and executing software releases with minimal downtime and [robust rollback](https://readme.facets.cloud/docs/artifact-history-and-rollback) strategies. 4. 
**Observability:** Setting up alerts, capturing metrics, and configuring dashboards to monitor application performance and resource utilization. 5. **Database Management:** Handling database upgrades, migrations, backups, and restoration policies. 6. **Access and Permissions:** Ensuring secure developer access to environments, maintaining audit trails, and preventing permission leakages. 7. **Environment Management:** Creating and managing various environments like Development, QA, Pre-Production, and Production, ensuring consistency and efficiency. 8. **Security and Compliance:** Adhering to standards and regulations specific to the deployment regions. 9. **Cloud Cost Visibility and Optimization:** Managing and optimizing [cloud-related expenses](https://blog.facets.cloud/managing-cloud-spend-in-saas-7-overlooked-opinions/). 10. **Exploring New Tools and Frameworks:** Investigating technologies like zero-trust networks and network segmentation. And the list goes on! Verifying if you have essential stories covered in your backlog is a good starting point. You can [read this article](https://www.facets.cloud/blog/is-your-devops-implementation-complete) to learn more about how to spot issues in your DevOps implementation. ### Addressing the DevOps Backlog Addressing DevOps backlog challenges needs a shift in mindset, it's important to change the way we think and take action early. It doesn't matter if your backlog is just starting to grow or has already become complicated; the best time to deal with it is now. Prioritizing product features over process improvements can lead to complex issues in the DevOps realm. A balanced approach, giving due importance to both product development and essential process enhancements, is key to long-term success and scalability. Because much like the Lannisters, DevOps never forget their debts. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Infrastructure as Catalog (InCa): The Missing Piece in IaC Puzzle Author: Anshul Sao Published: 2024-01-25 Category: Tech Articles Meta Description: Infrastructure as Code (IaC) solved many problems, but challenges persist. This is how Facets' new open model, Infrastructure as Catalog (InCa), aims to simplify IaC complexities for enhanced flexibility and collaboration. Tags: Infra as Catalog (InCa), Infrastructure as Code (IaC), infrastructure management URL: https://blog.facets.cloud/infrastructure-as-catalog-inca-the-missing-piece-in-iac-puzzle At [Facets.cloud](https://write.superblog.ai/sites/supername/facetscloud/posts/untitled-draft-post-clrt13ih3001k13g3jc28fv53/facets.cloud), we are deeply committed to developing technologies that blend into our daily routines so effortlessly that their absence seems unimaginable.  They should become second nature, the de-facto method for doing tasks! Take Infrastructure as Code (IaC) as a perfect example. This method of managing and provisioning infrastructure through code presents an efficiency and control that surpasses old-school techniques. But, its adoption is not without hurdles. There are manual steps, extensive coding, and configurations that are specific to certain technologies or platforms. 
Contents * [Remember Why Infrastructure as Code (IaC) was born](#remember-why-infrastructure-as-code-iac-was-born)  * [What is Infrastructure as Catalog (InCa)](#what-is-infrastructure-as-catalog-inca) * [How InCa fits in the IaC puzzle](#how-inca-fits-in-the-iac-puzzle) * [InCa Use Cases](#inca-use-cases) * [You're invited to contribute!](#youre-invited-to-contribute) Remember Why Infrastructure as Code (IaC) was born  --------------------------------------------------- IaC was born to address the challenges of manual infrastructure configurations. It provided developers the autonomy needed to manage, monitor, and provision resources — making the [software development lifecycle](https://blog.facets.cloud/enhancing-software-development-with-integrated-observability-by-design-in-sdlc/) more efficient. Though many organizations have benefited from IaC, multiple challenges persist, such as: * Not every organization possesses the expertise to write IaC as per custom requirements (declarative IaC), requiring a deep understanding of both infrastructure components and coding languages. * Poor coding practices can lead to inefficiencies, vulnerabilities, and maintenance issues. * Organizations struggle with the modularization of IaC, leading to tightly coupled configurations that are difficult to manage, update, or scale. At this year's [KubeCon in Chicago](https://www.linkedin.com/feed/update/urn:li:activity:7130232961806843905), we engaged in many enriching conversations with the Ops community about IaC and the challenges it presents. It was there that we got the opportunity to talk about **Infrastructure as Catalog (InCa)**, our open model, tailor-made to address the complexities of IaC. The validation we received for our approach was genuinely motivating for us. What is Infrastructure as Catalog (InCa) ---------------------------------------- InCa is Facets' open model that redefines how organizations manage cloud infrastructure. It simplifies complex architecture into a cloud-neutral, declarative catalog. This approach allows users to define their infrastructure needs at a high level, utilizing a unified language.  This leads to three key outcomes:  * Streamlined infrastructure management * Enhanced flexibility * And effective knowledge-sharing across diverse entities and the broader open-source community. To put it simply, imagine the process of constructing a house.  Just as creating a blueprint provides a detailed plan for a house before construction begins, developing a comprehensive blueprint during the software architecture planning phase is equally essential before starting a project. This blueprint serves as a single source of truth, giving everyone involved a clear understanding of the system's design and required components — and offers a complete view of the entire software architecture. This is the core principle of InCa. A typical structured overview (like the one given below) provides a foundational catalog to begin with. ![Multi-layered architecture](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/add-a-subheading-18-1706182280297-compressed.jpg) The process of building infrastructure for any product can be visualized as a multi-layered architecture, each layer representing a fundamental aspect of the overall system.  These layers typically include cloud accounts, networking elements, orchestration processes, and finally, the resources.   
It is at this topmost layer – the resource layer – where the most critical components reside, including application services, databases, caches, and other key elements.  InCa is strategically designed to operate at this resource layer! **How InCa fits in the IaC puzzle** -------------------------------------- InCa is about effective communication. It’s a unified language that enables the conceptualization and realization of infrastructure goals seamlessly. * Documenting Architecture: By documenting the entire infrastructure architecture in a centralized catalog, InCa promotes collaboration, transparency and ease of reference. * Promoting Abstraction: InCa functions as a facilitator for managing services within your infrastructure, such as setting up a MySQL database or a compute instance. Users have the liberty to choose the most appropriate InCa module for their needs, confident that the end result will consistently align with their intended objectives without altering the overarching definitions. * Boosting Flexibility: This extended functionality allows you to modify the flavor in your catalog while keeping the core definition intact. For example, if transitioning from an on-premise MySQL database to a cloud-based one, you just need to update it in the catalog. * Streamlining Management: Having each resource type defined individually offers its own benefits. It makes the system modular and facilitates easier management, versioning and collaboration. **InCa Use Cases** ------------------ * **Automated IaC Creation:** Utilizing user-defined intents, the catalog can automatically generate Infrastructure as Code (IaC). * **Audit and Compliance Verification:** Users can compare their deployments with the original catalog––aiding in the detection of any discrepancies in the infrastructure, reducing the possibilities of infrastructure drifts. * **Root Cause Analysis:** InCa enhances Root Cause Analysis by integrating data from change management and observability systems, offering deeper insights. * **Simplified Architectural Visualization:** Create streamlined visual representations of your architecture, facilitating more efficient knowledge sharing. **You're invited to contribute!** --------------------------------- Infrastructure as Catalog (InCa) is an open model that gets better with help from people like you. By transforming complex architectures into a cloud-neutral, declarative catalog, InCa not only streamlines infrastructure management but also enhances flexibility and fosters effective knowledge sharing. As we navigate the complexities of modern technology, initiatives like InCa will provide the essential frameworks that enable organizations to thrive in an evolving landscape. Want to contribute? [Click here](https://github.com/Facets-cloud/InCa/tree/main). --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## What is a Developer Self-Service Platform and Why Does it Matter? Author: Anshul Sao Published: 2024-01-11 Category: Blogs Meta Description: This article will dive into the need for developer self-service platforms—their components, benefits, challenges, and more—to understand their growing relevance in enabling engineering velocity. Tags: Internal Developer Platform, Developer Productivity , developer self service URL: https://blog.facets.cloud/what-is-a-developer-self-service-platform Speed is critical in the tech world. Organizations want to ship code quickly to gain a competitive edge. 
However, developers often have to wait on Ops teams for essential infrastructure, environments, and access. These delays waste precious time and slow down development velocity and productivity.

What if they had a self-service environment that allowed them to independently provision infrastructure, deploy code, and monitor apps without dependencies? That's where developer self-service platforms come in—they provide developers with automated tools and infrastructure to ship code quickly while managing the underlying complexity.

But why do they matter?

This article will dive into the need for [developer self-service platforms](https://www.facets.cloud/developer-self-service)—their components, benefits, challenges, and more—to understand their growing relevance in enabling engineering velocity. By the end, you'll have clarity on how self-service platforms can also transform software delivery at your organization. Let's get started.

**What Is Developer Self-Service?**
-----------------------------------

![A simple diagram showing how self-service empowers developers](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/frame-1-1-1704975838203-compressed.png)

[Source](https://www.cnpatterns.org/infrastructure-cloud/self-service)

"Self-service" refers to giving developers direct access to the tools and environments needed to do their jobs without gatekeepers or bottlenecks. For example, a developer needing a new testing environment would previously have to open a ticket and potentially wait days for Ops to provision resources. With self-service, that same developer can spin up a new testing environment in minutes at the click of a button, following predefined paths.

Self-service eliminates delays by removing humans or manual processes from repetitive infrastructure tasks. Instead, it automates provisioning, configuration, and management with code and policy guardrails set by Ops teams.

This self-service capability empowers developers with greater autonomy over their workflow. At the same time, it frees up Ops teams from mundane upkeep and allows them to focus on higher-order concerns. Let's now understand self-service from the Ops and developer perspectives.

**Why Use Self-Service Platforms for Ops Teams?**
--------------------------------------------------

Self-service for Ops means creating guardrails and well-tested provisioning templates that cater to the infrastructure needs of developers. A self-service platform provides the following to Ops teams:

* **Automated infrastructure provisioning -** Ops teams can instantly spin up pre-configured environments on demand using self-service channels, eliminating manual work. They may also be able to delegate this work to devs for lower environments.
* **Reusable Infrastructure-as-Code -** Ops teams distribute reusable IaC templates, enabling consistency, version control, and repeatability.
* **Consistent environment management -** Solutions provide visibility and control across the entire environment lifecycle from creation to decommission, allowing teams to track, monitor, and manage infrastructure – preventing drift.
* **Centralized role-based access controls -** Ops teams can grant or delegate access to users from a centralized platform without dealing with the complex access configurations of every tool.
* **Policy enforcement -** Ops teams use guardrails embedded in code to enforce security, compliance, architecture, and governance standards (a rough sketch follows this list).
* **Centralized auditing -** Centralized event logging and access audit trails improve the ability to trace activity back to individual users without configuring or collating audit trails of individual tools.
* **Integrations -** Tie-ins with existing developer tools like IDEs, repos, pipelines, etc., enable teams to seamlessly gather all information in a self-service manner from tools they already use.
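To illustrate the policy-enforcement idea referenced above, here is a rough sketch of a guardrail check that a platform might run before provisioning anything. The request fields, policy values, and function names are hypothetical, not Facets APIs; the point is simply that the rules live in code, set once by Ops and applied to every request.

```python
# A hypothetical guardrail check, run by a self-service platform before it
# provisions a developer's request. Field names and policy values are illustrative.
from dataclasses import dataclass

POLICY = {
    "allowed_instance_types": {"small", "medium"},  # devs can't request oversized boxes
    "require_encryption_in": {"staging", "prod"},   # encryption mandatory outside dev
    "max_disk_gb": 500,
}

@dataclass
class DatabaseRequest:
    env: str              # "dev", "staging", or "prod"
    instance_type: str
    disk_gb: int
    encrypted: bool

def validate(request: DatabaseRequest) -> list[str]:
    """Return a list of policy violations; an empty list means the request can proceed."""
    violations = []
    if request.instance_type not in POLICY["allowed_instance_types"]:
        violations.append(f"instance type '{request.instance_type}' is not allowed")
    if request.disk_gb > POLICY["max_disk_gb"]:
        violations.append(f"disk size {request.disk_gb} GB exceeds {POLICY['max_disk_gb']} GB")
    if request.env in POLICY["require_encryption_in"] and not request.encrypted:
        violations.append(f"encryption is mandatory in '{request.env}'")
    return violations

if __name__ == "__main__":
    req = DatabaseRequest(env="prod", instance_type="large", disk_gb=200, encrypted=False)
    for problem in validate(req):
        print(f"[guardrail] rejected: {problem}")
```

Because checks like these run on every request, compliance becomes the default rather than something rediscovered in an audit.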
With these capabilities, developers can obtain the resources needed for local development, testing, staging, and production without bottlenecks, within organizational guardrails. Teams reduce delays and manual processes, allowing them to ship software faster.

Meanwhile, Ops retains centralized visibility and guardrails across all environments and acts as an enabler for the larger development team. This improves security while still granting teams autonomy within the platform.

### **Benefits of a Self-Service Platform for Ops Teams**

So what benefits do self-service platforms offer Ops teams? How do they help Ops focus less on tasks and more on higher-level strategy?

### 1\. Allow Ops to Move from Tasks to Strategy

In traditional models, Ops teams spend countless hours on manual, repetitive infrastructure tasks like:

* Provisioning and configuring servers,
* Setting up networks and load balancers,
* Tuning databases,
* Patching vulnerabilities,
* Tweaking firewall policies.

This endless list of tasks focuses Ops on low-level infrastructure upkeep. Enabling developer self-service lifts this burden through [infrastructure automation](https://www.facets.cloud/no-code-infrastructure-automation) and policy guardrails.

With self-service, Ops defines infrastructure-as-code (IaC) templates and policy controls and exposes them through an internal product. This codifies institutional knowledge into a well-defined platform, rather than manual processes and ad hoc automation. Automation and built-in policy enforcement then handle the grunt work of provisioning and managing infrastructure.

With this setup, Ops can now focus on high-value initiatives like:

* Improving security posture by defining controls once and reusing them over and over
* Hardening governance and compliance, enforced through a platform rather than documentation alone
* Building cost optimization and resource utilization strategies
* Architecting infrastructure for scalability and reliability
* Building shared tools and services for developers
* Finding further ways to improve cross-team collaboration

Self-service allows Ops to apply their expertise to strategic engineering challenges rather than repetitive maintenance. This helps unlock innovation velocity across the org.

### **2\. Improve Security and Compliance**

Giving developers infra access may improve their autonomy but can compromise security and compliance. However, self-service policy controls and guardrails prevent this from happening.

With self-service, Ops defines compliant infrastructure configurations as code in a reusable fashion. This bakes in security best practices and governance by default—[taking Ops from by-audit to by-design](https://blog.facets.cloud/shifting-the-devops-paradigm-from-by-audit-to-by-design/).

The platform enforces these [standardized](https://blog.facets.cloud/standardization-in-security-cost-compliance-and-observability/) policies on all provisioned resources, preventing misconfigurations and the need for repeated audits. For example, Ops can define that production databases must be encrypted, replicated, and monitored.
The self-service platform would then automatically ensure any database a developer spins up meets these policies. Centralized policy, implemented in code, increases Ops confidence that governance is baked into the stack and prevents developers from intentionally or accidentally creating non-compliant resources.

But there’s more. Comprehensive centralized audit logging provides visibility into all user activity for monitoring potential security events, and tracing any provisioned resource back to an individual user improves accountability.

### **3\. Speed up Time from Commit to Deploy**

One primary Ops goal is improving code flow from commit to deployment. However, manual infrastructure provisioning and change approval processes often bog down software delivery.

Self-service platforms accelerate this pipeline. With self-service, Ops enables one-click environment creation and automated CI/CD pipelines. Together with cultural changes, self-service gives developers responsibility for owning the deployment process end-to-end. Ops maintains guardrails and oversight but avoids being the release bottleneck.

This speed and autonomy energize developers and allow the business to ship new features faster. At the same time, integration with Ops tools gives visibility over the entire pipeline. This improves reliability, as Ops can roll back changes quickly if issues emerge post-deploy.

Now, let’s look at developer self-service and how it impacts developer workflows.

**Why Use Self-Service Platforms for Developers?**
--------------------------------------------------

Developer self-service focuses specifically on the capabilities aimed at improving developers' workflows. Developer self-service platforms provide developers with direct access to:

* **Environments -** Instantly spin up/down infrastructure environments on demand for coding, testing, debugging, etc.
* **Tools -** Self-service access to dev tools, IDE plugins, databases, and more without having to configure everything on their own.
* **App services -** Managed services like databases, queues, caches, storage, etc., available on demand, speeding up new feature launches.
* **Data -** On-demand test data sets, masked data, backups, etc., without a lot of coordination. Access to release, monitoring, and audit data also helps track fast-moving changes with ease.
* **Build automation -** CI/CD pipelines with pre-built templates requiring zero Ops involvement to automate code integration and delivery.
* **Collaboration -** Share projects, code, environments, and tools with other developers to foster teamwork.

The goal is to create an automated and integrated internal dev platform with minimal delays to give engineers everything they need. [Developers gain autonomy](https://www.facets.cloud/blog/implementing-devops-navigating-the-top-4-challenges) over their toolchain and can work without being blocked on resource requests.

### **Benefits of a Self-Service Platform for Developers**

Why is self-service so critical for developers? What specific benefits does it provide over traditional models?

#### **1\. Give Developers Autonomy Over the Tech Stack**

In traditional infrastructure models, developers have little say over their tools and technologies. Ops acts as a gatekeeper because of operational concerns like security, cost, and standardization. While these are valid concerns, top-down standardization can hamper developer productivity.
Developers get stuck using outdated languages, frameworks, and databases dictated by Ops, making them less able to meet business needs efficiently.

With developer self-service, devs now have the autonomy to provision their tech stack. For example, a team could instantly spin up a PostgreSQL database rather than going through a two-week process to request it from Ops. Combining automation and policy guardrails gives developers more tech flexibility while meeting governance needs.

#### **2\. Free Developers from Infrastructure and Tools Tinkering**

Despite access to modern tools, developers can still lose significant time tinkering with underlying infrastructure or tools instead of writing code. Configuring servers, networking, databases, and other Ops tasks distracts from building applications.

Developer self-service platforms abstract away backend complexity so engineers can focus on coding. For example, a dev on your team could use a single CLI command to instantiate a production-ready container environment for their application, while staying within the allowed budget and inside a controlled, air-gapped environment. This Container-as-a-Service approach allows coders to focus their skills on developing software rather than wrangling infrastructure.

By handling all the underlying infrastructure, self-service platforms let developers spend more time innovating and maximizing the business value they create.

**The Internal Platform Model**
-------------------------------

A fully featured [internal DevOps platform](https://www.facets.cloud/blog/driving-engineering-efficiency-with-internal-platforms) integrates with tools across four major categories. Let’s look at each category and examples of tools within it.

### **1\. Infrastructure Provisioning**

Infrastructure-as-a-service solutions allow users to provision virtual machines, containers, serverless functions, and other fundamental computing resources. For example, using a platform like [Facets](https://facets.cloud/), developers could leverage an internal "cloud" to spin up transient sandbox environments without waiting on ticket fulfillment from Ops.

Some standard solutions include:

* **Virtualization -** Cloud VMs or on-premise VM managers like OpenStack, Nutanix, oVirt, and VMware allow Ops to pool and allocate compute resources.
* **Container orchestration -** Cloud-native container management or open platforms like Kubernetes, Docker Swarm, and OpenShift run containerized infrastructure as portable workloads.
* **Serverless/FaaS -** Cloud serverless solutions or open solutions like OpenFaaS and Kubeless provision auto-scaling function workloads.
* **Resource allocation -** Cloud management platforms like VMware vRealize automate and govern provisioning.
* **Infrastructure-as-Code -** Terraform, Ansible, and Puppet define infrastructure in declarative configuration files.

Additionally, robust API/CLI access allows these tools to integrate into Facets so users can deploy pre-approved configurations with a single click or API call.

### **2\. Environment Management**

Environment management solutions track all resources across their lifecycle from creation to decommissioning. They offer visibility, access controls, cost reporting, and quota management to help Ops govern resources.

Here are some commonly used tools:

* **Infrastructure monitoring -** Platforms like Datadog, Prometheus, and Grafana provide utilization, uptime, and performance metrics.
* **Release management -** Solutions like ArgoCD, Harness, and Octopus Deploy manage changes across environments.
* **Service catalogs -** Catalogs like Backstage publish available infrastructure offerings.
* **Access governance -** Tools like HashiCorp Vault and CyberArk manage secrets and control access.
* **Cost management -** Cloudability, ParkMyCloud, and Kubecost report spending across resources.
* **Quota enforcement -** Tools like CloudCheckr and vRealize Automation enforce quotas that prevent teams from over-provisioning resources.

Robust lifecycle management ensures compliance while granting developers the freedom to self-service resources.

### **3\. Developer Acceleration**

Developer acceleration tools increase engineer productivity with self-service access to data, collaboration, and dev tools.

**Common solutions include:**

* **Data plane -** Offers test data sets, reference data, and masked data for developers. Tools like Delphix and Cloud Data Services allow self-service data access.
* **IDE plugins -** Access environments, collaborate, and deploy apps directly from IDEs like VS Code, IntelliJ, and Eclipse via plugins.
* **Collaboration -** Solutions like Slack, GitHub, and GitLab foster communication and code sharing.
* **Tool distribution -** Package and image registries like JFrog Artifactory and Sonatype Nexus proxy approved developer tools and dependencies.
* **CI/CD -** Pre-built pipelines in tools like Jenkins, CircleCI, and Travis CI automate code integration and delivery.

Developer acceleration technologies maximize productivity while ensuring compliance via policy guardrails.

### **4\. Managed Services**

Managed services provide commonly needed capabilities like databases, messaging, storage, and caching. Teams can self-provision these shared services instead of running their own.

**Some options include:**

* **Storage -** Shared object/block/file storage like MinIO, Ceph, and GlusterFS.
* **Databases -** Managed PostgreSQL, MySQL, SQLite, Redis, MongoDB, and more.
* **Messaging -** Queues and streaming like Kafka, RabbitMQ, and ActiveMQ.
* **Caching -** In-memory caches like Memcached and Redis to boost performance.
* **Services -** Common app needs like search, logging, notifications, and AI.

Service catalogs allow developers to consume these managed capabilities on demand without an Ops burden. Built-in redundancy and failover improve resiliency versus DIY options.

### **Developer Self-Service: With vs. Without Self-Service**

To understand the benefits of self-service in action, let's walk through some examples comparing experiences with and without self-service. Here’s a glance before we dive deeper into each scenario.

| Scenario | Without Self-Service | With Self-Service |
| --- | --- | --- |
| **Provisioning a New Test Environment** | Developers submit a ticket and wait days for Ops to provision infrastructure. | Developers use a web console or CLI to spin up pre-configured environments instantly on demand. |
| **Deploying a New Microservice** | Devs wait for Ops to manually configure networks, load balancers, and firewalls to deploy the service. | Devs use a pipeline template to automatically build, test, and promote the service into prod with configured dependencies. |
| **Installing a New Developer Toolchain** | Developers submit a request to Ops to install new tools like IDEs and compilers, which get backlogged. | Devs automatically instantiate a workflow template to provision a standardized dev environment with preferred tools. |
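To make the "With Self-Service" column more concrete, here is a minimal sketch of what a scripted environment request could look like against a hypothetical internal platform API. The endpoint, payload fields, and token variable are illustrative assumptions, not any specific product's interface:

```python
# Illustrative sketch only: requesting a pre-approved test environment from a
# hypothetical internal platform API. The endpoint and fields are assumptions.
import os
import requests

PLATFORM_URL = "https://platform.internal.example.com/api/v1/environments"  # hypothetical

def request_test_environment(service: str, template: str = "standard-test") -> dict:
    """Ask the platform for an environment built from an Ops-approved template."""
    response = requests.post(
        PLATFORM_URL,
        headers={"Authorization": f"Bearer {os.environ['PLATFORM_TOKEN']}"},
        json={
            "template": template,   # Ops-defined golden path
            "service": service,     # the app this environment belongs to
            "ttl_hours": 24,        # auto-decommission to avoid waste
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()           # e.g. {"id": "env-123", "status": "provisioning"}

if __name__ == "__main__":
    print(request_test_environment("checkout-service"))
```

The point is not the specific call, but that the request goes through a template the Ops team has already reviewed, so guardrails are applied without a ticket.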
**Take Charge And Empower Your Teams With Self-Service** -------------------------------------------------------- Traditionally, developers were constrained by rigid systems—waiting weeks for access, stuck using legacy tech, and bogged down in tickets.  It gets in the way of innovation. But you have a chance to transform this.  With self-service, you can empower your team to build the future autonomously and liberate them from bottlenecks while enabling oversight. And the promise of self-service is easily achievable with solutions like [Facets.Cloud](https://facets.cloud/).  Facets provides extensible infrastructure provisioning, environment management, developer acceleration, and application services—a self-service platform that adjusts to your workflows and becomes your own.  So take that first step—whether through a pilot project or even a proof of concept. There will be challenges to work through, but this is how organizations evolve—through expanding what’s possible.  The future of Ops is self-service. Are you ready to transform how your teams work and innovate? --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Next in DevOps: A User-Centric Platform Engineering Approach Author: Pravanjan Choudhury Published: 2023-12-21 Category: Blogs Meta Description: Platform engineering is transforming DevOps, shifting operations teams into facilitators and here, we talk about how redefining workflows and empowering developers allow operations teams to facilitate faster software delivery. Tags: platform engineering, Proactive DevOps URL: https://blog.facets.cloud/next-in-devops-user-centric-platform-engineering-approach ![representation of user centric platform engineering practices](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/01-1704196021684-compressed.png) The evolution of DevOps has been marked by a series of transformative changes, with the recent widespread adoption of Platform Engineering being the latest. However, the evolution hasn’t stopped there. Platform engineering has had a series of transformations too. And it’s not just about adding new tools and technological enhancements. It’s more than that. ### **So, What’s More to These Transformations** The goal now is to move away from command-based operations towards smooth, automated DevOps processes. This not only enhances developer experience and productivity but also, in the process, evolves operations teams. Here [platform engineering](https://blog.facets.cloud/handbook-to-platform-engineering-journey/) plays a pivotal role. How? The idea is to build platforms by first understanding contemporary developer workflows and effectively identifying areas of friction in the software delivery processes. **The Evolution of the Ops Role: Stepping Into a Facilitator's Role** --------------------------------------------------------------------- When an organization embraces platform engineering, the operations team naturally evolves.  This evolution first redefines the organization's internal DevOps frameworks, then accelerates its path to achieving goals. As this transition progresses, the Ops team takes on the role of an ‘infrastructure facilitator’ more prominently.  This evolution requires the Ops team to rethink their role – acting as a bridge and ensuring developers receive the tools and resources they need for more efficient code deployment to production. 
Platform engineering changes how the development and operations teams collaborate, moving beyond simply handling tickets back and forth (Ticket Ops). Here's how Ops teams can spearhead this change:

1. **By breaking down dependencies:** By minimizing daily dependencies, Ops teams can free up bandwidth to concentrate on improving the internal product for developers.
2. **By empowering developers:** The Ops team uses the extra time to refine "golden paths," allowing developers to quickly address most of their requirements without compromising cloud posture.
3. **By driving purpose and standardization:** The "[golden paths](https://www.redhat.com/en/blog/designing-golden-paths)" result in consistent and accessible DevOps processes.
4. **By driving aligned autonomy:** The Ops team will expand its role to include quality assurance (QA) and security, allowing these groups to operate more independently.

This change involves a shift from simply executing tasks to acting as facilitators. By creating streamlined processes, Ops teams enhance developer productivity and agility.

### **How Businesses Profit from Platform Engineering**

With the integration of [IDPs](https://blog.facets.cloud/internal-developer-platforms-the-secret-weapon-for-developer-productivity/) and the implementation of Platform Engineering, the primary objective is clear: enhance visibility and control over software delivery. This allows organizations to accelerate application development, mitigate risks effectively, and optimize costs.

One crucial factor in achieving this is an organization's ability to swiftly adapt to changing business needs. These changes often arise from a variety of market influences. Such adaptations often lead to changes in infrastructure, which may involve geographic expansion, local company strategies, or multi-cloud approaches.

The introduction of platform engineering brings several key benefits, including:

1. **Agility:** The ability to swiftly adapt to market demands and competitive pressures, setting the pace in the evolving business landscape by launching newer workloads.
2. **Faster geographic expansions:** Seamlessly launch services in new regions, maintaining efficiency and compliance.
3. **Country-specific compliance requirements:** Facilitate the implementation of local laws, preserving agility and efficiency.
4. **Facilitate multi-cloud operations:** Enable consistent experiences across various cloud providers without sacrificing coherence.

**Final Thoughts**
------------------

Platform engineering isn't just a momentary trend; it fundamentally reimagines the way developers engage with technology and how organizations shape their DevOps workflows.

At the heart of this transformation is a focus on user-centric platforms. These platforms prioritize end-user needs, leading to more efficient and streamlined processes that accelerate development. Meanwhile, productization is about transforming technological solutions into standardized products; accessibility and in-house adoption are the primary factors determining whether a platform succeeds or fails.

Collectively, these ideas foster agility, allowing organizations to adapt rapidly to changing needs and challenges. Moreover, they enhance resilience, ensuring that systems and processes have the robustness to bounce back and evolve in the face of disruptions.
At [Facets](https://facets.cloud/), we're deeply committed to the values of user-centric platform engineering, and the success of [our early adopters](https://www.facets.cloud/case-study) attests to its impact. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Sample Page Author: Facets.cloud Published: 2023-12-04 URL: https://blog.facets.cloud/sample-page This is a page. Notice how there are no elements like author, date, social sharing icons? Yes, this is the page format. You can create a whole website using Superblog if you wish to do so! --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## The Business Impact of Internal Developer Platforms for Improved ROI Author: Anshul Sao Published: 2023-11-30 Category: Blogs Meta Title: Maximizing Business ROI with Internal Developer Platforms (IDPs): A Strategic Guide Meta Description: How can businesses maximize ROI from internal developer platforms? Examine the measurable impact IDPs have on revenue, costs, productivity, and time-to-market. Tags: internal development platform, platform engineering, idp URL: https://blog.facets.cloud/business-impact-of-internal-developer-platforms-for-improved-roi ​[Internal developer platforms](https://blog.facets.cloud/internal-developer-platforms-the-secret-weapon-for-developer-productivity/) (IDPs) have become increasingly relevant in modern business. They unify the internal tools, services, and workflows that companies use to improve developer productivity and experience.  According to a [Forrester Opportunity Snapshot](https://www.modeln.com/wp-content/uploads/2023/03/model-n-forrester-report-2023.pdf), implementing an IDP can improve time-to-market and increase developer productivity and customer satisfaction. As businesses aim to maximize return on investment (ROI), IDPs are proving to be a strategic investment.  But how can you implement an IDP and see similar ROI growth? Let’s find out.  The Importance of Internal Developer Platforms (IDPs) ----------------------------------------------------- Investing in internal developer platforms makes sound business sense for companies that rely on software as a critical differentiator.  > _IDPs combine all the systems developers use into a unified experience. They provide engineers with the infrastructure and enablement layer to build products and services more efficiently._  This infrastructure generally includes centralized code repositories, shared libraries and frameworks, automated testing and delivery pipelines, and self-service tools. Engineers focus on more high-value, innovative work by removing repetitive tasks and friction points in the development process. IDPs also foster collaboration, knowledge sharing, and best practices adoption across engineering teams. All of this ultimately translates to faster delivery of high-quality software. What Role Do IDPs Play in Maximizing ROI ---------------------------------------- IDPs are not just cost centers. 
Instead, they should be viewed as strategic investments that can maximize ROI in several ways: * Accelerating time-to-market for new products and features * Enabling faster iteration and experimentation * Reducing costs by eliminating redundancy and duplication * Increasing developer productivity and efficiency * Improving developer experience leads to better talent retention * Providing data insights to make informed product decisions * Serving as catalysts for more comprehensive digital transformation initiatives IDPs essentially act as force multipliers for engineering teams by optimizing the software development lifecycle. The cost savings and increased output enabled by IDPs have ripple effects that ultimately result in higher revenues, lower costs, and maximized ROI. What Business Metrics Do IDPs Impact ------------------------------------ Several vital business metrics see measurable improvements from implementing internal developer platforms. The areas most impacted include: ![The benefits of improved developer experience for businesses represented in a step by step format](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/6568224caabad96c26dda777zaej8la17gvlxk9splpxxxr0x9lsfjuh-fd83ewndliou5wawfcts-s-1h39myoskvdfda-5s29cg1dt6zzuycpet7zpavtgzofgbhqceyzvz-g5hzhdmjafxoq-srphba4cqcyodnzog6c3wpns-1701865728576-original.png) [Source](https://thenewstack.io/developer-platforms-key-findings-from-a-forrester-snapshot/) ### **1\. Revenue Growth** One of the most direct results of effective IDPs is increased revenue. IDPs directly accelerate the creation of revenue-generating products and features by providing developers with an agile platform to quickly build and release high-quality software tailored to users' needs. Rather than getting bogged down by infrastructure complexities, developers can focus on innovation to solve customers' pain points. IDPs also foster innovation by freeing developers from mundane maintenance tasks. > _According to the Snapshot Survey,_ **_85% of survey respondents agreed that investing in internal developer platforms and_** [**_improving DevEx_**](https://www.facets.cloud/blog/improve-developer-experience-with-specialized-dev-environments) _can help drive revenue growth for their firms._  ### **2\. Reduced Time-to-Market** By centralizing the tooling, frameworks, and processes involved in application development, IDPs significantly reduce the time it takes to conceptualize, build, test, and launch new products or features. IDPs minimize duplicate work across teams and allow the reuse of code, APIs, and services to accelerate release cycles. Companies can take an idea from conception to launch much faster. This reduction in time-to-market provides greater flexibility to respond to emerging trends, customer demands, and competitive pressure. Teams can iterate rapidly, testing concepts and validating ideas before bringing finished products to market. > _The same survey also found that_ **_77% of companies attributed measurable improvements in time-to-market_** _to their usage of internal developer platforms._ ### **3\. Enhanced Customer Satisfaction** The combination of faster release cycles and higher-quality software also enhances customer satisfaction. IDPs empower developers to address issues quickly and consistently deliver upgrades and new features based on direct customer feedback. With these rapid responses to customer needs, companies can nurture greater brand loyalty among users and build lasting relationships. 
IDPs help ensure customers have access to the latest innovations delivered reliably. **Satisfied customers provide word-of-mouth referrals, use products more frequently, and are open to exploring additional services**. Maintaining high customer satisfaction is vital for sustained revenue growth.

### **4\. Cost Savings**

Looking specifically at the financial impact, IDPs can reduce the time to market, offering significant cost savings:

#### **Lower Time-to-Market Results in Lower Costs**

Each day or week shaved off the development lifecycle translates directly to tangible cost savings from:

* Less developer time allocated to a project from start to finish
* Improved developer productivity by allowing developers to focus on important tasks
* Faster realization of revenue potential once the new product comes to market

#### **Improved Market Lead**

Beyond payroll savings, faster time-to-market unlocks significant financial upside:

* Faster product development can give your business a first-mover advantage, allowing you to capture more market share
* An extended period of exclusive offering before competitors launch similar products in the market
* An extended period of customer feedback and validation to inform ongoing product iterations
* Higher returns on R&D investments when technology reaches users faster
* Revenues begin flowing in sooner from new products and features

Together, these financial benefits maximize ROI from technology investments. IDPs are accelerators for both cost savings and identifying new revenue streams.

### **5\. Better Developer Experience and Talent Retention**

IDPs streamline developer workflows, giving them more time to focus on creative work instead of daily grunt work. This improves developer experience and, in turn, helps good talent stay with the company.

#### **Improving Developer Experience (DevEx)**

[Developer Experience](https://www.facets.cloud/blog/improve-developer-experience-with-specialized-dev-environments) (DevEx) refers to engineering teams' satisfaction, productivity, and effectiveness. Several factors contribute to DevEx:

* **Tooling** \- Quality of coding, testing, debugging, and collaboration tools.
* **Process** \- Effectiveness of systems for project and code management.
* **Culture** \- Work environment, leadership, and psychological safety.
* **Productivity** \- Ability to progress and see results on meaningful work.
* **Support** \- Resources for help, knowledge sharing, and mentoring.

IDPs directly enhance DevEx by providing unified, high-quality tooling, streamlining workflows, and enabling productivity through automation.

#### **Boosting Talent Retention**

While hiring good talent is difficult enough, organizations must also work to retain their top performers. Without strong retention, companies lose invaluable institutional knowledge and suffer high retraining costs.

**IDPs boost retention by:**

* Providing access to cutting-edge tech stacks that engage developers
* Enabling leadership opportunities in platform design and maintenance
* Fostering innovation, creativity, and collaboration without adding bloat to workflows

Developers stay challenged, sharpen skills, and advance their careers - all while driving business impact. The result is a virtuous cycle of top talent and technology accelerating each other.

### **6\. Catalyst in Digital Transformation**

On a broader level, internal developer platforms serve as a vital catalyst for digital transformation within companies.
They empower organizations to continually optimize processes and customer experiences through software. IDPs democratize software development across the enterprise. They allow any team to leverage shared services to build applications aligned to business objectives and customer needs.

**Key benefits:**

* Organization-wide agility to respond to changing market dynamics
* Data-driven decision-making from connected systems
* Ecosystem thinking that breaks down silos

The outcome is a responsive organization wired for constant digital innovation, and the combination of these business metrics makes IDPs a clear ROI win for most businesses.

But how can you get started with improving your ROI using IDPs? It’s simple.

How to Maximize the ROI of Internal Developer Platforms
-------------------------------------------------------

With IDPs proving their ability to impact key business metrics positively, the focus shifts to how companies can implement platforms to maximize ROI.

* **Get Executive Buy-In**: The first step is to get executive buy-in. IDPs require upfront investment and organization-wide adoption, which can be resource-intensive. Make a data-backed business case showing the benefits and highlighting success stories from industry peers.
* **Start Small, Then Scale**: When introducing new ways of working, start with a pilot focused on a single product or team. Target projects likely to see quick wins so you can demonstrate the capabilities of the new approach. With evidence of improved outputs, it becomes easier to scale to more teams in phases.
* **Involve Developers Early**: Developers are both the users and beneficiaries of IDPs. Get their input when designing tools and workflows to ensure high adoption. Empower developers to contribute to the platform’s evolution over time.
* **Prioritize Developer Productivity**: Look for ways to save developers time on every task. Carefully evaluate existing tools and processes to identify pain points. Automate manual tasks where possible.
* **Integrate Security Best Practices**: Build security upfront, including access controls, secrets management, and compliance guardrails. This avoids playing catch-up later and reduces risk.
* **Focus on Extensibility**: The platform should adapt as needs evolve. Look for an extensible solution so your team does not waste time building the core and instead can build on top of it with their own services and tooling.
* **Measure Results**: Define metrics aligned to business goals, like reduced time-to-market or feature delivery rates. Track progress at regular intervals to quantify ROI and identify areas for improvement (a rough sketch of this kind of tracking appears at the end of this section).
* **Continuous Improvement:** Treat the platform as a product requiring ongoing investment. Dedicate resources for upkeep, support, and new capabilities. Solicit user feedback and engage developers to be co-creators.

Internal developer platforms require thoughtful planning and care to maximize ROI. But once implemented successfully, they become invaluable infrastructure empowering innovation across the organization.

While building a custom IDP may seem like the obvious option, you would spend a lot of resources reinventing the wheel. Instead, buying an extensible IDP is a better long-term decision, as it requires less maintenance without losing the extensibility you would get from an in-house IDP.
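As a minimal illustration of the "Measure Results" point above, and assuming you can export commit and deployment timestamps from your delivery tooling, a small script like the following could track two commonly used delivery metrics; the records shown are made up:

```python
# Minimal sketch for tracking delivery metrics, assuming commit and deployment
# timestamps can be exported from your CI/CD tooling. The records are made up.
from datetime import datetime, timedelta
from statistics import mean

# Each record: when a change was committed and when it reached production.
deployments = [
    {"committed": datetime(2023, 11, 1, 9, 0), "deployed": datetime(2023, 11, 1, 15, 30)},
    {"committed": datetime(2023, 11, 2, 11, 0), "deployed": datetime(2023, 11, 3, 10, 0)},
    {"committed": datetime(2023, 11, 6, 14, 0), "deployed": datetime(2023, 11, 6, 18, 45)},
]

# Lead time for changes: commit -> production, averaged across deployments.
lead_times_hours = [(d["deployed"] - d["committed"]).total_seconds() / 3600 for d in deployments]
print(f"Average lead time: {mean(lead_times_hours):.1f} hours")

# Deployment frequency: deployments per week over the observed window.
window = max(d["deployed"] for d in deployments) - min(d["committed"] for d in deployments)
print(f"Deployment frequency: {len(deployments) / (window / timedelta(weeks=1)):.1f} per week")
```

Tracked before and after an IDP rollout, trends in numbers like these are one concrete way to back up the ROI case discussed above.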
How Facets Helps Businesses to Maximize IDP ROI ----------------------------------------------- [**Facets**](https://www.facets.cloud/) is an extensible internal developer platform designed to simplify platform engineering for complex cloud workloads. Instead of your teams adjusting to a new workflow, Facets fits right within your existing way of working and can be modified to suit your exact requirements.  Here's how Facets can help businesses achieve the IDP best practices and maximize ROI: ![details of how facets.cloud helps businesses implement best practices and maximize ROI](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/inside-blogs-1702625840243-compressed.png) **​ ** **1\. Extensible and Customizable** * Facets is built as an open framework that teams can extend by integrating new tools and services, which allows the platform to adapt as an organization's needs evolve. * Development teams can build custom services and tooling on top of the core Facets platform, ensuring extensibility and providing you with an IDP that grows with the business. ### **2\. Automates Repetitive Tasks** * Facets provides reusable infrastructure templates and automated environment provisioning, eliminating repetitive manual work for ops teams. * Self-service capabilities like one-click deployment and built-in observability free up ops teams from mundane tasks so developers can focus on higher-value delivery. ### **3\. Fosters Collaboration** * With Facets, dev and ops teams work from a unified interface and shared architecture catalog. This single source of truth facilitates better collaboration. * Standardized workflows ensure all teams work consistently and speak a common language about services. ### **4\. Enhances Security and Compliance** * Facets comes with security best practices like access controls and secrets management out of the box. * Deep auditing provides transparency into who changed what and when, strengthening compliance. ### **5\. Accelerates Time-to-Value** * Reusable templates help teams launch new environments 25x faster so products and features can be built and delivered more rapidly. * This automation and improved collaboration translates to faster development cycles and time-to-market. Gain the Competitive Edge and Maximize ROI with IDPs  ----------------------------------------------------- ​[Internal developer platforms](https://blog.facets.cloud/in-house-or-third-party-internal-developer-platform/) have transitioned from "nice to have" to "need to have" in the race to build software better and faster than competitors. IDPs directly accelerate product development while optimizing engineering productivity and costs. By providing the foundation for developers to focus on delivering business value vs. building infrastructure, IDPs boost ROI across critical metrics like time-to-market, customer satisfaction, and revenue growth. As software becomes increasingly central to competitive differentiation, investing in an extensible and scalable IDP is no longer an option. Platforms like [Facets](https://facets.cloud/) offer turnkey solutions to help companies implement IDPs that evolve with the business. With the right platform, organizations can maximize ROI while navigating complex digital transformations with agility. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. 
--- ## Internal Developer Platforms: The Secret Weapon for Developer Productivity Author: Anshul Sao Published: 2023-11-23 Category: Blogs Meta Title: Internal Developer Platforms for Developer Productivity Meta Description: This article will dive deep into the developer productivity challenges, explain how IDPs improve productivity through workflow optimization and automation, and why IDPs matter for modern tech-first organizations. Tags: Developer Productivity , internal development platform URL: https://blog.facets.cloud/internal-developer-platforms-the-secret-weapon-for-developer-productivity Developer productivity is mission-critical in today's tech-first world—the pace of delivering high-quality applications and features can make or break a company.  However, according to [Stripe's developer survey](https://stripe.com/files/reports/the-developer-coefficient.pdf), developers spend an average of 17.3 hours per week on maintenance tasks. Where speed is critical, this is a hindrance to developer productivity. And tech-first businesses need a better productivity solution. > Enter [Internal Developer Platforms (IDPs)](https://facets.cloud/). IDPs provide developers with automated environments, tools, and workflows that help them ship code faster.  This article will dive deep into the developer productivity challenges, explain how IDPs improve productivity through workflow optimization and automation, and why IDPs matter for modern tech-first organizations. Let's get started. The Current State of Developer Productivity ------------------------------------------- The Stripe developer survey identified that developers spend over [40%](https://stripe.com/files/reports/the-developer-coefficient.pdf) of their time on repetitive tasks like setting up infrastructure, deploying code, and switching between contexts. The rest is split between code maintenance, debugging, and meetings. That leaves less time for writing and deploying new code to production. This creates big problems for engineering teams. Simple projects drag on for weeks, and even onboarding new engineers could be faster because developers get pulled into many non-coding tasks. Shipping new features and fixes takes longer than it should. The tech talent shortage makes things even harder, and [43% of tech decision-makers](https://www.mckinsey.com/capabilities/people-and-organizational-performance/our-insights/beyond-hiring-how-companies-are-reskilling-to-address-talent-gaps) reported a skill gap within their organization. To fill the void, developers are stretched thin and asked to work on tasks outside their core expertise. This leads to burnout, low morale, and people leaving jobs—further adding to the problem. With the wide array of changes in the tech space, the old approaches aren't cutting it anymore. Engineering teams need solutions that remove roadblocks and let developers focus on writing code.  That’s why [over 51% of companies](https://www.puppet.com/resources/state-of-platform-engineering) have moved to an IDP, and 93% of those businesses consider it a step forward in the workflows. But what exactly is an IDP, and why does it matter? What is an Internal Developer Platform (IDP)? --------------------------------------------- An [internal developer platform](https://blog.facets.cloud/in-house-or-third-party-internal-developer-platform/) is a software delivery architecture that provides developers with the infrastructure, tools, and workflows required to build, deploy, and monitor applications efficiently and at scale.  
An IDP aims to combine all the systems developers use into a single, unified experience. This includes things like: * Code editors to write code * Code versioning systems (like GitHub) * Systems to build and test code automatically * Environments to deploy code after testing * Monitoring tools to check for issues This is in stark contrast to the disjointed tools used traditionally. With an IDP, developers can set up and provision infrastructure like databases and servers based on pre-configured templates—no time spent waiting for Ops to provision resources for the team. > IDPs aim to consolidate choices for organizations like containers under the hood—all completely managed behind the scenes by the platform. And the most critical factor—collaboration. IDPs make it easy for developers to work closely with other teams involved in the software delivery process. Built-in collaboration features help everyone stay on the same page. Let’s dive a little further into the benefits of an IDP. ### Traditional Environments vs. Internal Developer Platforms [Traditional developer environments](https://blog.facets.cloud/rethinking-architecture-from-unstructured-diagrams-to-structured-requirements/) rely on disjointed tools and manual processes, leading to fragmented workflows. Developers spend inordinate time configuring infrastructure, fixing dependencies, context switching between systems and other forms of waste. In contrast, IDPs provide an integrated platform that combines all systems involved in the software lifecycle. Resources are provisioned on-demand, deployments are automated, and collaboration is built-in. This consolidated experience within IDPs leads to significant productivity improvements. ### Benefits of Using an IDP Here are some significant ways IDPs improve developer productivity: * **Increased velocity** - IDPs accelerate release cycles by providing standardized build and deployment pipeline setups with greater reusability. These efficiencies add up to a significant increase in release velocity. * **Reduced overhead** - IDPs automate infrastructure provisioning, tool configuration, and other mundane tasks. A Stripe survey found developers waste 50% of their time on tasks instead of coding, which IDPs directly reduce. * **Improved system reliability** - IDPs standardize and templatize infrastructure setups, minimizing errors from ad-hoc configurations. Environments are consistent and more stable. * **Enhanced collaboration** - IDPs connect developers, ops engineers, QA, security, and other teams with built-in tools for communication and visibility, thus streamlining collaboration. * **Higher developer satisfaction** - IDPs provide a consumer-grade, self-service developer experience that helps improve developer satisfaction as they can focus more on what they love most—code. * **Faster innovation** - IDPs enable developers to experiment more and build new capabilities faster by eliminating productivity bottlenecks. The velocity translates to tangible business outcomes. ![A progressive mind map representing the multiple benefits of internal development platforms](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/655edde132949f9dd0dc6c08x5ms4noxbimuwc70o3zi-a47rl37eczdhp4ex2zrabfvtdpcfc2wk9f9haj69aayljp7lon6n-vtrtzmp2ego0wzgo5bt0emhdiwdys7jp-pd6b0xe3bz7zl6-nplajxehnmpprlmlby2jkhazcrmhsqqxojgqd7sf1dcuwr8g5x6nwagpt4sci8yyq-1701865730687-original.png) What Are Some Features to Look for in an IDP? 
--------------------------------------------- You’ll have two choices when you’re in the market for an [IDP—build or buy](https://blog.facets.cloud/in-house-or-third-party-internal-developer-platform/). Both approaches have pros and cons. While building gives you customizability and control, you are stuck with a product that requires constant maintenance and dedicated resources for upkeep.  On the other hand, buying one can be confusing because of the number of features you must choose from. Also, you don’t want to be stuck with a rigid IDP that does not adjust to your existing workflows. Let’s look at the features you should consider when picking an IDP.  ### **Unified Developer Experience** An IDP should provide a unified UI for developers to access all the tools, systems, and information they need. Seamless navigation between code, builds, deployments, and monitoring should exist without switching contexts. Unified identity and access management across all integrated tools are crucial for a streamlined workflow. ### **Flexible Infrastructure Provisioning** ![A screenshot of Facet's infrastructure provisioning screen](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/655edde1cf5e283d9d3ea989ox7hjuh2mfhb0fd2aozldok58i0ztbas71tfz1nbhg3gljbqbsrfsjf0yhyglk7amlv07g5da2s2rpmputrffxh6ifnubskv3x2i4d8gmi5ixvwdw6h2excz0opgjyk9x8tda2ghjda2i5myiovapihydblrif4mngduhp9fpk94jg2negacwhvua4tnq-1701865731668-original.png) The IDP should allow declarative, template-based provisioning of infrastructure on-demand through infrastructure-as-code. A self-service catalog for deploying pre-built, pre-configured environments is also valuable.  You also need the IDP to integrate with major cloud providers to spin up resources dynamically. For instance, [Facets](https://facets.cloud/) adapts to infrastructure automation, enabling developers to provision resources how and when they want, without imposing opinions. ### **Collaboration Tools** You also need to introduce collaboration between teams to improve efficiency. Developers should easily discuss work and share context with other teams through built-in communication capabilities. This includes activity streams, notifications, chat integrations, comments on work items, and wikis attached to pipelines.  ### **Powerful Observability** The IDP should provide integration with observability tools to provide full-stack visibility to the developers for monitoring system health and troubleshooting issues. Logging, tracing, and error tracking should seamlessly integrate into the deployment pipelines, while visual metrics dashboards and log aggregation provide valuable visibility within the same unified interface.  ### **Flexible Access Controls** You need the IDP to manage access when working with multiple teams and roles. This is important for platform governance. The IDP should allow your Ops team to add and modify security policies and give access rights to a specific set of users.  ### **Extensible, Customizable and inner-sourced** Extensibility is of utmost importance. Most IDPs on the market lock you into a specific set of features that have been built. However, when an IDP offers open APIs, plugins, and automation frameworks, you can easily extend and modify the IDP to fit your workflows. IDPs should ensure that they promote valuable inner-sourced contributions that can be easily reused across the entire organization. [Facets](https://facets.cloud/) is one example of such an extensible IDP. 
It offers a beautiful UI pleasing to the eye and is extensible, with nuanced control over all the features without adding complexity to the platform. Why do IDPs Matter? ------------------- ![A representation of devops maturity model and the several layers to it. ](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/655edde2090ee9ecd44f77adirfcxksls8-fufuf2-spwfmbspuz2njqockuhzkt6fegbpahhecdn7jzcjd93-3d7fnspimpgvoj1xdbite1kkp-ak9jf9y8g3iqalzc7uqxcfirdnmwkjf4fyonw0o38smamvt4xduxrfjtptobycqmwsrvafo56ov0fo4dmg8bqcj24ftjd5t8fx427q-1701865732432-original.png) IDPs present a compelling vision for transforming developer productivity. They directly address the [top pain points developers face today](https://dev.to/devteam/what-are-your-top-three-everyday-pain-points-as-a-developer-23j2) by streamlining tooling, automating infrastructure and deployment, and enabling self-service. According to [internaldeveloperplatform.org](https://internaldeveloperplatform.org/why-build-an-internal-developer-platform/), teams using IDPs benefit from greater velocity, reduced overhead, quicker innovation cycles, and higher job satisfaction—developers can focus on high-value creative work instead of rote tasks. The idea behind an IDP is simple—bring order to the fragmented workflows and improve developer productivity.  Developers can seamlessly move between coding, testing, and deploying from a single screen instead of constant context switching.  Teams across the organization can have a [single source of truth (SSOT)](https://www.facets.cloud/blog/a-comprehensive-approach-to-maintaining-a-drift-free-infrastructure), thus avoiding [infrastructure drift](https://www.hashicorp.com/blog/terraform-cloud-adds-drift-detection-for-infrastructure-management). Done right, IDPs have the potential to move the needle where traditional infrastructure solutions fall short. They represent an exciting evolution that could improve development—and the added efficiency will only become more critical for businesses over time. Take the Next Step Toward Developer Productivity ------------------------------------------------ Improving developer productivity has a multiplier effect on engineering velocity, release frequency, and overall business growth. IDPs address the core issues slowing down developers, like manual configurations, switching contexts, misconfigurations, etc.  They boost developer productivity by providing integrated tools, automated environments, and streamlined collaboration under the same roof acting as a single source of truth for the organization. So if you’re looking to scale innovation faster, consider leveraging infrastructure-as-code, CI/CD pipelines, and self-service environments provided by modern IDPs. But if you’re worried about IDPs being difficult to implement or not extensible enough, try [Facets](https://facets.cloud/). It’s built by tech industry veterans who deeply understand the pain points of developers. Every feature addresses a pressing pain of developers working in collaborative, fast-paced environments. And it provides a fully extensible platform that your developers can work with and modify according to your workflows.  IDPs are transforming how modern digital teams build software and organizations are seeing an average of 20% increase in developer productivity. Internal platforms are the next step toward developer productivity—the question is, are you ready to make the switch? FAQ --- ### How can organizations measure improvements in developer productivity from IDPs? 
Metrics like lead time, deployment frequency, time to restore service, and mean time to resolution help quantify productivity gains with IDPs. Surveys measuring developer experience and analyzing help tickets also provide insights.

### What are some challenges in implementing an internal developer platform (IDP)?

Challenges include:

* Getting stakeholder buy-in.
* Choosing the proper integration tools.
* Migrating legacy systems.
* Measuring ROI.

Having an incremental roadmap and garnering developer feedback is critical.

### Do developers need training to use an IDP effectively?

IDPs focus on providing intuitive self-service interfaces. However, training developers on onboarding, accessing environments, using CI/CD pipelines, etc., helps extract maximum value.

### How long does it take to implement an IDP?

Implementing an MVP with core capabilities typically takes 4-6 weeks. However, IDPs like [Facets](https://facets.cloud/) can help you achieve full implementation within the same or less time.

### Are IDPs suitable for companies just starting their dev teams?

If you are pre-product-market fit, the answer would be no. At that stage, you should focus on product fit rather than engineering excellence. However, post-PMF, IDPs provide an excellent foundation for building engineering culture and practices. The automated environments and guardrails help accelerate new teams.

---

This blog is powered by Superblog. Visit https://superblog.ai to know more.

---

## New Feature Launch: A Quick Introduction to the Guardrail Policies

Author: Anshul Sao
Published: 2023-11-07
Category: Product News
Meta Title: New Feature Launch: Introduction to the Guardrail Policies
Meta Description: We're introducing our latest feature: Guardrail policies, a solution designed to streamline policy management for your entire infrastructure using the power of Open Policy Agent (OPA) integration.
Tags: devops, guardrail policies, platform engineering
URL: https://blog.facets.cloud/guardrail-policies

At [Facets](https://www.facets.cloud/), we understand the intricacies involved with complex engineering setups. As organizations scale and multiple teams work simultaneously, ensuring seamless operations becomes paramount.

Today, we're introducing our latest feature: Guardrail policies, a solution designed to streamline policy management for your entire infrastructure using the power of Open Policy Agent (OPA) integration.

Let’s dive into the details.

**What are Guardrails?**
------------------------

In the context of Platform Engineering and DevOps, Guardrails are predefined rules or policies that guide developers and teams toward adhering to certain standards or best practices during the software development lifecycle. This ensures that everyone on a development team moves in the same direction toward achieving the common goals of a project.

The Platform or DevOps team sets guardrails and creates an environment where developers can make decisions independently, maintain collaboration, enable faster decisions, and mitigate risks associated with software development. Instead of everyone provisioning infrastructure in opinionated ways or your operations team running around doing it, guardrails provide regulated autonomy.

**Why do you need guardrails?**
-------------------------------

When you adopt Platform Engineering principles, it becomes crucial that you have a well-defined framework of policies that ensure operational efficiency and compliance.
The Guardrails feature can potentially help you in the following ways:

### **Consistent and Standardized Infrastructure**

With multiple developers working with different end goals, there’s an increased risk of conflicting resource details or incorrect configurations, leading to increased manual intervention and configuration management.

Guardrails promote consistency and standardization across your infrastructure by employing best practices. This helps reduce errors and misconfigurations and eases troubleshooting, thereby saving time and enhancing output quality.

### **Cost Management**

Overprovisioning leads to inefficient resource utilization, wastage, and unnecessary costs. Guardrails help teams provision resources optimally by setting limits on cloud resource allocations to reduce the risk of cost overruns.

### **Security, Compliance, and Auditability**

In the face of tightening regulatory and compliance requisites, guardrails serve as a beacon of compliance assurance. Moreover, they mitigate the risks associated with security infringements by ensuring databases are configured accurately and cloud storage buckets are not publicly accessible without proper authorization.

### **Enhancing Efficiency and Collaboration**

Guardrails boost operational efficiency and facilitate swift decision-making, as teams have a coherent understanding of what’s permissible and what’s not. They also foster a collaborative culture by aligning all members of the development team toward common goals.

**Facets’ Guardrail Policies**
------------------------------

In Facets, Guardrail policies let Ops or Platform teams create, manage, and enforce policies seamlessly. Leveraging the versatility and domain-agnostic nature of Open Policy Agent, this integration ensures policy decision-making is decoupled from enforcement, resulting in a more efficient and streamlined workflow.

**Why Open Policy Agent?**
--------------------------

The choice of OPA was strategic. We’ve seen that a lot of policy enforcement is done manually, in wikis or docs, giving rise to tribal knowledge. Moreover, the ecosystem of policy enforcement is fragmented.

To solve for this, we wanted to take a unified policy enforcement approach across the entire technology stack. OPA's ability to handle the entire policy ecosystem seamlessly, coupled with its capacity to integrate with various tech tools, made it the ideal solution for our platform.

**How it works in Facets**
--------------------------

With the integration of OPA, Facets enables users to manage and enforce policies through the Facets UI.

**Policy Creation and Management**

OPA allows you to define and update policies independently of your application code. You can express rules and policies using Rego, a language used to write logic-based policies. This separation of concerns makes it easier to manage and evolve policies.

In Facets, you can define policies based on blueprints, environments, resource types, and resource names. By applying policies at the resource-type or environment level, it becomes easier to enforce and manage policies for multiple resources.

Users can easily create, view, edit, and manage policies from a centralized location. In the upcoming release, we’ll also be providing users with the option of cloning, enabling, and disabling the policies.
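For readers new to OPA, here is a rough, generic sketch of how a policy decision can be fetched from OPA's standard REST data API. This illustrates the OPA mechanics only, not Facets' internal integration; the policy package path and input fields are made-up assumptions:

```python
# Generic OPA sketch, not Facets' internal integration. The policy package path
# ("guardrails/deny") and the input fields are made-up assumptions.
import requests

OPA_URL = "http://localhost:8181/v1/data/guardrails/deny"  # OPA's standard data API

def evaluate_resource(resource: dict) -> list[str]:
    """Send a resource definition to OPA and return any deny messages."""
    response = requests.post(OPA_URL, json={"input": resource}, timeout=10)
    response.raise_for_status()
    # OPA wraps the policy's output under "result"; a deny-style rule typically
    # yields a collection of violation messages.
    return response.json().get("result", [])

violations = evaluate_resource({
    "kind": "s3_bucket",
    "environment": "production",
    "public_access": True,  # a guardrail would normally flag this
})
for message in violations:
    print("Guardrail violation:", message)
```

A platform can run a check like this on every proposed change and surface the messages back to the developer, which is the kind of feedback loop described under policy application and violations below.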
![details about how a user can create, edit, and manage policies from Facet's centralized location.](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/654b345c012aac0bdc232e52imagepercent201-1701865733500-original.png) Create, edit, and manage policies from a centralized location **Policy Application and Violations** In Facets, policies are enforceable based on selected criteria, such as specific blueprints, environments, resource types, and resource names. To make policy enforcement easier, users will receive feedback on policy violations, including details of the violation and suggestions for resolution. We’ve also added severity tags to policies by flagging warnings and errors. Users can also block releases if errors are critical.  ![steps about how a user can receive feedback on policy violations in the Facet's validations panel](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/6549bdac9ff3b5e90eff9cdff4pepvwovk9iapelrbhynq0i5s9jqg2qybxoipsstn0jeb0tjrrwi27ufr90q3teyjmsowbgb7wrzwayue0qo3fapvhmluvozn2bgziivsahvnwlf2tf2ylbuurhtfyrpjegkgl3n1kjgmq3aghmu-1701865734481-original.png) Receive feedback on policy violations in the validations panel **Use-cases** ------------- Here are some use cases where setting guardrails can help in standardization, security, and operational efficiency: **Resource Provisioning Limits:** You can set guardrails to limit the maximum resources that can be provisioned for specific services or applications. For example, you can restrict teams from requesting more than 3 cores for applications in any environment. **Security Compliance Enforcement**: It ensures that all systems have the latest security patches installed, strong password policies, and firewall rules. For example, you can set guardrails to ensure S3 buckets are not public, and EC2 instances of certain types are not allowed. **Network Access Control:** You can implement strict rules for network traffic, such as restricting access to specific IP addresses or network segments, or employing VPNs for secure remote access. **Data Backup and Retention Policies:** You can define policies for data retention periods and backup frequency to prevent data loss in the event of hardware failures, natural disasters, or cyber-attacks. **Compliance Monitoring for Cloud Services:** This includes monitoring data encryption, access control, and data residency requirements to maintain compliance with regulations such as GDPR or HIPAA. **Resource Tagging Policies**: Implementing guardrails for resource tagging ensures that all resources are appropriately labeled with relevant metadata.  **Conclusion** -------------- With the introduction of Guardrail Policies powered by Open Policy Agent integration, Facets aims to streamline policy management for all infrastructure needs. With this, users can enable real-time policy evaluation for evolving environments. Moreover, Facets provides a centralized location to manage policies so everyone has complete visibility of rules. This ensures that resources are optimally provisioned, and have security best practices baked in. Get started with guardrails and improve Developer and Ops experience. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. 
--- ## Unveiling the Platform Engineering Mixer: An Evening Full of Camaraderie Author: Pravanjan Choudhury Published: 2023-10-10 Category: Blogs Tags: platform engineering meetup, devops networking event, devops gathering URL: https://blog.facets.cloud/unveiling-the-platform-engineering-mixer-an-evening-full-of-camaraderie-and-networking In the world of technology and engineering, staying ahead of the curve is the name of the game. Recently, we hosted an electrifying event that brought together DevOps industry experts, innovators, and tech enthusiasts under one roof—the Platform Engineering Mixer. If you couldn't attend, don't worry; this blog is your backstage pass to the exciting insights from the event. **Networking, Dinner, Drinks, and More!** ----------------------------------------- The event took place in the heart of Bengaluru, at Daddy’s, Indiranagar. The venue itself was a fusion of innovation and ambiance—a space where people could come together, share ideas, and enjoy. Top DevOps and Engineering leaders of Bengaluru attended the event and discussions around DevOps, Platform Engineering, and Developer Productivity flowed throughout the evening. While our guests also enjoyed delicious dinner and drinks, the highlight of our evening was the panel discussion hosted by Pravanjan Choudhury, the CEO, and Co-founder of [Facets.cloud](https://www.facets.cloud/).  Pravanjan was joined by Kaushik Mukherjee from [Udaan.com](https://udaan.com/), Piyush Kumar from [Capillary Technologies](https://www.capillarytech.com/), Gaurav Sahi from [AWS India](https://aws.amazon.com/local/india/), and Manoj Kulatharayi to discuss why enhancing developer productivity in an ever-evolving ecosystem is important.  Read ahead for more valuable insights from the Panel Discussion. **Why is the importance of developer productivity and experience growing?** --------------------------------------------------------------------------- Piyush, the CTO of Capillary Technologies, kicked off the discussion by addressing the challenges that developers face due to the fast-changing tech ecosystem. According to Piyush, “Developers are now tasked with constant upgrades and management of tools. They’re expected to know a variety of tools and skills used across the software delivery lifecycle.” All of this increases the cognitive load on developers and leaves them with less time to spend on developing the core product.  With developers being the architects of a company's intellectual property, any time spent on non-monetization tasks is a disservice. “Moreover, everything from release management to deployments to testing becomes more complex if we don’t solve for this. That’s why, in recent times there’s a growing focus on improving developer productivity and streamlining their experience”, said Piyush. ### **Custom Metrics for Improved Productivity** Kaushik Mukherjee from Udaan shared valuable insights on measuring developer productivity. He advocated for designing custom metrics tailored to specific problems, leading to significant improvements. This approach allows teams to pinpoint areas of concern and optimize their processes effectively. He also advised companies to focus on both stability and agility, instead of working for just one. 
![DevOps thought leaders on one stage discussing all things platform engineering ](https://uploads-ssl.webflow.com/62566ffa5e87f6550e8578bc/652508f7e07f9c4c3dd06862_image%20(2)-min.png) ### **The Cloud Conundrum**‍ The discussion eventually moved towards addressing the paradox of cloud technology. The panel discussed how companies can mitigate challenges related to managing the cloud.  While the cloud was designed to simplify infrastructure procurement and software delivery, it has also given rise to a different set of challenges. Despite all the best practices around managing the cloud, companies are still struggling with it.  Gaurav Sahi from AWS India stressed on the importance of giving developers autonomy while maintaining essential guardrails. He said, “Too often, developers invest more time customizing tools than using them effectively.”  To solve for this, he asked companies to build teams that can bridge the gap between developers and technology. These teams should understand developers' pain points, speak their language, and provide valuable guidance. Ultimately, upskilling and fostering effective communication and collaboration are crucial steps toward boosting developer productivity. Adding to this, Manoj Kulatharayi advised companies to invest time in building developer-friendly tools and frameworks that streamline the entire software delivery process. “This will in turn increase release velocity and also help in retaining knowledge across the organization”, he said. ### **Creating a Cost-Conscious Culture** The discussion was concluded after addressing one of the major cloud challenges – the unexpected increase in costs. The panel agreed that this often happens because companies don’t have visibility into their infrastructure. The consensus among the panelists was the need to instill a cost-conscious culture. “Managing cloud costs should not be the responsibility of only a selected few people”, said Piyush Kumar. He added that everyone on the team should be aware of the costs and provision the right amount of resources. To enable this, developers should have complete visibility into how resources are being utilized versus what’s provisioned. What followed this was an intense round of Q&A where attendees further clarified points discussed in the event.  ### **Wrapping up** The panel discussion shed light on the challenges and solutions for enhancing [developer productivity](https://www.facets.cloud/developer-self-service). In a world where technology continually evolves, it is imperative for businesses to adapt and empower their development teams. Creating a cost-conscious culture, optimizing cloud usage, and providing developers with the right tools and metrics are key steps toward achieving this goal. As the tech landscape continues to evolve, the insights shared by these DevOps and Engineering leaders serve as a valuable roadmap for companies seeking to thrive in the digital age. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Build vs. Buy: Deciding Between an In-House or Third-Party Internal Developer Platform Author: Anshul Sao Published: 2023-10-02 Category: Blogs Tags: devops, internal development platform URL: https://blog.facets.cloud/in-house-or-third-party-internal-developer-platform ![a comparison of building vs buying an internal developer platform](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/build-vs-buynew-1704189534685-compressed.png) Building software is complex. 
Engineering teams face immense pressure to churn out quality code at breakneck speeds while ensuring legacy systems don't break. Many companies now rely on [Internal Developer Platforms](https://blog.facets.cloud/internal-developer-platforms-the-secret-weapon-for-developer-productivity/) (IDPs)—inhouse built toolchains that help developers deliver faster to stay ahead of the curve. **But that leads to a critical question**: Should you build your IDP from scratch or buy one off-the-shelf? This is a common dilemma facing engineering leaders, and the decision carries significant implications for [developer productivity](https://www.facets.cloud/blog/driving-engineering-efficiency-with-internal-platforms), costs, and more. This article will help you decide whether to build or buy an IDP. We'll look at companies that started with homegrown IDPs and eventually switched to a third-party IDP and how that shift impacted business.  First, What is an IDP? ---------------------- An [internal developer platform (IDP)](https://www.facets.cloud/internal-developer-platform) is a customized system created in-house or by a third party to improve efficiency for an organization’s Dev and Ops teams. IDPs provide a central interface where developers can access all the tools and services they need for coding, building, testing, and deploying applications. They aim to remove friction during deployment by standardizing tools and best practices. This standardized approach allows developers to focus on writing code rather than configuring tooling and processes.  At their core, IDPs enable engineers to build, test, integrate, and deploy applications faster and with higher quality. **What are the Components of an IDP?** -------------------------------------- Developers use many different tools to build and deploy software. This can make operations complicated and disjointed. There are isolated systems that don't connect well, and IDPs solve this by linking together essential tools like: * Source code repositories * Continuous integration/continuous delivery (CI/CD) pipelines * Infrastructure provisioning tools * Monitoring and logging dashboards * Documentation platforms * Ticketing systems But what are the benefits of implementing an IDP in your Dev workflows? ### Benefits of IDPs Engineering teams can realize several advantages with an IDP. Here are some of the major ones: * Improved developer productivity due to streamlined workflows * Increased release velocity, allowing more features to be released * Higher code quality through enhanced debugging and collaboration * Standard processes and tools across all product teams * Scalable platform for onboarding more engineers As software complexity increases in modern technology stacks, IDPs have become indispensable infrastructure for high-performing engineering organizations. With a robust IDP, companies can accelerate innovation and maintain a competitive advantage. ### Should You Build or Buy an IDP—Quick Overview But when it comes to IDPs, you’re met with two choices—spending resources and developer time to build an IDP or buying an IDP that allows similar flexibility. Let’s look at a quick comparison. Build vs. Buy an Internal Developer Platform—A Detailed Look ------------------------------------------------------------ Let’s dive deeper into the debate and see if it makes more sense for you to build your IDP or buy one that integrates with your toolchain. 
### The Build Approach for Internal Developer Platforms (IDP) Going custom with your Internal Developer Platform is like designing your dream house - you get the creative freedom to build something tailored to your team's needs. I can see why that appeals to some teams! ### **Advantages of the build approach:** * **You can fully customize it** for your unique workflows and specialized needs. For instance, if your team is building a custom data platform that requires specialized applications, building an IDP lets you bake all that in from the start. * **You can seamlessly integrate your toolchain**. It’s like designing your ideal kitchen with all the must-have appliances and gadgets. You can only add the integrations you need in your IDP instead of spending time and resources building integrations for all tools. * **You maintain full ownership and control**. You can add features to your IDP on your timeline—no reliance on an external team where you request and wait for the components to be prioritized.  However, building a custom Internal Developer Platform is more complex than throwing features together.  ### **Disadvantages of the build approach:** * **It takes significant time and engineering resources** to build something robust from scratch. You need skilled developers and lots of bandwidth. It's not a quick weekend project. And every hour not spent on your core product offering, your crown jewel, is not focusing on your competitive advantage. * **Once built, it needs ongoing maintenance and care** to handle upgrades, security, scaling needs, etc. You're signing up to be a full-time gardener. * **The costs add up fast**. Between Dev time, infrastructure, maintenance, etc., it's a serious investment, like buying a luxury car. Budget accordingly. It can be an added distraction.  Building a custom Internal Developer Platform can be a compelling approach for the right team with advanced needs and resources to invest. But take into account the commitment and complexity involved. ### The Buy Approach for Internal Developer Platforms (IDP) Buying an off-the-shelf Internal Developer Platform (IDP) is like moving into a fully furnished home, ready to go. You skip the headache of building everything yourself from the ground up. There are some convincing reasons why getting a ready-made IDP could be the way for your organization: * **It brings standardization**. Everyone uses identical versions of tools and services throughout the organization. This also helps you avoid infrastructure drift, helping bring everyone on the same page.  * **It boosts developer productivity**. An IDP makes things like environment provisioning, building pipelines, and deploying apps ridiculously easy. This leaves your developers free to focus on writing code and shipping features faster. * **Collaboration between teams improves.** A single pane of glass gets dev teams, Ops, security, and others on the same page. * **It’s faster to onboard new developers** since everything's pre-configured and ready to roll. Less ramp-up time means developers can get to production quickly. * **A quality IDP is designed to scale** with your organization as you add more teams. * **Governance frameworks** for security, compliance, monitoring, and more are baked right in. This means your Cloud is optimized from Day 1 Going the buy route isn't without some potential downsides to consider: * **Some pre-packaged IDPs may be challenging to customize**. 
Though you may see IDPs designed to be standardized, many modern IDPs are fully customizable based on your workflows. For example, [Facets](https://facets.cloud/) is built from the ground up to be extensible by design, giving you complete control over how your organizational workflows are stitched together.  * **There’s dependence on the vendor's timeline** for critical updates, fixes, new features, etc. Build vs. Buy an IDP: Real-World Business Impact ------------------------------------------------ Let's look at real-world examples to see the build vs. buy decision and its impact on business. These case studies are from companies that have implemented IDPs and provide tangible lessons on the pros and cons of each approach. [Treebo Hotels](https://www.facets.cloud/case-study/treebo) built its custom platform at first. But soon, the company faced challenges maintaining and managing multiple environments. That’s when they decided to buy an IDP.  They picked [Facets](https://facets.cloud/)—a self-hosted, self-serve IDP that simplified their manual processes with logging, monitoring, and alerting workflows. With this change, Treebo could onboard developers faster, with dedicated resources spun up automatically, thus saving hours of wasted resources and improving dev productivity.  Another example is from the gaming industry, which is highly code-heavy and has many disparate processes—including code, graphics, design, modeling, and more. There are multiple layers of complexities involved.  When [GGX](https://www.facets.cloud/case-study/ggx) decided to migrate its cloud solution from AWS to Google Cloud, it faced many issues arising from these complexities. Due to this, they expected the migration to be completed in 3 months. Running sandbox environments for external game developers was also tough using the homegrown tools.  That’s when they decided to try a third-party IDP. [Facets](https://facets.cloud/) accelerated GGX's cloud migration **from three months to just two weeks**!  It also smoothened release management for faster deployments. With [drift-free](https://www.facets.cloud/blog/a-comprehensive-approach-to-maintaining-a-drift-free-infrastructure) sandboxes and 100% automation, GGX achieved a faster setup with minimal effort and resource wastage. **The takeaways?** * IDPs provide pre-built tools and workflows that automate the redundant tasks involved in cloud deployment, freeing up developer time and energy for innovation. However, building a custom IDP requires significant upfront effort. * IDPs standardize environments and processes, reducing unexpected issues down the line. Some off-the-shelf solutions offer better support and extensibility, helping you design the IDP to fit your current workflows. * Adapting and scaling efficiently as needs change requires flexible infrastructure. Homegrown platforms often involve more overhead, making it challenging to modify and upgrade. * Balance is key between control/customization and maintenance burden. Building an IDP allows more customization but requires more upkeep, while some standardized IDPs may limit flexibility. Build or Buy..the verdict? -------------------------- It feels safe to stick to what works. Your team may side with building a custom IDP to fit their workflows and process requirements.  But it isn’t about the technology itself—it's the productivity and innovation an IDP enables for your developers. You need to weigh the benefits and drawbacks of building vs. buying an IDP that fits your needs.  
An effective IDP helps create the best environment for your engineers and adds value to the organization. It enhances their experience, not hinders it.  If building one from scratch works for your team, take that idea for a spin. And if you’re considering buying an IDP,  [try Facets](https://facets.cloud/)—an off-the-shelf IDP purpose-built for engineering teams. It is built by experts in developer-centric design, making Facets the one third-party IDP that balances extensibility and configurability while providing you complete peace of mind regarding updates and security.  _Want to see how Facets can help optimize your existing dev workflows?_ [_Book a 1:1 demo today_](https://www.facets.cloud/demo)_._ --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Effortless Infrastructure Visualization: A Look into Facets' Blueprint Designer Author: Anshul Sao Published: 2023-09-21 Category: Product News Tags: blueprint designer, infrastructure visualization URL: https://blog.facets.cloud/effortless-infrastructure-visualization-a-look-into-facets-blueprint-designer Tech footprint, coupled with the complexity of multi-cloud and multi-environment setups, is increasing at a fast pace. While Infrastructure-as-Code (IaC) came to the rescue, it didn’t give us a solution to visualize architecture in real-time and make changes easily. With Facets’ Blueprint Designer, we offer a [no-code](https://www.facets.cloud/no-code-infrastructure-automation) approach to infrastructure visualization and management. Let’s see how. Why do you need real-time architecture visualization? ----------------------------------------------------- Unlike static documents that quickly become outdated, real-time architecture visualization always stores the current state of your architecture and serves as a living document.  Effective visualization bridges the communication gap between teams and helps them have a common understanding of the system and its dependencies. This also helps them stay away from potential issues and challenges such as: **Cost overruns**: When you know what’s working where, you can optimize resources and shut down unused ones. This helps you track and control cloud spends, avoiding surprise budget overruns. **Security and compliance risks:** Sufficient visibility into resource configurations can save you from serious security hazards and help in adhering to regulations**.** ‍**Troubleshooting:** When issues or outages occur, you will be able to identify the root causes easily. This ensures you’ve more than [99.99% system uptime](https://www.facets.cloud/case-study/capillary-technologies) and business continuity. Unraveling the Facets Solution ------------------------------ [Facets](https://www.facets.cloud/) recognized these challenges and devised Blueprint Designer that solves for visibility with a no-code approach. A blueprint in Facets is nothing but a template of your architecture which acts as a single source of truth. The blueprint designer helps you seamlessly visualize your infrastructure, create and edit resources, and keep a tab on configuration overrides. Let's delve into the capabilities: ### **1\. High-level View of your Infrastructure** With the Blueprint Designer, Facets users can visualize their entire infrastructure in one place – whether the resources are spread across multiple clouds or environments, or K8s clusters. With this single source of truth, you get complete visibility into what’s working where and the relationships between all resources.  
Facets provides a tabular view and a graphical representation of your architecture that keeps updating whenever you make changes to your infrastructure. With this knowledge, you can optimally provision resources and keep track of unused infrastructure. ![A high-level architecture view of Facet's control pane in graph and table mode](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/650bd040c295dc50835fde45bpdesigner-1701865744987-original.gif) High-level architecture view in graph and table mode ### 2\. **Granular details for every resource** While a bird’s eye view of your infrastructure is necessary, you also need to know the granular details of your resources. You can view all the information on - how they’re allocated, and their configurations, right from the Blueprint designer and stay on top of resource performance.  ![Granular details of resources in control pane: resource interrelationships, where they’re deployed, and specs](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/650bd09e694bbef66623ea27screenshotpercent202023-09-18percent20atpercent2040256percent20pm-1701865746126-original.png) Granular details of resources: resource interrelationships, where they’re deployed, and specs ### **3\. Resource Catalog for Effortless Resource Creation** Facets eliminates the need for exhaustive research by providing a database of pre-configured resources. Users can select resources without having to write a single line of Terraform code. This empowers you to start using resources swiftly, saving time and reducing the learning curve. [Learn more.](https://readme.facets.cloud/docs/adding-resources) ### 4\. **Flexibility to Customize** While pre-configured resources are convenient, Facets also offers flexibility. You can edit Terraform files to tailor these resources to your unique requirements. This ensures that you can adapt resources to fit your specific use cases and configurations. ### **5\. Clear Visibility into Overrides** Configuration overrides are required in order to customize resources at the environment level. It's crucial to keep track of what configurations have been overridden and by whom. Facets' Blueprint Designer provides a clear visualization of these overrides at the environment level, ensuring transparency and accountability in your infrastructure management. Conclusion ---------- Facets' [Blueprint Designer](https://www.facets.cloud) empowers you with complete visibility into your architecture with a no-code solution. This, in turn, allows you to become 100% cloud-optimized, and efficient.  [Book a Demo](https://www.facets.cloud/demo) to know more. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Blue-Green Deployment Conundrum: The Need for Versioned Secrets/ConfigMaps Author: Facets.cloud Published: 2023-09-13 Category: Tech Articles Tags: versioned secrets , ConfigMaps URL: https://blog.facets.cloud/blue-green-deployments At [Facets](https://facets.cloud), we are committed to optimizing the user experience and ensuring smooth application releases. One particular challenge we encountered was implementing blue-green deployment strategies for applications. While this approach promises seamless updates, we soon realized that it's not all smooth sailing, especially when dealing with secrets and configmaps. In this blog post, we'll dive deep into the complexities we faced and the solution we found to ensure the safety and stability of our deployments. 
Join us as we explore the need for versioned secrets/configmaps and how they can save your business from potential disasters. **The Blue-Green Deployment Dilemma** ------------------------------------- Blue-green deployments are a powerful way to release new versions of applications without downtime. However, when dealing with applications that rely on secrets or configmaps, things can get tricky. Let's illustrate this with an example: Imagine you have an application, let's call it "Application A," that references "Secret A." Now, you need to update "Secret A" and this change triggers a blue-green deployment for "Application A." This means that during the transition, you have both active and preview pods of "Application A" running simultaneously, some with old data of "Secret A" and others with new data of "Secret A". ![representation of blue and green teams deploying software](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/65015bfc1cf16edc4eedc19dbluepercent20greenpercent20deploymentpercent20-percent2001-1701865747379-original.png) Blue and green deployments  The problem here is that without versioning, there's no clear way to distinguish which pods should use the old secret and which should use the new one. To put it simply, if the active pods are restarted, they will get the changes from the updated “Secret A.” This ambiguity can lead to significant issues in the production environment. In the worst case scenario, these complications can even bring your business to a halt. **The Solution: Versioned Secrets and ConfigMaps** -------------------------------------------------- To mitigate this issue and ensure a seamless transition during blue-green deployments, we found the solution in **versioned secrets and configmaps**. By introducing versioning, you create a clear distinction between different iterations of your secrets and configmaps. This  ensures that each pod gets the correct configuration. Here's how it works: 1. **Versioned  Secrets and ConfigMaps**: Start by versioning your secrets and configmaps. For example, instead of updating "Secret A" , create a new version, "Secret A-v1". This way, both old and new pods can reference the appropriate version. 2. **Rollback Safely**: In case of any issues during the deployment, [rolling back](https://blog.facets.cloud/rollback-to-a-previous-version-of-a-microservice-using-facets/) becomes much safer. You can simply switch the pods back to the previous version of the secret/configmap. 3. **Clear Versioning Strategy**: Implement a clear versioning strategy to keep track of changes and ensure that secrets/configmaps are updated systematically. This might involve using version numbers, timestamps, or other identifiers that work for your organization’s context. **Conclusion** -------------- Blue-green deployments offer significant advantages in terms of minimizing downtime and risk during application updates. However, when dealing with secrets and configmaps, the lack of versioning can introduce confusion and potential disasters. By adopting Versioned Secrets and ConfigMaps, you can ensure that your pods always have access to the correct configuration, even during the transition phase of a blue-green deployment. This not only enhances the reliability of your deployments, but also provides a safety net for rolling back changes if needed. At [Facets](https://facets.cloud), we learned the importance of versioning the hard way. 
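To make the pattern concrete, here is a minimal sketch in plain Kubernetes YAML of a versioned secret and a preview (green) deployment that pins it. The names, labels, and image tag are purely illustrative, not a convention Facets prescribes:

```yaml
# A new, versioned copy of the secret. "secret-a-v1" is left untouched,
# so active (blue) pods keep reading the old data even if they restart.
apiVersion: v1
kind: Secret
metadata:
  name: secret-a-v2            # illustrative versioned name
immutable: true                # Kubernetes 1.21+: prevents in-place edits
stringData:
  DB_PASSWORD: "new-password"  # placeholder value
---
# The preview (green) deployment pins the new version explicitly;
# the active (blue) deployment keeps referencing secret-a-v1.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: application-a-green
spec:
  replicas: 2
  selector:
    matchLabels:
      app: application-a
      track: green
  template:
    metadata:
      labels:
        app: application-a
        track: green
    spec:
      containers:
        - name: application-a
          image: registry.example.com/application-a:2.0.0  # illustrative tag
          envFrom:
            - secretRef:
                name: secret-a-v2   # pinned secret version
```

Rolling back then amounts to pointing the deployment back at `secret-a-v1`, rather than trying to restore an overwritten secret.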
But now we're better equipped to handle complex deployments - while maintaining the stability and integrity of our applications.

Thank you for reading! We hope you found it informative and valuable for your infrastructure management.

---

This blog is powered by Superblog. Visit https://superblog.ai to know more.

---

## Facets' $4mn seed funding: The founding story and the road ahead

Author: Pravanjan Choudhury

Published: 2023-09-04

Category: Blogs

Tags: facets.cloud, seedfunding, fundingnews

URL: https://blog.facets.cloud/facets-4mn-seed-funding-the-founding-story-and-the-road-ahead

Today, we celebrate Facets' $4 million seed funding, led by 3one4 Capital. This milestone provides a moment to reflect on our journey and share our DevOps vision with the world.

Origin: The nagging problem
---------------------------

My decade-long run at Capillary Technologies, and especially the last few years as the CTO, gave me a panoramic view of its meteoric rise. But rapid growth often masks the intricate engineering challenges underneath. For starters, Capillary was building new products, launching in new geographies, acquiring organizations, and starting to serve super-enterprise customers. From a DevOps perspective, this meant we needed to launch new environments in multiple geographies and do private deployments for super-enterprise customers, all while continuing to push releases - really, really fast.

This meant a lot of the burden fell on the Ops team (which is always lean in proportion to the bigger development team). Ops found itself in a perpetual burnt-out state - juggling numerous deployments, monitoring releases across regions, and striving to keep pace with rapid development. The result was an unoptimized cloud, bloating costs, and slower feature releases - not to mention unhappy developers.

By this time, Anshul, Rohit, and I had been working together for many years. We often found ourselves engrossed in debates on how best to ensure the Ops team could manage these growing demands. We debated, ideated, and discarded a dozen solutions before we broke the problem down from first principles.

Our belief crystallized around two core ideas: transform the Ops team from owners to enablers of infrastructure, and make infrastructure so intuitive that developers can autonomously meet their requirements, albeit within guardrails.

**From Idea to Inception: Facets**
----------------------------------

It became clear that to work on this idea, we needed a complete rethink and a new purpose. That became the genesis of Facets.

Our conversations with organizations across domains further solidified this belief. Although diverse in their challenges, a recurring theme emerged: a DevOps bandwidth bottleneck stifling the development team, and a constant game of catch-up toward an optimized cloud environment. This challenge was representative of a broader systemic problem within the DevOps space: the prevailing mindset of the Ops team acting as custodians of infrastructure rather than facilitators.

Our conviction grew: DevOps needed a rethink. The old ways of managing DevOps had to end.

The broader industry was also trapped in a vicious cycle. While we had frameworks for every technology, every company was reinventing the wheel when it came to DevOps, leading to piecemeal developer experiences. It became clear to us: DevOps needed a significant overhaul.
Our philosophy behind Facets has never been about merely building a tool; it has been about filling a glaring market void. The alignment in our vision and shared fervor for enhancing the developer experience seamlessly transitioned us into natural co-founders. At its heart, Facets aims to dismantle the convoluted process that companies undertake to stitch up DevOps workflows. With Facets, we offer a ready-to-use Cloud Deployment Platform that’s also flexible enough to cater to unique infrastructure needs. **The next chapter: The DevOps revolution** ------------------------------------------- Revolutionizing industry norms is a Herculean task. For DevOps to transform, a foundational shift in perception is crucial. Facets stands on the brink of this transformation, bolstered by our recent funding. In the next few years, we aspire to technologically fortify Facets and enter new markets, advocating for Platform Engineering - an embodiment of our philosophy that prioritizes developer autonomy with Ops as infrastructure facilitators. Our journey may be uncharted, but our resolve is steadfast: to usher in a new DevOps era centered on developer experience and productivity. You can find the official press release of the funding [_here_](https://www.prnewswire.com/news-releases/facetscloud-raises-4-million-in-seed-funding-to-revolutionize-devops-301916799.html) --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Simplifying Log Management with Grafana Loki and Facets Author: Rohit Raveendran Published: 2023-08-23 Category: Product News Tags: Garafana Loki , Loki integration , Loki and Facets URL: https://blog.facets.cloud/simplifying-log-management-with-grafana-loki-and-facets Monitoring and analyzing log data efficiently is important. It ensures smooth operations and helps identify potential issues promptly. But [log management](https://blog.facets.cloud/navigating-high-capacity-log-management-insights-from-the-loki-at-scale-webinar/) can quickly become a challenging puzzle since logs are scattered across various systems, applications, and services, requiring Devs and Ops to navigate through different platforms and interfaces.  This is where [Grafana Loki](https://grafana.com/oss/loki/), an open-source and powerful log aggregation system, comes into play. And, by pairing Loki with [Facets](https://www.facets.cloud/), an infrastructure management platform, you can unlock a wealth of benefits that revolutionize the way logs are managed. In this blog, we will explore the features and advantages of this powerful integration, shedding light on how Facets simplifies log management. Understanding Grafana Loki -------------------------- Grafana Loki is a robust log aggregation system built to streamline log management across multiple sources. It centralizes logs from diverse components, such as applications, databases, proxies, and more, within a unified dashboard. This setup makes it much simpler to analyze logs without the hassle of jumping between different logging systems. However, using Loki isn't a walk in the park. Setting up and getting Grafana Loki just right can be a bit of a difficult task, especially if you're not well-versed in the world of log aggregation and distributed systems. And if Loki or the broader Grafana environment is new to you, brace yourself for a steep learning curve. 
Mitigate all complexity with Facets
-----------------------------------

However, with Facets, you can reap all the benefits of Loki while keeping it extremely simple to use.

All log data without losing context
-----------------------------------

Facets provides an intuitive and user-friendly interface so that you can view all your application log data right where you need it, with a single click. This unified view allows both developers and operations teams to gain comprehensive insights without the hassle of jumping between different tools and losing context.

Out-of-the-box setup
--------------------

Integrating Loki with Facets means you have zero setup overhead. The pre-configured solution allows users to dive straight into log analysis without having to spend any time or effort setting up multiple tools.

![Facets dashboard reflecting ease of integrating grafana loki with facets](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/64e5b4e0dae2e32a9d72e846quicklog-1701865748245-original.gif)

Integrating Loki with Facets

Easy Log filtering
------------------

One of the standout features of Facets is its intuitive search and navigation. Traditional log analysis tools often require complex queries and commands to retrieve specific data. Facets provides a set of predefined filters that speeds up the troubleshooting process.

### Alerting and troubleshooting made simpler

Facets takes log management a step further by integrating with open-source tools like Alertmanager to provide proactive alerts and notifications. And with its AI-powered chatbot, Facets explains the nature of an issue and suggests potential solutions to log errors. This combination of proactive notifications and AI-driven insights turns log management from a reactive process into a proactive, solution-driven approach, empowering both developers and operations teams to mitigate issues swiftly and with a clear understanding of the problem at hand.

Wrapping Up
-----------

The integration of Grafana Loki with Facets brings power and simplicity to log analysis and management. By harnessing the strengths of these two robust tools, organizations can extract valuable insights from their log data, make data-driven decisions, and ensure optimal performance and reliability of their systems. Whether in cloud-native environments or traditional setups, this integration is a game-changer. Embrace the power of Grafana Loki and Facets today and embark on a journey of enhanced log visibility and streamlined troubleshooting.

---

This blog is powered by Superblog. Visit https://superblog.ai to know more.

---

## From Chaos to Consistency: Maintaining a Drift-Free Infrastructure

Author: Rohit Raveendran

Published: 2023-08-17

Category: Blogs

Tags: Infrastructure drift, infrastructure consistency, drift-free infrastructure

URL: https://blog.facets.cloud/comprehensive-approach-to-maintaining-a-drift-free-infrastructure

In today's rapidly evolving technological landscape, businesses are quickly adopting a [cloud-native approach](https://www.forrester.com/report/the-state-of-cloud-in-india-2023/RES179406?ref_search=3525299_1688554347242). Embracing cloud-native solutions has become a pivotal step in their journey to digital transformation, enabling them to gain a competitive edge and cater to the ever-changing demands of modern consumers. As businesses move towards a cloud-native approach, the importance of maintaining a consistent infrastructure cannot be overstated.
To keep pace with this fast-moving digital race, organizations must adhere to the mantra of "**build fast and ship even faster**." Swift software development cycles and accelerated deployment timelines are imperative to seize market opportunities, launch innovative products, and respond swiftly to customer feedback. However, with a continuous focus on build and scale faster, the best practices for managing infrastructure are usually put on a back-burner, eventually leading to a phenomenon commonly known as “infrastructure drift”. What is Infrastructure Drift? ----------------------------- ![An image depicting infrastructure drift could show various elements of a network or system infrastructure progressively diverging from a central, organized structure, representing the deviation of the configuration from its intended or documented purpose.](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/64ddbd8cbcbb593ae5475103driftpercent20imagepercent20-percent201-1701865749598-original.png) Infrastructure Configuration Drift [Infrastructure drift](https://www.hashicorp.com/blog/terraform-cloud-adds-drift-detection-for-infrastructure-management) is when the configuration of different environments within the infrastructure deviates from its intended purpose or the one documented. Let’s consider an example of updating an application with a new feature: ### **Alex’s Dev Environment Tale** Developer Alex is tasked with developing a new feature for a web application. In his local environment, he is using v5.7 of a particular database that the application depends on. Now the team understands that v5.7 is used across development, staging, and production environments.  Alex successfully develops the new feature by using a function that’s only available in v5.7 of the database. This new feature works perfectly in his local environment, and he happily commits the code.  ### ‍**Staging Setback** Now the code is pushed to staging where the team expects the same database v5.7 to be running. However, unbeknownst to Alex and other developers, an Ops engineer downgraded the staging and production environment to v5.5 as he was fixing an urgent issue.  Alex’s new feature fails in the staging environment since the function he used during development with v5.7 is not available in v5.5. The team now spends hours diagnosing the issue, thinking that it must be a problem in the code. ### ‍**The Revelation** Finally, after spending hours, the team realizes that Infrastructure Drift (version mismatch) is the root cause of this issue. Post this, there are delays, frustrations, and potential conflicts between the teams as they decide on whether to update the staging environment or refactor the code. Contributing Factors to Infrastructure Drift -------------------------------------------- ‍In an ideal world, infrastructure drift should not happen, but we aren't living in one. Several factors contribute to infrastructure drift, and understanding these causes is crucial to implementing effective prevention strategies: **Manual Intervention:** When manual changes are made to the infrastructure outside of the automated processes, it can lead to discrepancies across environments. **Human Error:** Mistakes made by team members during configuration updates, development, and when spinning up or deploying new environments can introduce inconsistencies and drift. 
**Lack of Automation:** Inadequate or partial automation in infrastructure management can result in manual changes and deviations from the desired state. **Inconsistent Changes Across Environments:** Without a standardized process, different environments might undergo separate updates, leading to drift over time. **Workarounds:** In urgent situations, developers might implement temporary workarounds that don't align with the standard configuration, causing drift. ‍**A well-known tech fact_:_** _If there's a workaround to a solution, expect it to be used and eventually abused, and workarounds in cloud-native applications and environments could eventually result in the same - abuse!_ Negative impact of Infrastructure Drift --------------------------------------- ![Two people on a tandem bicycle symbolizing synchronized environments in infrastructure, highlighting potential malfunction and crash due to lack of coordination.](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/64ddd5b3b2ce60c1cf1674b5tandempercent20v21-1701865750805-original.png) Tandem bicycle = Infra environment in sync with each other Just like a couple on a tandem bicycle, every environment in your infrastructure should work in tandem and be in sync with each other. Now imagine if one of them does not operate as intended, the bicycle will not operate smoothly and may eventually lead to a crash. Similarly, infrastructure drift can result in organization-wide delays in development and deployment goals, especially for the Ops teams, who play a pivotal role in managing and maintaining the infrastructure. It is critical to gain insights about the root causes and consequences of infrastructure drift. These insights could aid in preventing drift by proactively implementing effective strategies. Though quite obvious, here are some of the key negative impacts of infrastructure drift: **Operational Inefficiencies:** Infrastructure drift can lead to operational inefficiencies, as teams spend precious time troubleshooting and rectifying inconsistencies. Instead of focusing on innovation and improvements, Ops teams may find themselves dealing with repetitive issues caused by drift. **Delayed Deployment:** Drift-related issues can cause delays in software deployment. Deployments that worked flawlessly in one environment may fail in another due to discrepancies, necessitating thorough investigations and modifications. **Increased Downtime:** When drift-related problems surface in production environments, they can result in unplanned downtime. Service disruptions have a direct impact on user experience and can be costly for the organization in terms of lost revenue and credibility. **Escalating Support Costs:** Addressing drift-related issues can lead to escalating support costs. Ops teams may need to invest extra resources, including personnel and tools, to troubleshoot and mitigate the consequences of configuration discrepancies. **Security Vulnerabilities:** Inconsistent configurations can inadvertently introduce security vulnerabilities. For example, a forgotten update in one environment may leave a system exposed to potential threats, creating security risks. **Cloud Resource Wastage:** Infrastructure drift can cause cloud resources to be misaligned or underutilized. Misconfigured instances or redundant resources lead to unnecessary cloud costs, impacting the organization's overall budget. **Compliance and Audit Concerns:** In regulated industries, infrastructure drift may raise compliance and audit concerns. 
Non-compliance with standards and regulations can lead to penalties and damage the organization's reputation. **Difficulty in Scaling:** Inconsistent configurations across environments make it challenging to scale the infrastructure efficiently. As organizations grow and demand increases, infrastructure drift becomes a significant obstacle in achieving seamless scalability. **Resource Intensive Remediation:** Rectifying infrastructure drift can be resource-intensive, requiring extensive manual effort and rework. This diverts valuable resources from more strategic initiatives and slows down development cycles. Common Approach to Manage Infrastructure Drift ---------------------------------------------- [Infrastructure as Code (IaC)](https://www.facets.cloud/no-code-infrastructure-automation) \- IaC is a go-to option for organizations looking to mitigate infra drift, but still, most of them struggle to handle it efficiently. This is because IaC, when not done right, does not help. Even when IaC is implemented, there is still an option to use workarounds, which can again be abused - leaving behind a long trail of damage control to be done. ‍**Continuous Configuration Management -** Implementing a continuous configuration automation tool like Ansible enables companies to consistently manage configurations across the infrastructure. The tools are designed to continuously monitor consistency across the infrastructure and send out alerts if any discrepancies are found. However, this approach feels a bit outdated in the Kubernetes Era. The above approaches may work for some companies. However, there’s more to be done to consistently [maintain a drift-free infrastructure](https://blog.facets.cloud/comprehensive-approach-to-maintaining-a-drift-free-infrastructure/). How To Efficiently Prevent Infrastructure Drift? ------------------------------------------------ By embracing the [Single Source of Truth](https://en.wikipedia.org/wiki/Single_source_of_truth) (SSOT) as a core tenet of infrastructure management, organizations can build a robust foundation for a drift-free and stable cloud-native environment. The central platform empowers Ops teams with the tools they need to navigate the complexities of infrastructure management while staying ahead of the digital transformation curve with confidence and precision. Let's see how a [single source of truth](https://www.facets.cloud/no-code-infrastructure-automation) can help organizations attain a drift-free infrastructure and other tangible benefits in the long term. ### **Building a Single Source of Truth** With the above setup in place, the person deploying an application does not have access to make any changes to the infrastructure. If changes need to be made, they have to be done at the source level - which is your single source of truth - imagine it to be a [blueprint of your infrastructure](https://readme.facets.cloud/docs/blueprint). ‍**What Are the Characteristics of Single Source?** --------------------------------------------------- ### ‍**Automation to maintain guardrails** Automation plays a critical role in maintaining guardrails in infrastructure management. It ensures that all environments are standardized with the intended conditions. Although some flexibility is allowed, any changes must be made only to the source, without bypassing the automated process. The single source of truth helps developers by providing audit trails, transparency, and guardrails within which changes are facilitated. 
The goal of having these guardrails is to make developers' lives easier, not just in the present but also in the future. Along with automation, there should be enough flexibility for developers to make changes. Here is how that should work in practice:

### **Flexibility with Liability**

Flexibility with liability means developers are free to make changes to their environments, but within the guardrails, and they remain accountable for the changes they make. Frequent synchronization ensures that any changes to the source are audited and rectified promptly.

### **Auditable**

The single source should keep a historical record of changes made to the infrastructure configurations. If drift is detected, it should be easy to roll back to a previous state. Auditability also provides insight into who made a change and when, giving teams better accountability and traceability.

### **Immutable Infrastructure**

With a central source of truth, the infrastructure is treated as immutable. Any new changes and updates are made to the single source, which acts as a 'Blueprint' of the infrastructure. This discourages deviating from the single source, reducing the likelihood of drift.

### **Continuous Synchronization**

A single source ensures continuous synchronization across environments. Any new changes introduced at the source are propagated to the relevant environments by an automated process. This ensures that every environment is consistent with the latest configurations.

### **Transparency**

Developers get transparency across the complete infrastructure instead of being concerned only with their own environments.

**Conclusion**
--------------

If not managed promptly and proactively, infrastructure drift can become a significant obstacle to achieving business goals. A single source of truth is the key to [achieving a consistent and drift-free infrastructure](https://www.facets.cloud/). Embracing a centralized solution with the right strategies will enable organizations to meet the demands of cloud-native environments and excel in the tech-driven landscape.

---

This blog is powered by Superblog. Visit https://superblog.ai to know more.

---

## K8s PostgreSQL Operator: Transform your database management experience

Author: Rohit Raveendran

Published: 2023-07-25

Category: Tech Articles

Tags: PostgreSQL Operator, K8s PostgreSQL

URL: https://blog.facets.cloud/k8s-postgresql-operator

In our Kubernetes-centric world, certain tasks still fall outside the scope of Kubernetes, leading to a disjointed experience. Traditional methods often lack the automation and consistency that we have come to expect. One such task that highlights this challenge is managing database credentials. As we delved deeper into this challenge, we realized that database credentials need to be a part of the K8s fold. Allow us to introduce the PostgreSQL Operator, a tool designed to transform your database management experience.

**How does it work?**
---------------------

The PostgreSQL Operator introduces two Custom Resource Definitions (CRDs) - Role and Grant. The Role CRD is used to define a database user, while the Grant CRD specifies the privileges and permissions granted to that user on the database. The operator then automates the creation and management of these roles and grants in the PostgreSQL database.
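To give a feel for the model, here is a rough sketch of what a Role and a Grant custom resource could look like. The API group, version, and field names below are illustrative assumptions only; the exact schema lives in the examples directory of the operator repository linked in the installation steps below.

```yaml
# Illustrative only: the API group/version and field names are assumptions,
# not the operator's actual schema (see the repository examples for that).
apiVersion: postgresql.facets.cloud/v1alpha1
kind: Role
metadata:
  name: reporting-user
spec:
  connectSecretRef:               # assumed: Secret holding DB endpoint/credentials
    name: postgres-connection
  passwordSecretRef:              # assumed: where the role's password is read from
    name: reporting-user-password
    key: role_password
---
apiVersion: postgresql.facets.cloud/v1alpha1
kind: Grant
metadata:
  name: reporting-user-read-only
spec:
  role: reporting-user            # the Role defined above
  database: analytics             # illustrative database name
  privileges:
    - SELECT                      # read-only access
```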
By leveraging the power of Kubernetes reconciliation loops, the operator ensures that the actual state of the system always matches the desired state, providing a reliable and consistent database user management experience.

### Advantages

The PostgreSQL Operator brings two significant advantages to the table - it empowers developers and abstracts complex PostgreSQL tasks.

**Developer empowerment**

Traditionally, managing database credentials and permissions has required a central team acting as gatekeepers, resulting in slow, inefficient processes that create bottlenecks in the workflow. The PostgreSQL Operator changes this by granting developers the autonomy to define roles and permissions using the Role and Grant CRDs. This reversal of control means that instead of a central team managing the database, developers can write a manifest, get it approved, and move it along a CI pipeline to create users. This democratization of the process significantly speeds up workflows and enhances overall efficiency.

**Simplifies PostgreSQL tasks**

The PostgreSQL Operator also simplifies complex PostgreSQL tasks. Take managing users and teams, for instance: each may require varying levels of access to different tables within a PostgreSQL schema. Setting up and maintaining that kind of fine-grained access control typically involves intricate SQL. With the PostgreSQL Operator, you can define such permissions in a Grant CRD, which the operator translates into the appropriate SQL commands. This abstraction ensures that permissions are always up to date, thanks to the power of Kubernetes reconciliation loops.

In essence, the PostgreSQL Operator not only empowers developers by democratizing the management of database credentials but also simplifies complex PostgreSQL tasks, making the management of PostgreSQL databases more efficient and inclusive.

**Installation**

**1\. Pre-requisite:** A Kubernetes Secret that contains base64-encoded PostgreSQL database details, such as username, password, endpoint, port, database, and role\_password.

**2\. Install the Helm Chart**: To begin using the PostgreSQL Operator, start by installing the Helm chart provided in the official repository. You can find the chart at the following GitHub URL: [https://github.com/Facets-cloud/postgresql-operator/tree/main/chart](https://github.com/Facets-cloud/postgresql-operator/tree/main/chart).

**3\. Create Custom Resources for Roles and Grants**: Once the Helm chart is successfully installed, create Custom Resources to define the desired roles and grants for PostgreSQL. The PostgreSQL Operator repository provides examples for creating custom resources. Here are a couple of examples:

* Role Example - [https://github.com/Facets-cloud/postgresql-operator/blob/main/examples/role.yaml](https://github.com/Facets-cloud/postgresql-operator/blob/main/examples/role.yaml)
* Grant Example - [https://github.com/Facets-cloud/postgresql-operator/blob/main/examples/table-grant-all-table-all-privilege.yaml](https://github.com/Facets-cloud/postgresql-operator/blob/main/examples/table-grant-all-table-all-privilege.yaml)

You can explore more examples under the examples directory in the PostgreSQL Operator repository.

**4\. Check Role and Grant Status**

After creating custom resources, you can verify the status of the roles and grants using kubectl commands. Run the following commands:

a.
To check the status of roles: * This command provides an overview of the defined roles and their current status within the PostgreSQL cluster. b. To check the status of grants: * This command provides an overview of the defined grants and their current status within the PostgreSQL cluster. By checking the role and grant status, you can ensure that the desired state specified in the CRDs is being applied correctly within the PostgreSQL cluster. **Demo** -------- A quick demo on deployment of PostgreSQL Operator, creation of Role and Grant ![deployment of PostgreSQL Operator, creation of role and grant](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/64c0d9071079283f42de7ad6demo-1701865752133-original.gif) Deployment of PostgreSQL Operator **Before you go** Give the [PostgreSQL Operator](https://github.com/Facets-cloud/postgresql-operator) a try today and experience the revolution in database management. We welcome contributions, so join us in enhancing this tool and shaping the future of PostgreSQL management. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Dynamic Cloud Interoperability: Redefining Cloud Agnosticism With Ease Author: Anshul Sao Published: 2023-05-31 Category: Blogs Tags: cloud agnostic , cloud interoperability URL: https://blog.facets.cloud/dynamic-cloud-interoperability-redefining-cloud-agnosticism ![Flowchart showing a developer choosing MySQL services with options for AWS Aurora, GCP CloudSQL, and Azure Flexible Server via a DCI Manifest.](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/frame-1000004675image-4-1703159624693-compressed.png) In today's digital era, cloud computing is not a luxury but a necessity for businesses of all sizes. Many of these businesses strive to achieve a 'cloud-agnostic' architecture.  The goal? To avoid dependency on a single cloud provider and benefit from the competition, flexibility, and risk mitigation. But this pursuit often leads to a tricky conundrum: How can businesses navigate different cloud services without getting 'locked in' to one provider? The Challenge of Being Cloud Agnostic ---------------------------------------- Being '[cloud agnostic](https://blog.facets.cloud/cloud-agnostic-and-cloud-native-at-the-same-time/)' ideally means a business can switch between Amazon's cloud (AWS), Microsoft's (Azure), Google's (GCP), or others without much difficulty. However, achieving true cloud agnosticism often comes with considerable time, effort, and cost. A common misunderstanding is that being cloud agnostic means avoiding the managed services each cloud provider offers, like AWS RDS or S3. This misunderstanding leads businesses to handle everything in-house, such as setting up and maintaining databases, which can quickly become a costly and time-consuming process. Introducing Dynamic Cloud Interoperability (DCI) ------------------------------------------------ At Facets, we challenged this narrative. We wondered: What if businesses could remain cloud agnostic and still leverage the convenience of managed services each cloud provider offers? And thus, the concept of "[Dynamic Cloud Interoperability](https://blog.facets.cloud/mastering-cloud-flexibility-with-dynamic-cloud-interoperability/)" (DCI) was born! We've developed an abstraction layer(DCI Manifest) that serves as the cornerstone of Dynamic Cloud Interoperability. 
This revolutionary technology allows businesses to employ the same infrastructure setup across AWS, Azure, and GCP without any need to alter their applications. Essentially, a database service like AWS RDS can smoothly transition into CloudSQL in GCP or a Flexible Server in Azure with zero hassle! We have built a repository of manifests, including, but not limited to, the following examples:

| DCI Manifest | Implementation |
| --- | --- |
| Redis | Helm chart (all), ElastiCache (AWS), Memory Store (GCP), Azure Cache (Azure) |
| Load Balancer | NLB + Nginx (all), ALB (AWS), GLB (GCP), AGIC (Azure) |

This fresh approach allows businesses to switch between cloud providers without forfeiting their beneficial services. No more anxiety over vendor lock-in. No more resource-draining setup and maintenance. It's a significant win, we believe! Does DCI excite you as much as it excites us? We see it as a significant leap forward in cloud computing, merging the best parts of each cloud service while preserving the freedom to switch between them. If you're eager to be a part of this transformative chapter in cloud computing, we invite you to [connect with us](https://www.facets.cloud/book-a-demo). Let's reshape the future of cloud computing together. With Facets and DCI, the future of the cloud looks brighter and more accessible than ever! --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Transform your DevOps access control with Facets RBAC Management Author: Anshul Sao Published: 2023-05-19 Category: Product News Tags: devops, rbac management URL: https://blog.facets.cloud/transform-your-devops-access-control-with-facets-rbac-management Access management is a crucial part of managing resources safely. However, it can often be as complex as a challenging puzzle. Addressing the Access Management Challenge ------------------------------------------ Ops must grapple with various tools for releases, cloud accounts, etc., each necessitating distinct access management. The inconsistency of access language across these platforms only adds to the problem, making it a cumbersome task to provide developers with the appropriate access for their daily work. Facets' RBAC Management aims to remedy this inefficiency, offering a consolidated solution in a language that aligns with your product. ![GIF of Facets.cloud User Management interface showing system-defined roles, a selected 'Roles' tab, and a 'Create Custom Role' button for customization.](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/6467237b87b740a6fa8593a0rolesgroups-1701865754299-original.gif) Access management with Facets ### Communicating the Language of Access The language of access control has long been a stumbling block for many organizations. The existing tools often fail to comprehend or accommodate the needs of businesses wanting to limit access to specific environments or groups of resources. This not only creates friction in daily operations but also presents significant challenges during audits and compliance reviews. Facets' RBAC Management offers a solution that speaks the language of your business. Whether it's restricting access to specific environments, granting read access to resources in production environments, or limiting the microservices and environments where a developer can perform releases - [Facets](https://www.facets.cloud/) has you covered.
​[Kubernetes](https://readme.facets.cloud/docs/viewing-persistent-dashboard-for-k8s-events-in-grafana) access management, a substantial issue in itself, will be addressed in detail in subsequent product news. Effortless, Intuitive, and Powerful ----------------------------------- Facets' RBAC Management offers a one-stop solution for managing access across your DevOps processes. Read about User management [here](https://readme.facets.cloud/docs/user-management-2). It simplifies access management, providing control in the language you comprehend best, and lays the groundwork for a smoother, more efficient DevOps experience.‍ With Facets' RBAC Management, you can now divert less time to managing access and more time to what genuinely matters - delivering superior software, quicker and more effectively. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Overcoming Local Development Hurdles in Microservices with Environment Leasing Author: Anshul Sao Published: 2023-05-09 Category: Tech Articles Tags: frontend applications, microservices URL: https://blog.facets.cloud/overcoming-local-development-hurdles-in-microservices-with-environment-leasing In today's development landscape, local testing of code has become increasingly challenging, especially in setups that involve microservices. As the architecture grows more intricate, dependencies multiply, and applications become more challenging to run locally, developers face a daunting task to ensure their code works seamlessly in such environments. To address this issue, it is important to understand the challenges developers face in testing their code locally. These challenges are broadly classified into three categories for OLTP systems: * **Frontend applications:** These are static SPAs or mobile applications that access a publicly exposed API server. Local development is relatively straightforward for these applications, as developers can easily point to any test environment and test all features. * **REST or SOAP APIs:** These services sit behind load balancers and receive direct API calls. They call other microservices, databases, caches, and sometimes cloud resources, making it difficult to establish connectivity to all these dependencies from a local environment. Even if developers manage to run the service locally, mocking all the dependencies can be a tedious and inconsistent process. * **Private RPC microservices:** These services may have RPC or HTTP endpoints but are called from other microservices internally and can have dependencies like Public APIs. Testing these APIs locally is even more challenging, as developers need to run other microservices that call them to test the integration effectively. Although calls can be mocked, the challenges mentioned for Public APIs still apply.‍ **Ephemeral Environments or Environment-as-a-Service (EaaS)** are often proposed as solutions to the challenges of local testing in microservices. EaaS solutions offer distinct setups for each developer or pull request. This can be a scalable approach to address the issue, as it allows developers to test their code in a consistent environment that mirrors the production environment. While this method might seem straightforward at first glance, implementing it effectively can be complex. Solutions, such as launching copies of microservices in Kubernetes for every pull request, might not fully address all the complexities involved. 
The substantial challenge comes in managing aspects like routing, service discovery, shared databases, and complex dependencies. Handling these tasks individually can be a significant undertaking for developers, making it crucial to explore other potentially more manageable solutions. Environment leasing ------------------- An alternative solution, especially for those using Kubernetes, involves setting up a shared Dev environment. In this environment, all services run versions that mirror their Quality Assurance (QA) counterparts – essentially stable, testable versions of the services. It is the joint responsibility of the team to keep this environment stable. All feature development should occur locally, with developers performing rigorous testing until their work is ready to merge into the mainline codebase, often referred to as the 'master' branch. This approach is key to maintaining a shared Dev environment. With that in mind, let's explore how local development would unfold within this shared Dev environment: * **Frontend applications:** These are the static SPAs or mobile applications that access a publicly exposed API server. With local development, developers can easily point these to the shared Dev environment for feature testing. So, the process remains largely the same. * **REST or SOAP APIs:** These services posed a challenge as they sat behind load balancers and received direct API calls. They called other microservices, databases, caches, and sometimes cloud resources, making it difficult to establish connectivity to all these dependencies from a local environment. However, with tools like [Telepresence](https://www.telepresence.io/), developers can now create a two-way network proxy between the local machine and the remote Kubernetes cluster. This allows them to develop and debug services as if they were running in the shared Dev environment, simplifying the process significantly. * **Private RPC microservices:** These services may have RPC or HTTP endpoints but are primarily called from other microservices internally and can have dependencies like Public APIs. Earlier, testing these APIs locally was difficult, as it required running other microservices that called them. But now, with Telepresence, developers can set their services as sinks for the microservices in the shared Dev environment that call them and receive those calls on their laptops, enhancing debugging capabilities and reducing local setup. ![Diagram of a Kubernetes Cluster on AWS with API, services, database, and local machine connections.](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/frame-1000004679image-1703160679531-compressed.png) In the shared Dev environment, the ability for concurrent, non-interfering development is a significant advantage, especially with Frontend applications and REST or SOAP APIs. Multiple developers can interact with the shared Dev cluster simultaneously. However, this is different for Private RPC microservices. Given that only one sink can be active at a time, this necessitates an environment leasing process per such service, requiring coordination among team members. Despite this, the significant advantages - such as enhanced debugging capabilities and a streamlined local setup - make this approach a worthwhile strategy for managing these complex services. Another potential challenge with this approach arises when multiple features in backend development might necessitate different database schemas.
However, it's important to remember two things: firstly, not every feature will induce changes to the schema. Secondly, any schema-level changes should be thoroughly considered and implemented before development itself and should reach the QA environment ahead of it. Adhering to these principles ensures that there will be no conflicts when merging code from different features down the line, thereby maintaining the integrity and efficiency of the development process. To Summarize... --------------- While Ephemeral environments or Environment-as-a-Service (EaaS) offer a solution for local testing, they may not be the best fit for all due to potential complexity and cost considerations. By understanding the challenges associated with different types of applications and adopting a shared development environment approach in Kubernetes, developers can effectively test their code locally, streamline the development process, and enhance collaboration within their teams. This approach, combined with tools like Telepresence, offers a more cost-effective and manageable solution for local code testing and debugging. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Now Stay Proactive with Advanced Alerts And Notifications Author: Rohit Raveendran Published: 2023-05-03 Category: Product News Tags: notifications & alerts, notification subscription URL: https://blog.facets.cloud/staying-proactive-with-advanced-alerts-notifications Staying informed about potential issues in a complex system is crucial to ensure system reliability, uptime, and user satisfaction. Without timely information, issues can go unnoticed, leading to increased downtime, decreased performance, and frustrated users. By staying informed, teams can quickly identify and address issues, preventing them from escalating and impacting system health. Manual Setup is Complex ------------------------ It's time-consuming and complex to set up notifications for each individual event or deployment across multiple systems and environments. A typical organization will configure notifications in build and CD tools for deployments, alerts in Prometheus, and environment-related notifications with cloud providers. This implementation also lacks proper visibility into all configured notifications. ![Screenshot of Facets.cloud Notification Center with highlighted Notifications tab and Create Channel button in the settings menu](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/6452227819901e0b00a10544notifi2-1701865756403-original.gif) Facets' Control Plane Notification Subscription Feature --------------------------------- Facets provides users with the ability to easily subscribe to various notifications, catering to different roles and requirements. SREs can subscribe to in-environment Prometheus-based alerts, while Administrators can track all environment changes, such as deployments and modifications to environment attributes. Additionally, QA teams can receive notifications for deployment success and release updates, enabling them to trigger their test suites accordingly. To ensure that users receive only the most relevant information, Facets allows them to fine-tune their notification subscriptions. This includes narrowing down the scope of alerts by severity, focusing on specific application deployments, and more. Facets also offers flexible delivery options for notifications, supporting integration with services like PagerDuty, Zenduty, and Slack.
Alternatively, users can configure webhook notifications with structured data to call other integrations or trigger jobs in systems like Jenkins, further enhancing the platform's versatility. In conclusion, Facets provides a single-pane view of all notifications with a robust and effective way to stay informed about events and deployments across systems, with the flexibility for users to customize notifications to their specific needs. Out-of-the-box Integrations --------------------------- Facets offers integration with [Slack](https://slack.com/intl/en-gb), [Zenduty](https://www.zenduty.com/), and [Pagerduty](https://www.pagerduty.com/) out of the box. Users can also choose to add custom webhooks and payloads to receive notifications. Read more about it [here](https://readme.facets.cloud/docs/notification-channels-subscriptions) and find out how your developers can stay on top of alerts and notifications with ease! --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Rethinking Architecture: From Unstructured Diagrams to Structured Requirements Author: Rohit Raveendran Published: 2023-04-24 Category: Blogs Tags: devops maturity, software architecture URL: https://blog.facets.cloud/rethinking-architecture-from-unstructured-diagrams-to-structured-requirements Software architecture is the backbone of any software system. Architecture refers to the high-level organization of a software system, including its components, their relationships, and the interactions between them. It provides a **blueprint** for designing, implementing, and maintaining software systems. Traditionally, documenting software architecture involves creating static diagrams and design documents that describe various aspects of the system, such as its structure, behavior, and deployment. Traditional documentation methods often fall short when it comes to keeping up with the pace of development. ![Infographic illustrating the growth in DevOps maturity, showing progression from manual to automated processes between Developers, Operations, and Cloud Architecture](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/644672ffeb060348372689fddevopspercent20-1701865757571-original.png) DevOps Maturity Model **Problems with the traditional approach** **Static, stale and hard to maintain:** As the system evolves, these documents often become outdated, leading to inconsistencies between documentation and actual implementation. Keeping the documentation up-to-date is challenging for busy teams. **Tribal knowledge:** Many aspects of the system may be known only to a few individuals, making it difficult for new team members to get up to speed. Oftentimes, this knowledge goes away with the employee, leaving a gap in understanding of the system. **Reinventing the wheel:** Teams may implement their own solutions to common problems, unaware that other teams have already solved the same issue. **Our philosophy: architecture-first approach** We believe that architecture documentation should be the single source of truth. Any new change, either in terms of software or infrastructure, should be made in the architecture and then percolate down to the environments. This approach promotes consistency, reusability, and a better understanding of the system's overall design. In order to achieve a mature DevOps process, we need to enable both Devs and Ops to make changes in architecture with proper guardrails. We accomplish this by following these three core principles: **1\.
Consistent language** We propose using a Structured Architecture Requirement Language to formally describe the system architecture. It also helps in modularisation and reuse, accelerating the entire development process and promoting a cohesive user experience. **2\. Architecture as single source of truth** Architecture documentation needs to be the single source of truth for deployments. Everyone can use the common protocols to make changes to architecture, preventing any isolated changes in environments. This also enables powerful things like auditing, versioning, and rollback. **3\. Automation** With recent advancements in cloud SDKs, Kubernetes, and Terraform, and the standardization of well-architected principles, we can now introduce automation for infrastructure provisioning and configuration, taking the blueprint as an input. Manual operations or custom automations are time-consuming and error-prone. To understand different levels of DevOps maturity, let's imagine that a new product is being developed and it requires deployment of a few microservices. It also requires a MySQL database, with backups, read/write optimisations and other fine-tuned configurations. In the traditional approach, Devs will raise a request with the Ops team to provision infrastructure for them. To accomplish this, the Ops team will leverage some scripts and manual steps after understanding the requirements. With this approach: architecture documentation is not updated and soon becomes stale, developers rely on the Ops team for architectural changes, and manual or automated deployment by the Ops team can cause inconsistencies. If we consider an evolved DevOps process for the same, then both the Dev and Ops teams read from and update the architecture document. Devs will add their requirements in architecture and Ops will provision the required items using a combination of scripts and manual steps. This solves the problem of stale architecture; however, manual steps and scripts still present the same sort of issues with inconsistent deployments. As the DevOps process reaches maturity, architecture acts as a single source of truth for automated deployments. In this case, Devs can still add their requirements to architecture, but the provisioning is taken care of automatically, freeing up the Ops team and reducing any risk of manual errors. **Possibilities beyond provisioning infrastructure** We propose that architecture should not be limited to infrastructure and micro-services. Things like alerts, observability, monitoring, CD pipelines, database schemas, etc. should also be part of architecture. A uniform and automated way to plug alerts into any service, enable APM, or set up a new CD pipeline: the possibilities are endless here. Enabling developers with a self-service model for infrastructure requirements also empowers the DevOps team to take on bigger challenges like cost optimization, disaster recovery, compliance, cloud posture, optimizations, and security, resulting in overall improvement at the organizational level. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Prevent Downtime with The K8s SSL Validity Exporter Monitoring Solution Author: Rohit Raveendran Published: 2023-04-24 Category: Tech Articles Tags: SSL Validity Exporter, Kubernetes URL: https://blog.facets.cloud/k8s-ssl-validity-exporter-monitoring-solution Did you know that one of the most common causes of website downtime is expired SSL certificates?
When certificates expire, they can cause website and application downtime, which can result in lost revenue and damage to reputation. Despite the critical importance of doing so, failing to monitor SSL certificate expiration dates is a common mistake organizations make. In 2018, an expired SSL certificate caused a [widespread outage](https://www.bbc.co.uk/news/business-46499366) for O2 mobile customers in the UK. And in 2020, a root CA certificate issued by Sectigo expired, [causing issues](https://access.redhat.com/articles/5117881) for a large number of websites. Announcing the K8s SSL Validity Exporter ---------------------------------------- Facets is excited to introduce [**K8s SSL Validity Exporter**](https://github.com/Facets-cloud/k8s-ingress-ssl-metrics-exporter), which provides a central monitoring solution for SSL certificate expiration dates in any Kubernetes cluster. It also exports the metric for the entire certificate chain, including root and intermediate certificates, thus providing a comprehensive solution to this issue. With our solution in place, you can rest easy knowing that your SSL certificates are being actively monitored. How does it work? ----------------- The exporter scans for Kubernetes ingress objects to determine the unique set of domains to monitor. It then initiates a TLS connection and retrieves the certificate chain for each domain. For each certificate in the chain, the exporter publishes a gauge metric called **ssl\_expiry**, with the number of days until expiry as the gauge value, and relevant labels. ### Installation The easiest way to deploy our exporter is via the Helm chart: * Add the Facets Helm repository * Install the Helm chart If you use the prometheus-operator, our Helm chart creates a ServiceMonitor that ensures your Prometheus is configured to scrape the new **ssl\_expiry** metric. ### Setting Up Prometheus Alerts To configure Prometheus to alert the relevant teams when an SSL certificate is nearing expiration, create a new alerting rule based on the **ssl\_expiry** metric. With this setup, tracking SSL certificate expiration dates will be a breeze, and you can rest easy knowing that your website is secure. ### Before you go… Facets uses the K8s SSL Validity Exporter extensively in our product, as do our customers, and website downtime due to SSL certificate expiration has become a thing of the past. We welcome you to try it and let us know your feedback. You can also contribute directly to the [Github](https://github.com/Facets-cloud/k8s-ssl-validity-exporter) project. At Facets, we solve these types of issues on a daily basis. We recommend you try out our self-serve DevOps automation platform, [Facets.cloud](https://www.facets.cloud/). Reach out to our teams for a [demo](https://www.facets.cloud/book-a-demo) and learn how we can help you transform the last mile of cloud delivery! --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## From Tracking Success To Becoming Efficient: Defining DevOps Metrics Author: Anshul Sao Published: 2023-04-13 Category: Blogs Tags: devops metrics, dora metrics URL: https://blog.facets.cloud/defining-devops-metrics Peter Drucker is often quoted as saying, “you can't manage what you can't measure”. This is especially true in the world of technology and software development. In DevOps, **collecting, analyzing, and measuring** metrics is a critical aspect.
A [study by Atlassian](https://www.atlassian.com/whitepapers/devops-survey-2020) in 2020 found that DevOps success directly correlates with important metrics such as MTTR and Lead Time, as well as business metrics like revenue and profitability. On the other hand, another opinion resonated amongst the respondents of the study: “It is difficult to measure the impact of DevOps progress, and organizations do not have a clear way to measure success,” according to half of the respondents. In this post, we will be discussing how organizations can gain valuable insights into their development and deployment processes, and make data-driven decisions to improve efficiency using DevOps metrics. **Why measure DevOps Metrics?** ------------------------------- Effective measurement of [DevOps metrics](https://blog.facets.cloud/defining-devops-metrics/) gives teams a bird's-eye view of the development and deployment process, enabling them to identify bottlenecks and areas of improvement. **Identifying bottlenecks:** It is important to track where delays are occurring in the development and deployment process and work to eliminate them. Metrics such as Lead Time and Cycle Time can be used to track delays. **Improving collaboration:** Metrics such as Change Failure Rate and Mean Time to Recover (MTTR) can help teams identify areas where collaboration between teams can be improved. **Increasing visibility:** Providing company-wide visibility into the entire development and deployment process can offer insights into how long it takes to move from one stage to the next. **Identifying cost-saving opportunities:** By tracking operational inefficiencies, teams can identify areas where they can save money. Metrics such as Deployment Frequency and Lead Time can show teams where to optimize their DevOps processes. **4 Key DevOps Metrics according to DORA** ------------------------------------------ The goal is to measure performance by breaking down abstract processes in software development and making them visible through data. These data points would then guide stakeholders to take necessary steps to streamline processes and increase software throughput and stability. Additionally, we will reference [DORA's 2022 State of DevOps report](https://cloud.google.com/blog/products/devops-sre/dora-2022-accelerate-state-of-devops-report-now-out) to compare team performance across low, medium, and high performers and see what success and efficiency look like. **Mean Time To Recovery**: MTTR, short for Mean Time to Recovery, is a metric that calculates the average duration required to recover from product and system failures, known as incidents. For instance, if an organization pushes improvements in the form of better automations or enhanced collaboration between teams, they may experience a decline in MTTR. This suggests that the enhancements are having a positive impact, leading to faster incident resolution and better service availability. According to the DORA report, teams can evaluate their performance based on the time required to restore service during an outage. High-performing teams take less than one day to restore service after an unplanned outage or service impairment. **Lead Time**: Lead time refers to the time it takes for a feature or user story to move from the initial request to the point of release to production.
It includes the time taken for planning, development, testing, code review, and deployment, as well as any delays that occur during the process, such as waiting for approvals or dependencies. For example, if lead time is consistently high, it may indicate bottlenecks in the development process, such as slow code reviews or inefficient testing practices. DORA has consistently found that high performers have shorter lead times than low performers. **Deployment Frequency**: Deployment frequency measures how often changes are deployed to production. A higher deployment frequency means a quicker release of new features and updates, a faster response to customer feedback, and a reduced risk of disruptive deployments. In contrast, organizations that deploy infrequently may struggle to keep up with customer demands and market changes. This can result in frustrated customers, missed opportunities, and, ultimately, lost revenue. Overall, a high deployment frequency is a key indicator of a successful DevOps culture. According to DORA, high-performing organizations deploy code to production for end-users on demand, whereas low performers may deploy only once a month, or even once every six months. **Change Failure Rate:** Change Failure Rate (CFR) measures the percentage of changes that fail during deployment. A high CFR indicates issues with the development or deployment process, such as poor testing or insufficient automation. To improve CFR, focus on improving testing and quality assurance processes, as well as investing in infrastructure that can support frequent code deployments. Overall, a low CFR is a key indicator of a successful DevOps culture that values quality and reliability. Organizations that prioritize this metric are likely to see improved customer satisfaction and faster time to market. **DevOps Metrics beyond DORA** ------------------------------ While the DORA metrics are widely used and highly effective, there are many other metrics that can provide valuable insights. In addition, I am also providing a link to a [Google Sheet](https://docs.google.com/spreadsheets/d/1iBsW9IMXed1iQUUX92N9pwKiGRThStQV_mMLCVXrnmY/edit#gid=0) that includes several other metrics, including the ones listed below. The sheet outlines their basic definition, how to measure them, and the stage at which to measure those metrics. **Cycle Time**: Cycle time refers to the time it takes for a feature or a user story to move from the start of the development process (such as planning or design) through to deployment and release. It includes all the steps involved in the development process, including coding, testing, code review, and deployment. This metric offers valuable insights into the speed and efficiency of the software delivery process, enabling teams to identify bottlenecks and streamline operations. The goal is to reduce cycle time by automating as many tasks as possible, removing bottlenecks, and improving collaboration and communication between teams. While a good cycle time will vary based on the complexity of the application and the specific needs of the business, high-performing DevOps organizations typically aim for a cycle time of just a few hours or less. **Defect Escape Time**: Defect Escape Time is a critical metric that measures the time it takes for a defect to be detected, from the point it is introduced into the code until it is discovered in production.
A high Defect Escape Time indicates that defects are not being detected early in the development and testing process, which can lead to a poor user experience. Generally, a defect escape time of a few hours or less is considered good in a high-performing DevOps organization. However, the acceptable level of defect escape time can vary depending on the criticality of the application, the business requirements, and the level of risk associated with defects. **Defect Escape Rate**: Defect escape rate measures the percentage of defects that are not detected during testing and are discovered after the software is deployed in production. Defect escape rate can be calculated by dividing the number of defects discovered in production by the total number of defects. A good defect escape rate in DevOps is a low percentage. Generally, a defect escape rate of less than 5% is considered good in a high-performing DevOps organization. **Automated Test Pass Percentage:** The automated test pass percentage is defined as the percentage of automated tests that pass successfully during a specific period. For instance, if a software team runs 100 automated tests and 90 of them pass, the automated test pass percentage would be 90%. This metric indicates the reliability of the software being developed; a high pass percentage suggests that the software is likely to be free of bugs and issues, while a low automated test pass percentage suggests that there may be issues with the software that need to be addressed. **Application Performance Metrics**: Application performance metrics gauge how well an application performs – its responsiveness, reliability, and scalability – in terms of speed, stability, and resource utilization under various workloads and conditions. Metrics related to application performance help teams monitor and optimize the application's performance throughout the development lifecycle – from development and testing to deployment and production. For instance, response time measures the time it takes for the application to respond to a user request, while throughput measures the number of transactions or requests that the application can handle in a given period. Error rate indicates the frequency of errors and failures in the application, and resource utilization measures the amount of resources, such as CPU, memory, and disk space, that the application uses. By tracking these metrics, teams can pinpoint performance issues and bottlenecks and optimize the application to ensure it satisfies performance requirements and provides a positive user experience. **Challenges in measuring DevOps Metrics Correctly** ---------------------------------------------------- Measuring DevOps metrics can be a tricky affair, as there are several challenges that organizations must overcome. **Lack of standardization:** One of the main challenges in measuring DevOps success is the lack of [standardization](https://blog.facets.cloud/standardization-in-security-cost-compliance-and-observability/). Different organizations have their own unique approaches to DevOps implementation, making it difficult to develop a standard set of metrics that can be used to measure success. This makes it challenging to benchmark progress, and can lead to confusion and frustration amongst stakeholders. **Limited visibility:** Another challenge is limited visibility. DevOps requires effective collaboration and communication between different teams, but tracking progress can be challenging when visibility is limited.
Having the right tools and processes in place to capture and analyze data on different aspects of DevOps implementation can do the trick. **Difficulty in defining and measuring success:** Perhaps the most significant challenge in measuring DevOps success is defining what success actually means. Success can mean different things to different organizations, and even within the same organization, there may be different opinions on what constitutes success. Some organizations may focus on speed of delivery, while others may prioritize stability and reliability. Defining success requires careful consideration of organizational goals, as well as taking into account the needs of different stakeholders, including customers, developers, and operations teams. **Tools for effective DevOps Metrics tracking** ----------------------------------------------- To effectively track DevOps metrics, it is important to use a combination of tools to gather data and insights from every stage of the DevOps pipeline. Here are four types of tools that can help you effectively track key metrics: **CI/CD Tools:** CI/CD tools help automate the process of building, testing, and deploying software, enabling you to measure metrics such as build success rates, test coverage, and deployment frequency. Some popular CI/CD tools are Jenkins, CircleCI, and Travis CI. **Monitoring Tools:** Monitoring tools help you track the performance of your software in production environments. By monitoring key metrics such as server response times, error rates, and user activity, you can identify potential issues and make data-driven decisions to improve your application's performance. Some popular monitoring tools are New Relic, Datadog, and Prometheus. **Collaboration Tools:** Effective collaboration is critical to the success of any DevOps team. Collaboration tools such as Jira, Trello, and Asana can help you track progress on tasks, assign responsibilities, and communicate with team members. By tracking collaboration metrics such as task completion rates and response times, you can identify areas where collaboration can be improved. **Customer Support Ticketing Tools:** All the metrics in the world are irrelevant if your efforts do not reduce customer support tickets. Tools such as Zendesk, Freshdesk, Jira Service Management, and Salesforce Service Cloud help teams manage customer support tickets effectively by providing a centralized platform for ticket management, communication, and knowledge base creation. In addition, using these software tools, you can automate features like canned responses and workflows, helping to reduce response time and improve ticket resolution time. ### **Let’s Conclude** While standard metrics can guide you towards a well-run DevOps practice, it is important to design your own metrics based on your current development and business context. We perform regular studies with engineering leaders to learn how they use metrics to transform their tech. In one such learning session, [Kaushik Mukherjee](https://www.linkedin.com/in/kaushikm21/), a seasoned engineering leader, gave us an example of how he used a metric to guide a couple of quarters of development sprints. Kaushik observed that they needed to reduce the defect rate, but not at the cost of reduced velocity. Hence, they created a metric that would measure the rate of new defects introduced per deployment in a given period, and worked to drive this metric down.
That's just one example of how designing your own metrics can help you achieve success and drive continuous improvement in your development process. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Save costs of non-production environments with Availability Rules Author: Rohit Raveendran Published: 2023-04-05 Category: Product News Tags: availability rules, non-production environments URL: https://blog.facets.cloud/save-costs-of-non-production-environments-with-availability-rules We are excited to announce the latest addition to Facets – Availability Rules. Non-production environments can account for up to 20% of overall [cloud costs](https://blog.facets.cloud/cloud-cost-optimization-efficiency-by-design/). Switching them off during periods of non-usage, such as weekends or nights, can provide quick savings. However, writing custom scripts or manually managing the lifecycle of these environments can be tedious and prone to missed cost-saving opportunities. With our new Availability Rules feature, non-production environments can be scheduled to hibernate and wake up on a defined schedule. For example, you can configure your dev environment to run from 9 a.m. to 9 p.m. and schedule it to scale down after 9 p.m. and scale up again in the morning as required. ![Dashboard view of Facets.cloud Environment Overview for QA with details on release stream, region, node types, and recent successful releases](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/642a74e26b19ef7f45877584scaledownscaleupgiphy-1701865760214-original.gif) Availability Rules with Facets' Control Plane By having the ability to control which deployments to scale down and which to keep running, you can ensure that essential workloads are always available. In addition, [Facets.cloud](https://www.facets.cloud/) ensures that supporting workloads such as the Prometheus alert manager, etc., are always accessible. The Availability Rules feature also shuts down all managed databases in the cloud, such as Aurora or RDS, to help you save on costs. Overall, Availability Rules empower you to manage your non-production environments more efficiently while keeping costs under control. Maximizing ROI on Your Cloud Investments ---------------------------------------- Controlling cloud spend is essential for any organization, but it can be challenging to manage in a SaaS environment. Fortunately, there are some often overlooked strategies that can help you get more from your cloud investment. If you're looking for new ways to optimize your cloud costs, be sure to check out our blog post, "[Managing Cloud Spend in SaaS: 7 Overlooked Opinions](https://www.facets.cloud/blog/managing-cloud-spend-in-saas-7-overlooked-opinions)". You'll find practical advice and expert insights on how to get the most from your cloud infrastructure while keeping costs under control. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Kubernetes ExternalName: A Hidden Pitfall for DNS Resolution Author: Rohit Raveendran Published: 2023-03-08 Category: Tech Articles Tags: Kubernetes, externalname URL: https://blog.facets.cloud/kubernetes-externalname-a-hidden-pitfall-for-dns-resolution Kubernetes is a powerful container orchestration platform that has become increasingly popular for managing containerized applications.
One of the key features of Kubernetes is its ability to manage services, which are used to expose applications running inside a cluster to other components within or outside the cluster. One such service type is ExternalName, which enables a Kubernetes service to act as a proxy to an external resource outside the Kubernetes cluster. This feature is particularly useful for accessing resources such as databases, messaging systems, or other external services that are not hosted within the Kubernetes cluster. However, while ExternalName is a beneficial feature, it can also lead to a hidden pitfall for DNS resolution. Let's dive deeper into this pitfall and explore how to avoid it. The Pitfall ----------- Suppose you have a Kubernetes cluster with a service named "default" configured as an ExternalName pointing to [www.google.com](http://www.google.com/). If you attempt to access any service in the default namespace without fully qualifying the domain name, the DNS resolution will start failing. For instance, if you have a service named "serviceA" in the namespace "default," then "serviceA.default" will start resolving to [www.google.com](http://www.google.com/) instead of resolving to "serviceA.default.svc.cluster.local." This happens because Kubernetes creates a CNAME record for "\*.default.default.svc.cluster.local" pointing to the ExternalName resource's DNS name, which in this case is "[www.google.com](http://www.google.com/)." This means that any unqualified DNS query for services in the "default" namespace will be redirected to "[www.google.com](http://www.google.com/)" by default. This can cause unexpected behavior in your applications and make it difficult to diagnose DNS issues. The Solution ------------ To avoid this pitfall, it is recommended to use a different name for the ExternalName service than "default." This will prevent any unqualified DNS queries from being redirected to the ExternalName resource's DNS name. Additionally, it's important to note that this pitfall can occur in any namespace, not just the default namespace. So, when configuring ExternalName services, always choose a unique name for the service to avoid any potential conflicts. Another solution to avoid this pitfall is to fully qualify the domain name when accessing services in the same namespace. This will ensure that the DNS resolution is correct and that your applications can communicate with each other without any issues. Conclusion ---------- ExternalName is a useful feature in Kubernetes for accessing resources outside the cluster. However, it can cause unexpected behavior for DNS resolution if not used carefully. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Streamline Build Promotion Management across Different Environments Author: Anshul Sao Published: 2023-02-28 Category: Product News Tags: Build promotion management, kubernetes dashboard URL: https://blog.facets.cloud/streamline-build-promotion-management-across-environments Promoting builds across different environments can be challenging for most teams, especially when you have multiple stakeholders involved, like Dev, QA, and Ops teams. If you don't have visibility into the entire build process, it's tricky to ensure that every artifact is deployed to the right environment and the relevant teams are in sync. Typically, organizations can have different build strategies to ensure pipelines are predictable.
They can have two methods to decide the process around how builds are promoted to environments: 'branch-based environment pipelines', which have a one-to-one mapping of branches to environment types, or 'single branch-based environment pipelines', where artifacts are promoted from the main branch to different environments either manually or through automation. Most commonly, a hybrid approach is adopted that combines these two methods. However, these methods can be cumbersome and opaque for developers and other parties involved in the process. Facets simplifies the build promotion management process by providing complete visibility into each stage: right from the code integration, through CI/CD systems like Jenkins, Bitbucket Pipelines, GitHub Actions, down to the final production environment. ![Defining Environment Types with Facets](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/64009ad89b97b87142325669newnewgfgfhd-1701865761336-original.gif) Defining Environment Types with Facets [Facets](https://www.facets.cloud/) offers the ability to define which environment types will receive builds from your CI system and which environments are connected in a promotion hierarchy. This helps developers understand how their builds are flowing and eliminates confusion around which build is destined for which environment. In addition, Facets provides features to help control who has access to promote a build to the next environment and maintain a complete history for the same. This ensures that only authorized personnel can move builds between environments, making build promotion more secure and transparent. In conclusion, Facets streamlines the build promotion management process, providing complete visibility and control over builds from CI/CD systems to the final production environment. Try Facets today to revolutionize how you manage builds across different environments, eliminate confusion, and reduce manual effort. ### Out of the Box Integrations with Facets Facets integrates with a variety of external tools like [Prometheus](https://prometheus.io/), [Grafana](https://grafana.com/), [Kubernetes Dashboards](https://github.com/kubernetes/dashboard), and more! Read more [here](https://readme.facets.cloud/docs/external-tools-and-usage) to see how Facets can help you with the comprehensive management of resources. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## The By-Design Approach to Simplifying Config management - Part 2 Author: Anshul Sao Published: 2023-02-13 Category: Product News Tags: by-design, CONFIG MANAGEMENT URL: https://blog.facets.cloud/simplifying-config-management-part-2 In the first [part](https://www.facets.cloud/blog/simplifying-config-management-part-1) of our series, we discussed the inefficiencies of traditional config management tools and introduced Facets as a simpler solution for managing service endpoints. In this second part, we will focus on another critical aspect of configuration management - credential management. Developers often face challenges when it comes to tracking usernames, passwords, and access to various services and cloud resources. The current process, which involves requesting credentials from DevOps and storing them in a secret store, or even worse, sharing them directly with developers, is both time-consuming and a security risk. Facets solves this problem by automating the entire process of credential management with a by-design approach.
Our approach, which you can read more about in our [blog](https://www.facets.cloud/blog/shifting-the-devops-paradigm-from-by-audit-to-by-design), is designed to ensure that the creation, storage, and rotation of credentials are done consistently and securely. This allows developers to access the services they need without worrying about managing different credentials for different environments. By referring to services by their logical or abstract names in the blueprint, Facets eliminates manual effort and reduces the risk of security breaches. All credentials are stored securely, making it easier for developers to access the services they need. In summary, Facets provides a comprehensive solution to the challenges of configuration management, including credential management. Streamline RBAC with Facets --------------------------- Managing RBAC and granting access to developers in multiple environments can be a hassle for DevOps teams. But with Facets, it's time to say goodbye to those headaches! Experience effortless user management through custom role creation, user group management, and granular permissions. Streamline your workflows, enhance collaboration, and make your DevOps life easier with [Facets' RBAC User Management](https://blog.facets.cloud/transform-your-devops-access-control-with-facets-rbac-management/). Find out [how](https://readme.facets.cloud/docs/user-management) now! --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Shifting the DevOps Paradigm: Transitioning from by-Audit to by-Design Practices Author: Anshul Sao Published: 2023-01-24 Category: Blogs Tags: devops, devops metrics, by-design, Proactive DevOps URL: https://blog.facets.cloud/shifting-the-devops-paradigm-from-by-audit-to-by-design In this post, I want to share some insights gained from my advisory engagements on cloud spend. I advised over 20 companies on cloud cost optimization. Unfortunately, what we envisaged to be short-term engagements where I'd analyze bloating costs and then share tips and tricks ended up being multi-quarter engagements requiring continued support. This was highly inefficient! How did we end up here? To answer this, let's take a step back and examine our DevOps practices and tools in more detail. Endless audit loops… -------------------- While investigating cloud spend, we would dig through spend dashboards, discover insights, sort them by priority, assign them to teams, and then measure them again - _every week_. The toll it took on teams was immense. It was like Groundhog Day. We had success on many occasions and celebrated the pure dollar savings. However, this really made me question our fundamental approach - how can we think preemptively about this? Make no mistake, continuous verification and auditing is an absolutely necessary practice. We all need to routinely audit costs and practices, but it should not be the _only_ way. Expanding on this, I realized that it isn't limited to [cloud cost](https://blog.facets.cloud/cloud-cost-optimization-by-design-a-strategic-approach-to-cloud-cost-planning/). We take this "by-audit" approach in _almost all_ of our DevOps practices. The by-audit approach --------------------- I call this approach of taking stock retrospectively 'by-audit'. These days when I come across a DevOps tool or process, I classify whether it fixes things "by-audit" or "by-design". Let's look closer at how we approach practices when we think by-audit: 1.
**Compliance**: You perform quarterly/annual audits, retrieve the non-compliances and try to isolate which teams they belong to and assign them for fixing. It doesn't guarantee the same issues won't be back in the next audit. 2. **Security**: Higher than required privileges, open database credentials, improper network segmentation, and security groups are common areas where you pull reports out and try to figure out with teams whether they are genuine or misconfigurations. 3. **Disaster Recovery**: Many companies set up simple backups and runbooks on how to recover from disasters and the confidence in these is usually low. So the DevOps team either lives with blind optimism hoping for the best or, on the other extreme, lives in abject fear and performs DR drills several times in a quarter because they are paranoid that the runbook will drift. 4. **Completeness around Monitoring**: There is always a lack of confidence about whether the required dashboards and alerts are complete, or whether the right team is being notified. It's a guessing game. So as issues happen, we jump into a process of checking all dashboard and alert configurations. We wrote an article on this [here](https://www.facets.cloud/blog/observability-by-design). This applies to tools as well. For instance, let's consider you have a tool that analyzes exposed credentials of a database. A database expert then analyzes the report. The expert identifies the services connecting to the database, finds the owners for it and informs them so they can re-examine it. Think by-design --------------- ![think by design](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/63cfddd22af1e2a82ac05ecfaws2-1701865762626-original.png) With the "by-design" approach we ensure adoption of best practices and principles before implementation. To continue the above example of the credentials for the database, a "by-design" approach would ensure that there is a formal way to request a credential for each of the services _beforehand_. Also, it would require you to ensure that there is a policy on credential creation, isolation, storage, and rotation which is applied _uniformly_ and programmatically when the request is fulfilled. It is also important to ensure that this is the _only_ way credentials can be created. If the above is taken care of by design, the need for verification tools will go down significantly. The codified policy and code can be statically verified and can be moved upstream to the CI pipeline. Similarly, AWS Cost Explorer, where you analyze costs per team and identify the teams consuming the most resources, is a "tool of verification", while AWS Budgets, where you allocate a budget for a team combined with automation that prevents the team from going over budget, is a "tool of design". Comparison of by-Audit versus by-design --------------------------------------- ![by-audit versus by-design](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/63cfda03867ebf2819087565tablepercent20dark-1701865763693-original.png) Let's expand on these. ### Goals If you have a burning problem like cost bloating, then the by-audit process comes in handy. You would need to put a team together, pull out all reports, break them down and assign them to developers to fix them. However, more often than not, these practices become the norm. By-design processes provide long-term benefits and ensure things stay fixed. You may need audits still, but the workload dramatically reduces.
### Ownership Typically, in the by-audit approach, a central team is responsible. Here, the governance is said to be centralized, because ownership is with one team. Generally, they use certain tools and analyze reports. This is fine for the short term. However, with the by-design approach the goals are long-term. Ownership is given to multiple teams so governance is de-centralized and teams get more autonomy. This ensures clear ownership boundaries so that every team is aware of their share of tasks. ### Benefits Apart from routine audits, usually, by-audit processes are useful in case of anomalies, e.g., a security breach or a cost spike. While it's easy to tackle these anomalies with audits, it's not sustainable for improving baselines. By-design processes ensure that the baseline improves overall - i.e. your default security posture, observability coverage, and cost baselines. ### Focus While moving from by-audit to by-design processes, the focus must shift from Tools to Platform thinking. The questions that need to be answered are: 1. How will this tool be used by the developers directly without additional cognitive load on them? 2. How will this tool integrate earlier in my delivery pipeline so the mistakes don't propagate to production? Conclusion ---------- Platform teams of today must shift away from stitching tools together and audit-driven practices. Indeed, [Platform engineering](https://thenewstack.io/platform-engineering-what-is-it-and-who-does-it/) is centered around a by-design mindset. Platform teams need to think about how to devise ways where the well-architected aspects of cloud and practices are 'ensured' and not required to be audited. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## The By-Design Approach to Simplifying Configuration Management - Part 1 Author: Anshul Sao Published: 2023-01-18 Category: Product News URL: https://blog.facets.cloud/simplifying-config-management-part-1 While there are many solutions for service discovery, managing different service endpoints (DBs, caches, and queues), URLs, and ports for different environments can be a time-consuming effort. DevOps teams use configuration management tools such as [Ansible](https://www.ansible.com/), [Chef](https://www.chef.io/), or [Puppet](https://www.puppet.com/) while developers use configuration files to manage service endpoints. However, these methods still have their own pain points: keeping track of new and changing endpoints is a manual effort. Launching a new environment can be a real hassle, with new configuration files or central maps needing to be created. With Facets, all of that changes! Facets offers an alternative to traditional configuration management. It allows you to refer to services by their logical or abstract names and consume them in your application as environment variables or files, like the illustrative sample at the end of this post. This means you no longer have to maintain a central map of the environment to services or switch between different configuration files based on the current environment. Launch new Environments with ease --------------------------------- With Facets, you can [launch a new environment](https://readme.facets.cloud/docs/launch-an-environment) with a single click! The platform takes care of creating and wiring the necessary information, giving developers and platform engineers a hassle-free experience. Say goodbye to semi-automations and hello to a seamless experience. Upgrade to [Facets](https://www.facets.cloud/) today!
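As a purely hypothetical illustration of the logical-name wiring described above (the variable names below are invented for this example and are not the actual values Facets injects):

```bash
# Hypothetical illustration - invented names, not the actual variables Facets injects.
# The application refers to its database by a logical name ("orders-db"); the platform
# fills in the real endpoint and credentials for whichever environment the service runs in.
export ORDERS_DB_HOST="orders-db.dev.internal"   # differs per environment
export ORDERS_DB_PORT="5432"
export ORDERS_DB_USERNAME="orders_service"
export ORDERS_DB_PASSWORD="<injected-by-the-platform>"
```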
--- This blog is powered by Superblog. Visit https://superblog.ai to know more. ---

## Successfully Implementing DevOps: Navigating the Top 4 Challenges

Author: Rohit Raveendran Published: 2023-01-12 Category: Blogs Tags: DevOps Challenges, DevOps Implementation URL: https://blog.facets.cloud/implementing-devops-navigating-the-top-4-challenges

Based on our interactions with over 200 organizations, [our previous article](https://www.facets.cloud/blog/is-your-devops-implementation-complete) concluded that very few have streamlined DevOps and that challenges with implementation are a common occurrence. In the middle of their DevOps implementation journey, Developer and Ops teams get overwhelmed with firefighting. As a result, teams get distracted from focusing on and solving core business problems, leading to fewer features being developed or deployed.

For starters, even though 83% of IT leaders report their organization uses DevOps approaches, 78% say their teams are performing below expectations, according to a study by Puppet [1]. Gartner [predicted something similar](https://www.gartner.com/smarterwithgartner/the-secret-to-devops-success) in 2019, stating that "75% of DevOps initiatives will fail to meet expectations."

In this article, we investigate the research to determine the extent of these challenges. We have reviewed multiple whitepapers published by the industry's leading voices to gain insight into these challenges and are sharing our findings. Let's begin.

Impact on Developer and Ops Productivity
----------------------------------------

According to Haystack [2] and [Survation](https://www.survation.com/), 74% of developers find themselves "working on operations in some form, even if only as part of their job," and only 26% work "solely" in Product Development.

![Impact on Developer and Ops productivity | Facets | Implementing DevOps: Navigating the Top 4 Challenges](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/63bfd033335f382f02dec07aengineeringpercent2520work-1701865765050-original.png)

_Study to understand the impact of COVID-19 on Software Engineers, Haystack_

This was not surprising to us. While building new products or deploying new features, speed matters. And as _many_ iterations take place, handoffs happen between Developer and Ops teams. Over time, these handoffs multiply and become a source of information loss and lost team productivity [6]. To reduce these handoffs, we introduce automation. But [deployment automation](https://cloud.google.com/architecture/devops/devops-tech-deployment-automation) is complex and requires the Ops team to use multiple tools correctly and work closely with the Development teams, leaving them less bandwidth to work on innovations.

Another piece of evidence we found that impacts Developer and Ops productivity is a lack of visibility into incoming "unplanned work." According to Jellyfish [3], 45% of engineering managers think that making sure everyone is focused on the highest-priority work is the greatest challenge in 2022.

![Top challenges faced by engineering managers | Facets | Implementing DevOps: Navigating the Top 4 Challenges](https://uploads-ssl.webflow.com/62566ffa5e87f6550e8578bc/63bfd03308864cf3c8295936_image%2520(3).png)

_2022 State of Engineering Management, Jellyfish_

The same analysis also finds the average percentage of time a development team spends on unplanned work: it increased by three percentage points, from 19% in 2020 to 22% in 2021.
> "Teams spent 22% of time on unplanned work in 2021, slightly more than they spent in 2020 at 19%. This sudden increase in unplanned work may signal that management is unaware of certain types of work that are taking time away from established priorities." - According to the report Now that’s a pain point! Developers’ productivity suffers, resulting in slower release cycles. In organizations where [DevOps implementation is incomplete](https://www.facets.cloud/blog/is-your-devops-implementation-complete), or a cultural Shift-left is nonexistent, inefficiencies are widespread due to exceeding cognitive load on the Ops team. This is happening because keeping up with the best practices and managing multiple tools and upgrades is challenging. An example of this is Treebo, a well-known hotel chain in the premium-budget segment from India. As [Kadam Jeet Jain](https://www.linkedin.com/in/kaddy/), CTO & Co-founder of Treebo tells in his own words: _"The biggest challenge for the Ops team was that they would spend a good 70-80% of their time in solving production issues or helping the development teams out in debugging, this was frustrating for both teams. The turnaround times of the team would inevitably be higher because of this."_ That said, another challenge organizations encounter is a lack of talent with an eye for detail and a skillset to reduce [toil](https://sre.google/sre-book/eliminating-toil/). DevOps Skillset Shortage ------------------------ The DevOps market crossed $7 billion in 2021 and is predicted to expand at a CAGR of more than 20% from 2022 to 2028, reaching a value of more than $30 billion. And as a result, job opportunities are also increasing. ![devops hiring in the US](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/63b6a0e6e70adc7947f749dfimage-1701865767105-original.png) That's just the United States! It's no longer a mystery that DevOps engineers are in high demand, with over 112K new job ads on LinkedIn. But this question sticks around: Is the availability of individuals with DevOps skills ample as well? The answer is No. Very few keep the know-how and are efficient in building automation to reduce toil. According to the trends captured in the Puppet report1, shortage of skills, at 33%, is the single largest contributing factor to why organizations struggle and face DevOps challenges. > "Armed with a clearer sense of the task at hand and having begun to automate more, mid-level teams cite a shortage of skills (33%), legacy architecture (29%), organizational resistance to change (21%), and limited or lack of automation (19%) as the primary blockers to better DevOps practices." - According to the report It's not the ideal case. With increased organizational efficiency as a byproduct, your reliance on DevOps engineers should decrease. Environmental Inconsistencies Leading to Unpredictable Outcomes --------------------------------------------------------------- According to CSA4, Lack of Internal Guidance (33%) followed by Insecure Default Settings (18%) and Negligence (16%), are the major contributing factors to misconfigurations. We'll call them Environmental Inconsistencies. 
![Primary causes of misconfigurations in organisations | Facets | Implementing DevOps: Navigating the Top 4 Challenges](https://uploads-ssl.webflow.com/62566ffa5e87f6550e8578bc/63bfd03395f4a837852d6d9b_Primary%2520causes%2520of%2520misconfigurations%2520in%2520organizations%2520(1).png)

_Secure DevOps and Misconfigurations Survey Report, CSA_

Tracking changes in your cloud workload becomes increasingly difficult as the size of your team grows. It's a perfect recipe for what we'll discuss next. Environment drifts are common in organizations where new code releases happen numerous times daily. The consequences of poorly managed drift can be far-reaching: from growing frustration among engineers due to delayed deployments, to unplanned downtime and security vulnerabilities costing real dollars.

Environmental inconsistencies are manageable in general. However, substantial internal guidance is required. The CSA report [4] emphasizes this:

> "The primary reason cited for these misconfigurations was flawed or lacking internal guidance (33%). This indicates that the guidance organizations are developing internally is ineffective for preventing misconfiguration." - According to the report

The need of the hour is to have single-click environments with collaborative workflows. Faster issue discovery will then result in faster code deployments and less time to go live with changes.

Underestimated or Wasted Cloud Spend
------------------------------------

Gartner [forecasts](https://www.gartner.com/en/newsroom/press-releases/2022-10-31-gartner-forecasts-worldwide-public-cloud-end-user-spending-to-reach-nearly-600-billion-in-2023#:~:text=Worldwide%20end%2Duser%20spending%20on,18.8%25%20growth%20forecast%20for%202022.) that public cloud spending will reach roughly $600 billion in 2023, a 43% increase from 2021. Big spending, indeed. But what about spending predictions, or wasteful spending? According to Flexera [5], wasted cloud spend has increased to 32% in 2022, from 30% in 2021.

> "Spend is likely less efficient and likely even higher on average, as many organizations tend to underestimate their amount of waste." - According to the report

According to the same study, 66% of respondents report an increase in cloud spending that was "higher than initially planned this year."

![Underestimated or wasted Cloud Spend | Facets | Implementing DevOps: Navigating the Top 4 Challenges](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/63bfd034f41daf95f88b5ec1image-1701865769688-original.png)

_State of the Cloud Report, Flexera_

Cloud cost optimization challenges even the best of the best. It requires a [by-design cost-optimization approach](https://blog.facets.cloud/cloud-cost-optimization-efficiency-by-design/) to DevOps.

**Let's Summarize**
-------------------

Scalability, reliability, and cost efficiency are essential in today's fast-paced industry. Maintaining a competitive edge requires quick deployment times, upgrades, and new features, as well as maturity in dealing with DevOps challenges. With new innovations, massive disruption in the DevOps value chain will continue in the coming years, and the trend toward innovation-led processes is no surprise. Platform Engineering proposes new approaches to accelerate and rethink DevOps. In our future articles, we'll go over this topic in greater depth.

### **Sources**

1. [The 2021 State of DevOps Report, Puppet](https://puppet.com/resources/report/2021-state-of-devops-report/)
2. [Study to understand the impact of COVID-19 on Software Engineers, Haystack](https://haystack-books.s3.amazonaws.com/Study+to+understand+the+impact+of+COVID-19+on+Software+Engineers+-+Full+Report.pdf)
3. [2022 State of Engineering Management, Jellyfish](https://jellyfish.co/resource/2022-state-of-engineering-management-report/)
4. [Secure DevOps and Misconfigurations Survey Report, CSA](https://cloudsecurityalliance.org/artifacts/secure-devops-and-misconfigurations-survey-report/)
5. [State of the Cloud Report, Flexera](https://info.flexera.com/CM-REPORT-State-of-the-Cloud)
6. [The Art of Lean Software Development](https://books.google.co.in/books?id=0VsK9cVZauQC&pg=PA19&dq=50%25+information+lost+in+handoff+50%25&hl=en&sa=X&redir_esc=y#v=onepage&q&f=false)

--- This blog is powered by Superblog. Visit https://superblog.ai to know more. ---

## Changelog for Configurations In Facets: Making It Easy to Inspect Version History

Author: Anshul Sao Published: 2023-01-04 Category: Product News Tags: Infrastructure as Code (IaC), devops URL: https://blog.facets.cloud/changelog-for-configurations

Change is the only constant in DevOps! Production systems are constantly under pressure to stay on top of the various types of changes that are continually happening: changes in code, configurations, resource sizing, or shifting dependencies among services. Changelogs are often non-existent, and even when they exist, they may be scattered across places like GitHub deployment logs. This makes it tedious to perform backtraces and can hurt productivity, especially while debugging or checking why the behavior of a service has changed.

We're excited to make it easy to inspect Version History for all microservices and databases with [Facets](https://facets.cloud/). Through a single pane of glass, you can instantly see what changed, when it changed, and which team member made the change. In addition, you can compare the previous version against the current config and restore it if required!

![Managing IaC with Facets](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/63b55f063a6a5b0df8d95d8ffinal-gif-confirmed-1701865770732-original.gif)

Managing IaC with Facets
------------------------

We're passionate about creating a single view that makes it easy for developers and DevOps to collaborate. For consistent, error-free deployments, check out how you can [Manage IaC](https://readme.facets.cloud/docs/manage-iac-versions) with Facets.

--- This blog is powered by Superblog. Visit https://superblog.ai to know more. ---

## How to Rollback to a Previous Version of a Microservice Using Facets

Author: Anshul Sao Published: 2022-12-23 Category: Product News Tags: microservices, ROLLBACK, RELEASE MANAGEMENT URL: https://blog.facets.cloud/rollback-to-a-previous-version-of-a-microservice-using-facets

Sometimes the only way forward is to go back! In this age of rapid development cycles, you need to be able to roll back to the last known good version of your build with ease. With [Facets](https://www.facets.cloud/) you can see the currently registered Artifact and view its last 5 versions. The Artifact rollback feature lets you select which artifact you want to roll back to with a single click.

![](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/63a5378bc8c1bdb67af9549bgiphy-final-1701865771820-original.gif)

Once the rollback is successful, you get a message indicating the updated version.
![](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/63a537ae6b52566a006472a3image-1701865772684-original.png) On the next release of the environment, the updated Artifact version will be released.‍ ### Advanced Release Management with Facets Releases can be automated or manual. [Facets.cloud](https://facets.cloud/) is built with a belief that all kinds of Releases should be managed through a single mechanism. [Check out](https://readme.facets.cloud/docs/releases) how you can manage Releases with us. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## How to Compare Microservices Across Environments Using Facets Author: Rohit Raveendran Published: 2022-11-29 Category: Product News Tags: microservices, developer self service URL: https://blog.facets.cloud/compare-microservices-across-environments-using-facets I’m excited to introduce the compare mode for services hosted with [Facets](https://www.facets.cloud/). ![](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/6388345e330617879b69f2d8bloggiffy-1701865773668-original.gif) Services are usually deployed across multiple environments. Often, the behavior of services differs in these environments causing issues. More often than not it is due to some flag or configuration which differs across environments. However, it's difficult and tedious to compare the state of the services. Facets provides an intuitive way to visualize these differences and better understand environment-specific nuances in one go. Additionally, you can also directly determine if this configured state differs from what was initially set in the [Blueprint](https://readme.facets.cloud/docs/blueprint) (desired state). ### Empower your developers with self-serve With Facets, you can instantly spin up new environments that are secure, consistent and have observability metrics built into them. Share the responsibility in a responsible way! Read our docs [here](https://readme.facets.cloud/docs/create-an-environment) to learn more about managing environments. --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## In Conversations: Driving Engineering Efficiency with Internal Platforms Author: Pravanjan Choudhury Published: 2022-11-21 Tags: Internal Developer Platform, Developer Productivity URL: https://blog.facets.cloud/driving-engineering-efficiency-with-internal-platforms I’m thrilled to welcome our first guest, Ramesh Nampelly, Senior Director of Cloud Infrastructure and Platform Engineering at [Palo Alto Networks](https://www.paloaltonetworks.com/?utm_source=google-jg-emea-portfolio&utm_medium=paid_search&utm_term=palo%20alto%20networks&utm_campaign=google-portfolio-portfolio-emea-multi-awareness-en&utm_content=gs-18707974515-140675529017-630724229712&sfdcid=7010g000001dKD0AAM&gclid=Cj0KCQiAmaibBhCAARIsAKUlaKQD7hAgsv_OhPmaMapUS3fPeUGDvDdnKYEfNvETAxDVyuXJvAjw1pkaAjFSEALw_wcB). From SRE platforms to chaos engineering for service resiliency- Ramesh has worked on engineering effectiveness from many perspectives! In this interview, Ramesh gives me a unique view into how tech companies solve their DevOps and [platform engineering](https://blog.facets.cloud/handbook-to-platform-engineering-journey/) challenges. ‍**Mukta:** _Ramesh, welcome and thank you for joining us! It’s a pleasure to have this chat with you. As an enterprise cybersecurity platform, Palo Alto Networks has a huge number of products across many verticals of cyber security. Take us under the hood! 
How do your engineering teams support all these products?_

**Ramesh:** Hi Mukta, great to be here. Well, to give you an idea of my org, I joined Cloud Delivered Security Services (CDSS) at Palo Alto Networks. PAN has four groups, or Speedboats as we call them internally. These are Prisma Strata (NetSec - Network Security), Prisma Cloud (cloud security), Cortex (SIEM and XSIAM), and Unit42 (security consulting). CDSS falls under the NetSec group. With many engineers, the CDSS teams could be viewed as a bunch of internal startups. Every service team is responsible for end-to-end delivery and operations, i.e. Dev/QA to staging to production! So, we had to figure out a way to bring these under a unified governance model without impacting the current delivery cadence. We have over 9 customer-facing services, each with their own CI/CD pipelines, infrastructure management, and observability tools. So, we needed a two-layered cloud infrastructure and platform approach. In this approach, the common platforms, frameworks, and tooling are owned by the central team (i.e. my team), and the service-specific implementation is owned by the concerned service team. We also embraced an inner-sourcing (internal open-source) model, in which the core or central team owns a given platform service but contributions can come from anywhere.

**Mukta**: _So where did you start? Did you build an [internal platform](https://blog.facets.cloud/in-house-or-third-party-internal-developer-platform/)?_

**Ramesh:** Yes. We researched and didn't find a single solution that satisfied all our requirements. The closest option we found was the Spotify [Backstage](https://backstage.io/) platform. My team started a POC with an initial goal of providing a Service Catalog defining ownership, i.e. which dev team owns which service. The first use case we solved was discoverability (explore and query). Previously, teams used Google Sheets, Confluence, Google Docs, etc. to find ownership. The Backstage Service Catalog served us well: it pulls in the metadata around services and all the artifacts into a simple UI. The service teams appreciated it when we gave them the demo.

Next we tackled efficiency through self-service automation: for example, if one of the teams solved a given problem or figured out new tooling, how do we transfer that knowledge and learning to another team? We built Devclues (our internal developer platform based on Backstage), which creates the required scaffolding in the form of templates. So, for example, developers could bring up a k8s or a Kafka cluster, a React or a Go app, or a secret manager (Vault) integration with a few clicks using these templates. We have also leveraged Backstage plugins like [cloud cost](https://blog.facets.cloud/cloud-cost-optimization-by-design-a-strategic-approach-to-cloud-cost-planning/) insights, since cloud cost tracking was important to us. We extended the 'cost insights' plugin to provide granular insights into what's contributing to the cost: for example, which SKU contributes the most, whether compute, cloud storage, logging, or networking. We've made sure this plugin provides engineer-level insights in addition to the exec-level view. Also, we started extending the capabilities of the core platform with features that were important to us, like integrating with OPA (Open Policy Agent). Now developers can work with the guardrails and cost optimization in mind, instead of having different practices for each team.
This still allows the flexibility that's needed for developers, but with best practices. In addition to the internal developer platform, we are also working on an **observability** platform to help engineering teams achieve better service reliability at optimized costs. Over the last year, we built an internal observability platform called "Garuda" using open-source software like Grafana, StackStorm, and vector.dev. This platform is going through an adoption phase as we speak. We have over 3 teams (or tenants) operating their services in production using this platform.

**Mukta:** _How was this received? Did it meet the expectations of internal teams?_

**Ramesh:** Let me be very honest here. For any internal platform, building is easy but adoption is the tough part, unless you have buy-in from your customers. At PAN, from the start of the journey, we took customer adoption very seriously and built platform features in close collaboration with service teams. In fact, we've co-developed initial capabilities with our customers (i.e. internal engineering teams), much like startups do with design partners. Some other measures we've taken to increase adoption are: 1\. collecting requirements and feedback on existing features in the form of surveys, and 2\. sending frequent updates through newsletters.

**Mukta:** _Prior to Palo Alto Networks you worked at Cohesity. Tell us a bit about your work there._

**Ramesh:** At Cohesity I was Head of Engineering Efficiency. The key challenge there was to improve the productivity of engineers. It was very clear that the leadership gave great importance to internal engineering services. In my first week, I was asked a critical question: how would you increase engineering productivity by 10x? To answer that question I had to understand the bottlenecks in the system first. So I started learning the lay of the land by meeting senior leaders, key architects, and some developers in my first few weeks. Two areas stood out from those conversations: engineering efficiency and better utilization of infrastructure to reduce expenses.

With regard to engineering efficiency, the main problem was the build time. After thorough investigation and discussions with key architects, we decided to migrate the build system over to [Bazel](https://bazel.build/), an open-source build system by Google. Bazel gives us a robust remote-cache mechanism, which improved our capabilities, and as part of the migration we cleaned up implicit dependencies. Within 6 months we migrated the majority of the C/C++ code to Bazel, and by the end we had achieved a 30-50% reduction in build time.

Regarding infrastructure utilization, we built a new system that dynamically provisions infrastructure in the datacenter based on intent and continuously monitors usage to scale up and down accordingly. The tool is leveraged in regression runs as well, to utilize the available resources effectively.

**Mukta:** _Engineering efficiency is one of the underlying principles that has driven the evolution of DevOps, SRE and platform engineering. What was your strategy and vision when you started?_

**Ramesh:** The team's vision statement at Cohesity was "Provide an awesome developer experience and frictionless engineering services". Our strategy was centered around the OKR model to create alignment and engagement around measurable goals. So, I came up with quarterly and yearly OKRs for engineering efficiency that were aligned with the overall engineering OKRs.
One practice that I've been following both at Cohesity and now at PAN is reviewing OKRs every month and adjusting execution accordingly. These OKRs are presented in the "monthly all hands" so that the whole team is aware of the progress.

**Mukta:** _Can you give us some examples of the kind of Objectives you set and the Key Results that followed at Cohesity?_

**Ramesh:** Sure. We set objectives around three main aspects: a) improving developer experience and productivity, b) shifting quality left through increased automation, and c) maximizing infrastructure utilization. So, for the objective of improving developer experience and productivity, the Key Results were:

1. Reduce CR merge time by 50%.
2. Provide merge-failure feedback through automated triaging for all commits.
3. Reduce MTTR for merge failures to x hrs.

**Mukta:** _As you know, many CTOs and heads of Engineering of SaaS companies are solving this problem. However, the dilemma they face today is whether to invest in their own platform teams and internal frameworks or in an external platform. Given your journey and experience, what advice would you give them? Build or buy?_

**Ramesh:** Very good question, and it's the same thing we recently discussed at a Google Cloud customer event. My view is that it depends on the kind of company. Most tech companies are likely already on the platform/DevOps/SRE journey by the time they are exposed to 3rd-party tools. Secondly, these companies have unique platform requirements based on their product needs and focus heavily on building solutions specific to those needs. So, any 3rd-party tool or platform may not fully satisfy their needs. When it comes to non-tech companies, it'd be a little different, as building software is not their primary focus, so they might prefer to Buy rather than Build. Another important factor is migration: moving to a managed/3rd-party solution is easy if you are starting from scratch, but it'd be tricky to migrate over if you already have an in-house solution. So, I think it's about the integration complexity and the potential costs associated with buying a solution. Basically, I want to get the point across that if you go the buying route, you should consider buying a solution that covers all your needs and is continuously updated with new features and use cases, rather than having to buy a separate solution every time there is a problem and then integrate it into your existing infrastructure. Lastly, what differentiates platform engineering from typical SRE/DevOps is that in platform engineering you treat the platform as a **product**, as opposed to an isolated automation use case.

**Mukta:** _Thank you Ramesh for taking the time to share your insights!_

--- This blog is powered by Superblog. Visit https://superblog.ai to know more. ---

## Is Your DevOps Implementation Complete or Flooded With Firefights?

Author: Pravanjan Choudhury Published: 2022-06-27 Category: Blogs Tags: devops, DevOps Implementation URL: https://blog.facets.cloud/is-your-devops-implementation-complete-or-flooded-with-firefights

According to a [survey](https://www.atlassian.com/whitepapers/devops-survey-2020) published by Atlassian in 2020, 85% of DevOps practitioners said they faced barriers in their implementation of DevOps. This survey resonates with many founders. We've talked to over 200 organizations, and very few can say they have fully streamlined DevOps processes. What is the state of your DevOps implementation?
Are your planned stories languishing while your backlog gets flooded with firefighting and unplanned stories? Unfortunately, it's difficult to identify systemic problems until you begin to feel their detrimental impact on your software delivery! Here are some warning signs to watch out for:

Stability Issues in Cloud Environments
---------------------------------------

Expectations from product experiences have changed over time. Modern-day users expect a stable, responsive, and glitch-free experience. While stability issues may creep in due to bad code and how you build your software, several problems can also arise from how you ship your software. So what do stability issues in cloud environments look like?

1. **Higher than usual outages**: If you see frequent breaches of your SLAs, it's worth digging deeper to discover what's going wrong. One of the common reasons that we have observed is the misconfiguration of cloud environments. This can arise from incomplete automation of parts of the CI/CD pipeline while still relying on manual steps for others.
2. **Poor release confidence**: Safeguarding releases is usually a good thing. However, if you spot too many manual interventions by the teams to safeguard releases for every environment, it may be a sign of environmental inconsistencies.
3. **Lack of ease in creating new environments**: Do you often hear that launching new environments is hard? This lack of ease indicates that the knowledge of how current environments are set up is not centralized. Instead, this knowledge is fragmented across IaC code, configurations, manual run-books, or specific team members' heads. Developer teams need to be able to easily spin up new QA or load-test environments to test new features, updates, or fixes. Unfortunately, any shortcuts here lead to issues slipping into production environments instead of being caught early.

Productivity Issues
-------------------

The [State of DevOps](https://puppet.com/resources/report/2021-state-of-devops-report) Report by Puppet shows that 78% of organizations said their teams were "stuck in the middle" of their DevOps evolution in terms of productivity. If your developers are spending more time on [toil](https://sre.google/sre-book/eliminating-toil/) than on new features and innovation, then these signs may look familiar:

1. **Low developer productivity:** Developers getting frequently blocked is an indicator that can't be ignored. In the coding and build stages, developers are fairly independent and not blocked on external dependencies. However, problems arise when it comes to tasks that require a handover to DevOps teams, like code releases, adding new components to environments, or configuration changes. Developers often get blocked on the DevOps teams who have the know-how.
2. **DevOps burnout**: This is an obvious sign. Ops teams are under tremendous pressure to respond to incidents quickly, and they constantly need more resources. Eventually, DevOps teams burn out, as they are responsible for keeping up with the latest DevOps practices while dealing with current inefficiencies.
3. **Ticket ops**: To handle the aforementioned problems, many organizations tend to streamline the communication between developers and DevOps through tickets. Streamlining processes is a good thing, but it can also indicate an over-reliance on the DevOps team. Handling ticket ops sequentially can lead to delays and blockers in the SDLC.
Issues That Create Organizational Risk
---------------------------------------

DevOps is more than just ensuring that your CI/CD pipeline runs smoothly. There are security, compliance, and observability considerations which, if not implemented correctly, put the entire organization at risk. Many of these considerations can't be deferred to a later stage.

1. **Incomplete security posture**: If your compliance audits result in last-minute fixes to environments or access, you are probably exposed to security vulnerabilities. Without audits, you would have no way of determining this risk! It would be good to have a mechanism that prevents security issues from creeping into a high-velocity software delivery pipeline. This needs significant DevOps investment.
2. **Bloated cloud cost**: This is a common problem many companies face today. Repeated [cost](https://blog.facets.cloud/cloud-cost-optimization-by-design-a-strategic-approach-to-cloud-cost-planning/) audits are a sign of an incomplete DevOps implementation as well. Cost audit mechanisms are more of a damage-control action than damage prevention. A design-first, cost-optimized approach to DevOps is not simple.
3. **Business continuity risks:** Many global standards require organizations to have business continuity plans, such as disaster recovery plans. Almost everyone claims to implement these standards. Yet, whenever there is a cloud provider issue, even in a localized region, many solutions hosted on those providers go down. In our interviews with 200+ tech teams, no one has performed a flawless disaster recovery drill, yet many hope that in the event of a disaster their run-book will just work - it usually doesn't!

Conclusion
-----------

It's important to keep track of these warning signs in order to assess the health of your [DevOps implementation](https://blog.facets.cloud/stop-firefighting-with-your-devops-implementation/). So how does your implementation stack up? We at [Facets.cloud](https://Facets.cloud) know that building a lean yet sturdy implementation is not a trivial process. Contact us to find out how we can help you streamline your DevOps roadmap.

--- This blog is powered by Superblog. Visit https://superblog.ai to know more. ---

## Software Product Catalog: The Only Path to Zero Drift in Your SDLC

Author: Anshul Sao Published: 2022-04-15 Category: Tech Articles Tags: infrastructure consistency, drift-free infrastructure URL: https://blog.facets.cloud/software-product-catalog-path-to-zero-drift-in-sdlc

Today, all the aspects of a software product's components, including the dependent infrastructure, reside in documents, tickets, multiple scripts, Infrastructure as Code (IaC), policies, and manual configurations. This increases the collaboration effort and makes it nearly impossible to ensure zero drift during the Software Development Lifecycle (SDLC) of the software product.

What is a Software Product Catalog?
-----------------------------------

A software product catalog is a set of requirement metadata that represents the components of a software product and the interactions between them. The catalog consists of various sections that specify the building blocks of your application. To elaborate, a catalog may consist of databases, caches, queues, cloud-native resources, scheduled jobs, one-time jobs, and stateless and stateful applications. Each of these definitions can be agnostic of the exact cloud implementation and constructed using simple JSON schemas.

How is the Catalog used?
------------------------

![Software Catalog](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/626f77ddfbea864d5c443bf1softwarepercent20catalogpercent20-percent20howpercent20itpercent20ispercent20used-1701865778256-original.png)

The catalog can be manifested into any number of managed environments on any supported cloud provider, such as AWS, Azure, GCP, or bare metal, by the **Facets Cloud Runtime**. The environments automatically receive battle-tested features such as Release Management, Compliance, Observability, cloud-centric best practices, and cost-optimized operations.

What are the Advantages of a Software Product Catalog?
------------------------------------------------------

The advantages of this approach are:

1. It unifies the knowledge silos and creates a single source of truth for everyone to refer to.
2. Writing the catalog itself is very easy, as it doesn't need any implementation details, only your requirements.
3. The catalog can be understood programmatically and automatically manifested as a complete environment on any cloud.
4. You can enable git-based workflows to maintain the sanctity of the catalog and a history of all versions.
5. You can extend the catalog to add entities specific to your organization.

### How to mutate a Software Product Catalog?

![Software Catalog](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/6291249b5e8586469cf19347softwarepercent20catalogpercent20-percent20howpercent20topercent20mutatepercent20spc-1701865779169-original.png)

A typical catalog modification workflow looks like the diagram above. The **[Facets Cloud Runtime](https://blog.facets.cloud/facets-cloud-runtime-the-best-way-to-provision-cloud-infrastructure/)** can detect the catalog modifications, or changes in the integrated build systems, and propagate them to the environments.

--- This blog is powered by Superblog. Visit https://superblog.ai to know more. ---

## How to Reduce Immediate Hourly Commitment of AWS RIs In Exigencies

Author: Rohit Raveendran Published: 2022-04-15 Category: Tech Articles Tags: cloud cost management, aws savings URL: https://blog.facets.cloud/reduce-immediate-hourly-commitment-of-aws-ri-in-exigencies

If you run an AWS deployment, you would know that Reserved Instances (RIs) are a great way to obtain a discount on your always-running workloads (base compute) by committing for a certain duration, like 1 year or 3 years. Typically, you would look at your base on-demand compute to determine how many RIs to purchase at any time. This is all great in the normal course, i.e. your business is growing, cloud spend is increasing, and you are possibly buying more RIs from time to time. Now, consider an exigency like a large customer churn, a re-architecture, or an event like the onset of COVID that reduces your on-demand compute base. In these cases, your RIs may temporarily not be fully utilized, or you may opt to run the machines anyway at reduced utilization!
Here is a tip to spread your No-upfront, Convertible RI commitments over a longer duration and reduce the immediate $/hr spend for temporary relief. The example shows 3 RI line-items purchased sometime in the past, amounting to _26 m5.2xlarge_ machines, expiring on _June 30th, 2022_. Assume today is **15th Oct 2021**. You can simply reserve one t3.nano (the smallest instance) today for 3 years (no upfront, convertible), expiring on **15th Oct 2024**. When you select all 4 RI line-items and put them up for exchange, the number of machines reduces from 26 to 6, with an expiration date of **15th Oct 2024**.

The way RI exchanges work is that AWS takes the **SUM** of the $ commitment value of the RIs being exchanged and spreads it over the **max** expiration date of those RIs. What we have essentially done is spread the 6 months of leftover commitment over 3 years. Read the AWS docs [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ri-convertible-exchange.html#ri-merge-convertible).

A few points to note here:

1. This process is irreversible, i.e. if your usage comes back, you can't move the RIs back to their original state, as the exchange date always moves forward. So you may have to purchase new RIs.
2. You have a longer commitment at a smaller $/hour than earlier. The total commitment remains the same.
3. This applies to No-upfront, Convertible RIs only and not to other mechanisms like the [AWS Savings plan.](https://blog.facets.cloud/rationalizing-aws-savings-plan-recommendations/)
4. You can of course sell RIs, but some restrictions apply; for example, you can't sell Convertible RIs at the moment. AWS docs [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ri-market-general.html).

Stay tuned for more tips on how to choose wisely between different reservation instruments like Standard, Convertible, and Savings Plans.

--- This blog is powered by Superblog. Visit https://superblog.ai to know more. ---

## How to Improve Upon Your AWS Savings Plan Recommendations?

Author: Rohit Raveendran Published: 2022-04-15 Category: Blogs Tags: cloud cost management, aws savings URL: https://blog.facets.cloud/rationalizing-aws-savings-plan-recommendations

When it comes to optimizing your [cloud costs](https://blog.facets.cloud/managing-cloud-spend-in-saas-7-overlooked-opinions/), most practitioners tend to make use of the savings plans recommended by AWS. However, can you simply apply these recommendations in all scenarios? Maybe not. Let's break down an example AWS Compute Savings plan recommendation and find out.

A quick recap: there are two types of AWS savings plans: the Compute Savings plan and the EC2 Instance Savings plan. The AWS Compute Savings plan is more flexible and is applied automatically to EC2 instance usage regardless of instance family, size, AZ, region, OS, or tenancy. It can help you reduce costs by up to 66%. The EC2 Instance Savings plan, on the other hand, applies to individual instance families in a region.

### AWS Compute Savings Plan example

In the example below, $5.894/hour was recommended to me based on a period of 7 days of usage:

![AWS Compute Savings Plan](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/62597da3e89d1db6a706181d62442b10c286660a5e1d4ae3blog-savings-plan-1701865780111-original.png)

AWS Recommendation Page

So why should we analyze this recommendation?
1. The benefits are subjective for each organization, and the immediate maximization of savings is not always the only objective; sometimes the savings have to be weighed against the liability, or cost, of making a 3-year commitment. Can we drill down into this recommendation and come up with a more suitable plan? Turns out there is a way - read on.
2. A savings plan on a 3-year term usually gives up to a [66%](https://aws.amazon.com/savingsplans/compute-pricing/) discount, yet the number in this recommendation, roughly 13%, seems off. Why is that exactly?

What I understood is that AWS runs a simulation over the past few days using the on-demand bill, and comes up with the recommendation. Think of it as a bucket of a particular size (the committed Savings plan) that can carry objects which are elastic and can be squeezed into different proportions. This happens for every hour of the bill. Let us look at the recommendation again.

![AWS recommendations](https://uploads-ssl.webflow.com/62566ffa5e87f6550e8578bc/6294984e1b963f528ba27671_aws-recommended(3).png)

AWS advises $5.894/hour at the Savings plan rate and leaves out an average of $5.62/hour at the On-demand rate. Together, that should amount to $1934 ($11.514/hour) of spend over 7 days.

![AWS savings](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/62597da3f8b7cd5a2bcbe9b2617666e6f3dcc7ceda32494cri-day-wise-1701865781537-original.png)

Head over to the AWS Cost Explorer and apply the following filters, and you'll find the numbers do indeed match. Note in the filters: in the **Purchase Options** section, _Reserved_ means classic reserved instances that we have previously bought, which are applied first. _Savings plan_ means a previously bought savings plan. The _On-demand_ category is the residual on-demand usage not covered by either. If you have no auto-scaling of instances, day-wise granularity is enough. But if you have variable on-demand instances like us, you need to go further, into [hourly granularity](https://aws.amazon.com/about-aws/whats-new/2019/11/aws-cost-explorer-supports-hourly-resource-level-granularity/) (you may have to turn on a flag). AWS seems to be accounting for this, leaving out $5.62/hour at the On-demand rate in the recommendations, which surely must be correct. Or is it?

Now, let's break it down to [hourly resolution](https://aws.amazon.com/about-aws/whats-new/2019/11/aws-cost-explorer-supports-hourly-resource-level-granularity/).

![AWS savings](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/629499b559498a5094ac8fb8awsimageline-1701865782529-original.png)

Looking at the hourly breakdown chart above, we can observe that the minimum residual spend is $4.39/hour. This is the baseline for our compute cost. Once that baseline is squeezed into the Savings Plan bucket (at up to a 66% discount), the commitment requirement should be much lower than the recommended $5.894/hour.

![AWS Savings](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/62597da3154e9207a983a3b6617667413d9094aff043d808ri-hour-wise-1701865783244-original.png)

Grouping by platform, we can see that the left-over on-demand hours are primarily Windows machines. We have a small fraction of compute as Windows machines, and the previously bought Savings Plans are being applied to the Linux machines, which get higher savings (rightfully so). Windows machines get only about a 25% average discount, compared to the 66% we assumed earlier, so that partially explains the lower savings.
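Before revising the number, here is a tiny back-of-the-envelope model of the "bucket" behaviour described above. The hourly figures and the flat 25% discount are simplified placeholders (not real bill data), and actual savings-plan accounting is more granular, so treat this strictly as a sketch of the reasoning, not a billing tool.

```python
def hourly_outcome(on_demand_cost, commitment, discount):
    """Model one billing hour: how much a savings-plan commitment saves or wastes.

    A commitment of $C/hr at savings-plan rates can absorb up to C / (1 - discount)
    worth of on-demand usage (the "bucket" the usage gets squeezed into).
    """
    capacity = commitment / (1.0 - discount)          # on-demand $ the bucket can hold
    covered = min(on_demand_cost, capacity)           # usage actually squeezed in
    savings = covered * discount                      # discount applies only to covered usage
    wasted = max(0.0, commitment - covered * (1.0 - discount))  # committed $ left unused
    return savings, wasted

if __name__ == "__main__":
    # Placeholder hourly residual on-demand spend; the real minimum above is ~$4.39/hr.
    hours = [4.39, 5.10, 6.80, 9.20, 4.60]
    discount = 0.25                                   # ~ the Windows-heavy residual above
    min_hour_commit = round(min(hours) * (1 - discount), 2)   # size the bucket to the quietest hour
    for commit in (min_hour_commit, 5.894):           # vs. the AWS-recommended rate
        totals = [hourly_outcome(h, commit, discount) for h in hours]
        saved = sum(s for s, _ in totals)
        wasted = sum(w for _, w in totals)
        print(f"commit ${commit}/hr -> saved ${saved:.2f}, wasted commitment ${wasted:.2f}")
```

Sizing the commitment to the quietest hour keeps the bucket full every hour; sizing it to the full recommendation leaves part of it unused in the quiet hours, which is exactly the trade-off worked out next.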
However, the bucket should still be smaller if everything is squeezed: the recommendation would be lower if it took the $4.39/hr baseline into account. Let's do that. Our revised commitment would be ($4.39/hr)\*(100%-25%) = $3.3/hr. This should result in a saving of $4.39/hr - $3.3/hr = $1.1/hr.

With this revised commitment, we have accounted for $1.1/hr of the AWS-recommended saving of $1.71/hr, using $3.3/hr of savings plan commitment out of the AWS-recommended $5.894/hr. The remaining $2.6/hr ($5.894/hr - $3.3/hr) of commitment in the AWS recommendation will lead to a saving of $0.6/hr ($1.71/hr - $1.1/hr).

Let's define a new metric here: Return on Commitment (RoC), i.e. how much $/hr we save per $1/hr of commitment. For the first $3.3/hr, the RoC is $1.1/$3.3, i.e. 33%. For the second $2.6/hr of commitment, the RoC is $0.6/$2.6, i.e. 23%. So the Return on Commitment drops from 33% for the first part to 23% for the second part. Now, that's something to think about!

So even if AWS is leaving out some part of the variable spend, it is probably leaving out the bits where the incremental saving per additional $ of commitment is minimal. However, it also means that part of the recommended savings plan would sit unused for some hours of the day, which negates the benefit earned in the hours it is used. Our revised recommendation would be to commit to a $3.3/hr savings plan rate out of the AWS-recommended $5.894/hr, which should save $1.1/hr out of $1.71/hr. The revised table would now look like this.

![AWS Recommendation](https://uploads-ssl.webflow.com/62566ffa5e87f6550e8578bc/62949aa5aefd5d83aa10f227_revised-recommendation(3).png)

The leftover Windows machines aren't great for savings plans anyway, so we may not do anything about them. Now we have a way to dig deeper and work out the trade-offs between how much we commit for 3 years and what return we get.

Happy Savings!

--- This blog is powered by Superblog. Visit https://superblog.ai to know more. ---

## Facets Cloud Runtime (FCR): The Best Way to Provision Cloud Infrastructure

Author: Anshul Sao Published: 2022-04-15 Category: Tech Articles Tags: infrastructure consistency, drift-free infrastructure URL: https://blog.facets.cloud/facets-cloud-runtime-the-best-way-to-provision-cloud-infrastructure

The general trend in today's world is to create the tools and pipelines of DevOps in-house to ship stable and agile tech products. But there is a huge cost to this. Organizations spend a lot to acquire talent, who then design the full DevOps implementation from scratch by combining multiple point solutions. We have seen that there is a lot of repetitive work in this activity: the same discoveries and tuning of tools and pipelines are repeated across organizations. Elsewhere in software engineering, we adopt frameworks to accelerate development and avoid repetitive grunt work. **Why not have a framework for implementing your cloud modernization and DevOps?** Facets Cloud Runtime addresses this and enables you to focus all your energy on solving your business problems.

Most enterprises and startups today rely on one or more cloud providers like AWS, Azure, or GCP to host their infrastructure.
Running the infrastructure on these cloud providers is a full-fledged operational task, as all of these clouds have complex and very different services and setup processes. Routine tasks like setting up environments, logging, tooling, and metrics, and making sure all environments stay in sync all the time, take a lot of manpower. Frequent releases and babysitting every change overshadow the desire to put all of the energy into building and shipping new products. Facets aims to provide best-in-class DevOps to every team so that they can focus solely on their product.

What is Facets Cloud Runtime (FCR)?
-----------------------------------

FCR is an adapter that translates the Facets [Software Product Catalog](https://www.facets.cloud/blog/software-product-catalog) into cloud implementations. The same catalog can be implemented on any of the clouds by choosing a Facets Cloud Runtime for that cloud. Facets provides modules to fulfill the intents expressed in the Facets Software Product Catalog. Organizations can build on top of these provided modules to implement new intents, or new implementations of existing intents, using the FCR Plugin system. FCR will build a marketplace of these plugins so that organizations can share their knowledge of fulfilling these intents.

Apart from fulfilling intents and keeping them synchronized across all environments, FCR takes care of creating the basic networking infrastructure, security, and permissions. It also provisions, maintains, and manages a Kubernetes cluster. In this Kubernetes cluster, it takes care of continuous delivery pipelines, deployment mechanisms (canary, rolling, etc.), spot node management, and more. FCR also provisions a default observability stack (Prometheus + Alertmanager + Grafana) and logging stack (NFS-based, with S3 for archiving) for you, with options to integrate with others.

![Facets Cloud Runtime](https://uploads-ssl.webflow.com/62566ffa5e87f6550e8578bc/626f79c611d8216340770e0c_Facets%20Cloud%20Runtime%20(1).png)

Facets Cloud Runtime (FCR) Layout

How does FCR function?
-------------------------

Facets Cloud Runtime provisions your infrastructure with best practices around [cost, security, observability, and resiliency](https://blog.facets.cloud/standardization-in-security-cost-compliance-and-observability/) on your selected cloud provider. FCR provisions network isolation, security groups, the Kubernetes cluster, and the other components defined in the Facets Software Product Catalog. Along with this, FCR installs cloud agents to maintain and run your product. These cloud agents enable capabilities like autoscaling node groups, spot management, alerting, dashboards, and much more.

FCR supports manifesting an intent in multiple "**flavors**". The choice of flavor may depend on the cloud provider or the stack definition itself. For example, a MySQL intent may be fulfilled by the Aurora/RDS flavor on AWS. Stacks can further customize this behavior by overriding the default flavor in the intent definition with another supported Facets flavor, or with a custom flavor written as a Plugin.

Why FCR?
--------

The aim of FCR is to provide robust implementations with all aspects considered. The Facets engineering team works hard to maintain and publish various flavors of implementations so that organizations can use them as a stepping stone and worry only about their business-specific use cases. FCR is like having a very specialized Ops team at your disposal at all times.
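To make the intent-and-flavor idea above a little more tangible, here is a purely hypothetical sketch. The field names and the `resolve_flavor` helper are invented for illustration and are not Facets' actual catalog schema or API; the point is only that the product team states the intent and its requirements, while the flavor (the cloud-specific implementation) is resolved separately.

```python
# Purely hypothetical catalog entry -- invented field names, NOT Facets' real schema.
mysql_intent = {
    "intent": "mysql",                 # what the product team asks for
    "name": "orders-db",
    "requirements": {                  # requirements, not implementation details
        "version": "8.0",
        "storage_gb": 100,
        "highly_available": True,
    },
    "flavor_overrides": {              # how the intent is fulfilled, per cloud/stack
        "aws": "aurora",               # e.g. the Aurora/RDS flavor on AWS
        "gcp": "cloudsql",
    },
}

def resolve_flavor(entry, cloud, default="self-hosted"):
    """Pick the implementation flavor for a given cloud, falling back to a default."""
    return entry.get("flavor_overrides", {}).get(cloud, default)

print(resolve_flavor(mysql_intent, "aws"))        # -> aurora
print(resolve_flavor(mysql_intent, "baremetal"))  # -> self-hosted
```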
Another aim is to democratize, codify and package the Ops knowledge to make it distributable within and across organizations. We envision communities to maintain and create many implementations for organizations to choose from. ‍ --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## What is Concurrent DevOps: The next maturity model of DevOps Author: Pravanjan Choudhury Published: 2022-04-15 Category: Blogs Tags: devops, devops maturity URL: https://blog.facets.cloud/concurrent-devops-the-next-maturity-model-of-devops DevOps aims to streamline the continuous cycle of software delivery, from code to production. Managing and using all the tooling required in the different stages of the CI/CD pipeline is a monumental task. You need many tools that take care of all the aspects of DevOps- infrastructure automation, building/shipping code, providing observability, security, governance, cost awareness, etc. Companies end up making internal platforms that stitch these tools together to create ad hoc end-to-end solutions. However, building these kinds of platforms is not trivial! Added to this, multiple personas - dev teams, test teams, and ops teams- work on the same application, data, and resources at each stage. Research by Atlassian concluded that collaboration is a key factor in DevOps success. The usual way of collaboration is through tickets, documents, playbooks, or informal communication between various stakeholders. This sequential handover of data, artifacts and other resources introduces handover inefficiencies and gaps in integration, ultimately slowing down the whole process. It creates multiple views of data. This is also known as [DevOps tax](https://about.gitlab.com/topics/devops/use-devops-platform-to-avoid-devops-tax/). All this undercuts the advantages of DevOps itself! What is Concurrent DevOps ? --------------------------- Sid Sijbrandij, CEO of [GitLab](https://write.superblog.ai/sites/supername/facetscloud/posts/clptqybjg188231vpjvbq8psrp/gitlab.com) coined the term “Concurrent DevOps” which he describes as a ‘benefit statement’. This philosophy of concurrency has been an elegant trend in the last couple of years. To explain the concept, Gitlab cites the example of Google Docs. Previously, collaborating on a manuscript involved a word doc having several drafts being handed over to collaborators, going back and forth through email exchange. Then came Google Docs which cut the time spent passing documentation around allowing collaborators to work on a controlled version at the same time. The idea behind Concurrent DevOps is to solve the same problem for the DevOps processes. Most collaborations involve sequential handovers between teams and tools, by making this collaboration _concurrent_ instead of sequential, we can have increased visibility to all aspects of DevOps and significantly increase the efficiency and speed of innovation. Let’s remember all other aspects of software engineering have already evolved to being concurrent! Our Views. ---------- In our view, Concurrent DevOps is the next maturity model for agile software teams. We’ve spoken to many tech leaders in the industry, to understand the issues they currently face. Adopting concurrent DevOps will solve the following main issues: 1. The combinatorial growth from the increase in the number of services x number of deployments x environments can be explosive and reduce visibility, predictability, and efficiency.  
You’d think the solution would be to add more people, but that doesn’t work! 2. **Dependency on a few:** Specialized teams have specialized knowledge but they work in silos. Ad-hoc scripts and code generated to solve immediate problems can potentially impact multiple teams but they are understood only by a few. 3. **Poor visibility:** Multiple views of data and artifacts creates fragmented visibility of the overall status. Dependency and workflow across teams are not well defined. There is no well-defined way for one team to know when they are going to be unblocked and how their work will impact other teams. As we imagined a place where Development, Quality, Security, and SRE teams can effectively collaborate for complex set-ups, we realized that the following three requirements emerged to support Concurrent DevOps.‍ 1. **Decentralisation:** Decentralizing management of resources like applications, databases, caches, etc. is necessary for agility. In a decentralized setup, anyone from any team should have the ability and means to create and modify resources rather than reinventing the entire process every time they need it. This should be available to all personas, developers, QA, etc. Similar decentralization is required for quality suites, security policies, alerts, dashboards, and cost awareness to effectively ship software. Simple and familiar constructs and tooling are a must to achieve decentralization because it’s unrealistic to expect everyone to understand everything. 2. **Collaboration:** It is important to have a single source of truth that everyone can rely upon to see the health of the state of all environments in their entirety. This single source of truth also empowers everyone to effectively collaborate and have complete visibility over all aspects. It counters the tribal knowledge problem and ad-hoc changes by design. 3. **Guardrails:** Counterintuitively, central guardrails are essential for decentralization. For concurrent DevOps, specialized teams should set up centralized guardrails. They should be empowered to set controls around cost, security, compliance, sizing, release process/pipeline, etc. which apply to every decentralized change being applied to environments. How can you adopt Concurrent DevOps? ------------------------------------ We are often asked by customers, can we do this on our own? Of course! The principles and culture of DevOps have to be adopted first before focusing on tools. In theory, an organization can develop in-house tools to adopt concurrent DevOps principles but it is not trivial and requires specialized teams and the know-how of more than 30+ tools! We at [Facets.cloud](https://Facets.cloud) feel the Concurrent DevOps platform can be simplified for organizations that wish to adopt it but don’t want to spend excessively on effort and expertise. Contact us today to find out how we can help you. ‍ ‍ --- This blog is powered by Superblog. Visit https://superblog.ai to know more. --- ## Embracing Both Cloud-Agnostic and Cloud-Native: Can These Approaches Co-exist? Author: Pravanjan Choudhury Published: 2022-04-15 Category: Blogs Tags: devops, platform engineering, cloud agnostic URL: https://blog.facets.cloud/cloud-agnostic-and-cloud-native-at-the-same-time Most technology companies start with a single cloud provider. With time, they start to adopt the cloud-native functionalities of that cloud. This is expected and completely makes sense. Moving towards cloud-native architectures brings convenience and possibly, cost-efficiency.   
How can you adopt Concurrent DevOps?
------------------------------------

Customers often ask us: can we do this on our own? Of course! The principles and culture of DevOps have to be adopted before focusing on tools. In theory, an organization can build in-house tooling to adopt Concurrent DevOps principles, but it is not trivial: it requires specialized teams and the know-how of 30+ tools! We at [Facets.cloud](https://Facets.cloud) believe a Concurrent DevOps platform can be simplified for organizations that want to adopt it without spending excessively on effort and expertise. Contact us today to find out how we can help you.

---

This blog is powered by Superblog. Visit https://superblog.ai to know more.

---

## Embracing Both Cloud-Agnostic and Cloud-Native: Can These Approaches Co-exist?

Author: Pravanjan Choudhury
Published: 2022-04-15
Category: Blogs
Tags: devops, platform engineering, cloud agnostic
URL: https://blog.facets.cloud/cloud-agnostic-and-cloud-native-at-the-same-time

Most technology companies start with a single cloud provider. Over time, they adopt the cloud-native functionalities of that cloud. This is expected and makes complete sense: moving towards cloud-native architectures brings convenience and, possibly, cost efficiency. But if you are not careful, you may get locked into that cloud provider.

Many teams would prefer to keep the option to switch cloud providers or, more commonly, to run part of their product or some of their environments on another cloud. The reasons vary: a customer may have a preferred-cloud hosting clause; you might want to expand into a region dominated by another provider; or the driver could be data residency laws, pricing, or a particular service on another cloud that makes things significantly easier for you.

Some technology teams value this optionality so much that they forgo cloud-native functionality and operate the cloud just like a traditional data center. This significantly increases their tooling needs and makes the infrastructure expensive. After all, the cloud was never meant to be run like a traditional data center!

![cloud native vs cloud agnostic](https://superblog.supercdn.cloud/site_cuid_clpqurmtk030523mkes6qgtbl/images/626f7b50f31a953fb5c90680cloud-nativepercent20percent26percent20cloud-agnostic-1701865789656-original.png)

Having the best of both worlds isn't very hard. You need to follow some design guidelines on both the Dev and Ops sides. These are best practices even if you choose to stay on a particular cloud forever.

Dev Guidelines and Best Practices
---------------------------------

It all boils down to a few simple decision models. We like to think of them as "blue cloud" and "grey cloud".

1. **Protocol-compliant cloud-native resources:** For example, [AWS Aurora](https://aws.amazon.com/rds/aurora/) is a cloud-native, MySQL-compatible database from AWS, and it is absolutely fine to use. Applications don't need to know whether they are talking to an Aurora instance or a vanilla MySQL hosted on a bare EC2 machine. This is "blue cloud", but it shifts to the grey zone once you start designing around Aurora-specific behaviour. For instance, assuming that Aurora read replicas have lower replication lag than standard master-slave replication, and baking that latency assumption into your application, is a grey-zone choice: other clouds and open-source MySQL may not offer an equivalent. Similarly, modifying your applications to use the Aurora Data APIs moves you towards the grey zone.
2. **Reliable and popular cloud components:** Some provider services are unique and widely popular, like S3. Most cloud providers offer a look-alike, such as [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs) or [Google Cloud Storage](https://cloud.google.com/storage/); they provide similar features behind different APIs. These services are absolutely worth using, but the dependency needs to be managed. The fix is fairly simple: avoid deep tie-ins and instead build utility layers around the functionality you need (see the sketch after this list). For widely popular services, cloud-agnostic wrappers and SDKs such as MinIO are also common. Either approach shifts the service back into the "blue zone".
3. **Niche cloud features:** Finally, each cloud has unique capabilities that can cut your development time significantly. For example, S3 Select can add new capabilities over objects you already store in S3. You can use such features by wrapping them in microservices or functional abstractions, so that you can at least write another cloud-specific implementation if it ever comes to that. This localizes any future change instead of forcing you to touch the whole codebase.
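As a concrete illustration of the utility-layer idea in point 2, here is a minimal sketch of an object-storage interface with an S3-backed implementation. The class and method names are hypothetical; an Azure Blob or Google Cloud Storage implementation would simply mirror the same interface.

```python
# Hypothetical utility layer: application code depends on this small interface,
# not on any one provider's SDK, so the storage backend can be swapped later.
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

class S3ObjectStore(ObjectStore):
    """AWS-backed implementation; Azure/GCS implementations would mirror it."""

    def __init__(self, bucket: str):
        import boto3  # imported lazily so other backends need no AWS dependency
        self._s3 = boto3.client("s3")
        self._bucket = bucket

    def put(self, key: str, data: bytes) -> None:
        self._s3.put_object(Bucket=self._bucket, Key=key, Body=data)

    def get(self, key: str) -> bytes:
        return self._s3.get_object(Bucket=self._bucket, Key=key)["Body"].read()

# Application code stays cloud-agnostic: it only sees the ObjectStore interface.
def save_invoice(store: ObjectStore, invoice_id: str, pdf_bytes: bytes) -> None:
    store.put(f"invoices/{invoice_id}.pdf", pdf_bytes)
```

Because call sites depend only on `ObjectStore`, switching providers (or pointing at a MinIO-compatible endpoint) means adding one new implementation rather than rewriting application code.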
Ops Guidelines and Best Practices
---------------------------------

The points above take care of the Dev side. What about Ops? The Ops setup generally includes backup, recovery, code delivery, observability, security, and HA. Instead of building all of it in a cloud-agnostic way, it is prudent to use some of the cloud-native capabilities of each cloud; this reduces the burden of building everything from scratch and getting it right. A few tips while you build your Ops toolchain:

1. Keep a central repository of policies that is agnostic of the implementation. The implementation can then choose the most reliable method on each cloud. For example, a disaster recovery policy should specify which backups to keep and at what frequency, regardless of whether the implementation is Aurora MySQL or a self-hosted MySQL on a Linux server.
2. Provide a uniform developer experience even if your environments contain a mix of self-hosted and cloud-native components. For example, metrics should be pulled from everywhere and collated into a single source of truth, so that dashboards and alerts are created in a uniform way.
3. Kubernetes is the first step towards being cloud-native and cloud-agnostic at the same time. However, code-delivery workflows should not change even if the underlying Kubernetes clusters are cloud-provider managed on each cloud.

Conclusion
----------

Being cloud-agnostic doesn't mean multi-cloud, and it doesn't mean migrating to another cloud at will. It simply means that you could host a particular environment on another cloud within a reasonable time: weeks, not years. This gives you optionality for the future without investing in tooling for other clouds. You don't need to sacrifice cloud-native functionality either; you just need to manage the abstractions well.

We at [Facets.cloud](https://Facets.cloud) are building on these principles to provide the tooling you need to get the best of both worlds. Do write to us to learn more or to collaborate!

---

This blog is powered by Superblog. Visit https://superblog.ai to know more.

---