Feature Flags in Production: LaunchDarkly, Unleash, and DIY
I shipped a catastrophic bug to 100% of our users at 4 PM on a Friday. A simple UI change that broke the checkout flow. It took 45 minutes to push a fix through our CI/CD pipeline. During those 45 minutes, we lost roughly $12,000 in revenue.
The next Monday, I implemented feature flags. That same change, behind a flag, could have been turned off in 30 seconds. No deployment. No CI pipeline. Just a toggle in a dashboard. Feature flags didn't just change how we deployed code; they changed how we thought about risk.
Feature flags (also called feature toggles, feature switches, or feature gates) are one of the most impactful engineering practices you can adopt. They let you separate code deployment from feature release, enabling practices like canary rollouts, A/B testing, and instant rollbacks. According to a 2025 Split.io survey, teams using feature flags deploy 3.5x more frequently and have 70% shorter mean time to recovery (MTTR).
This guide covers the full spectrum: when to use flags, how to implement them, which tools to use, and how to avoid the tech debt that poorly managed flags create.
Types of Feature Flags
Not all feature flags are created equal. The taxonomy in Pete Hodgson's "Feature Toggles" article on martinfowler.com identifies four distinct types, each with different lifecycles and management needs:
| Type | Purpose | Lifespan | Who Controls It | Example |
|---|---|---|---|---|
| Release Flag | Decouple deployment from release | Days to weeks | Engineering | New checkout flow |
| Experiment Flag | A/B testing, multivariate testing | Weeks to months | Product/Data | Pricing page variant |
| Ops Flag | Operational control (kill switches) | Permanent | Engineering/SRE | Disable recommendations engine |
| Permission Flag | User-level feature access | Permanent | Product/Sales | Premium features, beta access |
The critical difference is lifespan. Release flags should be removed within weeks of full rollout. Experiment flags live until the experiment concludes. Ops and permission flags may live forever. Treating all flags the same is how you end up with 2,000 stale flags in your codebase.
Tool Comparison: LaunchDarkly vs. Unleash vs. Flagsmith vs. DIY
LaunchDarkly
LaunchDarkly is the market leader. It's a fully managed SaaS platform with enterprise-grade features.
Strengths:
- Excellent SDK support (25+ languages)
- Real-time flag evaluation with streaming updates
- Sophisticated targeting (user attributes, segments, percentage rollouts)
- Experimentation platform built in
- Enterprise features (audit logs, SSO, approval workflows)
- Local evaluation (no network call per flag check)
Weaknesses:
- Expensive ($10-$20 per seat/month for Pro, custom pricing for Enterprise)
- Vendor lock-in (proprietary data format)
- Overkill for small teams
Unleash
Unleash is the open-source alternative. You can self-host it for free or use their managed cloud offering.
Strengths:
- Open source core (Apache 2.0 in older releases; recent versions moved to AGPLv3, so check the current license before adopting)
- Self-hosted option (you control the data)
- Good SDK support (15+ languages)
- Strategies system (gradual rollout, user IDs, IPs, custom strategies)
- Project and environment separation
Weaknesses:
- Fewer enterprise features than LaunchDarkly
- Self-hosting requires operational effort
- Experimentation features are paid-only
Flagsmith
Flagsmith is another open-source option with both self-hosted and managed offerings.
Strengths:
- Open source (BSD 3-clause)
- Remote configuration alongside feature flags
- Edge proxy for low-latency evaluation
- Good documentation and onboarding
DIY (Build Your Own)
Building your own feature flag system is viable for simple use cases. A basic implementation needs three things:
- A data store for flag states (database, Redis, or even a config file)
- An evaluation function that checks flag states at runtime
- A way to change flag states without deploying (admin UI, API, or CLI)
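Those three pieces can be sketched in a few dozen lines. This is a minimal illustration, not a production design: the class name `FileFlagStore`, the helper `set_flag`, and the JSON file layout are all assumptions made for the example.

```python
import json
import os


class FileFlagStore:
    """Data store + evaluation: reads flag states from a JSON file."""

    def __init__(self, path):
        self.path = path
        self._signature = None  # (mtime, size, inode) of the last file we loaded
        self._flags = {}

    def _reload_if_changed(self):
        st = os.stat(self.path)
        signature = (st.st_mtime_ns, st.st_size, st.st_ino)
        if signature != self._signature:
            with open(self.path, encoding="utf-8") as f:
                self._flags = json.load(f)
            self._signature = signature

    def is_enabled(self, name, default=False):
        """Evaluation function: unknown flags fall back to `default`."""
        self._reload_if_changed()
        return bool(self._flags.get(name, default))


def set_flag(path, name, enabled):
    """Change a flag without deploying: rewrite the file atomically."""
    with open(path, encoding="utf-8") as f:
        flags = json.load(f)
    flags[name] = bool(enabled)
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(flags, f, indent=2)
    os.replace(tmp_path, path)  # atomic swap, readers never see a partial file
```

A running process calls `store.is_enabled("new_checkout")` on each request, and an operator flips the flag with `set_flag(path, "new_checkout", False)`. Swapping the file for a database table or Redis key changes the storage, not the shape of the code.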
| Feature | LaunchDarkly | Unleash | Flagsmith | DIY |
|---|---|---|---|---|
| Pricing | $$$ ($10-20/seat/mo) | Free (self-hosted) / $$ (cloud) | Free (self-hosted) / $ (cloud) | Free (dev time) |
| Self-hosted option | No | Yes | Yes | Yes |
| SDK count | 25+ | 15+ | 12+ | Custom |
| Percentage rollouts | Yes | Yes | Yes | Build it |
| User targeting | Advanced | Good | Good | Build it |
| A/B testing | Built-in | Paid add-on | Basic | Build it |
| Audit log | Yes | Yes (paid) | Yes | Build it |
| Setup time | Minutes | Hours | Hours | Days-Weeks |
| Operational burden | None (SaaS) | Medium (self-hosted) | Medium (self-hosted) | High |
Implementation Patterns
Pattern 1: Boolean Flags (Simplest)
A flag is either on or off. The simplest possible implementation. Use for kill switches and basic feature gating.
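As a rough sketch, a boolean kill switch is just a guarded call with a safe fallback. The flag name `recommendations_enabled`, the in-memory `FLAGS` dict, and `fetch_recommendations` are hypothetical stand-ins for whatever your application actually uses.

```python
FLAGS = {"recommendations_enabled": True}  # hypothetical in-memory flag state


def is_enabled(name, default=False):
    """Return the flag state, or `default` if the flag is unknown."""
    return FLAGS.get(name, default)


def fetch_recommendations(user_id):
    # Stand-in for an expensive or fragile downstream call.
    return [f"item-for-{user_id}"]


def get_recommendations(user_id):
    if not is_enabled("recommendations_enabled"):
        return []  # kill switch off: degrade gracefully instead of failing
    return fetch_recommendations(user_id)
```

The important design choice is the off-path: when the switch is off, the caller gets an empty but valid result, so the rest of the page still renders.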
Pattern 2: Percentage Rollouts
Roll out a feature to X% of users. The key is making it sticky: the same user should always see the same variant. Hash the user ID and flag name to generate a consistent percentage bucket.
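The formula: bucket = hash(userId + flagName) % 100. If the bucket is less than the rollout percentage, the user gets the feature. The hash must be stable across processes and restarts (SHA-256 or MurmurHash, for example), not a per-process randomized hash, or users will flip between variants. Including the flag name in the hash input means each flag gets its own bucketing, so the same users aren't always first in line for every rollout.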
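Here is one way the bucketing could look, assuming SHA-256 as the stable hash. The function names and the `:` separator between flag name and user ID are my choices for the sketch (the separator avoids accidental collisions from plain concatenation); they are not taken from any particular SDK.

```python
import hashlib


def bucket_for(user_id: str, flag_name: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    key = f"{flag_name}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce to 0..99.
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(user_id: str, flag_name: str, percentage: int) -> bool:
    """True if the user falls inside the first `percentage` buckets."""
    return bucket_for(user_id, flag_name) < percentage
```

Because the bucket never changes for a given user and flag, raising the percentage from 10 to 50 only adds users; nobody who already had the feature loses it.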
Pattern 3: User Targeting
Enable a feature for specific users, user segments, or based on user attributes. For example: "enable for all users in Azerbaijan" or "enable for users on the Pro plan."
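A simple targeting evaluator might look like the following. The `TargetingRule` and `TargetedFlag` types and the user-as-dict representation are assumptions for illustration; managed platforms have much richer rule languages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetingRule:
    attribute: str             # e.g. "country" or "plan"
    allowed_values: frozenset  # the rule matches if the user's value is in here


@dataclass(frozen=True)
class TargetedFlag:
    name: str
    allow_user_ids: frozenset = frozenset()  # explicit per-user allow list
    rules: tuple = ()                        # any matching rule enables the flag


def is_enabled_for(flag: TargetedFlag, user: dict) -> bool:
    """Enabled if the user is allow-listed or matches at least one rule."""
    if user.get("id") in flag.allow_user_ids:
        return True
    return any(user.get(rule.attribute) in rule.allowed_values for rule in flag.rules)


# Hypothetical flag: on for one beta tester, for Azerbaijan, and for Pro users.
NEW_DASHBOARD = TargetedFlag(
    name="new_dashboard",
    allow_user_ids=frozenset({"beta-tester-1"}),
    rules=(
        TargetingRule("country", frozenset({"AZ"})),
        TargetingRule("plan", frozenset({"pro"})),
    ),
)
```

Rules here are OR-ed together. If you need "Pro users in Azerbaijan", you would extend a rule to hold several attribute conditions that must all match.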
Pattern 4: Multi-Variant Flags
Instead of boolean on/off, return one of several variants. Essential for A/B testing where you might have 3-4 different versions of a component. According to Optimizely research, multi-variant tests produce statistically significant results 40% faster than sequential A/B tests.
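Variant assignment can reuse the same sticky-hash idea as percentage rollouts. In this sketch, `assign_variant` and the `(name, weight)` tuple format are assumptions of mine; weights are relative and do not have to sum to 100.

```python
import hashlib


def assign_variant(user_id: str, flag_name: str, variants) -> str:
    """Pick a variant deterministically from a list of (name, weight) pairs."""
    total = sum(weight for _, weight in variants)
    if total <= 0:
        raise ValueError("at least one variant needs a positive weight")
    key = f"{flag_name}:{user_id}".encode("utf-8")
    point = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % total
    cumulative = 0
    for name, weight in variants:
        cumulative += weight
        if point < cumulative:  # point lands inside this variant's slice
            return name
    raise AssertionError("unreachable: point is always below the total weight")


# Hypothetical pricing-page experiment with a 50/25/25 split.
PRICING_VARIANTS = [("control", 50), ("annual_first", 25), ("three_tiers", 25)]
```

A call such as `assign_variant(user_id, "pricing_page", PRICING_VARIANTS)` returns the same variant for the same user every time, which is what keeps experiment data clean.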
Pattern 5: Server-Side vs. Client-Side Evaluation
| Aspect | Server-Side | Client-Side |
|---|---|---|
| Latency | Low (local evaluation) | Requires network call or bootstrap |
| Security | Flag rules and values stay on the server | Any flag value sent to the client is visible to the user |
| Real-time updates | Webhook/polling | Streaming/polling |
| Best for | API features, backend logic | UI features, client-side experiments |
The Technical Debt Problem
Feature flags are technical debt generators. Every flag adds a code path. Every code path adds complexity. Every piece of complexity adds bugs. Google's 2024 engineering practices report found that stale feature flags are among the top 5 sources of unnecessary complexity in large codebases.
The Flag Lifecycle
1. Create: Define the flag with a clear purpose and owner
2. Develop: Add flag checks to code
3. Test: Test both flag states (on and off)
4. Roll out: Gradually enable for users
5. Full release: Enable for 100% of users
6. Clean up: Remove the flag and the old code path
The final step, cleanup, is where teams fail. The feature works, nobody wants to touch working code, and the flag lives forever. Six months later, nobody knows what the flag does or whether it's safe to remove.
Strategies for Managing Flag Debt
- Expiration dates: Every release flag gets an expiration date at creation time. After that date, it triggers a CI warning or even fails the build.
- Flag owners: Every flag has an assigned owner who is responsible for cleanup.
- Regular audits: Monthly review of all flags. Any release flag that's been at 100% for more than 2 weeks should be removed.
- Automated detection: Lint rules that detect flags serving the same value to all users (candidates for removal).
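The expiration-date and owner ideas combine naturally into a small CI check. The registry format below (a dict with `type`, `owner`, and ISO-format `expires` fields) is a made-up convention for this sketch, not something any of the tools above prescribe.

```python
from datetime import date


def expired_release_flags(registry, today):
    """Return sorted (flag_name, owner) pairs for release flags past expiry."""
    expired = []
    for name, meta in registry.items():
        if meta["type"] != "release":
            continue  # ops and permission flags are allowed to be long-lived
        if date.fromisoformat(meta["expires"]) < today:
            expired.append((name, meta["owner"]))
    return sorted(expired)


# Hypothetical registry, e.g. loaded from a flags.json file in the repo.
REGISTRY = {
    "new_checkout": {"type": "release", "owner": "alice", "expires": "2025-03-01"},
    "pricing_v2": {"type": "release", "owner": "bob", "expires": "2025-12-01"},
    "disable_recs": {"type": "ops", "owner": "sre", "expires": "2025-01-01"},
}
```

In CI you would call `expired_release_flags(REGISTRY, date.today())`, print the offenders with their owners, and exit non-zero if you want the build to fail rather than warn.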
Testing with Feature Flags
Feature flags multiply your testing surface. If you have 10 boolean flags, you technically have 1,024 possible combinations. You can't test them all. Here's a practical approach:
- Test both states independently: For each flag, run your test suite with the flag on and with it off.
- Don't test every combination: Only test flags together when they explicitly interact. Otherwise, focus on the states you actually plan to roll out.
- Use flag overrides in tests: Your test framework should be able to force a flag to a specific state regardless of the environment configuration.
- Canary testing in production: Roll out to 1% of users and monitor error rates before going wider. This catches integration issues that unit tests miss.
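Flag overrides in tests can be as simple as a context manager on the flag client. The `FlagClient` class, the `free_shipping` flag, and `checkout_total_cents` are hypothetical names used only to show the pattern.

```python
from contextlib import contextmanager

_MISSING = object()  # sentinel: "no override was set before"


class FlagClient:
    def __init__(self, flags):
        self._flags = dict(flags)
        self._overrides = {}

    def is_enabled(self, name):
        if name in self._overrides:  # overrides win over configured state
            return self._overrides[name]
        return self._flags.get(name, False)

    @contextmanager
    def override(self, name, value):
        """Force a flag to `value` inside the with-block, then restore it."""
        previous = self._overrides.get(name, _MISSING)
        self._overrides[name] = value
        try:
            yield
        finally:
            if previous is _MISSING:
                self._overrides.pop(name, None)
            else:
                self._overrides[name] = previous


def checkout_total_cents(flags, subtotal_cents):
    shipping = 0 if flags.is_enabled("free_shipping") else 500
    return subtotal_cents + shipping


def test_checkout_in_both_flag_states():
    flags = FlagClient({"free_shipping": False})
    for state, expected in ((True, 2000), (False, 2500)):
        with flags.override("free_shipping", state):
            assert checkout_total_cents(flags, 2000) == expected
```

The loop in the test is the "test both states independently" rule in practice: one test body, run once per flag state, independent of whatever the environment has configured.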
My Opinionated Take
After using feature flags extensively for several years, here are my strong opinions:
1. Start with a simple DIY solution. If you have fewer than 20 flags, you don't need LaunchDarkly. A database table with flag names, states, and rollout percentages, plus a 50-line evaluation function, is enough. Migrate to a proper platform when you outgrow it.
2. Ops flags (kill switches) are the highest-value flags. Before you implement percentage rollouts or A/B testing, add kill switches for your most critical features: payment processing, email sending, third-party API integrations. Being able to turn these off in 30 seconds is worth more than any amount of experimentation infrastructure.
3. Flag cleanup is more important than flag creation. The value of feature flags is front-loaded (deployment flexibility, risk reduction). The cost is back-loaded (code complexity, testing burden). If you create flags but never clean them up, the costs will eventually exceed the benefits.
4. Don't flag everything. Not every change needs a flag. Bug fixes, dependency updates, refactors, documentation changes — these should be deployed normally. Reserve flags for user-facing features, risky backend changes, and operational controls.
5. Percentage rollouts should be your default for major features. Start at 1%, watch the error rates for an hour. Go to 10%, watch for another hour. Then 50%, then 100%. This pattern catches problems that testing misses, and it limits the blast radius when they occur.
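The staged rollout in point 5 can be written down as a tiny decision function. This is only a sketch: the stage list mirrors the percentages above, while the `tolerance` threshold and the idea of comparing against a baseline error rate are assumptions you would tune for your own system.

```python
STAGES = (1, 10, 50, 100)  # rollout percentages, in order


def next_percentage(current, error_rate, baseline_error_rate, tolerance=0.005):
    """Decide the next rollout percentage after a watch period.

    Returns 0 (roll back) if errors rose noticeably above the baseline,
    otherwise the next stage after `current`, capped at 100.
    """
    if error_rate > baseline_error_rate + tolerance:
        return 0  # limit the blast radius: turn the feature off
    for stage in STAGES:
        if stage > current:
            return stage
    return 100
```

For example, after an hour at 1% with a healthy error rate, `next_percentage(1, 0.010, 0.010)` moves the flag to 10%. Whether a human or an automated job calls this is up to you; I'd keep a human in the loop at first.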
Action Plan: Implementing Feature Flags
Week 1: Kill Switches
- Identify your 5 most critical features/integrations
- Add boolean kill switches for each (even a JSON config file works)
- Document how to flip each switch (who, how, when)
- Test that turning off each switch doesn't crash the application
Week 2-3: Basic Flag Infrastructure
- Decide: DIY, Unleash, Flagsmith, or LaunchDarkly (based on team size and budget)
- Implement flag evaluation in your application
- Add percentage rollout support
- Create an admin UI or CLI for managing flags
Week 4: First Flagged Release
- Pick a non-critical feature for your first flagged release
- Deploy behind a flag to 0% of users
- Roll out to 1% > 10% > 50% > 100% over several days
- After full rollout, clean up the flag (remove it from code)
Ongoing: Flag Hygiene
- Set expiration dates for all release flags
- Monthly flag audit (5 minutes per flag)
- Add flag count to your team dashboard
- Celebrate flag cleanups (seriously, incentivize it)
Key Takeaways
- Feature flags separate deployment from release, enabling canary rollouts, instant rollbacks, and A/B testing.
- There are four types of flags (release, experiment, ops, permission) with different lifecycles and management needs.
- LaunchDarkly is the best managed solution; Unleash and Flagsmith are strong open-source alternatives; DIY works for small teams.
- Kill switches for critical features are the highest-value flags you can implement.
- Flag cleanup is essential. Stale flags are worse than no flags because they add complexity without benefit.
- Default to percentage rollouts for major features: 1% > 10% > 50% > 100%.
Sources
- Martin Fowler - Feature Toggles
- LaunchDarkly
- Unleash - Open Source Feature Management
- Flagsmith - Open Source Feature Flags
- Split.io Research
- Optimizely Insights
- Google Research Publications
I'm Ismat, and I build BirJob — Azerbaijan's job aggregator scraping 80+ sources daily.
