Based on concrete feedback from the system and from customers, changes of various kinds are introduced to the system. Any change, though intended to improve services, can also lead to failures. However, things have to change for the system to evolve and to integrate best practices and innovations. The platform should provide capabilities to experiment with new innovations and improvements in a controlled way and to accelerate the process of change.
The figure below captures the various elements involved in implementing this practice.
It brings out five key areas of focus, which are explained in the following sections:
To understand our customers at Jio, we must gain greater insight into how they behave in real time so that we can drive faster business decisions. Real-time data from the system is also needed to predict faults. The platform should seamlessly expose this information to developers so that they can act on it quickly.
Basic questions we can cover:
On average, people spend about five hours a day on social media, which makes it one of the best places to reach our customers. We can gather information from social media and run a compelling, authentic campaign that involves real people plus a “social impact” element.
As shown in the diagram below, data can come from multiple sources: apps, systems, databases, sensors, and so on. The data is gathered and analyzed to find patterns, and these patterns drive decision making for various stakeholders, improving systems and adding new features, and hence delivering value to the customer. Transparency and availability of these systems are key to responding faster to customer needs.
Refer to the Self-Service Data section for more details on this practice.
InfluxData provides a Modern Time Series Platform, designed from the ground up to handle metrics and events.
Telegraf :- A plugin-driven server agent for collecting and reporting metrics. Telegraf has plugins and integrations to source a variety of metrics, pull metrics from third-party APIs, or even listen for metrics via StatsD and Kafka consumer services. It also has output plugins to send metrics to a variety of datastores, services, and message queues, including InfluxDB, Graphite, OpenTSDB, Datadog, Librato, Kafka, MQTT, NSQ, and many others.
Kafka :- Used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.
InfluxDB :- A data store for any use case involving large amounts of timestamped data, including DevOps monitoring, application metrics, IoT sensor data, and real-time analytics.
Chronograf :- An open-source web application written in Go and React.js that provides the tools to visualize your monitoring data and easily create alerting and automation rules.
Kapacitor :- An open-source framework for processing, monitoring, and alerting on time series data.
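To make the pipeline concrete, here is a minimal sketch of the write path, assuming a local InfluxDB 1.x instance and a database named “metrics” (both hypothetical). It posts one point in line protocol to the HTTP /write endpoint; in a real deployment Telegraf would collect and ship such points for us.

```python
import time
import urllib.request

def write_point(measurement, tags, fields):
    """Post a single point to InfluxDB 1.x using its line protocol."""
    tag_str = ",".join(f"{k}={v}" for k, v in tags.items())
    field_str = ",".join(f"{k}={v}" for k, v in fields.items())
    timestamp_ns = int(time.time() * 1e9)  # line protocol uses nanoseconds
    line = f"{measurement},{tag_str} {field_str} {timestamp_ns}"
    req = urllib.request.Request(
        "http://localhost:8086/write?db=metrics",  # assumed local instance
        data=line.encode("utf-8"),
        method="POST",
    )
    urllib.request.urlopen(req)

# e.g. record CPU usage for one host
write_point("cpu", {"host": "server01"}, {"usage": 0.64})
```

From here Chronograf can chart the series and Kapacitor can alert on it.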
Implementing canary releases means adjusting our deployment strategy. Our new process might look like this:
Development: We work on a new feature using a local copy of the app.
Staging: At this point we’re happy with the state of our app. The feature is working as expected and any bugs we found are fixed. The latest version of the website is pushed to a private website where customers can review it.
Following a trial, we have two options.
Canary release is a technique that helps reduce the impact of negative changes by gradually rolling out the changes. If a problem with the new release is detected during the rollout then it can be rolled back, and only a subset of the traffic will have been impacted.
Let us now describe an example traffic-control configuration that we want to achieve (as shown in the figure below):
We want to deploy three separate instances of the application. We call these versions “alpha” (the early-adopter, least-tested version), “beta” (believed to be ready for general availability), and “ga” (the hardened, generally available version).
We identify clients coming from the “employees” and “tester” groups based on their public IP addresses. We want to send 100% of the “employees” traffic and 30% of the “tester” traffic to the “alpha” instance. The “beta” instance will get the rest of the “tester” traffic plus 1% of the public traffic. Finally, the remaining 99% of the public traffic should go to the “ga” instance.
Using a high-performance reverse-proxy server such as NGINX, we can split the incoming requests along exactly these lines, as sketched below.
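NGINX can express this split natively (for example with its geo and split_clients modules); the sketch below captures the same routing decision in Python so the percentages are easy to follow. The IP prefixes used to identify the “employees” and “tester” groups are hypothetical.

```python
import hashlib

EMPLOYEE_PREFIX = "10.0."      # assumed employee network
TESTER_PREFIX = "192.168.1."   # assumed tester network

def bucket(client_ip):
    """Map a client IP to a stable number in [0, 100) so that each
    client always lands on the same side of a percentage split."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return int(digest[:8], 16) % 10000 / 100.0

def choose_instance(client_ip):
    b = bucket(client_ip)
    if client_ip.startswith(EMPLOYEE_PREFIX):
        return "alpha"                        # 100% of employees
    if client_ip.startswith(TESTER_PREFIX):
        return "alpha" if b < 30 else "beta"  # 30% / 70% of testers
    return "beta" if b < 1 else "ga"          # 1% / 99% of the public

print(choose_instance("203.0.113.7"))
```

Because the bucket is derived from a hash of the client IP, a given client sticks to the same instance across requests, which keeps the canary experience consistent.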
How do Facebook and Google manage software releases without causing major problems?
Guru: Martin Fowler
Blogs:
Feature toggles are a powerful technique, allowing teams to modify system behavior without changing code. They fall into various usage categories, and it’s important to take that categorization into account when implementing and managing toggles. Toggles introduce complexity. We can keep that complexity in check by using smart toggle implementation practices and appropriate tools to manage our toggle configuration, but we should also aim to constrain the number of toggles in our system.
“Feature Toggling” is a set of patterns which can help a team to deliver new functionality to users rapidly but safely. In this article on Feature Toggling we’ll start off with a short story showing some typical scenarios where Feature Toggles are helpful. Then we’ll dig into the details, covering specific patterns and practices which will help a team succeed with Feature Toggles.
A feature-toggle architecture is used for releasing software updates (new features) that can be rolled back when something goes wrong with the new release. New features are introduced to users and their behaviour is observed.
A feature flag changes the runtime behavior of your application depending on a configuration. The configuration can be:
or a mix of the above. This is really powerful, because you can develop long-term features inside the master branch and release them when you are ready. But it is also dangerous, because you have to maintain compatibility between your features on all levels (persistence, UI, etc.), and the complexity of testing all runtime alternatives may increase dramatically.
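Here is a minimal sketch of a configuration-driven flag, assuming a hypothetical flags.json file of the form {"new-checkout": true}. Real frameworks add per-user targeting, admin UIs, and hot reloading on top of this basic idea.

```python
import json

def load_flags(path="flags.json"):
    """Load flag states from a config file (hypothetical format)."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}  # no config: every flag defaults to off

FLAGS = load_flags()

def is_enabled(flag):
    return bool(FLAGS.get(flag, False))

def checkout(cart):
    # Both code paths ship in the same deployable unit; the flag
    # decides at runtime which one actually runs.
    if is_enabled("new-checkout"):
        return f"new checkout flow for {len(cart)} items"
    return f"legacy checkout flow for {len(cart)} items"

print(checkout(["book", "pen"]))
```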
The first instinct is to just use the configuration system of your application and write your own framework for feature flags. But when you think more about it, this has some disadvantages. A framework for feature flags should:
Implementing a framework that meets these requirements is pretty complex.
There are a lot of open-source frameworks for different languages. For Java there are Togglz, FF4J, Fitchy, and Flip. For .NET there are FeatureSwitcher, NFeature, FlipIt, FeatureToggle, and FeatureBee. Some use strings, some enums, and some classes, but none has a highly scalable backend and a portal to manage your flags (at least none that I know of).
If you start with feature flags, the chances are high that things get really complex after some time. So when Jim Bird writes that Feature Toggles are one of the Worst Kinds of Technical Debt, it is for a reason. So how do you use feature flags “the right way”?
The first thing to understand is that not all feature flags are the same, and you should not treat them as if they were. There are short-lived feature flags that are used to roll out new features or run experiments; they live for some time and then go away. There are also feature flags that are intended to stay, such as flags that handle licensing (advanced features and so on). And there are mid-term flags for major features that take a long time to develop. So the first step is to create a naming convention for the flags. You might prefix your flag names with short-, temp-, mid-, or something similar, so that everyone knows how each flag is intended to be used. Make sure to use meaningful names, especially for the long-lived flags, and manage them together with a long description in a central place.
Mid- and long-term flags should be applied at a fairly high level, such as when bootstrapping your application or switching between microservices. If you find a mid- or long-term flag in a low-level component, you can bet it is technical debt.
Short-term flags are different: they may need to reside at different levels and are therefore more complex to handle. A good idea is to use dedicated branches to manage the cleanup of flags, so right when you introduce a new feature flag, you create a cleanup branch that removes the flag again and submit a pull request for it. A sketch of the naming convention and central registry follows.
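As a minimal sketch of the convention described above (the prefixes and flag names are hypothetical), a central registry can enforce that every flag declares its intended lifetime and carries a description:

```python
import re

# Every flag name must start with short-, mid- or long-, so its intended
# lifetime is visible wherever it is used.
FLAG_NAME = re.compile(r"^(short|mid|long)-[a-z0-9-]+$")

REGISTRY = {
    # name -> long description, kept in one central place
    "short-new-checkout": "Roll out the redesigned checkout flow",
    "mid-payments-service": "Switch order flow to the new payments microservice",
    "long-advanced-reporting": "Licensed add-on: advanced reporting features",
}

for name in REGISTRY:
    assert FLAG_NAME.match(name), f"badly named flag: {name}"
```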
A/B testing (also known as split testing or bucket testing) is a method of comparing two versions of a webpage or app against each other to determine which one performs better. A/B testing is essentially an experiment where two or more variants of a page are shown to users at random, and statistical analysis is used to determine which variation performs better for a given conversion goal.
Running an A/B test that directly compares a variation against the current experience lets you ask focused questions about changes to your website or app, and then collect data about the impact of those changes.
Testing takes the guesswork out of website optimization and enables data-informed decisions that shift business conversations from “we think” to “we know.” By measuring the impact that changes have on your metrics, you can ensure that every change produces positive results.
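As a minimal sketch of that statistical analysis, the two-proportion z-test below compares conversion rates between the control (A) and the variation (B); the counts are made up for illustration:

```python
import math

def ab_test_z(conv_a, n_a, conv_b, n_b):
    """z-score for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

z = ab_test_z(conv_a=200, n_a=5000, conv_b=248, n_b=5000)
# |z| > 1.96 corresponds to significance at the 5% level (two-sided)
print(f"z = {z:.2f}, significant = {abs(z) > 1.96}")
```

Only when the difference is statistically significant should the variation be declared the winner and rolled out to everyone.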
(1) Deploy latent code
(2) Develop new / impactful features on the same trunk/master without affecting release-priority features
(3) Ship alternate code paths within one deployable unit and choose between them at runtime
THERE IS NO DEVOPS WITHOUT FEATURE FLAGS!
Feature toggle benefits and risks
This provides more insight into feature toggles; the basic concepts are explained.
This video explains feature toggles in more detail.
Guru: Martin Fowler
Blog: Feature Toggles
Advanced deployment strategies such as blue-green deployments and rolling deployments are critical for managing multinode installations of applications that must be updated without interruption. Blue-green deployments fit these requirements because they provide smooth transitions between versions, zero-downtime deployments, and quick rollback to a known working version.
Blue-green deployment is a release technique that reduces downtime and risk by running two identical production environments called Blue and Green. At any time, only one of the environments is live, with the live environment serving all production traffic. For this example, Blue is currently live and Green is idle. As you prepare a new release of your software, deployment and the final stage of testing takes place in the environment that is not live: in this example, Green. Once you have deployed and fully tested the software in Green, you switch the router so all incoming requests now go to Green instead of Blue. Green is now live, and Blue is idle.
This technique can eliminate downtime due to application deployment. In addition, blue-green deployment reduces risk: if something unexpected happens with your new release on Green, you can immediately roll back to the last version by switching back to Blue.
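The heart of the technique is a single router switch in front of the two environments. Here is a minimal sketch (the environment URLs are hypothetical; in practice the “pointer” is a load-balancer target, a DNS record, or a route mapping such as Cloud Foundry’s cf map-route):

```python
ENVIRONMENTS = {
    "blue": "http://blue.internal:8080",   # assumed backend addresses
    "green": "http://green.internal:8080",
}

live = "blue"  # currently serving all production traffic

def route(request_path):
    """Forward every request to whichever environment is live."""
    return ENVIRONMENTS[live] + request_path

def cut_over():
    """Switch all traffic to the idle environment. Rollback is the
    same operation run again."""
    global live
    live = "green" if live == "blue" else "blue"

print(route("/orders"))  # served by blue
cut_over()               # deploy and fully test on green first, then switch
print(route("/orders"))  # now served by green
```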
Blue-Green Deployment with Pivotal Cloud Foundry
How does blue-green deployment work with AWS?
AWS services used in blue/green deployments
Guru: Martin Fowler
Blog: Blue-Green Deployment
Data visualization is the presentation of data in a pictorial or graphical format. It enables decision makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns. With interactive visualization, you can take the concept a step further by using technology to drill down into charts and graphs for more detail, interactively changing what data you see and how it’s processed.
Industry-renowned data visualization expert Edward Tufte once said: “The world is complex, dynamic, multidimensional; the paper is static, flat. How are we to represent the rich visual world of experience and measurement on mere flatland?” He’s right: There’s too much information out there for knowledge workers to effectively analyze — be they hands-on analysts, data scientists, or senior execs. More often than not, traditional tabular reports fail to paint the whole picture or, even worse, lead you to the wrong conclusion. AD&D pros should be aware that data visualization can help for a variety of reasons:
A few of the open-source tools available for data visualization are Metabase, Dashing, and Grafana.
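As a small illustration of the idea (the data is made up, and the sketch assumes matplotlib is installed), even a basic chart reveals a daily traffic pattern that would be hard to spot in a table of numbers; tools like Grafana or Metabase do the same interactively against live datastores:

```python
import matplotlib.pyplot as plt

hours = list(range(24))
requests_per_hour = [120, 90, 80, 70, 75, 95, 140, 260, 380, 420, 450, 470,
                     490, 480, 460, 440, 430, 470, 520, 500, 420, 300, 210, 150]

plt.plot(hours, requests_per_hour, marker="o")
plt.xlabel("Hour of day")
plt.ylabel("Requests per hour")
plt.title("Traffic by hour (sample data)")
plt.show()
```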
Data Visualization: Build More Effective Data Visualizations