DevOps and The Principle of Flow


In the technology value stream, work typically flows from Development to Operations, through steps consisting of the functional areas between our business and our customers.

As stated in the lean principles developed by Toyota, we should optimize for a fast and smooth single-piece flow of our releases.

We increase flow by:

  1. Making work visible
  2. Reducing batch sizes and intervals of work
  3. Building quality in, preventing defects from being passed to downstream work centers

Why is fast flow needed?

By speeding up the flow through the technology value stream, we reduce the lead time required to fulfill internal and external customer requests, further increasing the quality of the work while making us more agile.

Our goal is to decrease the amount of time required to deploy the changes into production and increase the reliability of those services.

Make our work visible

agile-pm-kanban-board

A significant difference between manufacturing and technology value streams is that our work is invisible.

It’s easy for work to keep bouncing between teams while we have no visual control over it.

To prevent this and make our work more visible, we can use something like a Kanban board. (I prefer Trello for this.)

Ideally, our Kanban board will span the entire value stream, defining work as completed only when it reaches the right side of the board.

Work is not done when development completes, but only when our application is running successfully in production.

Limit Work In Progress (WIP)

In technology, our work is far more dynamic than in manufacturing. Teams have to satisfy the demands of multiple stakeholders, and as a result, daily work gets dominated by urgent requests coming through every communication channel possible.

We can limit multi-tasking by using a Kanban board, such as by codifying and enforcing WIP limits for each column on the board.

For example, we may set a WIP limit of three cards for testing. When there are already three cards in the testing column, no new cards can be added.

Using Kanban ensures that work is visible and WIP doesn’t pile up.
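
As a toy illustration (hypothetical names, not any real Kanban tool’s API), a WIP limit is just a guard on adding cards to a column:

class KanbanColumn:
    """A Kanban column that enforces a work-in-progress (WIP) limit."""

    def __init__(self, name, wip_limit):
        self.name = name
        self.wip_limit = wip_limit
        self.cards = []

    def add_card(self, card):
        # Refuse new work once the column is at its WIP limit.
        if len(self.cards) >= self.wip_limit:
            raise ValueError(f"WIP limit of {self.wip_limit} reached for {self.name}")
        self.cards.append(card)

testing = KanbanColumn("Testing", wip_limit=3)
for card in ["Card A", "Card B", "Card C"]:
    testing.add_card(card)
# testing.add_card("Card D")  # raises: WIP limit of 3 reached for Testing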

Reduce Batch Sizes

one-piece-flow

Another key component to creating smooth and fast flow is performing work in small batch sizes. Prior to the lean manufacturing revolution, it was common practice to manufacture work in large batches.

However, large batch sizes result in skyrocketing levels of WIP. According to lean principles, the ideal is single-piece flow, where the batch size is just one.

Let’s take an example:

Suppose we have ten brochures to mail and mailing each one of them requires 4 steps:

  1. fold the paper
  2. insert the paper into the envelope
  3. seal the envelope
  4. stamp the envelope

Now in the traditional batch processing flow, we will perform each step sequentially for all ten envelopes.

In the lean one-piece flow, only one envelope can be at any given step. In other words, we fold the paper, insert it into the envelope, seal the envelope and stamp it before starting the next one.

How is one-piece flow dramatically better?

In the above example, suppose each step takes 10 seconds. In batch processing, we get our first complete envelope after 310 seconds, but with one-piece flow we get it after just 40 seconds.

Worse, what if we find that the way we folded the paper doesn’t allow the envelope to be sealed? With a large batch, we discover this only after folding all ten sheets; with single-piece flow, we catch it on the first envelope.
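
To make the arithmetic concrete, here is a minimal Python sketch using the numbers from the example above (10 seconds per step, 4 steps, 10 envelopes):

STEP_TIME = 10   # seconds per step
STEPS = 4        # fold, insert, seal, stamp
ENVELOPES = 10

# Batch processing: a step is performed on all ten envelopes before the
# next step starts, so the first finished envelope appears only after the
# first three steps have been done ten times each, plus one final stamp.
first_done_batch = (STEPS - 1) * ENVELOPES * STEP_TIME + STEP_TIME

# One-piece flow: one envelope goes through all four steps back to back.
first_done_one_piece = STEPS * STEP_TIME

print(first_done_batch)      # 310 seconds
print(first_done_one_piece)  # 40 seconds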

Eliminating hardship and waste in the value stream

According to Toyota Production System pioneer Shigeo Shingo, waste is:

The use of any material or resource beyond what the customer requires or is willing to pay for.

In the software development value stream, waste is anything that causes a delay for the customer, such as activities that can be bypassed without affecting the result.

The following are some common categories of waste that we encounter when implementing lean in the software value stream:

  1. Partially done work
  2. Extra processes
  3. Extra features
  4. Task switching
  5. Waiting on QA, testing, or acceptance testing
  6. Defects and bugs
  7. Non-standard or manual work

Explaining each of the above points deserves a post of its own. I will do that soon.

Conclusion

Improving flow through the technology value stream is essential to achieving DevOps outcomes. We do this by making work visible, limiting WIP, reducing batch sizes, and eliminating waste from our processes.

All of this will allow us to become more agile, reduce lead times dramatically, and at the same time increase the quality of releases.

That’s all, folks!

 


Strive to Learn: 8 Ways to Optimize for Learning at Work as a Software Engineer


A large open space with amazing ergonomic chairs where people discuss and execute upon disrupting ideas, right next to the company’s game room where you unwind after a hard day’s work. Here we, as engineers, get to work on products that our customers love, and we love delivering that delight via continuous delivery (or something else :P).

Yet the most prominent thing that excites, and should excite, an effective engineer is the opportunity for learning at work. Optimizing for learning is a high-leverage activity and should be a top priority for every engineer.

Here are 8 ways to optimize our work for learning, deeply inspired by the book The Effective Engineer.

1. Study code for core abstractions written by the best engineers at your company


I have been lucky enough to be the dumbest engineer at Squad. But that has allowed me to learn very aggressively during working hours just by reading through libraries and modules written by other awesome engineers.

So next morning, open that black box that you’ve been importing for so long in your code and dig through it.

2. Do more of what you want to improve

In more relatable terms: if you want to improve at writing SQL queries, do more of that. If you want to improve at doing code reviews, do more of that.

Practice and deliberately touch your weak points instead of cutting corners. You’ll be amazed at how helpful your fellow engineers/friends will be in helping you do so.

3. Go through as much technical material as you can

We at Squad have a dedicated Slack channel where engineers share good-to-read articles, blogs, and podcasts.

I’ve made a pact to go through each and every article shared on that channel, irrespective of the domain or the tech it covers. So far this has been a catalyst for learning things I didn’t even know were there to learn.

4. Master the programming language that you use

Read a good book or two on it. Get to the internals of the language you primarily use at work. We at Squad use Python heavily for the back-end, machine learning, data analytics, and everything else.

Personally, I’ve added two great books to my reading list that I’ll be picking up next:

  1. Fluent Python
  2. Mastering Python Design Patterns.

5. Send your code reviews to the hardest critics


At Squad, code reviews are in the DNA of engineering processes. I’ve been very fortunate to be on-boarded to the Squad codebase by one of the best and hardest code critics at the company. It really helped me in developing high code quality standards and also the art of reviewing code.

Not only did that teach me how to write better code, but also how to deliver code reviews in a respectful manner so that the other person doesn’t feel discouraged, something I always keep in mind while doing code reviews myself.

6. Enroll in classes in areas where you want to improve

Sites like edX, Coursera, and Udacity have amazing courses that we can take in our spare time. Be it compilers, databases, machine learning, or infrastructure, these platforms have courses on all of them.

Personally, I try to keep exactly one online course in-progress all the time.

7. Participate in design discussions of projects you are interested in


Don’t wait for an invitation. Ask the engineers if they’d mind you being a silent observer or even a participant in the design discussion.

8. Make sure you are on a team with at least a few senior engineers whom you can learn from

This will help increase your learning rate at least 80% of the time.

At Squad, I get to work with some of the most awesome engineers I’ve had the opportunity to work with. That has helped me learn and polish things like estimation, product thinking, design, and communication.

Conclusion

Our work fills a large part of our life. Making sure that our work drives our learning and improvement helps big time in staying content and progressing on the path to becoming a better, more effective engineer.

Resources

  1. http://www.effectiveengineer.com/
  2. https://blog.fogcreek.com/the-effective-engineer-interview-with-edmond-lau/

That’s all, folks!

 

Practical Problem Solving Framework: Inspired By The Toyota Way


We can all agree, up to a point, that having a system or process for anything reduces the chance of errors.

As an engineer, or anyone people look to for proposing solutions to problems, it’s beneficial to have a framework in place to solve problems effectively.

Recently I was reading The Toyota Way, and it suggested a framework for practical problem solving. It immediately felt like this sort of framework would be invaluable to software engineers too (in fact, to everyone).

When confronted with a problem, first we want to make it crystal clear and get a grasp of the real point of cause. That’s followed by a series of five WHYs to investigate the root cause. And finally: countermeasures, evaluation, and standardization.

1. Initial Problem Perception

A large, vague, and complicated problem is presented. The first step is to perceive all the information available at this point in time.

Ex. “Hey! Metric X is showing an incorrect value.”

This doesn’t show the actual problem, but just a perception of how some internal user saw it.

2. Clarify The Problem

The next step is to clarify the problem and scope it down. Go and see the problem yourself. Analyze it and get a clear understanding.

As you see the problem first-hand, gather as much information as possible.

Ex. So the entire analytics data was actually not consistent.

3. Locate Point Of Cause

The next step is to dig a little deeper and try to find the point of cause.

Where is the problem observed? Where is the likely cause? This will lead us to the vicinity of the root cause, which we find in step 4.

Ex. The analytics system is working correctly; it just sometimes doesn’t get updated every 5 minutes like it’s supposed to.

Here we rule out other possible causes, like a bug in the code or the wrong data being tracked in the first place.

4. Ask WHY Five Times: Investigate the Root Cause

Here, starting from the direct cause, we go deep and expose the root cause of the problem by asking WHY five times.

Ex.

  1. Why was the data inconsistent? Because analytics didn’t get updated on time.
  2. Why were analytics not updated on time? Because the scheduled ETL jobs didn’t run on time.
  3. Why didn’t the scheduled jobs run on time? Because CPU usage was at 100%.
  4. Why did CPU usage reach 100%? Because the server instance size was not enough to handle the increased number of jobs.
  5. Why was the server size not enough to handle the spike in usage? Because our auto-scaling is slow.

By asking a series of five whys, we can generally get to the root cause of the problem and fix it there, instead of just duct-taping it and waiting for it to rise again.

5. Countermeasures

This step is about fixing the root cause of the problem so that it doesn’t come up again.

Ex. Move to a more sophisticated auto-scaler to manage spikes in usage and set up alerts to monitor performance.

6. Evaluate

After the countermeasures have been executed, it’s important to evaluate their effect. Was the problem solved?

Ex. “Now analytics are always in sync and even if they miss getting updated, we get an alert to know it beforehand and take action.”

7. Standardize

This resonates with another Toyota principle, jidoka, meaning building in quality.

How can we standardize the countermeasures such that similar problems are not faced again? How can we propagate our learnings across the organization?

Ex. “Document and standardize the process: for all our instances and jobs, proper alerts must be in place so that we know when they are malfunctioning.”

Conclusion

This was my take on how we can learn from a cross-discipline organization like Toyota about having a process and framework in place to solve problems effectively.

After all, problem-solving is supposed to be fun, and having a proper framework in place helps us keep it that way!

That’s all, folks!

 

8 System Design Principles I Learned After Doing It Wrong More Than 50 Times!


 

At Squad, we strive to build awesome products to solve customer (internal and external) needs. As a product engineer, a paramount part of your job is to design and build products: dig deep into the root cause of the problems, design solutions, and implement them as the end product.

Over the course of my journey so far, here are the 8 system and product design principles that I’ve learned from other awesome people at Squad, from feedback, and from simply not doing it right enough, multiple times.

1. What is the underlying problem that led to the feature request?

At Squad, you don’t just code the requirements into software. As a product engineer, it’s your responsibility to peel away the layers and expose the root problem that led to the feature request.

Get to know the root cause of the problem you are trying to solve. Or even better, as the lean principles say, “genchi genbutsu”, i.e., go and see it yourself.

2. How can you make the feature more robust, reliable and usable?

Once the essential feature requirements are finalized, we must press on: how can we make the feature more robust, reliable, and usable?

Things to ponder upon and take into consideration:

  1. The persona of the users who are going to use it.
  2. The scenarios in which the feature would be used. E.g., in the case of fires, show more data than needed for faster resolution.
  3. Building quality into the product itself, or “jidoka” as it’s called in lean.

3. What is the first iteration going to be?

Given the time and resources you have, what is the best possible first iteration of the product going to be? If it’s a large system or something you are building from scratch, there are always going to be iterations.

The main idea here should be to move fast and get things shipped. Good enough and shipped on time is always better than perfect and in-development forever.

4. How easy will it be to make iterations on the current feature?

The design should incorporate all the non-functional requirements to make future iterations easy.

Scale the feature? Change a component? Use a different 3rd party service? Your implementation should be flexible enough to incorporate and encourage these enhancements.

Design patterns are your best friend here.

5. What are the potential bottlenecks with scale?

Scale-land is where everyone wants to be, but it is scary. It breaks what was not supposed to break and has witnessed more horror stories than a haunted castle.

What are the potential bottlenecks that are not a problem now, but will break at 5X, 10X or 100X scale?

List them down on the feature ticket, or better, document them in the code itself.

6. What’s the data that has to be captured and how will it be consumed?

Every feature in the product will need some data to be captured to track it. This can include, but is not limited to:

  1. Action logs.
  2. Event logs.
  3. Metrics.
  4. Failures.
  5. Anomalies.

What majorly affects this is how the data will be consumed. Store it in a structure that makes consumption easy and efficient. After all, the only motive to store data is to use it.

7. How good will the developer experience be when interacting with that feature’s code base?

There can be many developers who’ll use or modify the code that you are going to write.

What will their experience be like when doing that? For example, will the test cases you wrote make them feel confident enough to make changes fast?

A few points to consider:

  1. Is the code well documented?
  2. Are the test cases strong enough?
  3. Is the code reusable where it makes sense?
  4. Are the functions small and the code simple to read?

8. What metrics will determine that the feature has been implemented successfully?

Finally, after all the fun-time you had creating the feature, what will determine that the feature has been implemented successfully?

The data you tracked will be of paramount importance here.

It may be the case that tracking this quantitatively is not possible; can you track it qualitatively in that case?

The idea here is that you can’t improve what you can’t measure.

Processing 100,000 requests? Fewer errors by the users? 95% of work done by the new system instead of the old one?

This can and will involve more stakeholders of the team and not just the developer.

Conclusion

Obviously, this is not an exhaustive list of things to take into consideration while designing a system or a product as an engineer. It just covers what I have learned so far by doing things wrong, or not right enough, multiple times.

It’s fun to build stuff! Continuously improve (“Kaizen” in lean)! Keep iterating! Keep shipping!

 

 

That’s all, folks!

 

Deploying an nginx application using Kubernetes for Self-Healing and Scaling

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. A more technical term for it is container orchestrator, used to manage large fleets of containers.

Minikube is an all-in-one, single-node installation for trying out Kubernetes on a local machine. The following post covers deploying an nginx application container using Kubernetes in minikube.

If you don’t have them yet, this link has it all to install minikube and kubectl (the command-line tool to access minikube): Download and install minikube and kubectl

Step 1: Make sure minikube is up and running

Ensure that minikube is running.

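
If it isn’t running yet, the standard commands below start a local cluster and confirm its state:

minikube start
minikube status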

Step 2: Open the minikube dashboard

Minikube comes with a GUI tool that opens in the web browser. Open the minikube dashboard with the following command:

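
minikube dashboard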

It should open the dashboard in a browser window.


Looks cool! No?

Step 3: Deploy a webserver using the nginx:alpine image

Alpine Linux is preferred for containers because of its small size. We’ll be using the nginx:alpine Docker image to deploy an nginx-powered webserver.

Now, go to the Deployments section and click the Create button.


Fill in the application name, the container image (nginx:alpine), and the number of pods (3).

We can either provide the application details here, or we can upload a YAML file with our Deployment details.

Here we are asking Kubernetes to create a deployment with the nginx:alpine image as the container, and we want 3 pods (or simply, instances) of it.

A pod in Kubernetes is a scheduling unit: a logical collection of one or more containers that are always scheduled together.

Go on and click that awesome deploy button!
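
Alternatively, if you’d rather use the command line than the dashboard, a roughly equivalent deployment can be created with kubectl. This is a sketch; the name my-nginx-webserver is an assumption, chosen so that the resulting app: my-nginx-webserver label matches the service selector used later in this post:

kubectl create deployment my-nginx-webserver --image=nginx:alpine
kubectl scale deployment my-nginx-webserver --replicas=3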

Step 4: Analyzing the deployment

Once we click the deploy button, Kubernetes will trigger the deployment. The deployment will create a ReplicaSet. A ReplicaSet is a replication controller that ensures the specified number of replicas for a pod is running at any given point in time.

The flow is something like this:

Deployments create ReplicaSets, ReplicaSets create pods, and the pods are where the real application resides.


As expected, we have our deployment, replica set and pods in place.

We can also check our deployment via the command line using kubectl.

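
Typical read-only checks look something like this (output omitted):

kubectl get deployments
kubectl get replicasets
kubectl get pods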

Step 5: Create a Service and expose it to the external world with NodePort

So far, we have our pods up and running. But how do we access them?

This is where a service comes into play. K8S provides a higher-level abstraction called a service, which logically groups pods along with a policy to access them. This grouping is done via labels and selectors.

We then expose the service to the world by defining its service type; the service redirects our requests to one of the pods and load-balances across them.

Create a my-nginx-webserver.yaml file with the following content:

https://gist.github.com/priyankvex/3b34ec02c82934b84c8dfb68272ed4f1

apiVersion: v1
kind: Service
metadata:
  name: my-nginx-web-service
  labels:
    run: my-nginx-web-service
spec:
  type: NodePort
  ports:
  - port: 80
    protocol: TCP
  selector:
    app: my-nginx-webserver

Enter the following command to create a service named my-nginx-web-service:

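
kubectl create -f my-nginx-webserver.yaml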

We can now verify that our service is running:

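
kubectl get services
# the PORT(S) column shows the node port (in the default 30000-32767 range)
# that was mapped to port 80 of my-nginx-web-service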

Step 6: Accessing the application

Our application is running inside the minikube VM. To access the application from our workstation, let’s first get the IP address of the minikube VM:

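
minikube ip
# or have minikube print the service URL (IP and node port) directly:
minikube service my-nginx-web-service --url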

Now head to that address, at the node port assigned to the service in the step above.


And our app is running! Amazing, give yourself a pat on the back!

A taste of the self-healing feature of the Kubernetes system:

One of the most powerful features of Kubernetes is its self-healing capability (just like Piccolo. DBZ, anyone?). While defining our app, we created a replica set with 3 pods. Let’s go ahead and kill one pod; Kubernetes will create another one to maintain the running pod count of 3.

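
To try the same thing from the command line (the <pod-name> below is a placeholder for any name listed by kubectl get pods):

kubectl get pods
kubectl delete pod <pod-name>
kubectl get pods   # a replacement pod shows up almost immediately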

We deleted the bottom-most pod, and K8S created a new one instantly.

Such kubernetes! Much HA (High Availability)!

A taste of scaling with Kubernetes:

Now suppose our app is receiving a crazy amount of traffic, and three nginx pods are not enough to handle the load. Kubernetes allows us to scale our deployments with almost zero effort.

Let’s go ahead and spin up a new pod.


In the dashboard’s Scale option, set the desired number of pods to 4 and click OK. Now let’s go and check our pods.
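
The command-line equivalent is a one-liner (again assuming the deployment is named my-nginx-webserver):

kubectl scale deployment my-nginx-webserver --replicas=4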


As we can see, we now have 4 pods running to handle the increased traffic.

Isn’t it amazing? We just horizontally scaled our application with the power of kubernetes.

This was just the tip of the iceberg of what Kubernetes can do. I am exploring Kubernetes and containerized architecture just like you; hopefully we’ll be back soon with another post with more Kubernetes stuff!

That’s all, folks!

Estimation Peril: How To Estimate Software Projects Effectively (or How Not To Lie)


Consider: you are a rockstar engineer, and you are given a task by your favorite person, your project manager, to show some new fields on the dashboard.

As usual, you are asked to estimate it as soon as possible. You think, well, seems like a quickie, and you are tempted to estimate it at a day. But, having been burnt before, you decide to look carefully at the fields that are to be added. These fields are for analytics. You think, OK, let’s make it 2 days then. But being more cautious, you dig deeper and find that those analytics are not even being tracked in the app.

Now, to complete the story, you’ll have to track the analytics, send them to the server, make the backend accept and store them, show them on the dashboard, write tests, etc.

What seemed a simple task is now a 1-2 week thing, very hard to estimate. And your manager was expecting a response like “it would be done by end of day”.

What is the problem with estimates?

The main problem with an estimate is that the “estimate” gets translated into a commitment. And when you miss a commitment, you breed distrust.

Most estimates are poor because the work is uncertain. A problem that seemed simple on the whiteboard turns out not to be so simple: there are non-functional requirements, codebase friction, some unfortunate bugs, etc. We deal in uncertainty.

There is a rule in software engineering that everything takes 3X more time than you think it should, and this holds true even when you know this and take it into account!

Estimates can go the other way too, that is, when you overestimate. This is as dangerous as underestimating.

What should an estimate look like?

An estimate should have 3 characteristics:

  1. Honest (Hardest)
  2. Accurate
  3. Precise

1. Honest:

You have to be able to communicate bad news when the news is bad. And when the continuous outrage of your managers and stakeholders is in your face, you need to be able to stand firm and assert that the news is bad.

Honesty is important because it breeds trust. You are not eliminating disappointment, rage, and people getting mad, but you will eliminate distrust.

2. Accurate:

You are given a task and you estimate it to take somewhere between now and the end of the universe. That’s definitely accurate: it’ll be done within that time.

We won’t breed distrust, but we definitely will breed something else.

Which brings us to the 3rd characteristic.

3. Precise:

An estimate should have just the right amount of precision.

What is the most honest estimate that you can make? “I don’t know!”

This is as honest as it can get. You really don’t know. But this estimate is neither accurate nor precise.

But when we try to make precise estimates, we must note that we are assuming everything goes right: we get the right breakfast, traffic doesn’t suck, your co-worker is having a good day, no meetings, no hidden requirements, no non-functional complexities, etc.

Estimating by work breakdown

The most common way to estimate a complex task is to break it down into smaller sub-tasks, then those sub-tasks into sub-sub-tasks, and so on, until each task at hand is manageable, ideally not more than 4 hours of work.

Imagine this forming a tree, with executable tasks at the bottom as leaves. You just estimate the leaves and it all adds up.
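
As a toy sketch (hypothetical task names and hour estimates), such a tree and its roll-up fit in a few lines of Python:

# A task is either a leaf with an estimate in hours, or a dict of sub-tasks.
task_tree = {
    "show new fields on the dashboard": {
        "track analytics events in the app": 4,
        "send events to the server": 3,
        "accept and store events": {"API endpoint": 3, "storage schema": 2},
        "render the new fields": 4,
        "tests": 4,
    }
}

def total_estimate(node):
    # Internal nodes sum their children; leaves carry the hour estimates.
    if isinstance(node, dict):
        return sum(total_estimate(child) for child in node.values())
    return node

print(total_estimate(task_tree))  # 20 hours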

This approach works, but there are 2 problems:

  1. We missed the integration cost
  2. We missed some tasks

There is a fundamental truth to work-breakdown estimates:

The only way to estimate accurately using a work-breakdown chart, i.e., to know the exact sub-tasks, is to implement the feature!

What to expect from an estimate?

Estimates are uncertain. There is no guarantee that your estimate will work itself out, and that’s OK: it’s your manager’s job to manage that risk. We are not asking them to do something outside of their job.

The problem arises when you make a commitment. If you make a commitment, you must meet it; be ready to move heaven and earth to make it happen. But if you are not in a position to make a commitment, then don’t make one.

Because your manager is going to set up a whole bunch of dominoes based on that commitment, and if you fail to deliver, everything falls.

Some interesting links:

https://medium.com/swlh/your-app-is-an-onion-why-software-projects-spiral-out-of-control-bb9247d9bdbd

Uncle Bob on Estimates: https://www.youtube.com/watch?v=eisuQefYw_o

Happy Estimating!

That’s all, folks!

The Blue Ocean Strategy : How To Create Uncontested Market Space and Make the Competition Irrelevant

When Henry Ford made cheap, reliable cars, people said, ‘Nah, what’s wrong with a horse?’ That was a huge bet he made, and it worked.

The whole idea of The Blue Ocean Strategy is to create uncontested market spaces that create new demand and make the competition irrelevant.

The book describes red oceans as known market spaces with bloody competition among businesses trying to win customers. Here there is a fixed, existing demand of which every company wants a share.

A blue ocean, on the other hand, is an uncontested market space that creates demand for itself and is not yet known to others. This makes competition irrelevant. The focus is on creating, not competing.

Value Innovation:

Value innovation occurs when a company aligns innovation with utility, price, and cost positions. Instead of using the competition as a benchmark, companies focus on taking leaps in value for customers.

The idea behind value innovation is to break out of the value-cost trade-off.

Reducing Costs:

Reduced costs are achieved by eliminating and reducing the factors that the conventional industry competes on.

The best example to illustrate this is the case study of the Ford Model T.

Ford eliminated factors like multiple colors and design variants and focused only on creating better cars for the masses.

Identifying Blue Oceans:

Identifying blue oceans requires the company’s managers and strategists to brainstorm on the strategy canvas, where each manager holds his or her department accountable.

The strategy canvas’ focus must be shifted from competition to alternatives and from customers to non-customers.

Reconstruct Market Boundaries:

The author proposes a six-step framework for identifying blue oceans in new market spaces:

  1. Look across alternative industries
  2. Look across strategic groups within industries
  3. Look across complementary product and service offerings
  4. Look across the chain of buyers
  5. Look across functional and emotional appeal to buyers
  6. Look across time

Reaching Beyond Existing Demand

To reach the customers in new markets, think of non-customers before customer differentiation.

There are 3 tiers of non-customers:

  1. Jump ship: these can switch to a competitor at any moment.
  2. Refusing: these are using competitors’ products.
  3. Distant: the product doesn’t appeal to these customers.

Examples of Blue Ocean Strategies Implemented by Famous Companies:

  1. Ford:

Ford standardized the car and limited the options. This increased the quality of the cars and brought the price point down.

2. GM:

General Motors found its blue ocean in making cars fun, fashionable, and comfortable.

3. Watson:

Watson introduced tabulators for businesses for the first time. They also introduced leasing pricing models, which made it easy for businesses to own a tabulator.

4. Apple:

Apple created the Apple II and tapped the new market for ready-made, easy-to-use personal computers.

5. Dell:

Dell, on the other hand, found its blue ocean by changing the purchasing and delivery experience of the buyer. It allowed customization of the machines according to the needs of the buyer.

It is evident from the above examples that blue oceans are unleashed not by technology innovation per se, but by linking technology to elements valued by buyers.

Strategy for Blue Ocean Implementation:

Two views of industry structure are related to strategic action:

  1. Structuralist View:

Based on the structure-conduct-performance paradigm: market structure shapes conduct and performance. This view of strategy deals with making sure that the company is making money in the red oceans.

2. Reconstructionist View:

This view is based on endogenous growth. It focuses on creativity, not systematic approaches.

This view is responsible for finding blue oceans for the company.

Both views of strategy are necessary to ensure the company is making money now while also exploring new markets to remain competitive in the future.