6 Lessons On Work Ethic I Learned In One Year Of Professional Career

Time flies. Recently I completed one year as a full-time employee at my current employer, Squad. I decided it was time to revisit instances, memories, and experiences and to recollect what I had learned as a professional in this past year.

It was also a wake-up call to reassess and redirect the ship named professional career to make sure it doesn’t get stuck in a whirlpool.

After all, our career is our responsibility and we all should make efforts to “Own our story!”.

This is going to be a list of 6 important lessons in professionalism and work ethic that I learned working as a Product Engineer at one of the most innovative startups in India.

1. Know your field

Do you know what a facade pattern is? Do you know what sprints and story points are? Do you know how to find your way around the debugger your IDE provides?

A wealth of ideas, disciplines, techniques, tools, and terminologies has decorated the last fifty years of our field. If we want to be professionals, we need to know a sizeable chunk of it.

The motto that I believe in is, “If you want to see far, stand on the shoulders of giants”.

 

2. Continuous Learning

The frenetic rate at which the industry is changing means that we as engineers need to learn a colossal amount just to keep up.

Read books, read articles, watch talks. Keep adding deltas to your learning daily.

 

3. Practice

This struck me around two months back. Tennis is my favorite sport, and players believe that playing in tournaments continuously actually makes their game less polished.

It’s the deliberate practice of doing things right that makes it shine again.

Is this true of our jobs too? Cutting corners to meet deadlines, working on the outdated stack at the company, working with legacy code: can all this make us less sharp?

At least I find it to be true, and deliberate practice to improve your craft is a vital component of one’s work ethic.

 

4. Know your domain

It is the duty of every software developer to understand the domain of the solutions that they are programming. If you are writing software for healthcare, get at least a basic understanding of healthcare; if you are writing for sales, learn about sales.

Read a book or two on the domain, ask the domain experts.

We should know enough about the domain to question the product direction and feature requests that we get.

 

5. Collaboration and mentoring

We all must make special efforts to collaborate, mentor and get mentored by other developers.

Of whatever I have learned in the course of this one year, a major portion is due to learning from others.

 

6. Humility

Programming is an act of creation. It feels like magic when we can write code that can do things that can produce tremendous value.

We all should take pride in our work, but never be arrogant about it. We should be confident in our abilities and take risks.

But we must know that we will fail. Things that we create will break, risks will not pay off, and we will be held accountable for these mistakes.

And when all this happens, all we can do is be humble and take Howard’s advice: laugh and move on.

 

That’s all, folks!

 

Tracking Metrics to Surface and Solve Problems: Metric Tracking Practices I’ve Learned So Far

It is a nice, pleasant evening. You are sipping coffee and reviewing your code one final time, just so that you can gather enough confidence to hit the deploy button.

But a fact of life as a software engineer is that things can go wrong. Small changes may result in unexpected outcomes, including outages, errors, or a negative impact on customers.

And when problems occur, we can either do random checks and validations that may or may not solve the problem, or follow a disciplined problem-solving approach that relies on data rather than intuition.

Metrics and Telemetry

To enable a disciplined problem-solving method, we need our software to track the right metrics at the right places. We need to design our systems so that they are continually creating telemetry.

What is Telemetry?

The DevOps Handbook defines telemetry as:

An automated communication process by which metrics are collected at remote points and are sent over to receiving equipment.

When designing systems, it is a high-leverage activity to treat telemetry as a first-class citizen, so that metrics are easy to track at every level needed, from business-level metrics down to the deployment pipeline.

Levels Of Metrics Tracking

As engineers, the software we write impacts the organization at multiple levels, from infrastructure, to the product, to the business. Thus, to resolve problems quickly, we need to track metrics at all these levels.

The following levels of metrics have been really useful to me as a checklist for adding the right metrics to the software that I write.

1. Business Level Metrics:

These metrics directly affect the business, so they are really important to keep an eye on.

Examples include sales transactions, number of items clients sent, total successful items processed, hourly processing rates, etc.

2. Application Level Metrics:

These metrics track the functioning of the application.

Examples include latency of the APIs, response time of queries, number of errors etc.

3. Infrastructure Level Metrics:

These metrics track the infrastructure that runs our application.

Examples include CPU usage, available memory, IOPS spikes etc.

4. Product Level Metrics:

These metrics track the product progress and results. As product engineers, it’s a high leverage task to track product-related metrics too.

Examples include A/B test results, feature toggling results, product progress, product extensibility and configurability, etc.

By having telemetry coverage in all these areas, we will be able to see the health of everything that our software relies on, as well as the things that rely on our software.
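As a rough illustration of the checklist above, here is a minimal sketch of tagging every metric with its level. The `Telemetry` class, the metric names, and the `process_item` function are all made up for this example; a real system would send these values to a metrics backend instead of keeping them in memory.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# The four levels from the checklist above (names are illustrative).
LEVELS = ("business", "application", "infrastructure", "product")


class Telemetry:
    """Toy in-memory metric store, only meant to show the idea."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.timings = defaultdict(list)

    def _key(self, level, name):
        # Refuse metrics that are not assigned to a known level.
        if level not in LEVELS:
            raise ValueError(f"unknown metric level: {level}")
        return f"{level}.{name}"

    def increment(self, level, name, amount=1):
        self.counters[self._key(level, name)] += amount

    @contextmanager
    def timer(self, level, name):
        # Record how long the wrapped block took, even if it raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.timings[self._key(level, name)].append(elapsed)


telemetry = Telemetry()


def process_item(item):
    """Hypothetical unit of work that emits metrics at two levels."""
    with telemetry.timer("application", "process_item.latency"):
        if item is None:
            telemetry.increment("application", "process_item.errors")
            return False
        telemetry.increment("business", "items_processed")
        return True


for item in ["a", "b", None]:
    process_item(item)

print(dict(telemetry.counters))
```

Forcing every metric through a level name makes it easy to spot a level that has no coverage yet.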

Conclusion

With the limited time that I’ve spent in the software industry, I’ve come to realize the importance of metrics. Even the parts that don’t involve any software must be tracked and measured. The key idea here is that you cannot improve what you don’t measure.

With the right telemetry built into the software that we write, we’ll be able not only to solve problems as they arise but also to surface the latent ones before they catch fire.

That’s all, folks!

 

Organization Archetypes And The Concept Of Market-Oriented “Solver Teams”

Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations. (This observation is known as Conway’s law.)

In other words, how we organize our teams has a powerful effect on the software we produce, as well as our resulting architectural and production outcomes.

Thus, in order to get a fast flow of work from Development to Operations, with high quality, great customer outcomes, and fast delivery, we must organize our teams so that the team structure works to our advantage.

Done poorly, this can prevent teams from working safely and independently. Instead, they’ll be tightly coupled, all waiting on each other for work to be done.

At SQUAD, teams are structured as market-oriented teams, so that they can quickly respond to and solve customer needs. We call them “Solver Teams”.

 

Organizational Archetypes

There are primarily three types of organization structures that inform how we design our DevOps value stream: functional, matrix and market.

Functional-oriented:

Organizations optimize for expertise, division of labor, and reducing cost. These organizations centralize expertise and have tall hierarchical structures.

Ex. Server admins, SREs, Data admins

Matrix-oriented:

Organizations attempt to combine functional and market orientation. This results in complicated organization structures, such as a single person reporting to multiple managers.

Market-oriented:

Organizations optimize for responding quickly to customer needs. These organizations tend to be flat, composed of multiple cross-functional disciplines (ex. marketing, engineering, machine learning).

Each market-oriented team is responsible for feature delivery, operational tracking and service support.

Market-Oriented “Solver Teams” at SQUAD

At SQUAD, we have a bunch of interesting problems to solve that have a high impact on customer needs.

Broadly speaking, solver teams at SQUAD are market-oriented teams that combine cross-disciplinary work such as engineering, marketing, machine learning, and data analysis to “solve” a customer problem.

These teams are responsible not only for feature development but also for user experiments, testing, optimizations, deployment, and operational tracking of services, from idea conception, to successful launch, to retirement, all without dependencies on other solver teams.

Advantages of Market-Oriented Teams

  1. Small teams working independently and safely.
  2. Faster execution and delivery of work.
  3. Enables team members to be “E-Shaped” specialists.

 

Enable every team member to be a generalist

As we rely on an ever-increasing number of technologies, we want engineers who can contribute to multiple areas of the value stream.

Another major advantage of market-oriented teams is that, because they are cross-disciplinary by nature and cover the entire value stream from development to operations, they give team members opportunities to develop multi-specialist capabilities. Such team members are also called E-Shaped specialists.

When team members start becoming “E-Shaped” experts, the business benefits of faster flow are overwhelming.

As the same team member is able to contribute to multiple points in the value stream, the flow of the stream is much smoother and faster than a specialist working on a single point in the stream without having comprehensive knowledge of the entire value stream.

 

Conclusion

We saw how organizational structure dramatically affects our outcomes. Done well, it works to our advantage and helps teams move and deliver faster.

At SQUAD, we structure teams as “solver teams”, which are responsible for owning the entire value stream of the problem they are solving.

Solver teams are small and can move fast and safely without having dependencies on other solver teams.

That’s all, folks!

 

DevOps and The Principle Of Flow

In the technology value stream, work typically flows from Development to Operations, the functional areas between our business and our customers.

As stated in the lean principles developed by Toyota, we should optimize for a fast and smooth single-piece flow for our releases.

We increase flow by:

  1. Making work visible.
  2. Reducing batch sizes and intervals of work.
  3. Building in quality, preventing defects from being passed to downstream work centers.

Why is a fast flow needed?

By speeding up the flow through the technology value stream, we reduce the lead time required to fulfill internal and external customer requests, further increasing the quality of the work while making us more agile.

Our goal is to decrease the amount of time required to deploy the changes into production and increase the reliability of those services.

Make our work visible

A significant difference between manufacturing and technology value streams is that our work is invisible.

It’s so easy for work to keep bouncing between teams while we have no visual control over it.

To prevent this and make our work more visible, we can use something like a Kanban board. (I prefer Trello for this.)

Ideally, our Kanban board will span the entire value stream, defining work as completed only when it reaches the right side of the board.

Work is not done when development completes, but only when our application is running successfully in production.

Limit Work In Progress (WIP)

In technology, our work is far more dynamic than in manufacturing. Teams have to satisfy the demands of multiple stakeholders. As a result, daily work gets dominated by urgent requests coming through every communication channel possible.

We can limit multi-tasking by using a Kanban board, such as by codifying and enforcing WIP limits for each column on the board.

For example, we may set a WIP limit of three cards for testing. When there are already three cards in the testing column, no new cards can be added.

Using Kanban ensures that work is visible and WIP doesn’t get piled up.
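To make the WIP rule concrete, here is a minimal sketch of a board that refuses a card once a column is full. The `KanbanBoard` class and the column names are hypothetical; tools like Trello enforce (or merely display) limits in their own ways.

```python
class WipLimitExceeded(Exception):
    """Raised when a column already holds as many cards as its WIP limit."""


class KanbanBoard:
    def __init__(self, wip_limits):
        # wip_limits maps column name -> maximum cards (None means unlimited).
        self.wip_limits = dict(wip_limits)
        self.columns = {name: [] for name in self.wip_limits}

    def add_card(self, column, card):
        limit = self.wip_limits[column]
        if limit is not None and len(self.columns[column]) >= limit:
            raise WipLimitExceeded(f"'{column}' is at its WIP limit of {limit}")
        self.columns[column].append(card)

    def move_card(self, card, source, target):
        if card not in self.columns[source]:
            raise ValueError(f"{card!r} is not in '{source}'")
        # Add to the target first, so a refused move leaves the board unchanged.
        self.add_card(target, card)
        self.columns[source].remove(card)


board = KanbanBoard({"todo": None, "testing": 3, "done": None})
for number in range(3):
    board.add_card("testing", f"card-{number}")

try:
    board.add_card("testing", "card-3")
except WipLimitExceeded as error:
    print(error)

# Finishing one card frees a slot in the testing column.
board.move_card("card-0", "testing", "done")
board.add_card("testing", "card-3")
```

The fourth card is accepted only after one of the first three leaves the column, which is exactly the pressure a WIP limit is meant to create.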

Reduce Batch Sizes

Another key component of creating smooth and fast flow is performing work in small batch sizes. Prior to the lean manufacturing revolution, it was common practice to manufacture in large batches.

However, large batch sizes result in skyrocketing levels of WIP. According to lean principles, the ideal is single-piece flow, where the batch size is just one.

Let’s take an example:

Suppose we have ten brochures to mail and mailing each one of them requires 4 steps:

  1. fold the paper
  2. insert the paper into the envelope
  3. seal the envelope
  4. stamp the envelope

In the traditional batch processing flow, we perform each step for all ten envelopes before moving on to the next step.

In the lean one-piece flow, only one envelope can be at any given step. In other words, we fold the paper, insert it into the envelope, seal the envelope and stamp it before starting the next one.

How is one-piece flow dramatically better?

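In the above example, suppose each step takes 10 seconds. In batch processing, we get our first complete envelope after 310 seconds (three full steps of 100 seconds each, plus 10 seconds to stamp the first envelope), but with one-piece flow we get it after just 40 seconds.

Worse, what if we find that the way we folded the paper doesn’t allow the envelope to be sealed? With batch processing we discover this only after folding all ten brochures, whereas with one-piece flow we discover it on the very first one.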
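The numbers in the envelope example can be checked with a short calculation. This is only a sketch that assumes a single worker and the same duration for every step; the function names are made up for illustration.

```python
def first_done_batch(items, steps, seconds_per_step):
    """Time until the first finished item when each step is done for all items."""
    # Every step except the last is completed for the whole batch first,
    # then the last step is applied to the first item.
    return (steps - 1) * items * seconds_per_step + seconds_per_step


def first_done_one_piece(steps, seconds_per_step):
    """Time until the first finished item when items go through one at a time."""
    return steps * seconds_per_step


def total_time(items, steps, seconds_per_step):
    """Total work is the same in both flows for a single worker."""
    return items * steps * seconds_per_step


print(first_done_batch(10, 4, 10))    # 310
print(first_done_one_piece(4, 10))    # 40
print(total_time(10, 4, 10))          # 400
```

The total time is identical in both flows; what changes is how soon the first result (and the first piece of feedback) arrives.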

Eliminating hardships and wastes in the value stream

According to the Toyota Production System pioneer Shigeo Shingo, waste is:

The use of any material or resource beyond what the customer requires or is willing to pay for

In the software development value stream, waste is anything that causes a delay for the customer, such as activities that can be bypassed without affecting the result.

The following are some common categories of waste that we encounter when implementing lean in a software value stream.

  1. Partially done work
  2. Extra processes
  3. Extra features
  4. Task switching
  5. Waiting on QA or testing or acceptance testing
  6. Defects and bugs
  7. Non-standard or manual work

Each of the above points deserves a post of its own. I will write those soon.

Conclusion

Improving flow through the technology value stream is essential to achieving DevOps outcomes. We do this by making work visible, limiting WIP, reducing batch sizes and eliminating wastes from our processes.

All of this will allow us to become more agile and will help reduce lead times dramatically, while at the same time increasing the quality of releases.

That’s all, folks!

 

7 Tips On Making Your Engineering Workflow Faster

One of the things I like most about pair programming with other awesome engineers is that you get to see their workflows. How do they get things done? How do they find their way around tools, terminals, and editors?

After witnessing and getting awestruck by many such experiences, I realized that having an effective workflow can increase your day-to-day productivity manyfold.

The next step was to take action, and while doing so, I have compiled a few tips to make your engineering workflow faster too.

1. Identify waste:

According to the Toyota Principles:

A waste is any hardship and drudgery that doesn’t align itself with what the customer requires and is willing to pay for.

The first step towards developing a ninja workflow is to do an audit and identify waste.

Identifying waste is the first step towards eliminating it.

Ex. My workflow had many wastes, like not automating enough, not harnessing the power of IDEs, waiting for the large codebase to reload, etc.

2. Get proficient with your IDE:

Today’s IDEs are packed with power. To develop an awesome workflow, we must learn how to harness it.

I use PyCharm for my day-to-day work and have seen a considerable increase in efficiency since I took steps to become proficient with it.

Features like:

  • Custom live templates
  • Smarter code navigation
  • Custom keyboard shortcuts
  • Debugger
  • Distraction free mode
  • Plugins

All make you a power user. It’s worth investing time given the return on investment.

3. Get familiar with Unix shell commands:

Getting familiar with Unix shell commands is a game changer. First, it makes you look smart and second it helps you automate stuff.

I noticed that I used to do a set of repeated tasks every day when I started working.

  1. Log in to the office WiFi network
  2. Open the browser
  3. Open Slack in a tab
  4. Open mail in a tab
  5. Start the IDE
  6. Activate the virtual environment
  7. Check out the VCS repository, etc.

And now all of it looks like this:

~$ startwork

You get the idea. Invest some time and become a Unix power user.
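The same idea can be sketched in Python for those who prefer it over shell. Everything specific here is an assumption made for illustration: the URLs, the repository path, and the choice of `git pull` as the “check out” step. By default the function only lists the steps, so it is safe to try.

```python
import os
import subprocess
import webbrowser

# Hypothetical values; replace them with your own.
TABS = ["https://app.slack.com", "https://mail.google.com"]
REPO_DIR = os.path.expanduser("~/work/project")


def startwork(dry_run=True):
    """Run the morning routine; with dry_run=True only report the steps."""
    steps = []
    for url in TABS:
        steps.append(f"open {url}")
        if not dry_run:
            webbrowser.open_new_tab(url)
    steps.append(f"git pull in {REPO_DIR}")
    if not dry_run:
        # check=True makes a failed pull stop the routine loudly.
        subprocess.run(["git", "pull"], cwd=REPO_DIR, check=True)
    return steps


if __name__ == "__main__":
    for step in startwork(dry_run=True):
        print(step)
```

Saved as an executable script on the PATH, something like this could become the single `startwork` command.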

4. Automate your manual workflows:

Developing the skills to automate takes time, whether you are using shell scripts, browser extensions, or little code snippets.

Investing time in automating workflows is a high-leverage activity.

Ex. Manually following a certain flow of the app every time you want to test something out can take 2-3 mins.

Can we automate this and do it in 2-3 seconds?

How much time will it save if we do this 25-30 times on average? Roughly 50-90 minutes of manual clicking shrinks to 50-90 seconds.

Ex. Or automating generating search tags for blog posts 😛

5. Prefer keyboard over mouse:

We will all agree: using the mouse is slow. Using the keyboard instead of the mouse cuts the time it takes to perform actions manyfold.

An action that was buried under 3 sub-menus can now be performed just by pressing a key combination.

Personally, for this, I would highly recommend a plugin for JetBrains IDEs called Key Promoter. It’ll help big time in getting over our mouse addiction.

6. Learn at least one productive high-level language:

Getting things done in a language like Python is way faster than in something like, say, Scala.

Learning at least one high-level language allows us to quickly test out ideas and implement them.

No more friction of writing a 20-line class just to test out an API call to a service.

Move fast and test out ideas on an interactive interpreter instead of compiling code files.

7. Make it easy and fast to run the unit tests associated with just your current changes:

Running the entire test suite or even the test suite of the module you touched can be time-consuming. Life is too short for that.

To quickly validate things, make it super easy and fast to run the unit tests of just your current changes.

Personally for this, I use the copy reference feature of PyCharm a lot, and obviously, I use the keyboard shortcuts to do so.

Conclusion

With this post, I wanted to share some ways in which I’ve been consciously working to make my engineering workflow as efficient as possible. There is still a lot of room for improvement, but hopefully this article will be of at least a little help.

That’s all, folks!

 

Fail Fast: Hone Your Ability to Recover and Respond Quickly

It’s close to midnight and you are about to wrap up your day. Suddenly you get paged to resolve a critical bug that’s causing some of the automated reporting emails to fail.

You go on to check the logs in the log management tool. This is not the ideal time to find out that logs are not getting streamed to the log management service properly.

Next, you decide to check the performance metrics of the email API and you realize that you don’t know the new monitoring tool well enough to get the right metrics quickly.

That sets the theme for why, as effective engineers, we should fail fast and hone our abilities to recover and respond quickly to failures.

Another post I wrote on failing fast:

https://priyankvex.wordpress.com/2017/07/08/philosophy-behind-the-offensive-programming/

“The best defense against major unexpected failures is to fail often.”

Netflix knows its way around when we talk about creating reliable systems. What engineers at Netflix have done may sound counter-intuitive, but they have made a tool called Chaos Monkey. It randomly kills services in their own infrastructure.

It turns out that this strategy helps Netflix increase the site’s reliability. Failing services during office hours, when all the engineers are available, helps them perform recovery drills effectively and prepares them well for actual emergencies.

Why is it so important to prepare for failures?

As software engineers, our systems are bound to fail at some point and some releases certainly will have some bugs. In such scenarios, learning and investing time in the ability to recover quickly becomes a high leverage activity. It gives you the confidence to move fast with your product having peace of mind that you are ready to tackle problems if they arise.

A few reasons to invest time in practicing recovery from failures:

  1. Prepares the team to write scripts for success via mock drills.
  2. Surfaces gaping holes in the systems used for monitoring and debugging.
  3. Helps develop better tools and processes to handle emergencies.
  4. Helps control stress and panic in the cases of actual failures.

Write your contingency plans

Ask yourself “what if” questions and work through contingency plans:

  1. What if a critical bug gets deployed with a release?
  2. What if a user raises an urgent ticket?
  3. What if my message broker goes down?
  4. What if my systems face a spike in usage?

This can be applied to other aspects of software engineering too:

  1. What if the due date for a feature gets moved up?
  2. What if a critical team member falls sick?
  3. What if there is a dilemma in the product plan and prioritization?

Conclusion

No matter how careful we are and what we are working on, things will go wrong some of the time.

The better our tools and processes for recovering quickly from failures, and the more we practice using them, the higher our confidence and the lower our stress levels will be. This allows us to move forward much more quickly.

That’s all, folks!

 

Philosophy Behind The Offensive Programming

Recently I was listening to a podcast, and there was this really smart guy, Piwai, talking about something that instantly captivated my attention. It was the term Offensive Programming.

What is offensive programming?

Well, you can find the literature on Wikipedia, and I am not the best person to explain it, so please check that out. But fundamentally, offensive programming refers to a style of programming that is the exact opposite of its more famous counterpart, defensive programming.

Defensive programming refers to a coding style that deals gracefully with conditions that should not happen.

Offensive programming, on the other hand, just tells you to let the app crash. Don’t try to recover, don’t try to handle the exception, just log the stack trace and crash.

The reason behind this is that the real problem may be much bigger and located somewhere else in the code, and the error you are seeing is only a side effect of it. Crashing forces you to fix the problem at the source and will possibly result in a healthier code base.

When does it make sense to be offensive?

This was my exact concern while I was listening to this podcast. Thankfully, Piwai answered that himself. I also talked about it with a really smart guy at the office, and he made the same remarks.

So at Square (the company that does payments and authors libraries), what they do is stick to a defensive style of programming for the parts of the code that deal with external interfaces and/or user interactions. Basically, anything that is not in your control.

But for the internal interfaces, where the classes you wrote are going to interact with each other, you don’t have to be that paranoid. This is where he (Piwai) said you should switch to the offensive approach. You have full control over the classes you wrote, and the expected behaviour is in your control. If a class fails to behave as expected, it’s better to just crash and let the problem be fixed at the source.

That is exactly why, he said, they make very liberal use of assertions in the code at Square. Assertions are not forgiving at all.

Example Please!

I will attempt two examples here: one that Piwai himself talked about very briefly, and one that I’ve encountered myself where I thought it made sense.

In this example, say we are handling credit card objects. There is no point in internally validating the credit card object every time you deal with it.

As soon as we get a credit card, we wrap it in a validated credit card. That’s all the defensiveness we had to offer.

Now internally, we go offensive and throw exceptions or fail assertions every time we encounter an invalid credit card object.

The code below is not perfect, but can give you an idea.

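// Assumes CreditCard, CreditCardValidationError, CardInvalidException and the
// helpers getCreditCardFromUser() and tryToFixTheCardDetails() exist elsewhere.
class ValidatedCreditCard extends CreditCard {

    private final CreditCard creditCard;

    ValidatedCreditCard(CreditCard creditCard) {
        // Handle external user input defensively.
        try {
            creditCard.validate();
        } catch (CreditCardValidationError e) {
            // Handle the error and try to fix the card details.
            tryToFixTheCardDetails();
        }
        this.creditCard = creditCard;
    }
}

class Main {

    public static void main(String[] args) {
        CreditCard c = getCreditCardFromUser();
        // Wrap the card once, at the boundary with the outside world.
        c = new ValidatedCreditCard(c);
        // Time to go offensive
        // ...
        if (c == null) {
            throw new CardInvalidException();
        }
    }
}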

Another example I can think of is much simpler and more relatable.
Suppose we have a utility function that uploads a file to S3.
It would make sense to follow the offensive programming style and just throw an exception if somehow the file or the key reaching the function is None.

def upload_file_to_s3(file, key):
    # Go offensive: a missing file or key is a bug in the caller, so crash loudly.
    if file is None or key is None:
        raise TypeError("file and key must not be None")
    # ... the actual upload logic would go here

 

Few more tips from the podcast

1. How to start with offensive programming?

The best way is to start putting assertions in the code wherever you think they are suitable. Yeah, we’ll experience more crashes, and that’s awesome!

Because now we know that we have a problem.

2.  We feel more confident about the code base:

We just know that this method doesn’t try to handle nulls, so we can confidently say that the value was not null, or it would’ve crashed.

3. Do incremental rollouts:

When you ship code, roll it out to, say, 1% of users. We’ll have a ton of crash reports, and that’s good! I mean, not for the 1% of users, but they are taking one for the team!

4. Crash at preventable errors and recover from expectable errors:

Preventable errors are invalid arguments, NPEs etc. Go offensive on these.

Expectable errors are like resource depletion, invalid user inputs etc.

Try to recover from these.
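Here is a small sketch of that split in Python. Both functions and their names are made up for illustration: `parse_percent` sits at the boundary and recovers from bad user input, while `apply_discount` is internal and simply crashes on arguments that should never occur.

```python
def apply_discount(price_cents, percent):
    """Internal helper: only our own code calls this, so go offensive."""
    assert isinstance(price_cents, int) and price_cents >= 0, "invalid price_cents"
    assert 0 <= percent <= 100, "percent must be between 0 and 100"
    return price_cents - (price_cents * percent) // 100


def parse_percent(raw_text, default=0):
    """Boundary helper: raw_text comes from a user, so recover instead of crashing."""
    try:
        value = int(raw_text.strip())
    except (AttributeError, ValueError):
        return default  # Not a string, or not a number: fall back.
    if not 0 <= value <= 100:
        return default  # Out of range: fall back as well.
    return value


print(apply_discount(1000, parse_percent("15")))   # 850
print(apply_discount(1000, parse_percent("abc")))  # 1000

try:
    apply_discount(1000, 150)  # A bug in our own code: crash right here.
except AssertionError as error:
    print("crashed:", error)
```

Note that Python removes `assert` statements when run with the `-O` flag, so for checks that must always run you may prefer raising an exception explicitly.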

 

Overall, it was nice to listen to a guy who works at a company like Square talking about how they use offensive programming for a healthier code base. And if Square is doing something, we all can learn something from that!