Extreme Fast Delivery Without Creating Tech Debt: An Agile Case Study

Itay Waxman
Published in Riskified Tech
6 min read · May 29, 2022

Does your engineering team ever receive large feature requests that MUST happen fast? You’re already envisioning how you’ll need to drop all your team’s tech standards to comply with the business requirements on time. Wouldn’t it be lovely to have tools in your team’s arsenal to offer a plan to deliver fast, without compromising on a high-quality technical solution?

In this blog post, I’ll showcase how my entire development team at Riskified was able to work in parallel on the same big feature while minimizing the friction of stepping on each other’s toes, resulting in impressively fast delivery and a well-architected solution.

Showcase feature

My team’s mission is to protect eCommerce consumers from account takeover attacks. To detect and block such attacks, our system employs machine learning to analyze login events in real time. To keep our models accurate, we retrain them periodically, which requires an up-to-date training dataset. So, we have an internal web application that allows our analysts to analyze a login attempt and tag whether or not it was part of an attack. Many logins are tagged and gathered into a training dataset.

Recently we observed a new kind of attack, which involves high-velocity bots that our model wasn’t able to identify accurately enough. We didn’t want to leave our customers exposed, and wanted to very quickly upgrade our system capabilities to protect them from this new breed of attacks. To do so, we needed to retrain our models, requiring a new training dataset that includes labeled login examples of the new attack.

Unfortunately, our tagging tool didn’t display enough data points for our analysts to determine whether a login was a bot attack. A key attribute of this attack was that many of its login attempts shared elements linking them together. So we wanted to enhance our tagging tool to display all the linked login attempts for a given login.
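To illustrate the linking idea, here is a minimal Python sketch. The record fields and linking elements (`email`, `device_id`) are assumptions for illustration, not Riskified’s actual schema; the premise is simply that two logins are "linked" when they share at least one predefined element:

```python
# Hypothetical login records; field names are illustrative only.
logins = [
    {"id": "a1", "email": "x@example.com", "device_id": "d1"},
    {"id": "a2", "email": "x@example.com", "device_id": "d2"},
    {"id": "a3", "email": "y@example.com", "device_id": "d2"},
    {"id": "a4", "email": "z@example.com", "device_id": "d9"},
]

# Predefined elements that link login attempts together (assumed).
LINKING_ELEMENTS = ("email", "device_id")

def linked_logins(login_id, logins):
    """Return IDs of logins sharing at least one linking element with the given login."""
    by_id = {login["id"]: login for login in logins}
    target = by_id[login_id]
    linked = set()
    for other in logins:
        if other["id"] == login_id:
            continue
        if any(other[e] == target[e] for e in LINKING_ELEMENTS):
            linked.add(other["id"])
    return linked

print(linked_logins("a2", logins))  # a1 shares the email, a3 shares the device
```

In production this lookup runs over precomputed linking results rather than a scan, which is exactly why the batch processing job and serving API described below were needed.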

Planning and execution

With urgency in mind, we needed an execution plan that would support fast delivery.
These are the two main considerations we took into account:

  • Supporting massive amounts of data
    We have millions upon millions of login event records in our system, and the scale will only grow. Given a login, we need to provide all the linked logins by predefined elements (like email or device). After asking the right questions, we understood what kind of data store, data transformations, and refresh rate we were aiming for.
  • Avoiding technical shortcuts
    The internal web application will be extended substantially in the future, and will eventually even be available to external users. Therefore, we wanted to avoid creating technical shortcuts as part of the solution.

We recognized that in addition to the new web application section (visualizing the linked logins), we’d need to introduce two new components: a large-scale batch data processing job and a new API service to serve the linking results.

To meet the requirements within the timeline, we used the following methodologies:

All hands on deck

We aimed for all team members to take part in the development. At the time, my team consisted of five software engineers. Simply assigning all your team members to a task doesn’t make it finish faster; as the saying goes, “nine women can’t make a baby in one month.” To overcome the downsides, we put several practices in place, which are discussed in the following sections.

Identifying interfaces

We broke the entire feature down into small stories and mapped the dependencies between them. For every component integration, we agreed on a contract and created a task to set up a mock API. Usually we prefer one developer to take a story end-to-end, but to allow parallel development, we created mock APIs so that consumers could integrate against an API before it was implemented.

This way, one developer can work on the API consumer (for example, the application UI) while another developer implements the API itself (for example, fetching data from the DB and mapping it to the API response).
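A minimal Python sketch of this pattern, with assumed names throughout (`LinkedLogin`, `LinkedLoginsApi`, and the field names are illustrative, not our actual contract): both developers code against the agreed contract, and the mock stands in until the real implementation lands.

```python
from dataclasses import dataclass
from typing import List, Protocol

# The agreed contract: response shape and interface (illustrative names).
@dataclass
class LinkedLogin:
    login_id: str
    linking_element: str  # e.g. "email" or "device"

class LinkedLoginsApi(Protocol):
    def get_linked_logins(self, login_id: str) -> List[LinkedLogin]: ...

# Mock implementation, created first so the consumer can integrate immediately.
class MockLinkedLoginsApi:
    def get_linked_logins(self, login_id: str) -> List[LinkedLogin]:
        return [
            LinkedLogin(login_id="mock-1", linking_element="email"),
            LinkedLogin(login_id="mock-2", linking_element="device"),
        ]

# The consumer (e.g. the UI side) depends only on the contract, so swapping
# in the real service later requires no consumer changes.
def render_linked_logins(api: LinkedLoginsApi, login_id: str) -> List[str]:
    return [f"{l.login_id} ({l.linking_element})" for l in api.get_linked_logins(login_id)]

print(render_linked_logins(MockLinkedLoginsApi(), "some-login"))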

Visualize the dependency graph

After we broke the feature into small stories and mapped all the dependencies, we created a visualization of the development path: essentially a directed graph of the stories. This was the result (simplified):

The goal of this visualization was to understand how many developers could work in parallel, on what, and which tasks were bottlenecks. From the above graph, we identified that the beginning of development could be parallelized across three developers, and that the maximum capacity for this feature was five developers working in parallel.
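This kind of analysis can even be automated. The sketch below (Python 3.9+, using the standard-library `graphlib`; the story names and dependencies are made up for illustration, not our actual backlog) walks the dependency graph in topological "waves", where each wave is a set of stories that can be developed in parallel:

```python
from graphlib import TopologicalSorter

# Hypothetical stories mapped to the stories they depend on.
deps = {
    "data store: provisioning": set(),
    "batch job: extract links": set(),
    "api: mock + contract": set(),
    "ui: linked-logins section": {"api: mock + contract"},
    "api: real implementation": {"api: mock + contract",
                                 "batch job: extract links",
                                 "data store: provisioning"},
    "integration: ui + real api": {"ui: linked-logins section",
                                   "api: real implementation"},
}

# Process the graph wave by wave: everything "ready" at once is parallelizable.
ts = TopologicalSorter(deps)
ts.prepare()
levels = []
while ts.is_active():
    ready = sorted(ts.get_ready())
    levels.append(ready)
    ts.done(*ready)

for i, level in enumerate(levels, 1):
    print(f"wave {i}: {len(level)} parallel task(s): {level}")
```

The size of the largest wave is the maximum number of developers who can usefully work in parallel, and single-story waves are the bottlenecks.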

Within the Sprint, all developers took part and self-assigned tickets by priority, which resulted in this:

Team Communication

Since many developers worked on this feature, close communication was crucial. Developers knew who was working on tasks in close proximity to their own and verified that the contract between their tasks still stood. Once a mock API pull request was created, both parties performed a code review. Their approval also meant they acknowledged the upcoming integration between their tasks.
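One lightweight way to keep such a contract honest on both sides is a shared shape check that the mock’s author and the real service’s author can each run. This is a hypothetical sketch (the field names are assumptions, not our actual contract), not a description of our tooling:

```python
# Fields agreed on in the contract (illustrative names).
EXPECTED_FIELDS = {"login_id", "linking_element"}

def check_contract(response):
    """Fail if any item in an API response deviates from the agreed shape."""
    if not isinstance(response, list):
        raise AssertionError("response must be a list")
    for item in response:
        if set(item) != EXPECTED_FIELDS:
            raise AssertionError(f"unexpected fields: {set(item)}")

# Both parties run the same check: against the mock now, against the real
# service once it exists, so drift is caught before integration day.
check_contract([{"login_id": "mock-1", "linking_element": "email"}])
```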

Benefits

“Extreme” fast delivery

Usually, large features with many dependent parts can easily cause idle time and context switching for developers, resulting in later delivery than could have been achieved. We delivered the feature within one Sprint (two weeks), which stakeholders found impressive. Analysts could begin using it to produce training datasets almost immediately, enabling fast model retraining and improving our detection of account takeover attacks.

Bus factor = team size

In short, the “bus factor” is the minimum number of team members who would have to suddenly disappear from a project before it stalls due to a lack of knowledgeable personnel. In our case, everyone on the team was involved in the planning and execution. More than that, they were hyper-aware of their peers’ tasks, since their own work relied on them. As a result, everyone was knowledgeable about the feature and could easily conduct future work around it.

Another tool in the arsenal

The project management triangle can be summarized as “Good, fast, cheap. Choose two.” In our scenario, quality and delivery time were fixed, so we concentrated all our developer resources, which is the ‘expensive’ part. Under those constraints, the team demonstrated “extreme” fast-delivery techniques that can be reused whenever business circumstances require it.

Enhanced collaboration

This kind of hyper-teamwork really improved internal communication and trust between team members. The experience a team gains by working this closely together makes future execution sharper. I would argue that a team’s maturity can be accelerated by taking the approach discussed above to increase collaboration.

Some caution

Or, why you shouldn’t always put all hands on a feature: the fastest delivery is usually not the most efficient use of resources.
It could easily be argued that we invested more developer-days than necessary, and that a later time to market would have been more efficient. One or two developers could have taken the feature end-to-end with fewer accumulated developer-days. Working with so many dependencies means developers can be blocked frequently, extra checkpoint meetings are required, and integration between the different parts can go wrong.

Wrapping up

In this showcase, the business circumstances required the earliest possible time to market while still building for future extensions. Taking technical shortcuts was not an option, so we chose to use as much of the entire team’s development capacity as we could to deliver fast. We used the methods above to make the process as frictionless as possible and to maintain a good development experience. The result was very satisfying for both the developers and the stakeholders: we managed to deliver fast and at high quality.

This blog post is a retrospective on a project we worked on in the past. Since then, as a team, we have become very comfortable working closely together on projects, and we favor fast delivery over splitting our resources across different projects in parallel.
