
Why Do You Need Competitive Programming and Mathematics in Big Organizations?

By Ahmad Alhusein
Last updated July 21, 2025

You've probably heard statements like “Mathematics isn't important for backend engineers” or, even worse, “Competitive programming has no real-world applications.”

These assumptions aren't entirely baseless. In fact, they might hold true in early-stage startups, where user traffic is low, and rapid feature delivery is prioritized over system scalability or long-term maintainability. However, once a startup begins to scale and serve large numbers of users, these assumptions fall apart. At that point, every inefficient line of code becomes technical debt — and you'll regret not thinking in terms of performance and scalability from the start.

But let's directly address the question: Why is it important to have a strong background in mathematics and competitive programming in large-scale enterprises (and fast-growing startups)?

(I use mathematician/competitive programmer interchangeably here, as these two skill sets are often strongly correlated.)

A Real Example from a Large Tech Company

Let's walk through a real project I worked on at a large tech company — an observability platform used to monitor application performance by collecting traces, logs, and metrics. We'll focus on metrics.

What are metrics?

A metric is a labeled value representing a system state. For example:

cpu_usage { env="production", host="example.com", group="developers-group" }

Internally, this is serialized into a JSON-like schema and stored in VictoriaMetrics, a scalable time-series database. Our platform collects, processes, and stores multiple terabytes of data per minute from thousands of hosts.
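
To make the shape of the data concrete, here is a minimal Go sketch of what a single ingested sample might look like once parsed. The type and field names are assumptions for illustration only; the real schema is richer.

```go
// Sample is a hypothetical model of one ingested data point: a metric name,
// its label set, a measured value, and a collection timestamp.
type Sample struct {
	Name      string            `json:"name"`         // e.g. "cpu_usage"
	Labels    map[string]string `json:"labels"`       // env, host, group, ...
	Value     float64           `json:"value"`        // the measured value
	Timestamp int64             `json:"timestamp_ms"` // collection time in ms
}
```

A unique combination of metric name and label values defines one time series; that detail matters for the cardinality discussion below.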

Sounds straightforward? It isn't.

To handle this scale, we needed a distributed architecture — over 200 independent instances running in parallel. These instances couldn't share a central database or coordinator, because doing so would introduce a single point of failure — unacceptable for an observability platform with an SLA of 99.9% uptime.

The Problem: Cardinality Explosion

One major challenge was cardinality explosion — when a client sends too many unique time series in a short span. This can result from configuration errors or internal incidents and is extremely harmful to VictoriaMetrics' performance.

So, we had to enforce limits — specifically, to cap the number of unique time series sent by each client, application, and environment.

Why unique time series? Because VictoriaMetrics handles duplicate data efficiently, and we sometimes intentionally duplicate scraping for high-priority hosts. The concern is uniqueness, not volume.

What is Cardinality?

Cardinality is the number of unique values in a relational table column relative to the total number of rows in the table. In our case, it is the number of unique time series a client or application produces.

Thus, the problem was:

Given N clients, each with M applications, and a stream of labeled metric strings (some of which are duplicates), how do we efficiently count the number of unique metric series per client-application pair?

Naive Approach and Its Pitfalls

An intuitive solution is to use hash sets per client-application pair and merge them using Kafka or RabbitMQ. That might work in a small startup with a few hundred clients.
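
A minimal sketch of that naive approach, with invented identifiers, could look like the following Go snippet: one exact hash set per (client, application) pair, which every instance would maintain locally and then have to merge through the broker.

```go
package main

import "fmt"

// naiveCounter keeps one hash set of full series keys per (client, application)
// pair. Memory grows with the number of unique series, which is exactly what
// explodes during a cardinality spike.
type naiveCounter struct {
	seen map[[2]string]map[string]struct{}
}

func newNaiveCounter() *naiveCounter {
	return &naiveCounter{seen: make(map[[2]string]map[string]struct{})}
}

// Observe records one incoming series key for a client/application pair.
func (c *naiveCounter) Observe(client, app, seriesKey string) {
	pair := [2]string{client, app}
	if c.seen[pair] == nil {
		c.seen[pair] = make(map[string]struct{})
	}
	c.seen[pair][seriesKey] = struct{}{}
}

// Unique returns the exact number of distinct series seen for the pair.
func (c *naiveCounter) Unique(client, app string) int {
	return len(c.seen[[2]string{client, app}])
}

func main() {
	c := newNaiveCounter()
	c.Observe("acme", "billing", `cpu_usage{host="a"}`)
	c.Observe("acme", "billing", `cpu_usage{host="a"}`) // duplicate, ignored
	c.Observe("acme", "billing", `cpu_usage{host="b"}`)
	fmt.Println(c.Unique("acme", "billing")) // 2
}
```

The exactness is the problem: every instance holds every series key it has ever seen, and reconciling those sets across hundreds of instances is what drives the cost below.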

However, at our scale:

  • K = 200+ instances
  • N = thousands of clients
  • M = thousands of applications
  • Q = massive volume of time series

The time complexity of such a solution approaches O(K × N × M × Q). Worse, in case of a cardinality spike, the overhead increases dramatically.

Adding a central orchestrator could mitigate this — but it would become a bottleneck and potential failure point, which we couldn't afford.

A Mathematical (and Smarter) Solution

After weeks of research, we realized we didn't need the exact count of unique series — just an estimate to detect overloads.

Here's how we approached it.

Assume we have a set of randomly distributed integers (or hashed strings). Let's observe the binary representation of these numbers. If you randomly pick a number:

  • The chance its first bit is 0 is $1/2$
  • The chance its first two bits are 00 is $1/4$
  • And so on…

If we record the maximum number of leading zeros $p$ seen across all values, then the number of distinct items is roughly $2^p$. Intuitively, a run of $p$ leading zeros appears with probability $2^{-p}$, so observing one suggests that on the order of $2^p$ distinct values have been hashed.

This is the core idea behind a probabilistic algorithm called HyperLogLog. It estimates cardinality from the bit patterns of hashed values, allowing us to track uniqueness with minimal memory.

To improve accuracy, we divide the data into buckets, say $2^{16}$ buckets (using the first 16 bits of the hash). Each bucket independently estimates cardinality for its subset, and we combine the results.

However, this only works if our input is uniformly distributed.

But our data consist of strings, not random integers.

To solve that, we used non-polynomial hash functions such as xxHash and MurmurHash. Polynomial hashes tend to correlate output bits with input characters, which leads to skew and estimation errors.
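
To make the idea concrete, here is a minimal HyperLogLog-style estimator in Go. It is a sketch under assumptions rather than our production code: it uses $2^{16}$ buckets addressed by the first 16 bits of an xxHash digest, tracks the longest run of leading zeros per bucket, and combines the buckets with the standard harmonic mean (the small-range correction from the original paper is omitted for brevity).

```go
package main

import (
	"fmt"
	"math"
	"math/bits"

	"github.com/cespare/xxhash/v2" // assumed library; any well-mixing 64-bit hash works
)

const bucketBits = 16              // first 16 bits of the hash pick the bucket
const numBuckets = 1 << bucketBits // 2^16 buckets, one byte each

// hll stores, per bucket, the longest run of leading zeros (plus one) seen
// in the remaining bits of the hashes routed to that bucket.
type hll struct {
	maxRank [numBuckets]uint8
}

// Add hashes a series key and updates the corresponding bucket.
func (h *hll) Add(seriesKey string) {
	x := xxhash.Sum64String(seriesKey)
	bucket := x >> (64 - bucketBits)
	// Leading zeros of the remaining bits; OR with 1 keeps the rank bounded.
	rank := uint8(bits.LeadingZeros64(x<<bucketBits|1)) + 1
	if rank > h.maxRank[bucket] {
		h.maxRank[bucket] = rank
	}
}

// Estimate combines the buckets with a harmonic mean, as HyperLogLog does.
func (h *hll) Estimate() float64 {
	sum := 0.0
	for _, r := range h.maxRank {
		sum += math.Pow(2, -float64(r))
	}
	alpha := 0.7213 / (1 + 1.079/float64(numBuckets)) // bias-correction constant
	return alpha * numBuckets * numBuckets / sum
}

func main() {
	var h hll
	for i := 0; i < 1000000; i++ {
		h.Add(fmt.Sprintf(`cpu_usage{host="h%d"}`, i))
	}
	// Should print a value close to 1,000,000, typically within a few percent.
	fmt.Printf("estimated unique series: %.0f\n", h.Estimate())
}
```

Each sketch is only 64 KB regardless of how many series a client sends, and two sketches can be merged by taking the bucket-wise maximum, which is one reason this family of estimators fits a coordinator-free, multi-instance setup.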

Real-World Optimization: Two-Pointer Technique

After deploying our estimation system, we hit a new bottleneck: high memory usage.

Whenever a client exceeded their quota, we deleted their excess time series. The initial implementation created a new array and copied all surviving entries — wasting memory and increasing GC pressure.

To optimize this, we used the two-pointer technique — common in competitive programming:

  • Iterate left to right.
  • Track the "last safe" index.
  • Swap each surviving item forward into the last safe slot.
  • Trim the list in-place.

This saved gigabytes of memory and drastically improved performance.
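
A small Go sketch of that in-place compaction, with hypothetical types and assuming each entry carries a flag marking whether it must be dropped:

```go
package main

import "fmt"

// entry is a stand-in for whatever per-series record the limiter tracks.
type entry struct {
	key       string
	overLimit bool // marked for deletion when the client exceeded its quota
}

// compact removes flagged entries in place with two pointers: `keep` marks the
// next safe slot, `i` scans ahead, survivors are moved forward, and the slice
// is trimmed without allocating a new backing array.
func compact(items []entry) []entry {
	keep := 0
	for i := 0; i < len(items); i++ {
		if !items[i].overLimit {
			items[keep] = items[i]
			keep++
		}
	}
	return items[:keep]
}

func main() {
	items := []entry{
		{key: `cpu_usage{host="a"}`},
		{key: `cpu_usage{host="b"}`, overLimit: true},
		{key: `cpu_usage{host="c"}`},
	}
	items = compact(items)
	fmt.Println(len(items)) // 2
}
```

Because surviving entries are reused in the same backing array, a quota breach no longer triggers a fresh allocation, which is where the GC-pressure savings come from.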

Final Thoughts

There are countless problems in large-scale systems that require algorithmic thinking and mathematical modeling. Our team, which includes many ICPC finalists, consistently applied competitive programming techniques to reduce latency, memory usage, and cost.

Without this mindset, we would have been stuck — unable to implement a scalable, robust rate-limiting mechanism.

Conclusion: Competitive programming and mathematics are not just academic exercises — they are essential tools in the engineering toolkit, especially when building systems at scale.
