WebTorch: A Load Balancer That Learns

August 4, 2021

In my previous blog post, “How I Stopped Worrying and Embraced Docker Microservices,” I talked about why microservices are the bee’s knees for scaling Machine Learning in production. A fair amount of time has passed since then (almost a year, whoa), and it has shown that building Deep Learning pipelines in production is a more complex, multi-faceted problem. Yes, microservices are a fantastic tool for software reuse, distributed systems design, quick failure and recovery, yadda yadda yadda. But what seems very obvious now is that Machine Learning services are very stateful, and statefulness is a problem for horizontal scaling.

Context switching latency

An easy way to deal with this issue is to understand that ML models are large and thus should not be context switched, that is, unloaded from one instance and reloaded on another. If a model is started on instance A, you should try to keep it on instance A as long as possible. Nginx Plus comes with support for sticky sessions, which means that requests from the same session can always be routed to the same upstream, a super useful feature. That was 30% of the message of my Nginxconf 2017 talk.
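
To make the sticky idea concrete, here is a minimal sketch in Python. It is not how Nginx Plus implements sticky sessions; it only shows the principle of deterministically mapping a key to one upstream so that a loaded model stays put. The upstream names and the `pick_upstream` helper are hypothetical, and the sketch assumes every request carries some session or model key.

```python
import hashlib

# Hypothetical upstream names; a real deployment would list its own instances.
UPSTREAMS = ["ml-a:8000", "ml-b:8000", "ml-c:8000"]


def pick_upstream(session_key, upstreams=UPSTREAMS):
    """Map a session (or model) key to one upstream, deterministically."""
    digest = hashlib.sha256(session_key.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer and wrap it to the pool size.
    index = int.from_bytes(digest[:8], "big") % len(upstreams)
    return upstreams[index]


if __name__ == "__main__":
    # The same key always lands on the same upstream.
    for key in ["model-42", "model-42", "model-7"]:
        print(key, "->", pick_upstream(key))
```

Note that plain modulo hashing reshuffles most keys whenever the pool size changes, so a real setup would probably want something more stable, such as consistent hashing or a cookie-based scheme.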

The other 70% of my message was urging people to move AWAY from microservices for Machine Learning. In an extreme example, we announced WebTorch, a full-on Deep Learning stack on top of an HTTP server, running as a single program. For your reference, a Deep Learning stack looks like this.

Pipeline required for Deep Learning in production. — What is this data > why is it so dirty > alright, now it’s clean, but my Neural net still doesn’t get it > finally, it gets it!

Now consider the two extremes in implementing this pipeline:

Every stage is a microservice.
The whole thing is one service.

Both seem equally terrible for different reasons, and here I will explain why designing an ML pipeline is a zero-sum problem: whatever you gain at one extreme, you pay for at the other.

Communication latency

If every stage of the pipeline is a microservice, you introduce a huge communication overhead between services. This is because the very large data frames that are passed between services also need to be:

Serialized
Compressed (+ Encrypted)
Queued
Transferred
Dequeued
Decompressed (+ Decrypted)
Deserialized

What a pain. What a terrible thing to spend cycles on. All of these actions need to be repeated every time a microservice boundary is crossed. The horror, the terrible end-to-end performance horror!
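
To get a feel for that cost, the sketch below times the CPU-bound steps from the list for a single hop, using only the Python standard library. It is an illustration under assumptions, not a benchmark of any real pipeline: `pickle` and `zlib` stand in for whatever serializer and compressor your services use, the `cross_boundary` helper and the frame size are made up, and the queueing, transfer, and encryption steps are left out.

```python
import pickle
import random
import time
import zlib


def cross_boundary(frame):
    """Simulate one hop: serialize and compress, then undo both on the far side."""
    timings = {}

    start = time.perf_counter()
    payload = pickle.dumps(frame)
    timings["serialize"] = time.perf_counter() - start

    start = time.perf_counter()
    wire = zlib.compress(payload)
    timings["compress"] = time.perf_counter() - start

    # Queueing, transfer, and dequeueing would happen here; omitted in this sketch.

    start = time.perf_counter()
    payload_back = zlib.decompress(wire)
    timings["decompress"] = time.perf_counter() - start

    start = time.perf_counter()
    frame_back = pickle.loads(payload_back)
    timings["deserialize"] = time.perf_counter() - start

    return frame_back, timings


if __name__ == "__main__":
    rng = random.Random(0)
    # A made-up 1000 x 100 frame of floats standing in for a large data frame.
    frame = [[rng.random() for _ in range(100)] for _ in range(1000)]
    frame_back, timings = cross_boundary(frame)
    assert frame_back == frame  # the round trip must not change the data
    for step, seconds in timings.items():
        print(f"{step:>12}: {seconds * 1000:.2f} ms")
```

Multiply whatever numbers you see by the number of boundaries in your pipeline, and then add the network on top.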

In the opposite case, you’re writing a monolith that is hard to maintain. You’re probably using uncomfortable semantics for either the HTTP server or the ML part, you can’t monitor the in-between stages, and so on. Like I said, writing an ML pipeline for production is a zero-sum problem.

An extreme example: All-in-one deep learning

Venn diagram of torch, nginx — Torch and Nginx have one thing in common, the amazing LuaJIT

That’s right. You’ll need to look at your use case and decide where you draw the line. Where does the HTTP server stop, and where does the ML back-end start? If only there were a tool that made this decision easy and even allowed you to go to the extreme case of writing a monolith, without sacrificing either HTTP performance (and pretty HTTP server semantics) or ML performance and relevance in the rapidly growing Deep Learning market. Now such a tool is here (in alpha), and it’s called WebTorch.

WebTorch is the freak child of the fastest, most stable HTTP server, nginx, and the fastest, most relevant Deep Learning framework, Torch.

Now, of course, that doesn’t mean WebTorch is the best-performing HTTP server or the best-performing Deep Learning framework, but it’s at least worth a look, right? So I ran some benchmarks. I loaded the XOR neural network found on the Torch training page. Next, I used wrk, a popular HTTP benchmarking tool that is scriptable in Lua, to benchmark my server. I’m sending serialized Torch 2D DoubleTensor tensors to my server using POST requests to train. Here are the results:

Huzzah! Over 1000 req/sec on my MacBook Air, with no CUDA support and 2 Intel cores!

So there, plug that into a CUDA machine and see how much performance you squeeze out of that bad baby. I hope I have convinced you that sometimes, mixing two great things CAN lead to something great and that WebTorch is an ambitious and interesting open-source project!
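
If you have never seen the XOR network, here is roughly what each training request in the benchmark above has to do. This is a pure-Python stand-in, not Torch code and not WebTorch's implementation: the `TinyNet` class, the layer sizes (4 tanh hidden units, one sigmoid output), the learning rate, and the epoch count are all assumptions picked for the sketch.

```python
import math
import random

# The four XOR input/target pairs.
XOR_DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNet:
    """A 2 -> hidden -> 1 network: tanh hidden layer, sigmoid output."""

    def __init__(self, hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(w[0] * x[0] + w[1] * x[1] + b)
             for w, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x, target, lr=0.5):
        """One stochastic gradient step on the loss 0.5 * (y - target)^2."""
        h, y = self.forward(x)
        delta_out = (y - target) * y * (1.0 - y)
        for j, hj in enumerate(h):
            # Backpropagate through the old output weight before updating it.
            delta_hidden = delta_out * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * delta_out * hj
            self.w1[j][0] -= lr * delta_hidden * x[0]
            self.w1[j][1] -= lr * delta_hidden * x[1]
            self.b1[j] -= lr * delta_hidden
        self.b2 -= lr * delta_out

    def loss(self):
        """Mean squared error over the four XOR cases."""
        return sum((self.forward(x)[1] - t) ** 2 for x, t in XOR_DATA) / len(XOR_DATA)


def train(net, epochs=2000):
    for _ in range(epochs):
        for x, target in XOR_DATA:
            net.train_step(x, target)


if __name__ == "__main__":
    net = TinyNet()
    print("loss before:", net.loss())
    train(net)
    print("loss after: ", net.loss())
    for x, _ in XOR_DATA:
        print(x, "->", round(net.forward(x)[1], 3))
```

In the benchmark, each POST carries the tensors for a step like `train_step`, so the req/sec figure includes both HTTP handling and a small amount of training work.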

And hopefully, in due time, it will become a fast, production-level server, making it easy for Data Scientists to deploy their models in the cloud (do people still say cloud?) and for DevOps people to deploy and scale them.

Possible applications of such a tool include, but are not limited to:

Classification of streaming data
Adaptive load balancing
DDoS attack/intrusion detection
Detect and adapt to upstream failures
Train and serve NNs
Use cuDNN, cuNN, and cuTorch inside NGINX
Write GPGPU code on NGINX
Machine learning NGINX plugins
Easily serve GPGPU code
Rapid prototyping Deep Learning solutions

Maybe your own?
