Multi-Core Processors Be A Foundation For Cloud-Native Scale-out?

Despite recent advancements and improved parallelism in multi-core CPU performance, there is still a big challenge to be solved. Even with these shifts, the performance impact with respect to Linux applications remains far below the curve of linearity as the core count increases. Typical Linux applications can be expected to see a 1.5X improvement in performance for a 2-core CPU, but the scale quickly plateaus after that, with 4 cores yielding an improvement of around 2.5X. The performance further degrades as core counts rise.

At the heart of this rapid plateauing lies the fact that both the process (or processes) and related I/O devices require core resources. Unfortunately, the likelihood that they both execute on the same core decreases as the core count goes up—resulting in excessive inter-core communications, data copying and other operating system overheads.

From a hardware perspective, each processor core (including its associated L1 and L2 caches) is largely an independent computing unit, fully capable of executing compute tasks in parallel. That is not the issue; instead, the root cause is in how applications are architected to facilitate parallel computing and how the underlying operating system gains insight into application processes for tasks that can be executed in parallel without incurring a significant amount of shared-resource-locking and data-copying overheads. In a multi-core environment, this insight is critical in order to reduce or eliminate inter-core overheads so that the effective performance of each core is not overwhelmed.

That all sounds good, but what about all of the existing Linux applications that have been written? Can these applications somehow leverage the available CPU cores as parallel resources without any application changes?  Making that possible would seem like magic!

To find a solution, it pays to consider the experience of modern day, large scale cloud-native web companies such as Facebook and Uber. They have gone through some noticeable architectural shifts, partly due to the pressing need to develop applications that can perform and scale out horizontally, agnostic of hypervisors, containers or operating systems. The result has been the rise of new, cloud-native applications that feature a clear separation of control and data-path functions, and the ability to spawn new worker processes on demand. Moreover, each worker process is often designed to run to completion in order to simplify complex, inter-process, inter-thread locking, and to achieve a better correlation between how applications are expected to execute in parallel and how the underlying infrastructure can adapt best to application needs in real time.

Remember how computer instructions were growing in complexity until Reduced Instruction Set Computers (RISC) came to the rescue?  It feels like there is an incarnation of that simplicity mindset in applications today. Parallel computers are not new; they have long existed, but these were purpose-built proprietary computers with proprietary architecture and special MPI message passing protocols that are not designed for the masses.  We don’t want that.

There are two additional key questions that we need to ponder.

First, can an application-aware parallel computing framework be constructed that is software defined, with the underlying servers, storage and networks already moving toward that?

Second, are today’s commodity multi-core processors and I/O virtualization framework sufficiently parallel from a hardware and infrastructure perspective that a software defined application aware parallel computing framework is possible?

That is a multi-billion-dollar question.


Blog post by: Cheng Wu