Ravi Pandya

ravi@iecommerce.com
www.iecommerce.com
+1 425 417 4180

Ravi Pandya software | nanotechnology | economics

ABOUT ME

Ravi Pandya
Architect
Cloud Computing Futures
Microsoft
ravip at microsoft.com

03- Microsoft

00-02 Covalent

97-00 EverythingOffice

96-97 Jango

93-96 NetManage

89-93 Xanadu

88-89 Hypercube

84,85 Xerox PARC

83-89 University of Toronto, Math

86-87 George Brown College, Dance

95- Foresight Institute

97- Institute for Molecular Manufacturing

DISCLAIMER

The opinions expressed here are purely my own, and do not reflect the policy of my employer.

Fri 23 Nov 2007

K42 and Tornado

My colleague Eric Northup has mentioned these a few times, and I'm glad I looked them up. The Tornado OS (from my alma mater, U of T) and its successor K42 (at IBM Research) take a fine-grained object-oriented approach to all operating system structures (processes, memory regions, etc.), with built-in clustering that replicates object instances across processors. This reduces lock contention and increases cache locality by operating on the per-processor instance as much as possible. Since objects are generally expected to be local, the system can optimize for this case and track cross-processor operations as a special case. Some policy choices (e.g. maintaining replica tables for all processors) would probably need to be adapted for manycore.
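To make the per-processor idea concrete, here is a minimal sketch of one logical object with a representative per processor. It is only an illustration, not the Tornado or K42 interface: the class name `ClusteredCounter`, the `proc` argument, and the use of Python threads to stand in for processors are all my assumptions.

```python
import threading


class ClusteredCounter:
    """One logical counter with one representative ("rep") per processor.

    Hypothetical illustration only; not the Tornado/K42 API.
    """

    def __init__(self, num_procs):
        # Assumption: processors are numbered 0..num_procs-1.
        self._reps = [0] * num_procs
        self._remote_lock = threading.Lock()
        self.remote_ops = 0  # how many times the cross-processor path ran

    def increment(self, proc):
        # Common case: touch only the caller's own rep, so there is
        # no lock shared between processors on this path.
        self._reps[proc] += 1

    def total(self):
        # Special case: a cross-processor operation that visits every rep.
        with self._remote_lock:
            self.remote_ops += 1
            return sum(self._reps)


def run_demo(num_procs=4, increments=1000):
    """Run one thread per simulated processor and return the global total."""
    counter = ClusteredCounter(num_procs)

    def worker(proc):
        for _ in range(increments):
            counter.increment(proc)

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(num_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.total()
```

Each thread writes only to its own slot, so the hot path never contends; only `total()` pays for crossing processors. A real kernel would also pad each rep to a cache line, which this sketch cannot express.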

The scalability architecture is best described in this paper. The memory manager was key: it provided locality-aware allocation, padded allocations to cache line size to avoid false sharing, deferred deletion until quiescence to avoid existence locks, and so on. One insight from applying the basic Tornado model to real workloads was that creation-time object specialization isn't sufficient. It is better, for example, to start with an unshared implementation and then upgrade to a shared implementation when multiple processes share an object. They were able to improve their 24-processor scalability from "terrible" to pretty good in 2 weeks of work because of good OO discipline and tracing infrastructure.
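The "start unshared, upgrade when shared" idea can be sketched roughly as follows. This is my own toy version, not K42's mechanism: the names `UnsharedCounter`, `SharedCounter`, `UpgradableCounter` and `attach` are hypothetical, and the sketch assumes no call is in flight while the implementation is swapped, which a real kernel would have to handle.

```python
import threading


class UnsharedCounter:
    """Fast path: a single user, so no locking at all."""

    def __init__(self, value=0):
        self.value = value

    def add(self, n):
        self.value += n


class SharedCounter:
    """Slower path: lock-protected, safe for several users."""

    def __init__(self, value=0):
        self.value = value
        self._lock = threading.Lock()

    def add(self, n):
        with self._lock:
            self.value += n


class UpgradableCounter:
    """Handle that forwards to the current implementation and can swap it.

    Hypothetical illustration only; not the K42 API.
    """

    def __init__(self):
        self._impl = UnsharedCounter()  # cheapest implementation first
        self._users = 0
        self._swap_lock = threading.Lock()

    def attach(self):
        # Called when another user starts sharing the object.
        with self._swap_lock:
            self._users += 1
            if self._users > 1 and isinstance(self._impl, UnsharedCounter):
                # Upgrade in place, carrying the existing state across.
                # Assumption: no add() call is running during the swap.
                self._impl = SharedCounter(self._impl.value)

    def add(self, n):
        self._impl.add(n)

    @property
    def value(self):
        return self._impl.value

    @property
    def impl_name(self):
        return type(self._impl).__name__
```

Callers only ever hold the handle, so the upgrade is invisible to them; the extra indirection is the price for not having to pick the implementation at creation time.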

Overall, I found it striking how the scalability architecture mirrored that for distributed systems - state partitioning, replication, dynamic upgrade, etc. I had expected this from general principles, but it was valuable to see it confirmed in practice with significant workloads. The scalability graphs are impressively linear. Security isn't mentioned, but I expect that the same OO design that gives the OS good modularity and scalability could be applied to give it good capability discipline as well.

12:04 #