Professor: Samer Al-Kiswany | Term: Winter 2026

Lecture 1

Definition

A distributed system is a collection of independent computers that appear to the users of the system as a single computer.

Distributed software systems should provide easy-to-use abstractions, hide complexities, handle failures, efficiently use resources, and leverage emerging technologies.

Example

  • A network of workstations allocated to users
  • A pool of processors in the machine room allocated dynamically
  • A single file system (all users access files with the same path name)
  • User command executed in the best place (user workstation, a workstation belonging to someone else, or on an unassigned processor in the machine room)

Question

Why distributed?

  • Economics: Microprocessors offer a better price/performance than mainframes
  • Speed: A distributed system may have more total computing power than a mainframe
  • Inherent distribution: Some applications involve spatially separated machines
  • Reliability: If one machine crashes, the system as a whole can still survive
  • Incremental Growth: Computing power can be added in small increments

Primary features

  • Multiple computers
    • Concurrent execution
    • Independent operation and failures
  • Communications
    • Ability to communicate
    • No tight synchronization (no global clock)
  • “Virtual” computer
    • Transparency

A bunch of computers that work together, but appear as one.

Challenges

  1. Heterogeneity

Distributed systems require communication and coordination across a variety of:

  • Networks
  • Hardware
  • OS
  • Programming Languages
  • Implementations
  2. Scalability

Question

Can the system behave properly if the number of users and components increases?

  • Scale-up: Increase the number of “users” while keeping the system performance unaffected
  • Speed-up: Improvement in the system performance as system resources are increased

Tradeoffs need to be made (e.g., between availability and consistency; see CS451).

Impediments:

  • Centralized data
    • A single file
  • Centralized services
    • A single server
  • Centralized algorithms
    • Algorithms that “know it all”

Scalability is not black and white; there are shades of scalability. Some things cannot be done at scale.

  3. Failure handling
  • Partial failures
    • Can non-failed components continue operation?
    • Can the failed components easily recover?
  • Failure detection
  • Failure masking
  • Failure tolerance
  • Recovery
  • An important technique is replication. This is a hard problem.
  4. Concurrency

Multiple clients share a resource: how do you maintain the integrity of the resource and the proper operation (without interference) of these clients?

Managing failures in concurrent systems is difficult.

  5. Transparency

Transparency is not easy to maintain. There are many types of transparency:

| Transparency | Description |
| --- | --- |
| Access | Local and remote resources are accessed using identical operations |
| Location | Hide where an object is located |
| Relocation | Hide that an object may be moved to another location while in use |
| Migration | Hide that an object may move to another location |
| Replication | Hide that an object is replicated |
| Concurrency | Hide that an object may be shared by several independent users |
| Failure | Hide the failure and recovery of an object |

Aiming at full distribution transparency may be too much. There are communication latencies that cannot be hidden.

Completely hiding failures of networks and nodes is impossible. You cannot distinguish a slow computer from a failing one. You can never be sure a server performed an operation before a crash.

Full transparency will cost performance. Keeping replicas up-to-date with the master takes time.

  6. Security

Confidentiality: Information is disclosed only to authorized parties.

Integrity: Ensure that alterations to assets of a system can be made only in an authorized way.

Authentication: How do we verify the correctness of a claimed identity?

Authorization: Does an identified entity have proper access rights?

Developing distributed systems: Pitfalls

Many distributed systems are needlessly complex, caused by mistakes that required patching later on. Many false assumptions are made.

False assumptions include:

  • The network is reliable
  • The network is secure
  • The network is homogeneous
  • The topology does not change
  • Latency is zero
  • Bandwidth is infinite
  • Transport cost is zero
  • There is one administrator

Lecture 2

I skipped this lecture, it was a CS456 crash course.

Lecture 3

Communication Middleware

A distributed system can be organized as middleware.

The middleware layer extends over multiple machines.

Communication Alternatives:

Raw message passing

  • TCP/IP: Reliable byte stream
  • UDP/IP: Unreliable, packet oriented, connectionless

Problem: Low level, not easy to use.

OS Abstractions: Distributed Shared Memory

On a page fault, get the remote page.

  • Pros: Easy to program
  • Cons: High overhead, hard to handle failures

Remote Procedure Call (RPC) (Xerox RPC)

Main idea: Call remote procedures as if they are local ones. The key advantage is that it is general, easy to use, and efficient.

Uses the well-known procedure call semantics.

The caller makes a procedure call and then waits. If it is a local procedure call, then it is handled normally. If it is a remote procedure, then it is handled as a remote procedure call.

Caller semantics is a blocking send (send the request, stop, wait for the result, continue after it arrives). Callee semantics is a blocking receive to get the parameters and a non-blocking send at the end to transmit the results.

RPC architecture

flowchart TD

a[Caller Procedure]
b[Caller Stub]
c[Comm]
d[Comm]
e[Callee stub]
f[Callee procedure]


subgraph Caller
	a --> b
	b --> c
	c --> b
	b --> a
end
c -- Call packet(s) ---> d
subgraph Callee
	d --> e
	e --> f
	f --> e
	e --> d
end
d -- Result packet(s) ---> c
Interface{
	// has enough information for compile time
	// checking, and generating proper calling sequence
	int func1(int arg)
	int func2(char * buffer, int size)
}
client.c
// Client stub for func1
int func1(int arg) {
	// Create req
	// Pack fid, arg, etc. 
	// Send req
	// Wait for reply
	// Unpack results
	// Return result
}
server.c
// Server stub for func1 (receives the incoming request)
int func1_stub(request req) {
	// Unpack request
	// Find the requested server function
	// Call the function with arg in the request
	// Pack results
	// Send results
}

Question

How about for func2?

Problems

  • *buffer points into a different address space (solution: copy/restore if needed; see the sketch after this list)
  • The buffer can be an input or output argument (solution: add an argument type (in | out | both))
  • Complex data structures
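
To make copy/restore concrete, here is a minimal sketch of a client stub packing func2's buffer argument into the request message by value. All names (func2_pack, struct request, FID_FUNC2) are illustrative assumptions, not the actual Xerox stub code.

/* Hypothetical sketch: marshalling func2(char *buffer, int size) when the
   buffer is an "in" argument. The stub cannot send the pointer itself,
   since it points into the caller's address space, so it copies the bytes
   into the request message. */
#include <stdint.h>
#include <string.h>

#define FID_FUNC2 2

struct request {
	uint32_t fid;        /* which remote function */
	uint32_t size;       /* number of buffer bytes that follow */
	char     data[1024]; /* copy of the caller's buffer (the "copy" half) */
};

/* Client stub helper: pack fid, size, and the buffer contents; the RPC
   runtime would then send req over the network (send omitted here). */
int func2_pack(struct request *req, const char *buffer, int size) {
	if (size < 0 || (size_t)size > sizeof(req->data))
		return -1;
	req->fid = FID_FUNC2;
	req->size = (uint32_t)size;
	memcpy(req->data, buffer, (size_t)size); /* copy in */
	return 0;
}

/* If the buffer were an "out" (or "both") argument, the server stub would
   send the bytes back in the reply and the client stub would memcpy them
   into the caller's buffer: the "restore" half of copy/restore. */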

Question

How does the client know where to send the RPC request?

Binding

Server side:

  • Load the function in memory
  • Populate export table (fid, fname, call_back function). Each function is assigned a unique id (fid) every time the function is exported
  • Export interface

In a registry, the server IP and the supported function names are stored.

Client Side:

The client queries the registry to see who provides the function they want to call (fname).

The registry will respond with the appropriate server ip.

The client will then query the server to get the binding info for the function.

The server will respond with the fname, fid, and the client stores this info in memory.

At runtime, the client already knows the server address and function id. The client sends the function id and arguments to the server. The server looks up the fid in the export table, runs the function, and sends back the result.
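
A minimal sketch of the export table and runtime dispatch just described. The fid value, field names, add_impl, and dispatch are illustrative assumptions, not the Xerox implementation.

/* Sketch of the server-side export table and fid-based dispatch. */
#include <stdio.h>

typedef int (*rpc_fn)(int arg);

struct export_entry {
	int         fid;   /* unique id assigned when the function is exported */
	const char *fname; /* name the client asks the registry about */
	rpc_fn      fn;    /* callback invoked when a request arrives */
};

static int add_impl(int arg) { return arg + 1; } /* stand-in for func1 */

static struct export_entry export_table[] = {
	{ 5, "func1", add_impl },
};

/* Server dispatch at runtime: look the fid up and call the function. */
static int dispatch(int fid, int arg, int *result) {
	for (size_t i = 0; i < sizeof export_table / sizeof export_table[0]; i++) {
		if (export_table[i].fid == fid) {
			*result = export_table[i].fn(arg);
			return 0;  /* success */
		}
	}
	return -1;         /* "no such function id" error */
}

int main(void) {
	int r;
	if (dispatch(5, 41, &r) == 0)
		printf("func1 returned %d\n", r);
	return 0;
}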

Runtime choices:

TCP/IP

  • High overhead on server, many connections and state
  • High latency (setting/closing connections)

UDP/IP

  • Unreliable

Fault Tolerance

Lost Request

When the function is called, the client sends the fid and arguments to the server. The request is lost in the network in transit.

The client waits, times out, and retries. The request now reaches the server, which runs the function and returns the result to the client. Retrying was safe because the function ran only once.

Lost Reply

Client sends the fid and arguments to the server. Server receives the payload, runs the function, and sends the result back to the client. The reply back to the client is lost.

The client times out and retries, sending the function id and arguments to the server again. The server runs the function and returns the result to the client.

Problem

The function was executed twice.

Call Semantics

At least once: the function is executed at least once.

At most once: the function is executed once or not at all.

Neither one is perfect: exactly-once is desired, but has high overhead.

Xerox RPC implements at-most-once semantics: if success is returned, it is guaranteed that the function was executed exactly once. On failure, it was executed at most once.

A client issues one RPC at a time.

Xerox at most once semantics:

Key idea: Every RPC request is uniquely identifiable.

The client sends the client ip, process id, sequence number (strictly increasing number), fid, arguments. This tuple uniquely identifies this call.

The server receives this, checks the export table, checks the previous-calls table in memory, runs the function, and returns the result.

The previous call table checks if it has seen this (client_ip, process_id, seq_num) tuple before.

We return the client ip, process id, sequence number and result.
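
A minimal sketch of the server's previous-calls check, assuming a simple fixed-size in-memory table; seen_before and the field names are hypothetical.

/* Every request is identified by (client_ip, process_id, seq_num).
   A sequence number <= the last one seen from that client/process is
   treated as a duplicate and the function is not re-executed (the server
   would resend the stored ack/result instead). */
#include <stdint.h>

struct prev_call {
	uint32_t client_ip;
	uint32_t process_id;
	uint32_t last_seq; /* highest sequence number seen so far */
	int      in_use;
};

#define MAX_CLIENTS 64
static struct prev_call previous_calls[MAX_CLIENTS];

/* Returns 1 if this (ip, pid, seq) is a duplicate, 0 if it is new. */
int seen_before(uint32_t ip, uint32_t pid, uint32_t seq) {
	for (int i = 0; i < MAX_CLIENTS; i++) {
		if (previous_calls[i].in_use &&
		    previous_calls[i].client_ip == ip &&
		    previous_calls[i].process_id == pid) {
			if (seq <= previous_calls[i].last_seq)
				return 1;                  /* duplicate */
			previous_calls[i].last_seq = seq;
			return 0;
		}
	}
	for (int i = 0; i < MAX_CLIENTS; i++) {    /* first call from this client */
		if (!previous_calls[i].in_use) {
			previous_calls[i] = (struct prev_call){ ip, pid, seq, 1 };
			return 0;
		}
	}
	return 0; /* table full; a real server would evict or grow the table */
}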

Fault Tolerance: Lost Reply

Client calls function, sends client ip, process id, sequence number, fid, arguments.

The server runs the function, but the result is lost in transit on the way back to the client.

The client times out and sends the request again (same client ip, process id, sequence number, fid, arguments). The server checks the previous_calls table and notices that this is a duplicate. It returns an ACK.

Fault Tolerance: Server Crash

Client calls function, sends client ip, process id, sequence number, fid, arguments. Server runs function, updates previous calls table. Then a reboot happens, server does not send anything back.

Client times out and sends the exact same request again. The server checks the previous calls, but there’s nothing in it, so it runs the function again and sends the result back to the client.

The solution is to rebind on server restart with new fids.

The client calls the function, sending the client ip, process id, sequence number, fid, and arguments. The server runs the function (the export table has fid = 5). Then the reboot hits, and the server rebinds its functions with different fids. Now the same function has fid = 11 in the export table.

The client times out and sends the same request as before (with fid = 5). The fid doesn’t match, so the server responds with an error that there is no such function id.

Upon receiving this error, the client should query the registry again for the new bindings.
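
A sketch of that client-side recovery path. The helpers query_registry, get_binding, and do_rpc (and the ERR_NO_SUCH_FID code) are hypothetical stand-ins for the registry queries and the RPC runtime.

/* On a "no such function id" error the client refreshes its binding
   (registry lookup, then binding query to the server) and retries once. */
#include <stdint.h>

#define ERR_NO_SUCH_FID (-2)

struct binding { uint32_t server_ip; uint32_t fid; };

int query_registry(const char *fname, uint32_t *server_ip); /* hypothetical */
int get_binding(uint32_t server_ip, const char *fname, struct binding *b);
int do_rpc(const struct binding *b, int arg, int *result);

int call_with_rebind(const char *fname, struct binding *b, int arg, int *result) {
	int rc = do_rpc(b, arg, result);
	if (rc == ERR_NO_SUCH_FID) {
		/* Server rebooted and re-exported its functions with new fids. */
		if (query_registry(fname, &b->server_ip) != 0) return -1;
		if (get_binding(b->server_ip, fname, b) != 0)  return -1;
		rc = do_rpc(b, arg, result); /* retry with the fresh binding */
	}
	return rc;
}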

Fault Tolerance: Client failure with repeated sequence numbers

Client calls function, sends client ip, process id, sequence number (10), fid, arguments. Server checks previous call table, populates it with the client ip, process id and sequence number. It runs the function, and returns the result.

The client reboots after it receives this result.

We then have a later call of a different function: the client sends the client ip, process id, sequence number (1), fid, and arguments. The server checks the previous-calls table, treats the call as a duplicate (since sequence numbers are meant to be strictly increasing), and returns an ack without executing the function! The server cannot tell whether the client rebooted and started fresh or whether this is a delayed duplicate packet.

The solution is to add a clock-based conversation id to every call. The pair <conversation_id, seq #> is strictly increasing.
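
A small sketch of comparing <conversation_id, seq #> pairs lexicographically, assuming the conversation id is a clock value taken when the client starts; the representation is illustrative.

/* A call is only accepted as new if its <conversation_id, seq> pair is
   greater than the last pair seen from that client. */
#include <stdint.h>

struct call_id {
	uint64_t conversation_id; /* clock-based, new value after each client restart */
	uint32_t seq;             /* strictly increasing within a conversation */
};

/* Returns 1 if a is newer than b, 0 otherwise. */
int newer_than(struct call_id a, struct call_id b) {
	if (a.conversation_id != b.conversation_id)
		return a.conversation_id > b.conversation_id;
	return a.seq > b.seq;
}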

Lecture 4

The file system is the operating system's interface to storage.

The virtual file system (VFS) is an abstraction layer in the Unix kernel that acts as an interface for file systems and provides a uniform API for them.

Specific file systems are assigned subtrees.

Application wants to read some data: the call goes to the VFS, and the VFS gets the data from the right local file system (based on the path).

Application wants to write some data: the call goes to the VFS, which writes to the file system; the file system returns success even though the data isn't written to disk yet. fsync is a blocking call that forces the data to disk.

We don't write to disk on every write for performance reasons; to force it, we must invoke fsync.

Example: a hash table and a data table (the hash table stores a pointer to the data in the data table).

If we want to update a key, we perform:

write(ptr) then write(data). This is risky if we crash.

So we can do

write(data)
write(ptr)
fsync(data)
fsync(ptr)

This is still not the right way to do it. There is no guarantee of ordering.

Correct ordering is

write(data)
fsync(data)
write(ptr)
fsync(ptr)

There is no guarantee of ordering, for performance reasons: e.g., the disk head may already be over a location whose write was issued later, so the operations are reordered to optimize speed.
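
A sketch of the correct ordering above using POSIX calls. The two file descriptors and offsets are illustrative stand-ins for wherever the hash table and data table actually live; the point is that each fsync acts as an ordering barrier.

/* write(data); fsync(data); write(ptr); fsync(ptr) */
#include <sys/types.h>
#include <unistd.h>

int update_key(int data_fd, int ptr_fd, off_t data_off, off_t ptr_off,
               const char *value, size_t len) {
	/* 1. Write the new data and force it to disk first. */
	if (pwrite(data_fd, value, len, data_off) != (ssize_t)len) return -1;
	if (fsync(data_fd) != 0) return -1;

	/* 2. Only now write the pointer that makes the data reachable, and force
	   it to disk. If we crash before this point, the old pointer still
	   refers to the old, intact data. */
	if (pwrite(ptr_fd, &data_off, sizeof data_off, ptr_off) != (ssize_t)sizeof data_off)
		return -1;
	if (fsync(ptr_fd) != 0) return -1;
	return 0;
}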

Distributed file system emulates non-distributed file system behaviour on a physically distributed set of files, usually within an intranet.

We require transparency, concurrent access, file replication, and security.

Sun Network File System

On the client computer, the virtual file system forwards remote accesses to the NFS client; the NFS client uses the NFS protocol (over RPC) to contact the NFS server on the server computer. The NFS server sits in the UNIX kernel (not necessarily; it is an optimization), sends the request to the VFS, and the VFS gets the data from the UNIX file system.

flowchart TD
a[VFS]
b[NFS client]
c[NFS server]
d[VFS]
e[UNIX file system]
f[Application Program]
g[Unix file system]

subgraph Client computer
f -- UNIX system calls --> a
subgraph ck[UNIX kernel]
a --> b 
a --> g
end
g --> ida[(Disk)]
end
b -- NFS Protocol ---> c
subgraph Server Computer
subgraph sk[UNIX kernel]
c --> d
d --> e
end
e --> id[(Disk)]
end

NFS V.3 Design Principles

  • Access transparency: No distinction between local and remote files.
  • Stateless design: The server does not maintain any state about clients. This makes the implementation easier and crashes easier to handle. File operations are idempotent, which simplifies fault tolerance and implementation.

All NFS operations take a file handle.

Idempotent Example

Client wants to write to a file and performs: fd = open(path), then write(fd, buf, size).

Server will write to file, send ack

Question

What if ack is lost?

The client will time out, run the same write again, and send write(fd, buf, size) to the server.

Server will write again, send ack.

Error

This is not idempotent.

We include an offset in write(fd, buf, size, offset). Now this is idempotent.
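
A sketch contrasting the two forms with local POSIX calls (these wrappers are only an analogy for the NFS requests, not the wire protocol): with an explicit offset, replaying the same request writes the same bytes to the same place and leaves the file unchanged.

/* Not idempotent: each replay writes at the implicit current offset,
   so a retried duplicate writes the data a second time. */
#include <sys/types.h>
#include <unistd.h>

ssize_t write_at_current_offset(int fd, const void *buf, size_t size) {
	return write(fd, buf, size);
}

/* Idempotent: the request carries the offset, so a retried duplicate
   simply overwrites the same bytes. */
ssize_t write_at_offset(int fd, const void *buf, size_t size, off_t offset) {
	return pwrite(fd, buf, size, offset);
}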

Sketchy idempotence

Client wants to unlink file1.

The server unlinks the file and returns an ack. The ack is lost in transit.

Client times out and sends unlink file1 back to server.

Server is going to return an error since the file doesn’t exist anymore.

We account for this in the NFS client-side design: if we delete a file and the retry returns an error, then we have successfully deleted it.
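
A sketch of that client-side convention. remote_unlink is a hypothetical RPC wrapper (returning -1 with errno set on failure), not a real NFS call.

/* If a retried unlink reports "no such file", the first (lost-ack) attempt
   already removed it, so the client reports success. */
#include <errno.h>

int remote_unlink(const char *path); /* hypothetical RPC wrapper */

int unlink_with_retry(const char *path) {
	int rc = remote_unlink(path);
	if (rc == -1 && errno == ETIMEDOUT) {
		rc = remote_unlink(path);       /* retry after a lost ack */
		if (rc == -1 && errno == ENOENT)
			rc = 0;                     /* already deleted: treat as success */
	}
	return rc;
}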

A file handle uniquely identifies a file on the server side.

flowchart TD
a[VFS]
b[NFS server]

a --> b
a --> c[(FS1)]
a --> d[(FS2)]

We use the i-node number to uniquely identify files and to determine which file system holds them.

Warning

Client 1 writes to the file with i-node 10. The server performs this action.

Client 2 unlinks (deletes) the file with i-node 10. Client 2 then creates a new file, and an i-node is assigned to it. Since i-nodes are reused, this new file can be assigned i-node 10.

Client 1 writes to i-node 10 (thinking it is its own file), but it is corrupting the new file.

To solve this we store an additional parameter, the i-node generation number. The function call carries the pair <i-node, i-node generation number>.

It would first be i-node.0 from Client 1, and when Client 2 creates the new file with the same i-node, it becomes i-node.1. When Client 1 tries to write to i-node.0, the server will respond with an error: the file doesn't exist anymore. This solution is necessary because there is only a limited number of i-node values, so they must be reused.
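
A sketch of the information an NFS-style file handle needs so a reused i-node number cannot be confused with the old file; the field names and widths are illustrative.

/* The server rejects a handle whose generation number does not match the
   current generation of that i-node: the client is told the file no longer
   exists and must look it up again. */
#include <stdint.h>

struct file_handle {
	uint32_t fs_id;      /* which exported file system (e.g., FS1 vs FS2) */
	uint32_t inode;      /* i-node number: reused after unlink */
	uint32_t generation; /* bumped each time the i-node is reassigned */
};

int handle_is_current(struct file_handle h, uint32_t current_generation) {
	return h.generation == current_generation;
}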

To get the file handle of the root directory on a remote server, we use mount.

The client requests to mount a path; the server checks that the path is exported, checks the user's permissions, and returns the file handle of the exported root directory.

Lookup:

NFS does lookup iteratively, one step at a time (a sketch follows the exchange below).

Client sends lookup for (people_fh, "bob")
Server responds with bob_fh
Client sends lookup for (bob_fh, "hw")
Server responds with hw_fh
Client sends lookup for (hw_fh, "a3")
Server responds with a3_fh
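
A sketch of the iterative client-side loop, one lookup per path component starting from the mounted root handle. nfs_lookup and fh_t are illustrative stand-ins, not a real API.

/* Resolve e.g. "people/bob/hw/a3" one component (one round trip) at a time. */
#include <stdint.h>
#include <string.h>

typedef uint64_t fh_t; /* opaque file handle */

/* Hypothetical RPC: returns 0 and fills *out with the child's handle. */
int nfs_lookup(fh_t dir, const char *name, fh_t *out);

int resolve(fh_t root_fh, char *path, fh_t *result) {
	fh_t cur = root_fh;
	for (char *name = strtok(path, "/"); name != NULL; name = strtok(NULL, "/")) {
		if (nfs_lookup(cur, name, &cur) != 0)
			return -1; /* component not found */
	}
	*result = cur;
	return 0;
}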

This is slow. Recursive lookup would be faster, but it is not implemented for the following reason.

Nested Mounting

Server A imports a directory from Server B.
The client imports a directory from Server A and wants the nested part of it that lives on Server B.
The client sends a request to Server A for the fh that is from Server B. Server A realizes it is from Server B, requests it, receives it, and sends it to the client.

This is dangerous. Server B may not want the client to access the directory, but when Server A requests the directory on behalf of the client, Server B does not know this. The client should make the request to Server B itself.

The client needs to explicitly import subdirectory from server B.

To support all of this we need to do iterative lookup (recursive won't work). We can still optimize by chopping the path into segments.

Read operation: Client sends read request for file handle. Server performs read and returns data blocks.

Write operation: the client writes to a file handle. The server does a persistent write, then sends an ack back to the client (write-through caching).

NFS requests are transmitted via Remote Procedure Calls (RPCs). The client sends authentication information, which is checked against the access permissions in the file attributes.

Any client may address RPC requests to the server while providing another client's identification information. Security mechanisms were introduced in NFS v4.

Semantics of File Sharing

On a single processor, when a read follows a write, the value returned by the read is the value just written. In a distributed system with caching, obsolete values may be returned.

| Method | Comment |
| --- | --- |
| UNIX semantics | Each operation on a file is instantly visible to all processes |
| Session semantics (NFS) | No changes are visible to other processes until the file is closed |
| Immutable files | No updates are possible |
| Transaction | All changes occur atomically |