functions are functions that the interpreter exposes to user code but are implemented in the host language.
Lisp-1 refer to languages like Scheme that put functions and variables in the same namespace. Lisp-2 refers to languages like Common Lisp that partition them.
To interpret a return statement in a recursive tree-walk interpreter, we use an exception to unwind the interpreter past the visit methods of all the containing statements back to the code that began executing the body.
In a lexically scoped language, variable declarations actually split blocks into two separate scopes, the scope before the variable is declared and the one after, which introduces the new variable.
In the Lox tree-walk interpreter, environments are mutable hash tables which causes a bug in closures. We could use persistent environments to resolve the issue. This splits up environments during variable declarations, but semantic analysis is used to prevent changing any existing code.
The current Lox interpreter resolves a variable by tracking down which declaration it refers to each and every time the variable expression is evaluated. A better solution is to resolve each variable once. Write a chunk of code that inspects the user’s program, finds every variable mentioned, and figures out which declaration each refers to. This process is called semantic analysis.
A parser only tells if a program is grammatically correct (syntactic analysis). Semantic analysis goes further and starts to figure out what pieces of the program actually mean.
In Lox’s case, the semantic analysis will resolve variable bindings.
In the variable resolution pass, there are no side effects, and no control flow. Loops are visited only once, both branches in if statements are visited. Logic operators are not short-circuited.
When resolving functions during semantic analysis, we define the name eagerly, before resolving the function’s body. This lets a function recursively refer to itself inside its own body.
Every time the resolver visits a variable, it tells the interpreter how many *scopes* there are between the current scope and the scope where the variable is defined. At runtime this corresponds exactly to the number of *environments* between the current one and the enclosing one where the interpreter can find the variable’s value. Some resolver hand that number to the interpreter via storing it in the syntax tree. That’s a fine approach but Lox will take another common approach and store it off to the side in a map called locals: Map(Expr, Integer) within the Interpreter class.
The choice of how many different analyses to lump into a single pass is difficult. Many small isolated passes, each with their own responsibility, are simpler to implement and maintain. However, there is a runtime cost to traversing the syntax tree itself, so bundling multiple analyses into a single pass is usually faster.
There is a subtle difference between “properties” and “fieldS”. Fields are named bits of state stored directly in an instance. Properties are the named things that a get expression may return. Every field is a property, but not every property is a field.
In Lox, in class instances we look for fields before methods. That means semantically, fields shadow methods.
GFS introduces some core patterns when designing distributed databases. It decouples data flow from control flow to reduce load on master. It uses large chunk sizes to, again, reduce communication on master. It sequences writes through a primary replica.
There are two main ways replication can be implemented in a cluster. State transfer and replicated state machine.
State transfer is when the primary replica executes all relevant operations and regularly shops state to backups. This could include CPU changes, memory, and I/O device changes.
Replicated state machines are fed the same input operations in the same order, so their outputs will be the same. The caveat is that the input operations should be deterministic.
There are two different kinds of state to replicate. Application-level and machine-level.
Application-level is like replicating a database’s tables. GFS for example replicates application-level state. In this case, the primary will only send high level operations like db queries to its backups.
Machine-level state replicates things like contents of RAM, registers, and machine events like interrupts and DMA. VMware FT replicates machine-level state.
Raft is decomposed into three separate parts. Leader election, log replication, and safety.
Systems that rely on the overlap of majority set set of servers for operation are referred to as quorum systems.
External consistency is a stronger guarantee than serializability. External consistency implies that when a transaction has committed, all subsequent transactions will see the effect of that committed transaction in the DB, that’s not the same guarantee provided by serializability.
Linearizability on the other hand, does not say anything about the behaviour of transactions. It is a consistency level that deals more with the behaviour of a single object. It is a recency guarantee for a single object. When one committed transaction has updated the value of an object, all other reads to that object must see the new value.
In plain English, under linearizability, writes should appear to be instantaneous. Imprecisely, once a write completes, all later reads (where “later” is defined by wall-clock start time) should return the value of that write or the value of a later write. Once a read returns a particular value, all later reads should return that value or the value of a later write.
The C for consistency’s in CAP and ACID are different.
Zookeeper can scale linearly by adding more nodes to the cluster because it relaxes guarantees. Writes are still linearizable, but not reads. So we can read from any replica in the cluster. Raft, on the other hand will actually degrade since all reads must still go through the leader. And now, the leader must store more information about the new servers.
Note that Zookeeper does provide optional support for linearizable reads, which comes with a performance cost.
In essence, Bitcoin combines the idea of using computational puzzles to regulate creation of new currency units with the idea of secure time stamping to record a ledger of transactions and prevent double spending.
Taking out the bitcoin of the blockchain just turns it into a distributed database. If there is no mining incentive with puzzles it cannot be decentralized.
Many people claim that the strongest sign of a token’s value is whether it forms a necessary component of the application to which it’s connected to. More clearly, is the token a necessary crypto economic mechanism in the application?
Nearly all deep learning is based on a single type of model, the neural network.
Because of the “all-or-none” character of nervous activity, neural events and the relations among them can be treated by means of propositional logic, and hence modelled.
A trained model can be treated like a regular computer program.
Machine learning: the training of programs developed by allowing a computer to learn from its experience, rather than through manually coding the individual steps.
Convolutional neural networks (CNNs) work well for computer vision tasks.
What makes deep learning distinctive from other sources of machine learning is the architecture. Deep learning uses architectures based on neural networks.
Computer vision datasets are normally structured in such a way that the label for an image is part of the filename, or path, most commonly the parent folder name.
Classification and regression are the two main models we’ll investigate in fast.ai
***Overfitting is the single most important and challenging issue when training for all machine learning practitioners.***
You should only use over-fitting avoidance techniques *once* you confirmed overfitting is actually occurring. Using those techniques prematurely could end up with models which are less accurate.
Academics love talking about which architectures to use, but usually there are standard architectures that work most of the time.
Models using architectures with more layers take longer to train, and are more prone to overfitting. On the other hand, when using more data, they can be more accurate.
A model that has weights that have already been trained on some other dataset is called a pertained model. You should almost always use a pertained model, because it means that your model, before you’ve even shown it of your data, is already very capable.
When using a pertained model, cnn_learner will remove the last layer, since that is always specifically customized to the original training task (ImageNet dataset classification), and replace it with one or more new layers with randomized weights, of an appropriate size for the dataset you are working with. This last part of the model is called the head.
Using pertained models is the most important method we have to allow us to train more accurate models very cheaply. However, pretrained models aren’t studied in academic deep learning.
Image classification using deep learning can also be used for non-image tasks. Many datasets can be converted into an image representation. Images are almost like a universal hash in the ML world…
The biggest failure when organizations decide to use AI is to not hold out some test data that the vendor never gets to see.
In machine learning, the metric is the thing that you care about. The loss is the thing that the model cares about when adjusting weights.
The reason why EC2 sucks for running database servers is because the locally attached disk on EC2 is not ephemeral. DB servers running on EC2 would lose data when the EC2 instances crashed, so they would use S3 to take snapshots of the data.
EBS (Elastic Block Storage) is not meant for sharing. Only one EC2 instance can mount a specific EBS volume at a time.
Historically EBS wasn’t fault tolerant because Amazon would always put them in the same data centre. This is because they wanted to reduce the cost of chain replication, which was used for fault tolerance.
EBS is simply local block storage. RDS and Aurora are managed MySQL DBs.
RDS is like EBS with better fault tolerance. Replicas are replicated across different availability zones (data centres). Each transaction sends multiple 8KB pages.
Aurora has two replicas in three availability zones for super reliability. Unlike RDS though, the only thing being written over the network is *just* the log entries, *not* pages. So Aurora is faster than RDS *even* with more replicas
Aurora is quorum-based. It doesn’t need 100% consensus to continue on.
Switching from RDS to Aurora resulted in a 35x performance increase. The performance is a result of communicating only via log entries and also using a quorum-based consensus model.
In quorum-based systems, you can adjust the number of nodes W, R that a client needs to speak to in order to write and read respectively. When designing the system, the equation R+W = N+1 must hold true. So, depending on if your system is read or write heavy, you can adjust R and W, For example, R=1, W=3, or R=3, W=1.
Depending on the values of R and W, if a certain number of storage servers in Aurora go down, you can have a scenario where you can read and not write, or vice-versa.
In Aurora, when a storage server receives a log, it doesn’t have to apply the transformation to it’s internal page. It can wait until it receives a request from the DB server for something from that page.
Aurora is log-based, similarity to Raft.
The DB server in Aurora doesn’t need to do always do quorum reads. It knows the most recent log entry that has been applied for each storage server.
Aurora needs to do quorum reads when the DB server crashes to re-establish its runtime state.
In a monolithic database setup, the database runs on the same physical machine as the storage volume. However, in many modern distributed databases, the storage layer is decoupled from the database layer and replicated across multiple nodes to achieve scalability and reliability. While this is helpful for better fault tolerance, the downside is that communication between the database layer and the storage layer now happens via the network and the bottleneck lies there.
Databases tend to work with pages. You can think of pages like pages in a notebook. Working with pages is efficient because you can read multiple lines of the notebook by simply tearing out a single page. Pages in the context of disk storage minimize the number of disk I/O operations.
A page consists of a page header, the records, and the record offset array.
The record offset array points to the beginnings of records, and helps locate where the record is physically stored on disk.
In database systems, writes are written to a page in memory but also to a write-ahead log (WAL). Crashes before the page gets written to disk will be no problem since data can be recovered from the WAL. Because of this, WAL makes perfect sense for transactions.
Cache coherence is similar to consistency and linearizability for distributed databases but applied to caches because technically, writes are still on local clients, and the actual underlying virtual disk server is technically up to date.
2PL is different from simple locking. Simple locking acquires a lock for *every* shared data object that it intends to read or write. 2PL acquires locks only when needed. This is why 2PL is prone to deadlocks.
2PL != 2PC (Two-phase locking != two-phase commit).
Two phase commit is used in many sharded databases. It’s used specifically in databases which support multi-record transactions.
Two phase commit has an evil reputation because it’s slow. It requires lots of network chitchat and waiting on persistent logs.
Although 2PC looks similar to Raft consensus (one leader), the outcome is quite different. Raft is all about being able to operate with a majority, and every follower doing the same thing. The constituents of 2PC are all doing different things. Also, 2PC is not fault tolerant at all. It’s a primitive for distributed transactions. Raft is a primitive for distributed consensus.
You can however combine these two concepts and use a Raft or Paxos cluster to individually replicate each party in 2PC. (Transaction coordinator, node A, and node B.)
Atomic in ACID means “all or none”. This means transactions in a distributed system abort if the outcome is not part of the legal set of serializable outcomes.
Spanner read-only transactions doesn’t need 2PC, locks, or transactions.
Aurora is a super optimized single-node db engine with a distributed storage layer. It avoids distributed consistency by not being distributed and instead relying on fast failover ins master node failure. Everything goes through one db server. The key selling feature is that it’s API compatible with MySQL or Postgres. Spanner on the other hand is a distributed data layer that uses optimized sharded Paxos to guarantee consistency in a system that spans multiple geographic regions. That is, ACID transactions that span regions.
Spanner ensures reads are serializable by making read-only transactions read from snapshots via multi-version DB. This concept is called snapshot isolation.
Serializable means concurrent transactions will be executed in some serial order. External consistency (sometimes referred to as as strict serializability) goes one step beyond and imposes what that order must be.
There is no difference between serializability and external consistency in a single-node database.
Spanner guaranties external consistency.
A tree-walk interpreter is fine for some kinds of high-level declarative language. But for general-purpose, imperative languages — even a “scripting” language like Lox — tree-walk interpreters are too slow.
Spatial locality doesn’t really exist with a tree-walk interpreter. The objects and nodes in our parse tree could be anywhere because of garbage collection. There’s a lot of overhead with AST walkers.
Compiling to native code sucks too because you have to spend a few years mastering a specific architecture. That only gets your language onto *one* of the several popular instruction sets out there.
On one end, a tree-walk interpreter is simple, portable, and slow. On the other, native code is complex and platform specific but fast. Bytecode sits in the middle. It retains the portability of a tree walker (no getting hands dirty with assembly), and sacrifices *some* simplicity to get a performance boost in return, though not as fast as going fully native.
The problem with a fantasy architecture (bytecode) of course, is that it doesn’t exist. The solution is to write an emulator — a simulated chip written in software that interprets the bytecode one instruction at a time. This is called a virtual machine.
Structurally, bytecode resembles machine code. It’s a dense linear sequence of binary instructions.
Emulation layer adds overhead, which is a key reason bytecode is slower than native code. But in return, it gives us portability. We can write our VM in a language like C that is already supported on all the machines we care about.
The compiler’s design parallels the structure of an interpreter. The structure of an interpreter looks like parser -> syntax trees -> interpreter. A compiler looks like compiler -> bytecode -> virtual machine.
An opcode’s instruction formation is the specification of what it’s operands look like.