jeff
-
Mar 1
the tz in timestamptz stands for timezone.
you can send results from a query into a new table using the *into* keyword in Postgres: SELECT DISTINCT(x) AS y INTO new_table FROM old_table;
when creating database build scripts, it’s important that they’re idempotent.
where clauses and joins are the only parts of SQL statements which can filter or remove data from the result set.
when converting spreadsheets to databases, it’s best to automate it with build scripts. import.sql to create the master table, and normalize.sql to create lookup tables and fact tables.
ILIKE in SQL stands for case-insensitive LIKE.
when designing a distributed system, if your data can fit on one machine, all you need to do is replicate your databases. However, if the data becomes big enough, you need to partition (shard) it across multiple databases.
the three popular algorithms for replicating changes between database nodes are single-leader, multi-leader, and leaderless replication.
replication of databases is an old topic—the principles haven’t changed much since they were studied in the 1970s because the fundamental constraints of networks have remained the same.
However, outside of research, many developers continued to assume for a long time that a database consisted of just one node. Mainstream use of distributed databases is more recent.
Since many application developers are new to this area, there has been a lot of misunderstanding around issues such as eventual consistency.
Men and months are not interchangeable, contrary to what many people assume. Men and months are interchangeable only when a task can be partitioned among many workers with no communication among them.
Recommended scheduling for a software task is 1/3 planning, 1/6 coding, 1/4 component test and early system test, and 1/4 system test.
What is most often neglected is the system test, which comes at the end after the component is implemented. It’s very important to allow enough system test time in the original schedule.
Brook’s Law: Adding manpower to a late software project makes it later.
More software projects have gone awry for lack of calendar time than for all other causes combined.
The surgeon-copilot team differs from the conventional team in that the system is the product of one mind—or at most two, acting uno animo. The conventional team, on the other hand, divides the work, and each member is responsible for the design and implementation of part of it.
It is better to have a system reflect one set of design ideas, than to have one that contains many good but independent and uncoordinated ideas. This is called conceptual integrity.
The separation of architectural effort from implementation is a very powerful way of getting conceptual integrity on very large projects.
A quick solution to the implementors waiting on the architects is to not hire the implementors until the specifications are complete.
-
Mar 2
Two strategies to deal with replication are state transfer and replicated state machines. State transfer sends the entire RAM of the primary node. A replicated state machine only transfers external operations.
In distributed storage systems, it’s nearly impossible to have cut-over (primary failing, client needs to now talk to backups) without anomalies (e.g., duplicate responses).
Apparently Google's replacement for GFS, Colossus, splits the master over multiple servers, and has more automated master failure recovery.
Heartbeating, or external monitoring is used to detect server failures.
Ensuring deterministic execution across replicated compute is easier with VMs than physical servers because hypervisors emulate and control many aspects of the hardware that might differ between primary and backup executions.
A guest OS is an OS running in a VM.
Network disk server acts as a tie-breaker when primary and backup can’t talk to each other.
Most of MIT 6.824 deals with fail-stop failures. If nodes become compromised, then you have a Byzantine (non-fail-stop) system.
Consensus algorithms allow a collection of machines to work as a coherent group that can survive the failures of some of its members.
Raft’s primary goal is understandability.
Consensus algorithms typically arise in the context of replicated state machines.
Replicated state machines are typically implemented using a replicated log. Each server stores a log containing a series of commands, which its state machine executes in order.
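The log-ordered apply step above can be sketched in a few lines of Go. This is a hypothetical toy key-value machine, not real Raft code: the point is only that replicas applying the same command log in order reach the same state.

```go
package main

import "fmt"

// kv and command are made-up types for illustration.
type kv map[string]string

type command struct{ key, val string }

// apply executes a committed log of commands in log order, which is
// what each server's state machine does in a replicated state machine.
func apply(entries []command) kv {
	state := kv{}
	for _, c := range entries { // apply strictly in log order
		state[c.key] = c.val
	}
	return state
}

func main() {
	entries := []command{{"x", "1"}, {"y", "2"}, {"x", "3"}}
	// two replicas applying the same log agree on the final state
	a, b := apply(entries), apply(entries)
	fmt.Println(a["x"] == b["x"] && a["x"] == "3") // prints true
}
```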
Raft is a protocol for implementing distributed consensus.
A Raft cluster contains several servers; five is a typical number, which allows the system to tolerate two failures.
Terms act as a logical clock in Raft, and they allow servers to detect obsolete information such as stale leaders.
The leader decides when it’s safe to apply a log entry to the state machines; such an entry is called committed. A log entry is committed once the leader that created the entry has replicated it on a majority of servers. Once a follower learns that a log entry is committed, it applies the entry to its local state machine (in log order).
Locks aren’t used to protect data. They are used to protect invariants.
Condition variables solve busy waiting on some sort of condition: they move notification from poll to push. Hence the golang syntax, cond.Broadcast()
Unbuffered channels are not async queues; think of them as a synchronous communication mechanism. Both sides of the channel have to be there: send waits for receive and receive waits for send.
Buffered channels (in golang) can behave async up until they’re full. When they’re full, they act like sync communication.
You can replace a wait group with a channel. (X amount of receives)
A channel is essentially a thread-safe FIFO queue.
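The wait-group replacement idea above can be sketched in Go: collect exactly one result per worker over an unbuffered channel, so N receives stand in for wg.Wait(). The fanIn name is made up for illustration.

```go
package main

import "fmt"

// fanIn starts n workers and collects one result per worker.
// The n receives at the end replace a sync.WaitGroup.
func fanIn(n int) []int {
	ch := make(chan int) // unbuffered: each send blocks until received
	for i := 0; i < n; i++ {
		go func(id int) { ch <- id * id }(i)
	}
	results := make([]int, 0, n)
	for i := 0; i < n; i++ { // n receives ≙ wg.Wait()
		results = append(results, <-ch)
	}
	return results
}

func main() {
	fmt.Println(len(fanIn(4))) // prints 4
}
```

With make(chan int, k) the first k sends would complete without a waiting receiver, which is the buffered, async-until-full behavior noted above.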
MapReduce, GFS, and VM-FT have single masters. This is great for decision making—it can’t disagree with itself. But not so good for fault tolerance. It pushes the real heart of the system into a corner.
Up until the 80s, people designed fault-tolerant systems not prone to split brain by creating networks that can’t fail. With enough money and precaution (no physical wires able to be tripped), this is possible.
The method that prevents split brain is majority vote. The number of servers needs to be odd (so there’s a majority). If n = 2f+1, you can tolerate f failures. These are also called quorum systems.
If you always need a majority to proceed, then at every step, the majority of that step will contain at least one server from the previous majority. This one property is the one that Raft relies on to prevent split brain.
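The overlap arithmetic behind this can be sketched in Go (helper names are made up for illustration):

```go
package main

import "fmt"

// majority returns the minimum quorum size for n servers.
func majority(n int) int { return n/2 + 1 }

// tolerated returns f for n = 2f+1 servers.
func tolerated(n int) int { return (n - 1) / 2 }

func main() {
	for _, n := range []int{3, 5, 7} {
		q := majority(n)
		// two quorums of size q drawn from n servers must share at
		// least 2q - n servers, which is always >= 1: this shared
		// server is what carries state from one majority to the next.
		fmt.Printf("n=%d quorum=%d tolerates=%d overlap>=%d\n",
			n, q, tolerated(n), 2*q-n)
	}
}
```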
Application code (like KV-store) sits on top of Raft.
If all you need is to scale to higher load, the simplest approach is vertical scaling. The problem is that the cost grows faster than linearly. Also, the server is limited to a single geographic location.
Replicas and sharding go hand-in-hand in distributed systems.
Three popular algorithms for replicating changes between db nodes is single-leader, multi-leader, and leaderless replication.
When you set up new followers in a replicated database, you can’t lock the database for a copy, that would be horrible for availability of the database. The process should look like 1. Take a snapshot of the leader’s database at some point in time, without taking a lock on the entire db. 2. Copy snapshot to a new follower node. 3. Follower connects to leader and request all data changes since the snapshot was taken. 4. When the follower has processed the backlog of data changes since the snapshot, we say it has *caught up*.
The cons of using WAL as replication logs in a replicated database system is that WAL contains low-level details like which bytes were changed in which disk blocks, which is coupled to the storage engine. If the database changes its storage format from one version to another, it is typically not possible to run different versions of the database software on the leader and followers. This makes upgrades of the system require downtime, instead of easily updating the software version of nodes.
Logical logs are decoupled from storage engine internals, allowing leaders and followers to run different versions of database software, or even different storage engines.
A logical log format is also easier for external applications to parse. This aspect is useful if you want to send the contents of a database to an external system such as a data warehouse for offline analysis or for building custom indexes and caches.
Potential solutions to read-after-write consistency are reading from the leader based on certain criteria, or the client remembering a timestamp to compare against data received from the database. There are also additional costs if the leader is in another data centre not close to the client.
There is read-after-write consistency and cross-device read-after-write consistency.
Reads in a system with asynchronous followers can return stale data after fresh data, making it look like things are moving backward in time. This scenario is quite likely if the user refreshes a web page and each request is routed to a random server: the second query observes the system at an earlier point in time than the first query. One way of achieving monotonic reads is to make sure that the user always reads from the same replica rather than a random one.
In the move to distributed (replicated and partitioned) databases, many systems have abandoned single-node transactions, claiming that they’re too expensive in terms of performance and availability. They say that eventual consistency is inevitable in a scalable system. There is some truth to that statement but it’s overly simplistic, and we will develop a more nuanced view over the rest of DDIA.
The main gotchas with replication in distributed databases come when using highly available, i.e. asynchronously replicated, databases. Because commits are asynchronous, replication lag shows up as stale read-after-write results, non-monotonic reads, and violations of consistent prefix reads.
-
Mar 3
We who cut mere stones must always be envisioning cathedrals.
Instead of excuses, provide options. How do you react when someone—a bank teller, an auto mechanic, or a clerk—comes to you with a lame excuse? What do you think of them and their company as a result?
Don’t live with broken windows. Don’t leave bad design, wrong decision, or poor code unrepaired. If there is insufficient time to fix it properly, then board it up. Comment out the offending code, display a “not implemented” message, or substitute dummy data instead. Take *some* action to prevent further damage and to show that you’re on top of the situation.
Great software today is often preferable to perfect software tomorrow.
You can get an informal measure of the orthogonality of a project team’s structure. Simply see how many people need to be involved in discussing each change that is requested. The larger the number, the less orthogonal the group.
-
Mar 4
Languages without container Mutex types (like Rust’s Mutex&lt;T&gt;) are error-prone, as they rely on a locking protocol: the invariant is protected manually by programmers instead of by the type system.
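A Go sketch of a lock protecting an invariant rather than data (the accounts type is made up for illustration): the mutex guards the invariant a + b == 100, and the association between the mutex and the two fields is pure programmer convention, which is exactly the error-prone part.

```go
package main

import (
	"fmt"
	"sync"
)

// Invariant: a + b == 100. Nothing in the type system ties mu to
// a and b; the programmer must remember to hold mu around them.
type accounts struct {
	mu   sync.Mutex
	a, b int
}

// transfer temporarily breaks the invariant, but no other goroutine
// can observe the broken state while the lock is held.
func (ac *accounts) transfer(n int) {
	ac.mu.Lock()
	defer ac.mu.Unlock()
	ac.a -= n
	ac.b += n
}

func (ac *accounts) total() int {
	ac.mu.Lock()
	defer ac.mu.Unlock()
	return ac.a + ac.b
}

func main() {
	ac := &accounts{a: 100, b: 0}
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); ac.transfer(5) }()
	}
	wg.Wait()
	fmt.Println(ac.total()) // prints 100: the invariant held
}
```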
An interpreter is how you “implement” a language.
An interpreter changes the representation of our source code two times before evaluating it. One, from source code to tokens, called “lexical analysis” or “lexing”. This is done by a lexer aka tokenizer. The tokens are then fed to the parser which does the second transformation and turns the tokens into an abstract syntax tree.
What exactly constitutes a token varies between lexer implementations. Some lexers only convert “5”s to an integer in the parsing stage, or even later, and not when constructing tokens.
-
Mar 5
Unlike an Entity, it does not have a unique identity, and equivalence is determined by comparing the attributes encapsulated by the Value type.
-
Mar 21
When compiler and interpreter devs say “language”, they are usually referring to the language implementation.
The network of paths a language implementation may choose is like climbing a mountain. You start off at the bottom with the program as raw source text, literally just a string of characters. Each phase analyzes the program and transforms it to some higher-level representation where the semantics — what the author wants the computer to do — become more apparent. Eventually you reach the peak. We have a bird’s eye view of the user’s program and can see what their code *means*. We begin our descent down the other side of the mountain. We transform the highest-level representation down to successively lower-level forms to get closer and closer to something we know how to make the CPU actually execute.
The first step in a language implementation is scanning aka lexing. A scanner/lexer takes in a linear stream of characters and chunks them together into a series of tokens.
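A toy scanner sketch in Go, chunking a character stream into number, identifier, and operator tokens. Real lexers track token types, positions, and far more; this only shows the chunking idea.

```go
package main

import (
	"fmt"
	"unicode"
)

// scan groups a linear stream of characters into a series of tokens.
func scan(src string) []string {
	var tokens []string
	i := 0
	for i < len(src) {
		c := rune(src[i])
		switch {
		case unicode.IsSpace(c):
			i++ // skip whitespace
		case unicode.IsDigit(c):
			j := i
			for j < len(src) && unicode.IsDigit(rune(src[j])) {
				j++
			}
			tokens = append(tokens, src[i:j]) // number literal
			i = j
		case unicode.IsLetter(c):
			j := i
			for j < len(src) && unicode.IsLetter(rune(src[j])) {
				j++
			}
			tokens = append(tokens, src[i:j]) // identifier
			i = j
		default:
			tokens = append(tokens, string(c)) // single-char operator
			i++
		}
	}
	return tokens
}

func main() {
	fmt.Println(scan("foo = 42 + bar")) // prints [foo = 42 + bar]
}
```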
“Lexical” comes from the Greek root “lex” meaning “word”.
The next step is parsing. This is where the syntax gets a grammar. A parser takes the flat sequence of tokens and builds a tree structure that mirrors the nested nature of the grammar. This is called a parse tree or abstract syntax tree.
Parsing has a long, rich history in computer science that is closely tied to the artificial intelligence community.
The first two stages of an interpreter are similar across all implementations. Now the individual characteristics of each language start coming into play with Static Analysis.
The first bit of analysis that most languages do is called binding or resolution. For each identifier, we find out where that name is defined and wire the two together.
Static analysis is the summit of the mountain and a sweeping view of the user’s program. All this semantic insight that is visible to us from analysis needs to be stored somewhere. There are a few places where we can store it. 1. Store it right back as attributes on the syntax tree itself — extra fields in the nodes that aren’t initialized during parsing but get filled in later. 2. Store data in a lookup table off to the side. Typically, the keys to this table are identifiers. We call this table a symbol table.
Everything up to this point is considered the frontend of the language implementation. The frontend of the pipeline is specific to the source language the program is written in. The backend is concerned with the final architecture where the program will run. In the middle, the code may be stored in some intermediate representation (IR) that isn’t tightly tied to either the source or destination forms.
IRs lets you support multiple source languages and target platforms with less effort.
Once we understand what the user’s program means, we are free to swap it out with a different program that has the same semantics but implements them more efficiently. Optimization is a huge part of the programming language business. Many language hackers spend their entire careers here, squeezing every drop of performance they can out of their compilers to get their benchmarks a fraction of a percent faster. It can become sort of an obsession.
After all optimizations are applied to the user’s program, the last step is to convert it to a form the machine can actually run. This is the code generation step, where “code” here refers to assembly-like instructions a CPU runs. This is the backend of the mountain. From here on out, our representation of the code becomes more and more primitive, like evolution run in reverse, as we get closer to something our simple-minded machine can understand.
On the backend, we have a decision to make. Do we generate instructions for a real CPU or a virtual one? Code for a virtual machine used to be called p-code for *portable*, but today we generally call it bytecode because each instruction is often a single byte long. These synthetic instructions are designed to map a little more closely to the language’s semantics, and not be so tied to the peculiarities of any one computer architecture and its accumulated historical cruft.
If your compiler produces bytecode, your work isn’t over once that’s done. Since there is no chip that speaks that bytecode, it’s your job to translate. You have two options. 1. Write a little mini-compiler for each target architecture that converts the bytecode to native code for that machine. You still have to do work for each chip you support, but this last stage is pretty simple and you get to reuse the rest of the compiler pipeline across all of the machines you support. You’re basically using your bytecode as an IR. The second option is for you to write your own VM, a program that emulates a hypothetical chip supporting your virtual architecture at runtime. Running bytecode in a VM is slower than translating it to native code ahead of time because every instruction must be simulated at runtime each time it executes. In return, you get simplicity and portability since your VM is written in a language that’s already supported on most machines. Like C for example. These VMs aren’t system VMs, but rather language or process VMs.
We finally have a program in a form that we can execute. The last step is running it natively by telling the OS to load the executable, or by starting up the VM and loading the program into that. In both cases, we usually need some services that our language provides while the program is running. All of this stuff is going on at runtime, so it’s called the runtime. In a fully compiled language the runtime gets inserted directly into the resulting executable. If the language is run inside an interpreter or VM, then the runtime lives there. This is how most implementations of languages like Java, Python, and Javascript work.
Some compilers interleave parsing, analysis, and code generation so that they produce output code directly in the parser, without ever allocating any syntax trees or other IRs. These single-pass compilers restrict the design of the language. C was designed around this limitation. At the time, memory was so precious that a compiler couldn’t even hold an entire source file in memory, much less the whole program. This is why in C you can’t call a function above the code that defines it unless you have an explicit forward declaration.
Some programming languages begin executing code right after parsing it to an AST. This implementation style is common for student projects and little languages, but is not widely used for general-purpose languages since it tends to be slow.
A transpiler is when you write a frontend for your language. Then, in the backend, instead of doing all the work to lower the semantics to some primitive target language, you produce a string of valid source code for some other language that’s about as high level as yours. Then, you use the existing compilation tools for that language as your escape route off the mountain and down to something you can execute.
After the viral spread of UNIX to machines, there began a long tradition of compilers that produced C as their output language. C compilers were valuable everywhere UNIX was and produced efficient code, so targeting C was a good way to get your language running on a lot of architectures.
Web browsers are the machines of today, and their machine code is Javascript, so these days it seems almost every language out there has a compiler that targets JS since that’s the main way to get your code running in the browser.
Compiling is an implementation technique that involves translating a source language to some other — usually lower level — form. When you generate bytecode or machine code, you are compiling. When you transpile to another high-level language, you are compiling too.
When we say a language implementation “is a compiler”, we mean it translates source code to some other form but doesn’t execute it. The user has to take the resulting output and run it themselves.
When we say a language implementation “is an interpreter” we mean it takes in source code and executes it immediately. It runs programs “from source code”
There are two main techniques for automatic memory management. Reference counting and garbage collection. Ref counters are much simpler to implement (Perl, PHP, and Python all started out using them). But over time, the limitations of ref counting become too troublesome. All of those languages eventually ended up adding a full tracing GC, or at least enough of one to clean up object cycles.
It’s a good choice to remove and ban null from a statically typed language. In a dynamically typed one though, eliminating it is often more annoying than having it.
Ternary operators are sometimes called mixfix operators.
The subtract operator is actually both an infix and prefix operator.
People generally reinvent classes with the power and flexibility of prototypes. No one knows why though…
-
Mar 22
The rules that determine how a particular language groups characters into lexemes are called its lexical grammar.
Tools like lex and flex take a handful of regexes and give you a complete scanner back.
Maximal munch is a scenario where two lexical grammar rules can both match a chunk of code that the scanner is looking at; whichever one matches the most characters wins.
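A minimal Go sketch of maximal munch, using the classic case where both “>” and “>=” match at the current position (the munch helper is made up for illustration):

```go
package main

import "fmt"

// munch returns the lexeme starting at position i, preferring the
// longest match: ">=" beats ">" when both rules apply.
func munch(src string, i int) string {
	if src[i] == '>' {
		if i+1 < len(src) && src[i+1] == '=' {
			return ">=" // longest match wins
		}
		return ">"
	}
	return string(src[i])
}

func main() {
	fmt.Println(munch(">= 1", 0)) // prints >=, not >
}
```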
Haskell is a pure functional programming language.
Haskell is lazy.
Haskell is statically typed.
Functions in Haskell are invoked without parentheses.
All if statements in Haskell need an else because an if is an expression and must evaluate to a value.
Functions in Haskell cannot start with capital letters.
When a function in Haskell doesn’t take any parameters, it’s called a definition or name.
Tuples in Haskell don’t have to be homogenous, unlike lists. They also have a known size beforehand, unlike lists.
Parameters in Haskell’s function type declarations are separated with -> instead of a more explicit distinction like “,” or something.
Types in Haskell start with an uppercase letter. Type variables start with a lowercase letter.
A scanner implements a lexical grammar. A parser implements a syntactic grammar.
A lexical grammar can be defined with a regular language. However, for syntactic grammar which can nest arbitrarily deeply, regular languages aren’t powerful enough. We need a bigger hammer, which is a context-free grammar (CFG).
The alphabet of a lexical grammar is characters, and its strings are lexemes/tokens. The alphabet of a syntactic grammar is tokens, and its strings are expressions.
Strings generated from grammar rules are called derivations. Rules are called productions because they produce strings in the grammar.
Each production in a CFG has a head (its name), and a body, which describes *what* it generates. A body is a list of symbols.
A symbol can be a terminal or a nonterminal.
A formal grammar is “context free” if its production rules can be applied regardless of the context of a nonterminal. No matter which symbols surround it, the single nonterminal on the left hand side can always be replaced by the right hand side.
This is what distinguishes it from a context-sensitive grammar.
The language generated by a grammar is the set of strings of terminal symbols that can be derived, by repeated rule application, from some particular nonterminal symbol (the “start symbol”).
-
Mar 23
Each style of programming has a certain “grain” to it. Object-oriented languages want you to *orient* your code along the rows of types. Adding a new row (class) is easy. Adding a column (function) is a PITA. You have to add a function to every existing class. Functional languages encourage you to lump each column’s worth of code together in a function. Adding functions is easy, but not types. You have to add an arm to every function’s pattern matching.
A bunch of smart language nerds noticed that neither style made it easy to add both rows and columns to the table. They called this difficulty the “expression problem”
Languages with multimethods like Common Lisp’s CLOS, Dylan, and Julia support adding both new types and operations easily. But they sacrifice either static type checking or separate compilation.
The Visitor pattern is really about approximating the functional style within an OOP language.
You can “play” a CFG like a game in order to generate strings. Parsers play that game in reverse. Given a string (series of tokens), a parser maps tokens to terminals in the grammar to figure out which rules could have generated that string.
Without well-defined precedence and associativity, an expression that uses multiple operators is ambiguous — it can be parsed into different syntax trees.
An LL parser (Left-to-right, Leftmost derivation) is a top-down parser for a subset of context-free languages. It parses the input from Left to right, performing Leftmost derivation of the sentence.
An LL parser is called an LL(k) parser if it uses k tokens of lookahead when parsing a sentence. A grammar is called an LL(k) grammar if an LL(k) parser can be constructed from it.
Recursive descent parsing is considered top-down parsing because it starts from the top or outermost grammar rule and works its way down into the nested subexpressions before finally reaching the leaves of the syntax tree. This contrasts with bottom-up parsers like LR that start with primary expressions and compose them into larger and larger chunks of syntax.
A recursive descent parser is a literal translation of the grammar’s rules straight into imperative code. Each rule becomes a function.
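A tiny recursive descent sketch in Go for + and *, one function per grammar rule (the grammar and names are made up for illustration). Lower rules bind tighter, which encodes precedence, and the loops make the operators left-associative:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Grammar, one function per rule:
//   expr   -> term ("+" term)*
//   term   -> factor ("*" factor)*
//   factor -> NUMBER
type parser struct {
	toks []string
	pos  int
}

func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

func (p *parser) expr() int {
	v := p.term()
	for p.peek() == "+" { // loop = left associativity
		p.pos++
		v += p.term()
	}
	return v
}

func (p *parser) term() int {
	v := p.factor()
	for p.peek() == "*" { // deeper rule = higher precedence
		p.pos++
		v *= p.factor()
	}
	return v
}

func (p *parser) factor() int {
	n, _ := strconv.Atoi(p.toks[p.pos])
	p.pos++
	return n
}

// eval parses and evaluates a space-separated expression.
func eval(src string) int {
	return (&parser{toks: strings.Fields(src)}).expr()
}

func main() {
	fmt.Println(eval("1 + 2 * 3")) // prints 7: * binds tighter than +
}
```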
Like the scanner, the parser consumes a flat input sequence, only now we’re reading tokens instead of characters.
A parser’s error handling wants to report as many distinct errors as there are but also minimize cascading errors. These two goals are in tension.
A popular parser recovery technique is panic mode. As soon as the parser detects an error, it enters panic mode. Before it can get back to parsing, it needs to get its state and the sequence of forthcoming tokens aligned such that the next token does match the rule being parsed. This process is called synchronization. The parser fixes its parsing state by jumping out of any nested productions until it gets back to that rule. Then it synchronizes the token stream by discarding tokens until it reaches one that can appear at that point in the rule.
With synchronization, any additional real syntax errors hiding in the discarded tokens aren’t reported, but it also means that any mistaken cascaded errors that are side effects of the initial error aren’t falsely reported either, which is a decent trade-off.
The traditional place in the grammar to synchronize is between statements.
Interpreters consist of: scan, parse, execute.
The difference between let and where bindings in Haskell is that let bindings are expressions themselves. Whereas where bindings are just syntactic constructs.
Every function in Haskell officially only takes one parameter. Functions that accept several parameters are curried functions.
If we call a function with too few parameters, we get back a partially applied function.
Lisp stands for [Lis]t [P]rocessor
In Haskell, by calling functions with too few parameters, we are actually *creating* functions on the fly.
The advantage of currying is that it makes every function into a one-argument function returning one result. Because of this, mathematicians can more easily write proofs about computation. The fact that it’s occasionally useful to create higher-order functions in practical programs is a side-effect.
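Currying can be sketched with Go closures (Go doesn’t curry automatically the way Haskell does; this just mimics the shape): add takes one argument and returns a function awaiting the next, so calling it with “too few” arguments creates a new function.

```go
package main

import "fmt"

// add is the curried form of two-argument addition: one parameter in,
// a function awaiting the next parameter out.
func add(x int) func(int) int {
	return func(y int) int { return x + y }
}

func main() {
	inc := add(1)          // partial application creates a function
	fmt.Println(inc(41))   // prints 42
	fmt.Println(add(2)(3)) // prints 5
}
```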
-
Mar 24
ZooKeeper supports a scalable lock strategy.
CRAQ is similar to ZooKeeper except that it provides a stronger guarantee – linearizability. It uses chain replication.
CRAQ cannot be used by itself. It has no defence against split brain. It needs to be used with an external authority on who’s alive and who’s dead. This is called a configuration manager. Its job is to measure aliveness. Every time it thinks a server is dead, it sends a new configuration to the chain. The configuration manager usually uses Raft/Paxos/ZooKeeper.
Chain replication is favored over Raft/Paxos when you need to reduce the load on the leader (CRAQ leader only needs to send updates to successor, not the majority of a cluster).
Raft/Paxos aka majority schemes are favored when nodes might be in different data centres. You don’t have to wait for a single slow point of failure, you can move on with a majority.
Primary/backup replication schemes can be viewed as a Chain Replication chain with just two nodes.
-
Mar 26
Since statements, by definition, don’t evaluate to a value, they need to do something else to be useful. That something is called a side effect.
The structure of code using recursive descent looks like the grammar it’s implementing.
We defer undefined variable error to runtime instead of a syntax error. That way mutually recursive functions can exist. It’s OK to refer to a variable before it’s defined as long as you don’t evaluate the reference.
Scope and environments are close cousins. The former is the theoretical concept, the latter is the machinery that implements it.
In the early part of last century, mathematicians stumbled into a series of confusing paradoxes that led them to doubt the stability of the foundation they had built their work upon. They wanted to rigorously ask questions like “Can all true statements be proven?” and “Can we compute all functions that we can define?” It turns out the answer is no, and in the process of proving so, Alan Turing and Alonzo Church devised a precise answer to the last question — a definition of what kinds of functions are computable. These are now considered the “computable functions”. Turing’s system is called a Turing machine. Church’s is the lambda calculus. Both are still widely used as the basis for models of computation, and many modern functional programming languages use the lambda calculus at their core.
A Turing machine can compute any computable function. A language that is expressive enough to compute any computable function is considered Turing-complete.
Turing machines are simple. You need arithmetic, control flow, and the ability to use arbitrary amounts of memory.
The if statement lets you conditionally execute statements, and the “conditional” ternary operator lets you conditionally execute *expressions*.
The parentheses around expressions in if statements are only half useful. You need some delimiter between the end of the expression and the then statement, otherwise the parser can’t tell when it has reached the end of the expression. The opening parenthesis after *if* doesn’t do anything useful. Dennis Ritchie just put it there so he could use ) as the ending delimiter without having unbalanced parentheses.
Most languages and parsers avoid the dangling-else problem in an ad-hoc way. The else is bound to the nearest if that precedes it.
An interpreter’s evaluate methods are just thin wrappers around the host language’s actual code. (Arithmetic, conditional execution).
-
Mar 29
During the Fortran days, lexing, parsing, semantic analysis, optimization, and code-gen each took roughly the same proportion of the compiler’s lines of code. Nowadays, lexing and parsing are done using pre-existing tools, and there is more semantic analysis than before. Optimization takes about 50% of the compiler, and code-gen is very minimal.
It’s easier to add new syntax and sugar to a language than change semantics. That’s why many languages tend to sweeten with time.
-
Parallel Distributed Processing (PDP, 1986) describes eight concepts that map onto deep learning: processing units, state of activation, output function, pattern of connectivity, propagation rule, activation rule, learning rule, and environment.
Deep learning is a type of machine learning.
The model of machine learning is different from regular programming. With regular programming, you think input -> function -> output. With machine learning, the model is inputs & weights -> model -> results.
However, once the model is trained (once we’ve chosen our final, favourite weight assignment), then we can think of the weights as being *part of the model*, since we’re not varying them anymore. Thus, machine learning actually looks like inputs -> model -> results. We can treat the trained model like our old function again.
Modern day terminology for pre-trained models looks like this. Inputs & parameters -> architecture -> predictions -> loss (derived from predictions and labels).
Machine learning is the training of programs developed by allowing a computer to learn from its experience, rather than through manually coding the individual steps.
Learning becomes automatic when the adjustment of the weights is also automatic.
In machine learning, we would like a function so flexible that it could be used to solve any given problem just by varying its weights. That function already exists: it’s the neural network.
The process of finding good weight assignments is called stochastic gradient descent (SGD).
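A one-weight SGD sketch in Go (toy assumptions: model y = w*x, squared loss, data generated from y = 2x). Each step nudges w along the negative gradient of the loss on one example, which is the weight-adjustment loop made automatic:

```go
package main

import "fmt"

// sgd fits y = w*x to (xs, ys) by stochastic gradient descent,
// updating w after every single example.
func sgd(xs, ys []float64, lr float64, epochs int) float64 {
	w := 0.0
	for e := 0; e < epochs; e++ {
		for i := range xs {
			pred := w * xs[i]
			grad := 2 * (pred - ys[i]) * xs[i] // d(squared loss)/dw
			w -= lr * grad                     // step against the gradient
		}
	}
	return w
}

func main() {
	// data from y = 2x; SGD should recover w close to 2
	xs := []float64{1, 2, 3, 4}
	ys := []float64{2, 4, 6, 8}
	fmt.Printf("w = %.2f\n", sgd(xs, ys, 0.01, 200)) // prints w = 2.00
}
```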
Limitations of machine learning are:
A model cannot be created without data.
A model can only learn to operate on the patterns seen in the input data used to train it.
This learning approach only creates predictions, not recommended actions.
It’s not enough to just have examples of input data; we need labels for that data too.
Most of the time, the data you have is a proxy for the thing you actually care about. For instance, arrests are a proxy for crime. The difference is usually significant and, if not taken into account, can lead to bias.
When you train a model, you must always have both a training set and a validation set, and must measure the accuracy of your model only on the validation set.
If you train for too long with not enough data, you will see the accuracy of your model get worse; this is called over-fitting.
(Super burnt out for 2 weeks)