# Finite Factored Sets


Video:

(Lightly edited) slides: https://www.hdjkn.com/files/factored-set-slide.pdf

###### (Part 1, Motivation) · · · Some Context

• My main motivation is to reduce existential risk.
• I try to do this by trying to become less confused about intelligence and optimization and agency and various things in that cluster.
• My main strategy here is to develop a theory of agents that are embedded in the environment that they’re optimizing. I think there are a lot of open hard problems around doing this.
• This leads me to do a bunch of weird math and philosophy. This talk is going to be an example of some weird math and philosophy.

This talk can be split into 2 parts:

This talk can also be split into 5 parts differentiated by color: Title Slides, Motivation, Table of Contents, Main Body, and Examples. Combining these gives us 10 parts (some of which are not contiguous):

###### (Part 1, Main Body) · · · Set Partitions

• A partition of a set $$S$$ is a set $$X$$ of non-empty subsets of $$S$$, called parts, such that for each $$s \in S$$ there exists a unique part in $$X$$ that contains $$s$$.
• Basically, a partition of $$S$$ is a way to view $$S$$ as a disjoint union. We have parts that are disjoint from each other, and they union together to form $$S$$.
• We’ll write $$\mathrm{Part}(S)$$ for the set of all partitions of $$S$$.
• We’ll say that a partition $$X$$ is trivial if it has exactly one part.
• We’ll use bracket notation, $$[s]_{X}$$, to denote the unique part in $$X$$ containing $$s$$. So this is like the equivalence class of a given element.
• We’ll use the notation $$s \sim_{X} t$$ to say that two elements $$s$$ and $$t$$ are in the same part of $$X$$.

We’re also going to use the lattice structure of partitions:

• We’ll say that $$X \geq_S Y$$ ($$X$$ is finer than $$Y$$, and $$Y$$ is coarser than $$X$$) if $$X$$ makes all of the distinctions that $$Y$$ makes (and possibly some more distinctions), i.e., if for all $$s,t \in S$$, $$s \sim_X t$$ implies $$s \sim_Y t$$. You can break your set $$S$$ into parts, $$Y$$, and then break it into smaller parts, $$X$$.
• $$X\vee_S Y$$ (the common refinement of $$X$$ and $$Y$$ ) is the coarsest partition that is finer than both $$X$$ and $$Y$$ . This is the unique partition that makes all of the distinctions that either $$X$$ or $$Y$$ makes, and no other distinctions. This is well-defined, which I’m not going to show here.
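To make these lattice operations concrete, here is a minimal Python sketch (the helper names `finer` and `common_refinement` are my own, not from the talk), modeling a partition as a frozenset of frozensets:

```python
# A partition is modeled as a frozenset of frozensets (its parts).
def finer(x, y):
    """X >= Y: every part of X sits inside some part of Y,
    i.e. X makes all of the distinctions that Y makes."""
    return all(any(p <= q for q in y) for p in x)

def common_refinement(x, y):
    """X v Y: parts are the non-empty pairwise intersections of parts,
    giving the coarsest partition finer than both X and Y."""
    return frozenset(p & q for p in x for q in y) - {frozenset()}

X = frozenset({frozenset({0, 1}), frozenset({2, 3})})
Y = frozenset({frozenset({0, 2}), frozenset({1, 3})})
XY = common_refinement(X, Y)

print(sorted(sorted(p) for p in XY))  # [[0], [1], [2], [3]]
assert finer(XY, X) and finer(XY, Y) and not finer(X, Y)
```

Here the common refinement of the two "bit" partitions of a 4-element set is the discrete partition, which is indeed finer than both.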

###### (Part 1, Main Body) · · · Set Factorizations

If you take one definition away from this first talk, it should be the definition of factorization. I’ll try to explain it from a bunch of different angles to help communicate the concept.

A factorization of $$S$$ is a set $$B$$ of non-trivial partitions of $$S$$ such that for each way of choosing one part from each partition in $$B$$, there exists a unique element of $$S$$ in the intersection of those parts. The partitions in $$B$$ are called factors.

If $$B=\{b_0,\dots,b_{n}\}$$ is a factorization of $$S$$, then there exists a bijection between $$S$$ and $$b_0\times\dots\times b_{n}$$ given by $$s\mapsto([s]_{b_0},\dots,[s]_{b_{n}})$$. This bijection comes from sending an element of $$S$$ to the tuple of parts containing that element. And as a consequence of this bijection, $$|S|=\prod_{b\in B} |b|$$.

So we’re really viewing $$S$$ as a product of these individual factors, with no additional structure.
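The factorization condition and its bijection can be checked by brute force. Here is a hypothetical Python sketch (the function names are my own, not from the talk) that verifies a candidate factorization and prints the tuple each element maps to:

```python
from itertools import product

def part_of(b, s):
    """[s]_b: the unique part of partition b containing s."""
    return next(p for p in b if s in p)

def is_factorization(S, B):
    """Each way of choosing one part per factor must pin down exactly
    one element of S; factors must also be non-trivial partitions."""
    if any(len(b) < 2 for b in B):
        return False
    return all(len(set(S).intersection(*choice)) == 1
               for choice in product(*B))

S = {0, 1, 2, 3}
b0 = [{0, 1}, {2, 3}]  # "what is the first bit?"
b1 = [{0, 2}, {1, 3}]  # "what is the second bit?"
assert is_factorization(S, [b0, b1])

# The bijection s -> ([s]_{b0}, [s]_{b1}):
for s in sorted(S):
    print(s, '->', (sorted(part_of(b0, s)), sorted(part_of(b1, s))))
```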

We’ll write $$\mathrm{Fact}(S)$$ for the set of all factorizations of $$S$$, and we’ll say that a finite factored set is a pair $$(S,B)$$, where $$S$$ is a finite set and $$B \in \mathrm{Fact}(S)$$.

Note that the relationship between $$S$$ and $$B$$ is somewhat loopy. If I want to define a factored set, there are two strategies I could use. I could first introduce the $$S$$, and break it into factors. Alternatively, I could first introduce the $$B$$. Anytime I have a finite collection of finite sets $$B$$, I can take their product and thereby produce an $$S$$, modulo the degenerate case where some of the sets are empty. So $$S$$ can just be the product of a finite collection of arbitrary finite sets.

To my eye, this notion of factorization is extremely natural. It’s basically the multiplicative analog of a set partition. And I really want to push that point, so here’s another attempt to push that point:

I’m now going to move on to some examples.

###### (Part 1, Examples) · · · Enumerating Factorizations

Exercise! What are the factorizations of the set $$\{0,1,2,3\}$$ ?

Spoiler space:

$$\begin{split} \{ \ \ \{ \{0\},\{1\},\{2\},\{3\} \} \ \ \} \end{split} \begin{split} \ \ \ \ \underline{\ 0 \ \ \ 1 \ \ \ 2 \ \ \ 3 \ } \end{split}$$

Recall that in the definition of factorization, we wanted that for each way of choosing one part from each factor, we had a unique element in the intersection of those parts. Since we only have one factor here, satisfying the definition just requires that for each way of choosing one part from the discrete partition, there exists a unique element that is in that part.

So if I’m going to have a factorization of $$\{0,1,2,3\}$$ that isn’t this trivial one, I’m going to have to pick 2 partitions of my 4-element set that each break the set into 2 parts of size 2. And there are 3 partitions of a 4-element set that break it up into 2 parts of size 2. For each way of choosing a pair of these 3 partitions, I’m going to get a factorization.

$$\begin{split} \begin{Bmatrix} \ \ \ \{\{0,1\}, \{2,3\}\}, \ \\ \{\{0,2\}, \{1,3\}\} \end{Bmatrix} \end{split} \begin{split} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \begin{array} { |c|c|c|c| } \hline 0 & 1 \\ \hline 2 & 3 \\ \hline \end{array} \end{split}$$

$$\begin{split} \begin{Bmatrix} \ \ \ \{\{0,1\}, \{2,3\}\}, \ \\ \{\{0,3\}, \{1,2\}\} \end{Bmatrix} \end{split} \begin{split} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \begin{array} { |c|c|c|c| } \hline 0 & 1 \\ \hline 3 & 2 \\ \hline \end{array} \end{split}$$

$$\begin{split} \begin{Bmatrix} \ \ \ \{\{0,2\}, \{1,3\}\}, \ \\ \{\{0,3\}, \{1,2\}\} \end{Bmatrix} \end{split} \begin{split} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \begin{array} { |c|c|c|c| } \hline 0 & 2 \\ \hline 3 & 1 \\ \hline \end{array} \end{split}$$

So there will be 4 factorizations of a 4-element set.
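A brute-force enumeration confirms this count. The sketch below (my own helper names, not from the talk) generates all partitions of a 4-element set and keeps the subsets of non-trivial partitions that satisfy the factorization condition:

```python
from itertools import combinations, product

def partitions(elems):
    """Yield every set partition of a list of elements exactly once."""
    if not elems:
        yield []
        return
    head, rest = elems[0], elems[1:]
    for p in partitions(rest):
        for i in range(len(p)):          # put head into an existing part...
            yield p[:i] + [p[i] | {head}] + p[i + 1:]
        yield p + [{head}]               # ...or into a new singleton part

S = [0, 1, 2, 3]

def is_factorization(B):
    return all(len(set(S).intersection(*choice)) == 1
               for choice in product(*B))

nontrivial = [[frozenset(p) for p in q]
              for q in partitions(S) if len(q) > 1]

# Factor sizes multiply to |S| = 4, so at most 2 factors are possible
# (three factors of size >= 2 would force |S| >= 8).
facts = [B for r in (1, 2)
         for B in combinations(nontrivial, r) if is_factorization(B)]
print(len(facts))  # 4
```

The four hits are exactly the trivial factorization plus the three pairs of "2 parts of size 2" partitions displayed above.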

In general you can ask, “How many factorizations are there of a finite set of size $$n$$ ?”. Here’s a little chart showing the answer for $$n \leq 25$$:

You’ll notice that if $$n$$ is prime, there will be a single factorization, which hopefully makes sense. This is the factorization that only has one factor.

###### (Part 2, Motivation) · · · The Pearlian Paradigm

Given a collection of variables and a joint probability distribution over those variables, Pearl can infer causal/temporal relationships between the variables. (In this talk I’m going to use “causal” and “temporal” interchangeably, though there may be more interesting things to say here philosophically.)

Pearl can infer temporal data from statistical data, which is going against the adage that “correlation does not imply causation.” It’s like Pearl is taking the combinatorial structure of your correlation and using that to infer causation, which I think is just really great.

“Well, what if I take the variables that I’m given in a Pearlian problem and I just forget that structure? I can just take the product of all of these variables that I’m given, and consider the space of all partitions on that product of variables that I’m given; and each one of those partitions will be its own variable. And then I can try to do Pearlian causal inference on this big set of all the variables that I get by forgetting the structure of variables that were given to me.”

So in my view, this cheating (just forgetting the given variable structure) is very entangled with the fact that Pearl’s paradigm isn’t great for handling abstraction and determinism.

Where Pearl was given a collection of variables, we are going to just consider all partitions of a given set. Where Pearl infers a directed acyclic graph, we’re going to infer a finite factored set.

In the Pearlian world, we can look at the graph and read off properties of time and orthogonality/independence. A directed path between nodes corresponds to one node being before the other, and two nodes are independent if they have no common ancestor. Similarly, in our world, we will be able to read time and orthogonality off of a finite factored set.

(Orthogonality and independence are pretty similar. I’ll use the word “orthogonality” when I’m talking about a combinatorial notion, and I’ll use “independence” when I’m talking about a probabilistic notion.)

In the Pearlian world, d-separation, which you can read off of the graph, corresponds to conditional independence in all of the probability distributions you can put on the graph. We’re going to have a fundamental theorem that says basically the same thing: conditional orthogonality corresponds to conditional independence in all of the probability distributions we can put on our structure.

In the Pearlian world, d-separation satisfies the compositional graphoid axioms. In our world, we will satisfy the compositional semigraphoid axioms. I claim that the fifth graphoid axiom, intersection, is an axiom you shouldn’t have wanted in the first place.

Excluding the motivation, table of contents, and example sections, this table also serves as an outline of the two talks. We’ve already talked about set partitions and finite factored sets, so now we’re going to talk about time and orthogonality.

###### (Part 2, Main Body) · · · Time and Orthogonality

I think that if you capture one definition from this second part of the talk, it should be this one. Given a finite factored set as context, we’re going to define the history of a partition.

The history of $$X$$, written $$h^F(X)$$, is the smallest set of factors $$H \subseteq B$$ such that for all $$s,t \in S$$, if $$s \sim_{b} t$$ for all $$b \in H$$, then $$s \sim_X t$$.

So the history $$H$$ is a set of factors of $$S$$ , and knowing the values of all the factors in $$H$$ is sufficient to know the value of $$X$$ , or to know which part in $$X$$ a given element is going to be in. I’ll give an example soon that will maybe make this a little more clear.

We’re also going to define orthogonality from history. We’ll say that two partitions $$X$$ and $$Y$$ are orthogonal, written $$X \perp^F Y$$, if their histories are disjoint: $$h^F(X) \cap h^F(Y) = \{\}$$.

We’ll also define time from history: we’ll say that $$X$$ is weakly before $$Y$$, written $$X \leq^F Y$$, if $$h^F(X) \subseteq h^F(Y)$$, and strictly before, written $$X <^F Y$$, if the inclusion is strict.
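Both definitions can be computed by brute force on small examples. In this hypothetical Python sketch (helper names are my own), the history search tries candidate sets of factors in order of size, relying on the fact that the minimal such set is unique:

```python
from itertools import combinations

def same_part(p, s, t):
    """s ~_p t: s and t are in the same part of partition p."""
    return any(s in q and t in q for q in p)

def history(S, B, X):
    """h^F(X): smallest H within B such that agreeing on every factor in H
    forces agreement on X (searched smallest-first; the minimum is unique)."""
    for r in range(len(B) + 1):
        for H in combinations(B, r):
            if all(same_part(X, s, t) for s in S for t in S
                   if all(same_part(b, s, t) for b in H)):
                return frozenset(H)

def orthogonal(S, B, X, Y):
    """X is orthogonal to Y when the histories are disjoint."""
    return history(S, B, X).isdisjoint(history(S, B, Y))

# Two independent bits, factored into "first bit" and "second bit":
S = ['00', '01', '10', '11']
bit0 = frozenset({frozenset({'00', '01'}), frozenset({'10', '11'})})
bit1 = frozenset({frozenset({'00', '10'}), frozenset({'01', '11'})})
xor  = frozenset({frozenset({'00', '11'}), frozenset({'01', '10'})})
B = [bit0, bit1]

assert history(S, B, bit0) == frozenset({bit0})
assert history(S, B, xor) == frozenset({bit0, bit1})  # xor needs both factors
assert orthogonal(S, B, bit0, bit1) and not orthogonal(S, B, bit0, xor)
```

Note that the xor partition, which is not a factor, picks up both factors as its history, so it is orthogonal to neither bit.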

Now I’m going to go through an example.

###### (Part 2, Examples) · · · Game of Life

Let $$S$$ be the set of all Game of Life computations starting from an $$[-n,n]\times[-n,n]$$ board.

Let $$R=\{(r,c,t)\in\mathbb{Z}^3 \mid 0 \leq t \leq n, \ |r|,|c| \leq n-t\}$$ (i.e., the cells that can be computed from the initial $$[-n,n]\times[-n,n]$$ board). For $$(r,c,t)\in R$$, let $$\ell(r,c,t)\subseteq S$$ be the set of all computations such that the cell at row $$r$$ and column $$c$$ is alive at time $$t$$.

(Minor footnote: I’ve done some small tricks here in order to deal with the fact that the Game of Life is normally played on an infinite board. We want to deal with the finite case, and we don’t want to worry about boundary conditions, so we’re only going to look at the cells that are uniquely determined by the initial board. This means that the board will shrink over time, but this won’t matter for our example.)

$$S$$ is the set of all Game of Life computations, but since the Game of Life is deterministic, the set of all computations is in bijective correspondence with the set of all initial conditions. So $$|S| = 2^{(2n+1)^{2}}$$, the number of initial boards.

This also gives us a nice factorization on the set of all Game of Life computations. For each cell, there’s a partition that separates out the Game of Life computations in which that cell is alive at time 0 from the ones where it’s dead at time 0. Our factorization, then, will be a set of $$(2n+1)^{2}$$ binary factors, one for each question of “Was this cell alive or dead at time 0?”.

Meanwhile, the question of whether the red cell is alive or dead is going to be orthogonal to the question of whether the green cell is alive or dead.

More generally, let $$X$$ be the question of whether the cell at row $$r_X$$ and column $$c_X$$ is alive at time $$t_X$$ (and similarly for $$Y$$). Then:

• $$h^{F}(X)=\{L_{(r,c,0)}\in B\mid |r_X-r|\leq t_X,|c_X-c|\leq t_X\}$$.
• $$X <^{F} Y$$ if and only if $$t_X < t_Y$$ and $$|r_Y-r_X|,|c_Y-c_X|\leq t_Y-t_X$$.
• $$X \perp^{F} Y$$ if and only if $$|r_Y-r_X| > t_Y+t_X$$ or $$|c_Y-c_X| > t_Y+t_X$$.
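These closed forms can be sanity-checked against the definition. In the brute-force sketch below (my own helper names; `hist` enumerates the time-0 factors in a cell’s past light cone), the orthogonality formula agrees with literal disjointness of histories on a small board:

```python
def hist(x):
    """Time-0 factors in the history of "is cell (r, c) alive at t?":
    the initial cells in the past light cone of x = (r, c, t)."""
    r0, c0, t0 = x
    return {(r, c, 0)
            for r in range(r0 - t0, r0 + t0 + 1)
            for c in range(c0 - t0, c0 + t0 + 1)}

def orthogonal(x, y):
    """The claimed closed form for disjointness of histories."""
    (rx, cx, tx), (ry, cy, ty) = x, y
    return abs(ry - rx) > ty + tx or abs(cy - cx) > ty + tx

# All cells computable from a [-n, n] x [-n, n] initial board:
n = 3
R = [(r, c, t) for t in range(n + 1)
     for r in range(-(n - t), n - t + 1)
     for c in range(-(n - t), n - t + 1)]

# The closed form agrees with literal disjointness of histories:
for x in R:
    for y in R:
        assert orthogonal(x, y) == hist(x).isdisjoint(hist(y))
print('checked', len(R) ** 2, 'pairs')
```

(For cells in $$R$$ the light cone never leaves the initial board, so no boundary clipping is needed.)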

###### (Part 2, Main Body) · · · Conditional Orthogonality

We’ll begin with the definition of conditional history. We again have a fixed finite factored set as our context. Let $$F=(S,B)$$ be a finite factored set, let $$X,Y,Z\in\text{Part}(S)$$, and let $$E\subseteq S$$.

The conditional history of $$X$$ given $$E$$, written $$h^F(X|E)$$, is the smallest set of factors $$H \subseteq B$$ satisfying the following two conditions:

• For all $$s,t \in E$$, if $$s \sim_{b} t$$ for all $$b \in H$$, then $$s \sim_X t$$.
• For all $$s,t \in E$$ and $$r \in S$$, if $$r \sim_{b_0} s$$ for all $$b_0 \in H$$ and $$r \sim_{b_1} t$$ for all $$b_1 \in B \setminus H$$, then $$r \in E$$.

The second condition can be rephrased as a pair of questions about a point $$r \in S$$:

• “If I only look at the values of the factors in $$H$$, is ‘this point is in $$E$$’ compatible with that information?”
• “If I only look at the values of the factors in $$B\setminus H$$, is ‘this point is in $$E$$’ compatible with that information?”

If both of these questions return “yes”, then the point has to be in $$E$$.

We’re going to use conditional history to define conditional orthogonality, just like we used history to define orthogonality. We say that $$X$$ and $$Y$$ are orthogonal given $$E \subseteq S$$, written $$X \perp^{F} Y \mid E$$, if the history of $$X$$ given $$E$$ is disjoint from the history of $$Y$$ given $$E$$: $$h^F(X|E) \cap h^F(Y|E) = \{\}$$.

We say $$X$$ and $$Y$$ are orthogonal given $$Z \in \text{Part}(S)$$, written $$X \perp^{F} Y \mid Z$$, if $$X \perp^{F} Y \mid z$$ for all $$z \in Z$$. So being orthogonal given a partition just means being orthogonal given each individual part of that partition, where each part is a way the world could be according to $$Z$$.
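Here is a hypothetical brute-force sketch of conditional history (helper names are my own). Note what happens when two independent bits are conditioned on the event “the bits match”: the conditional history of the first bit grows to include both factors, illustrating how conditioning on a deterministic relationship can entangle factors:

```python
from itertools import combinations

def same_part(p, s, t):
    """s ~_p t: s and t are in the same part of partition p."""
    return any(s in q and t in q for q in p)

def cond_history(S, B, X, E):
    """h^F(X|E): smallest H within B meeting the two conditions above
    (brute force, smallest candidate sets first)."""
    for size in range(len(B) + 1):
        for H in combinations(B, size):
            rest = [b for b in B if b not in H]
            # Condition 1: within E, agreement on H forces agreement on X.
            cond1 = all(same_part(X, s, t) for s in E for t in E
                        if all(same_part(b, s, t) for b in H))
            # Condition 2: matching some point of E on H and some point
            # of E on B \ H forces membership in E.
            cond2 = all(r in E for s in E for t in E for r in S
                        if all(same_part(b, r, s) for b in H)
                        and all(same_part(b, r, t) for b in rest))
            if cond1 and cond2:
                return frozenset(H)

# Two independent bits; condition on the event "the bits match".
S = ['00', '01', '10', '11']
bit0 = frozenset({frozenset({'00', '01'}), frozenset({'10', '11'})})
bit1 = frozenset({frozenset({'00', '10'}), frozenset({'01', '11'})})
B = [bit0, bit1]
E = {'00', '11'}

# Unconditionally, bit0's history is just {bit0}; conditioning on E
# drags the other factor in as well:
assert cond_history(S, B, bit0, S) == frozenset({bit0})
assert cond_history(S, B, bit0, E) == frozenset({bit0, bit1})
```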

###### (Part 2, Main Body) · · · Compositional Semigraphoid Axioms

• If $$X \perp^{F} Y \mid Z$$, then $$Y \perp^{F} X \mid Z$$. (symmetry)
• If $$X \perp^{F} (Y\vee_S W) \mid Z$$, then $$X \perp^{F} Y \mid Z$$ and $$X \perp^{F} W \mid Z$$. (decomposition)
• If $$X \perp^{F} (Y\vee_S W) \mid Z$$, then $$X \perp^{F} Y \mid (Z\vee_S W)$$. (weak union)
• If $$X \perp^{F} Y \mid Z$$ and $$X \perp^{F} W \mid (Z\vee_S Y)$$, then $$X \perp^{F} (Y\vee_S W) \mid Z$$. (contraction)
• If $$X \perp^{F} Y \mid Z$$ and $$X \perp^{F} W \mid Z$$, then $$X \perp^{F} (Y\vee_S W) \mid Z$$. (composition)

Decomposition and composition act like converses of each other. Together, conditioning on $$Z$$ throughout, they say that $$X$$ is orthogonal to both $$Y$$ and $$W$$ if and only if $$X$$ is orthogonal to the common refinement of $$Y$$ and $$W$$.

###### (Part 2, Main Body) · · · The Fundamental Theorem

The fundamental theorem of finite factored sets says: Let $$F=(S,B)$$ be a finite factored set, and let $$X,Y,Z\in\text{Part}(S)$$ be partitions of $$S$$. Then $$X \perp^{F} Y \mid Z$$ if and only if for all probability distributions $$P$$ on $$F$$, and all $$x \in X$$, $$y \in Y$$, and $$z \in Z$$, we have $$P(x \cap z) \cdot P(y \cap z) = P(x \cap y \cap z) \cdot P(z)$$. I.e., $$X$$ and $$Y$$ are orthogonal given $$Z$$ if and only if $$X$$ and $$Y$$ are conditionally independent given $$Z$$ in all probability distributions.
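One direction of the theorem is easy to check numerically. In the sketch below (my own setup, not from the talk), a distribution “on $$F$$” is taken to be one that factors as a product over $$B$$; the orthogonal pair of bits then satisfies the independence equation, and a non-orthogonal pair visibly fails it:

```python
# A probability distribution "on F" factors across B: choose an
# independent distribution for each factor of a two-bit world.
S = ['00', '01', '10', '11']
p_bit0 = {'0': 0.3, '1': 0.7}   # distribution over the first bit
p_bit1 = {'0': 0.6, '1': 0.4}   # distribution over the second bit
P = {s: p_bit0[s[0]] * p_bit1[s[1]] for s in S}

def prob(event):
    return sum(P[s] for s in event)

bit0 = [{'00', '01'}, {'10', '11'}]
bit1 = [{'00', '10'}, {'01', '11'}]
xor  = [{'00', '11'}, {'01', '10'}]
triv = [set(S)]                  # trivial partition: condition on nothing

# bit0 is orthogonal to bit1, so independence must hold:
for x in bit0:
    for y in bit1:
        for z in triv:
            assert abs(prob(x & z) * prob(y & z)
                       - prob(x & y & z) * prob(z)) < 1e-12

# bit0 is NOT orthogonal to xor, and independence indeed fails here:
x, y, z = bit0[0], xor[0], triv[0]
assert abs(prob(x & z) * prob(y & z) - prob(x & y & z) * prob(z)) > 1e-3
```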

This theorem, for me, was a little nontrivial to prove. I had to go through defining certain polynomials associated with the subsets, and then dealing with unique factorization in the space of these polynomials; I think the proof was eight pages or something.

So next, we’re going to talk about how to get temporal data from orthogonality data.

###### (Part 2, Main Body) · · · Temporal Inference

The setup: we observe a finite set $$\Omega$$, and we’re given an orthogonality database on $$\Omega$$: two sets $$O$$ and $$N$$ of triples $$(X,Y,Z)$$ of partitions of $$\Omega$$, thought of as orthogonality facts and non-orthogonality facts. A model of the database is a finite factored set $$F=(S,B)$$ together with a function $$f:S\to\Omega$$ such that:

• $$f^{-1}(X) \perp^{F} f^{-1}(Y) \mid f^{-1}(Z)$$ whenever $$(X,Y,Z)\in O$$, and
• $$\lnot (f^{-1}(X) \perp^{F} f^{-1}(Y) \mid f^{-1}(Z))$$ whenever $$(X,Y,Z)\in N$$.

So we have these orthogonality rules we want to satisfy, and we want to consider the space of all models that are consistent with these rules. And even though there will always be infinitely many models consistent with my database, if at least one is—you can always just add more information that you then delete with $$f$$—we would like to be able to sometimes infer that for all models that satisfy our database, $$f^{-1}(X)$$ is before $$f^{-1}(Y)$$. In that case, we’ll write $$X <_D Y$$.

###### (Part 2, Examples) · · · Two Binary Variables (Pearl)

So we’ve set up this nice combinatorial notion of temporal inference. The obvious next questions are:

• Can we actually infer interesting facts using this method, or is it vacuous?
• And: How does this framework compare to Pearlian temporal inference?

To address that question, we’ll go to an example. Let $$X$$ and $$Y$$ be two binary variables. Pearl asks: “Are $$X$$ and $$Y$$ independent?” If yes, then there’s no path between the two. If no, then there may be a path from $$X$$ to $$Y$$, or from $$Y$$ to $$X$$, or from a third variable to both $$X$$ and $$Y$$.

However, I claim that this Pearlian ontology in which you’re handed this collection of variables has blinded us to the obvious next question, which is: is $$X$$ independent of $$X \ \mathrm{XOR} \ Y$$?

In the Pearlian world, $$X$$ and $$Y$$ were our variables, and $$X \ \mathrm{XOR} \ Y$$ is just some random operation on those variables. In our world, $$X \ \mathrm{XOR} \ Y$$ instead is a variable on the same footing as $$X$$ and $$Y$$. The first thing I do with my variables $$X$$ and $$Y$$ is that I take the product $$X \times Y$$ and then I forget the labels $$X$$ and $$Y$$.

So there’s this question, “Is $$X$$ independent of $$X \ \mathrm{XOR} \ Y$$?”. And if $$X$$ is independent of $$X \ \mathrm{XOR} \ Y$$, we will actually be able to conclude that $$X$$ is before $$Y$$!

So not only is the finite factored set paradigm non-vacuous, and not only is it going to be able to keep up with Pearl and infer things Pearl can’t, but it’s going to be able to infer a temporal relationship from only two variables.

So let’s go through the proof of that.

###### (Part 2, Examples) · · · Two Binary Variables (Factored Sets)

Let $$\Omega=\{00,01,10,11\}$$, and let $$X$$, $$Y$$, and $$Z$$ be the partitions (/questions):

• $$X=\{\{00,01\}, \{10,11\}\}$$. (What is the first bit?)
• $$Y=\{\{00,10\}, \{01,11\}\}$$. (What is the second bit?)
• $$Z=\{\{00,11\}, \{01,10\}\}$$. (Do the bits match?)

Suppose our database says that $$X$$ is orthogonal to $$Z$$, and that $$X$$, $$Y$$, and $$Z$$ are each non-degenerate (not orthogonal to themselves, so that their preimages in any model have non-empty history). From this, we’ll be able to prove that $$X <_D Y$$.

Since $$X\leq_{\Omega} Y\vee_{\Omega} Z$$—that is, since $$X$$ can be computed from $$Y$$ together with $$Z$$—$$H_X\subseteq H_Y\cup H_Z$$. (Because a partition’s history is the smallest set of factors needed to compute that partition.)

Since $$X \perp Z$$, we have $$H_X \cap H_Z = \{\}$$, and therefore $$H_X \subseteq H_Y$$. Similarly, since $$Z \leq_{\Omega} X \vee_{\Omega} Y$$, we have $$H_Z \subseteq H_X \cup H_Y$$; so if $$H_X$$ were equal to $$H_Y$$, then $$H_Z$$ would be a subset of $$H_X$$, forcing $$H_Z = \{\}$$ and contradicting the non-degeneracy of $$Z$$. Thus $$H_X \neq H_Y$$, so $$H_X \subset H_Y$$, so $$f^{-1}(X) <^F f^{-1}(Y)$$, so $$X <_D Y$$. □

When I’m doing temporal inference using finite factored sets, I largely have proofs that look like this. We collect some facts about emptiness or non-emptiness of various Boolean combinations of histories of variables, and we use these to conclude more facts about histories of variables being subsets of each other.
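As a sanity check on this kind of argument, here is a hypothetical model of such a database (helper and variable names are my own): two independently generated bits, one for the value of $$X$$ and one for whether the two bits of $$\Omega$$ match, with $$f$$ reassembling the observed bit string. The histories come out exactly as the proof predicts:

```python
from itertools import combinations

def same_part(p, s, t):
    return any(s in q and t in q for q in p)

def history(S, B, X):
    """Smallest H within B whose factors jointly determine X."""
    for r in range(len(B) + 1):
        for H in combinations(B, r):
            if all(same_part(X, s, t) for s in S for t in S
                   if all(same_part(b, s, t) for b in H)):
                return frozenset(H)

# A model: two independently generated bits
# (x, m) = (the value of X, "do the two bits of Omega match?").
S = [(x, m) for x in (0, 1) for m in (0, 1)]

def f(s):
    """f: S -> Omega. First bit is x; second bit matches x iff m == 1."""
    x, m = s
    return f'{x}{x ^ (1 - m)}'

def preimage(parts):
    """Pull a partition of Omega back to a partition of S."""
    return frozenset(frozenset(s for s in S if f(s) in p) for p in parts)

X = preimage([{'00', '01'}, {'10', '11'}])   # first bit
Y = preimage([{'00', '10'}, {'01', '11'}])   # second bit
Z = preimage([{'00', '11'}, {'01', '10'}])   # do the bits match?

b_x = frozenset({frozenset({(0, 0), (0, 1)}), frozenset({(1, 0), (1, 1)})})
b_m = frozenset({frozenset({(0, 0), (1, 0)}), frozenset({(0, 1), (1, 1)})})
B = [b_x, b_m]

assert history(S, B, X).isdisjoint(history(S, B, Z))      # X orthogonal to Z
assert not history(S, B, Y).isdisjoint(history(S, B, Z))  # Y not orthogonal to Z
assert history(S, B, X) < history(S, B, Y)                # H_X strictly inside H_Y
```

In this model $$H_X = \{b_x\}$$, $$H_Z = \{b_m\}$$, and $$H_Y = \{b_x, b_m\}$$, so $$X$$ really is strictly before $$Y$$.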

I have a more complicated example that uses conditional orthogonality, not just orthogonality; I’m not going to go over it here.

Imagine that I had a bit, and it’s either a 0 or a 1, and it’s either blue or green. And these two facts are primitive and independently generated. And I also have this other concept that’s like, “Is it grue or bleen?”, which is the $$\mathrm{XOR}$$ of blue/green and 0/1.

###### (Part 2, Main Body) · · · Applications / Future Work / Speculation

Research topics related to inference:

• Decidability of Temporal Inference
• Efficient Temporal Inference
• Conceptual Inference
• Temporal Inference from Raw Data and Fewer Ontological Assumptions
• Temporal Inference with Deterministic Relationships
• Time without Orthogonality
• Conditioned Factored Sets

• Extending Definitions to the Infinite Case
• Fundamental Theorem for Finite Dimensional Factored Sets
• Continuous Time
• New Lens on Physics

• Embedded Observations
• Counterfactability
• Cartesian Frames Successor
• Unraveling Causal Loops
• Conditional Time
• Logical Causality from Logical Induction
• Orthogonality as Simplifying Assumptions for Decisions
• Conditional Orthogonality as Abstraction Desideratum

I focused on the temporal inference aspect of finite factored sets in this talk, because it’s concrete and tangible to be able to say, “Ah, we can do Pearlian temporal inference, only we can sometimes infer more structure and we rely on fewer assumptions.”

I want to build up the factored set ontology as an alternative to graphs when modeling agents interacting with things, or when modeling information flow. And I’m really excited about that direction.
