||yabo app



((Lightly edited) slides://www.hdjkn.com/files/factored-set-slide.pdf

((Part 1, Title Slides) · · ·有限的委托集s

(第1部分,动机)···Some Context

斯科特:So I want to start with some context. For people who are not already familiar with my work:

  • 我的主要动机是降低存在风险。
  • 我尝试通过弄清楚如何做到这一点艾尔ignadvanced artificial intelligence.
  • I try to dothisby trying to becomeless confusedabout intelligence and optimization and agency and various things in that cluster.
  • 我这里的主要策略是开发一种代理理论嵌入in the environment that they’re optimizing. I think there are a lot of open hard problems around doing this.
  • This leads me to do a bunch of weird math and philosophy. This talk is going to be an example of some weird math and philosophy.

适用于谁艾尔ready familiar with my work, I just want to say that according to my personal aesthetics, the subject of this talk is about as exciting asLogical Induction,也就是说,我对此感到非常兴奋。我对这次听众感到非常兴奋。我很高兴现在发表这次演讲。

((Part 1, Table of Contents) · · ·考虑说话

This talk can be split into 2 parts:



This talk can also be split into5parts differentiated by color:标题幻灯片,,,,Motivation,,,,Table of Contents,,,,Main Body,,,,andExamples。Combining these gives us 10 parts (some of which are not contiguous):

第1部分:简短的交谈 Part 2: The Main Talk
标题幻灯片 有限的委托集s 这Main Talk (It’s About Time)
Motivation Some Context 这Pearlian Paradigm
Table of Contents 考虑说话 We Can Do Better
Main Body Set Partitions,,,,etc. Time and Orthogonality,,,,etc.
Examples Enumerating Factorizations Game of Life,,,,etc.

(第1部分,主体)···Set Partitions

好的。Here’s some background math:

  • 一种划分of a set \(S\) is a set \(X\) of non-empty subsets of \(S\), calledparts,,,,such that for each \(s∈S\) there exists a unique part in \(X\) that contains \(s\).
  • Basically, a partition of \(S\) is a way to view \(S\) as a disjoint union. We have parts that are disjoint from each other, and they union together to form \(S\).
  • 我们将为\(s \)的所有分区编写\(\ mathrm {part}(s)\)。
  • We’ll say that a partition \(X\) istrivialif it has exactly one part.
  • We’ll use bracket notation, \([s]_{X}\), to denote the unique part in \(X\) containing \(s\). So this is like the equivalence class of a given element.
  • 我们将使用符号\(s〜_ {x} t \)说两个元素\(s \)和\(t \)在\(x \)中的同一部分。

您还可以将分区视为集合\(s \)上的变量。以这种方式查看,分区的值\(x \)对应于元素在哪个部分中。

或者,您可以将\(x \)视为一个您可以询问\(s \)的通用元素。如果我有一个\(s \)的元素,并且它对您隐藏,并且您想问一个问题,则每个可能的问题都对应于根据不同可能的答案分割\(s \)的分区。

We’re also going to use thelattice structureof partitions:

  • We’ll say that \(X \geq_S Y\) (\(X\) is finer than \(Y\), and \(Y\) is coarser than \(X\)) if \(X\) makes all of the distinctions that \(Y\) makes (and possibly some more distinctions), i.e., if for all \(s,t \in S\), \(s \sim_X t\) implies \(s \sim_Y t\). You can break your set \(S\) into parts, \(Y\), and then break it into smaller parts, \(X\).
  • \(X\vee_S Y\) (the common refinement of \(X\) and \(Y\) ) is the coarsest partition that is finer than both \(X\) and \(Y\) . This is the unique partition that makes all of the distinctions that either \(X\) or \(Y\) makes, and no other distinctions. This is well-defined, which I’m not going to show here.



一种分解of a set \(S\) is a set \(B\) of nontrivial partitions of \(S\), called因素,因此,对于从\(b \)中每个因素中选择一个部分的每种方式,在这些部分的相交中存在\(s \)的唯一元素。

因此,这可能有点密集。我对此的简短标语是:“ \(s \)的分解是将\(s \)视为产品的一种方法,就像分区是一种将\(s \)视为一种方式的方式一样脱节联盟。”

If you take one definition away from this first talk, it should be the definition of factorization. I’ll try to explain it from a bunch of different angles to help communicate the concept.

If \(B=\{b_0,\dots,b_{n}\}\) is a factorization of \(S\) , then there exists a bijection between \(S\) and \(b_0\times\dots\times b_{n}\) given by \(s\mapsto([s]_{b_0},\dots,[s]_{b_{n}})\). This bijection comes from sending an element of \(S\) to the tuple consisting only of parts containing that element. And as a consequence of this bijection, \(|S|=\prod_{b\in B} |b|\).

So we’re really viewing \(S\) as a product of these individual factors, with no additional structure.

一种lthough we won’t prove this here, something else you can verify about factorizations is that all of the parts in a factor have to be of the same size.

We’ll write \(\mathrm{Fact}(S)\) for the set of all factorizations of \(S\), and we’ll say that afinite factored set是a pair \((S,B)\), where \(S\) is a finite set and \(B \in \mathrm{Fact}(S)\).

Note that the relationship between \(S\) and \(B\) is somewhat loopy. If I want to define a factored set, there are two strategies I could use. I could first introduce the \(S\), and break it into factors. Alternatively, I could first introduce the \(B\).一种nytime I have a finite collection of finite sets \(B\), I can take their product and thereby produce an \(S\), modulo the degenerate case where some of the sets are empty. So \(S\) can just be the product of a finite collection of arbitrary finite sets.

To my eye, this notion of factorization is extremely natural. It’s basically the multiplicative analog of a set partition. And I really want to push that point, so here’s another attempt to push that point:

一种划分是a set \(X\) of非空
子集of \(S\) such that the obvious
functionfromd是joint unionof
\(x \)的元素\(S\) is a bijection.
一种分解是一组\(b \)的non-trivial
划分sof \(S\) such that the obvious
\(b \)的元素from\(S\) is a bijection.



安德鲁·克里奇(Andrew Critch):从某种意义上说,斯科特(Scott)将“子集”视为对分区的双重分区,我认为这是有效的。然后,从另一个意义上讲,您将“分解”视为分区双重的。这些都是有效的,但也许值得谈论两种双重性。

斯科特:是的。I think what’s going on there is that there are two ways to view a partition. You can view a partition as “that which is dual to a subset,” and you can also view a partition as something that is built up out of subsets. These two different views do different things when you dualize.

Ramana Kumar:I was just going to check: You said you can start with an arbitrary \(B\) and then build the \(S\) from it. It can be literally any set, and then there’s always an \(S\)…


Ramana Kumar:一种h. So the \(S\) in that case will just be tuples.


Brendan Fong:Scott, given a set, I find it very easy to come up with partitions. But I find it less easy to come up with factorizations. Do you have any tricks for…?


Joseph Hirsh:Can I ask one more thing before you do that? You allow factors to have one element in them?


Joseph Hirsh:“Nontrivial” means “not have one element, and not have no elements”?


I’m now going to move on to some examples.

(第1部分,示例)···Enumerating Factorizations

Exercise! What are the factorizations of the set \(\{0,1,2,3\}\) ?

Spoiler space:


\(\begin{split} \{ \ \ \{ \{0\},\{1\},\{2\},\{3\} \} \ \ \} \end{split} \begin{split} \ \ \ \ \underline{\ 0 \ \ \ 1 \ \ \ 2 \ \ \ 3 \ } \end{split}\)


Recall that in the definition of factorization, we wanted that for each way of choosing one part from each factor, we had a unique element in the intersection of those parts. Since we only have one factor here, satisfying the definition just requires that for each way of choosing one part from the discrete partition, there exists a unique element that is in that part.

然后我们想要一些琐碎的分解。In order to have a factorization, we’re going to need some partitions. And the product of the cardinalities of our partitions are going to have to equal the cardinality of our set \(S\) , which is 4.

表达4作为非平凡产品的唯一方法是将其表示为\(2 \ times 2 \)。因此,我们正在寻找具有2个因素的因素化,每个因素都有2个部分。


So if I’m going to have a factorization of \(\{0,1,2,3\}\) that isn’t this trivial one, I’m going to have to pick 2 partitions of my 4-element set that each break the set into 2 parts of size 2. And there are 3 partitions of a 4-element sets that break it up into 2 parts of size 2. For each way of choosing a pair of these 3 partitions, I’m going to get a factorization.

\(\begin{split} \begin{Bmatrix} \ \ \ \{\{0,1\}, \{2,3\}\}, \ \\ \{\{0,2\}, \{1,3\}\} \end{Bmatrix} \end{split} \begin{split} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \begin{array} { |c|c|c|c| } \hline 0 & 1 \\ \hline 2 & 3 \\ \hline \end{array} \end{split}\)

\(\ begin {split} \ begin {bmatrix} \ \ \ \ \ {\ {\ {0,1 \},\ {2,3 \} \},\ \ \ \ \ \ {\ {\ {0,3 \}1,,,,2\}\} \end{Bmatrix} \end{split} \begin{split} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \begin{array} { |c|c|c|c| } \hline 0 & 1 \\ \hline 3 & 2 \\ \hline \end{array} \end{split}\)

\(\begin{split} \begin{Bmatrix} \ \ \ \{\{0,2\}, \{1,3\}\}, \ \\ \{\{0,3\}, \{1,2\}\} \end{Bmatrix} \end{split} \begin{split} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \begin{array} { |c|c|c|c| } \hline 0 & 2 \\ \hline 3 & 1 \\ \hline \end{array} \end{split}\)

So there will be 4 factorizations of a 4-element set.

In general you can ask, “How many factorizations are there of a finite set of size \(n\) ?”. Here’s a little chart showing the answer for \(n \leq 25\):

\(| s | \) \(| \ mathrm {fact}(s)| \)
0 1
1 1
2 1
3 1
4 4
5 1
6 61
7 1
8 1681年
9 5041
10 15121
11 1
12 13638241
13 1
14 8648641
15 1816214401
16 181880899201
17 1
18 45951781075201
19 1
20 3379365788198401
21 1689515283456001
22 14079294028801
23 1
24 4454857103544668620801
25 538583682060103680001

You’ll notice that if \(n\) is prime, there will be a single factorization, which hopefully makes sense. This is the factorization that only has one factor.

在我看来,一个非常令人惊讶的事实是,这个序列没有出现OEIS,,,,which is this database that combinatorialists use to check whether or not their sequence has been studied before, and to see connections to other sequences.

对我来说,这感觉就像Bell numbers。钟数计算了有多少个大小\(n \)的分区。OEI的序列编号为110,超过300,000;而且,即使我对其进行调整并删除了堕落的案例,依此类推,这个序列根本不会出现。



Ramana Kumar:If you’re willing to do it, I’d appreciate just stepping through one of the examples of the factorizations and the definition, because this is pretty new to me.

斯科特:是的。Let’s go through the first nontrivial factorization of \(\{0,1,2,3\}\):

\(\begin{split} \begin{Bmatrix} \ \ \ \{\{0,1\}, \{2,3\}\}, \ \\ \{\{0,2\}, \{1,3\}\} \end{Bmatrix} \end{split} \begin{split} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \begin{array} { |c|c|c|c| } \hline 0 & 1 \\ \hline 2 & 3 \\ \hline \end{array} \end{split}\)

In the definition, I said a factorization should be a set of partitions such that for each way of choosing one part from each of the partitions, there will be a unique element in the intersection of those parts.

在这里,我有一个分区,将小数字与大数字分开:\(\ {\ {0,1 \},\ {2,3 \} \} \)。而且我还有一个分区,将偶数数字与奇数数分开:\(\ {\ {0,2 \},\ {1,3 \} \} \)。

一种nd the point is that for each way of choosing either “small” or “large” and also choosing “even” or “odd”, there will be a unique element of \(S\) that is the conjunction of these two choices.

In the other two nontrivial factorizations, I replace either “small and large” or “even and odd” with an “inner and outer” distinction.

David Spivak:对于分区和许多事情,如果我知道集合\(a \)的分区和集合\(b \)的分区,那么我知道\(a+b \)的一些分区(不相关联合会)或者我知道\(a \ times b \)的一些分区。您知道有什么事实以进行因素化吗?

斯科特:是的。If I have two factored sets, I can get a factored set over their product, which sort of disjoint-unions the two collections of factors. For the additive thing, you’re not going to get anything like that because prime sets don’t have any nontrivial factorizations.


(第2部分,标题幻灯片)···这Main Talk (It’s About Time)

((Part 2, Motivation) · · ·这Pearlian Paradigm



Given a collection of variables and a joint probability distribution over those variables, Pearl can infer causal/temporal relationships between the variables. (In this talk I’m going to use “causal” and “temporal” interchangeably, though there may be more interesting things to say here philosophically.)

Pearl can infer temporal data from statistical data, which is going against the adage that “correlation does not imply causation.” It’s like Pearl is taking the combinatorial structure of your correlation and using that to infer causation, which I think is just really great.

Ramana Kumar:I may be wrong, but I think this is false. Or I think that that’s not all Pearl needs—just the joint distribution over the variables. Doesn’t he also make use of intervention distributions?

斯科特:In the theory that is described in chapter two of the book因果关系,他并没有真正使用其他东西。珍珠在其他地方建立了这个更大的理论。但是,您具有很强的能力,也许假设简单性或其他任何东西(但没有假设您可以访问额外的信息),可以收集变量和对这些变量的联合分布,并从相关性中推断出因果关系。

安德鲁·克里奇(Andrew Critch):Ramana, it depends a lot on the structure of the underlying causal graph. For some causal graphs, you can actually recover them uniquely with no interventions. And only assumptions with zero-measure exceptions are needed, which is really strong.

Ramana Kumar:Right, but then the information you’re using is the graph.

安德鲁·克里奇(Andrew Critch):No, you’re not. Just the joint distribution.

Ramana Kumar:哦好的。抱歉,继续。

安德鲁·克里奇(Andrew Critch):这re exist causal graphs with the property that if nature is generated by that graph and you don’t know it, and then you look at the joint distribution, you will infer with probability 1 that nature was generated by that graph, without having done any interventions.

Ramana Kumar:Got it. That makes sense. Thanks.


不过,我要(一点)反对这一点。我要声称那个珍珠kind of cheating when making this inference. The thing I want to point out is that in the sentence “Given a collection of variables and a joint probability distribution over those variables, Pearl can infer causal/temporal relationships between the variables.”, the words “Given a collection of variables” are actually hiding a lot of the work.

重点通常放在联合probability distribution, but Pearl is not inferring temporal data from statistical data alone. He is inferring temporal data from statistical data和分解数据:如何将世界分解为这些变量。


“Well, what if I take the variables that I’m given in a Pearlian problem and I just forget that structure? I can just take the product of all of these variables that I’m given, and consider the space of all partitions on that product of variables that I’m given; and each one of those partitions will be its own variable. And then I can try to do Pearlian causal inference on this big set of all the variables that I get by forgetting the structure of variables that were given to me.”

问题在于,当您这样做时,您会有很多彼此确定的功能,并且实际上无法使用Pearlian Paradigm推断出东西。

So in my view, this cheating is very entangled with the fact that Pearl’s paradigm isn’t great for handling abstraction and determinism.

((Part 2, Table of Contents) · · ·We Can Do Better


Where Pearl was given a collection of variables, we are going to just consider all partitions of a given set. Where Pearl infers a directed acyclic graph, we’re going to infer a finite factored set.

In the Pearlian world, we can look at the graph and read off properties of time and orthogonality/independence. A directed path between nodes corresponds to one node being before the other, and two nodes are independent if they have no common ancestor. Similarly, in our world, we will be able to read time and orthogonality off of a finite factored set.

((Orthogonality and independence are pretty similar. I’ll use the word “orthogonality” when I’m talking about a combinatorial notion, and I’ll use “independence” when I’m talking about a probabilistic notion.)

In the Pearlian world,d- 您可以从图表中读取的分离,对应于可以放在图表上的所有概率分布中的条件独立性。我们将拥有一个基本的定理,它基本上是同一件事:条件正交性对应于我们可以将其放在各种概率分布中的条件独立性。

In the Pearlian world,d- 分离将满足组成图型公理。在我们的世界中,我们将满足构图半径公理。我声称您首先不应该想要的第五章图形公理。

珍珠做因果推断。我们将讨论如何使用这种新范式进行时间推理,并推断出珍珠方法无法做到的一些非常基本的时间事实。(请注意,珍珠有时也可以推断出时间关系we不能 - 但仅从我们的角度来看,珍珠正在做出其他分解假设。)

一种nd then we’ll talk about a bunch of applications.

Pearl 这个谈话
一种Given Collection of Variables 给定集的所有分区
定向无环图 有限的委托集
Directed Path Between Nodes “Time”
没有共同的祖先 “Orthogonality”
d-Separation “有条件的正交性”
Compositional Graphoid Compositional Semigraphoid
d- 分离↔有条件独立性 基本定理
Causal Inference Temporal Inference
Many Many Applications Many Many Applications

Excluding the motivation, table of contents, and example sections, this table also serves as an outline of the two talks. We’ve already talked about set partitions and finite factored sets, so now we’re going to talk about time and orthogonality.

((Part 2, Main Body) · · ·Time and Orthogonality

I think that if you capture one definition from this second part of the talk, it should be this one. Given a finite factored set as context, we’re going to define the history of a partition.

令\(f =(s,b)\)为有限的分组集。并让\(x,y \ in \ mathrm {part}(s)\)为\(s \)的分区。

history\(x \)的书面\(h^f(x)\)是最小的因素\(h \ subseteq b \),因此,如果\(s,t \ in s \),则(s \ sim_b t \)对于所有\(b \ in H \),然后\(s \ sim_x t \)。

那么,\(x \)的历史是最小的因素\(h \) - 因此,是\(b \)的最小子集 - 如果我采用\(s \)的元素,并且向您隐藏它,您想知道它在\(x \)中的哪一部分,这足以告诉您\(H \)中的每个因素中的哪一部分。

So the history \(H\) is a set of factors of \(S\) , and knowing the values of all the factors in \(H\) is sufficient to know the value of \(X\) , or to know which part in \(X\) a given element is going to be in. I’ll give an example soon that will maybe make this a little more clear.

然后我们要定义timefrom history. We’ll say that \(X\) isweakly before\(y \),书面\(x \ leq^f y \),如果\(h^f(x)\ subseteq h^f(y)\)。我们会说\(x \)是strictly before\(Y\), written \(X<^F Y\), if \(h^F(X)\subset h^F(Y)\).


We’re also going to define orthogonality from history. We’ll say that two partitions \(X\) and \(Y\) areorthogonal,书面\(x \ perp^fy \),如果他们的历史是不相交的:\(h^f(x)\ cap h^f(y)= \ {\ {\} \)。

Now I’m going to go through an example.

((Part 2, Examples) · · ·Game of Life

Let \(S\) be the set of all Game of Life computations starting from an \([-n,n]\times[-n,n]\) board.

LET \(r = \ {(r,c,t)\ in \ mathbb {z}^3 \ mid0 \ leq t \ leq t \ leq n,\ \ \ \ \)n-t \} \)(即,可以从初始\([ - n,n] \ times [-n,n] \)板上计算的单元格。对于\((r,c,t)\ in r \),让\(\ ell(r,c,t)\ subseteq s \)是所有计算的集合,使得row \(r \)的单元格(r \)和列\(c \)在时间\(t \)中还活着。

((Minor footnote: I’ve done some small tricks here in order to deal with the fact that the Game of Life is normally played on an infinite board. We want to deal with the finite case, and we don’t want to worry about boundary conditions, so we’re only going to look at the cells that are uniquely determined by the initial board. This means that the board will shrink over time, but this won’t matter for our example.)

\(s \)是生命计算的所有游戏集合,但是由于生命游戏是确定性的,因此所有计算的集合都与所有初始条件的集合中均属于Berijective对应关系。\(| s | = 2^{(2n+1)^{2}} \),初始板的数量。

This also gives us a nice factorization on the set of all Game of Life computations. For each cell, there’s a partition that separates out the Game of Life computations in which that cell is alive at time 0 from the ones where it’s dead at time 0. Our factorization, then, will be a set of \((2n+1)^{2}\) binary factors, one for each question of “Was this cell alive or dead at time 0?”.

形式上:for \(((r,c,t)\ in r \),让\(l _ {(r,c,t)} = \ {\ ell(r,c,t),s \ setMinus \ ell(r,c,t)\} \)。令\(f =(s,b)\),其中\(b = \ {l _ {(r,c,0)} \ mid -n \ leq r,c \ leq n \} \)。

我们可以谈论的所有生命计算游戏中还将有其他分区。例如,您可以拿一个单元格和一个时间\(t \)说:“这个单元格在时间\(t \)?at time \(t\) from the computations where it’s dead at time \(t\).



上板上的蓝色,绿色和红色正方形是(时间)对。每个广场都对应于所有生命计算游戏集的分区:“该单元格是在给定时间\(t \)的生存还是死亡?”

这history of that partition is going to be all the cells in the initial board that go into computing whether the cell is alive or dead at time \(t\) . It’s everything involved in figuring out that cell’s state. E.g., knowing the state of the nine light-red cells in the initial board always tells you the state of the red cell in the second board.


Meanwhile, the question of whether the red cell is alive or dead is going to beorthogonal到这题of whether the green cell is alive or dead.

一种nd the question of whether the blue cell is alive or dead isnot与绿细胞是活着还是死亡的问题是正交的,因为它们在青色细胞上相交。

概括点,fix \(x = l _ {(r_x,c_x,t_x)},y = l _ {(r_y,c_y,t_y)} \),其中\(r_x,c_x,t_x),(r_y,c_y,c_y,c_y,c_y,c_y,t_y)\ in r \)。然后:

  • \(h^{F}(X)=\{L_{(r,c,0)}\in B\mid |r_X-r|\leq t_X,|c_X-c|\leq t_X\}\).
  • \(X \ ᐸ^{F} \ Y\) if and only if \(t_X \ ᐸ \ t_Y\) and \(|r_Y-r_X|,|c_Y-c_X|\leq t_Y-t_X\).
  • \(x \ perp^f y \)if and&hork if \(| r_y-r_x |> t_y+t_x \)或\(| c_y-c_x |> t_y+t_x \)。


David Spivak:一种priori, that would be a gigantic computation—to be able to tell me that you understand the factorization structure of that Game of Life. So what intuition are you using to be able to make that claim, that it has the kind of factorization structure you’re implying there?

斯科特:So, I’ve defined the factorization structure.

David Spivak:您已经给了我们一定的分解。因此,您对您有很好的直觉history,,,,I guess. Maybe that’s what I’m asking about.



David Spivak:好的。我认为从定义中得出这种直觉是我缺少的,但是我不知道我们是否有时间经历。

斯科特:Yeah, I think I’m not going to here.

((Part 2, Main Body) · · ·Conditional Orthogonality

因此,只是为了设定您的期望:每次我向某人解释珍珠因果的推论时,他们都会说d-separation is the thing they can’t remember.d- 分离是一个比“节点之间的定向路径”和“没有任何共同祖先的节点”的概念要复杂得多。同样,有条件的正交性将比我们的范式中的时间和正交性复杂得多。尽管我确实认为有条件的正交性具有比d-separation.

We’ll begin with the definition of conditional history. We again have a fixed finite set as our context. Let \(F=(S,B)\) be a finite factored set, let \(X,Y,Z\in\text{Part}(S)\), and let \(E\subseteq S\).

\(x \)给定\(e \)的条件历史,书面\(h^f(x | e)\)是满足以下两个条件的最小因子\(h \ subseteq b \)的最小因素集:

  • 对于所有\(s,t \ in e \),如果\(s \ sim_ {b} t \)对于所有\(b \ in H \),则\(s \ sim_x t \)。
  • 对于所有\(s,t \ in e \)和\(r \ in s \),如果\(r \ sim__ {b_0} s \)for all \(b_0 \ in H \)和\(r \ sim_{b_1} t \)for ash as ash \(b_1 \ in B \ setminus h \),然后\(r \ in e \)。

第一个条件是很像的条件had in our definition of history, except we’re going to make the assumption that we’re in \(E\). So the first condition is: if all you know about an object is that it’s in \(E\), and you want to know which part it’s in within \(X\), it suffices for me to tell you which part it’s in within each factor in the history \(H\).

我们的第二个条件实际上并不是要提及\(x \)。这将是\(e \)和\(h \)之间的关系。它说,如果您想弄清楚(s \)的元素是否在\(e \)中,则足以并行化并提出两个问题:

  • “If I only look at the values of the factors in \(H\), is ‘this point is in \(E\)’ compatible with that information?”
  • “If I only look at the values of the factors in \(B\setminus H\) , is ‘this point is in \(E\)’ compatible with that information?”

If both of these questions return “yes”, then the point has to be in \(E\).



We’re going to use conditional history to defineconditional orthogonality,,,,just like we used history to define orthogonality. We say that \(X\) and \(Y\) are给出正交\(e \ subseteq s \),书面\(x \ perp^{f} y \ mid e \),如果\(x \)给定\(e \)的历史与\的历史不相交(y(y)\)给定\(e \):\(h^f(x | e)\ cap h^f(y | e)= \ {\} \)。

We say \(X\) and \(Y\) are给出正交\(z \ in \ text {part}(s)\),书面\(x \ perp^{f} y \ mid z \),如果\(x \ perp^{f} y mid z \)全\(z \ in z \)。因此,考虑到分区的每种单独的方式,该分区可能是该分区中的每个单个部分,就意味着成为正交的是正交的。


((Part 2, Main Body) · · ·组成半径公理

有条件的正交满足组成半径公理,,,,which means finite factored sets are pretty well-behaved.

令\(f =(s,b)\)为有限的委托集,让\(x,y,z,w \ in \ text {part}(s)\)为\(s \)的分区。然后:

  • If \(X \perp^{F} Y \mid Z\), then \(Y \perp^{F} X \mid Z\). (symmetry
  • 如果\ \(x \ perp^{f}(y \ vee_s w)\ mid z \),则\(x \ perp^{f} y \ mid z \)和\(x \ perp^{f} w \ \ perp^{f}中部Z \)。((decomposition
  • If \(X \perp^{F} (Y\vee_S W) \mid Z\), then \(X \perp^{F} Y \mid (Z\vee_S W)\). (weak union
  • if \(x \ perp^{f} y \ mid z \)和\(x \ perp^{f} w \ mid(z \ vee_s y)\),然后\(x \ perp^{f}(y\ vee_s w)\中Z \)。((contraction
  • if \(x \ perp^{f} y \ mid z \)和if \(x \ perp^{f} w \ mid z \),然后\(x \ perp^{f}(y \ vee_s w)\中Z \)。((作品

这first four properties here make up the semigraphoid axioms, slightly modified because I’m working with partitions rather than sets of variables, so union is replaced with common refinement. There’s another graphoid axiom which we’re not going to satisfy; but I argue that we don’t want to satisfy it, because it doesn’t play well with determinism.


Decomposition and composition act like converses of each other. Together, conditioning on \(Z\) throughout, they say that \(X\) is orthogonal to both \(Y\) and \(W\) if and only if \(X\) is orthogonal to the common refinement of \(Y\) and \(W\).

((Part 2, Main Body) · · ·基本定理

除了表现良好之外,我还想证明有条件的正交性非常强大。我想做的方式是通过显示条件正交性与有限分布的所有概率分布中的条件独立性完全相对应。因此,很像d- 在珍珠图中分离,有条件的正交性可以视为概率独立性的组合版本。

一种probability distribution on a finite factored set\(F=(S,B)\) is a probability distribution \(P\) on \(S\) that can be thought of as coming from a bunch of independent probability distributions on each of the factors in \(B\). So \(P(s)=\prod_{b\in B}P([s]_b)\) for all \(s\in S\).


fundamental theorem of finite factored sets说:让\(f =(s,b)\)为有限的委托集,让\(x,y,z \ in \ text {part}(s)\)为\(s \)的分区。然后,\(x \ perp^{f} y \ mid z \),并且仅在\(f \)上所有概率分布\(p \),以及all \(x \ in x \),\(y(y)\ in y \)和\(z \ in z \),我们有\(p(x \ cap z)\ cdot p(y \ cap z)= p(x \ cap y \ cap z)\ cdot p(z)\)。即,如果在所有概率分布中满足\(z \)给定的\(z \),则与\(y \)之间是正交的。在所有概率分布中。

This theorem, for me, was a little nontrivial to prove. I had to go through defining certain polynomials associated with the subsets, and then dealing with unique factorization in the space of these polynomials; I think the proof was eight pages or something.

这fundamental theorem allows us to infer orthogonality data from probabilistic data. If I have some empirical distribution, or I have some Bayesian distribution, I can use that to infer some orthogonality data. (We could also imagine orthogonality data coming from other sources.) And then we can use this orthogonality data to get temporal data.

So next, we’re going to talk about how to get temporal data from orthogonality data.

((Part 2, Main Body) · · ·Temporal Inference

我们将从有限集\(\ omega \)开始,这是我们的样本空间。

您可能会认为我们会尝试做的一件天真的事情是推断\(\ omega \)的分解。我们不会这样做,因为那将太过限制。我们要允许\(\ omega \)向我们隐藏一些信息,以便有一些潜在的结构等。

这re may be some situations that are distinct without being distinct in \(\Omega\). So instead, we’re going to infer a factored set model of \(\Omega\): some other set \(S\), and a factorization of \(S\), and a function from \(S\) to \(\Omega\).

一种model of \(\Omega\) is a pair \((F, f)\), where \(F=(S,B)\) is a finite factored set and \(f:S\rightarrow \Omega\). ( \(f\) need not be injective or surjective.)

然后,如果我有一个\(\ omega \)的分区,则可以在\(f \)上向后发送此分区,并获得\(s \)的唯一分区。如果\(x \ in \ text {parts}(\ omega)\),则\(f^{ - 1}(x)(x)\ in \ text {parts}(s)\)由\(s \ sim_{f^{ - 1}(x)} t \ leftrightArrow f(s)\ sim_x f(t)\)。

那么,我们要做的就是采取有关\(\ omega \)的一堆正交性事实,我们将尝试找到一个捕获正交性事实的模型。

我们将给出一个orthogonality database在\(\ omega \)上,这是一对(d =(o,n)\),其中\(o \)(对于“正交”)和\(n \)(对于“非正交”)是\(\ omega \)的分区的每组\((x,y,z)\)。我们将这些视为关于正交性的规则。

对于模型\((f,f)\)满足数据库\(d \)的含义是:

  • \(f^{ - 1}(x)\ perp^{f} f^{ - 1}(y)\ mid f^{ - 1}(z)\)e时在o \)和
  • \(\) \(\lnot (f^{-1}(X) \perp^{F} f^{-1}(Y) \mid f^{-1}(Z))\) whenever \((X,Y,Z)\in N\).

So we have these orthogonality rules we want to satisfy, and we want to consider the space of all models that are consistent with these rules. And even though there will always be infinitely many models that are consistent with my database, if at least one is—you can always just add more information that you then delete with \(f\)—we would like to be able to sometimes infer that for all models that satisfy our database, \(f^{-1}(X)\) is before \(f^{-1}(Y)\).

一种nd this is what we’re going to mean by inferring time. If all of our models \((F,f)\) that are consistent with the database \(D\) satisfy some claim about time \(f^{-1}(X) \ ᐸ^F \ f^{-1}(Y)\), we’ll say that \(X \ ᐸ_D \ Y\).

((Part 2, Examples) · · ·Two Binary Variables (Pearl)

So we’ve set up this nice combinatorial notion of temporal inference. The obvious next questions are:

  • Can we actually infer interesting facts using this method, or is it vacuous?
  • 一种nd: How does this framework compare to Pearlian temporal inference?


To address that question, we’ll go to an example. Let \(X\) and \(Y\) be two binary variables. Pearl asks: “Are \(X\) and \(Y\) independent?” If yes, then there’s no path between the two. If no, then there may be a path from \(X\) to \(Y\), or from \(Y\) to \(X\), or from a third variable to both \(X\) and \(Y\).



However, I claim that this Pearlian ontology in which you’re handed this collection of variables has blinded us to the obvious next question, which is: is \(X\) independent of \(X \ \mathrm{XOR} \ Y\)?

In the Pearlian world, \(X\) and \(Y\) were our variables, and \(X \ \mathrm{XOR} \ Y\) is just some random operation on those variables. In our world, \(X \ \mathrm{XOR} \ Y\) instead is a variable on the same footing as \(X\) and \(Y\). The first thing I do with my variables \(X\) and \(Y\) is that I take the product \(X \times Y\) and then I forget the labels \(X\) and \(Y\).

So there’s this question, “Is \(X\) independent of \(X \ \mathrm{XOR} \ Y\)?”. And if \(X\)独立于\ \(x \ \ mathrm {xor} \ y \),实际上我们将能够得出结论,\(x \)是before\(Y\)!

So not only is the finite factored set paradigm non-vacuous, and not only is it going to be able to keep up with Pearl and infer things Pearl can’t, but it’s going to be able to infer a temporal relationship from only two variables.

So let’s go through the proof of that.

((Part 2, Examples) · · ·两个二进制变量(分组集)

Let \(\Omega=\{00,01,10,11\}\), and let \(X\), \(Y\), and \(Z\) be the partitions (/questions):

  • \(x = \ {\ {00,01 \},\ {10,11 \} \} \)。(第一个是什么?)
  • \(Y=\{\{00,10\}, \{01,11\}\}\). (What is the second bit?)
  • \(Z=\{\{00,11\}, \{01,10\}\}\). (Do the bits match?)

令\(d =(o,n)\),其中\(o = \ {(x,z,z,\ {\ omega \})\} \)和\(n = \ {(z,z,z,z,\ {\ omega \})\} \)。如果我们从概率分布中获得了这个正交性数据库,那么我们将拥有超过两个规则,因为我们会观察到更多的正交性和非正交性。但是,时间推断对于添加更多规则是单调的,因此我们可以处理证明所需的最小规则。

第一个规则说\(x \)与\(z \)正交。第二个规则说\(z \)不是本身正交的,这基本上只是说\(z \)是非确定性的。可以说,\(z \)中的两个部分都是在函数\(f \)下支持的。\(\ {\ omega \} \)表示我们没有任何条件。

From this, we’ll be able to prove that \(X \ ᐸ_D \ Y\).

证明。First, we’ll show that that \(X\) is weakly before \(Y\). Let \((F,f)\) satisfy \(D\). Let \(H_X\) be shorthand for \(h^F(f^{-1}(X))\), and likewise let \(H_Y=h^F(f^{-1}(Y))\) and \(H_Z=h^F(f^{-1}(Z))\).

由于\((x,z,\ {\ omega \})\ in o \),因此我们有那个\(h_x \ cap h_z = \ {\ {\} \);并且由于\((z,z,\ {\ omega \})\在n \),所以我们有那个\(h_z \ neq \ {\} \)。

Since \(X\leq_{\Omega} Y\vee_{\Omega} Z\)—that is, since \(X\) can be computed from \(Y\) together with \(Z\)—\(H_X\subseteq H_Y\cup H_Z\). (Because a partition’s history is the smallest set of factors needed to compute that partition.)

一种nd since \(H_X\cap H_Z=\{\}\), this implies \(H_X\subseteq H_Y\), so \(X\) is weakly before \(Y\).

为了显示严格的不平等,我们将假设出于矛盾的目的,即(h_x \)= \(h_y \)。

请注意,可以与\(y \)一起从\(x \)一起计算\(z \) - 也就是说,\(z \ leq _ {\ omega} x \ vee _ {\ omega} y \) - 因此(h_z \ subseteq H_x \ cup h_y \)(即\(H_z \ subseteq H_x \))。\(h_z =(H_x \ Cup H_Y)\ CAP H_z = H_X \ CAP H_z \)。但是由于\(h_z \)也与\(h_x \)不相交,因此这意味着\(h_z = \ {\} \),矛盾。

Thus \(H_X\neq H_Y\), so \(H_X \subset H_Y\), so \(f^{-1}(X) \ ᐸ^F \ f^{-1}(Y)\), so \(X \ ᐸ_D \ Y\). □

When I’m doing temporal inference using finite factored sets, I largely have proofs that look like this. We collect some facts about emptiness or non-emptiness of various Boolean combinations of histories of variables, and we use these to conclude more facts about histories of variables being subsets of each other.

I have a more complicated example that uses conditional orthogonality, not just orthogonality; I’m not going to go over it here.

我想在这里提出的一个有趣的观点是,我们正在做时间推断 - 我们正在推断\(x \)在\(y \)之前 - 但我声称我们也在做概念上的推断。

Imagine that I had a bit, and it’s either a 0 or a 1, and it’s either blue or green. And these two facts are primitive and independently generated. And I also have this other concept that’s like, “Is it grue or bleen?”, which is the \(\mathrm{XOR}\) of blue/green and 0/1.

从某种意义上说,我们要推断\(x \)在\(y \)之前,在这种情况下,我们可以推断出蓝度在gruense之前。这指出了一个事实,即蓝色更为原始,而Grueness是一种派生的属性。

在我们的证明中,可以将这些原始属性视为\(x \)和\(z \),而\(y \)是我们从中获得的派生属性。因此,我们不仅在推断时间;我们正在推断出关于什么是自然概念的事实。而且我认为,这个本体论可以为“您无法真正区分蓝色和Grue”的说法做一些希望,而Pearl可以对“相关性并不意味着因果关系”做些什么。

((Part 2, Main Body) · · ·申请 /未来工作 /猜测


Research topics related to inference:

  • Decidability of Temporal Inference
  • 有效的时间推断
  • Conceptual Inference
  • Temporal Inference from Raw Data and Fewer Ontological Assumptions
  • 与确定关系的时间推断
  • 没有正交的时间
  • Conditioned Factored Sets

这re are a lot of research directions suggested by questions like “How do we do efficient inference in this paradigm?”. Some of the questions here come from the fact that we’re making fewer assumptions than Pearl, and are in some sense more coming from the raw data.


  • Extending Definitions to the Infinite Case
  • 有限维的基本定理集合
  • 连续的时间
  • New Lens on Physics


我怀疑可以将基本定理扩展到有限维的分组集(即,\(| b | \)是有限的,但不能将其扩展到任意维定的方面集合。

一种nd then, what I’m really excited about is applications to embedded agency:

  • 嵌入式观测
  • Counterfactability
  • Cartesian FramesSuccessor
  • UnravelingCausal Loops
  • Conditional Time
  • Logical Causality from Logical Induction
  • Orthogonality as Simplifying Assumptions for Decisions
  • 有条件的正交性作为抽象desideratum

I focused on the temporal inference aspect of finite factored sets in this talk, because it’s concrete and tangible to be able to say, “Ah, we can do Pearlian temporal inference, only we can sometimes infer more structure and we rely on fewer assumptions.”



I want to build up the factored set ontology as an alternative to graphs when modeling agents interacting with things, or when modeling information flow. And I’m really excited about that direction.