# BLOG: Probabilistic Models with Unknown Objects

Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L. Ong and Andrey Kolobov
Computer Science Division, University of California, Berkeley, CA 94720-1776
{milch, bhaskara, russell, dsontag, dlong, karaya1}@cs.berkeley.edu

∗This work was supported by DARPA under award 03-000219, and by an NSF Graduate Research Fellowship to B. Milch.

**Abstract**

This paper introduces and illustrates BLOG, a formal language for defining probability models over worlds with unknown objects and identity uncertainty. BLOG unifies and extends several existing approaches. Subject to certain acyclicity constraints, every BLOG model specifies a unique probability distribution over first-order model structures that can contain varying and unbounded numbers of objects. Furthermore, complete inference algorithms exist for a large fragment of the language. We also introduce a probabilistic form of Skolemization for handling evidence.

## 1 Introduction

Human beings and AI systems must convert sensory input into some understanding of what's out there and what's going on in the world. That is, they must make inferences about the objects and events that underlie their observations. No pre-specified list of objects is given; the agent must infer the existence of objects that were not known initially to exist.

In many AI systems, this problem of unknown objects is engineered away or resolved in a preprocessing step. However, there are important applications where the problem is unavoidable. Population estimation, for example, involves counting a population by sampling from it randomly and measuring how often the same object is resampled; this would be pointless if the set of objects were known in advance. Record linkage, a task undertaken by an industry of more than 300 companies, involves matching entries across multiple databases. These companies exist because of uncertainty about the mapping from observations to underlying objects. Finally, multi-target tracking systems perform data association, connecting, say, radar blips to hypothesized aircraft.

Probability models for such tasks are not new: Bayesian models for data association have been used since the 1960s [Sittler, 1964]. The models are written in English and mathematical notation and converted by hand into special-purpose code. In recent years, formal representation languages such as graphical models [Pearl, 1988] have led to general inference algorithms, more sophisticated models, and automated model selection (structure learning). In Sec. 7, we review several first-order probabilistic languages (FOPLs) that explicitly represent objects and the relations between them. However, most FOPLs only deal with fixed sets of objects, or deal with unknown objects in limited and ad hoc ways. This paper introduces BLOG (Bayesian LOGic), a compact and intuitive language for defining probability distributions over outcomes with varying sets of objects.

We begin in Sec. 2 with three example problems, each of which involves possible worlds with varying object sets and identity uncertainty. We describe generative processes that produce such worlds, and give the corresponding BLOG models. Sec. 3 observes that these possible worlds are naturally viewed as model structures of first-order logic. It then defines precisely the set of possible worlds corresponding to a BLOG model. The key idea is a generative process that constructs a world by adding objects whose existence and properties depend on those of objects already created. In such a process, the existence of objects may be governed by many random variables, not just a single population size variable. Sec. 4 discusses how a BLOG model specifies a probability distribution over possible worlds. Sec. 5 solves a previously unnoticed "probabilistic Skolemization" problem: how to specify evidence about objects—such as radar blips—that one didn't know existed. Finally, Sec. 6 briefly discusses inference in unbounded outcome spaces, stating a sampling algorithm and a completeness theorem for a large class of BLOG models, and giving experimental results on one particular model.

## 2 Examples

In this section we examine three typical scenarios with unknown objects—simplified versions of the population estimation, record linkage, and multitarget tracking problems mentioned above. In each case, we provide a short BLOG model that, when combined with a suitable inference engine, constitutes a working solution for the problem in question.

**Example 1.** An urn contains an unknown number of balls—say, a number chosen from a Poisson distribution. Balls are equally likely to be blue or green. We draw some balls from the urn, observing the color of each and replacing it. We cannot tell two identically colored balls apart; furthermore, observed colors are wrong with probability 0.2. How many balls are in the urn? Was the same ball drawn twice?

The BLOG model for this problem, shown in Fig. 1, describes a stochastic process for generating worlds.
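The generative process of Ex. 1 can be sketched in plain Python (a hand-rolled sketch, not BLOG or its engine; the Poisson mean of 6 and the 0.2 observation-noise rate come from the example, while all function and variable names here are our own):

```python
import math
import random

def sample_urn_world(num_draws=4, mean_balls=6, noise=0.2, rng=random):
    """Sample one possible world for the urn-and-balls example.

    Mirrors the process in the text: the number of balls is
    Poisson(6), each ball is blue or green with equal probability,
    each draw picks a ball uniformly, and the observed color is
    flipped with probability 0.2."""
    def poisson(lam):
        # Poisson sampling by inversion, to keep the sketch
        # dependency-free.
        u, p, k = rng.random(), math.exp(-lam), 0
        cum = p
        while u > cum:
            k += 1
            p *= lam / k
            cum += p
        return k

    n_balls = poisson(mean_balls)
    true_color = {b: rng.choice(["Blue", "Green"]) for b in range(n_balls)}
    draws = []
    for d in range(num_draws):
        if n_balls == 0:
            draws.append((None, None))   # BallDrawn(d) = null
            continue
        b = rng.randrange(n_balls)       # BallDrawn(d) ~ Uniform({Ball b})
        obs = true_color[b]
        if rng.random() < noise:         # observation error
            obs = "Green" if obs == "Blue" else "Blue"
        draws.append((b, obs))
    return n_balls, true_color, draws
```

Running this repeatedly produces worlds with varying numbers of balls, which is exactly the situation the BLOG model in Fig. 1 describes declaratively.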

```
1  type Color; type Ball; type Draw;
2  random Color TrueColor(Ball);
3  random Ball BallDrawn(Draw);
4  random Color ObsColor(Draw);
5  guaranteed Color Blue, Green;
6  guaranteed Draw Draw1, Draw2, Draw3, Draw4;
7  #Ball ∼ Poisson[6]();
8  TrueColor(b) ∼ TabularCPD[[0.5, 0.5]]();
9  BallDrawn(d) ∼ Uniform({Ball b});
10 ObsColor(d)
11   if (BallDrawn(d) != null) then
12     ∼ TabularCPD[[0.8, 0.2], [0.2, 0.8]]
13       (TrueColor(BallDrawn(d)));
```

Figure 1: BLOG model for the urn-and-balls scenario of Ex. 1 with four draws.

The first 4 lines introduce the types of objects in these worlds—colors, balls, and draws—and the functions that can be applied to these objects. Lines 5–7 specify what objects may exist in each world. In every world, the colors are blue and green and there are four draws; these are the guaranteed objects. On the other hand, different worlds have different numbers of balls, so the number of balls that exist is chosen from a prior—a Poisson with mean 6. Each ball is then given a color, as specified on line 8. Properties of the four draws are filled in by choosing a ball (line 9) and an observed color for that ball (lines 10–13). The probability of the generated world is the product of the probabilities of all the choices made.

**Example 2.** We have a collection of citations that refer to publications in a certain field. What publications and researchers exist, with what titles and names? Who wrote which publication, and which publication does each citation refer to? For simplicity, we just consider the title and author-name strings in these citations, which are subject to errors of various kinds, and we assume only single-author publications.

Fig. 2 shows a BLOG model for this example, based on the model in [Pasula et al., 2003]. The BLOG model defines the following generative process. First, sample the total number of researchers from some distribution; then, for each researcher r, sample the number of publications by that researcher. Sample the publications' titles and researchers' names from appropriate prior distributions. Then, for each citation, sample the publication cited by choosing uniformly at random from the set of publications. Sample the title and author strings used in each citation from string corruption models conditioned on the true attributes of the cited publication; finally, generate the full citation string according to a formatting model.

```
1 guaranteed Citation Cite1, Cite2, Cite3, Cite4;
2 #Researcher ∼ NumResearchersDistrib();
3 #Publication: (Author) -> (r) ∼ NumPubsDistrib();
4 Title(p) ∼ TitlePrior();
5 Name(r) ∼ NamePrior();
6 PubCited(c) ∼ Uniform({Publication p});
7 TitleString(c) ∼ TitleObs(Title(PubCited(c)));
8 AuthorString(c) ∼ AuthorObs(Name(Author(PubCited(c))));
9 CitString(c) ∼ CitDistrib(TitleString(c), AuthorString(c));
```

Figure 2: BLOG model for Ex. 2 with four observed citations (type and function declarations are omitted).

**Example 3.** An unknown number of aircraft exist in some volume of airspace. An aircraft's state (position and velocity) at each time step depends on its state at the previous time step. We observe the area with radar: aircraft may appear as identical blips on a radar screen. Each blip gives the approximate position of the aircraft that generated it. However, some blips may be false detections, and some aircraft may not be detected. What aircraft exist, and what are their trajectories? Are there any aircraft that are not observed?

The BLOG model for this scenario (Fig. 3) describes the following process: first, sample the number of aircraft in the area. Then, for each time step t (starting at t = 0), choose the state (position and velocity) of each aircraft given its state at time t − 1. Also, for each aircraft a and time step t, possibly generate a radar blip r with Source(r) = a and Time(r) = t. Whether a blip is generated or not depends on the state of the aircraft—thus the number of objects in the world depends on certain objects' attributes. Also, at each step, generate some false alarm blips r′ with Source(r′) = null. Finally, sample the position for each blip given the true state of its source aircraft (or using a default distribution for a false-alarm blip).

```
1  type Aircraft; type Blip;
2  random R6Vector State(Aircraft, NaturalNum);
3  random R3Vector ApparentPos(Blip);
4  nonrandom NaturalNum Pred(NaturalNum) = Predecessor;
5  generating Aircraft Source(Blip);
6  generating NaturalNum Time(Blip);
7  #Aircraft ∼ NumAircraftDistrib();
8  State(a, t)
9    if t = 0 then ∼ InitState()
10   else ∼ StateTransition(State(a, Pred(t)));
11 #Blip: (Source, Time) -> (a, t)
12   ∼ DetectionDistrib(State(a, t));
13 #Blip: (Time) -> (t)
14   ∼ NumFalseAlarmsDistrib();
15 ApparentPos(r)
16   if (Source(r) = null) then ∼ FalseAlarmDistrib()
17   else ∼ ObsDistrib(State(Source(r), Time(r)));
```

Figure 3: BLOG model for Ex. 3.

## 3 Syntax and Semantics: Possible Worlds

### 3.1 Outcomes as first-order model structures

The possible outcomes for Ex. 1 through Ex. 3 are structures containing many related objects, which we will treat formally as model structures of first-order logic. A model structure provides interpretations for the symbols of a first-order language. (Usually, first-order languages are described as having predicate, function, and constant symbols. For conciseness, we view all symbols as function symbols; predicates are just functions that return a Boolean value, and constants are just zero-ary functions.)

For Ex. 1, the language has function symbols such as TrueColor(b) for the true color of ball b; BallDrawn(d) for the ball drawn on draw d; and Draw1 for the first draw. (Statisticians might use indexed families of random variables such as {TrueColor_b}, but this is mainly a matter of taste.)
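The process that Fig. 3 specifies can be sketched procedurally in Python for a bounded number of time steps. The distributions below (uniform aircraft count, Gaussian random-walk states, fixed detection probability) are our own placeholders; BLOG leaves NumAircraftDistrib, StateTransition, and the rest to the modeler, and we use 1-D positions rather than R6 vectors to keep the sketch short:

```python
import random

def sample_tracking_world(num_steps=3, rng=random):
    """Sample one world for the simplified multi-target
    tracking example: aircraft, their states, and radar blips
    (both true detections and false alarms)."""
    n_aircraft = rng.randint(1, 3)                # NumAircraftDistrib()
    state = {}
    blips = []
    for a in range(n_aircraft):
        for t in range(num_steps):
            if t == 0:
                state[a, t] = rng.gauss(0.0, 10.0)   # InitState()
            else:                                     # StateTransition
                state[a, t] = state[a, t - 1] + rng.gauss(0.0, 1.0)
            if rng.random() < 0.9:    # DetectionDistrib(State(a, t))
                # Blip generated by aircraft a at time t: Source(r) = a.
                blips.append({"source": a, "time": t,
                              "apparent_pos": state[a, t] + rng.gauss(0.0, 0.5)})
    for t in range(num_steps):        # false alarms: Source(r) = null
        for _ in range(rng.randint(0, 2)):       # NumFalseAlarmsDistrib()
            blips.append({"source": None, "time": t,
                          "apparent_pos": rng.gauss(0.0, 20.0)})
    return n_aircraft, state, blips
```

Note how the number of blip objects depends on sampled attributes of the aircraft, which is exactly the "number of objects depends on certain objects' attributes" phenomenon the text highlights.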

To eliminate meaningless random variables, we use a typed, free logical language, in which every object has a type and in which functions may be partial. Each function symbol f has a type signature (τ0, …, τk), where τ0 is the return type of f and τ1, …, τk are the argument types. A partial function applied to arguments outside its domain returns the special value null, which is not of any type.

The truth of any first-order sentence is determined by a model structure for the corresponding language. A model structure specifies the extension of each type and the interpretation for each function symbol:

**Definition 1.** A model structure ω of a typed, free, first-order language consists of an extension [τ]^ω for each type τ, which may be an arbitrary set, and an interpretation [f]^ω for each function symbol f. If f has return type τ0 and argument types τ1, …, τk, then [f]^ω is a function from [τ1]^ω × · · · × [τk]^ω to [τ0]^ω ∪ {null}.

A BLOG model defines a probability distribution over a particular set of model structures. In Ex. 1, identity uncertainty arises because [BallDrawn]^ω(Draw1) might be equal to [BallDrawn]^ω(Draw2) in one structure but not another. The set of balls, [Ball]^ω, can also vary between structures.

### 3.2 Outcomes with fixed object sets

BLOG models for fixed object sets have five kinds of statements. A type declaration, such as the two statements on line 1 of Fig. 3, introduces a type. Certain types, namely Boolean, NaturalNum, Integer, String, Real, and RkVector (for each k ≥ 2), are already provided. A random function declaration, such as line 2 of Fig. 3, specifies the type signature for a function symbol whose values will be chosen randomly in the generative process. A nonrandom function definition, such as the one on line 4 of Fig. 3, introduces a function whose interpretation is fixed in all possible worlds. In our implementation, the interpretation is given by a Java class (Predecessor in this example). A guaranteed object statement, such as line 5 in Fig. 1, introduces and names some distinct objects that exist in all possible worlds. For the built-in types, the obvious sets of guaranteed objects and constant symbols are predefined. The set of guaranteed objects of type τ in BLOG model M is denoted G_M(τ). Finally, for each random function symbol, a BLOG model includes a dependency statement specifying how values are chosen for that function. We postpone further discussion of dependency statements to Sec. 4.

The first four kinds of statements listed above define a particular typed first-order language L_M for a model M. The set of possible worlds of M, denoted Ω_M, consists of those model structures of L_M where the extension of each type τ is G_M(τ), and all nonrandom function symbols (including guaranteed constants) have their given interpretations.

For each random function f and tuple of appropriately typed guaranteed objects o1, …, ok, we can define a random variable (RV) V_f[o1, …, ok](ω) ≜ [f]^ω(o1, …, ok). The possible worlds are in one-to-one correspondence with full instantiations of these basic RVs. Thus, a joint distribution for the basic RVs defines a distribution over possible worlds.

### 3.3 Unknown objects

In general, a BLOG model defines a generative process in which objects are added iteratively to a world. To describe such processes, we first introduce generating function declarations, such as lines 5–6 of Fig. 3. Unlike other functions, generating functions such as Source or Time have their values set when an object is added. A generating function must take a single argument of some type τ (namely Blip in the example); it is then called a τ-generating function.

Generative steps that add objects to the world are described by number statements. For instance, line 11 of Fig. 3 says that for each aircraft a and time step t, the process adds some number of blips r such that Source(r) = a and Time(r) = t. In general, the beginning of a number statement has the form: #τ : (g1, …, gk) -> (x1, …, xk), where τ is a type, g1, …, gk are τ-generating functions, and x1, …, xk are logical variables. (For types that are generated ab initio with no generating functions, the () -> () is omitted, as in Fig. 1.) The inclusion of a number statement means that for each appropriately typed tuple of objects o1, …, ok (the appropriate types are the return types of g1, …, gk), the generative process adds some number (possibly zero) of objects q of type τ such that [gi]^ω(q) = oi for i = 1, …, k.

Object generation can even be recursive: for instance, in a model for phylogenetic trees, we could have a generating function Parent that takes a species and returns a species; then we could model speciation events with a number statement that begins: #Species: (Parent) -> (p).

We can also view number statements more declaratively:

**Definition 2.** Let ω be a model structure of L_M, and consider a number statement for type τ with generating functions g1, …, gk. An object q ∈ [τ]^ω satisfies this number statement applied to o1, …, ok in ω if [gi]^ω(q) = oi for i = 1, …, k, and [g]^ω(q) = null for all other τ-generating functions g.

Note that if a number statement for type τ omits one of the τ-generating functions, then this function takes on the value null for all objects satisfying that number statement. For instance, Source is null for objects satisfying the false-detection number statement on line 13 of Fig. 3. Also, a BLOG model cannot contain two number statements with the same set of generating functions. This ensures that each object o has exactly one generation history, which can be found by tracing back the generating functions on o.

The set of possible worlds Ω_M is the set of model structures that can be constructed by M's generative process. To complete the picture, we must explain not only how many objects are added on each step, but also what these objects are. It turns out to be convenient to define the generated objects as follows: when a number statement with type τ and generating functions g1, …, gk is applied to generating objects o1, …, ok, the generated objects are tuples {(τ, (g1, o1), …, (gk, ok), n) : n = 1, …, N}, where N is the number of objects generated. Thus in Ex. 3, the aircraft are pairs (Aircraft, 1), (Aircraft, 2), etc., and the blips generated by aircraft are nested tuples such as (Blip, (Source, (Aircraft, 2)), (Time, 8), 1). The tuple encodes the object's generation history; of course, it is purely internal to the semantics and remains invisible to the user. The set of all objects (nested tuples) of type τ that can be generated from the guaranteed objects via finitely many recursive applications of number statements is called the universe of τ, and denoted U_M(τ). In each possible world, the extension of τ is some subset of U_M(τ).

**Definition 3.** For a BLOG model M, the set of possible worlds Ω_M is the set of model structures ω of L_M such that:

1. for each type τ, G_M(τ) ⊆ [τ]^ω ⊆ U_M(τ);
2. nonrandom functions have the specified interpretations;
3. for each number statement in M with type τ and generating functions g1, …, gk, and each appropriately typed tuple of generating objects (o1, …, ok) in ω, the set of objects in [τ]^ω that satisfy this number statement applied to these generating objects is {(τ, (g1, o1), …, (gk, ok), n) : n = 1, …, N} for some natural number N;
4. for every type τ, each element of [τ]^ω satisfies some number statement applied to some objects in ω.

Note that by part 3 of this definition, the number of objects generated by any given application of a number statement in world ω is a finite number N. However, a world can still contain infinitely many non-guaranteed objects if some number statements are applied recursively, or are triggered for every natural number (like the ones generating radar blips in Ex. 3).

With a fixed set of objects, it was easy to define a set of basic RVs such that a full instantiation of the basic RVs uniquely identified a possible world. To achieve the same effect with unknown objects, we need two kinds of basic RVs:

**Definition 4.** For a BLOG model M, the set V_M of basic random variables consists of:

- for each random function f with type signature (τ0, …, τk) and each tuple of objects (o1, …, ok) ∈ U_M(τ1) × · · · × U_M(τk), a function application RV V_f[o1, …, ok](ω) that is equal to [f]^ω(o1, …, ok) if o1, …, ok all exist in ω, and null (or false for Boolean RVs) otherwise;
- for each number statement with type τ and generating functions g1, …, gk that have return types τ1, …, τk, and each tuple of objects (o1, …, ok) ∈ U_M(τ1) × · · · × U_M(τk), a number RV N_(τ,g1,…,gk)[o1, …, ok](ω) equal to the number of objects that satisfy this number statement applied to o1, …, ok in ω.

Intuitively, each step in the generative world-construction process determines the value of a basic variable. The crucial result about basic RVs is the following:

**Proposition 1.** For any BLOG model M and any complete instantiation of V_M, there is at most one model structure in Ω_M consistent with this instantiation.

Equating objects with tuples might seem unnecessarily complicated, but it becomes very helpful when we define a Bayes net over the basic RVs (which we do in Sec. 4.2). For instance, the sole parent of V_ApparentPos[(Blip, (Source, (Aircraft, 2)), (Time, 8), 1)] is V_State[(Aircraft, 2), 8]. It might seem more elegant to assign numbers to objects as they are generated, so that the extension of each type in each possible world would be simply a prefix of the natural numbers. Specifically, we could number the aircraft arbitrarily, and then number the radar blips lexicographically by aircraft and time step. Then we would have basic RVs such as V_ApparentPos[23], representing the apparent aircraft position for blip 23. But blip 23 could be generated by any aircraft at any time step. In fact, the parents of V_ApparentPos[23] would have to include all the #Blip and State variables in the model. So, defining objects as tuples yields a much simpler Bayes net.

## 4 Syntax and Semantics: Probabilities

### 4.1 Dependency statements

Dependency and number statements specify exactly how the steps are carried out in our generative process. Consider the dependency statement for State(a, t) beginning on line 8 of Fig. 3. This statement is applied for every basic RV of the form V_State[a, t] where a ∈ U_M(Aircraft) and t ∈ N. If t = 0, the conditional distribution for State(a, t) is given by the elementary CPD InitState; otherwise it is given by the elementary CPD StateTransition, which takes State(a, Pred(t)) as an argument. These elementary CPDs define distributions over objects of type R6Vector (the return type of State). In our implementation, elementary CPDs are Java classes with a method getProb that returns the probability of a particular value given a list of CPD arguments.

A dependency statement begins with a function symbol f and a tuple of logical variables x1, …, xk representing the arguments to this function. In a number statement, the variables x1, …, xk represent the generating objects. In either case, the rest of the statement consists of a sequence of clauses. When the statement is not abbreviated, the syntax for the first clause is: if cond then ∼ elem-cpd(arg1, …, argN). The cond portion is a formula of L_M (in which only x1, …, xk can occur as free variables) specifying the condition under which this clause should be used to sample a value for a basic RV. More precisely, if the possible world constructed so far is ω, then the applicable clause is the first one whose condition is satisfied in ω (assuming for the moment that ω is complete enough to determine the truth values of the conditions). If no clause's condition is satisfied, or if the basic RV refers to objects that do not exist in ω, then the value is set by default to false for Boolean functions, null for other functions, and zero for number variables. If the condition in a clause is just "true", then the whole string "if true then" may be omitted.

In the applicable clause, each CPD argument is evaluated in ω. The resulting values are then passed to the elementary CPD. In the simplest case, the arguments are terms or formulas of L_M, such as State(a, Pred(t)). An argument can also be a set expression of the form {τ y : ϕ}, where τ is a type, y is a logical variable, and ϕ is a formula. The value of such an expression is the set of objects o ∈ [τ]^ω such that ω satisfies ϕ with y bound to o. If the formula ϕ is just true, it can be omitted: this is the case on line 9 of Fig. 1.

We require that the elementary CPDs obey two rules related to non-guaranteed objects. First, a CPD should never assign positive probability to objects that do not exist in the partially completed world ω. Thus, we allow an elementary CPD to assign positive probability to a non-guaranteed object only if the object was passed in as a CPD argument. Second, an elementary CPD cannot "peek" at the tuple representations of objects that are passed in: it must be invariant to permutations of the non-guaranteed objects.

### 4.2 Declarative semantics

So far we have explained BLOG semantics procedurally, in terms of a generative process. To facilitate both knowledge engineering and the development of learning algorithms, we would like to have declarative semantics. The standard approach—which is used in most existing FOPLs—is to say

that a BLOG model defines a certain Bayesian network (BN) over the basic RVs. In this section we discuss how that approach needs to be modified for BLOG.

We will write σ to denote an instantiation of a set of RVs vars(σ), and σ_X to denote the value that σ assigns to X. If a BN is finite, then the probability it assigns to each complete instantiation σ is P(σ) = ∏_{X∈vars(σ)} p_X(σ_X | σ_Pa(X)), where p_X is the CPD for X and σ_Pa(X) is σ restricted to the parents of X. In an infinite BN, we can write a similar expression for each finite instantiation σ that is closed under the parent relation (that is, X ∈ vars(σ) implies Pa(X) ⊆ vars(σ)). If the BN is acyclic and each variable has finitely many ancestors, then these probability assignments define a unique distribution [Kersting and De Raedt, 2001].

The difficulty is that in the BN corresponding to a BLOG model, variables often have infinite parent sets. For instance, the BN for Ex. 1 has an infinite number of basic RVs of the form V_TrueColor[b]: if it had only a finite number N of these RVs, it could not represent outcomes with more than N balls. Furthermore, each of these V_TrueColor[b] RVs is a parent of each V_ObsColor[d] RV, since if BallDrawn(d) happens to be b, then the observed color on draw d depends directly on the color of ball b. So the V_ObsColor[d] nodes have infinitely many parents. In such a model, assigning probabilities to finite instantiations that are closed under the parent relation does not define a unique distribution: in particular, it tells us nothing about the V_ObsColor[d] variables.

We required instantiations to be closed under the parent relation so that the factors p_X(σ_X | σ_Pa(X)) would be well-defined. But we may not need the values of all of X's parents in order to determine the conditional distribution for X. For instance, knowing V_BallDrawn[d] = (Ball, 13) and V_TrueColor[(Ball, 13)] = Blue is sufficient to determine the distribution for V_ObsColor[d]: the colors of all the other balls are irrelevant in this context. We can read off this context-specific independence from the dependency statement for ObsColor in Fig. 1 by noting that the instantiation (V_BallDrawn[d] = (Ball, 13), V_TrueColor[(Ball, 13)] = Blue) determines the value of the sole CPD argument TrueColor(BallDrawn(d)). We say this instantiation supports the variable V_ObsColor[d] (see [Milch et al., 2005]).

**Definition 5.** An instantiation σ supports a basic RV V of the form V_f[o1, …, ok] or N_p[o1, …, ok] if all possible worlds consistent with σ agree on (1) whether all the objects o1, …, ok exist, and, if so, on (2) the applicable clause in the dependency or number statement for V and the values for the CPD arguments in that clause.

Note that some RVs, such as N_Ball[] in Ex. 1, are supported by the empty instantiation. We can now generalize the notion of being closed under the parent relation.

**Definition 6.** A finite instantiation σ is self-supporting if its instantiated variables can be numbered X1, …, XN such that for each n ≤ N, the restriction of σ to {X1, …, Xn−1} supports Xn.

This definition lets us give semantics to BLOG models in a way that is meaningful even when the corresponding BNs contain infinite parent sets. We will write p_V(v | σ) for the probability that V's dependency or number statement assigns to the value v, given an instantiation σ that supports V.

**Definition 7.** A distribution P over Ω_M satisfies a BLOG model M if for every finite, self-supporting instantiation σ with vars(σ) ⊆ V_M:

P(Ω_σ) = ∏_{n=1}^{N} p_{X_n}(σ_{X_n} | σ_{{X_1,…,X_{n−1}}})

where Ω_σ is the set of possible worlds consistent with σ and X1, …, XN is a numbering of σ as in Def. 6.

A BLOG model is well-defined if there is exactly one probability distribution that satisfies it. Recall that a BN is well-defined if it is acyclic and each variable has a finite set of ancestors. Another way of saying this is that each variable can be "reached" by enumerating its ancestors in a finite, topologically ordered list. The well-definedness criterion for BLOG is similar, but deals with finite, self-supporting instantiations rather than finite, topologically ordered lists of variables.

**Theorem 1.** Let M be a BLOG model. Suppose that V_M is at most countably infinite,¹ and for each V ∈ V_M and ω ∈ Ω_M, there is a self-supporting instantiation that agrees with ω and includes V. Then M is well-defined.

The theorem follows from a result in [Milch et al., 2005] that deals with distributions over full instantiations of a set of RVs. Prop. 1 makes the connection between full instantiations of V_M and possible worlds.

To check that the criterion of Thm. 1 holds for a particular example, we need to consider each basic RV. In Ex. 1, the number RV for balls is supported by the empty instantiation, so in every world it is part of a self-supporting instantiation of size one. Each TrueColor(b) RV depends only on whether its argument exists, so these variables participate in self-supporting instantiations of size two. Similarly, each BallDrawn variable depends only on what balls exist. To sample an ObsColor(d) variable, we need to know BallDrawn(d) and TrueColor(BallDrawn(d)), so these variables are in self-supporting instantiations of size three. Similar arguments can be made for Examples 2 and 3. Of course, we would like to have an algorithm for checking whether a BLOG model is well-defined; the criteria given in Thm. 2 in Sec. 6.2 are a first step in this direction.

## 5 Evidence and Queries

Because a well-defined BLOG model M defines a distribution over model structures, we can use arbitrary sentences of L_M as evidence and queries. But sometimes such sentences are not enough. In Ex. 3, the user observes radar blips, which are not referred to by any terms in the language. The user could assert evidence about the blips using existential quantifiers, but then how could he make a query of the form, "Did this blip come from the same aircraft as that blip?"

A natural solution is to allow the user to extend the language when evidence arrives, adding constant symbols to refer to observed objects. In many cases, the user observes some new objects, introduces some new symbols, and assigns the symbols to the objects in an uninformative order. To handle such cases, BLOG includes a special macro. For instance, given 4 radar blips at time 8, one can assert:

{Blip r: Time(r) = 8} = {Blip1, Blip2, Blip3, Blip4};

¹This is satisfied if the Real and RkVector types are not arguments to random functions or return types of generating functions.

This introduces the new constants Blip1, …, Blip4 and asserts that there are exactly 4 radar blips at time 8.

Formally, the macro augments the model with dependency statements for the new symbols. The statements implement sampling without replacement; for our example, we have:

Blip1 ∼ Uniform({Blip r : Time(r) = 8});
Blip2 ∼ Uniform({Blip r : (Time(r) = 8) & (Blip1 != r)});

and so on. Once the model has been extended this way, the user can make assertions about the apparent positions of Blip1, Blip2, etc., and then use these symbols in queries.

These new constants resemble Skolem constants, but conditioning on assertions about the new constants is not the same as conditioning on an existential sentence. For example, suppose you go into a new wine shop, pick up a bottle at random, and observe that it costs $40. This scenario is correctly modeled by introducing a new constant Bottle1 with a Uniform CPD. Then observing that Bottle1 costs over $40 suggests that this is a fancy wine shop. On the other hand, the mere existence of a $40+ bottle does not suggest this, because almost every shop has some bottle at over $40.

## 6 Inference

Because the set of basic RVs of a BLOG model can be infinite, it is not obvious that inference for well-defined BLOG models is even decidable. However, the generative process intuition suggests a rejection sampling algorithm. We present this algorithm not because it is particularly efficient, but because it demonstrates the decidability of inference for a large class of BLOG models (see Thm. 2 below) and illustrates several issues that any BLOG inference algorithm must deal with. At the end of this section, we present experimental results from a somewhat more efficient likelihood weighting algorithm.

### 6.1 Rejection sampling

Suppose we are given a partial instantiation e as evidence, and a query variable Q. Our rejection sampling algorithm starts by imposing an arbitrary numbering on the basic RVs. To generate each sample, it starts with an empty instantiation σ. Then it repeats the following process: scan the basic RVs in the imposed order until we reach the first RV V that is supported by σ but not already instantiated in σ; sample a value v for V according to V's dependency statement; and augment σ with the assignment V = v. It continues until all the query and evidence variables have been sampled. If the sample is consistent with the evidence e, then it increments a counter N_q, where q is the sampled value of Q. Otherwise, it rejects this sample. After N accepted samples, the estimate of P(Q = q | e) is N_q/N.

This algorithm requires a subroutine that determines whether σ supports a variable V and, if so, samples a value for it. The subroutine steps through the clauses of V's dependency statement until it reaches a clause whose condition is either undetermined or determined to be true given σ (if all the conditions are determined to be false, then it returns the default value for V). If the condition is undetermined, then σ does not support V. If it is determined to be true, then the subroutine evaluates each of the CPD arguments in this clause. If σ determines the values of all the arguments, then the subroutine samples a value for V by passing those values to the sample method of this clause's elementary CPD. Otherwise, σ does not support V.

To evaluate terms and quantifier-free formulas, we use a straightforward recursive algorithm. The base case looks up the value of a particular function application RV in σ and returns "undetermined" if it is not instantiated. A formula may be determined even if some of its subformulas are not determined: for example, α ∧ β is determined to be false if α is false. It is more complicated to evaluate set expressions such as {Blip r: Time(r) = 8}, which can be used as CPD arguments. A naive algorithm for evaluating this expression would first enumerate all the objects of type Blip (which requires certain number variables to be instantiated), then select the blips r that satisfy Time(r) = 8. But Fig. 3 specifies that there may exist some blips for each aircraft a and each natural number t: since there are infinitely many natural numbers, some worlds contain infinitely many blips. Fortunately, the number of blips r with Time(r) = 8 is necessarily finite: in every world there are a finite number of aircraft, and each one generates a finite number of blips at time 8. We have an algorithm that scans the formula within a set expression for generating function restrictions such as Time(r) = 8, and uses them to avoid enumerating infinite sets when possible. A similar method is used for evaluating quantified formulas.

### 6.2 Termination criteria

In order to generate each sample, the algorithm above repeatedly instantiates the first variable that is supported but not yet instantiated, until it instantiates all the query and evidence variables. When can we be sure that this will take a finite amount of time? The first way this process could fail to terminate is if it goes into an infinite loop while checking whether a particular variable is supported. This happens if the program ends up enumerating an infinite set while evaluating a set expression or quantified formula. We can avoid this by ensuring that all such expressions in the BLOG model are finite once generating function restrictions are taken into account.

The sample generator also fails to terminate if it never constructs an instantiation that supports a particular query or evidence variable. To see how this can happen, consider calling the subroutine described above to sample a variable V. If V is not supported, the subroutine will realize this when it encounters a variable U that is relevant but not instantiated.
whether a partial instantiation σ supports a basic RV V , and Now consider a graph over basic variables where we draw an if so, returns a sample from V ’s conditional distribution. For edge from U to V when the evaluation process for V hits U a basic RV V of the form Vf [o1 , . . . , ok ] or Np [o1 , . . . , ok ], in this way. If a variable is never supported, then it must be the subroutine begins by checking the values of the relevant part of a cycle in this graph, or part of a receding chain of number variables in σ to determine whether all of o1 , . . . , ok variables V1 ← V2 ← · · · that is extended infinitely. exist. If some of these number variables are not instantiated, The graph constructed in this way varies from sample to then σ does not support V . If some of o1 , . . . , ok do not ex- sample: for instance, sometimes the evaluation process for ist, the subroutine returns the default value for V . If they VObsColor [d] will hit VTrueColor [(Ball, 7)], and sometimes it do all exist, the subroutine follows the semantics for depen- will hit VTrueColor [(Ball, 13)]. However, we can rule out cy- dency statements discussed in Sec. 4.1. First, it iterates over cles and infinite receding chains in all these graphs by consid- the clauses in the dependency (or number) statement until it ering a more abstract graph over function symbols and types.
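The rejection sampling loop of Sec. 6.1 can be sketched in a few lines of Python. The encoding below is a hypothetical simplification, not the paper's implementation: each variable maps to a (parents, cpd) pair, a variable counts as supported once its parents are instantiated, and the toy two-variable model stands in for a real BLOG model with number variables and lazily-evaluated support.

```python
import random

# Sketch of the Sec. 6.1 rejection sampler. A "model" maps each variable
# to (parents, cpd); cpd takes the parent values and an RNG and returns a
# sampled value. A variable is supported by the partial instantiation
# sigma once all of its parents are instantiated in sigma.

def supports(model, sigma, var):
    parents, _ = model[var]
    return all(p in sigma for p in parents)

def rejection_sample(model, order, evidence, query, n_accepted, rng):
    counts = {}
    while sum(counts.values()) < n_accepted:
        sigma = {}
        targets = set(evidence) | {query}
        # Repeatedly instantiate the first supported-but-uninstantiated
        # variable until all query and evidence variables are sampled.
        while not targets <= sigma.keys():
            var = next(v for v in order
                       if v not in sigma and supports(model, sigma, v))
            parents, cpd = model[var]
            sigma[var] = cpd([sigma[p] for p in parents], rng)
        # Keep the sample only if it is consistent with the evidence.
        if all(sigma[v] == val for v, val in evidence.items()):
            q = sigma[query]
            counts[q] = counts.get(q, 0) + 1
    total = sum(counts.values())
    return {q: n / total for q, n in counts.items()}

# Hypothetical toy model: N uniform on {1, 2}; Obs is true with
# probability 0.9 if N == 1 and 0.2 if N == 2.
model = {
    "N": ([], lambda ps, rng: rng.choice([1, 2])),
    "Obs": (["N"], lambda ps, rng: rng.random() < (0.9 if ps[0] == 1 else 0.2)),
}
rng = random.Random(0)
post = rejection_sample(model, ["N", "Obs"], {"Obs": True}, "N", 5000, rng)
# Exact posterior: P(N = 1 | Obs) = 0.9 / 1.1 ≈ 0.818
```

Even this toy run illustrates the algorithm's main weakness: acceptance becomes arbitrarily rare as the evidence probability shrinks, which is why Sec. 6.3 reports results from a likelihood weighting algorithm instead.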

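Condition 3 of Thm. 2 — acyclicity of the symbol graph — can be checked with an ordinary depth-first search. The sketch below uses a hypothetical encoding: each node (a type or random function symbol) maps to a list of its parents, loosely following the urn model of Ex. 1 and the aircraft model of Fig. 3; the exact edge sets are illustrative, but the State → State self-loop is exactly what makes the Fig. 3 model fail the conservative criteria.

```python
# Sketch: checking condition 3 of Thm. 2, acyclicity of the symbol graph.
# parents maps each node (type or function symbol) to its parent list.

def is_acyclic(parents):
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited / on DFS stack / finished
    color = {node: WHITE for node in parents}
    def visit(node):
        color[node] = GRAY
        for p in parents.get(node, []):
            if color.get(p, WHITE) == GRAY:   # back edge: cycle found
                return False
            if color.get(p, WHITE) == WHITE and not visit(p):
                return False
        color[node] = BLACK
        return True
    return all(color[n] != WHITE or visit(n) for n in parents)

# Urn-and-balls style model (Ex. 1): no cycles, criteria satisfied.
urn_graph = {
    "Ball": [],                        # number statement with no parents
    "TrueColor": ["Ball"],
    "BallDrawn": ["Ball"],
    "ObsColor": ["TrueColor", "BallDrawn"],
}
# Aircraft model (Fig. 3): State depends on State, a self-loop.
aircraft_graph = {
    "Aircraft": [],
    "State": ["State", "Aircraft"],
    "Blip": ["Aircraft", "State"],
    "ApparentPos": ["Blip", "State"],
}
print(is_acyclic(urn_graph))       # True
print(is_acyclic(aircraft_graph))  # False
```

A stronger check in the spirit of Friedman et al. [1999] would analyze the arguments of each occurrence (e.g., that State(a, t) reads only State(a, Pred(t))) instead of bare symbol occurrences, and would therefore accept the Fig. 3 model.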
Definition 8. The symbol graph for a BLOG model M is a directed graph whose nodes are the types and random function symbols of M, where the parents of a type τ or function symbol f are:
• the random function symbols that occur on the right hand side of the dependency statement for f or some number statement for τ;
• the types of variables that are quantified over in formulas or set expressions on the right hand side of such a statement;
• the types of the arguments for f or the return types of generating functions for τ.

If the sampling subroutine for a basic RV V hits a basic RV U, then there must be an edge from U's function symbol (or type, if U is a number RV) to V's function symbol (or type) in the symbol graph. This property, along with ideas from [Milch et al., 2005], allows us to prove the following:

Theorem 2. Suppose M is a BLOG model where:
1. uncountable built-in types do not serve as function arguments or as the return types of generating functions;
2. each quantified formula and set expression ranges over a finite set once generating function restrictions are taken into account;
3. the symbol graph is acyclic.
Then M is well-defined. Also, for any evidence instantiation e and query variable Q, the rejection sampling algorithm described in Sec. 6.1 converges to the posterior P(Q|e) defined by the model, taking finite time per sampling step.

[Figure 4 here: two plots of posterior probability versus number of balls in the urn, over 1–8 (top) and 0–25 (bottom).] Figure 4: Distribution for the number of balls in the urn (Ex. 1). Dashed lines are the uniform prior (top) or Poisson prior (bottom); solid lines are the exact posterior given that 10 balls were drawn and all appeared blue; and plus signs are posterior probabilities computed by 5 independent runs of 20,000 samples (top) or 100,000 samples (bottom).

The criteria in Thm. 2 are very conservative: in particular, when we construct the symbol graph, we ignore all structure in the dependency statements and just check for the occurrence of function and type symbols. These criteria are satisfied by the models in Figures 1 and 2. However, the model in Fig. 3 does not satisfy the criteria because there is a self-loop from State to State. The criteria do not exploit the fact that State(a, t) depends only on State(a, Pred(t)), and the non-random function Pred is acyclic. Friedman et al. [1999] have already dealt with this issue in the context of probabilistic relational models; their algorithm can be adapted to obtain a stronger version of Thm. 2 that covers Fig. 3.

6.3 Experimental results

Milch et al. [2005] describe a guided likelihood weighting algorithm that uses backward chaining from the query and evidence nodes to avoid sampling irrelevant variables. This algorithm can also be adapted to BLOG models. We applied this algorithm for Ex. 1, asserting that 10 balls were drawn and all appeared blue, and querying the number of balls in the urn. The top graph of Fig. 4 shows that when the prior for the number of balls is uniform over {1, ..., 8}, the posterior puts more weight on small numbers of balls; this makes sense because the more balls there are in the urn, the less likely it is that they are all blue. The bottom graph, using a Poisson(6) prior, shows a similar but less pronounced effect. Note that the posterior probabilities computed by the likelihood weighting algorithm are very close to the exact values (computed by exhaustive enumeration). These results could not be obtained using any algorithm that constructed a single fixed BN, since the number of potentially relevant VTrueColor[b] variables is infinite in the Poisson case.

7 Related Work

Gaifman [1964] was the first to suggest defining a probability distribution over first-order model structures. Halpern [1990] defines a language in which one can make statements about such distributions: for instance, that the probability of the set of worlds that satisfy Flies(Tweety) is 0.8. Probabilistic logic programming [Ng and Subrahmanian, 1992] can be seen as an application of this approach to Horn-clause knowledge bases. Such an approach only defines constraints on distributions, rather than defining a unique distribution.

Most first-order probabilistic languages (FOPLs) that define unique distributions fix the set of objects and the interpretations of (non-Boolean) function symbols. Examples include relational Bayesian networks [Jaeger, 2001] and Markov logic models [Domingos and Richardson, 2004]. Prolog-based languages such as probabilistic Horn abduction [Poole, 1993], PRISM [Sato and Kameya, 2001], and Bayesian logic programs [Kersting and De Raedt, 2001] work with Herbrand models, where the objects are in one-to-one correspondence with the ground terms of the language.

There are a few FOPLs that allow explicit reference uncertainty, i.e., uncertainty about the interpretations of function symbols. Among these are two languages that use indexed RVs rather than logical notation: BUGS [Gilks et al., 1994] and indexed probability diagrams (IPDs) [Mjolsness, 2004]. Reference uncertainty can also be represented in probabilistic relational models (PRMs) [Koller and Pfeffer, 1998], where a "single-valued complex slot" corresponds to an uncertain unary function. PRMs are unfortunately restricted to unary functions (attributes) and binary predicates (relations). Probabilistic entity-relationship models [Heckerman et al., 2004] lift this restriction, but represent reference uncertainty using relations (such as Drawn(d, b)) and special mutual exclusivity constraints, rather than with functions such as BallDrawn(d). Multi-entity Bayesian network logic (MEBN) [Laskey, 2004] is similar to BLOG in allowing uncertainty about the values of functions with any number of arguments.

The need to handle unknown objects has been appreciated since the early days of FOPL research: Charniak and Goldman's plan recognition networks (PRNs) [1993] can contain unbounded numbers of objects representing hypothesized plans. However, external rules are used to decide what objects and variables to include in a PRN. While each possible PRN defines a distribution on its own, Charniak and Goldman do not claim that the various PRNs are all approximations to some single distribution over outcomes.

Some more recent FOPLs do define a single distribution over outcomes with varying objects. IPDs allow uncertainty over the index range for an indexed family of RVs. PRMs and their extensions allow a variety of forms of uncertainty about the number (or existence) of objects satisfying certain relational constraints [Koller and Pfeffer, 1998; Getoor et al., 2001] or belonging to each type [Pasula et al., 2003]. However, there is no unified syntax or semantics for dealing with unknown objects in PRMs. MEBNs take yet another approach: an MEBN model includes a set of unique identifiers, for each of which there is an "identity" RV indicating whether the named object exists.

Our approach to unknown objects in BLOG can be seen as unifying the PRM and MEBN approaches. Number statements neatly generalize the various ways of handling unknown objects in PRMs: number uncertainty [Koller and Pfeffer, 1998] corresponds to a number statement with a single generating function; existence uncertainty [Getoor et al., 2001] can be modeled with two or more generating functions (and a CPD whose support is {0, 1}); and domain uncertainty [Pasula et al., 2003] corresponds to a number statement with no generating functions. There is also a correspondence between BLOG and MEBN logic: the tuple representations in a BLOG model can be thought of as unique identifiers in an MEBN model. The difference is that BLOG determines which objects actually exist in a world using number variables rather than individual existence variables.

Finally, it is informative to compare BLOG with the IBAL language [Pfeffer, 2001], in which a program defines a distribution over outputs that can be arbitrary nested data structures. An IBAL program could implement a BLOG-like generative process with the outputs viewed as logical model structures. But the declarative semantics of such a program would be less clear than the corresponding BLOG model.

8 Conclusion

BLOG is a representation language for probabilistic models with unknown objects. It contributes to the solution of a very general problem in AI: intelligent systems must represent and reason about objects, but those objects may not be known a priori and may not be directly and uniquely identified by perceptual processes. Our approach defines generative models that create first-order model structures by adding objects and setting function values; everything else follows naturally from this design decision. Much remains to be done, especially on inference: we expect to employ MCMC with user-defined and possibly adaptive proposal distributions, and to develop algorithms that work directly with objects rather than at the lower level of basic RVs.

References

[Charniak and Goldman, 1993] E. Charniak and R. P. Goldman. A Bayesian model of plan recognition. AIJ, 64(1):53–79, 1993.
[Domingos and Richardson, 2004] P. Domingos and M. Richardson. Markov logic: A unifying framework for statistical relational learning. In Proc. ICML Wksp on Statistical Relational Learning and Its Connections to Other Fields, 2004.
[Friedman et al., 1999] N. Friedman, L. Getoor, D. Koller, and A. Pfeffer. Learning probabilistic relational models. In Proc. 16th IJCAI, pages 1300–1307, 1999.
[Gaifman, 1964] H. Gaifman. Concerning measures in first order calculi. Israel J. Math., 2:1–18, 1964.
[Getoor et al., 2001] L. Getoor, N. Friedman, D. Koller, and B. Taskar. Learning probabilistic models of relational structure. In Proc. 18th ICML, pages 170–177, 2001.
[Gilks et al., 1994] W. R. Gilks, A. Thomas, and D. J. Spiegelhalter. A language and program for complex Bayesian modelling. The Statistician, 43(1):169–177, 1994.
[Halpern, 1990] J. Y. Halpern. An analysis of first-order logics of probability. AIJ, 46:311–350, 1990.
[Heckerman et al., 2004] D. Heckerman, C. Meek, and D. Koller. Probabilistic models for relational data. Technical Report MSR-TR-2004-30, Microsoft Research, 2004.
[Jaeger, 2001] M. Jaeger. Complex probabilistic modeling with recursive relational Bayesian networks. Annals of Math and AI, 32:179–220, 2001.
[Kersting and De Raedt, 2001] K. Kersting and L. De Raedt. Adaptive Bayesian logic programs. In 11th Int. Conf. on ILP, 2001.
[Koller and Pfeffer, 1998] D. Koller and A. Pfeffer. Probabilistic frame-based systems. In Proc. 15th AAAI, pages 580–587, 1998.
[Laskey, 2004] K. B. Laskey. MEBN: A logic for open-world probabilistic reasoning. Technical report, George Mason Univ., 2004.
[Milch et al., 2005] B. Milch, B. Marthi, D. Sontag, S. Russell, D. L. Ong, and A. Kolobov. Approximate inference for infinite contingent Bayesian networks. In 10th AISTATS Wksp, 2005.
[Mjolsness, 2004] E. Mjolsness. Labeled graph notations for graphical models. Technical Report 04-03, School of Information and Computer Science, UC Irvine, 2004.
[Ng and Subrahmanian, 1992] R. T. Ng and V. S. Subrahmanian. Probabilistic logic programming. Information and Computation, 101(2):150–201, 1992.
[Pasula et al., 2003] H. Pasula, B. Marthi, B. Milch, S. Russell, and I. Shpitser. Identity uncertainty and citation matching. In NIPS 15. MIT Press, Cambridge, MA, 2003.
[Pearl, 1988] J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Francisco, revised edition, 1988.
[Pfeffer, 2001] A. Pfeffer. IBAL: A probabilistic rational programming language. In Proc. 17th IJCAI, pages 733–740, 2001.
[Poole, 1993] D. Poole. Probabilistic Horn abduction and Bayesian networks. AIJ, 64(1):81–129, 1993.
[Sato and Kameya, 2001] T. Sato and Y. Kameya. Parameter learning of logic programs for symbolic-statistical modeling. JAIR, 15:391–454, 2001.
[Sittler, 1964] R. W. Sittler. An optimal data association problem in surveillance theory. IEEE Trans. Military Electronics, MIL-8:125–139, 1964.
