Session 5: Foundational Aspects of General Intelligence and AI
3:30 pm – 4:00 pm GMT
Registration link: https://us02web.zoom.us/webinar/register/WN_Ls2kE36-TFyJo5u1aPVscw
Chaired by Sheri Markose (University of Essex)
Hector Zenil (Oxford and Alan Turing Institute), Karl Friston (UCL), Vincent Mueller (Humboldt Professor and Alan Turing Institute)
I think Silvia had a question for Karl. Do you want to ask the question verbally, Silvia?
Silvia Sanchez Roman
Thank you, Karl. It was a wonderful talk. I wanted to ask your opinion: do you think that self-organization at the nervous-system level is founded at the single-cell level, at the level of the neuron, or even, going further back, for every cell in our bodies at the gene level, so that this organization comes from the structure of gene organization?
Yeah, absolutely. I didn't introduce it in that presentation, but the link with self-organization comes from the study of non-equilibrium steady states in physics. The key thing is that you have to introduce a separation, or boundary, between the internal states and the external states of any system. With the very introduction of that boundary, under the assumption that the dynamics have an attracting set, you get for free dynamics that optimize, or do gradient flows on, that function. The picture you then have is of what are referred to as Markov boundaries or blankets, in the sense of Judea Pearl, that enshroud each other at multiple scales.

So the idea is that exactly the same mechanics would be seen at the molecular level; then in terms of intracellular organelles; then the cell itself, which has a cell surface with a Markov blanket; then a population of cells, a specialized brain region; then the brain as an organ; then the different organs in my body; and so on, up to conspecifics and institutions, right up to Gaia if you want. In principle, if that hierarchical nesting of blanketed structures exists, enabling a separation of the states that are internal at that scale of description from the external states, then it has to be the case that you can write down their dynamics as a gradient flow on that evidence lower bound. So the question is really: what is the relationship between the blankets at different scales? But yes, in principle; and indeed there are papers looking at dendritic self-organization, treating a single neuron as a system that learns to sample those presynaptic inputs that are the most predictable, if you like, and storing in its postsynaptic specializations some model of the contingencies and inputs that it hears or sees from other populations.
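The gradient-flow claim can be written compactly. A minimal sketch in standard variational notation (the symbols here are illustrative: q is the approximate posterior encoded by internal states mu, o the observations, s the hidden or external states):

```latex
% Variational free energy bounds surprise (negative log evidence):
F[q,o] \;=\; \mathbb{E}_{q(s)}\!\left[\ln q(s) - \ln p(o,s)\right]
       \;=\; D_{\mathrm{KL}}\!\left[q(s)\,\|\,p(s\mid o)\right] \;-\; \ln p(o)
       \;\ge\; -\ln p(o)

% Internal states then flow down the free-energy gradient:
\dot{\mu} \;=\; -\,\partial_{\mu}\, F[q_{\mu}, o]
```

Because the KL term is non-negative, descending F both tightens the bound and maximizes model evidence, which is the sense in which "dynamics that optimize that function" come for free.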
Yes, and sorry for being late. It's a question for Karl, really, out of ignorance: how do you see, I don't know what to call it, maybe the free energy framework? How does it relate to other approaches to intelligence: someone mentioned game theory, but maybe also deep learning. Is it a more general framework, or do you view them as competing frameworks?
I would look at it as a more general framework, so it is not competitive. In fact, if there is something out there that works that cannot be cast as, usually, a limiting or special case of that sort of generic existential imperative, from the perspective of the statistics of self-organization, then that is a problem for the free energy framework. I think the clearest example is the way of decomposing evidence, or expected evidence, or variational bounds on evidence, into things like risk and ambiguity, or intrinsic versus extrinsic value. A limiting case of that is just expected utility, or extrinsic value, as a special case; and from that you can get economics and utility theory, or, if you're in behavioural psychology, reinforcement learning.

Deep learning, as I understand it, is an interesting case, in the sense that it would require you to specify what kind of deep learning you're doing. If you're doing a variational autoencoder, then that, under a very simple kind of generative model, is effectively a gradient flow, or at least a coordinate descent, on that objective function, the variational free energy. However, if you're doing deep RL and you're specifying the cost function or the value function, then that is still a special case, but there is no uncertainty about the weights or the parameters in your deep learning. That's really interesting, because if you don't have an evaluation of the uncertainty (someone was talking about uncertainty quantification) over the weights of your deep network, you can't evaluate the marginal likelihood, the evidence, for that network, and therefore you can't minimize the complexity. That usually manifests in terms of sharp minima.

So you have to resort to all sorts of tricks, like mini-batching and the like, in order to reintroduce the uncertainty that you have not been able to measure, because you haven't got a belief distribution over the weights. So deep learning would be an interesting special case that has special problems, which might be resolved by the slightly more generic formulation; and all that would require is placing posterior beliefs over the weights themselves. Does that make sense? I'm not sure whether you like deep learning or not.
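The complexity term being discussed is the divergence between posterior and prior beliefs, which cannot be evaluated without a belief distribution over the unknowns. A minimal numerical sketch for a toy one-parameter Gaussian model (the function names and the model itself are illustrative, not from any particular library):

```python
import numpy as np

def kl_diag_gauss(mu_q, var_q, mu_p, var_p):
    """KL[q || p] between diagonal Gaussians: the 'complexity' term."""
    return 0.5 * np.sum(np.log(var_p / var_q)
                        + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def free_energy(obs, mu_q, var_q, noise_var=1.0):
    """Variational free energy = complexity - accuracy for a toy model:
    prior s ~ N(0, 1), likelihood o | s ~ N(s, noise_var),
    approximate posterior q(s) = N(mu_q, var_q)."""
    complexity = kl_diag_gauss(mu_q, var_q,
                               np.zeros_like(mu_q), np.ones_like(var_q))
    # Expected log-likelihood under q (accuracy), in closed form:
    accuracy = -0.5 * np.sum(np.log(2 * np.pi * noise_var)
                             + ((obs - mu_q) ** 2 + var_q) / noise_var)
    return complexity - accuracy
```

At the exact posterior the bound is tight (F equals the negative log evidence); a point estimate with no variance over the parameter would leave the complexity term undefined, which is the problem alluded to above.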
Yes, thank you very much. It definitely clarifies things. Thank you.
So, just out of interest, since you do work on Bayesian learning, you know...
Yeah, so, casting the world that way: I do believe that there are ancient Bayesian priors from which you would start, right? And I think Karl himself talks about embodied cognition, that it is embodied in ourselves, and that is used as the Bayesian priors. But would that be enough for us to learn everything else that we would need to?
Yeah, I really don't know, I might say. I'm not doing any Bayesian learning; I come from philosophy and logic. So, yes.
Do we have any questions before we continue? What's happening now, because we've got time, is that Vincent can come in and talk.
Can I ask a question? Yeah. So, Karl, this is a question to you. I think the theory you're proposing is fascinating in the sense that it seems very generalizable. But from a game-theory perspective, my problem is this: I do not know a lot about this particular area, epistemic game theory, but I know that they take ambiguity aversion quite seriously; they have decision-theoretic frameworks for that. So I was wondering: even if I ignore the strategic interaction, and take a purely decision-theoretic perspective, just technically speaking, how is this different from a regular, let's say, ambiguity-averse agent who is performing in a Bayesian world? Exactly where does it differ, given that you are still using a basic Bayesian framework, if I understand correctly?
I don't know very much about economics, but if I am right in assuming that, in economics, you can equip a utility function with an ambiguity cost function, then I would say that's a really nice example of the mechanics I've been talking about. This is an example where the free energy formulation is meant to generalize stuff that works already. The only thing the free energy formulation brings to the table, which might be interesting from your point of view as an economist, is that the splitting into risk and ambiguity, that is, into something predominantly dictated by your prior preferences or loss functions (the risk part) versus the pure ambiguity, which has got nothing to do with what you want to happen, is just an interpretation: the underlying objective function is one expression, an expected functional, the expected free energy. Because there is only one objective function, it means that for any weighting of ambiguity there is an optimal weighting of the risk. To put it another way, there is always a Bayes-optimal weighting of the ambiguity term and the risk term. From my perspective that is an interesting feature, because it means there is an optimal balance between the more epistemic imperatives and the more utilitarian imperatives. Usually the way that is resolved, practically, in the uses of the formulas I was describing, is by acknowledging the different preferences or value functions: you cast those as a log probability over expected outcomes.

So things that are highly valuable are the kinds of outcomes I expect to happen to me, which are therefore unsurprising. But I will also have a precision on this belief: I may believe I am going to be rich, but I may believe that with very small confidence, or with very great precision and confidence. So the entropy, or the precision, of the beliefs that define your cost function now becomes another free parameter that is learnable, and it is that which tunes the balance between the ambiguity part and the risk part. That opens up an interesting world, which speaks to the previous question: can you start from any full set of priors? Not really, in my world, because as soon as you start to equip a belief with a precision, you are raising a probability to the power of something, or multiplying a log probability, and you are suddenly introducing a hierarchical generative model. And as soon as you introduce a hierarchical generative model, there are no full priors, in the sense that any belief at an intermediate level of the generative model becomes an empirical prior, and becomes learnable or updatable from the observations. So the question now is: where do your empirical priors come from? In the application of game theory to trying to phenotype people in psychiatry, say people who show addictive behaviour, we try to estimate those precisions: how much did they value that outcome relative to the ambiguity? I would imagine you have exactly the same constructs in looking at objective and subjective utility, iso-preference, and prospect theory; I imagine you have very similar devices in economics to try to estimate how risk-averse somebody is. We do the exact same thing, but it is cast in terms of these empirical priors, usually precisions on belief structures.
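That single objective, and its carving into risk and ambiguity, can be made concrete for discrete outcomes. A sketch under the standard discrete-state formulation of expected free energy (the array names are illustrative):

```python
import numpy as np

def expected_free_energy(q_s, A, log_c):
    """Expected free energy of a policy, split into risk and ambiguity.
    q_s:   predicted state distribution under the policy, shape (S,)
    A:     likelihood matrix P(o|s), shape (O, S)
    log_c: log prior preferences over outcomes, shape (O,)
    Returns (risk, ambiguity): one objective, two interpretable parts."""
    q_o = A @ q_s                                        # predicted outcomes Q(o)
    risk = np.sum(q_o * (np.log(q_o + 1e-16) - log_c))   # KL[Q(o) || preferences]
    H = -np.sum(A * np.log(A + 1e-16), axis=0)           # outcome entropy per state
    ambiguity = H @ q_s                                  # expected outcome entropy
    return risk, ambiguity
```

With an unambiguous likelihood mapping the ambiguity term vanishes and only risk (preference divergence) remains; with a fully ambiguous mapping the ambiguity term dominates, which is the carving described above.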
A follow-up question, if that is okay with you. So, Karl, this is a kind of follow-up question. What would be the simplest sort of object? I mean, I guess human beings are comparatively much more complex in terms of making decisions and so on. This is, I guess, related to what Silvia was asking: the ability to assign a probability itself requires a lot of complex calculation to begin with. So what would be the simplest object that would be able to do this calculation? I am just trying to understand the range of applications where this can be applied.
I see, right. So I think the first thing to say is that when I talk about beliefs, I just mean probability distributions. I don't mean anything propositional, or something that somebody has worked out in their head; I simply mean a conditional probability distribution that is parameterized by the physical state of something. So everything I've said would apply to a thermostat. The electrical currents in the thermostat would stand in for some belief about the causes of the temperature sensor, and its priors would be that the temperature should be at its setpoint. Everything it does, and everything it believes, is in the service of minimizing surprise in its inputs by bringing the temperature back to its setpoint.

So then the interesting question about complexity is: what is the difference between me or you and a thermostat? And that is a really interesting question. The normal answer is that it is the depth into the future that your generative model entertains. A thermostat cannot plan, whereas you and I can. If I were a physiologist, we would now have allostasis; or, if I were in machine learning, I would be appealing to planning as inference. And that is really interesting, because as soon as your generative model, the cause-and-effect structure that you are trying to secure evidence for, has a notion of the consequences of an action or a movement, then it must cover the future; so it requires a temporal depth, which the thermostat does not have. I think it is that temporal depth, and the counterfactual richness that comes for free as soon as you consider generative models that can entertain or model what would happen if I did this, or that, or that, both in the future and counterfactually, that makes me and you very different from a thermostat.

And then you move on to: well, let's see what happens now if we put these thermostats with temporal depth together to play. Then we get back to the questions about what would emerge from coupling these systems together, each trying to infer and plan around the other, and we get back to game theory. So those are many moves away from the Bayesian thermostat, and a really interesting question about the nature
of a parallel, sort of, with that selfish gene idea, in some sense: the future planning may not be quite there, but somehow it is locally optimizing, and eventually that leads to some kind of complex organism coming up.
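The Bayesian thermostat contrast can be simulated in a few lines: an agent whose only prior is "the temperature equals the setpoint", acting by gradient descent on its current surprise, with no representation of the future. A toy sketch (the setpoint, rate, and quadratic surprise are illustrative assumptions):

```python
def thermostat(temp, setpoint=20.0, rate=0.1, steps=200):
    """A minimal 'Bayesian thermostat': its prior belief is that the sensed
    temperature equals the setpoint. It acts only on the current prediction
    error (the gradient of squared surprise). It has no temporal depth,
    so it cannot plan; it can only react."""
    for _ in range(steps):
        error = temp - setpoint    # sensory prediction error, right now
        temp -= rate * error       # action: reduce surprise immediately
    return temp
```

The agents with temporal depth discussed above would instead evaluate rollouts of candidate action sequences before acting, which is where planning (and, once agents model each other, game theory) enters.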