A good reference: https://www.amazon.com/Foundations-Linear-Generalized-Probability-Statistics/dp/1118730038


First is Point Value | Actions.

Second is Point Value | Players [following the derivation you have above], where the “action_vector” is now treated using the counting-measure-at-possession-level application learned off the neural network.

By the way, softmax is indeed linear in actions. It’s called a linear model, after all. Work through the equations and it will pop out immediately. It’s the move from the linear weights to a Gaussian prior that makes this non-linear in players. This is because we’ve effectively introduced a natural logarithm.
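As a rough illustration of the linearity claim above (the weights, outcome count, and action counts here are hypothetical, not the post's actual model): in a softmax model the logits are a linear function of the action vector, so linearity in actions shows up directly in the logits and in the log-odds between outcomes.

```python
import numpy as np

# Hypothetical softmax model: logits = W @ a + b, with a = action counts.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))   # 3 point outcomes, 5 action types (made up)
b = rng.normal(size=3)

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

a1 = np.array([1., 0., 2., 0., 1.])   # action counts for one possession
a2 = np.array([0., 1., 1., 1., 0.])

# Linearity in actions: the logit map is affine, so
# logits(a1 + a2) = logits(a1) + logits(a2) - b.
l1, l2, l12 = W @ a1 + b, W @ a2 + b, W @ (a1 + a2) + b
assert np.allclose(l12, l1 + l2 - b)

# The log-odds between any two outcomes are linear in the action vector,
# which is where the natural logarithm mentioned above enters.
p = softmax(W @ a1 + b)
log_odds = np.log(p[0] / p[1])
assert np.isclose(log_odds, (W[0] - W[1]) @ a1 + (b[0] - b[1]))
```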


On the softmax being linear with respect to the actions, what confuses me is that the softmax isn’t linear with respect to the logits, which are already a non-linear function of the actions if you include non-linear activations. I thought the action network was approximating p( point outcome | action vector ) without any dependence on the players, but maybe not, since you mention it’s non-linear in the players.

On the player model and its prior, I guess what would help most is understanding the distribution you are trying to model and how you represented it. I’m guessing here, but I think your hierarchical player model is approximating the distribution

p( point outcome | players )

= integral p( point outcome, action vector | players ) d(action vector)

= integral p( point outcome | action vector, players ) * p( action vector | players ) d(action vector).

I understand how this provides a natural hierarchy. And it makes sense to have a prior over the action vectors in this case. If this is in fact the learning task, then my only question would be how do you tease out the impact of individual players? Is there another regression on top of this model?
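The marginalization above can be sketched numerically by Monte Carlo: sample action vectors conditional on the players, then average the outcome probabilities. Everything below is a hypothetical toy stand-in (the player-to-rate link, the Poisson action model, and the softmax outcome model are all made up for illustration), not the post's actual models.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins for the two components in the integral:
#   p(action vector | players)  -> Poisson counts with player-driven rates
#   p(point outcome | actions)  -> fixed softmax over action counts
W = rng.normal(size=(3, 4))          # 3 outcomes, 4 action types (made up)
b = rng.normal(size=3)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def p_outcome_given_actions(a):
    return softmax(W @ a + b)

def sample_actions_given_players(player_effect, n):
    # Hypothetical link from a player-effect vector to action rates.
    rates = np.exp(0.3 * player_effect)
    return rng.poisson(rates, size=(n, len(rates))).astype(float)

# p(outcome | players) ~= average of p(outcome | sampled action vector)
player_effect = np.array([0.5, -0.2, 1.0, 0.1])
samples = sample_actions_given_players(player_effect, 5000)
p_outcome = np.mean([p_outcome_given_actions(a) for a in samples], axis=0)
assert np.isclose(p_outcome.sum(), 1.0)   # still a valid distribution
```

Teasing out individual players would then amount to varying `player_effect` one coordinate at a time and comparing the resulting outcome distributions, which is the question the comment raises.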
