A Gaussian mixture model

First of all, we define our model:

using KissABC
using Distributions

function model(P, N)
    μ_1, μ_2, σ_1, σ_2, prob = P
    d1 = randn(N) .* σ_1 .+ μ_1   # samples from the first component
    d2 = randn(N) .* σ_2 .+ μ_2   # samples from the second component
    ps = rand(N) .< prob          # mixture assignment for each sample
    R = zeros(N)
    R[ps] .= d1[ps]
    R[.!ps] .= d2[.!ps]
    R
end
model (generic function with 1 method)
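As a quick sanity check (an addition for illustration, not part of the original tutorial), we can verify the mixture logic: with prob = 1 every draw comes from the first component, so the sample mean and standard deviation should be close to μ_1 and σ_1. The model definition is repeated so the snippet runs standalone:

```julia
using Statistics

# model as defined above, repeated so this snippet is self-contained
function model(P, N)
    μ_1, μ_2, σ_1, σ_2, prob = P
    d1 = randn(N) .* σ_1 .+ μ_1
    d2 = randn(N) .* σ_2 .+ μ_2
    ps = rand(N) .< prob
    R = zeros(N)
    R[ps] .= d1[ps]
    R[.!ps] .= d2[.!ps]
    R
end

x = model((1.0, 0.0, 0.2, 2.0, 1.0), 100_000)  # prob = 1: only component 1
mean(x), std(x)  # ≈ (1.0, 0.2)
```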

Let's use the model to generate some data; this will constitute our dataset:

parameters = (1.0, 0.0, 0.2, 2.0, 0.4)
data = model(parameters, 5000)
5000-element Array{Float64,1}:
  1.1013306846760305
  0.6947767535721752
  0.4381586383717882
  1.6352325672168446
 -1.8981241314713049
  0.7414097347259663
  2.861138142130368
 -2.2612292472590183
 -1.136565673025339
  0.7180420013476974
  ⋮
  1.082457575438042
  0.9284113706074747
  1.0528429915449815
  0.8777806798713996
  1.4338180515011563
  0.7973982302625535
  0.8115706389190371
 -1.4709421327139285
 -1.8433430094775847

Let's look at the data:

using Plots
histogram(data)

(figure ex1_hist1: histogram of the generated data)

We can now try to infer all the parameters using KissABC. First of all, we need to define a reasonable prior for our model:

prior = Factored(
            Uniform(0, 2),  # there is surely a peak between 0 and 2
            Uniform(-1, 1), # there is a smeared distribution centered around 0
            Uniform(0, 1),  # the peak surely has a width below 1
            Uniform(0, 4),  # the smeared distribution surely has a width below 4
            Beta(2, 2)      # the numbers of events from the two distributions look about the same, so we favor 0.5 just a bit
        );

Let's draw a sample from the prior, to check that it works:

rand(prior)
(1.3022139272278652, 0.24122004977258982, 0.7338695442555514, 2.110487811388831, 0.3895361418247243)
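A minimal extra check (an addition for illustration, assuming only that Factored draws each coordinate from the corresponding marginal, as the sample above suggests) is to verify that a draw respects the supports of the individual priors:

```julia
using KissABC, Distributions

prior = Factored(
    Uniform(0, 2),
    Uniform(-1, 1),
    Uniform(0, 1),
    Uniform(0, 4),
    Beta(2, 2),
)

s = rand(prior)
# each coordinate must lie in the support of its marginal prior
(0 <= s[1] <= 2) && (-1 <= s[2] <= 1) && (0 <= s[3] <= 1) &&
(0 <= s[4] <= 4) && (0 <= s[5] <= 1)  # true
```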

Now we need a distance function to compare datasets. This is not the best distance we could use, but it will work anyway:

using Statistics

function D(x, y)
    r = (0.1, 0.2, 0.5, 0.8, 0.9)
    mean(abs, quantile(x, r) .- quantile(y, r))
end
D (generic function with 1 method)
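To get a feel for the scale of this distance (a sketch added for illustration, not in the original tutorial), we can compare samples from identical and from shifted distributions; the quantile differences pick up the shift directly. The distance is repeated so the snippet runs standalone:

```julia
using Statistics

# D as defined above, repeated so this snippet is self-contained
function D(x, y)
    r = (0.1, 0.2, 0.5, 0.8, 0.9)
    mean(abs, quantile(x, r) .- quantile(y, r))
end

x = randn(5000)
y = randn(5000)          # same distribution as x
z = randn(5000) .+ 1.0   # shifted by 1

D(x, y)  # small: same underlying distribution
D(x, z)  # ≈ 1: every quantile is shifted by about 1
```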

We can now run ABCDE to get the posterior distribution of the parameters given the dataset data:

plan = ABCplan(prior, model, data, D, params=5000)
res, Δ, converged = ABCDE(plan, 0.02, parallel=true, generations=500, verbose=false);

Has it converged to the target tolerance?

print("Converged = ",converged)
Converged = false

Let's see the median and the 95% credible interval for each inferred parameter, and compare them with the true values:

function getstats(V)
    (
        median=median(V),
        lowerbound=quantile(V,0.025),
        upperbound=quantile(V,0.975)
    )
end
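To see what getstats reports on a known distribution (an illustrative aside), we can feed it standard-normal draws, for which the median is 0 and the central 95% interval is about (-1.96, 1.96):

```julia
using Statistics

# getstats as defined above, repeated so this snippet is self-contained
function getstats(V)
    (
        median = median(V),
        lowerbound = quantile(V, 0.025),
        upperbound = quantile(V, 0.975),
    )
end

s = getstats(randn(100_000))  # median ≈ 0, interval ≈ (-1.96, 1.96)
```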

labels = (:μ_1, :μ_2, :σ_1, :σ_2, :prob)
P = [getindex.(res, i) for i in 1:5]  # marginal posterior samples for each parameter
stats = getstats.(P)

for is in eachindex(stats)
    println(labels[is], " ≡ " ,parameters[is], " → ", stats[is])
end
μ_1 ≡ 1.0 → (median = 0.9839341588022492, lowerbound = 0.9386423788048345, upperbound = 1.034681505133927)
μ_2 ≡ 0.0 → (median = 0.026594220664156364, lowerbound = -0.06022689038612259, upperbound = 0.09780361725015226)
σ_1 ≡ 0.2 → (median = 0.19721174143790032, lowerbound = 0.16082237655575468, upperbound = 0.2410458954711207)
σ_2 ≡ 2.0 → (median = 2.0587464236410202, lowerbound = 1.9110899977227997, upperbound = 2.1990909848399376)
prob ≡ 0.4 → (median = 0.4167638523309556, lowerbound = 0.3735619960917253, upperbound = 0.459723599251158)

The inferred parameters are close to the nominal values.


This page was generated using Literate.jl.