A Gaussian mixture model
First of all, we define our model:
using KissABC
using Distributions
function model(P, N)
    μ_1, μ_2, σ_1, σ_2, prob = P
    d1 = randn(N) .* σ_1 .+ μ_1   # samples from the first Gaussian
    d2 = randn(N) .* σ_2 .+ μ_2   # samples from the second Gaussian
    ps = rand(N) .< prob          # pick the first component with probability `prob`
    R = zeros(N)
    R[ps] .= d1[ps]
    R[.!ps] .= d2[.!ps]
    R
end
model (generic function with 1 method)
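The sampler above draws from a two-component Gaussian mixture. As a cross-check (this block is an illustrative sketch, not part of the original example; the variable names `mix` and `sample` are made up), the same generative process can be written with `MixtureModel` from Distributions.jl:

```julia
using Distributions

# Mixture with weight `prob` on the first Gaussian and `1 - prob` on the
# second, matching the hand-rolled sampler above.
μ_1, μ_2, σ_1, σ_2, prob = (1.0, 0.0, 0.2, 2.0, 0.4)
mix = MixtureModel([Normal(μ_1, σ_1), Normal(μ_2, σ_2)], [prob, 1 - prob])
sample = rand(mix, 5000)
```
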
Let's use the model to generate some data; this data will constitute our dataset:
parameters = (1.0, 0.0, 0.2, 2.0, 0.4)
data=model(parameters,5000)
5000-element Array{Float64,1}:
1.1013306846760305
0.6947767535721752
0.4381586383717882
1.6352325672168446
-1.8981241314713049
0.7414097347259663
2.861138142130368
-2.2612292472590183
-1.136565673025339
0.7180420013476974
⋮
1.082457575438042
0.9284113706074747
1.0528429915449815
0.8777806798713996
1.4338180515011563
0.7973982302625535
0.8115706389190371
-1.4709421327139285
-1.8433430094775847
Let's look at the data:
using Plots
histogram(data)
We can now try to infer all parameters using KissABC. First of all, we need to define a reasonable prior for our model:
prior = Factored(
    Uniform(0, 2),  # there is surely a peak between 0 and 2
    Uniform(-1, 1), # there is a smeared distribution centered around 0
    Uniform(0, 1),  # the peak surely has a width below 1
    Uniform(0, 4),  # the smeared distribution surely has a width below 4
    Beta(2, 2)      # the numbers of events from the two components look about the same, so we favor 0.5 slightly
);
Let's draw a sample from the prior, to check that it works:
rand(prior)
(1.3022139272278652, 0.24122004977258982, 0.7338695442555514, 2.110487811388831, 0.3895361418247243)
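`Factored` bundles independent one-dimensional priors into a single object that can be sampled component-wise. A minimal stand-in with the same sampling behavior might look like the sketch below (the `IndependentPrior` name is made up for illustration; KissABC's actual `Factored` does more than this):

```julia
using Distributions

# A hypothetical stand-in for KissABC's `Factored`: each parameter is drawn
# independently from its own 1-D prior distribution.
struct IndependentPrior{T<:Tuple}
    dists::T
end
Base.rand(p::IndependentPrior) = map(rand, p.dists)

prior_sketch = IndependentPrior((
    Uniform(0, 2), Uniform(-1, 1), Uniform(0, 1), Uniform(0, 4), Beta(2, 2),
))
sample = rand(prior_sketch)  # a 5-tuple, one draw per parameter
```
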
Now we need a distance function to compare datasets. This is not the best distance we could use, but it will work anyway:
using Statistics # for `quantile` and `mean` on raw samples

function D(x, y)
    r = (0.1, 0.2, 0.5, 0.8, 0.9)
    mean(abs, quantile(x, r) .- quantile(y, r))
end
D (generic function with 1 method)
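As a quick sanity check (an illustrative sketch; the sample names `a`, `b`, `shifted` are made up), `D` should be near zero for two samples from the same distribution and roughly equal to the displacement between two shifted ones:

```julia
using Statistics, Random

# Same quantile-based distance as above, repeated so this block is self-contained.
function D(x, y)
    r = (0.1, 0.2, 0.5, 0.8, 0.9)
    mean(abs, quantile(x, r) .- quantile(y, r))
end

Random.seed!(1)
a = randn(5000)
b = randn(5000)            # same distribution as `a`
shifted = randn(5000) .+ 3 # displaced by 3
D(a, b)       # close to 0
D(a, shifted) # close to 3
```
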
we can now run ABCDE to get the posterior distribution of our parameters given the dataset data
plan = ABCplan(prior, model, data, D, params=5000)
res, Δ, converged = ABCDE(plan, 0.02, parallel=true, generations=500, verbose=false);
Has it converged to the target tolerance?
print("Converged = ",converged)
Converged = false
Let's see the median and the 95% credible interval for the inferred parameters, and compare them with the true values:
function getstats(V)
    (
        median = median(V),
        lowerbound = quantile(V, 0.025),
        upperbound = quantile(V, 0.975)
    )
end
labels = (:μ_1, :μ_2, :σ_1, :σ_2, :prob)
P = [getindex.(res, i) for i in 1:5]
stats = getstats.(P)
for is in eachindex(stats)
    println(labels[is], " ≡ ", parameters[is], " → ", stats[is])
end
μ_1 ≡ 1.0 → (median = 0.9839341588022492, lowerbound = 0.9386423788048345, upperbound = 1.034681505133927)
μ_2 ≡ 0.0 → (median = 0.026594220664156364, lowerbound = -0.06022689038612259, upperbound = 0.09780361725015226)
σ_1 ≡ 0.2 → (median = 0.19721174143790032, lowerbound = 0.16082237655575468, upperbound = 0.2410458954711207)
σ_2 ≡ 2.0 → (median = 2.0587464236410202, lowerbound = 1.9110899977227997, upperbound = 2.1990909848399376)
prob ≡ 0.4 → (median = 0.4167638523309556, lowerbound = 0.3735619960917253, upperbound = 0.459723599251158)
The inferred parameters are close to the nominal values.
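As a quick numerical check (using the nominal values and the interval bounds printed above), every true parameter falls inside its 95% interval:

```julia
nominal = (1.0, 0.0, 0.2, 2.0, 0.4)
lower   = (0.9386423788048345, -0.06022689038612259, 0.16082237655575468,
           1.9110899977227997, 0.3735619960917253)
upper   = (1.034681505133927, 0.09780361725015226, 0.2410458954711207,
           2.1990909848399376, 0.459723599251158)
all(lower .<= nominal .<= upper)  # true
```
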
This page was generated using Literate.jl.