Solve Interface
CommonSolve.solve — Function

solve(problem::AbstractIntervalMDPProblem, alg::RobustValueIteration; callback=nothing)

Solve minimizes/maximizes optimistic/pessimistic specification problems using value iteration for interval Markov processes.
It is possible to provide a callback function that will be called at each iteration with the current value function and iteration count. The callback function should have the signature callback(V::AbstractArray, k::Int).
solve can be called without specifying the algorithm, in which case it defaults to RobustValueIteration.
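As a minimal sketch of the callback mechanism, the snippet below records the value function at every iteration; it assumes a `problem` constructed as in the examples below, and the names `history` and `record` are introduced purely for illustration:

```julia
using IntervalMDP

# Sketch: collect (iteration, value function) pairs via the callback.
# `problem` is assumed to be a VerificationProblem as built in the
# examples below; `history` and `record` are illustrative names.
history = Tuple{Int, Vector{Float64}}[]
record = (V, k) -> push!(history, (k, copy(V)))

# With no algorithm given, solve defaults to RobustValueIteration.
sol = solve(problem; callback = record)
```

Copying `V` inside the callback matters: the solver may reuse the value-function buffer between iterations, so storing the array itself could leave `history` full of aliases to the final iterate.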
Examples
using IntervalMDP
prob1 = IntervalAmbiguitySets(;
    lower = [
        0.0 0.5
        0.1 0.3
        0.2 0.1
    ],
    upper = [
        0.5 0.7
        0.6 0.5
        0.7 0.3
    ],
)
prob2 = IntervalAmbiguitySets(;
    lower = [
        0.1 0.2
        0.2 0.3
        0.3 0.4
    ],
    upper = [
        0.6 0.6
        0.5 0.5
        0.4 0.4
    ],
)
prob3 = IntervalAmbiguitySets(;
    lower = [
        0.0 0.0
        0.0 0.0
        1.0 1.0
    ],
    upper = [
        0.0 0.0
        0.0 0.0
        1.0 1.0
    ],
)
transition_probs = [prob1, prob2, prob3]
initial_state = [1]
mdp = IntervalMarkovDecisionProcess(transition_probs, initial_state)
# output
FactoredRobustMarkovDecisionProcess
├─ 1 state variables with cardinality: (3,)
├─ 1 action variables with cardinality: (2,)
├─ Initial states: [1]
├─ Transition marginals:
│ └─ Marginal 1:
│ ├─ Conditional variables: states = (1,), actions = (1,)
│ └─ Ambiguity set type: Interval (dense, Matrix{Float64})
└─ Inferred properties
   ├─ Model type: Interval MDP
   ├─ Number of states: 3
   ├─ Number of actions: 2
   ├─ Default model checking algorithm: Robust Value Iteration
   └─ Default Bellman operator algorithm: O-Maximization

reach_states = [3]
time_horizon = 10
prop = FiniteTimeReachability(reach_states, time_horizon)
spec = Specification(prop, Pessimistic, Maximize)
# output
Specification
├─ Satisfaction mode: Pessimistic
├─ Strategy mode: Maximize
└─ Property: FiniteTimeReachability
├─ Time horizon: 10
└─ Reach states: CartesianIndex{1}[CartesianIndex(3,)]

# Verification
problem = VerificationProblem(mdp, spec)
sol = solve(problem, RobustValueIteration(default_bellman_algorithm(mdp)); callback = (V, k) -> println("Iteration ", k))
V, k, res = sol # or `value_function(sol), num_iterations(sol), residual(sol)`
# output
Iteration 1
Iteration 2
Iteration 3
Iteration 4
Iteration 5
Iteration 6
Iteration 7
Iteration 8
Iteration 9
Iteration 10
IntervalMDP.VerificationSolution{Float64, Vector{Float64}, Nothing}([0.9597716063999999, 0.9710050144, 1.0], [0.01593864639999998, 0.011487926399999848, -0.0], 10, nothing)
# Control synthesis
problem = ControlSynthesisProblem(mdp, spec)
sol = solve(problem, RobustValueIteration(default_bellman_algorithm(mdp)); callback = (V, k) -> println("Iteration ", k))
σ, V, k, res = sol # or `strategy(sol), value_function(sol), num_iterations(sol), residual(sol)`
# output
Iteration 1
Iteration 2
Iteration 3
Iteration 4
Iteration 5
Iteration 6
Iteration 7
Iteration 8
Iteration 9
Iteration 10
IntervalMDP.ControlSynthesisSolution{TimeVaryingStrategy{1, Vector{Tuple{Int32}}}, Float64, Vector{Float64}, Nothing}(TimeVaryingStrategy{1, Vector{Tuple{Int32}}}(Vector{Tuple{Int32}}[[(1,), (2,), (1,)], [(1,), (2,), (1,)], [(1,), (2,), (1,)], [(1,), (2,), (1,)], [(1,), (2,), (1,)], [(1,), (2,), (1,)], [(1,), (2,), (1,)], [(1,), (2,), (1,)], [(1,), (2,), (1,)], [(1,), (2,), (1,)]]), [0.9597716063999999, 0.9710050144, 1.0], [0.01593864639999998, 0.011487926399999848, -0.0], 10, nothing)

IntervalMDP.residual — Function

residual(s::VerificationSolution)

Return the residual of a verification solution.

residual(s::ControlSynthesisSolution)

Return the residual of a control synthesis solution.
IntervalMDP.num_iterations — Function

num_iterations(s::VerificationSolution)

Return the number of iterations of a verification solution.

num_iterations(s::ControlSynthesisSolution)

Return the number of iterations of a control synthesis solution.
IntervalMDP.value_function — Function

value_function(s::VerificationSolution)

Return the value function of a verification solution.

value_function(s::ControlSynthesisSolution)

Return the value function of a control synthesis solution.
IntervalMDP.strategy — Method

strategy(s::ControlSynthesisSolution)

Return the strategy of a control synthesis solution.
IntervalMDP.StationaryStrategy — Type

StationaryStrategy

A stationary strategy is a strategy that is the same for all time steps.
IntervalMDP.TimeVaryingStrategy — Type

TimeVaryingStrategy

A time-varying strategy is a strategy that may vary over time. Since the strategy must be stored for each time step, it is finite, and therefore it applies only to finite time specifications of the same length as the strategy.
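As a sketch of how a synthesized time-varying strategy might be inspected: `strategy(sol)` is the documented accessor, but indexing the strategy by time step as `σ[k]` is an assumption made here for illustration, not a confirmed part of the API:

```julia
# Sketch: inspect a time-varying strategy obtained from control synthesis.
# `sol` is assumed to come from solving a ControlSynthesisProblem with a
# FiniteTimeReachability specification, as in the examples above.
σ = strategy(sol)

# Assumption: σ[k] yields the action (as a tuple per state) chosen at
# time step k; with a horizon of 10, there are 10 such slices.
for k in 1:10
    actions_at_k = σ[k]
    println("step ", k, ": ", actions_at_k)
end
```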
VI-like Algorithms
IntervalMDP.RobustValueIteration — Type

RobustValueIteration

A robust value iteration algorithm for solving interval Markov decision processes (IMDPs) with interval ambiguity sets. This algorithm is designed to handle both finite and infinite time specifications, optimizing for either the maximum or minimum expected value based on the given specification.
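The examples above always construct the algorithm explicitly; as a sketch, the two call forms below should be interchangeable for a model whose default model checking algorithm is robust value iteration (as reported in the model summary above), since `solve` falls back to `RobustValueIteration` when no algorithm is given:

```julia
# Explicit construction, as in the verification example above:
alg = RobustValueIteration(default_bellman_algorithm(mdp))
sol_explicit = solve(VerificationProblem(mdp, spec), alg)

# Implicit: omitting the algorithm defaults to RobustValueIteration.
sol_default = solve(VerificationProblem(mdp, spec))
```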