Vision Target Estimator Deep Dive
PX4 v1.18 ExperimentalThis guide expands on the Vision Target Estimator module overview and targets advanced users who want to tune, extend, and debug the estimator in detail. It documents the system architecture of the Vision Target Estimator, and outlines workflows for log analysis and sensor integration.
Table of Contents
System Architecture
The implementation is split across a scheduler, a task layer, and two independent estimators:
VisionTargetEst(src/modules/vision_target_estimator/VisionTargetEst.cpp) owns the work-queue task. Its main loop handles generic vehicle inputs (vehicle_attitude,vehicle_acceleration,vehicle_local_position,vehicle_gps_position, and angular rates), downsamples acceleration, publishesvte_input, and sends the samples to the position and orientation filters every 20 ms (50 Hz). It also owns a priority-ordered registry of runtime tasks and keeps the generic logic for task dispatch, estimator start/stop, idle scheduling, and timeout handling.tasks/VteTask.hdefines the task interface and theVTE_TASK_MASKbit constants. A task owns all state that only exists for one VTE use-case, such as subscriptions to external status topics, cached mission setpoints, and end-of-task conditions.VisionTargetEstpolls each enabled task, then activates the first task in the registry whose readiness conditions are satisfied.tasks/PrecLandTask.cppimplements the precision-landing task. It monitorsprec_land_status, caches the land waypoint fromnavigator_mission_item(withposition_setpoint_tripletas a fallback), and completes when precland stops or touchdown is detected.VTEPosition(Position/VTEPosition.cpp) owns the three per-axis position filters. It maintains observation buffers, enforces timeouts, and publisheslanding_target_pose,vte_position, and everyvte_aid_*innovation topic. The state vector is[r, v^uav, b](relative position, vehicle velocity, GNSS-vs-vision bias); see Dynamic models.- It also runs the bias-initialization state machine described in Bias Initialization Design, so the bias
bis reconciled before it is fused. - The math is delegated to
Position/KF_position.cpp, which calls the SymForce-generated routines documented in SymForce-generated derivations. - Helper unions
SensorFusionMaskUandObsValidMaskUmirror the bit layout ofVTE_AID_MASK, making it straightforward to add new observation types (Adding new measurement sources).
- It also runs the bias-initialization state machine described in Bias Initialization Design, so the bias
VTEOrientation(Orientation/VTEOrientation.cpp) handles yaw fusion with the same measurement staging helpers as the position filter, on a smaller[psi, psi_dot]state. The math lives inOrientation/KF_orientation.cpp; prediction and the transition matrix are SymForce-generated, while innovation and correction stay hand-written so yaw can be wrapped to. - Shared utilities live in
common.handVTEOosm.h(templated OOSM manager, see OOSM Implementation).
text
VisionTargetEst (work item, 50 Hz)
├── tasks/ priority-ordered registry, scheduler picks the first ready task
│ ├── VteTask task interface + VTE_TASK_MASK bits
│ └── PrecLandTask precision-landing activation/exit logic
├── VTEPosition per-axis filters, observation buffers, vte_* topic publication
│ ├── KF_position 1D Kalman math (SymForce-generated predict/correct)
│ └── bias init state machine reconciles GNSS-vs-vision bias before fusion
├── VTEOrientation yaw fusion
│ └── KF_orientation SymForce-generated predict, hand-written correct with yaw wrap
└── shared utilities common.h, VTEOosm.h (templated OOSM history buffer)Bias Initialization Design
The bias state exists because the absolute target reference and the relative vision reference do not necessarily agree:
- If
VTE_AID_MASKuses the mission landing point, that waypoint can be offset from the real pad. - If
VTE_AID_MASKuses target GNSS, the target receiver can still be off by metres when it is not RTK-grade. - Vision measures the target relative to the vehicle, so it can be locally accurate even when the absolute GNSS reference is biased.
VTE therefore models GNSS-like position observations as z = r + b once the bias becomes observable. This lets corrected GNSS continue to point to the target when vision drops out.
Naive bias initialization is not robust:
- Zero bias assumes GNSS and vision already agree. If the real offset is large, the first vision updates can look like outliers and get rejected.
- Instantaneous bias from the first sample trusts a single early vision frame. If that frame is noisy and vision is lost soon after, the estimator can continue descending with the wrong corrected GNSS offset.
The implementation therefore uses two different bias-initialization paths depending on which position source is trusted first:
- Startup prerequisites:
VTEPosition::initializeEstimator()requires a recent local-velocity or UAV-GNSS-velocity estimate before the filter can start. This is needed to initialize thev^{uav}state and also to support the GNSS-to-vision time alignment used by the bias logic. - Raw bias sample: whenever a valid GNSS-relative sample exists, the code computes the initial bias from
selectBiasGnssSample(sample_time), wheresample_timeis the vision timestamp. If the GNSS-relative sample is older than vision but still valid, the helper propagates it forward with the UAV velocity estimate. In other words, the bias logic usespos_rel_gnss(t_vision), not the stale storedpos_rel_gnss. - Why the behavior is asymmetric: if GNSS is already driving the position state, feeding in raw vision before the bias is ready would mix two frames that can differ by metres. That is why VTE delays vision in the GNSS-first case. If vision is already driving the position state, the safer choice is the opposite: keep the trusted vision-based
rand solve for the bias immediately when GNSS and vision overlap for the first time. - Immediate activation case: when the estimator is already vision-referenced, VTE computes
b = pos_rel_gnss(t_vision) - pos_rel_visionand keepsr = pos_rel_vision. The same immediate-activation path is also used when VTE_BIA_AVG_TOUT=0, even if GNSS was active first. - Averaging case: when GNSS is active first and averaging is enabled, vision updates the LPF at the vision rate while GNSS remains the active position source. The raw sample is still
pos_rel_gnss(t_vision) - pos_rel_vision, so the rate mismatch between GNSS and vision does not force the estimator to discard intermediate vision frames. - Exit condition: the LPF is accepted only after 5 consecutive raw-bias delta norms stay below VTE_BIA_AVG_THR and at least
2 * tauhas elapsed (tau = 0.3 s), or when VTE_BIA_AVG_TOUT expires. - Activation after averaging: once the LPF is accepted, the state reset is
activateBiasEstimate(gnss_sample - filtered_bias, filtered_bias), which meansr = pos_rel_gnss(t_vision) - b_filtered,b = b_filtered. - Stale-GNSS fallback: if GNSS goes stale during averaging,
selectBiasGnssSample()fails and the code intentionally switches to the current vision position while keeping the current filtered bias. This is the only bias-initialization branch that does not usepos_rel_gnss(t_vision), because no valid GNSS sample exists anymore. - No re-initialization after recovery: once the bias is set for the current estimator run, a temporary vision dropout does not restart the averaging phase. When vision returns, the existing bias stays active.
For the runtime tuning of the bias state (how aggressively it follows new observations once activated), see Tuning the bias state in the next section.
Balancing Process and Observation Noise
Two distinct noise tunings shape the filter's behaviour, and the right trade-off has to be found on both:
- The observation noise of each sensor sets how much that sensor is trusted relative to others within a single fusion step. Larger reported variance means a smaller Kalman gain and less influence on the state.
- The process noise of the prediction model sets how much the prediction is trusted between updates. A stiffer prediction (smaller process noise) shrinks the Kalman gain because the predicted covariance
is smaller, so each observation moves the state less.
The two interact through the Kalman gain VTE_*_NOISE parameters can only raise VTE_*_NOISE on each sensor).
Pushing either knob to an extreme produces the same failure mode:
much larger than (loose prediction, or a noise minimum so low that matches a sensor under-reporting its noise) the state chases every sample. much smaller than (stiff prediction, or a noise minimum so high that good sensor variance is overridden) the filter under-reacts to legitimate corrections.
Tune them together so the resulting gain produces the response speed and noise rejection you actually want.
Between Observation Sources
How much the filter trusts vision relative to GNSS at any given moment is decided by the per-sample observation variance each sensor reports (fiducial_marker_pos_report.cov_rel_pos, sensor_gps.eph, sensor_gps.epv, target_gnss.s_acc_m_s). The bias-initialization section above explained how the two frames are reconciled; this part covers how trust is split between sources within a single fusion step.
The estimator can only override the sensor in one direction. Each fused source has a noise-floor parameter that imposes a lower bound on its standard deviation; the effective observation variance used for fusion is:
| Parameter | Units | Default | Applies to |
|---|---|---|---|
| VTE_EVP_NOISE | m | 0.10 | Vision relative position |
| VTE_EVA_NOISE | rad | 0.07 | Vision yaw |
| VTE_GPS_P_NOISE | m | 0.50 | GNSS position (target and mission) |
| VTE_GPS_V_NOISE | m/s | 0.30 | UAV and target GNSS velocity |
These are standard deviations, not variances; the estimator squares them internally before clamping. They only matter when the sensor under-reports its own noise. If the reported variance already sits above the floor, the parameter has no effect.
Two common mis-tunes:
- Floor too tight (for example
VTE_EVP_NOISE = 0.01for a marker pipeline that is really 10 cm accurate): the filter is told to trust every sample to within 1 cm, so legitimate per-sample noise looks like outliers, the VTE_POS_NIS_THRE gate fires often, and the few samples that pass cause state overshoots. - Floor too loose (for example
VTE_GPS_P_NOISE = 5.0with an RTK receiver reporting 3 cmeph): the floor wins and the filter throws away the receiver's confidence.vte_position.cov_rel_posstays close to the prediction variance and the bias is never refined during descent.
To tune, overlay vte_aid_fiducial_marker.observation_variance and vte_aid_gps_pos_target.observation_variance (or vte_aid_gps_pos_mission) on a typical descent and compare each curve to the accuracy you actually expect from the sensor at the relevant operating distance. A healthy plot shows a crossover where the two sources are roughly equal, with vte_position.cov_rel_pos staying below both, as shown in the estimator-output plot further down this page.
Process Noise: Variance Rates
All process-noise parameters are continuous-time power spectral densities (PSDs). A PSD describes the strength of a continuous white-noise process: it is the variance the noise injects per second into the state it directly drives, with units chosen so that PSD × time is the variance of that state. See Spectral density (Wikipedia) for background.
| Parameter | Drives directly | Unit | What it represents |
|---|---|---|---|
| VTE_BIAS_UNC | bias (m) | PSD of the bias random walk | |
| VTE_ACC_D_UNC | vel_uav (m/s) | PSD of white acceleration noise on the UAV acceleration input | |
| VTE_YAW_ACC_UNC | yaw_rate (rad/s) | PSD of white yaw-acceleration noise | |
| VTE_ACC_T_UNC | acc_target (m/s²) | PSD of white jerk noise driving the target-acceleration random walk (moving-target builds) |
The unit always follows the same rule: it is [unit of the directly driven state]² / s. The bias is in metres, so its PSD is m²/s; the UAV velocity is in m/s, so the acceleration PSD that drives it is (m/s)²/s = m²/s³; and so on. The YAML may list the same unit in two equivalent forms (for example m²/s³ and (m/s²)²/Hz, since Hz = 1/s), but both refer to the same physical quantity.
Variance rate, not linear drift rate. This is the most common source of confusion. A spectral density is not a velocity. Every predict step the integrated process-noise contribution adds a term proportional to spectral_density * dt to the directly driven variance, so the 1-sigma expected change of that quantity grows with the square root of elapsed time:
Because the noise is scaled by dt at every predict step, the value is independent of the filter update rate. Running the filter at 25 Hz, 50 Hz, or 100 Hz produces the same long-term allowance, so no retuning is required if the rate changes.
Tuning the UAV Acceleration Process Noise
VTE_ACC_D_UNC (unit: vel_uav state. Larger values let measurements pull the state harder between updates; smaller values lock the state to the IMU-driven prediction.
Applying the
| Elapsed time | VTE_ACC_D_UNC = 0.02 (default) | VTE_ACC_D_UNC = 0.2 | VTE_ACC_D_UNC = 1.0 |
|---|---|---|---|
| 0.1 s | |||
| 1 s | |||
| 10 s |
A useful first estimate comes from vte_input.acc_xyz: on a representative segment, read off the peak swing
Symptoms and fixes (visible on vte_position.rel_pos overlaid with the matching vte_aid_*.observation):
- Staircase between fusions: the state drifts away from the observation between samples and snaps back at each fusion.
VTE_ACC_D_UNCis too low. Raise it. Before pushing it far, first checkvte_input.acc_xyzfor a persistent bias (attitude/gravity leakage, lever-arm error, or real vehicle acceleration); raising process noise does not remove the bias itself. - State copies the per-sample jitter of the measurement, and the vehicle oscillates near the ground:
VTE_ACC_D_UNCis too high. Lower it, or raise the measurement-noise floor of the chased sensor (VTE_EVP_NOISE for vision, VTE_GPS_P_NOISE for GPS).
After every change, vte_aid_*.innovation should look like zero-mean white noise rather than ramping with one sign or carrying a persistent offset.
Tuning the Bias State
Once the bias is activated, two parameters govern how aggressively the filter lets vte_position.bias follow new observations:
- VTE_BIAS_UNC (unit:
, default ) is the random-walk process-noise spectral density of the bias state. Every predict step adds VTE_BIAS_UNC * dtto the bias variance, so this parameter is the runtime knob governing how much the bias can change between observations once the filter is settled. - VTE_BIA_UNC_IN (unit:
, default ) is the initial bias variance applied at filter initialization and at bias activation. Once fusion has run for a few seconds, the runtime covariance is dominated by VTE_BIAS_UNCand the measurement updates, so this parameter does not influence steady-state bias behaviour. Keep it large so initialization can absorb initial misalignments.
Applying the VTE_BIAS_UNC = 0.01
| Elapsed time | VTE_BIAS_UNC = 0.01 | VTE_BIAS_UNC = 0.001 | VTE_BIAS_UNC = 0.0001 |
|---|---|---|---|
| 1 s | |||
| 10 s | |||
| 36 s | |||
| 100 s |
Pick VTE_BIAS_UNC such that T. For consumer GNSS this is typically a few centimetres over tens of seconds. RTK is even slower.
Symptoms and fixes:
vte_position.biasjumps to chase a single bad vision sample that just passed the VTE_POS_NIS_THRE gate:VTE_BIAS_UNCis too high. The runtime bias variance has grown to the point where the Kalman gain on bias dominates the response to vision innovations, so the filter passes the inconsistency into the bias state. Lower the value.vte_position.biasnever settles after activation and keeps drifting on a stationary target:VTE_BIAS_UNCis too low. The bias variance is too tight to absorb legitimate slow corrections, so the filter pushes them intopos_relor rejects them via the NIS gate. Raise the value.- Bias activates correctly but corrected GNSS drifts visibly during a vision dropout:
VTE_BIAS_UNCallows more drift than the real receiver, so the bias has been overfit to recent vision. Combine a lowerVTE_BIAS_UNCwith a stricter VTE_POS_NIS_THRE or a higher VTE_EVP_NOISE floor to keep borderline vision samples out of the bias state.
VTE_BIAS_UNC complements the outlier-rejection gate (VTE_POS_NIS_THRE) and the vision-noise floor (VTE_EVP_NOISE). A well-tuned filter combines all three: a realistic bias variance rate, a sensible NIS threshold, and a vision-noise floor that matches the actual quality of the relative-position sensor.
Tuning the Yaw Rate
The yaw filter uses VTE_YAW_ACC_UNC as the spectral density of white yaw-acceleration noise that drives the yaw-rate state, with the same
Worked example. With the default
Case study: yaw oscillation. The Orientation Filter plot further down was generated with VTE_EVA_NOISE at 4 deg (about 0.07 rad) and VTE_YAW_ACC_UNC at VTE_YAW_ACC_UNC) and the observations less (larger VTE_EVA_NOISE) breaks that loop: the state stays close to the underlying yaw trend, the drone holds steady, and the observations themselves become cleaner. Lower these defaults only if the target is truly rotating or your camera is exceptionally accurate.
When the State Follows Per-Sample Jitter
If the state in your logs follows the per-sample jitter rather than the underlying trend (the Smoothing Through Bounded Noise plot illustrates the healthy case), three knobs make the output smoother:
- Raise the observation-noise floors VTE_EVP_NOISE (vision), VTE_EVA_NOISE (vision yaw), or VTE_GPS_P_NOISE (GNSS) so the effective per-sample variance matches the actual sensor noise. This is the right fix whenever the sensor under-reports its own accuracy: a larger
shrinks the Kalman gain, so each sample moves the state less. The clamping mechanism and typical mis-tunes are detailed in Between observation sources above. - Reduce the prediction process noise VTE_ACC_D_UNC, and for moving targets VTE_ACC_T_UNC, only when the predicted state is already unbiased and the problem is excessive responsiveness to zero-mean jitter. A smaller predicted variance
also shrinks the Kalman gain. If GNSS bias drift is contributing to the jitter, the same applies to VTE_BIAS_UNC; see Tuning the bias state for worked examples. Set the prediction noise too low and the filter lags real motion or rejects legitimate dynamics. - Tighten the chi-squared gate VTE_POS_NIS_THRE (or VTE_YAW_NIS_THRE) only if the noise is a mix of in-distribution jitter and occasional outliers. Against uniformly bounded noise every sample stays inside the gate, so lowering the threshold just throws good data away. This knob targets outliers, not jitter.
The underlying trade-off is the process-noise versus measurement-noise balance: each step toward a smoother state is also a step toward slower response to legitimate motion. Watch vte_aid_*.innovation after retuning. A healthy filter still has zero-mean white-noise innovations; a growing mean or persistent offset signals that the response has been damped too much.
Time Alignment
Vision and GNSS observations can arrive delayed due to transport and processing latency. The position and orientation filters therefore support an Out-of-Sequence Measurements (OOSM) approximation which uses a history-consistent projected correction strategy.
OOSM Implementation
Each filter maintains a fixed-size ring buffer of recent state snapshots spanning roughly the last half second of operation (25 samples ≈ 0.5 s at 50 Hz). For the position filter the snapshot stores
Retrieve: fetch the closest:
before . Predict: use the KF model to predict the state
using (for the orientation filter the control-input term is omitted). Innovate: compute the innovation
and innovation variance : Correct: compute the optimal correction vector:
Project: project this correction forward with the state transition matrix
: Apply: apply the projected correction to the current live state and to every stored history sample
that occurred after , i.e. . The corrected posterior at is also written back into the history (replacing or inserting a sample at that timestamp), so a second delayed measurement in the same interval replays from the already-corrected state instead of from the stale pre-measurement floor.
Updating the history buffer keeps it self-consistent for subsequent delayed measurements and provides the practical benefits of a lag-smoother while remaining deterministic (fixed buffer, bounded runtime) and avoiding a full backward smoother over the timeline.
The state prediction includes the control input,
OOSM Approximation Assumptions
In Algorithm I of Zhang et al. Optimal Update with Out-of-Sequence Measurements, the optimal gain for an Out-of-Sequence Measurement (OOSM) depends on
With the optimal recursion Eq. 5:
First approximation: projected gain
To avoid the computational complexity and storage required to track every intermediate gain
Under this assumption, the recursion in Eq. (5) simplifies to a pure dynamic propagation:
By iterating this simplified recursion from the measurement time
Substituting this approximation back into the gain formula:
Since the standard Kalman gain at time
Effectively, this justifies the "Project" step in the implementation: we calculate the correction vector using the snapshot at
Second approximation: stored prediction instead of full smoothing
The paper's globally optimal update uses the smoothed past-state estimate and covariance.
- Optimal innovation: The innovation in Eq. (3) is defined as
This is the estimate of the state at time conditioned on all measurements up to time . - Optimal covariance: The innovation covariance
in Eq. (4) uses , which is the error covariance at time given all data up to .
The approximation: The current implementation calculates the innovation using a snapshot retrieved from history (
- State approximation: The predicted state
(or filtered state ) is substituted for the smoothed state . - Covariance approximation: Similarly,
is calculated using (which is ), substituting the prediction covariance for the smoothed covariance.
Justification and impact: Calculating the exact
Dynamic Model Process Noise
Process Noise Computation
The deterministic prediction uses the measured inputs to advance the mean state. Process noise describes how wrong that prediction might be. For the static-target position filter, the per-axis continuous-time model can be read as:
where:
ris target position relative to the vehicle.v^{uav}is the vehicle velocity.a^{uav}_{meas}is the NED acceleration sample fromvte_input.acc_xyz.w_ais unknown acceleration error on top of the measured acceleration.w_bis unknown GNSS/vision bias drift.
This is the key intuition: VTE_ACC_D_UNC is an acceleration-error spectral density, but that error is applied to the derivative of velocity. Position is affected one integration later because relative position depends on vehicle velocity.
Over one prediction interval dt, a white acceleration-error spectral density q_a = VTE_ACC_D_UNC contributes:
The negative covariance term is expected: in the static-target model, a positive vehicle-velocity error makes the predicted relative position decrease.
The bias noise is simpler. Bias does not feed any other state in the prediction model, so VTE_BIAS_UNC contributes only:
In moving-target builds, VTE_ACC_T_UNC is target jerk noise. It drives the derivative of target acceleration:
The target acceleration noise then reaches target velocity after one integration and relative position after two integrations.
The implementation integrates the full continuous-time noise response, including cross-covariances, and SymForce writes the closed-form result into Position/vtest_derivation/generated/predictCov.h. The standard continuous-to-discrete expression is
but users normally do not need to manipulate this formula. Practically, tune the spectral densities using the variance-rate interpretation in Process noise: variance rates. The yaw filter uses the same idea for VTE_YAW_ACC_UNC: yaw-acceleration error drives yaw-rate first, and yaw angle after one integration.
Assumptions
- Gaussian white noise. Unknown physical disturbances are modelled as zero-mean continuous-time Gaussian white noise. For the UAV, that means unmeasured forces (gusts, motor jitter not captured by the accelerometer) enter as white acceleration noise on top of the measured input. For a moving target, target jerk is modelled as a white-noise driver, so target acceleration evolves as a Brownian-motion random walk.
- Random-walk bias. The GNSS-to-vision bias is modelled as a pure random walk. The offset between the absolute GNSS frame and the relative vision frame drifts slowly and has no deterministic high-frequency dynamics.
- Axis decoupling. The three NED axes are integrated independently. Cross-axis aerodynamic coupling exists in reality, but ignoring it keeps each filter one-dimensional with negligible loss in precision-landing accuracy.
Moving-target Mode (experimental)
WARNING
The moving-target mode is marked experimental. Initial flight tests have been performed (see Moving target precision landing further down), but coverage is limited compared to the static mode. Use it only for controlled experiments.
When the firmware is built with CONFIG_VTEST_MOVING=y, the position filter is extended with two additional states per axis to track target dynamics:
Target jerk is modelled as continuous-time white noise with PSD VTE_ACC_T_UNC; see Process noise computation. The initial target-acceleration variance is set by VTE_ACC_UNC_IN.
Build Setup
The moving-target firmware is selected by CONFIG_VTEST_MOVING=y in the KConfig board configuration. Two prebuilt targets are available:
make px4_fmu-v6c_visionTargetEstMoving: experimental moving-target hardware build.make px4_sitl_vtest-moving: SITL experimental moving-target build.
Target GNSS Velocity Fusion
With moving-target builds, bit 4 of VTE_AID_MASK enables fusion of the target GNSS velocity from the target_gnss topic with observation model
Precision-Landing Projection
The estimator publishes the current target pose and velocity only. The precision-landing controller applies a lookahead using PLD_MOVING_T_MIN and PLD_MOVING_T_MAX. It computes an intersection time
The two bounds play different roles:
- PLD_MOVING_T_MAX caps how far ahead the setpoint can lead the target. Without a cap, at high altitude the projected setpoint can sit so far ahead that the camera no longer sees the target (target lost on vision), or so far that the horizontal error stays larger than PLD_HACC_RAD forever (vehicle never enters the descent phase).
- PLD_MOVING_T_MIN sets a floor on the lookahead so the vehicle is still leading the target at touchdown. This matters because the horizontal velocity at contact should match the target's, not drop to zero.
Tuning rule of thumb. For an expected maximum target speed PLD_MOVING_T_MAX so the maximum lookahead offset stays inside the acceptance radius:
Worked example. With a target at 0.5 m/s and PLD_HACC_RAD at the 0.2 m default, the bound is PLD_MOVING_T_MAX raised together.
For PLD_MOVING_T_MIN, pick a value at least as large as the touchdown altitude window divided by the descent speed:
For a vehicle that descends at 0.5 m/s with the last 0.1 m being the touchdown window,
Gazebo Simulation
The moving-target SITL setup reuses the same land_pad.sdf model as the static configuration. Build with CONFIG_VTEST_MOVING=y so the estimator tracks the target velocity, then edit Tools/simulation/gazebo-classic/sitl_gazebo-classic/models/land_pad/land_pad.sdf to enable pad motion through the libgazebo_random_velocity_plugin.so plugin:
<initial_velocity>0.5 0 0</initial_velocity>applies a constant velocity along the pad's X axis at simulation start.<velocity_factor>0.5</velocity_factor>scales the magnitude of newly drawn random velocities. Set to0for purely constant motion (matches the plot below).<update_period>500</update_period>is the period in seconds between new random velocities.<min_x>,<max_x>,<min_y>,<max_y>clamp the velocity range per axis. Keep<min_z>0</min_z><max_z>0</max_z>so the pad stays on the ground.
Tips for a stable touchdown on a moving target in SITL:
- Enable target GNSS position and target GNSS velocity aiding (VTE_AID_MASK bits 0 and 4). The target GNSS velocity is the only direct observation of
; without it the target-velocity state is poorly observed, the lookahead is noisy, and touchdown accuracy suffers. - Raise PLD_HACC_RAD (try 2 to 3 m to start). With the 0.2 m default, the lookahead-projected setpoint will sit outside the acceptance radius at altitude even for slow targets, blocking the transition to descent.
- Make the pad visually larger so vision detects it from a higher altitude (raise the visual box in
land_pad.sdfto1.5 1.5 0.01). - Validate on a static pad first: build with
CONFIG_VTEST_MOVING=ybut keep<initial_velocity>0 0 0</initial_velocity>for the first runs. Confirm that the moving-mode filter is well behaved on a static target, then introduce motion. This separates filter problems from precision-landing-projection problems.
Log Analysis and Expected Plots
This section provides an overview of the topics and fields that matter during log review: the published estimator outputs and the upstream input feeds. Then log analysis guidance is provided in What to look for in logs and the troubleshooting checklist. The plot examples at the end illustrate the expected convergence behaviour.
Estimator Outputs
- Full state estimations with variance and validity flags for position and orientation filters:
vte_positionvte_orientation
vte_bias_init_status: raw GNSS/vision bias sample, low-pass filtered bias, and the norm of the consecutive raw-bias delta while the initial bias averaging phase is running.landing_target_pose: controller-facing pose plusrel_pos_valid,rel_vel_valid, andrel_vel_ekf2_validbooleans.vte_input: downsampled acceleration in NED and the quaternion fed to each prediction step. Can be used to correlate estimator behaviour with vehicle attitude changes.vte_aid_*: one topic per fused source (vte_aid_gps_pos_target,vte_aid_gps_pos_mission,vte_aid_gps_vel_uav,vte_aid_gps_vel_target,vte_aid_fiducial_marker,vte_aid_ev_yaw). Each publishes the observation, observation variance, innovation, innovation variance, chi-squaredtest_ratio, thefusion_statusoutcome code (per-axis on the 3D variant), and OOSM diagnostics (time_since_meas_ms,history_steps,time_last_predict). See Aid-source diagnostics for the meaning of eachfusion_statusvalue.
Aid-Source Diagnostics
Every fusion attempt is logged on the corresponding vte_aid_* topic. The fusion_status field records which fusion branch the filter took:
| Value | Status | Meaning | Typical action |
|---|---|---|---|
| 0 | STATUS_IDLE | No fusion attempted on this axis (e.g. measurement not provided for that axis). | None. |
| 1 | STATUS_FUSED_CURRENT | Measurement fused immediately against the live state. | Healthy. |
| 2 | STATUS_FUSED_OOSM | Measurement was delayed and fused via the OOSM history buffer; check history_steps, and time_since_meas_ms. | Healthy as long as history_steps stays below the buffer depth kOosmHistorySize = 25. |
| 3 | STATUS_REJECT_NIS | Innovation failed the chi-squared gate. | Check sensor variance, frame rotations, time skew. Compare with test_ratio and the matching vte_input sample. |
| 4 | STATUS_REJECT_COV | Innovation covariance was non-finite or non-positive. | Numerical issue: check process-noise parameters and that the state has not diverged. |
| 5 | STATUS_REJECT_TOO_OLD | Sample older than the OOSM buffer (kOosmMaxTimeUs = 500 ms) or older than the oldest stored snapshot. | Reduce sensor latency, raise the publication rate, or check transport timestamps. |
| 6 | STATUS_REJECT_TOO_NEW | Sample timestamped in the future relative to the latest filter prediction. | Time-sync problem: verify mavlink_timesync or the sensor clock. |
| 7 | STATUS_REJECT_STALE | Filter history was reset because no prediction had run for longer than the buffer span. | Expected after long pauses; otherwise investigate scheduler stalls. |
| 8 | STATUS_REJECT_EMPTY | History buffer not yet populated when the measurement arrived. | Expected on the first cycles after the estimator starts; persistent occurrences indicate a startup race. |
Two additional fields make latency tractable without manual timestamp arithmetic:
time_since_meas_ms = (time_last_predict - timestamp_sample) * 1e-3. Negative values mean the measurement is timestamped after the latest prediction (i.e. the filter has not caught up yet).history_stepsis non-zero only forSTATUS_FUSED_OOSMand tells you how many 20 ms snapshots were replayed when projecting the correction forward.
Main Input Feeds
- Vision:
fiducial_marker_pos_report/fiducial_marker_yaw_report - GNSS on the target:
target_gnss - Vehicle GNSS:
sensor_gps(used to convert absolute GNSS measurements to relative vehicle-carried NED measurements) - Mission position: cached from
navigator_mission_item, withposition_setpoint_tripletas a fallback when a valid LAND setpoint is available there first vehicle_local_positionandvehicle_attitude(used for frame transforms and timeout checks)
What to Look For in Logs
- Estimator output vs. observations: In all axis directions, overlay the estimator outputs position
vte_position.rel_pos[0],vte_orientation.yawwith measurement observationsvte_aid_*.observation[0](e.g.vte_aid_fiducial_marker.observation[0]or the relevant GNSS observation). The traces should converge after a short transient. Large steady offsets point to calibration errors or incorrect measurement-frame-to-NED transforms. - Initial bias averaging: When GNSS is active before vision, inspect
vte_bias_init_status.raw_bias,vte_bias_init_status.filtered_bias, andvte_bias_init_status.delta_norm. Heredelta_normis the norm between consecutive raw bias samples, so it directly reflects the stability criterion used by the low-pass filter exit. The filtered bias should settle beforevte_position.biasis initialized. Large swings in the raw bias or a persistently largedelta_normpoint to unstable early vision data or GNSS/vision time-alignment problems. If vision is active first, you should normally see immediate bias activation instead of an averaging phase. - Innovation behaviour:
vte_aid_*.innovation(e.g.vte_aid_fiducial_marker.innovation) should be centred at zero and resemble white noise. Recall that the innovation is defined as the difference between the state prediction and the measurement observation of the state. It follows that the drifting innovations can come from:- State prediction errors:
- check the estimators state outputs which correspond to the prediction of the state when no measurements are fused
- check
vte_input(accelerations, attitude) - check the frequency of the prediction
dt
- State update errors:
- errors in the measurement implementation, compare the raw sensor input (e.g.
fiducial_marker_pos_report) with the processed observation (e.g.vte_aid_fiducial_marker) - time delays between the attitude of the drone and the observation (causing biases in the frame transform from sensor to NED)
- Incorrect noise assumptions
- errors in the measurement implementation, compare the raw sensor input (e.g.
- State prediction errors:
- Measurement acceptance: Inspect
fusion_statusalongsidetest_ratio.STATUS_FUSED_CURRENTandSTATUS_FUSED_OOSMare healthy; the variousSTATUS_REJECT_*codes each point to a different root cause (NIS gate, covariance, latency, time sync, scheduler stall). See Aid-source diagnostics for the action mapped to each code. Remember that gaps longer than VTE_TGT_TOUT clear the validity flags, and exceeding VTE_BTOUT resets the affected estimator; it restarts as soon as the task remains active and enabled fusion input is available. - Time alignment: Use
time_since_meas_mson thevte_aid_*topic; it encodestime_last_predict - timestamp_samplein milliseconds. For an immediately fused source it should stay within a few times the 20 ms estimator loop period. Larger steady-state offsets, or a growinghistory_stepsonSTATUS_FUSED_OOSM, indicate delayed delivery or a missing time synchronisation step. - Coordinate transforms: Plot the raw measurement (e.g.
fiducial_marker_pos_report.rel_pos[0]) together with its processed observationvte_aid_fiducial_marker.observation. Ensure that there is no mistake in the rotation (fiducial_marker_pos_report.q), covariance handling, or offsets applied in the handler. - Prediction inputs: Use
vte_inputto track the downsampled acceleration and quaternion. Compare them tovehicle_attitude.qandvehicle_accelerationto ensure the estimator sees the expected attitude, especially when diagnosing timestamp mismatches.
Troubleshooting Checklist
Start by confirming that the estimator is running (vision_target_estimator status) and that the relevant vte_aid_* topic is present in the log. If a topic is missing, verify that the upstream sensor message is publishing and that the fusion mask bit is enabled.
TIP
Task activation: Enable the debug bit in VTE_TASK_MASK. When the debug bit is set the estimator always runs, making it ideal for bench testing. Enable debug prints (PX4_DEBUG).
| Symptom | Likely cause | Remedy |
|---|---|---|
| Vehicle misses the pad or ignores target yaw | Mission land waypoint not set to precision mode or yaw alignment disabled | In QGroundControl set the land waypoint Precision landing field (or MAV_CMD_NAV_LAND param2) to Opportunistic/Required as described in Precision landing missions, and enable PLD_YAW_EN. In logs verify that trajectory_setpoint.x/y converge to landing_target_pose.x_abs/y_abs and that trajectory_setpoint.yaw follows vte_orientation.yaw. |
| Frequent innovation rejections | Incorrect noise floors, miscalibrated measurement variance, or timestamp skew | The NIS gate uses test_ratio = innov^2 / S with S = H P H^T + R, where R is the observation variance. If R is under-reported (measurement noise floor set too low), even small innovations exceed the gate and healthy data is rejected; if R is over-reported, even corrupted samples pass the gate and pollute the state. Read fusion_status to find which branch fires. STATUS_REJECT_NIS points at the noise floors and the VTE_POS_NIS_THRE / VTE_YAW_NIS_THRE thresholds: for vision raise VTE_EVP_NOISE or VTE_EVA_NOISE, for GNSS check VTE_GPS_P_NOISE and VTE_GPS_V_NOISE, and compare with the observation_variance field on the corresponding vte_aid_* topic. STATUS_REJECT_TOO_OLD / STATUS_REJECT_TOO_NEW point at latency or time-sync; time_since_meas_ms on the same vte_aid_* topic gives the latency directly. |
| Bias does not converge | No secondary position source, or the GNSS/vision agreement check is too strict for the available vision quality | Ensure vision fusion is enabled so the filter can observe GNSS bias. If GNSS is active before vision and the bias stays near zero, inspect the first GNSS/vision agreement window and tune VTE_BIA_AVG_THR / VTE_BIA_AVG_TOUT. If vision is active first, expect the bias to appear on the first overlapping GNSS+vision sample rather than after an averaging phase. |
| Orientation estimate drifts | Missing yaw measurements or low NIS gate | Enable VTE_YAW_EN and ensure vision yaw data is present. Increase VTE_YAW_NIS_THRE if legitimate data is being rejected. |
rel_pos_valid toggles during descent | Position measurements arriving too slowly | Increase VTE_TGT_TOUT or improve the measurement rate so updates remain inside VTE_M_REC_TOUT. |
No vte_aid_* topics in the log | Sensor not publishing or fusion mask disabled | Use listener on the raw sensor topic, confirm VTE_AID_MASK includes the relevant bit, and rerun the test with the debug task active. |
| Estimator never starts | Task mask disabled or mission not requesting precision landing | Set VTE_TASK_MASK=1 (or 3 for continuous debugging) and verify that new measurements arrive with valid timestamps. |
Plot Examples
The dashboards below provide hints on how to analyse the estimator:
- Nominal behaviour: how the position filter behaves, how the bias is initialised, what the innovations should look like, and how the orientation filter behaves.
- Out-of-sequence measurements (OOSM): how the filter handles delayed samples.
- Filter robustness: what happens when vision drops out, when observations are noisy, and when an outlier slips through.
- Precision landing on a static target: the controller actually lands on the pad.
- Moving target: precision landing on a target in motion.
TIP
PlotJuggler (or the PX4 DevTools log viewer) is the easiest way to inspect the estimator as it allows you to group related signals into subplots that share the time axis.
TIP
Adding new plots: the screenshots in this section use a fixed colour convention so the same family of signals is easy to recognise across dashboards. Re-use it when adding or updating plots:
- Blue (
#1f77b4): vision aid source (vte_aid_fiducial_marker.*) - Red (
#d62728): GNSS aid sources (vte_aid_gps_pos_target.*,vte_aid_gps_pos_mission.*,vte_aid_gps_vel_uav.*,vte_aid_gps_vel_target.*) - Green (
#1ac938): filter state output (vte_position.*) - Purple (
#bd22a0): controller-facing pose (landing_target_pose.*) - Orange (
#ec8414): EKF reference (vehicle_local_position.*)
Nominal Behaviour
Estimator output and observation consistency: Quick health check that the fused sensors agree with the estimated state during a real precision-landing approach using vision and the mission landing waypoint as the absolute reference.
- Top row (observations and state output):
vte_aid_fiducial_marker.observation[0],vte_aid_gps_pos_mission.observation[0],vte_position.rel_pos[0], and a custom tracegps_pos_mission_bias_compensated = vte_aid_gps_pos_mission.observation[0] - vte_position.bias[0]. The bias itself is not plotted explicitly here, it is consumed to produce the compensated trace. The filter follows vision most of the time because vision is the most precise source. - Second row (observation variances and posterior):
vte_aid_fiducial_marker.observation_variance[0],vte_aid_gps_pos_mission.observation_variance[0], andvte_position.cov_rel_pos[0]. At the vertical line, mission GNSS reports about, vision about , and the posterior vte_position.cov_rel_posabout, below either input. This is the Kalman filter doing its job: combining independent measurements produces a posterior variance smaller than any of its inputs. - Third row (vehicle velocity):
vte_aid_gps_vel_uav.observation[0]against the filter outputvte_position.vel_uav[0]. The filter trace stays smooth instead of copying the per-sample GNSS jitter, which confirms the velocity prediction and the GNSS velocity floor (VTE_GPS_V_NOISE) are tuned consistently.
What this plot tells you:
- Vision is the dominant source throughout the descent. The two observation variances never cross: vision is more precise than the mission GNSS at every altitude in this flight. This is intentional because the GPS module on the vehicle is not RTK-grade and the reported variance reflects that. If your receiver is more precise, retune VTE_GPS_P_NOISE so the floor does not throw away its confidence.
- Higher altitudes have noisier vision. At the start of the descent vision is noisier and
vte_position.rel_pos[0]deviates from individual vision samples. The vertical red line marks the moment vision becomes very precise; from there the filter locks onto vision. - A steady offset between the filter and the compensated GNSS is a tuning signal. During the last phase of the landing, a roughly 15 cm gap between
vte_position.rel_pos[0]andgps_pos_mission_bias_compensatedremains. If a comparable gap is persistent in your logs, the bias state can be too tight to absorb the slow drift. Loosen VTE_BIAS_UNC so the bias has more process noise, or lower VTE_GPS_P_NOISE so GNSS pushes the bias harder. Do not push either knob too far: a bias that moves too freely will drift when vision is lost, see Vision occlusion during descent. - End-of-flight vision loss. Vision stops detecting the target at the end of the trace and
vte_position.cov_rel_posstarts to climb. The corrected GNSS still points to the pad thanks to the latest bias value, so the vehicle keeps tracking it.

Initial bias averaging: Shows the GNSS/vision bias low-pass filter while the estimator is in the GNSS-first averaging phase. The plot was generated with VTE_BIA_AVG_TOUT=10s and VTE_BIA_AVG_THR=0.01 so the convergence threshold is never met and the full 10 s averaging window is exercised.
- Top row (raw vs filtered bias):
vte_bias_init_status.raw_bias(per-axis raw GNSS minus vision delta) overlaid withvte_bias_init_status.filtered_bias. The filtered bias is the LPF output (tau = 0.3 s); it should track the average of the raw samples and settle to a steady value as more vision observations arrive. - Second row (delta norm):
vte_bias_init_status.delta_normis the norm of consecutive raw-bias deltas which is the stability metric used by the convergence test. With VTE_BIA_AVG_THR=0.01 the filter would exit as soon as five consecutive deltas stay below 0.01 m and at least2 * tauhas passed. This threshold can be tuned based on the precision of the vision sensor. - Third row (state activation):
vte_position.biasjumps from zero to the activated bias once averaging completes. The reset usesr = pos_rel_gnss(t_vision) - b_filtered,b = b_filtered, so after activationvte_position.rel_posaligns with the vision observation whilevte_position.biasstores the GNSS offset.

Innovation consistency: Confirms that every fused sensor produces zero-mean white-noise innovations. Generated from the same real precision-landing flight as the dashboard above, with vision and the mission landing waypoint as the absolute reference. The dashboard intentionally pairs a healthy aid source (vision) with a less well-tuned one (mission GNSS) so the two patterns can be compared side by side.
- Top row (vision innovations):
vte_aid_fiducial_marker.innovation[0..2]on all three NED axes. The innovations look like white noise centred on zero, which is the healthy signature. They are slightly larger at higher altitudes where vision is noisier. - Second row (vision fusion status):
vte_aid_fiducial_marker.fusion_status[0..2]stays at 2 (STATUS_FUSED_OOSM), so every sample is fused via the OOSM history replay. The exact status code matters less than the fact that the sample is fused:STATUS_FUSED_CURRENTandSTATUS_FUSED_OOSMare both healthy. - Third row (mission GNSS innovations):
vte_aid_gps_pos_mission.innovation[0..2]. The innovations are not zero-mean white noise. A persistent bias is visible (at the vertical red line, aboutm in North, m in East, and m in Down), which means the bias state is too tight to absorb the slow drift between the GNSS frame and the vision frame, so the residual ends up in the innovation. Loosen VTE_BIAS_UNC to let the bias track more of that drift, but only enough that the bias still stays stable when vision is lost (see Vision occlusion during descent). - Bottom row (mission GNSS fusion status):
vte_aid_gps_pos_mission.fusion_status[0..2]is mostlySTATUS_FUSED_OOSM, so the GNSS variance is wide enough to absorb the biased innovation through the NIS gate. A short burst ofSTATUS_REJECT_TOO_OLD(status code 5) is also visible, meaning a GNSS sample arrived older than the OOSM buffer span (500 ms). To investigate, plotvte_aid_gps_pos_mission.time_since_meas_msnext (the OOSM under measurement delay dashboard is the right template).

Orientation filter: Validates that yaw aiding stays smooth yet responsive without making the drone oscillate. Generated from a real precision-landing flight; enable the orientation filter with VTE_YAW_EN to log the same signals.
- Top row (yaw observation vs. state):
vte_aid_ev_yaw.observationoverlaid withvte_orientation.yaw. The state should track the slow trend of the observation without copying its high-frequency jitter. - Second row (innovation, rad):
vte_aid_ev_yaw.innovationshould resemble white noise centred on zero. - Third row (innovation, deg): same signal converted to degrees, which is the unit most users intuitively reason about when setting VTE_EVA_NOISE.
- Bottom row (test ratio):
vte_aid_ev_yaw.test_ratioshould stay below VTE_YAW_NIS_THRE.
If the yaw state follows the observation noise, the solution is the same as for position: raise the observation floor or stiffen the prediction. See When the state follows per-sample jitter.

Out-of-Sequence Measurements
OOSM under measurement delay: Shows that delayed measurements are still consumed correctly thanks to the OOSM history replay. The plot was generated by delaying each simulated measurement by a uniform random value in [0, 500] ms before reaching the autopilot.
- Top two rows (measurement latency):
vte_aid_fiducial_marker.time_since_meas_msandvte_position.history_steps. With a 0-500 ms delay we observetime_since_meas_msranging up to ~450 ms andhistory_stepsproportional to it (history_steps ≈ time_since_meas_ms / 20, capped at the 25-sample buffer). - Third row (raw vs processed observation):
vte_position.rel_posoverlaid withvte_aid_fiducial_marker.observation. The processed observation lags the live state by roughly the measurement delay. The OOSM allows fusing old data without forcing the live state to wait for it. - Fourth row (innovation):
vte_aid_fiducial_marker.innovation[0]stays close to zero. The innovation is computed against the predicted state att_meas, not against the current live state, so the latency cancels out.
A concrete sample from this run: at vte_position.rel_pos[0] = 0.026 m and the measurement that just arrived is vte_aid_fiducial_marker.observation[0] = 0.041 m, with time_since_meas_ms = 368 ms and history_steps = 21. Looking 368 ms back into the history at vte_position.rel_pos[0] = 0.034 m. The OOSM innovation is computed against that historic prediction, giving vte_aid_fiducial_marker.innovation[0] = 0.007 m. Without OOSM the filter would compare the delayed measurement against the live state and produce

Filter Robustness
Smoothing through bounded noise: When the filter is well tuned, the state stays smooth even when individual observations are noisy. The plot was generated by adding uniform random noise in the range vte_aid_fiducial_marker.observation) and target-GNSS (vte_aid_gps_pos_target.observation) position observations on the x, y, z axes. The injected noise stays within the NIS gate, so every sample is fused.
- Top row (x axis):
vte_aid_fiducial_marker.observation[0]andvte_aid_gps_pos_target.observation[0]are visibly noisy, butvte_position.rel_pos[0]follows the underlying trend smoothly. - Bottom row (y axis): same behaviour on the lateral axis.
The smoothness is the Kalman filter doing its job: each sample only contributes a fraction of its innovation to the state, weighted by the ratio of the predicted state variance to the observation variance. As long as the noise stays inside the NIS gate, no rejection happens and the per-sample jitter averages out across many fusions. If the state in your own logs follows the jitter instead of the trend, see When the state follows per-sample jitter for the tuning recipe.

Rejecting a corrupted measurement: Demonstrates how the estimator rejects a faulty measurement. The plot was obtained by injecting during 6 seconds an additive bias uniformly sampled between -1 and 1 metre on the x-axis vision observation, while leaving the other axis untouched. The healthy axes keep fusing as expected, while the corrupted axis repeatedly fails the NIS gate, which is what protects the state from following the outliers into the wrong solution.
- Top row (x observation):
vte_position.rel_pos[0]followsvte_aid_fiducial_marker.observation[0]until the measurements are altered, at which point it does not follow the biased measurements. - Second row (x fusion status):
vte_aid_fiducial_marker.fusion_status[0]flips toSTATUS_REJECT_NISwhenever the observation disagrees with the filter prediction, confirming that the chi-squared gate is doing its job. - Third row (x test ratio):
vte_aid_fiducial_marker.test_ratio[1]spikes and breaches VTE_POS_NIS_THRE for the biased measurements. - Bottom row (y, z fusion status):
vte_aid_fiducial_marker.fusion_statusstays at 1:STATUS_FUSED_CURRENTor 2:STATUS_FUSED_OOSMin the y and z direction.
What to do when you see this pattern in your own logs:
- Confirm that the rejection actually corresponds to a corrupted measurement and not a healthy sample being thrown out by an overly tight gate. Compare the raw sensor topic (e.g.
fiducial_marker_pos_report) withvte_aid_fiducial_marker.observationto be sure the input itself is bad rather than a frame transform or timestamp issue. - If the rejections are legitimate (the data really is corrupted), no parameter change is needed: the NIS gate is performing exactly the function it is there for. Investigate the upstream sensor instead (camera exposure, marker visibility, attitude alignment).
- If healthy samples are being rejected, the observation variance is the first thing to revisit. As detailed in the troubleshooting checklist row on Frequent innovation rejections, an under-reported variance causes the NIS gate to fire on small innovations. Raise VTE_EVP_NOISE (or VTE_GPS_P_NOISE for GNSS) so the floor matches the actual sensor accuracy.
- If both the upstream sensor and the variance look correct but legitimate outliers are still slipping through, loosen the gate slightly with VTE_POS_NIS_THRE (defaults to
, i.e. a false-rejection rate; smaller values reject more aggressively, larger values are more permissive).

Vision dropout behaviour: Illustrates what happens when vision is the only relative-position source and the marker is temporarily lost, and why fusing the vehicle GNSS velocity matters in practice.
Both plots come from the same SITL scenario with a static target: the filter is well converged on vision, vision drops out for several seconds, then comes back. The only difference between the two runs is whether VTE_AID_MASK bit 1 (vehicle GNSS velocity) is enabled.
To make the full dropout visible, VTE_BTOUT was raised to 10 s for these runs (default 3 s) so the filter keeps predicting the state instead of resetting early. On both plots the vertical blue line marks the moment vision is lost, and the vertical red line marks the estimator timeout 10 s later, when VTE_BTOUT expires and the filter is reset.
The reason this case deserves its own plot is structural rather than a tuning issue. The per-axis state vte_position.rel_pos.
Plot 1, vision only: VTE_AID_MASK = vision only (bit 2).
- Top row (relative position):
vte_position.rel_pos[0]overlaid withvte_aid_fiducial_marker.observation[0]. The state stays close to the last vision sample for roughly 3 seconds after the dropout (the time it takes the residualto integrate to a visible offset), then drifts by about 60 cm over the next 7 seconds. When vision returns, the state snaps back by ~60 cm in a single innovation, which is the accumulated drift. - Bottom row (UAV velocity):
vte_position.vel_uav[0],vehicle_local_position.vx, andsensor_gps.vel_n_m_s. The three traces agree while vision is fused. As soon as vision drops, the filter estimate diverges by about 0.1 m/s from the EKF2 and GNSS references. That ~0.1 m/s gap over 7 seconds is the position drift observed in the top row.

Plot 2, vision and vehicle GNSS velocity: Same scenario, with VTE_AID_MASK bit 1 enabled. Vehicle GNSS velocity is now fused as
- Top row (relative position):
vte_position.rel_pos[0]properly follows the expected trend through the dropout. The correction on vision recovery is small (in this run, from 0.26 m to 0.24 m, i.e. 2 cm), which is within the vision noise level rather than a real drift. - Bottom row (UAV velocity):
vte_position.vel_uav[0]now tracksvehicle_local_position.vxandsensor_gps.vel_n_m_sfor the full dropout, because GNSS velocity continues to constrain the state when no vision sample is available.

What to take away from this comparison:
- With vision alone, the vehicle velocity state is only weakly observable, so the filter cannot estimate cleanly through gaps in vision. Any additional source that touches the velocity state directly substantially extends how long the filter remains accurate without a vision sample.
- The size of the relative-position jump on vision recovery is a useful diagnostic on its own: a large jump means the velocity state had drifted during the gap, not that the recovered vision sample is wrong.
Vision occlusion during descent (real flight): Demonstrates the central motivation behind multi-sensor fusion. Once the bias between the absolute frame (here the mission landing waypoint) and the vision frame is estimated, the corrected GNSS observation still points at the pad after vision drops out, so the vehicle still lands precisely. Generated by removing the vision fusion 5 m above the target during a real precision-landing flight. The drone landed at the center of the target despite the vision loss.
The vertical blue line marks the moment vision is stopped, and the vertical red line marks touchdown.
- Top row (relative position and bias-compensated GNSS):
vte_aid_fiducial_marker.observation[0],vte_aid_gps_pos_mission.observation[0],vte_position.rel_pos[0], and the custom tracegps_pos_mission_bias_compensated = vte_aid_gps_pos_mission.observation[0] - vte_position.bias[0]. - Middle row (bias estimates):
vte_position.bias[0]andvte_position.bias[1]. The bias converges in roughly 6 seconds as the drone descends and vision becomes more precise. Between the blue and red lines (8 seconds without vision), the deltas are aboutm on x and m on y, so the bias stays effectively stable. By touchdown, the bias settles at about m on x (north) and m on y (east). Without bias compensation the drone would have touched down 1.18 m east of the pad. - Bottom row (relative position covariance):
vte_position.cov_rel_pos[0]. The variance climbs by aboutonce vision is lost, corresponding to a 1-sigma growth of . This is the filter reporting that the relative position is now less certain because no vision sample is constraining it.
Why this matters: this is exactly the case multi-sensor fusion is meant to handle. The mission landing waypoint reports the pad position with a metre-scale offset (the bias) that vision corrects. Once the bias is observed, the corrected GNSS effectively becomes a second relative-position sensor for the remaining descent, including any segment where the marker temporarily disappears (motion blur, partial occlusion, or the marker leaving the camera field of view because it is too large at low altitude). This is particularly important for large targets where the marker is expected to leave the camera frame in the final metres of descent.
What to watch in your own logs:
- The bias must already be settled before vision drops out. If your dropout test shows the bias still moving when vision is lost, lower VTE_BIAS_UNC.
- The post-dropout bias drift gives a direct read on how aggressively the bias is tuned. The deltas above (a few centimetres over 8 seconds) match a well-tuned filter. Larger drift means VTE_BIAS_UNC is too loose; the filter overfit the bias to recent vision and now lets it walk away when vision is gone.

Precision Landing on a Static Target
Precision landing alignment: Compares the precision-landing target with the vehicle position to confirm that the vehicle is actually navigating toward the target. Generated from a real precision-landing flight.
landing_target_pose.x_abs and landing_target_pose.y_abs are expressed in the local NED frame, which is fixed relative to the EKF2 origin. For a static target both values should stay roughly constant for the duration of the approach: the estimator can refine the target position as new observations arrive, but big jumps or steady drift in this signal indicate noisy observations or an estimator that is following them too aggressively. If you see this, When the state follows per-sample jitter lists the right knobs.
The plot is also useful for catching misconfigurations, for example PLD_YAW_EN disabled when yaw alignment is expected. If the controller struggles to follow even when the target signal is clean, review the multicopter position tuning guide.
- Top row (north alignment):
landing_target_pose.x_absagainstvehicle_local_position.x. The curves should overlay once precision landing is triggered. - Second row (east alignment):
landing_target_pose.y_absagainstvehicle_local_position.y, same as north. - Third row (yaw alignment):
vte_orientation.yawshould remain steady whiletrajectory_setpoint.yawtracks it when PLD_YAW_EN is enabled. - Bottom row (descent context):
vehicle_local_position.dist_bottomshows when the final approach begins and how high above the pad the precision-landing alignment converged. A higher convergence altitude is better.

Moving Target
Moving target precision landing: Shows how the precision-landing controller projects the setpoint ahead of a moving target, how the lead converges onto the target as altitude drops, and how the moving-target states behave during the descent. Generated from a real flight with a camera publishing target estimates at 10 Hz at CONFIG_VTEST_MOVING=y.
- Top row (axis along which the target moves):
vehicle_local_position.x,landing_target_pose.x_abs, andtrajectory_setpoint.position[0]. The trajectory setpoint sits ahead of the estimated target position for most of the descent: that is the PLD_MOVING_T_MAX lookahead leading the vehicle toward where the target will be at touchdown. As altitude drops, the lookahead converges toward PLD_MOVING_T_MIN, the gap between setpoint and target closes, and the vehicle lands on the moving target rather than where it was at the start of the approach. - Second row (vision innovations):
vte_aid_fiducial_marker.innovation[0..2]on all three NED axes. They look like white noise centred on zero, which means the filter dynamics match the marker observations even as the target moves. - Third row (moving-target states):
vte_position.vel_targetandvte_position.acc_target. These states only exist in the moving-target build and can be compared to the expected target velocity and acceleration. - Bottom row (descent context):
vehicle_local_position.dist_bottomshows the distance to the ground and indicates the phase of the landing.
What to take away:
- Add more sensors when you can. With vision only, the mission must end up somewhere the target is visible because the filter has no other observation of the target position. Adding a target GNSS receiver gives the filter an absolute reference even before vision detects the marker: the filter converges faster, the vehicle can fly to a moving target before it is in view, and the risk of missing the target is reduced. Any offset between the receiver and the marker is estimated by the bias state.
- Add a direct target-velocity observation when you can. Target GNSS velocity (VTE_AID_MASK bit 4) is the only measurement that constrains
directly. Without it, target velocity is only weakly observable and the precision-landing lookahead becomes noisier. This is the moving-target analogue of the static-target Vision dropout behaviour, where fusing vehicle GNSS velocity directly anchored the vehicle velocity state and removed the relative-position drift. - Publish target estimates at a high rate. With a 10 Hz camera the prediction has up to 100 ms of unconstrained motion between samples. Raising the rate shrinks the predicted variance growth between fusions, makes the precision-landing setpoint smoother, and keeps
vte_aid_fiducial_marker.innovationcloser to zero through the descent.

Development and Debugging Tips
- To print
PX4_DEBUGstatements from the module, launch SITL withPX4_LOG_LEVEL=debug(for example,PX4_LOG_LEVEL=debug make px4_sitl_visionTargetEst). On hardware builds, compile with the debug configuration or enable the console log level before running tests so the additional diagnostics appear on the shell. - Keep the estimator alive on the bench by setting VTE_TASK_MASK=3; the debug bit enables the continuous update of the position and orientation estimators (if enabled via VTE_YAW_EN and VTE_POS_EN).
- Use shell helpers while iterating:
listener landing_target_pose,listener vte_bias_init_status,listener vte_aid_fiducial_marker(or the relevantvte_aid_*) to inspect bias averaging and innovations,listener vte_input 5for prediction inputs, andvision_target_estimator statusto ensure both filters are running.
SITL Simulation Pipeline
The Gazebo Classic SITL setup produces both VTE inputs (fiducial_marker_pos_report and target_gnss) end-to-end, so the filter can be tested without real hardware. This is also where you should inject extra noise or latency to stress-test the filter.
The pipeline has four stages:
- Aruco marker detection:
arucoMarkerPlugin(Tools/simulation/gazebo-classic/sitl_gazebo-classic/src/gazebo_aruco_plugin.cpp) runs on the simulated camera, detects the marker, and publishes aTargetRelativeGazebo message. The reported standard deviations are hard-coded inOnNewFrame()(set_std_x,set_std_y,set_std_z,set_yaw_std); change them there to make the vision report more or less noise. Marker visibility is controlled byTools/simulation/gazebo-classic/sitl_gazebo-classic/models/land_pad/land_pad.sdf: the<visual><box><size>element sets the rendered Aruco square (the<collision>size is independent and just controls the physical pad). The<pose>element on the<model name="land_pad">block places the pad in the world; its last value is the pad yaw in radians (for example<pose>0.0 0.0 0.06 0 0 0.7</pose>rotates the pad by 0.7 rad, about 40 deg), which is the easiest way to exercise the orientation filter in SITL. - Target GPS: the
gps_targetmodel included fromland_pad.sdfruns the standard Gazebo GPS plugin. Tune the noise floors directly inTools/simulation/gazebo-classic/sitl_gazebo-classic/models/gps/gps.sdf(the<noise>blocks for horizontal/vertical position and velocity). - Mavlink bridge:
gazebo_mavlink_interface.cppconnects the two plugins to the autopilot.targetRelativeCallback()packs the marker into aTARGET_RELATIVEmavlink message, andTargetGpsCallback()packs the GPS sample intoTARGET_ABSOLUTE. - PX4 side:
SimulatorMavlink::handle_message_target_relative()andhandle_message_target_absolute()decode the mavlink messages back intofiducial_marker_pos_reportandtarget_gnss(or directly intolanding_target_posewhenVTE_EN=0). From there the filter sees exactly the same topics it would on real hardware, so anything you tune at the sensor level reproduces faithfully end to end.
Adding New Measurement Sources
To integrate a new sensor:
- Define a uORB message that carries the measurement in either the vehicle body frame or NED, complete with variance estimates and timestamps.
- Extend the fusion mask: add a bit to
SensorFusionMaskU(src/modules/vision_target_estimator/common.h) and update the parameter definition for VTE_AID_MASK (see also sensor fusion selection) invision_target_estimator_params.yaml. - Augment observation enums: append the new entry to the relevant
ObsTypeenum (VTEPosition.horVTEOrientation.h), updateObsValidMaskU, and update helper functions such ashasNewNonGpsPositionSensorData()andselectInitialPosition()if the measurement can provide a relative position. - Subscribe and validate: add a
uORB::Subscriptionto the filter, check for finite values, and reject samples that are too oldisMeasUpdatedor timestamped in the future before marking the observation valid. - Implement the handler in
processObservations(). Convert the measurement into NED coordinates, populateTargetObs::meas_xyz,meas_unc_xyz, and the observation Jacobian (meas_h_xyzormeas_h_theta), and set the fusion-mask flag only after the data passes validation. - Provide tunable noise: declare a parameter (e.g.
VTE_<SENSOR>_NOISE) and clamp it withkMinObservationNoiseso the estimator never believes a measurement is perfect. - Log the innovations: add a publication member and ORB topic (see
vte_aid_fiducial_markerfor reference) so that logs include the innovation, variance, andfusion_statusoutcome for the new sensor. Use theFusionResultreturned by the filter to populatefusion_statusso successes and rejections share the same status enum as the existing sources. - Exercise SITL: update the Gazebo (or other) simulation so that replay tests produce the new measurement. This keeps CI coverage intact and provides a reference data set for tuning.
- Document the workflow: update this deep dive and any setup how-tos so users know how to enable the new bit, calibrate the sensor, and interpret its logs.
WARNING
Timeout policy: Every measurement must be time-aligned and checked so that stale data never reaches the update step. Reject samples older than VTE_M_REC_TOUT (isMeasRecent(hrt_abstime ts)) and check time_since_meas_ms (and fusion_status for STATUS_REJECT_TOO_OLD / STATUS_REJECT_TOO_NEW) on the new vte_aid_* topic to confirm the sensor fits inside the estimator deadlines. If observations are stored in cache, invalidate it inside checkMeasurementInputs when older than VTE_M_UPD_TOUT (isMeasUpdated(hrt_abstime ts)).
Adding New Tasks
Add a new VteTask when the estimator should only run during a particular mission phase or flight-mode behavior, or when that behavior needs its own subscriptions, cached state, or completion logic. Do not put that state back into VisionTargetEst; the whole point of the task layer is to keep the scheduler generic.
Use this checklist:
- Reserve a task-mask bit: add the bit in
task_bits(tasks/VteTask.h). Those constants are the code-level source of truth forVTE_TASK_MASK. Then updatevision_target_estimator_params.yamland theVTE_TASK_MASKdocumentation so users can enable the new task. - Create a task class: add a new file under
src/modules/vision_target_estimator/tasks/and derive it fromVteTask. Put all task-specific subscriptions and status flags in that class. - Implement the interface deliberately:
maskBit()andname()identify the task,pollStatus()refreshes external state,isReady()exposes level-triggered readiness, andisComplete()reports self-termination. UseonActivate()for one-shot state resets on task entry, andonPosEstStart()only when the position filter needs task-specific seeding such as a cached mission waypoint. - Keep side effects out of readiness checks:
pollStatus()andisReady()are called every scheduler cycle while the task bit is enabled. They should be cheap. Transition-only work belongs inonActivate(). - Register the task by priority: add the task instance to
VisionTargetEstand place it in_task_registryin descending priority order. If multiple bits are enabled at the same time, the first ready task wins. - Test lifecycle and cache behaviour: extend
TEST_VTE_VisionTargetEst.cppwith coverage for task activation completion, repeated starts, and any cached mission/context data that must survive estimator restarts within the same task. - Document the operational meaning: update the overview page, this deep dive, and any feature-specific docs so users understand when the new task is active and which
VTE_TASK_MASKbit selects it.
SymForce-Generated Derivations
The Kalman-filter math is not written by hand. The symbolic state, prediction model, and covariance update are defined in Python and expanded into C++ by SymForce, so the C++ that runs on the autopilot always matches the model defined in derivation.py.
Source files (Python):
src/modules/vision_target_estimator/Position/vtest_derivation/derivation.pyfor the position filter.src/modules/vision_target_estimator/Orientation/vtest_derivation/derivation.pyfor the yaw filter.
Generated outputs (C++ headers) used at runtime:
| Filter | Generated headers |
|---|---|
| Position | state.h, predictState.h, predictCov.h, computeInnovCov.h, getTransitionMatrix.h, applyCorrection.h |
| Orientation | predictState.h, predictCov.h, getTransitionMatrix.h (innovation computation and correction stay hand-written in Orientation/KF_orientation.cpp so yaw angles can be wrapped to |
Build-time behavior:
- By default, CMake copies the committed pre-generated headers from
Position/vtest_derivation/generated/into the build tree underbuild/<target>/src/modules/vision_target_estimator/vtest_derivation/generated/, and likewise for orientation. No SymForce install is required for the static build. CONFIG_VTEST_MOVING=yautomatically setsVTEST_SYMFORCE_GEN=ONand requires SymForce in the Python environment so the 5-state moving-target headers are produced.- Setting
-DVTEST_SYMFORCE_GEN=ONmanually regenerates both filters at configure time.
The generated files are included from the build directory and must never be edited by hand. To change the model, edit derivation.py and regenerate.
Regenerating the symbolic model
- Configure CMake with
-DVTEST_SYMFORCE_GEN=ON(automatic whenCONFIG_VTEST_MOVING=y) and ensure SymForce is available in the Python environment. - Reconfigure with
cmake --fresh ...so the custom command insrc/modules/vision_target_estimator/CMakeLists.txtruns. The outputs land inbuild/<target>/src/modules/vision_target_estimator/vtest_derivation/generated/(position) and.../vte_orientation_derivation/generated/(orientation). - To refresh the committed reference files, add
-DVTEST_UPDATE_COMMITTED_DERIVATION=ONand commit the regenerated files inPosition/vtest_derivation/generated*/once vetted.
If the build fails during regeneration, inspect the CMake output for the SymForce invocation and rerun it manually inside Position/vtest_derivation/ to catch Python errors. After regenerating, rebuild the module to ensure the Jacobians and code stay in sync.
Runtime Performance on Hardware
The estimator publishes per-section perf counters that can be read at any time on the shell:
sh
perf | grep "vision_target_estimator"Each line reports event count, total elapsed time, average, min, max, and standard deviation in microseconds. Six counters cover the vision target estimator:
| Counter | What it covers |
|---|---|
VTE cycle | One full work-queue of VisionTargetEst::Run(). Includes input subscription polls, task scheduling, and the dispatch into the position/orientation update steps. |
VTE cycle pos | The position-filter part of the cycle (every prediction or measurement update for VTEPosition). |
VTE cycle yaw | The orientation-filter part (same scope, for VTEOrientation). |
VTE prediction | Just the Kalman prediction inside the position filter: Predictstate + Predictcov plus OOSM history. |
VTE update | The full update step in the position filter, including measurement validation, frame transforms, fusion calls for every active aid source, and OOSM history maintenance. |
VTE fusion | Just the per-axis fusion inside an update: innovation, gating, gain projection, and history correction. This is where OOSM scales with history_steps. |
The numbers below come from a Pixhawk 6c running the static-target filter (vte_aid_fiducial_marker + vte_aid_gps_pos_target + vte_aid_gps_vel_uav) at the default 50 Hz cadence.
Baseline (no injected delay):
text
VTE fusion: 20568 events, 297061us elapsed, 14.44us avg, min 1us max 88us 14.138us rms
VTE update: 6856 events, 318536us elapsed, 46.46us avg, min 3us max 163us 41.827us rms
VTE prediction: 18567 events, 105034us elapsed, 5.66us avg, min 5us max 52us 1.575us rms
VTE cycle : 92576 events, 1614647us elapsed, 17.44us avg, min 6us max 281us 27.204us rms
VTE cycle yaw: 18563 events, 125422us elapsed, 6.76us avg, min 4us max 60us 3.985us rms
VTE cycle pos: 18569 events, 803841us elapsed, 43.29us avg, min 22us max 253us 38.464us rmsWith PX4_SIM_TARGET_MEASUREMENT_DELAY_MAX_MS=500 (≈ 480–500 ms latency, ~20 OOSM steps on average):
text
VTE fusion: 31932 events, 1860362us elapsed, 58.26us avg, min 63us max 135us 23.520us rms
VTE update: 10644 events, 1894804us elapsed, 178.02us avg, min 4us max 301us 69.331us rms
VTE prediction: 29156 events, 166179us elapsed, 5.70us avg, min 5us max 52us 1.614us rms
VTE cycle : 163952 events, 4215369us elapsed, 25.71us avg, min 6us max 511us 57.621us rms
VTE cycle yaw: 32850 events, 274433us elapsed, 8.35us avg, min 1us max 86us 7.852us rms
VTE cycle pos: 32861 events, 2703185us elapsed, 82.26us avg, min 10us max 479us 96.704us rmsHow to read these:
VTE predictionis essentially constant (~5.7 us). Predictions do a fixed amount of math regardless of measurement latency, which is what we expect.VTE fusionjumps from ~14 us to ~58 us (≈ 4× increase). This is the OOSM cost: the projection step touches every history sample aftert_meas, so the runtime grows roughly linearly inhistory_steps. With ~20 history steps the per-fusion cost is still well under 100 us on average.VTE updatefollows the same trend (~46 us → ~178 us) since it wraps multiple fusion calls. The worst-case (~300 us) corresponds to a cycle where every aid source hits the OOSM path.VTE cycle pos(~43 us → ~82 us) confirms the same behaviour at the cycle level. A full position cycle still completes in well under 0.5 ms even in the worst case, which is small relative to the 20 ms (50 Hz) loop budget.VTE cycleandVTE cycle yawsee only minor changes: these counters are dominated by lightweight scheduling and yaw-only fusion that does not involve OOSM history projection.
A few things to keep in mind when interpreting your own numbers:
- The total
VTE cycleevent count is roughly five times theVTE cycle poscount becauseRun()is woken on every input topic update, but the position estimator only ticks at 50 Hz. history_stepssaturates at the buffer depth (kOosmHistorySize = 25). If you see it pinned at 25 in the logs, the measurement is older than 500 ms and is being rejected asSTATUS_REJECT_TOO_OLDrather than fused.
Even with ~500 ms induced latency on every measurement the estimator stays inside its 20 ms loop budget on a Pixhawk 6c.
Unit Test Suites
TEST_VTE_KF_position: Kalman filter math for the position state (prediction, NIS gating, bias-aware H, OOSM gold standard). Static model (state size 3) by default, moving-target tests (state size 5) build only whenCONFIG_VTEST_MOVINGis enabled.TEST_VTE_KF_orientation: Kalman filter math for yaw/yaw-rate (wrap logic, process noise, dt edge cases, covariance symmetry).TEST_VTE_VTEPosition: Module logic for vision/GNSS fusion, offsets, interpolation, ordering, and uORB innovation topics (static + moving gated byCONFIG_VTEST_MOVING).TEST_VTE_VTEOrientation: Module logic for yaw fusion, noise models, resets, and OOSM handling.TEST_VTE_VTEOosm: Generic OOSM manager behavior.
Run locally:
sh
make tests TESTFILTER=TEST_VTE*