Vision Target Estimator (VTEST)

PX4 v1.18 Experimental

The Vision Target Estimator (VTEST) estimates where a target is, relative to the vehicle, by combining a camera-based detection of the target (for example an ArUco fiducial marker) with one or more absolute position references: the vehicle's own GNSS, the mission landing waypoint, and/or a GNSS receiver mounted on the target itself. Its main use is precision landing, where the vehicle needs to touch down on a small, well-defined spot rather than the approximate mission waypoint.

This page is for developers who want to enable VTEST on a custom build, tune it for their vehicle, or integrate a new vision pipeline that publishes landing-target observations. It assumes familiarity with PX4 and basic state estimation, but does not require prior knowledge of the module itself. For more depth see the Vision Target Estimator deep dive.

WARNING

VTEST is a beta feature, disabled in default board configurations, and should only be enabled on custom builds after careful bench and flight testing.

Building the Module

The estimator is not part of the default PX4 board configurations.

To enable a build that includes the module you need to modify the KConfig board configuration for your target board (or create a custom .px4board file). The keys and values that need to be present are:

  • CONFIG_MODULES_VISION_TARGET_ESTIMATOR=y: Enable VTEST in firmware
  • CONFIG_MODULES_LANDING_TARGET_ESTIMATOR=n: Disable the landing target estimator (both modules publish landing_target_pose and will conflict if enabled together).
  • CONFIG_MAVLINK_DIALECT="development": Enable using the development dialect. Note that this step will no longer be required once the MAVLink messages used by VTEST have been validated.

Two prebuilt targets are available:

  • make px4_fmu-v6c_visionTargetEstStatic: static-target hardware build.
  • make px4_sitl_default: SITL static-target build.

For the experimental moving-target build, see Moving-target mode.

Configuration

Module Enable

  • Set VTE_EN=1 to run the estimator (reboot required).
  • VTE_POS_EN and VTE_YAW_EN enable the position and orientation filters individually (reboot required). Both are on by default; disable the yaw filter if your vision pipeline does not report target heading.

Timeouts

The default timeouts are appropriate for a typical precision-landing approach where measurements keep arriving until touchdown. You only need to raise them when you know in advance that there will be a long gap with no fresh data, for example a vision-only setup where the marker is larger than the camera field of view during the final descent and you do not want the estimator to time out before touchdown.

When that applies:

  • Raise VTE_TGT_TOUT past the expected gap so the published target pose stays marked valid; the precision-landing controller stops following the target once this expires.
  • Raise VTE_BTOUT past the expected gap if you want the filter to coast on its last estimate. Otherwise the filter is reset and will restart as soon as enabled fusion sources reappear.

Measurements timestamped in the future relative to the latest filter prediction are always rejected and reported on the corresponding innovation topic.

Measurement Freshness

VTE_M_REC_TOUT and VTE_M_UPD_TOUT decide how long an incoming sample stays usable:

  • VTE_M_REC_TOUT is the maximum age at which a new measurement is still eligible for fusion.
  • VTE_M_UPD_TOUT bounds how long a cached observation remains valid inside the filter state.

These are independent of the timeouts above; tune them to the dynamics of your platform and target, not to a fixed value:

  • For fast-moving targets and aggressive vehicles, keep both values short: stale data describes a world that has already changed, so fusing it pulls the filter in the wrong direction.
  • For larger vehicles with slow dynamics and modest roll, pitch, and yaw rates, the defaults (or even larger values) work fine: a slightly older sample is still informative because the relative geometry has not changed significantly.

If you are unsure, leave the defaults and watch the logs: tracking lag or overshoot on a fast platform usually means these values are too large; legitimate fusions being missed because samples arrive just outside the window means they are too tight.
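As a rough illustration of the reception window, the sketch below classifies one incoming sample by its timestamp. The function name, the return labels, the use of seconds, and the exact boundary comparisons are assumptions made for the example; only the two rules themselves (future samples are rejected, samples older than VTE_M_REC_TOUT are not eligible) come from this page.

```python
def classify_sample(sample_time_s, last_predict_time_s, rec_tout_s):
    """Classify a measurement by its timestamp (hypothetical helper).

    sample_time_s       -- capture time of the measurement
    last_predict_time_s -- time of the latest filter prediction
    rec_tout_s          -- reception window, playing the role of VTE_M_REC_TOUT
    """
    if sample_time_s > last_predict_time_s:
        # Newer than the latest prediction: always rejected.
        return "rejected_future"
    if last_predict_time_s - sample_time_s > rec_tout_s:
        # Older than the reception window: no longer eligible for fusion.
        return "rejected_stale"
    return "eligible"


# A sample captured 0.5 s before the latest prediction, with a 1 s window.
print(classify_sample(10.0, 10.5, 1.0))
```

Shortening `rec_tout_s` in this sketch turns more late samples into `rejected_stale`, which is the trade-off described above for fast platforms.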

Task Selection

VTE_TASK_MASK is a bitmask that selects the runtime tasks for which the estimators perform computations and estimate the state of the target. Set the indicated bit to enable the corresponding task.

| Bit | Task |
| --- | --- |
| 0 | Precision landing |
| 1 | DEBUG (always active) |

WARNING

Precision landing yaw control is disabled by default. Enable PLD_YAW_EN when you want the mission controller to align the vehicle with the target heading, and configure the landing waypoint for precision landing (see Mission precision landing). Without both settings the aircraft will only track the position estimate from the Vision Target Estimator.

Sensor Fusion Selection

VTE_AID_MASK is a bitmask that defines which measurements can be fused:

| Bit | Meaning |
| --- | --- |
| 0 | Target GNSS position |
| 1 | Vehicle GNSS velocity |
| 2 | Vision-relative position |
| 3 | Mission landing waypoint |
| 4 | Target GNSS velocity (moving mode only) |

Bit 2 also enables processing of fiducial_marker_yaw_report in the orientation filter.

TIP

Enable more than one source when you can. Each source observes a different combination of states: vision contributes a direct relative position, vehicle GNSS velocity directly constrains the vehicle velocity state, and target GNSS or the mission landing waypoint contribute the absolute reference needed to estimate the GNSS/vision bias. Multiple sources are what make the filter robust to any single one momentarily dropping out.

Vehicle GNSS velocity is the most important non-vision source. With only vision enabled, the vehicle velocity state is only indirectly observable. A short vision dropout then produces a visible relative-position drift that snaps back when vision returns. With vehicle GNSS velocity enabled, the velocity state is observed directly and the relative position stays more stable through vision dropout. The deep dive plots both cases side by side in Vision dropout behaviour. Only disable this source when you have a specific reason, for example no GNSS available.

An absolute reference makes precision landing robust to vision loss. Fusing an absolute reference (target GNSS or the mission landing waypoint) lets the filter estimate the bias between the absolute frame and the vision frame. Once the bias is observed, the corrected absolute observation effectively becomes a second relative-position sensor: the vehicle can still touch down on the target even when vision is no longer available (for example because the marker leaves the camera field of view in the final metres of descent, or because of motion blur or partial occlusion). This is especially important for large targets where vision is expected to drop out before touchdown. The deep dive analyses this on a real flight in Vision occlusion during descent.

Target GNSS position and mission landing waypoint are mutually exclusive. Both provide the absolute reference for the same GNSS/vision bias, and only one bias can be estimated at a time. If both bits are set, the module disables the mission landing waypoint at startup and prints a warning, but it is cleaner to pick one explicitly: use target GNSS when a receiver is mounted on the target (most accurate), and the mission landing waypoint when no receiver is available.
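The following sketch shows one way to compute a VTE_AID_MASK value from the bit assignments listed for this parameter and to detect the mutually exclusive combination before flashing it. The dictionary keys and function names are hypothetical and exist only for this example; they are not PX4 identifiers.

```python
# Bit positions as listed for VTE_AID_MASK on this page.
# The key names are made up for this example.
AID_BITS = {
    "target_gnss_pos": 0,
    "vehicle_gnss_vel": 1,
    "vision_pos": 2,
    "mission_waypoint": 3,
    "target_gnss_vel": 4,  # moving-target mode only
}


def build_aid_mask(sources):
    """Return the integer mask with one bit set per named source."""
    mask = 0
    for name in sources:
        mask |= 1 << AID_BITS[name]
    return mask


def has_reference_conflict(mask):
    """True when both absolute references (bits 0 and 3) are set."""
    both = (1 << AID_BITS["target_gnss_pos"]) | (1 << AID_BITS["mission_waypoint"])
    return (mask & both) == both


mask = build_aid_mask(["vehicle_gnss_vel", "vision_pos", "mission_waypoint"])
print(mask, bin(mask), has_reference_conflict(mask))
```

With vehicle GNSS velocity, vision, and the mission landing waypoint selected, the mask evaluates to 14 (binary 1110) and no conflict is reported.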

For each source's uORB topic, observation model, and fusion notes, see Measurement sources at the end of the page.

Noise and Gating

Start from the defaults: they are tuned for a typical setup of consumer GNSS plus a marker-based vision pipeline. Re-tune only when log analysis points at a specific symptom. The deep dive walks through what a healthy filter looks like in Plot examples and how to balance the parameters in Balancing process and observation noise.

  • Initial state variances (VTE_POS_UNC_IN, VTE_VEL_UNC_IN, VTE_BIA_UNC_IN, VTE_ACC_UNC_IN, VTE_YAW_UNC_IN) seed the filter at initialization or reset. Lower them only if you see aggressive transients on vte_position immediately after activation; raise them if convergence is very slow. Updates while the estimator is already running take effect on the next start.
  • Process noise (VTE_ACC_D_UNC, VTE_BIAS_UNC, VTE_YAW_ACC_UNC, and VTE_ACC_T_UNC for moving-target builds) controls how fast the predicted state variance grows between measurements, which in turn sets how strongly each new observation moves the filter. Raise these when the estimator lags real motion or repeated innovations show that the prediction uncertainty is too optimistic; lower them when the state follows per-sample jitter rather than the underlying trend. For a step-by-step recipe with worked examples, see Balancing process and observation noise.
  • Bias averaging (VTE_BIA_AVG_THR, VTE_BIA_AVG_TOUT) only takes effect when GNSS becomes active before vision; the defaults work for typical consumer GNSS plus marker-based vision. Set VTE_BIA_AVG_TOUT=0 to skip averaging and activate the bias on the first joint sample. Raise VTE_BIA_AVG_THR if a noisy vision pipeline cannot satisfy the stability criterion within the timeout.
  • Outlier gates (VTE_POS_NIS_THRE, VTE_YAW_NIS_THRE) default to a 95% chi-squared confidence interval (3.84). Tighten the gate if outliers from a noisy sensor are still being fused and visibly pulling the state; loosen it if vte_aid_*.fusion_status shows legitimate samples rejected as STATUS_REJECT_NIS. The troubleshooting checklist and the rejection-pattern walkthrough in Plot examples help tell the two cases apart.
  • Sensor noise floors (VTE_GPS_P_NOISE, VTE_GPS_V_NOISE, VTE_EVP_NOISE, VTE_EVA_NOISE) are lower bounds on the per-sample standard deviation each sensor is allowed to report. Raise the floor for a sensor that under-reports its noise (you will see the filter chasing every sample of that sensor in vte_position and the corresponding vte_aid_*.observation); leave it alone if the reported variance already exceeds the floor. Do not push these towards zero: a very small floor tells the filter to trust every sample fully, which makes it chase per-sample jitter and can drive the controller into oscillations. The runtime enforces a hard minimum to keep Kalman gains bounded, but the safer practice is to set a realistic floor that matches the actual sensor accuracy. See Between observation sources for typical mis-tunes.
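To make the interaction between the noise floor and the outlier gate concrete, here is a minimal scalar sketch. It assumes the gate compares the normalized innovation squared (innovation squared divided by innovation variance) against the threshold, and that the floor is applied as a simple maximum on the reported standard deviation; the function names are hypothetical and the module's actual implementation may differ in detail.

```python
def effective_std(reported_std, noise_floor):
    """Lower-bound the sensor-reported standard deviation by the floor."""
    return max(reported_std, noise_floor)


def nis_gate(innovation, innovation_var, threshold=3.84):
    """Return (accepted, nis) for a scalar innovation.

    3.84 is the 95% chi-squared bound for one degree of freedom.
    """
    nis = innovation * innovation / innovation_var
    return nis <= threshold, nis


# Predicted variance of the observed state (illustrative value).
predicted_var = 0.04
# The sensor claims 1 cm, but the floor (e.g. a VTE_EVP_NOISE-like value) is 10 cm.
std = effective_std(reported_std=0.01, noise_floor=0.1)
innovation_var = predicted_var + std * std

print(nis_gate(0.3, innovation_var))  # small innovation
print(nis_gate(0.5, innovation_var))  # large innovation
```

In this example the 0.3 m innovation passes the gate and the 0.5 m innovation is rejected. Without the floor, the innovation variance would be dominated by the over-confident 1 cm claim and both samples would be more likely to fail the gate or to pull the state too hard when accepted.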

Sensor-specific Settings

  • GNSS antenna offsets - VisionTargetEst reads vehicle_gps_position.antenna_offset_{x,y,z} to remove position- and rotation-induced velocity offsets before forming GNSS measurements. Configure the vehicle GPS antenna location with SENS_GPS0_OFFX, SENS_GPS0_OFFY, and SENS_GPS0_OFFZ.

EKF2 Aiding

For a static target, the relative velocity tracked by the Vision Target Estimator can be fed into the main vehicle state estimator (EKF2) as an auxiliary velocity measurement. This is especially useful during the final descent, where the camera is typically the most accurate source of horizontal velocity available to the vehicle.

The auxiliary velocity is exposed to EKF2 whenever landing_target_pose.rel_vel_ekf2_valid is true, which requires:

  • landing_target_pose.rel_pos_valid = true,
  • landing_target_pose.rel_vel_valid = true,
  • landing_target_pose.is_static = true, and
  • a vision-relative position measurement has been fused within VTE_M_REC_TOUT.

The recent vision-relative measurement requirement is intentional: GNSS and mission position can keep the target estimate valid, but they are not true relative measurements and must not enable EKF2 relative-velocity aiding by themselves.

The decision to actually fuse this signal as an EKF2 auxiliary velocity lives in EKF2 and is gated by EKF2_AVEL_EN. EKF2 only fuses the data when:

  • EKF2_AVEL_EN=1, and
  • landing_target_pose.rel_vel_ekf2_valid = true.

If either condition fails, EKF2 ignores the input.
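The two-stage gating can be summarised as a pair of boolean checks. This is a sketch of the conditions listed above, not the PX4 source: the function names, the argument names, and the use of seconds for the vision age are assumptions.

```python
def rel_vel_ekf2_valid(rel_pos_valid, rel_vel_valid, is_static,
                       time_since_vision_fusion_s, rec_tout_s):
    """Estimator side: all four conditions must hold."""
    vision_recent = time_since_vision_fusion_s <= rec_tout_s
    return rel_pos_valid and rel_vel_valid and is_static and vision_recent


def ekf2_fuses_aux_velocity(ekf2_avel_en, ekf2_valid_flag):
    """EKF2 side: the parameter and the validity flag must both be set."""
    return ekf2_avel_en == 1 and ekf2_valid_flag


# GNSS keeps the estimate valid, but vision was last fused 3 s ago.
flag = rel_vel_ekf2_valid(True, True, True,
                          time_since_vision_fusion_s=3.0, rec_tout_s=1.0)
print(flag, ekf2_fuses_aux_velocity(1, flag))
```

The example reproduces the case called out above: the target estimate is still valid, but without a recent vision-relative fusion the flag stays false and EKF2 does not fuse the velocity.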

MAVLink Messages

The estimator receives target information through two MAVLink messages emitted by an onboard companion (vision pipeline and/or external GNSS receiver):

  • TARGET_RELATIVE carries each vision-based observation of the target.
  • TARGET_ABSOLUTE carries absolute target state from a GNSS receiver mounted on the target.

INFO

These messages are currently in the MAVLink development dialect. To make them available in PX4 builds the CONFIG_MAVLINK_DIALECT="development" key must be set in the build configuration.

Data rates

The estimator runs its prediction step at 50 Hz regardless of the measurement cadence, and each new sample is only fused if it is fresher than the Measurement Freshness window. The minimum rate you actually need is application-dependent: a slow approach with a static target tolerates a few Hz, while fast-dynamics setups (such as the moving-target filter tracking a manoeuvring platform) typically need tens of Hz so the prediction does not drift too far between updates. To check whether your rate is sufficient, watch how the state covariance in vte_position grows between fused samples: substantial growth before the next observation means the prediction is doing more work than the data supports, and you should either publish faster or accept the reduced confidence.

Companion computer responsibilities

  • Timestamp each sample at capture in a clock that is synchronised with the autopilot (typically through mavlink_timesync). The OOSM history replay tolerates transport latency, but only if timestamps are consistent.
  • Report consistent variances in pos_std and yaw_std (for TARGET_RELATIVE) and position_std / vel_std (for TARGET_ABSOLUTE). Under-reporting variance is the most common cause of overshoots; the Sensor noise floors only clamp the lower bound and cannot rescue an over-confident sensor.
  • Set the coordinate frame for TARGET_RELATIVE (TARGET_OBS_FRAME) and provide the q_sensor rotation when the camera frame differs from vehicle-carried NED.

If you publish target observations from a different onboard pipeline (for example a ROS 2 node running on the companion), you can write the same data directly to the uORB topics that the estimator subscribes to: fiducial_marker_pos_report and fiducial_marker_yaw_report for vision, and target_gnss for an external receiver. The timing and variance guidance above still applies, and VTE_EN=1 must be set so the estimator runs and consumes those topics.

Message Overview

TARGET_RELATIVE (ID 511)

TARGET_RELATIVE extends the LANDING_TARGET message with a full 3D report that includes orientation and measurement uncertainty:

  • A coordinate frame selector (TARGET_OBS_FRAME) and a sensor quaternion (q_sensor) that, when provided, rotates the measurement into vehicle-carried NED.
  • Target pose (x, y, z, q_target) and variances (pos_std, yaw_std), collected from onboard vision pipelines.

mavlink_receiver validates the frame/type and handles the message differently depending on the Vision Target Estimator status:

  • When VTE_EN=0, the measurement is rotated (using q_sensor or the vehicle attitude for body-frame reports) and published straight to landing_target_pose so precision-landing can operate without the estimator.
  • When VTE_EN=1, the message is split into fiducial_marker_pos_report and fiducial_marker_yaw_report. VisionTargetEst consumes these uORB topics to drive the position and orientation filters (using fiducial_marker_pos_report.q to rotate fiducial_marker_pos_report.rel_pos into NED at timestamp_sample).

TARGET_ABSOLUTE (ID 510)

TARGET_ABSOLUTE reports the target's absolute state when it carries its own GNSS (and optionally IMU). A capability bitmap advertises which fields are valid. PX4 maps the available content into the target_gnss uORB topic:

  • Bit 0 (position) triggers publication of latitude/longitude/altitude along with the horizontal and vertical accuracy estimates (position_std).
  • Bit 1 (velocity) forwards the NED velocity vector (vel) and its standard deviations (vel_std).
  • Additional fields (acceleration, quaternion q_target, rates, uncertainties) are not yet supported; they are reserved for future fusion logic once flight testing is available.
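The capability handling described above can be pictured as a simple bit test. The constant and function names below are hypothetical; only the bit meanings (bit 0 for position, bit 1 for velocity) and the field names are taken from this section.

```python
# Hypothetical names for the two capability bits described above.
CAP_POSITION = 1 << 0
CAP_VELOCITY = 1 << 1


def forwarded_fields(capabilities):
    """List which groups of fields would be forwarded to target_gnss."""
    fields = []
    if capabilities & CAP_POSITION:
        fields += ["latitude/longitude/altitude", "position_std"]
    if capabilities & CAP_VELOCITY:
        fields += ["vel", "vel_std"]
    # Any other capability bits are ignored in this sketch, matching the
    # statement that additional fields are not supported.
    return fields


print(forwarded_fields(0b01))  # position only
print(forwarded_fields(0b11))  # position and velocity
```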

Gazebo Classic Simulation

Run the SITL world gazebo-classic_iris_irlock to simulate precision landing with VTEST fusing vision (ArUco-based) and target GNSS measurements. The world name is retained for historical reasons. The models were introduced in PX4/PX4-SITL_gazebo-classic#950.

TIP

The ArUco vision observation path implemented in Tools/simulation/gazebo-classic/sitl_gazebo-classic/src/gazebo_aruco_plugin.cpp provides a concrete example of how to obtain a vision-based observation of a target and how to publish the TARGET_RELATIVE message.

  1. Launch the simulator with:

    sh
    make px4_sitl gazebo-classic_iris_irlock
  2. If CMake cannot locate the expected OpenCV version, query the version present on your system (opencv_version --short or pkg-config --modversion opencv4) and update Tools/simulation/gazebo-classic/sitl_gazebo-classic/CMakeLists.txt accordingly, for example:

    cmake
    find_package(OpenCV 4.2.0 REQUIRED EXACT)

    Re-run the build after adjusting the find_package line.

TIP

  • Pad visibility: In Tools/simulation/gazebo-classic/sitl_gazebo-classic/models/land_pad/land_pad.sdf, increase the visual box size to 1.5 1.5 0.01 so the pad stays in view longer while the vehicle descends. If vision still detects the pad too late in the descent, complement this by planning a lower-altitude mission so the marker enters the camera field of view earlier.
  • Acceptance radius: If the vehicle hovers above the pad without ever transitioning to descent, raise PLD_HACC_RAD. Try 2 m first to confirm the rest of the chain works, then tighten it once the filter is well tuned.
  • Mission waypoint bias: Enable vision and mission position aiding in VTE_AID_MASK (set bits 2 and 3, disable bit 0). Place the landing waypoint 3 to 4 m away from the pad in QGroundControl to watch the UAV correct towards the pad once it is detected. In the logs, observe how the GNSS bias compensates for the distance between the land waypoint and the actual pad.
  • Measurement noise experiments: The ArUco plugin publishes nominal standard deviations through set_std_x and set_std_y in Tools/simulation/gazebo-classic/sitl_gazebo-classic/src/gazebo_aruco_plugin.cpp. Modify these assignments, and optionally the camera noise block in .../models/aruco_cam/aruco_cam.sdf, to see how innovation gates react to noisier vision.

Monitoring

For detailed log-analysis guidance, see Log analysis and expected plots in the deep dive.

On the autopilot shell, the first command to run is

sh
vision_target_estimator status

which reports whether the module is alive, which task is currently active, which filters are running, and which aid sources are enabled. It is the quickest way to confirm a configuration is taking effect before opening a log.

  • landing_target_pose.rel_pos_valid and .abs_pos_valid indicate whether recent measurements support relative and absolute positioning.
  • vte_position exposes every state component (relative position, vehicle velocity, GNSS bias, and optional target motion) together with diagonal covariance entries.
  • vte_orientation provides yaw, yaw-rate, and their variances.
  • vte_input records the downsampled NED acceleration and attitude quaternion actually fed to the position prediction step.
  • vte_bias_init_status shows raw and filtered GNSS/vision bias while the initial averaging phase is active.
  • Innovations published on vte_aid_* topics include the raw measurement, innovation, innovation variance, chi-squared test ratio, the fusion_status enum (per-axis on the 3D variant), and OOSM diagnostics (time_since_meas_ms, history_steps). See aid-source diagnostics for the full status table.

Operational Notes

  • Accurate timestamp alignment between measurement sources is critical. Large skews will cause innovations to fail the NIS gate and be rejected.
  • Absolute target pose is only published when vehicle_local_position reports a valid local frame.
  • When you expect yaw alignment during landing, enable PLD_YAW_EN and configure the mission land item for precision landing as described in Precision landing. In practice this means setting the QGroundControl land waypoint Precision landing drop-down (or MAV_CMD_NAV_LAND param2) to Opportunistic or Required so the controller requests the estimator output. Otherwise only positional cues are used.
  • For extended parameter tuning see Balancing process and observation noise, for log-analysis checklists see Troubleshooting checklist, and for developer workflows see Development and debugging tips.

Estimator Overview

This section describes how the filter works internally. It is recommended background reading but is not required to set up or operate the feature.

The Vision Target Estimator runs two independent estimators at a fixed 50 Hz: a position filter that tracks where the target is relative to the vehicle, and an orientation filter that tracks the target yaw. The position filter is structured as three decoupled 1D Kalman filters, one per NED axis, so each axis can be tuned and validated independently.

Each Kalman filter alternates between two operations:

  • A prediction step propagates the state forward at 50 Hz using the vehicle motion model.
  • An update step corrects the state whenever a new sensor observation arrives and passes outlier detection. Outliers are rejected by a chi-squared gate on the innovation (tunable through VTE_POS_NIS_THRE and VTE_YAW_NIS_THRE, defaulting to a 95% confidence interval).

Fusion starts as soon as the filter is initialized: the position filter needs a recent vehicle velocity estimate together with at least one position-like observation; the orientation filter starts on the first valid vision yaw sample.

The estimators only run while a runtime task is active. VTE_TASK_MASK selects which tasks are eligible (currently precision landing and a debug-always-on bit). Tasks are evaluated in priority order, and the first task whose readiness conditions are satisfied is the one that runs.

If no measurement is fused for a sustained period, the affected filter resets and retries automatically once an enabled fusion source becomes available again. See Timeouts for the relevant tuning knobs.

Dynamic Models

The per-axis position state is $x = [r, v^{uav}, b]^T$: the relative NED displacement (target minus vehicle), the vehicle velocity, and the offset between the absolute target reference (GNSS or mission waypoint) and the vision-derived target position. Assuming constant NED acceleration input $a^{uav}$ over the integration interval $dt$:

$$
\begin{aligned}
r_{k+1} &= r_k - dt\, v^{uav}_k - \tfrac{1}{2}\, dt^2\, a^{uav} \\
v^{uav}_{k+1} &= v^{uav}_k + dt\, a^{uav} \\
b_{k+1} &= b_k
\end{aligned}
$$

Once the bias is known, GNSS can keep the estimate centred on the target even if vision drops out.
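A minimal sketch of one prediction step for a single axis, assuming a static target so that the relative displacement (target minus vehicle) shrinks as the vehicle moves along the positive axis. The function name is hypothetical and the covariance propagation is omitted.

```python
def predict_axis(r, v_uav, b, a_uav, dt):
    """Propagate one axis of the static-target state by dt seconds.

    r     -- relative displacement, target minus vehicle [m]
    v_uav -- vehicle velocity [m/s]
    b     -- bias between absolute reference and vision [m]
    a_uav -- vehicle acceleration input, held constant over dt [m/s^2]
    """
    r_next = r - dt * v_uav - 0.5 * dt * dt * a_uav
    v_next = v_uav + dt * a_uav
    b_next = b  # the bias is modelled as constant between updates
    return r_next, v_next, b_next


# One 50 Hz step: target 10 m ahead, vehicle moving towards it at 1 m/s.
print(predict_axis(r=10.0, v_uav=1.0, b=0.3, a_uav=0.5, dt=0.02))
```

After one 20 ms step the displacement drops by roughly 2 cm and the velocity rises by 1 cm/s, while the bias is carried over unchanged.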

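The yaw filter tracks $x = [\psi, \dot{\psi}]^T$ with a constant-rate prediction (yaw wrapped to $[-\pi, \pi]$):

$$
\begin{aligned}
\psi_{k+1} &= \mathrm{wrap}(\psi_k + dt\, \dot{\psi}_k) \\
\dot{\psi}_{k+1} &= \dot{\psi}_k
\end{aligned}
$$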
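The constant-rate yaw prediction with wrapping can be sketched as follows. The helper names are hypothetical, and the wrap is implemented here with atan2, which is one common choice and not necessarily what the module uses.

```python
import math


def wrap_pi(angle):
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def predict_yaw(yaw, yaw_rate, dt):
    """Constant-rate prediction: integrate yaw, keep the rate unchanged."""
    return wrap_pi(yaw + dt * yaw_rate), yaw_rate


# Yaw close to +pi, turning at 5 rad/s: the prediction crosses the wrap point.
print(predict_yaw(3.1, 5.0, 0.02))
```

In the example the unwrapped yaw would be 3.2 rad, so the wrapped result lands just above -pi instead of exceeding +pi.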

Unknown physical disturbances are modelled as continuous-time Gaussian white noise. The runtime spectral densities (VTE_ACC_D_UNC, VTE_BIAS_UNC, VTE_YAW_ACC_UNC) and the initial-variance parameters are listed in Noise and Gating; for the full derivation see Dynamic model process noise.

For the experimental moving-target mode that adds target velocity and acceleration states, see Moving-target mode.

Bias Estimation

The GNSS bias b becomes observable only when both GNSS and vision are available, and VTE takes one of two paths depending on which source arrived first. When vision is already the active position reference, the bias is activated immediately on the first joint sample. When GNSS is active first, VTE low-pass filters the early raw samples (tuned by VTE_BIA_AVG_THR and VTE_BIA_AVG_TOUT) so that vision is only fused once the offset has settled.

For the full state-reset rules, the LPF exit condition, and the stale-GNSS fallback, see Bias initialization design.
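The GNSS-first path can be pictured with the toy low-pass filter below. It is only an illustration: the first-order filter, the `alpha` gain, the settling test (change of the filtered value below a threshold), and the function name are all assumptions, and the actual exit condition is the one documented in Bias initialization design. The `thr` and `tout` arguments merely play the roles of VTE_BIA_AVG_THR and VTE_BIA_AVG_TOUT.

```python
def settle_bias(raw_biases, dt, thr, tout, alpha=0.2):
    """Low-pass filter raw bias samples until they settle or time out.

    raw_biases -- raw GNSS-minus-vision offsets for one axis [m]
    dt         -- spacing between joint samples [s]
    Returns (bias, index_of_last_sample_used), or (None, index) while the
    averaging phase is still running.
    """
    filtered = raw_biases[0]
    if tout <= 0.0:
        # Averaging skipped: the first joint sample activates the bias.
        return filtered, 0
    for i, raw in enumerate(raw_biases[1:], start=1):
        previous = filtered
        filtered = previous + alpha * (raw - previous)
        settled = abs(filtered - previous) < thr
        timed_out = i * dt >= tout
        if settled or timed_out:
            return filtered, i
    return None, len(raw_biases) - 1


samples = [2.0, 2.4, 2.1, 2.05, 2.0]
print(settle_bias(samples, dt=0.1, thr=0.02, tout=3.0))
print(settle_bias(samples, dt=0.1, thr=0.02, tout=0.0))
```

With these numbers the filtered offset settles on the third sample, while a zero timeout returns the first raw offset immediately, mirroring the VTE_BIA_AVG_TOUT=0 behaviour described in Noise and Gating.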

Time Alignment

Vision and GNSS observations can arrive delayed due to transport and processing latency. The position and orientation filters therefore support an Out-of-Sequence Measurements (OOSM) approximation which uses a history-consistent projected correction strategy. For the algorithm, the buffer sizing, and the approximation assumptions, see OOSM Implementation in the deep dive.

Measurement Sources

All measurements are fused sequentially. For each observation z a one-row Jacobian is formed and applied to a single axis (position filter) or to the yaw state (orientation filter). Enabled sensors are defined by the VTE_AID_MASK bitmask.

| Source | uORB topic | H structure | Notes |
| --- | --- | --- | --- |
| Target GNSS position | target_gnss | $z = r + b$ once the bias is observable, otherwise $z = r$ | The vehicle GNSS sample is interpolated to the target timestamp using the vehicle velocity so the two receivers share a common epoch. Requires VTE_AID_MASK bit 0. Before bias activation, this source is held back if the estimator is already vision-referenced. |
| Mission landing waypoint | navigator_mission_item with validated position_setpoint_triplet fallback | $z = r$ | Provides a fallback absolute reference when target GNSS is unavailable. At precision-land task start VTEST caches the logical landing waypoint published by Navigator and keeps using that cached point even after precland rewrites the live triplet. The triplet remains a fallback for modes that do not publish navigator_mission_item. Enable VTE_AID_MASK bit 3 and avoid combining it with target GNSS because only one GNSS bias can be estimated. Before bias activation, this source is held back if the estimator is already vision-referenced. |
| Vision pose | fiducial_marker_pos_report | $z = r$ after rotating the measurement (rel_pos) into NED using q | Uses the message variances, lower-bounded by VTE_EVP_NOISE. Recent vision fusions are required for EKF aiding. During the initial GNSS/vision bias averaging phase, valid vision samples update the bias low-pass filter but are not fused into the position state yet. This averaging phase only exists when GNSS became the active reference first. |
| Vehicle GNSS velocity | sensor_gps | $z = v^{uav}$ | Removes rotation-induced velocity using vehicle_gps_position.antenna_offset_{x,y,z}, which are populated from the vehicle GPS antenna offset parameters (SENS_GPS0_OFF*). Enable VTE_AID_MASK bit 1. |
| Target GNSS velocity (moving mode) | target_gnss | $z = v^{t}$ | Only used by the experimental Moving-target mode. |
| Vision yaw | fiducial_marker_yaw_report | $z = \psi$ | Only source used by the orientation filter. Requires VTE_YAW_EN and VTE_AID_MASK bit 2. Variance is taken from the message and lower-bounded by VTE_EVA_NOISE. |

All innovation data are published on dedicated topics (vte_aid_gps_pos_target, vte_aid_fiducial_marker, vte_aid_ev_yaw, etc.), making it easy to inspect residuals and test ratios in logs. Every fusion attempt is published, including rejections: the per-axis fusion_status enum records the outcome (fused immediately, fused via OOSM history replay, rejected by the NIS gate, rejected as too old or too new, etc.), so tuning sessions can isolate time skew, noise mismatch, or buffer staleness without guessing.
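To illustrate what fusing one observation with a one-row Jacobian means, the sketch below applies a textbook scalar Kalman update to a three-element per-axis state ordered as relative position, vehicle velocity, bias. It is a generic example written with plain lists, not the module's code: the function name is hypothetical, the covariance is assumed symmetric, and gating and OOSM handling are left out.

```python
def scalar_update(x, P, H, z, R):
    """Fuse one scalar observation z = H x + noise (variance R).

    x -- state as a list, P -- symmetric covariance as a list of lists,
    H -- one-row Jacobian as a list. Returns (x_new, P_new, innovation, S).
    """
    n = len(x)
    PHt = [sum(P[i][j] * H[j] for j in range(n)) for i in range(n)]
    S = sum(H[i] * PHt[i] for i in range(n)) + R  # innovation variance
    innovation = z - sum(H[i] * x[i] for i in range(n))
    K = [PHt[i] / S for i in range(n)]  # Kalman gain
    x_new = [x[i] + K[i] * innovation for i in range(n)]
    # P - K H P, using (H P)_j == PHt[j] because P is symmetric.
    P_new = [[P[i][j] - K[i] * PHt[j] for j in range(n)] for i in range(n)]
    return x_new, P_new, innovation, S


x = [10.0, 1.0, 0.0]  # relative position, vehicle velocity, bias
P = [[1.0, 0.0, 0.0], [0.0, 0.25, 0.0], [0.0, 0.0, 4.0]]

# Vision observes the relative position only: H = [1, 0, 0].
x, P, innovation, S = scalar_update(x, P, [1.0, 0.0, 0.0], z=10.5, R=0.25)
print(x, innovation, S)
```

An absolute reference fused after bias activation would use H = [1, 0, 1] in this layout, so the same routine corrects both the relative position and the bias. Running the updates one after another on the same state is what sequential fusion refers to.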

INFO

UWB and IRLock are candidates for future development once representative test data is available.