Better Visual Object Tracking
Balancing speed, accuracy, and robustness - By Matt Rabinovitch
July 28, 2020
Visual Object Tracking, or tracking for short, is a fundamental component of autonomous vision systems. It allows machines to follow regions of interest through a series of images.
Because of their resource efficiency, Region of Interest Trackers (KCF, CSRT, THOR, RAD, etc.) have become a very popular approach to visual object tracking.
Many modern trackers prioritize speed and accuracy at the expense of robustness.
At Teleidoscope, we’ve developed our own tracker (RAD) that attempts to address all three.
Note: Performance in the video above is affected by the rendering of the visualization view. See the videos below for a more accurate representation.
The Balanced Tracker
A balanced tracker balances the following attributes:
- Speed - How fast it can produce an estimate
- Accuracy - How precisely it estimates the location of the object, across a wide variety of objects
- Robustness - How reliably it handles difficult conditions without losing the object
The impact one attribute has on the others varies by implementation, but in general the following is often true:
- Speed can be traded for Accuracy or vice versa
- Accuracy can be traded for Robustness or vice versa
It makes sense that speed and accuracy are prioritized, considering that those are what trackers are usually brought in to supplement. But that doesn’t explain why robustness is hurt in the process.
The impact on robustness lies in how trackers determine whether or not they’re still tracking an object. Most trackers take a pass/fail approach: they compare a confidence score to some fixed threshold to decide whether their object is tracked (pass) or lost (fail).
This approach can lead to problems because the range of possible scores differs between objects. This forces many trackers to choose between a high threshold, which increases spurious failures, and a low threshold, which increases the chance of drifting.
The latter is usually chosen because it allows a wider variety of objects to be tracked accurately with this approach. However, this choice can cause issues in unexpected ways, as demonstrated below.
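The threshold dilemma can be shown with a minimal Python sketch. The function name, the scores, and both threshold values are made up for illustration and are not taken from any particular tracker. In each sequence, the last frame is assumed to be one where the tracker has drifted off the object.

```python
from typing import List


def fixed_threshold_status(scores: List[float], threshold: float) -> List[str]:
    """Label each frame 'tracked' or 'lost' using one fixed threshold."""
    return ["tracked" if s >= threshold else "lost" for s in scores]


# Hypothetical confidence scores for two different objects.
# The final frame of each sequence is assumed to be a real drift.
high_contrast = [0.90, 0.88, 0.85, 0.40]
low_texture = [0.45, 0.43, 0.44, 0.20]

# High threshold: correct for the first object, but the second
# object is reported lost on every frame (spurious failures).
high_a = fixed_threshold_status(high_contrast, 0.6)
high_b = fixed_threshold_status(low_texture, 0.6)

# Low threshold: correct for the second object, but the drift of
# the first object (score 0.40) still passes and goes unreported.
low_a = fixed_threshold_status(high_contrast, 0.3)
low_b = fixed_threshold_status(low_texture, 0.3)
```

With these example numbers, no single threshold gives the right answer for both objects, which is the situation described above.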
Speed and Accuracy and Robustness
In the video above, the combination of rapid scaling and background perspective change resulted in CSRT scaling incorrectly and not realizing it had drifted.
Many autonomous systems (e.g., drones) rely on correct status reporting to determine if they need to perform more computationally expensive recovery tasks such as detection. If the tracker fails too often (high threshold), resources are wasted, and if it doesn’t detect true failures (low threshold), drifting occurs. The latter can have severe effects on autonomous systems because they won’t know anything is wrong.
This makes it clear that robustness is just as important as speed and accuracy. This of course makes the task of balancing a tracker very difficult.
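To make the two failure costs concrete, here is a small Python sketch that counts them for a given threshold. Everything in it is hypothetical: the function name, the scores, and the ground-truth flags are invented for illustration, and a real system would not have ground truth available at run time.

```python
from typing import List, Tuple


def recovery_costs(
    scores: List[float], on_target: List[bool], threshold: float
) -> Tuple[int, int]:
    """Return (wasted detector runs, unnoticed drift frames)."""
    wasted = 0  # tracker was fine but reported lost: detection runs needlessly
    missed = 0  # tracker had drifted but reported tracked: no recovery happens
    for score, ok in zip(scores, on_target):
        reported_lost = score < threshold
        if reported_lost and ok:
            wasted += 1
        elif not reported_lost and not ok:
            missed += 1
    return wasted, missed


# Hypothetical sequence: the tracker is on target for three frames,
# then drifts for the remaining three.
scores = [0.90, 0.50, 0.80, 0.45, 0.40, 0.35]
on_target = [True, True, True, False, False, False]

high = recovery_costs(scores, on_target, threshold=0.6)
low = recovery_costs(scores, on_target, threshold=0.3)
```

In this example the high threshold triggers one unnecessary recovery, while the low threshold lets all three drifting frames pass as tracked.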
This is where Teleidoscope’s RAD (Relocalizable Adaptive Discriminative) Tracker comes in.
RAD auto-calibrates to each object instead of using fixed, implementation-specific thresholds. This allows it to report when tracking becomes unstable and to recover itself when it does.
This allows RAD to track and recover, even in situations where there is very little scene detail.
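RAD's calibration method is not described in this post, so the following Python sketch is not RAD's algorithm. It only illustrates the general idea of a per-object threshold, using an assumed rule: learn a baseline from a few warm-up scores, then flag any score that falls more than k standard deviations below it. The class name, the parameters k and min_spread, and all scores are hypothetical.

```python
from statistics import mean, stdev
from typing import List


class CalibratedStatus:
    """Per-object stability check (illustrative only, not RAD's method)."""

    def __init__(self, warmup_scores: List[float], k: float = 3.0,
                 min_spread: float = 0.01) -> None:
        # Baseline and spread are learned from this object's own scores.
        self.baseline = mean(warmup_scores)
        # Floor the spread so near-constant warm-up scores do not
        # produce an overly tight limit.
        self.spread = max(stdev(warmup_scores), min_spread)
        self.k = k

    def status(self, score: float) -> str:
        """Return 'stable' or 'unstable' relative to this object's baseline."""
        lower_bound = self.baseline - self.k * self.spread
        return "stable" if score >= lower_bound else "unstable"


# Two hypothetical objects with very different score ranges.
high_contrast = CalibratedStatus([0.90, 0.88, 0.85])
low_texture = CalibratedStatus([0.45, 0.43, 0.44])

# The same score of 0.44 is normal for one object and a warning sign
# for the other.
a = high_contrast.status(0.44)
b = low_texture.status(0.44)
```

Because the limit is derived from each object's own scores, the same raw score can mean "stable" for a low-scoring object and "unstable" for a high-scoring one, which a single fixed threshold cannot express.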
Teleidoscope’s Visual Object Tracking Framework
The RAD tracker is at the core of Teleidoscope’s visual tracking framework, which was designed with these issues in mind. Recovery and self-diagnostics are just some of the features that set RAD apart. Designation options for RAD can be seen in the diagram below and will be covered in a future post.
For more information, please reach out to contact@teleidoscope.com.