NEAT: Neural Attention Fields for End-to-End Autonomous Driving

NEAT is a continuous function that maps locations in Bird’s Eye View (BEV) scene coordinates to waypoints and semantics, using intermediate attention maps to iteratively compress high-dimensional 2D image features into a compact representation.

The dataset is generated with CARLA.

Imitation Learning (IL)

Given a dataset of expert trajectories, a behavior cloning agent is trained through supervised learning, where the goal is to predict the actions of the expert given some sensory input regarding the scene.
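The supervised objective above can be sketched in a few lines. This is only an illustration, not the paper's training code: it assumes the expert "actions" are 2D waypoints and that the error is measured with an L1 distance, and the names `l1_waypoint_loss`, `behavior_cloning_loss`, `policy`, and `dataset` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Waypoint = Tuple[float, float]


def l1_waypoint_loss(predicted: Sequence[Waypoint], expert: Sequence[Waypoint]) -> float:
    """Mean absolute error over all waypoint coordinates (assumed loss)."""
    if len(predicted) != len(expert):
        raise ValueError("predicted and expert must have the same number of waypoints")
    total = 0.0
    for (px, py), (ex, ey) in zip(predicted, expert):
        total += abs(px - ex) + abs(py - ey)
    # Two coordinates per waypoint.
    return total / (2 * len(expert))


def behavior_cloning_loss(
    policy: Callable[[object], List[Waypoint]],
    dataset: Sequence[Tuple[object, List[Waypoint]]],
) -> float:
    """Average imitation loss of `policy` over (observation, expert waypoints) pairs."""
    losses = [l1_waypoint_loss(policy(obs), expert) for obs, expert in dataset]
    return sum(losses) / len(losses)
```

In practice the policy would be a neural network and this loss would be minimized by gradient descent; the sketch only shows what is being compared.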


Inspired by implicit shape representations, NEAT represents large dynamic scenes with a fixed memory footprint using a multi-layer perceptron (MLP) query function.

An MLP converts the input image into a compact, low-dimensional representation relevant to the query location (x, y, t).

Interpretable attention maps (no attention supervision is needed, though; I don't yet understand how this works).

The output of this learned MLP can be used for dense prediction in space and time.
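To make the query idea concrete, here is a rough, untrained sketch of how a query at (x, y, t) might attend over image patch features, compress them into a compact code, and decode that code. It is an assumption-heavy toy: single linear layers stand in for the MLPs, the code is initialized with the mean patch feature, and the names `query`, `attn_w`, `dec_w`, and `iterations` are hypothetical rather than taken from the NEAT code.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def linear(weights: Sequence[Vector], inputs: Vector) -> Vector:
    """Apply a bias-free linear layer (one weight row per output)."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query(
    location: Tuple[float, float, float],  # (x, y, t) in BEV coordinates
    features: Sequence[Vector],            # one feature vector per image patch
    attn_w: Sequence[Vector],              # shape: num_patches x (3 + dim)
    dec_w: Sequence[Vector],               # shape: (num_classes + 2) x (3 + dim)
    iterations: int = 2,
) -> Tuple[Vector, Vector, Vector]:
    """Return (semantic logits, waypoint offset, final attention map)."""
    num_patches, dim = len(features), len(features[0])
    # Assumed initialization: the mean of all patch features.
    code = [sum(f[d] for f in features) / num_patches for d in range(dim)]
    attention = [1.0 / num_patches] * num_patches
    for _ in range(iterations):
        # The attention map depends on the query location and the current code.
        attention = softmax(linear(attn_w, list(location) + code))
        # Compress the patch features into a single attention-weighted vector.
        code = [sum(a * f[d] for a, f in zip(attention, features)) for d in range(dim)]
    output = linear(dec_w, list(location) + code)
    return output[:-2], output[-2:], attention
```

The weights are the only stored state, so memory use does not grow with the number of locations queried, and calling `query` on a grid of (x, y, t) values gives a dense prediction in space and time. The returned attention map is what can be visualized over the image patches.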

End-to-end waypoint prediction, with BEV semantic segmentation as an auxiliary task.
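One common way to combine a main task with an auxiliary task is a weighted sum of the two losses. The sketch below assumes exactly that, with cross-entropy for the semantic classes; the function names and the `semantic_weight` default are hypothetical placeholders, not values from the paper.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target_class: int) -> float:
    """Cross-entropy of one BEV location's class logits against its label."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum_exp - logits[target_class]


def total_loss(waypoint_loss: float, semantic_loss: float, semantic_weight: float = 0.1) -> float:
    """Main waypoint loss plus a down-weighted auxiliary semantic loss (assumed form)."""
    return waypoint_loss + semantic_weight * semantic_loss
```

Setting `semantic_weight` to 0 recovers pure waypoint imitation, which makes it easy to check how much the auxiliary task helps.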

3 cameras
