Autoencoder: high-dimensional CLIP features --encoder--> low-dimensional latent space --decoder--> high-dimensional CLIP features
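The encoder/decoder data flow can be sketched as below. This is a minimal, untrained linear autoencoder in plain Python, meant only to show the shapes involved. The sizes `CLIP_DIM = 512` and `LATENT_DIM = 3` and the helper names are assumptions for illustration, not values taken from these notes; a real autoencoder would be a trained network (typically nonlinear) whose weights minimize the reconstruction loss.

```python
import random

CLIP_DIM = 512    # assumed CLIP embedding size (depends on the CLIP variant)
LATENT_DIM = 3    # assumed latent size, chosen only for illustration


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights (untrained)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def mse(a, b):
    """Mean squared error, used here as the reconstruction loss."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
W_enc = make_matrix(LATENT_DIM, CLIP_DIM, rng)   # encoder: 512 -> 3
W_dec = make_matrix(CLIP_DIM, LATENT_DIM, rng)   # decoder: 3 -> 512

# Stand-in for a real CLIP feature vector.
clip_feature = [rng.gauss(0.0, 1.0) for _ in range(CLIP_DIM)]

latent = matvec(W_enc, clip_feature)      # compressed representation
reconstructed = matvec(W_dec, latent)     # back to CLIP dimensionality
loss = mse(reconstructed, clip_feature)   # training would minimize this
```

The point of the compression is that only the small latent vector needs to be stored per scene element, while the decoder recovers a CLIP-sized feature when a query is evaluated.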
3D Language Fields
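What it is: Modeling a 3D language field allows users to interact with and query 3D worlds using open-ended language, which presents a promising avenue for human-computer interaction and understanding. For example, when a user enters "chair" or "table", the system can identify, localize, or segment the objects in the 3D scene that match the query. In other words, it links natural language to the 3D scene so that users can interact with and query the 3D world through language.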
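One common way to realize such a query, sketched here under assumptions, is to compare the text embedding of the query with a language feature stored at each 3D point and keep the points whose cosine similarity is high. The function names, the toy 3-dimensional embeddings, and the `threshold` value are all hypothetical; in practice the embeddings would come from a vision-language model such as CLIP.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def query_scene(text_embedding, point_features, threshold=0.8):
    """Return indices of 3D points whose feature matches the text query."""
    return [
        i
        for i, feature in enumerate(point_features)
        if cosine_similarity(text_embedding, feature) >= threshold
    ]


# Toy embeddings (hypothetical): one language feature per 3D point.
chair_query = [1.0, 0.0, 0.0]
point_features = [
    [0.9, 0.1, 0.0],   # point on a chair
    [0.0, 1.0, 0.0],   # point on a table
    [0.8, 0.0, 0.2],   # another point on the chair
]

selected = query_scene(chair_query, point_features)
print(selected)  # [0, 2]
```

The selected indices form a segmentation of the scene with respect to the query; localization can then be derived from the positions of those points.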
Applications: Open-ended language queries in 3D have attracted increasing attention because of their many applications, such as:
robotic navigation
manipulation
3D semantic understanding
editing
autonomous driving
augmented/virtual reality
Principle: Feature distillation from off-the-shelf vision-language models into a 3D scene
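A heavily simplified sketch of feature distillation follows. The "teacher" features stand in for 2D features produced by a frozen vision-language model, and the "student" features stand in for the features the 3D scene produces for the same pixels. As a simplifying assumption, the student features are optimized directly here; an actual method would obtain them by rendering the 3D representation and would backpropagate through that rendering step. All names and values are hypothetical.

```python
def distillation_loss(student, teacher):
    """Mean squared error between student and teacher feature maps."""
    n = sum(len(row) for row in teacher)
    return sum(
        (s - t) ** 2
        for s_row, t_row in zip(student, teacher)
        for s, t in zip(s_row, t_row)
    ) / n


def gradient_step(student, teacher, lr=0.5):
    """One gradient-descent step on the student features for the MSE loss."""
    n = sum(len(row) for row in teacher)
    return [
        [s - lr * 2.0 * (s - t) / n for s, t in zip(s_row, t_row)]
        for s_row, t_row in zip(student, teacher)
    ]


# Two pixels with 2-dimensional features each (toy values).
teacher = [[1.0, 0.0], [0.0, 1.0]]   # frozen teacher features
student = [[0.0, 0.0], [0.0, 0.0]]   # features to be learned

loss_before = distillation_loss(student, teacher)
student = gradient_step(student, teacher)
loss_after = distillation_loss(student, teacher)
```

Repeating the step drives the student features toward the teacher's, which is the sense in which the language knowledge of the 2D model is transferred into the scene.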