Data-Efficient And Robust Deep Learning From Large Vision And Language Data