

Learning visual models from large-scale data


Status: Decision signed

Team leader: Julien Mairal

Keywords from "A - Research themes in digital sciences - 2023": A3.4. Machine learning and statistics, A5.3. Image analysis and processing, A5.4. Computer vision, A5.9. Signal processing, A6.2.6. Optimization, A8.2. Optimization, A9.2. Machine learning, A9.3. Signal analysis (vision, speech, etc.), A9.7. Algorithmics of artificial intelligence

Keywords from "B - Other sciences and application domains - 2023": B9.5.6. Data science

Domain: Perception, Cognition, Interaction
Theme: Vision, perception and multimedia interpretation

Period: 01/03/2016 -> 31/12/2027
Evaluation dates: 03/10/2018

Affiliated institution(s): none
Partner laboratory(ies): LJK (UMR5224)

Inria research centre: Centre Inria de l'Université Grenoble Alpes
Location: Centre de recherche Inria de l'Université Grenoble Alpes
Inria structure code: 071126-1

RNSR number: 201622034K
Inria structure no.: SR0735UR


Thoth is a joint team of Inria and the Laboratoire Jean Kuntzmann, created in January 2016. It is a follow-up to the LEAR team (2003-2015).

Thoth is motivated by today's context, in which the quantity of digital images and videos available on-line continues to grow at a phenomenal rate: home users put their movies on YouTube and their images on Flickr; journalists and scientists set up web pages to disseminate news and research results; and audiovisual archives from TV broadcasts are opening to the public. There is thus a pressing, and in fact increasing, demand to annotate and index this visual content for home and professional users alike. Current object recognition and scene understanding technology mostly relies on fully supervised classification engines, in which visual models are essentially (piecewise) rigid templates learned from hand-labeled images. The sheer scale of on-line data and the nature of the embedded annotation call for a departure from this fully supervised scenario. The main objective of the Thoth project-team is to develop a new framework for learning the structure and parameters of visual models by actively exploring large digital image and video sources (off-line archives as well as growing on-line content), and by exploiting the weak supervisory signal provided by the accompanying metadata.

Research axes

The main objectives of the team are:

(i) designing and learning structured models capable of representing this visual information: developing novel models for a more complete understanding of scenes, addressing all of the component tasks. We propose to incorporate the structure present in image and video data explicitly into our models; in other words, the models should satisfy the complex sets of constraints that exist in natural images and videos.

(ii) learning visual models from minimal supervision or unstructured metadata: the approach we propose to address the limitations of the fully supervised learning paradigm aligns with "Big Data" approaches developed in other areas: we rely on the orders-of-magnitude-larger training sets that have recently become available with metadata to compensate for less explicit forms of supervision.
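To make the idea of axis (ii) concrete, here is a minimal sketch (not the team's actual method) of why large weakly labeled sets can compensate for less explicit supervision: we simulate "images" as feature vectors whose metadata tag is a noisy version of the true label, and show that a plain classifier trained on the noisy tags still recovers the clean decision rule. All names and parameters below are illustrative assumptions.

```python
import numpy as np

# Illustrative weak-supervision sketch: metadata tags are noisy labels.
rng = np.random.default_rng(0)

n, d = 2000, 20
w_true = rng.normal(size=d)              # ground-truth linear concept
X = rng.normal(size=(n, d))              # "image" feature vectors
y_true = (X @ w_true > 0).astype(float)  # clean (unobserved) labels

# Weak supervision: 20% of the metadata tags are flipped at random.
flip = rng.random(n) < 0.2
y_weak = np.where(flip, 1.0 - y_true, y_true)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Logistic regression trained on the *noisy* tags by gradient descent.
# With enough weakly labeled data, the learned weight vector still
# aligns with w_true, so accuracy on the clean labels stays high.
w = np.zeros(d)
lr = 0.1
for _ in range(500):
    grad = X.T @ (sigmoid(X @ w) - y_weak) / n
    w -= lr * grad

acc = np.mean((sigmoid(X @ w) > 0.5) == y_true.astype(bool))
print(f"accuracy on clean labels: {acc:.3f}")
```

The point of the sketch is purely quantitative: uniform label noise shrinks the optimal weight vector but does not rotate it, so scale (more weakly tagged samples) substitutes for annotation quality.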

(iii) large-scale learning and optimization: This part of our research concentrates on the design and theoretical justifications of deep architectures, with a focus on weakly supervised and unsupervised learning, and the development of continuous and discrete optimization techniques that push the state of the art in terms of speed and scalability.
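As a minimal illustration of the scalability concern in axis (iii) (a generic sketch, not one of the team's specific algorithms), stochastic gradient descent touches one sample per step, so each iteration costs O(d) instead of the O(nd) of a full gradient; the problem and step-size schedule below are illustrative assumptions.

```python
import numpy as np

# Large-scale optimization sketch: SGD on a least-squares objective
#   f(w) = (1/2n) * ||Xw - y||^2
# Each step uses a single sample, so per-iteration cost is O(d).
rng = np.random.default_rng(0)

n, d = 10000, 50
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star + 0.01 * rng.normal(size=n)

def loss(w):
    r = X @ w - y
    return 0.5 * np.mean(r ** 2)

w = np.zeros(d)
for t in range(1, 50001):
    i = rng.integers(n)                    # draw one data point
    g = (X[i] @ w - y[i]) * X[i]           # stochastic gradient, O(d)
    w -= (1.0 / (t + 100)) * g             # decreasing step size

print(f"final loss: {loss(w):.4f}")
```

With a decreasing step size the iterates converge to a neighborhood of the optimum; research of the kind described here (e.g., variance-reduced stochastic methods) is about closing the gap between this cheap-per-iteration scheme and the accuracy of full-gradient methods.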

An additional focus of Thoth is the collection of appropriate datasets and the design of accompanying evaluation protocols.

Industrial and international relations

Thoth team members collaborate with academic research groups at UC Berkeley, MPI Tübingen, the University of Washington, the Inria WILLOW team, and IIIT Hyderabad (India), as well as with industrial partners such as Naver Labs, Valeo AI, Facebook AI Research, the Microsoft Research-Inria Joint Centre, Google, and Criteo.