Ad-hoc photographic retrieval task 2006


Introduction

The ImageCLEFphoto 2006 task is similar to the classic TREC ad-hoc retrieval task: simulation of the situation in which a system knows the set of documents to be searched, but cannot anticipate the particular topic that will be investigated (i.e. topics are not known to the system in advance). The goal of ImageCLEFphoto 2006 is: given a multilingual statement describing a user information need, find as many relevant images as possible from the IAPR TC-12 photographic collection.
This contains 20,000 photographs on a variety of topics and from worldwide locations collected by Michael Grubinger and viventura (a travel organisation). All photographs contain structured metadata (including a description and location) in both English and German.

Any method can be used to retrieve relevant documents and we encourage the use of both text and content-based retrieval methods. One of the main interests of the ImageCLEFphoto ad-hoc task is investigating various methods of query translation and how features derived from the image captions and the images themselves can be combined to enhance cross-language image retrieval in this kind of domain. A further aspect of the task this year is to investigate how the vocabulary mismatch between captions and queries can be bridged.

There are three sets of topics for the 2006 ad-hoc task:
  1. Standard ad-hoc topics which typically require use of both textual and visual information.
  2. Geographically-orientated topics which require textual and visual information, along with spatial awareness (these are part of the GeoCLEF 2006 task).
  3. Visual topics which aim to evaluate purely content-based image retrieval (CBIR) approaches on the IAPR TC-12 collection.
You can experiment with any set of topics, but we would ask (if possible) that you also provide submissions for the standard ad-hoc set to enable us to compare systems. This is an ImageCLEF task.


Data

The image collection of the IAPR TC-12 Benchmark consists of 20,000 still natural images (plus 20,000 corresponding thumbnails) taken from locations around the world and covering an assorted cross-section of subjects. This includes pictures of different sports and actions, photographs of people, animals, cities, landscapes and many other aspects of contemporary life.

Each image is associated with a text caption in two languages (English and German). These annotations are stored in a semi-structured format including an image title, the location at which the photograph was taken, the name of the photographer and a description of the contents of the image (as determined by the photographer).

<DOC>
<DOCNO>annotations/00/60.eng</DOCNO>
<TITLE>Palma </TITLE>
<DESCRIPTION>two lane street with large shops on the right and smaller shops on the left; people are walking on the sidewalk, some are crossing the street; cars are parked along the left side of the street as well; </DESCRIPTION>
<NOTES>The main shopping street in Paraguay; </NOTES>
<LOCATION>Asunción, Paraguay </LOCATION>
<DATE>March 2002 </DATE>
<IMAGE>images/00/60.jpg </IMAGE>
<THUMBNAIL>thumbnails/00/60.jpg </THUMBNAIL>
</DOC>
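
A record like the one above can be read with a few lines of Python. The following is a minimal sketch rather than an official tool: the function name parse_annotation is hypothetical, and a regular expression is used instead of an XML parser on the assumption that the annotation files may not always be strictly well-formed XML.

```python
import re

# Matches <TAG>value</TAG> pairs; DOTALL lets a value span several lines.
FIELD_RE = re.compile(r"<([A-Z]+)>(.*?)</\1>", re.DOTALL)

def parse_annotation(text):
    """Return a dict mapping field names (e.g. TITLE) to stripped values."""
    # Drop the outer <DOC> wrapper so that only the inner fields are matched.
    body = re.sub(r"</?DOC>", "", text)
    return {tag: value.strip() for tag, value in FIELD_RE.findall(body)}

# Shortened copy of the sample record shown above.
sample = """<DOC>
<DOCNO>annotations/00/60.eng</DOCNO>
<TITLE>Palma </TITLE>
<LOCATION>Asunción, Paraguay </LOCATION>
<DATE>March 2002 </DATE>
<IMAGE>images/00/60.jpg </IMAGE>
</DOC>"""

record = parse_annotation(sample)
print(record["TITLE"], "|", record["LOCATION"])
```

Stripping the values removes the trailing spaces that appear before the closing tags in the sample record.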

Please see this page for more information about the IAPR collection as used in ImageCLEF.

Grubinger, M., Clough, P., Müller, H. and Deselaers, T. (2006), The IAPR TC-12 Benchmark: A New Evaluation Resource for Visual Information Systems, In Proceedings of International Workshop OntoImage’2006 Language Resources for Content-Based Image Retrieval, held in conjunction with LREC'06, Genoa, Italy (in print).


Relevance Assessments

In the past, relevance assessments have been performed by students and staff at the University of Sheffield and Victoria University. Submissions are used to create image pools which are judged for relevance by assessors. The pools are assessed and the end result is a set of relevance assessments called qrels. These are then used to evaluate system performance and compare submissions. For 2006, we will be asking for help from participating groups to perform relevance assessments.
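
The pooling step described above can be illustrated as follows. This is a sketch only: the function name build_pools and the pool depth are assumptions made for the example, not the settings used by the organisers.

```python
from collections import defaultdict

def build_pools(runs, depth=40):
    """Merge the top `depth` images of every run into one pool per topic.

    `runs` is a list of runs; each run maps a topic id to a ranked list of
    image ids (best first). The default depth is an illustrative assumption.
    """
    pools = defaultdict(set)
    for run in runs:
        for topic, ranked_images in run.items():
            # Only the highest-ranked images of each run enter the pool.
            pools[topic].update(ranked_images[:depth])
    return dict(pools)

# Two toy runs with hypothetical image ids.
run_a = {1: ["img3", "img7", "img9"], 2: ["img1"]}
run_b = {1: ["img7", "img2", "img5"]}
pools = build_pools([run_a, run_b], depth=2)
print(sorted(pools[1]))
```

Because each pool is a set, an image retrieved by several runs is judged only once per topic.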

For more information about this procedure and the qrels sets, see these papers: "The CLEF 2003 Cross Language Image Retrieval Track" and "The CLEF 2004 Cross Language Image Retrieval Track".

Relevance assessment for the more general topics is based entirely on the visual content of images (e.g. aircraft on the ground). However, certain topics also require the use of the caption to make a confident decision (e.g. "pictures of beaches in northern Peru"). What constitutes a relevant image is a subjective decision, but typically a relevant image will have the subject of the topic in the foreground, the image will not be too dark in contrast, and the caption may confirm the judge's decision.

The assessment of images in ImageCLEFphoto is based on using a ternary classification scheme: (1) relevant, (2) partially relevant and (3) not relevant. The aim of the ternary scheme is to help assessors in making their relevance judgements more accurate (e.g. an image is definitely relevant in some way, but maybe the query object is not directly in the foreground: it is therefore considered partially relevant). Various combinations of assessor judgements are used to create the qrels sets and more information can be found from the links given above.
 

Experiments

This text relates to the standard ad-hoc set of topics.

Experiments are performed as follows: participants are given topics, which are used to create queries, which in turn are used to perform retrieval on the image collection. This process can iterate (e.g. involving relevance feedback) until you are satisfied with your runs. You might try different methods to increase the number of relevant images in the top N rank positions (e.g. query expansion). You can repeat these different methods for each query (or source) and collection (or target) language.

You are free to experiment with whatever methods you wish for CLIR and image retrieval, e.g. query expansion based on thesaurus lookup or relevance feedback, indexing and retrieval on only part of the image caption, different models of retrieval, different translation resources (e.g. dictionary-based vs. MT), and combining text and content-based methods for retrieval. Given the many different possible approaches which could be used to perform the ad-hoc retrieval, rather than list all of these we will ask you to indicate which of the following applies to each of your runs (we consider these the "main" dimensions which define the query for this ad hoc task):

Dimension                 This ...     Code    ... or this             Code
Query/source language     English      EN      Other (please state)    See below
Caption/target language   English      EN      German                  DE
Query/run type            automatic    AUTO    manual                  MAN
Feedback/expansion        without      NOFB    with                    FB
Modality                  text         TXT     image                   IMG


Query/source language

Specify the query or source language used in the run. We ask that you identify languages using the following language codes: English (EN), German (DE), French (FR), Portuguese (PT), Spanish (ES), Italian (IT), Finnish (FI), Japanese (JA), Chinese-simplified (ZHS), Chinese-traditional (ZHT), Polish (PL), Norwegian (NO), Swedish (SV), Russian (RU), Danish (DA) and Dutch (NL).


Caption/target language

There are two possible target languages: German (DE) and English (EN). You can also use both languages in one run (ALL).


Query/run type

We distinguish between manual (MAN) and automatic (AUTO) submissions. Automatic runs involve no user interaction, whereas manual runs are those in which a human has been involved in query construction and the iterative retrieval process, e.g. manual relevance feedback is performed. We encourage groups who want to investigate manual intervention further to participate in the interactive evaluation (iCLEF) task.


Modality

This describes the use of visual (image) or text features in your submission. If you want to experiment with a purely visual approach, please also consider using the visual topics. A text-only run will have modality text (TXT); a purely visual run will have modality image (IMG) and a combined submission (e.g. initial text search followed by a possibly combined visual search) will have modality text+image (TXTIMG).


Submission format and guidelines


What to submit

We would ask that you submit a baseline run with which to compare your other submissions. There will be one baseline run for each query language (please include monolingual runs in your submission: English-English and German-German) and according to the previous table these would be classified/identified as:

EN-EN-AUTO-NOFB-TXT for the English-English monolingual run

DE-DE-AUTO-NOFB-TXT for the German-German monolingual run
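
A run identifier of this kind can be assembled and checked mechanically. The sketch below assumes that the five dimension codes are joined with hyphens in the order used by the two baseline names; the function name run_id and the validation rules are illustrative and not an official naming tool.

```python
# Codes taken from the dimension descriptions in this document.
SOURCE_LANGS = {"EN", "DE", "FR", "PT", "ES", "IT", "FI", "JA", "ZHS", "ZHT",
                "PL", "NO", "SV", "RU", "DA", "NL"}
TARGET_LANGS = {"EN", "DE", "ALL"}
RUN_TYPES = {"AUTO", "MAN"}
FEEDBACK = {"NOFB", "FB"}
MODALITIES = {"TXT", "IMG", "TXTIMG"}

def run_id(source, target, run_type, feedback, modality):
    """Join the five dimension codes into a run identifier."""
    checks = [
        ("source language", source, SOURCE_LANGS),
        ("target language", target, TARGET_LANGS),
        ("run type", run_type, RUN_TYPES),
        ("feedback", feedback, FEEDBACK),
        ("modality", modality, MODALITIES),
    ]
    for name, value, allowed in checks:
        if value not in allowed:
            raise ValueError(f"unknown {name} code: {value!r}")
    # Assumed layout: codes joined with hyphens, as in the baseline names.
    return "-".join(value for _, value, _ in checks)

print(run_id("EN", "EN", "AUTO", "NOFB", "TXT"))
```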

It is extremely important that we receive a description of the techniques that you use for all runs. This should be as detailed as possible to ease the comparison or classification of techniques and results. It would be helpful if you could also indicate which caption fields you indexed (e.g. all fields, or all fields except the description). The final proceedings will be published in Springer Lecture Notes in Computer Science (LNCS). It is probably easier if you use the LNCS templates for the submission of your results.

Submission format

For this task, participants are required to submit ranked lists of (up to) the top 1000 images ranked in descending order of similarity (i.e. the highest nearer the top of the list). Participants can submit (via email) as many system runs as they require, but should indicate their best runs for each language as we can only guarantee evaluation of these alone. The format of submissions for this ad-hoc task can be found here and the filenames should distinguish different types of submission according to the table above.

Please note that there should be at least 1 document entry in your results for each topic (i.e. if your system returns no results for a query then insert a dummy entry, e.g. 25 1 annotations/16/16019 0 4238 xyzT10af5 ). The reason for this is to make sure that all systems are compared with the same number of topics and relevant documents.
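
The dummy-entry rule can be enforced when the run file is written. The sketch below is illustrative: the function name format_run is hypothetical, and the column order (topic, iteration, image id, rank, score, run tag) with ranks starting at 0 is an assumption read off the example entry above, so please check it against the official format description.

```python
def format_run(results, topics, run_tag, dummy_doc="annotations/16/16019"):
    """Return run-file lines, adding one dummy entry for any empty topic.

    `results` maps a topic id to a list of (image id, score) pairs.
    Assumed column order: topic, iteration, image id, rank, score, run tag.
    """
    lines = []
    for topic in topics:
        # Highest score first, truncated to the 1000-image limit.
        ranked = sorted(results.get(topic, []),
                        key=lambda pair: pair[1], reverse=True)[:1000]
        if not ranked:
            # No results: emit a placeholder so the topic is still counted.
            ranked = [(dummy_doc, 0.0)]
        for rank, (doc, score) in enumerate(ranked):
            lines.append(f"{topic} 1 {doc} {rank} {score:g} {run_tag}")
    return lines

# Topic 2 has no results, so it receives the dummy entry.
results = {1: [("annotations/00/60", 0.5), ("annotations/00/61", 0.9)]}
lines = format_run(results, topics=[1, 2], run_tag="myrun")
print("\n".join(lines))
```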

Ranked lists from participants for this ad hoc task will be evaluated using trec_eval by including recall and precision at various cut-off levels plus single-value summaries derived from precision and recall, i.e. mean average precision and R-precision. We also plan to evaluate submissions based on Precision at 20 (P20) and arithmetic and geometric means.
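
For orientation, the measures named above can be sketched in a few lines. This is not the trec_eval implementation: the function names are hypothetical, reading "arithmetic and geometric means" as means of per-topic average precision is an assumption, and so is the small floor used to keep the geometric mean defined when a topic scores zero.

```python
import math

def precision_at(ranked, relevant, k=20):
    """Fraction of the top k images that are relevant (P20 when k=20)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant images."""
    r = len(relevant)
    return precision_at(ranked, relevant, r) if r else 0.0

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank holding a relevant image."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values, floor=1e-5):
    # The floor is an assumption; it avoids log(0) for topics scoring zero.
    return math.exp(sum(math.log(max(v, floor)) for v in values) / len(values))

# Toy topic: relevant images appear at ranks 1 and 3.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
ap = average_precision(ranked, relevant)
print(round(ap, 4), precision_at(ranked, relevant, k=2), r_precision(ranked, relevant))
```

The geometric mean rewards systems that do reasonably well on every topic, because a single very low score pulls it down far more than it pulls down the arithmetic mean.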

Provided Data and Systems

Training data

As this is a new collection, we do not currently have any training data available. We suggest you create your own topics, similar to the topics provided, and generate relevance assessments for them.


CBIR Systems


To enable those without access to their own CBIR system to participate in the ad hoc task, we suggest using the GIFT/Viper image retrieval system or the FIRE image retrieval system. For more information about using the GIFT/Viper system in ImageCLEFphoto please contact Henning Müller. For more information about the FIRE system please contact Thomas Deselaers.


Important Dates

Data Release: 15 March 2006
Topic Release: 30 April 2006
Submission of Runs: Monday 19 June 2006



Organisers of ImageCLEFphoto

Paul Clough, Department of Information Studies, University of Sheffield, UK (p.d.clough@sheffield.ac.uk)

Michael Grubinger, School of Computer Science and Mathematics, Victoria University, Australia (michael.grubinger@research.vu.edu.au)


Mailing list

We have set up a mailing list: imageclef@sheffield.ac.uk for participants. Please contact Paul Clough to be added to the list.

Last Modified: March 2006

By: Paul Clough