| Introduction |
The ImageCLEFphoto 2006 task is
similar to the classic TREC ad-hoc retrieval
task: simulation of the situation in which a system knows the set of documents
to be searched, but cannot anticipate the particular topic that will be
investigated (i.e. topics are not known to the system in advance).
The
goal of ImageCLEFphoto 2006 is: given a multilingual statement describing a
user information need, find as many relevant images as possible from the
IAPR
TC-12 photographic collection.
This contains 20,000 photographs on a variety of
topics and from worldwide locations collected by Michael Grubinger and
viventura (a travel organisation). All
photographs contain structured metadata (including a description and location)
in both English and German.
Any method can be used to
retrieve relevant documents and we encourage the use of both text and
content-based retrieval methods. One of the main interests of the
ImageCLEFphoto ad-hoc task is investigating various methods of query
translation and how features derived from the image captions and the images
themselves can be combined to enhance cross-language image
retrieval in this kind of domain. A further aspect of the task this year is to
investigate how the vocabulary mismatch between captions and queries can
be bridged. There are three sets of topics for the 2006 ad-hoc task:
- Standard ad-hoc topics which typically
require use of both textual and visual information.
- Geographically-orientated topics which require textual and visual
information, along with spatial awareness (these are part of the
GeoCLEF 2006 task).
- Visual topics which are intended to evaluate
purely CBIR approaches on the IAPR TC-12 collection.
You can experiment with any set of
topics, but we would ask (if possible) that you also provide submissions
for the standard ad-hoc set to enable us to compare systems. This is an
ImageCLEF task.
|
| Data |
The image
collection of the IAPR TC-12 Benchmark consists of 20,000 still natural
images (plus 20,000 corresponding thumbnails) taken at locations around the
world and covering an assorted cross-section of subjects. This
includes pictures of different sports and actions, photographs of people,
animals, cities, landscapes and many other aspects of contemporary
life.
Each image is associated with a text caption in
two languages (English and German). These annotations are stored in a
semi-structured format including an image title, the location at which the
photograph was taken, the name of the photographer and a description of the
contents of the image (as determined by the photographer).
 |
<DOC>
<DOCNO>annotations/00/60.eng</DOCNO>
<TITLE>Palma </TITLE>
<DESCRIPTION>two lane street with large shops on the right and smaller shops on the left; people are walking on the sidewalk, some are crossing the street; cars are parked along the left side of the street as well; </DESCRIPTION>
<NOTES>The main shopping street in Paraguay; </NOTES>
<LOCATION>Asunción, Paraguay </LOCATION>
<DATE>March 2002 </DATE>
<IMAGE>images/00/60.jpg </IMAGE>
<THUMBNAIL>thumbnails/00/60.jpg </THUMBNAIL>
</DOC>
|
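The annotation record above can be read with a few lines of code. The following is a minimal Python sketch, assuming each field appears at most once per record and that the tag names are exactly those in the sample. The function name `parse_annotation` and the shortened sample text are illustrative only and are not part of the collection's tooling.

```python
import re

# Field names taken from the sample record shown above.
FIELDS = ("DOCNO", "TITLE", "DESCRIPTION", "NOTES",
          "LOCATION", "DATE", "IMAGE", "THUMBNAIL")


def parse_annotation(text):
    """Return a dict mapping each field found in `text` to its cleaned value."""
    record = {}
    for name in FIELDS:
        match = re.search(rf"<{name}>(.*?)</{name}>", text, flags=re.DOTALL)
        if match:
            # Collapse line breaks and repeated spaces inside the field.
            record[name] = " ".join(match.group(1).split())
    return record


# Shortened copy of the sample record (description truncated, NOTES omitted).
sample = """<DOC> <DOCNO>annotations/00/60.eng</DOCNO>
<TITLE>Palma </TITLE> <DESCRIPTION>two lane street with
large shops on the right; </DESCRIPTION>
<LOCATION>Asunción, Paraguay </LOCATION> <DATE>March 2002 </DATE>
<IMAGE>images/00/60.jpg </IMAGE>
<THUMBNAIL>thumbnails/00/60.jpg </THUMBNAIL> </DOC>"""

record = parse_annotation(sample)
print(record["DOCNO"], "|", record["TITLE"], "|", record["LOCATION"])
```

Fields that are missing from a record are simply absent from the returned dictionary, so callers can decide which caption fields to index.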
Please see
this
page for more information about the IAPR collection as used in ImageCLEF.
Grubinger, M., Clough, P., Müller, H. and Deselaers, T. (2006),
The IAPR TC-12 Benchmark: A New Evaluation Resource for Visual Information
Systems, In Proceedings of International Workshop OntoImage2006
Language Resources for Content-Based Image Retrieval, held in
conjunction with LREC'06, Genoa, Italy (in print).
|
| Relevance Assessments |
In the past,
relevance assessments have been performed by students and staff at the
University of Sheffield and Victoria University. Submissions are used to create
image pools which are judged for relevance by assessors. The pools are assessed
and the end result is a set of relevance assessments called qrels. These are
then used to evaluate system performance and compare submissions. For 2006,
we will be asking for help from participating groups to perform relevance
assessments. For more information about this procedure and the qrels sets see
these papers: "The CLEF 2003 Cross Language Image Retrieval Track" and
"The CLEF 2004 Cross Language Image Retrieval Track". Relevance
assessment for the more general topics is based
entirely on the visual content of images (e.g. aircraft on the ground).
However, certain topics also require the use of the caption to make a confident
decision (e.g. "pictures of beaches in northern Peru"). What constitutes a
relevant image is a subjective decision, but typically a relevant image will
have the subject of the topic in the foreground, will not be too dark or
lacking in contrast, and may have a caption that confirms the judge's decision.
The assessment of images in
ImageCLEFphoto is based on using a ternary classification scheme: (1) relevant,
(2) partially relevant and (3) not relevant. The aim of the ternary scheme is
to help assessors in making their relevance judgements more accurate (e.g. an
image is definitely relevant in some way, but maybe the query object is not
directly in the foreground: it is therefore considered partially relevant).
Various combinations of assessor judgements are used to create the qrels sets
and more information can be found from the links given above.
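The pooling step described above (submissions are merged into per-topic image pools that assessors then judge) can be sketched as follows. This is a minimal Python illustration, not the organisers' procedure: the pool depth, the function name `build_pool` and the run data are all assumptions made for the example.

```python
def build_pool(runs, depth):
    """Merge the top `depth` images of every run into one pool per topic.

    `runs` maps a run name to a dict of topic -> ranked list of image ids.
    Returns a dict of topic -> set of image ids to be judged.
    """
    pool = {}
    for ranking_by_topic in runs.values():
        for topic, ranked in ranking_by_topic.items():
            pool.setdefault(topic, set()).update(ranked[:depth])
    return pool


# Two hypothetical runs for a single topic.
runs = {
    "runA": {1: ["x", "y", "z"]},
    "runB": {1: ["y", "w", "x"]},
}
pool = build_pool(runs, depth=2)  # depth=2 is an arbitrary illustrative value
print(sorted(pool[1]))
```

Each pooled image would then receive one of the three judgements (relevant, partially relevant, not relevant) from each assessor.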
|
| Experiments |
This text relates to the
standard ad-hoc set of topics. Experiments are performed as
follows: participants are given topics, which are used to create queries that
are run against the image collection. This process can be iterated
(e.g. using relevance feedback) until you are satisfied with your
runs. You might try different methods to
increase the number of relevant images in the top N rank positions (e.g. query
expansion). You can repeat these different methods for each query (or
source) and collection (or target) language.
You are free to experiment with whatever methods
you wish for CLIR and image retrieval, e.g. query expansion based on thesaurus
lookup or relevance feedback, indexing and retrieval on only part of the image
caption, different models of retrieval, different translation resources (e.g.
dictionary-based vs. MT), and combining text and content-based methods for
retrieval. Given the many different possible approaches which could be used to
perform the ad-hoc retrieval, rather than list all of these we will ask you to
indicate which of the following applies to each of your runs (we
consider these the "main" dimensions which define the query for this ad hoc
task):
| | This ... | Code | ... or this | Code |
| Query/source language | English | EN | Other (please state) | See below |
| Caption/target language | English | EN | German | DE |
| Query/run type | automatic | AUTO | manual | MAN |
| Feedback/expansion | without | NOFB | with | FB |
| Modality | text | TXT | image | IMG |
Query/source language
Specify the query or source language used in the
run. We would ask you to identify languages using the following language codes:
English (EN), German (DE), French (FR), Portuguese (PT), Spanish (ES), Italian
(IT), Finnish (FI), Japanese (JA), Chinese-simplified (ZHS),
Chinese-traditional (ZHT), Polish (PL), Norwegian (NO), Swedish (SV), Russian
(RU), Danish (DA) and Dutch (NL).
Caption/target language
There are two possible target languages: German
(DE) and English (EN). You can also use both languages in one run (ALL).
Query/run type
We distinguish between manual (MAN) and automatic (AUTO) submissions.
Automatic runs involve no user interaction, whereas manual runs are those
in which a human has been involved in query construction and the iterative
retrieval process, e.g. manual relevance feedback is performed. We encourage
groups who want to investigate manual intervention further to participate in
the interactive evaluation (iCLEF)
task.
Modality
This describes the use of visual (image) or text features in your
submission. If you want to experiment with a purely visual approach, please also
consider using the visual topics. A
text-only run will have modality text (TXT); a purely visual run will
have modality image (IMG) and a combined submission (e.g. initial text
search followed by a possibly combined visual search) will have modality
text+image (TXTIMG).
|
| Submission format and guidelines |
|
What to submit
We would ask that you
submit a baseline run with which to compare your other submissions. There will
be one baseline run for each query language (please include monolingual runs
in your submission: English-English and German-German) and according
to the previous table these would be classified/identified as:
EN-EN-AUTO-NOFB-TXT for the English-English monolingual run
DE-DE-AUTO-NOFB-TXT for the German-German monolingual run
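The run identifiers above follow the order source language, target language, run type, feedback, modality. A small Python sketch of how such an identifier might be assembled and checked is given below. The code sets are copied from the dimension descriptions on this page; the function name `run_identifier` is hypothetical, and the sketch assumes that five-part hyphenated pattern holds for all runs.

```python
# Codes listed in the run-dimension descriptions on this page.
SOURCE_LANGS = {"EN", "DE", "FR", "PT", "ES", "IT", "FI", "JA",
                "ZHS", "ZHT", "PL", "NO", "SV", "RU", "DA", "NL"}
TARGET_LANGS = {"EN", "DE", "ALL"}
RUN_TYPES = {"AUTO", "MAN"}
FEEDBACK = {"NOFB", "FB"}
MODALITIES = {"TXT", "IMG", "TXTIMG"}


def run_identifier(source, target, run_type, feedback, modality):
    """Join the five dimension codes with hyphens after validating each one."""
    parts = [
        (source, SOURCE_LANGS, "source language"),
        (target, TARGET_LANGS, "target language"),
        (run_type, RUN_TYPES, "run type"),
        (feedback, FEEDBACK, "feedback"),
        (modality, MODALITIES, "modality"),
    ]
    for value, allowed, label in parts:
        if value not in allowed:
            raise ValueError(f"unknown {label} code: {value!r}")
    return "-".join(value for value, _, _ in parts)


baseline_en = run_identifier("EN", "EN", "AUTO", "NOFB", "TXT")
baseline_de = run_identifier("DE", "DE", "AUTO", "NOFB", "TXT")
print(baseline_en, baseline_de)
```

Validating the codes before building the name helps keep filenames consistent with the table of dimensions.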
It is extremely important
that we can get a description of the techniques that you use for all runs. This
should be as detailed as possible to ease the comparison or classification of
techniques and results. It would be helpful if you could also indicate which
caption fields you indexed (e.g. all fields, or all fields except the
description). The final proceedings will be published in Springer Lecture Notes
in Computer Science. It is probably easier if you use the
LNCS
templates for the submission of your results.
Submission format
For this task,
participants are required to submit ranked lists of (up to) the top 1000 images
ranked in descending order of similarity (i.e. the most similar nearest the top
of the list). Participants can submit (via email) as many system runs as they
require, but should indicate their best run for each language as we can only
guarantee evaluation of these. The format of submissions for this ad-hoc
task can be found here and the filenames
should distinguish different types of submission according to the table above.
Please note that there
should be at least 1 document entry in your results for each topic (i.e. if
your system returns no results for a query then insert a dummy entry, e.g. 25 1
annotations/16/16019 0 4238 xyzT10af5 ). The reason for this is to make sure
that all systems are compared with the same number of topics and relevant
documents.
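The rules above (at most 1000 images per topic, descending similarity, and a dummy entry for topics with no results) can be sketched in Python as follows. The column layout is inferred from the dummy entry shown above and is an assumption: topic number, a constant `1`, image identifier, rank starting at 0, score, run tag. The linked format description remains the authoritative source, and the function name `format_run` is hypothetical.

```python
def format_run(results, topics, run_tag,
               dummy_doc="annotations/16/16019", max_rank=1000):
    """Return result lines for every topic, padding empty topics with a dummy.

    `results` maps a topic number to a list of (image id, score) pairs.
    Column order is an assumption inferred from the dummy entry above.
    """
    lines = []
    for topic in topics:
        ranked = sorted(results.get(topic, []),
                        key=lambda pair: pair[1], reverse=True)[:max_rank]
        if not ranked:
            # Guarantee at least one entry per topic.
            ranked = [(dummy_doc, 0.0)]
        for rank, (docno, score) in enumerate(ranked):
            lines.append(f"{topic} 1 {docno} {rank} {score:g} {run_tag}")
    return lines


# Hypothetical scores: topic 1 has two results, topic 2 has none.
results = {1: [("annotations/00/60", 0.4), ("annotations/00/61", 0.9)]}
lines = format_run(results, topics=[1, 2], run_tag="demoRun")
print("\n".join(lines))
```

Passing the full topic list separately from the results is what allows the function to detect topics for which the system returned nothing.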
Ranked lists from
participants for this ad hoc task will be evaluated using trec_eval, which
reports recall and precision at various cut-off levels plus single-value
summaries derived from precision and recall, i.e. mean average precision and
R-precision. We also plan to evaluate submissions based on Precision at 20
(P20) and arithmetic and geometric means.
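For readers unfamiliar with these measures, the following Python sketch shows how they are commonly computed for a single topic and then averaged. It is not trec_eval and may differ from it in details. Taking the means over per-topic average precision, and the small `floor` used to avoid a logarithm of zero in the geometric mean, are assumptions made for this illustration.

```python
import math


def precision_at(ranked, relevant, k):
    """Fraction of the top k positions occupied by relevant images."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant image occurs."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for position, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant)


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant images."""
    return precision_at(ranked, relevant, len(relevant)) if relevant else 0.0


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values, floor=1e-5):
    # `floor` is an assumed guard so that a zero score does not break the log.
    return math.exp(sum(math.log(max(v, floor)) for v in values) / len(values))


# Toy ranking for one topic with two relevant images.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
print(average_precision(ranked, relevant),
      r_precision(ranked, relevant),
      precision_at(ranked, relevant, 20))
```

The geometric mean gives more weight than the arithmetic mean to topics on which a system performs poorly, which is the usual reason for reporting both.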
|
| Provided Data and Systems |
Training data
As this is a new collection, we do not currently have any training
data available. We suggest you create your own topics, similar to those
provided, and generate relevance assessments for them.
CBIR Systems
To enable participation in the ad hoc task by
those without access to their own CBIR system, we suggest using the
GIFT/Viper image retrieval
system or the
FIRE
image retrieval system. For more information about using the GIFT/Viper
system in ImageCLEFphoto please contact
Henning
Müller. For more information about
the FIRE system please contact
Thomas
Deselaers. |
| Important Dates |
Data Release: 15 March 2006
Topic Release: 30 April 2006
Submission of Runs: Monday 19 June 2006 |
|