
LHI Image Database for the ground truth of vision

 

The machine vision community widely recognizes the importance of building a database of 'ground truth' annotations, parsed by humans, for a wide variety of natural images. Our goal with this project is to build the largest annotated image database in the world. Since specifying detailed localized image regions and semantics is a time-consuming task, we hope to do it in a way that serves diverse training and evaluation endeavors. To this end, we put great effort into initializing our dataset so that it accommodates applications ranging from high-level (e.g. scene properties and hierarchical object labeling) to mid/low-level (e.g. sketch primitives, contours and layers), mainly following the Open Ground Truth Initiative meeting held at Lotus Hill Institute in Sept. 2005. Our dataset will distinguish itself from other competing efforts (see, for example, the MIT-CSAIL Database [Torralba et al.], the UA (Arizona) localized semantics dataset [Barnard et al.], the Berkeley Segmentation Dataset [Martin and Fowlkes] and the CalTech 101 database [Fei-Fei et al.]) through the following features. First, it is the largest and most general-purpose dataset (the proposed collection will contain 1,000,000 natural images of various topics, taken by multiple cameras at different resolutions). Second, every image will be decomposed hierarchically for both parsing (syntax) and understanding (semantic/functional). Last, it will be the first dataset that records sketches (2D, 2.1D and 2.5D sketches, 3D geometrical information) together with images.

 

 

As of Feb. 2, 2007, our database contains 3,927,130 POs and 636,748 images and video frames, and the numbers are growing every day. As the following inventory illustrates, a wide range of images has been annotated. You can see examples by clicking the corresponding button.

scene aerial image face text other generic object video

 
