System documentation of the GNU Image-Finding Tool

CAcIFFileSystem Class Reference

An accessor to an inverted file.

#include <CAcIFFileSystem.h>

Inheritance diagram for CAcIFFileSystem:
CAcInvertedFile CAcURL2FTS CAccessor CAccessorImplementation CAccessor

Public Member Functions

bool operator() () const
 for testing if the inverted file is correctly constructed
 
 CAcIFFileSystem (const CXMLElement &inCollectionElement)
 This opens an existing inverted file, and then inits this structure.
 
bool init (bool)
 called by constructors
 
 ~CAcIFFileSystem ()
 Destructor.
 
string IDToURL (TID inID) const
 Translate a DocumentID to a URL (for output)
 
virtual pair< bool, TID > URLToID (const string &inURL) const
 Translate a URL to its document ID.
 
void getAllIDs (list< TID > &) const
 List of the IDs of all documents present in the inverted file.
 
void getAllAccessorElements (list< CAccessorElement > &) const
 List of triplets (ID,imageURL,thumbnailURL) of all the documents present in the inverted file.
 
void getRandomIDs (list< TID > &, list< TID >::size_type) const
 get a given number of random C-AccessorElement-s
 
void getRandomAccessorElements (list< CAccessorElement > &outResult, list< CAccessorElement >::size_type inSize) const
 For drawing random sets.
 
int size () const
 The number of images in this accessor.
 
TID getMaximumFeatureID () const
 This is interesting for browsing.
 
list< TID > * getAllFeatureIDs () const
 Getting a list of all features contained in this.
 
virtual pair< bool, CAccessorElement > IDToAccessorElement (TID inID) const
 Translate a DocumentID to an accessor Element.
 
 operator bool () const
 is this well constructed?
 
The proper inverted file access
CDocumentFrequencyList * FeatureToList (TFeatureID) const
 List of documents containing the feature.
 
CDocumentFrequencyList * URLToFeatureList (string inURL) const
 List of features contained by a document.
 
CDocumentFrequencyList * DIDToFeatureList (TID inDID) const
 List of features contained by a document with ID inDID.
 
Accessing information about features
double FeatureToCollectionFrequency (TFeatureID) const
 Collection frequency for a given feature.
 
unsigned int getFeatureDescription (TID inFeatureID) const
 What kind of feature is the feature with ID inFeatureID?
 
Accessing additional document information
double DIDToMaxDocumentFrequency (TID) const
 returns the maximum document frequency for one document ID
 
double DIDToDFSquareSum (TID) const
 Returns the document-frequency square sum for a given document ID.
 
double DIDToSquareDFLogICFSum (TID) const
 Returns this function for a given document ID.
 
bool generateInvertedFile ()
 Generating an inverted File, if there is none.
 
bool newGenerateInvertedFile ()
 Generating an inverted File, if there is none.
 
bool checkConsistency ()
 Check the consistency of the inverted file system accessed by this accessor.
 
bool findWithinStream (TID inFeatureID, TID inDocumentID, double inDocumentFrequency) const
 Is the Document with inDocumentID contained in the document frequency list of the feature inFeatureID and is the associated document frequency the same?
 
- Public Member Functions inherited from CAcInvertedFile
bool operator() () const
 for testing if the inverted file is correctly constructed
 
 CAcInvertedFile (const CXMLElement &inCollectionElement)
 This opens an existing inverted file, and then inits this structure.
 
bool init (bool)
 called by constructors
 
 ~CAcInvertedFile ()
 Destructor.
 
string IDToURL (TID inID) const
 Translate a DocumentID to a URL (for output)
 
TID URLToID (const string &inURL) const
 Translate a URL to its document ID.
 
TID getMaximumFeatureID () const
 This is interesting for browsing.
 
list< TID > * getAllFeatureIDs () const
 Getting a list of all features contained in this.
 
CDocumentFrequencyList * FeatureToList (TFeatureID) const
 List of documents containing the feature.
 
CDocumentFrequencyList * URLToFeatureList (string inURL) const
 List of features contained by a document.
 
CDocumentFrequencyList * DIDToFeatureList (TID inDID) const
 List of features contained by a document with ID inDID.
 
double FeatureToCollectionFrequency (TFeatureID) const
 Collection frequency for a given feature.
 
unsigned int getFeatureDescription (TID inFeatureID) const
 What kind of feature is the feature with ID inFeatureID?
 
double DIDToMaxDocumentFrequency (TID) const
 returns the maximum document frequency for one document ID
 
double DIDToDFSquareSum (TID) const
 Returns the document-frequency square sum for a given document ID.
 
double DIDToSquareDFLogICFSum (TID) const
 Returns this function for a given document ID.
 
bool generateInvertedFile ()
 Generating an inverted File, if there is none.
 
bool newGenerateInvertedFile ()
 Generating an inverted File, if there is none.
 
bool checkConsistency ()
 Check the consistency of the inverted file system accessed by this accessor.
 
bool findWithinStream (TID inFeatureID, TID inDocumentID, double inDocumentFrequency) const
 Is the Document with inDocumentID contained in the document frequency list of the feature inFeatureID and is the associated document frequency the same?
 
- Public Member Functions inherited from CAcURL2FTS
const string & getURLToFeatureFileName () const
 gives back the content of mURLToFeatureFileName
 
 CAcURL2FTS (const CXMLElement &inContentElement)
 Constructor: slurp in a url2fts file and fill the maps.
 
pair< bool, string > URLToFFN (const string &inURL) const
 Gives the feature file name which corresponds to a given URL. Return value: a pair of bool (does the feature file exist?) and string (the feature file name).
 
pair< bool, string > IDToFFN (TID inID) const
 Gives the feature file name which corresponds to a given ID. Return value: a pair of bool (does the feature file exist?) and string (the feature file name).
 
- Public Member Functions inherited from CAccessor
virtual ~CAccessor ()
 virtual destructor for clean destruction
 
virtual CXMLElement * prepareDatabase ()
 If a new collection is created during runtime, this function prepares the indexing structures such that they are able to accept new objects.
 
virtual bool isPreparedDatabase () const
 Is the database accessed by this accessor prepared? In other words: is there an index structure to access?
 

Protected Types

typedef HASH_MAP< TID, streampos > CIDToOffset
 map from feature id to the offset for this feature
 
- Protected Types inherited from CAcInvertedFile
typedef hash_map< TID, unsigned int > CIDToOffset
 map from feature id to the offset for this feature
 

Protected Member Functions

void writeOffsetFileElement (TID inFeatureID, streampos inPosition, ostream &inOpenOffsetFile)
 add a pair of FeatureID,Offset to the open offset file (helper function for inverted file construction)
 
CDocumentFrequencyList * getFeatureFile (string inFileName) const
 loads a *.fts file.
 
- Protected Member Functions inherited from CAcInvertedFile
void writeOffsetFileElement (TID inFeatureID, int inPosition, ostream &inOpenOffsetFile)
 add a pair of FeatureID,Offset to the open offset file (helper function for inverted file construction)
 
CDocumentFrequencyList * getFeatureFile (string inFileName) const
 loads a *.fts file.
 
- Protected Member Functions inherited from CAccessor
virtual void dummy () const
 without this function things like upcasting etc. More...
 

Protected Attributes

CMutex mMutex
 the mutex for multi threading
 
CSelfDestroyPointer< CAcURL2FTS > mURL2FTS
 In order to have just one parent, I have to limit on single inheritance.
 
TID mMaximumFeatureID
 the maximum feature ID arising in this file
 
string mInvertedFileBuffer
 A buffer, if the inverted file is to be held in ram.
 
string mTemporaryIndexingFileBase
 Some place for putting temporary indexing data.
 
CSelfDestroyPointer< istream > mInvertedFile
 The inverted file.
 
ifstream mOffsetFile
 Feature -> Offset in inverted file.
 
ifstream mFeatureDescriptionFile
 File of feature descriptions.
 
string mInvertedFileName
 Name of the inverted file.
 
string mOffsetFileName
 Name of the Offset file.
 
string mFeatureDescriptionFileName
 Name for the file with the feature description.
 
CIDToOffset mIDToOffset
 map from feature id to the offset for this feature
 
HASH_MAP< TID, double > mFeatureToCollectionFrequency
 map from feature to the collection frequency
 
for fast access...
HASH_MAP< TID, unsigned int > mFeatureDescription
 map from the feature ID to the feature description
 
CADIHash mDocumentInformation
 additional information about the document, e.g. the Euclidean length of the feature list.
 
- Protected Attributes inherited from CAcInvertedFile
TID mMaximumFeatureID
 the maximum feature ID arising in this file
 
CArraySelfDestroyPointer< char > mInvertedFileBuffer
 A buffer, if the inverted file is to be held in ram.
 
CSelfDestroyPointer< istream > mInvertedFile
 The inverted file.
 
ifstream mOffsetFile
 Feature -> Offset in inverted file.
 
ifstream mFeatureDescriptionFile
 File of feature descriptions.
 
string mInvertedFileName
 Name of the inverted file.
 
string mOffsetFileName
 Name of the Offset file.
 
string mFeatureDescriptionFileName
 Name for the file with the feature description.
 
CIDToOffset mIDToOffset
 map from feature id to the offset for this feature
 
hash_map< TID, double > mFeatureToCollectionFrequency
 map from feature to the collection frequency
 
hash_map< TID, unsigned int > mFeatureDescription
 map from the feature ID to the feature description
 
CADIHash mDocumentInformation
 additional information about the document, e.g. the Euclidean length of the feature list.
 
- Protected Attributes inherited from CAcURL2FTS
TID mID
 the ID of the next element
 
string mURLPrefix
 the url-prefix for the image list
 
string mThumbnailURLPrefix
 the thumbnail-url-prefix for the image list
 
CMutex mMutexURL2FTS
 the mutex for multithreading the name is intended to be unique and immune against inheritance...
 
string_string_map mURLToFFN
 map from the url of an image to the name of the feature file for this image
 
TID_string_map mIDToFFN
 map from the id of an image to the name of the feature file for this image
 
ifstream mURLToFeatureFile
 URL -> FeatureFileName.
 
string mURLToFeatureFileName
 Name of the file that contains pairs of URL and the Feature file that belongs to the URL.
 
- Protected Attributes inherited from CAccessorImplementation
string_TID_map mURLToID
 map the url of an image to the id of this image
 
TID_CAccessorElement_map mIDToAccessorElement
 maps the ID of an image to the URL of this image
 

Detailed Description

An accessor to an inverted file.

This access is done "by hand".

For a long time we wanted to move to memory mapped files (like SWISH++) but currently I think this is not the best idea.

Constructor & Destructor Documentation

CAcIFFileSystem::CAcIFFileSystem ( const CXMLElement & inCollectionElement)

This opens an existing inverted file, and then inits this structure.

After that it is fully usable.

As a parameter it takes an XMLElement which contains a "collection" element and its content.

If the attribute cui-generate-inverted-file is true, then a new inverted file will be generated using the parameters given in inCollectionElement. You will NOT be able to use *this afterwards.

Like every accessor, this accessor takes a <collection> MRML element as input (see CXMLElement for how to access the attributes of this element). Currently this accessor understands the following attributes:

cui-base-dir: the directory containing the following files
cui-inverted-file-location: the location of the inverted file
cui-offset-file-location: a file containing offsets into the inverted file
cui-feature-file-location: the location of the "url2fts" file which translates URLs to feature file names
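As an illustration of how the attributes above might be combined, here is a minimal sketch. It does not use the real CXMLElement interface: AttributeMap and resolveLocation are hypothetical stand-ins, and the sketch assumes that the file locations are relative to cui-base-dir, which the attribute descriptions suggest but do not state outright.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the attributes of a <collection> element.
// The real accessor reads these from a CXMLElement.
typedef std::map<std::string, std::string> AttributeMap;

// Builds "base directory + file location" for one of the cui-*-location
// attributes. Returns (false, "") if either attribute is missing.
// Assumption: locations are relative to cui-base-dir.
std::pair<bool, std::string> resolveLocation(const AttributeMap& inAttributes,
                                             const std::string& inAttributeName) {
    const AttributeMap::const_iterator base = inAttributes.find("cui-base-dir");
    const AttributeMap::const_iterator file = inAttributes.find(inAttributeName);
    if (base == inAttributes.end() || file == inAttributes.end()) {
        return std::make_pair(false, std::string());
    }
    std::string path = base->second;
    // Insert a separator only if the base directory does not end with one.
    if (!path.empty() && path[path.size() - 1] != '/') {
        path += '/';
    }
    path += file->second;
    return std::make_pair(true, path);
}
```

The pair< bool, string > return type mirrors the convention used by URLToFFN and IDToFFN in this class hierarchy.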

Member Function Documentation

bool CAcIFFileSystem::checkConsistency ( )
virtual

Check the consistency of the inverted file system accessed by this accessor.

Implements CAcInvertedFile.

bool CAcIFFileSystem::findWithinStream ( TID  inFeatureID,
TID  inDocumentID,
double  inDocumentFrequency 
) const

Is the Document with inDocumentID contained in the document frequency list of the feature inFeatureID and is the associated document frequency the same?

Parameters
inFeature<idthe
bool CAcIFFileSystem::generateInvertedFile ( )
virtual

Generating an inverted File, if there is none.

Fast but stupid in-memory method. This method is very fast if the whole inverted file (and a bit more) can be kept in memory at runtime. If this is not the case, extensive swapping is the result, virtually halting the inverted file creation.

Implements CAcInvertedFile.
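The in-memory idea can be sketched as follows. This is not the actual implementation: the TID typedef, FrequencyList, and invertInMemory are assumptions made for illustration, and the real method reads feature files and writes the result to disk.

```cpp
#include <map>
#include <utility>
#include <vector>

typedef unsigned int TID;  // assumption: stand-in for the library's TID
// A list of (ID, document frequency) pairs.
typedef std::vector<std::pair<TID, double> > FrequencyList;

// Input:  document ID -> list of (feature ID, frequency).
// Output: feature ID  -> list of (document ID, frequency).
// Everything is held in memory at once, which is why this approach
// breaks down when the inverted file does not fit into RAM.
std::map<TID, FrequencyList> invertInMemory(
        const std::map<TID, FrequencyList>& inDocuments) {
    std::map<TID, FrequencyList> inverted;
    for (const auto& document : inDocuments) {
        for (const auto& feature : document.second) {
            inverted[feature.first].push_back(
                std::make_pair(document.first, feature.second));
        }
    }
    return inverted;
}
```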

list<TID>* CAcIFFileSystem::getAllFeatureIDs ( ) const
virtual

Getting a list of all features contained in this.

This function is necessary, because in the present system only about 50 percent of the features are really used.

A feature is considered used if it arises in mIDToOffset.

Implements CAcInvertedFile.
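A minimal sketch of the "used feature" rule: the IDs are simply the keys of the offset map. IDToOffset and usedFeatureIDs are hypothetical names, TID is an assumed typedef, and sorting the result is a choice made here for deterministic output, not something the documentation promises.

```cpp
#include <list>
#include <unordered_map>

typedef unsigned int TID;  // assumption: stand-in for the library's TID
// Stand-in for CIDToOffset: feature ID -> offset into the inverted file.
typedef std::unordered_map<TID, long> IDToOffset;

// A feature counts as used if it has an entry in the offset map,
// so the list of used features is the list of map keys.
std::list<TID> usedFeatureIDs(const IDToOffset& inIDToOffset) {
    std::list<TID> ids;
    for (const auto& entry : inIDToOffset) {
        ids.push_back(entry.first);
    }
    ids.sort();  // hash maps have no defined order; sort for stable output
    return ids;
}
```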

CDocumentFrequencyList* CAcIFFileSystem::getFeatureFile ( string  inFileName) const
protected

loads a *.fts file.

and returns the feature list

void CAcIFFileSystem::getRandomAccessorElements ( list< CAccessorElement > &  outResult,
list< CAccessorElement >::size_type  inSize 
) const
virtual

For drawing random sets.

Why is this part of a CAccessorImplementation? The way the accessor is organised might influence the way random sets can be drawn. At present everything happens in RAM, but we do not want to be tied to that.

Parameters
inoutResultList: the list which will contain the result
inSize: the desired size of the inoutResultList

Implements CAccessor.
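For the all-in-RAM case, drawing a random set could look like the sketch below. It works on plain IDs rather than CAccessorElement objects, and drawRandomIDs, the TID typedef, and the explicit random generator parameter are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <random>
#include <vector>

typedef unsigned int TID;  // assumption: stand-in for the library's TID

// Appends up to inSize distinct IDs, drawn uniformly at random from
// inAll, to outResult. If fewer than inSize IDs exist, all are returned.
void drawRandomIDs(const std::vector<TID>& inAll,
                   std::list<TID>& outResult,
                   std::list<TID>::size_type inSize,
                   std::mt19937& inGenerator) {
    std::vector<TID> shuffled(inAll);  // work on a copy, keep input intact
    std::shuffle(shuffled.begin(), shuffled.end(), inGenerator);
    const std::size_t count = std::min<std::size_t>(inSize, shuffled.size());
    outResult.insert(outResult.end(),
                     shuffled.begin(), shuffled.begin() + count);
}
```

Shuffling a copy is the simplest way to guarantee distinct results; an accessor that does not hold all IDs in memory would need a different strategy, which is the point the description above makes.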

void CAcIFFileSystem::getRandomIDs ( list< TID > &  ,
list< TID >::size_type   
) const
virtual

get a given number of random C-AccessorElement-s

Parameters
inoutResultList: the list which will contain the result
inSize: the desired size of the inoutResultList

Implements CAccessor.

virtual pair<bool,CAccessorElement> CAcIFFileSystem::IDToAccessorElement ( TID  inID) const
virtual

Translate a DocumentID to an accessor Element.

Implements CAccessor.

bool CAcIFFileSystem::newGenerateInvertedFile ( )

Generating an inverted File, if there is none.

Employing the two-way-merge method described in "Managing Gigabytes", chapter 5.2, "Sort-based inversion" (page 181).
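The general shape of sort-based inversion with two-way merging can be sketched as below. This is an illustration of the technique, not the library's code: runs are kept in vectors where a real implementation would use temporary files, and Posting, Run, makeSortedRuns, and mergeRuns are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <tuple>
#include <vector>

typedef unsigned int TID;  // assumption: stand-in for the library's TID
// (feature ID, document ID, document frequency); tuples compare
// lexicographically, so sorting groups postings by feature.
typedef std::tuple<TID, TID, double> Posting;
typedef std::vector<Posting> Run;

// Step 1: cut the postings into runs of at most inRunSize elements
// (the part that fits into memory) and sort each run on its own.
std::vector<Run> makeSortedRuns(const std::vector<Posting>& inPostings,
                                std::size_t inRunSize) {
    if (inRunSize == 0) inRunSize = 1;  // guard against an endless loop
    std::vector<Run> runs;
    for (std::size_t begin = 0; begin < inPostings.size(); begin += inRunSize) {
        const std::size_t end = std::min(begin + inRunSize, inPostings.size());
        Run run(inPostings.begin() + begin, inPostings.begin() + end);
        std::sort(run.begin(), run.end());
        runs.push_back(run);
    }
    return runs;
}

// Step 2: merge neighbouring runs pairwise until one sorted run is left.
Run mergeRuns(std::vector<Run> inRuns) {
    while (inRuns.size() > 1) {
        std::vector<Run> next;
        for (std::size_t i = 0; i + 1 < inRuns.size(); i += 2) {
            Run merged;
            std::merge(inRuns[i].begin(), inRuns[i].end(),
                       inRuns[i + 1].begin(), inRuns[i + 1].end(),
                       std::back_inserter(merged));
            next.push_back(merged);
        }
        if (inRuns.size() % 2 == 1) next.push_back(inRuns.back());
        inRuns.swap(next);
    }
    return inRuns.empty() ? Run() : inRuns.front();
}
```

Only one run has to be sorted in memory at a time, and each merge pass reads its inputs sequentially, which is what makes the approach usable when the collection is larger than RAM.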

virtual pair<bool,TID> CAcIFFileSystem::URLToID ( const string &  inURL) const
virtual

Translate a URL to its document ID.

Implements CAcInvertedFile.
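The pair< bool, TID > return type suggests a "found flag plus value" convention. The sketch below shows that pattern under the assumption that the bool signals whether the URL is known; lookupID and the plain std::map are hypothetical stand-ins for the accessor and its mURLToID member.

```cpp
#include <map>
#include <string>
#include <utility>

typedef unsigned int TID;  // assumption: stand-in for the library's TID

// Returns (true, id) if inURL is known and (false, 0) otherwise.
// Callers must check .first before trusting .second.
std::pair<bool, TID> lookupID(const std::map<std::string, TID>& inURLToID,
                              const std::string& inURL) {
    const std::map<std::string, TID>::const_iterator found =
        inURLToID.find(inURL);
    if (found == inURLToID.end()) {
        return std::make_pair(false, TID(0));
    }
    return std::make_pair(true, found->second);
}
```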

Member Data Documentation

CADIHash CAcIFFileSystem::mDocumentInformation
protected

Additional information about the document, e.g. the Euclidean length of the feature list.
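As a rough illustration of the kind of per-document values involved, the sketch below computes the maximum frequency, the sum of squared frequencies, and the Euclidean length of a feature list. DocumentInfo, summarize, and euclideanLength are hypothetical names; that these values correspond to what DIDToMaxDocumentFrequency and DIDToDFSquareSum return is an assumption based on their names.

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

typedef unsigned int TID;  // assumption: stand-in for the library's TID
// A document's feature list as (feature ID, document frequency) pairs.
typedef std::vector<std::pair<TID, double> > FeatureList;

// Hypothetical per-document summary record.
struct DocumentInfo {
    double maxFrequency;  // largest frequency in the feature list
    double squareSum;     // sum of squared frequencies
};

DocumentInfo summarize(const FeatureList& inFeatures) {
    DocumentInfo info = {0.0, 0.0};
    for (const auto& feature : inFeatures) {
        info.maxFrequency = std::max(info.maxFrequency, feature.second);
        info.squareSum += feature.second * feature.second;
    }
    return info;
}

// The Euclidean length is the square root of the square sum.
double euclideanLength(const DocumentInfo& inInfo) {
    return std::sqrt(inInfo.squareSum);
}
```

Precomputing such values once per document avoids rescanning the feature list every time a query needs to normalise a score.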

CSelfDestroyPointer<CAcURL2FTS> CAcIFFileSystem::mURL2FTS
protected

In order to have just one parent, I have to limit myself to single inheritance.

I cannot use virtual base classes, because then I cannot downcast.


The documentation for this class was generated from the following file:
