Class DataSource
- Direct Known Subclasses:
ByteArrayDataSource, FileDataSource, ProcessDataSource, ResourceDataSource, URLDataSource
As well as the ability to return a stream, a DataSource may also have a position, which corresponds to the 'ref' or 'frag' part of a URL (the bit after the #). This is an indication of a location in the stream; it is a string, and its interpretation is entirely up to the application (though it may be specified by the documentation of specific DataSource subclasses).
As well as providing the facility for several different objects to get their own copy of the underlying input stream, this class also handles decompression of the stream. Compression types are as understood by the associated Compression class.
For efficiency, a buffer holding the bytes at the start of the stream, called the 'intro buffer', is recorded the first time that the stream is read. This can then be used for magic number queries cheaply, without having to open a new input stream. If the whole input stream is shorter than the intro buffer, the underlying input stream never has to be read again.
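The intro-buffer idea can be illustrated with plain JDK streams. This is a minimal sketch, not the DataSource implementation: the class IntroCache, its methods and its rawOpens counter are hypothetical, and an in-memory byte array stands in for whatever getRawInputStream() would supply.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical sketch: caches the first few bytes of a stream for cheap magic-number checks. */
class IntroCache {
    private final byte[] content;   // stands in for the data behind getRawInputStream()
    private final int introLimit;   // maximum number of bytes to cache
    private byte[] intro;           // filled on first use, then reused
    int rawOpens;                   // counts how often the "expensive" stream is opened

    IntroCache(byte[] content, int introLimit) {
        this.content = content;
        this.introLimit = introLimit;
    }

    private InputStream openRaw() {
        rawOpens++;
        return new ByteArrayInputStream(content);
    }

    /** Reads up to introLimit bytes the first time; later calls return the cached array. */
    byte[] getIntro() throws IOException {
        if (intro == null) {
            try (InputStream in = openRaw()) {
                intro = in.readNBytes(introLimit);   // shorter if the stream is shorter
            }
        }
        return intro;
    }

    /** Magic-number query answered from the cached intro, without reopening the stream. */
    boolean startsWith(byte[] magic) throws IOException {
        byte[] buf = getIntro();
        return buf.length >= magic.length
            && Arrays.equals(Arrays.copyOf(buf, magic.length), magic);
    }
}
```

Repeated queries such as startsWith("SIMPLE") (the start of a FITS header) only ever open the underlying stream once.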
Any implementation of getRawInputStream() which returns different byte sequences on different occasions may lead to unpredictable behaviour from this class.
- Author:
- Mark Taylor (Starlink)
-
Field Summary
- DEFAULT_INTRO_LIMIT
- MARK_WORKAROUND_PROPERTY
Constructor Summary
- DataSource(): Constructs a DataSource with a default size of intro buffer.
- DataSource(int introLimit): Constructs a DataSource with a given size of intro buffer.
Method Summary
- void close(): Closes any open streams owned and not yet dispatched by this DataSource.
- DataSource forceCompression(Compression compress): Returns a DataSource representing the same underlying stream, but with a forced compression mode compress.
- Compression getCompression(): Returns an object which will handle any required decompression for this stream.
- InputStream getHybridInputStream(): Returns an input stream which appears just the same as the one returned by getInputStream(), but only incurs the expense of obtaining an actual input stream (by calling getRawInputStream()) if more bytes are read than the cached magic number.
- InputStream getInputStream(): Returns an InputStream containing the whole of this DataSource.
- static InputStream getInputStream(String location, boolean allowSystem): Returns an input stream based on the given location string.
- byte[] getIntro(): Returns the intro buffer, first reading it if this hasn't been done before.
- int getIntroLimit(): Returns the maximum length of the intro buffer.
- long getLength(): Returns the length of the stream returned by getInputStream in bytes, if known.
- static boolean getMarkWorkaround(): Returns true if we are working around potential bugs in InputStream.mark(int)/InputStream.reset() methods (common, including in J2SE classes).
- String getName(): Returns a name for this source.
- String getPosition(): Returns the position associated with this source.
- protected abstract InputStream getRawInputStream(): Provides a new InputStream for this data source.
- long getRawLength(): Returns the length in bytes of the stream returned by getRawInputStream, if known.
- String getSystemId(): Returns a System ID for this DataSource; this is a string representation of a file name or URL, as used by Source and friends.
- URL getURL(): Returns a URL which corresponds to this data source, if one exists.
- static DataSource makeDataSource(String loc): Attempts to make a source given a string identifying its location as a file, URL or system command output.
- static DataSource makeDataSource(String loc, boolean allowSystem): Attempts to make a source given a string identifying its location as a file, URL or optionally a system command output.
- static DataSource makeDataSource(URL url): Makes a source from a URL.
- void setCompression(Compression compress): Sets the compression to be associated with this data source.
- void setIntroLimit(int limit): Sets the maximum size of the intro buffer to a new value.
- static void setMarkWorkaround(boolean workaround): Sets whether we want to work around bugs in InputStream mark/reset methods.
- void setName(String name): Sets the name of this source.
- void setPosition(String position): Sets the position associated with this source.
- String toString(): Returns a short description of this source (name plus compression type).
-
Field Details
-
DEFAULT_INTRO_LIMIT
public static final int DEFAULT_INTRO_LIMIT
-
MARK_WORKAROUND_PROPERTY
-
-
Constructor Details
-
DataSource
public DataSource(int introLimit)
Constructs a DataSource with a given size of intro buffer.
- Parameters:
introLimit
- the maximum number of bytes in the intro buffer
-
DataSource
public DataSource()
Constructs a DataSource with a default size of intro buffer.
-
-
Method Details
-
getRawInputStream
Provides a new InputStream for this data source. This method should be implemented by subclasses to provide a new InputStream giving the raw content of the source each time it is called. The general contract of this method is that each time it is called it will return a stream with the same content.
- Returns:
- an InputStream containing the data of this source
- Throws:
IOException
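The contract above (a new stream each call, always with the same content) can be sketched without the Starlink library. MiniSource and MiniByteArraySource are hypothetical stand-ins, loosely in the spirit of ByteArrayDataSource, and are not the real classes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for DataSource, reduced to the raw-stream contract. */
abstract class MiniSource {
    /** Must return a new stream, with the same content, on every call. */
    protected abstract InputStream getRawInputStream() throws IOException;

    /** Reads one complete copy of the raw data, closing the stream afterwards. */
    byte[] readAll() throws IOException {
        try (InputStream in = getRawInputStream()) {
            return in.readAllBytes();
        }
    }
}

/** Hypothetical in-memory source. */
class MiniByteArraySource extends MiniSource {
    private final byte[] data;

    MiniByteArraySource(byte[] data) {
        this.data = data.clone();   // defensive copy keeps the content stable
    }

    @Override
    protected InputStream getRawInputStream() {
        // A fresh stream each time, so callers never share a read position.
        return new ByteArrayInputStream(data);
    }
}
```

The defensive copy matters: if the caller could mutate the array after construction, two calls could return different bytes, which is exactly the behaviour the class description warns against.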
-
getURL
Returns a URL which corresponds to this data source, if one exists. A URL.openConnection() method call on the URL returned by this method should provide a stream with the same content as the getRawInputStream() method of this data source. If no such URL exists or is known, then null should be returned.
If this source has a non-null position value, it will be appended to the main part of the URL after a '#' character (as the URL's ref part).
- Returns:
- a URL corresponding to this source, or null
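How a position ends up as the URL's ref part can be illustrated with java.net alone. The helper PositionUrls.withPosition is hypothetical and not part of DataSource; it simply rebuilds the URI with the position as its fragment.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

/** Hypothetical helper: attaches a DataSource-style position to a URL as its ref part. */
class PositionUrls {
    static URL withPosition(URL base, String position)
            throws URISyntaxException, MalformedURLException {
        if (position == null) {
            return base;   // no position: the URL is returned unchanged
        }
        URI u = base.toURI();
        // Rebuild with the position as the fragment (the part after '#');
        // any existing fragment on the base URL is replaced.
        URI withFrag = new URI(u.getScheme(), u.getSchemeSpecificPart(), position);
        return withFrag.toURL();
    }
}
```

For example, a base of file:/data/table.fits with position "2" yields a URL whose getRef() is "2".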
-
getIntroLimit
public int getIntroLimit()
Returns the maximum length of the intro buffer.
- Returns:
- maximum length of the intro buffer
-
setIntroLimit
public void setIntroLimit(int limit)
Sets the maximum size of the intro buffer to a new value. Setting the intro limit to a new value will discard any state which this source has, so for reasons of efficiency it's not a good idea to call this method except immediately after the source has been constructed and before any reads have taken place.
- Parameters:
limit
- the new maximum length of the intro buffer
-
getRawLength
public long getRawLength()
Returns the length in bytes of the stream returned by getRawInputStream, if known. If the length is not known then -1 should be returned. The implementation of this method in DataSource returns -1; subclasses should override it if they can determine their length.
- Returns:
- the length of the raw input stream, or -1
-
getLength
public long getLength()
Returns the length of the stream returned by getInputStream in bytes, if known. A return value of -1 indicates that the length is unknown. The return value of this method may change from -1 to a positive value during the life of this object if the length is discovered along the way.
- Returns:
- the length of the stream in bytes, or -1
-
getName
Returns a name for this source. This name is mainly intended as a label identifying the source for use in informational messages; it is not in general intended to be used to provide an absolute reference to the source. Thus, for instance, if the source references a file, its name might be a relative pathname or simple filename, rather than its absolute pathname. To identify the source absolutely, the getURL() method (or some suitable class-specific method) should be used. If this source has a position, it should probably form part of this name.
- Returns:
- a name
-
setName
Sets the name of this source.
- Parameters:
name
- a name
-
getPosition
Returns the position associated with this source. It is a string giving an indication of the part of the stream which is of interest. Its interpretation is up to the application.
- Returns:
- the position string, or null
-
setPosition
Sets the position associated with this source. It is a string giving an indication of the part of the stream which is of interest. Its interpretation is up to the application.
- Parameters:
position
- the new position (may be null)
-
getSystemId
Returns a System ID for this DataSource; this is a string representation of a file name or URL, as used by Source and friends. The return value may be null if none is known. This does not contain any reference to the position.
- Returns:
- the System ID string for this source, or null
-
getCompression
Returns an object which will handle any required decompression for this stream. A raw data stream is read and its magic number (first few bytes) matched against known patterns to determine if any known compression method is in use. If no known compression is being used, the value Compression.NONE is returned.
- Returns:
- a Compression object encoding this stream
- Throws:
IOException
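Magic-number matching of the kind described above can be sketched in a few lines. The enum SketchCompression is hypothetical and much smaller than the real Compression class: it recognises only the gzip signature (bytes 0x1f 0x8b) and the bzip2 signature ("BZh"), and reports NONE for anything else.

```java
/** Hypothetical, cut-down analogue of Compression: detects a format from leading bytes. */
enum SketchCompression {
    NONE, GZIP, BZIP2;

    static SketchCompression fromMagic(byte[] intro) {
        if (intro.length >= 2
                && (intro[0] & 0xff) == 0x1f && (intro[1] & 0xff) == 0x8b) {
            return GZIP;    // gzip member header (RFC 1952)
        }
        if (intro.length >= 3
                && intro[0] == 'B' && intro[1] == 'Z' && intro[2] == 'h') {
            return BZIP2;   // bzip2 stream header
        }
        return NONE;        // no known signature: treat as uncompressed
    }
}
```

Because only the first few bytes are inspected, such a check can be answered from the intro buffer rather than from a freshly opened stream.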
-
getIntro
Returns the intro buffer, first reading it if this hasn't been done before. The intro buffer will contain the first few bytes of the decompressed stream. The number of bytes it contains (the size of the returned byte[] array) will be the smaller of introLimit and the length of the underlying uncompressed stream.
The returned buffer is the original, not a copy, so its contents must not be changed.
- Returns:
- the first few bytes of the uncompressed stream, up to a limit of introLimit
- Throws:
IOException
-
setCompression
Sets the compression to be associated with this data source. In general it will not be necessary or advisable to call this method, since this object will figure it out using magic numbers of the underlying stream. It can be used if the compression method is known, or to force use of a particular compression; in particular setCompression(Compression.NONE) can be used to force direct examination of the underlying stream without decompression, even if the underlying stream is in fact compressed.
The effects of setting a compression to a mode (other than NONE) which does not match the actual compression mode of the underlying stream are undefined, so this method should be used with care.
- Parameters:
compress
- the compression mode encoding the underlying stream
-
forceCompression
Returns a DataSource representing the same underlying stream, but with a forced compression mode compress. The returned DataSource object may be this object itself, but if this object's compression mode differs from compress, a new one will be created. As with setCompression(uk.ac.starlink.util.Compression), the consequences of using a value of compress other than the correct one are unpredictable (Compression.NONE excepted).
- Parameters:
compress
- the compression mode to be used for the returned data source
- Returns:
- a data source with the same underlying stream as this, but a compression mode given by compress
-
getInputStream
Returns an InputStream containing the whole of this DataSource. If compression is detected in the underlying stream, it will be decompressed. The returned stream should be closed by the user when no longer required.
- Returns:
- an input stream that reads from the beginning of the underlying data source, decompressing it if appropriate
- Throws:
IOException
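Transparent decompression of this sort can be approximated with JDK classes only: peek at the first two bytes through a PushbackInputStream, push them back, and wrap the stream in java.util.zip.GZIPInputStream if the gzip signature is present. AutoGunzip is a hypothetical name, and the sketch handles gzip only, since the JDK has no bzip2 decoder.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.GZIPInputStream;

/** Hypothetical sketch: returns a stream that transparently decompresses gzip input. */
class AutoGunzip {
    static InputStream open(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 2);
        byte[] magic = new byte[2];
        int n = in.readNBytes(magic, 0, 2);   // may be fewer than 2 for tiny inputs
        if (n > 0) {
            in.unread(magic, 0, n);           // put the peeked bytes back
        }
        boolean gzip = n == 2
            && (magic[0] & 0xff) == 0x1f && (magic[1] & 0xff) == 0x8b;
        // Uncompressed data is passed through untouched.
        return gzip ? new GZIPInputStream(in) : in;
    }
}
```

As with getInputStream(), the caller owns the returned stream and should close it when finished.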
-
getHybridInputStream
Returns an input stream which appears just the same as the one returned by getInputStream(), but only incurs the expense of obtaining an actual input stream (by calling getRawInputStream()) if more bytes are read than the cached magic number. This is an efficient way to read if you need an InputStream but may only end up reading the first few bytes of it.
- Returns:
- an input stream that reads from the beginning of the underlying data source, decompressing it if appropriate
- Throws:
IOException
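The behaviour described for getHybridInputStream() can be sketched as an InputStream that serves bytes from the cached intro and only opens the real stream, skipping the bytes it already supplied, once the reader asks for more. HybridStream and its Opener interface are hypothetical, decompression is ignored for brevity, and skipNBytes requires Java 12 or later.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch of a lazily-opened stream backed by a cached intro buffer. */
class HybridStream extends InputStream {
    interface Opener { InputStream open() throws IOException; }

    private final byte[] intro;
    private final boolean introIsWhole;   // true if the data was shorter than the intro limit
    private final Opener opener;
    private int pos;                      // read position within intro
    private InputStream tail;             // opened only when intro is exhausted
    int opens;                            // counts raw opens, for illustration

    HybridStream(byte[] intro, int introLimit, Opener opener) {
        this.intro = intro;
        this.introIsWhole = intro.length < introLimit;
        this.opener = opener;
    }

    @Override
    public int read() throws IOException {
        if (pos < intro.length) {
            return intro[pos++] & 0xff;   // served from the cache, no I/O
        }
        if (introIsWhole) {
            return -1;                    // the intro already held the whole stream
        }
        if (tail == null) {
            opens++;
            tail = opener.open();
            tail.skipNBytes(intro.length);   // jump past what the intro supplied
        }
        return tail.read();
    }

    @Override
    public void close() throws IOException {
        if (tail != null) {
            tail.close();
        }
    }
}
```

A reader that only inspects the first few bytes never triggers opener.open(), which is where the saving comes from.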
-
close
public void close()
Closes any open streams owned and not yet dispatched by this DataSource. Should be called if this object is no longer required, or if it may not be required for some while. Calling this method does not prevent any other method being called on this object in the future. This method throws no checked exceptions; any IOExceptions thrown while closing owned streams are simply discarded.
-
toString
Returns a short description of this source (name plus compression type).
-
makeDataSource
Attempts to make a source given a string identifying its location as a file, URL or system command output. This may be one of the following options:
- filename
- URL
- a string preceded by "<" or followed by "|", giving a shell command line (may not work on all platforms)
If a '#' character exists in the string, text after it will be interpreted as a position value. Otherwise, the position is considered to be null.
Note: this method presents a security risk if the loc string is vulnerable to injection. Consider using the variant method makeDataSource(loc,false) in such cases. This method just calls makeDataSource(loc,true).
- Parameters:
loc
- the location of the data, with optional position
- Returns:
- a DataSource based on the data at loc
- Throws:
IOException
- if loc does not name an existing readable file or valid URL
-
makeDataSource
Attempts to make a source given a string identifying its location as a file, URL or optionally a system command output.
The supplied loc may be one of the following:
- filename
- URL
- only if allowSystem=true: a string preceded by "<" or followed by "|", giving a shell command line (may not work on all platforms)
If a '#' character exists in the string, text after it will be interpreted as a position value. Otherwise, the position is considered to be null.
Note: setting allowSystem=true may introduce a security risk if the loc string is vulnerable to injection.
- Parameters:
loc
- the location of the data, with optional position
allowSystem
- whether to allow system commands using the format above
- Returns:
- a DataSource based on the data at loc
- Throws:
IOException
- if loc does not name an existing readable file or valid URL
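The location syntax above can be sketched as a small parser that classifies the string without opening anything. Everything here is hypothetical: the class LocationSpec and its fields, the choice to split at the first '#', and the assumption that a command-like string is treated as an ordinary filename when allowSystem is false.

```java
/** Hypothetical sketch of the location syntax accepted by makeDataSource(loc, allowSystem). */
final class LocationSpec {
    final String target;      // filename/URL, or a shell command line if isCommand
    final String position;    // text after '#', or null
    final boolean isCommand;

    private LocationSpec(String target, String position, boolean isCommand) {
        this.target = target;
        this.position = position;
        this.isCommand = isCommand;
    }

    static LocationSpec parse(String loc, boolean allowSystem) {
        // Assumption: split at the first '#'; the text after it is the position.
        int hash = loc.indexOf('#');
        String main = hash >= 0 ? loc.substring(0, hash) : loc;
        String position = hash >= 0 ? loc.substring(hash + 1) : null;

        String trimmed = main.trim();
        if (allowSystem && trimmed.startsWith("<")) {
            return new LocationSpec(trimmed.substring(1).trim(), position, true);
        }
        if (allowSystem && trimmed.endsWith("|")) {
            String cmd = trimmed.substring(0, trimmed.length() - 1).trim();
            return new LocationSpec(cmd, position, true);
        }
        // Otherwise a filename or URL; with allowSystem=false nothing is ever executed.
        return new LocationSpec(main, position, false);
    }
}
```

Keeping allowSystem=false for untrusted input means a string beginning with "<" or ending with "|" can never be run as a command in this sketch, which mirrors the security note above.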
-
makeDataSource
Makes a source from a URL. If url is a file-protocol URL referencing an existing file then a FileDataSource will be returned, otherwise it will be a URLDataSource. Under certain circumstances, it may be more efficient to use a FileDataSource than a URLDataSource, which is why this method may be worth using.
- Parameters:
url
- location of the data stream
- Returns:
- data source which returns the data at url
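The file-versus-URL decision can be sketched with java.net and java.io only. SourceKind.kindFor is hypothetical and returns a label instead of constructing a real FileDataSource or URLDataSource.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;

/** Hypothetical sketch of the file-vs-URL decision described for makeDataSource(URL). */
class SourceKind {
    static String kindFor(URL url) {
        if ("file".equals(url.getProtocol())) {
            try {
                File file = new File(url.toURI());   // rejects unusual file: URLs
                if (file.exists()) {
                    return "FileDataSource";         // direct file access
                }
            } catch (URISyntaxException | IllegalArgumentException e) {
                // Not convertible to a local File: fall back to a generic URL source.
            }
        }
        return "URLDataSource";
    }
}
```

Note that no connection is made here; a non-file URL is classified purely by its protocol.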
-
getInputStream
Returns an input stream based on the given location string. The content of the stream may be compressed or uncompressed data; the returned stream will be an uncompressed version. The following options are allowed for the location:
- filename
- URL
- "-" meaning standard input
- only if allowSystem=true: a string preceded by "<" or followed by "|", giving a shell command line (may not work on all platforms)
Note: setting allowSystem=true may introduce a security risk if the location string is vulnerable to injection.
- Parameters:
location
- URL, filename, "cmdline|"/"<cmdline", or "-"
allowSystem
- whether to allow system commands using the format above
- Returns:
- uncompressed stream containing the data at location
- Throws:
FileNotFoundException
- if location cannot be interpreted as a source of bytes
IOException
- if there is an error obtaining the stream
-
getMarkWorkaround
public static boolean getMarkWorkaround()
Returns true if we are working around potential bugs in InputStream.mark(int)/InputStream.reset() methods (common, including in J2SE classes). The return value is dependent on the system property named MARK_WORKAROUND_PROPERTY.
- Returns:
- true iff we are working around mark/reset bugs
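One JDK-only way to avoid depending on a stream's own mark/reset is to route the stream through a BufferedInputStream, whose mark/reset behaviour is well defined, before peeking at its leading bytes. MarkPeek is a hypothetical sketch: it takes the workaround flag as a parameter (a caller might pass DataSource.getMarkWorkaround()) rather than reading the system property, whose name is not reproduced here.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch: peek at leading bytes, optionally distrusting the stream's mark/reset. */
class MarkPeek {
    /** The peeked bytes plus a stream rewound to the first byte. */
    static final class Result {
        final byte[] head;
        final InputStream stream;

        Result(byte[] head, InputStream stream) {
            this.head = head;
            this.stream = stream;
        }
    }

    static Result peek(InputStream in, int n, boolean workaround) throws IOException {
        // Use the stream's own mark/reset only if it claims support and no workaround is wanted.
        InputStream s = (!workaround && in.markSupported())
            ? in
            : new BufferedInputStream(in);
        s.mark(n);
        byte[] head = s.readNBytes(n);   // fewer than n if the stream is short
        s.reset();                       // rewind to the mark
        return new Result(head, s);
    }
}
```

The returned stream, not the original one, must be used for further reading, since the buffered wrapper may have consumed bytes from the underlying stream.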
-
setMarkWorkaround
public static void setMarkWorkaround(boolean workaround)
Sets whether we want to work around bugs in InputStream mark/reset methods.
- Parameters:
workaround
- true to employ the workaround
-