DOWARC: a domain ontology for web archiving and web preservation

This is a domain ontology created to support the semantic modelling of WARC files. RDF representations of web archiving data objects can play a critical role in supporting sustainable versioning practices in web archiving and web preservation.


Version
Current version, draft status; 2025-03-29
Creators
Pallotto Strickland, Manuela; Storrar, Tom


Classes: 45
RDF Properties: 0
Object Properties: 2
Datatype Properties: 0
Named Individuals: 50

Classes

CrawlLog

https://github.com/DOWARC/dowarc#CrawlLog

has super-classes
Entity

Crawling

https://github.com/DOWARC/dowarc#Crawling

The act of creating an archive through the process of capturing data into a WARC file.

has super-classes
WARCevent

Derivative

https://github.com/DOWARC/dowarc#Derivative

An object created as a derivative of a web archiving data entity. During the derivation process, the information contained in the original object may be modified; however, some of the original data must be maintained at a representational and/or content level. An example of a Derivative is a list of URLs extracted from a WARC file. This class must be used only when describing web archiving data objects. Subclass of prov:Entity.

has super-classes
Entity
is in domain of
identifiedBy

Extracting

https://github.com/DOWARC/dowarc#Extracting

The act of extracting data from a WARC file and/or a CDX file.

has super-classes
WARCevent

HardwareAgent

https://github.com/DOWARC/dowarc#HardwareAgent

has super-classes
WebArchivingAgent

Identifier

https://github.com/DOWARC/dowarc#Identifier

An unambiguous reference (character string or numeric) to a data object.

has super-classes
Entity
is in range of
identifiedBy

Indexing

https://github.com/DOWARC/dowarc#Indexing

The creation of an index file (CDX or CDXJ) for a WARC file. Source: https://pywb.readthedocs.io/en/master/manual/indexing.html

has super-classes
WARCevent

OrganizationAgent

https://github.com/DOWARC/dowarc#OrganizationAgent

has super-classes
Organization
WebArchivingAgent

PersonAgent

https://github.com/DOWARC/dowarc#PersonAgent

has super-classes
Person
WebArchivingAgent

PreservationEvent

https://github.com/DOWARC/dowarc#PreservationEvent

An action undertaken, or an activity occurring, within or outside the repository that influences its capability to preserve web archiving data objects. Subclass of premis:Event.

has super-classes
Event

PreservationObject

https://github.com/DOWARC/dowarc#PreservationObject

Web archiving data object subject to digital preservation. Subclass of premis:Object.

has super-classes
Object

Publishing

https://github.com/DOWARC/dowarc#Publishing

The act of making a WARC file, or resources within a WARC file, available for use.

has super-classes
WARCevent

QA

https://github.com/DOWARC/dowarc#QA

The act of checking the quality of a web archive using a combination of data-driven and manual checks. This process includes patching, during which missing content is added to the archive to improve its completeness or quality.

has super-classes
WARCevent

Replaying

https://github.com/DOWARC/dowarc#Replaying

Retrieval and rendering of the data stored in a WARC file over HTTP. This may be through a Wayback-style system via a browser, or programmatically replayed in some other way.

has super-classes
WARCevent

SoftwareAgent

https://github.com/DOWARC/dowarc#SoftwareAgent

has super-classes
WebArchivingAgent

WARC-Block-Digest

https://github.com/DOWARC/dowarc#WARC-Block-Digest

An optional parameter indicating the algorithm name and calculated value of a digest applied to the full block of the record. An example is a SHA-1 labelled Base32 ([RFC3548]) value. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)

has super-classes
WARCrecordNamedField
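The labelled Base32 form mentioned above can be sketched with the Python standard library. This is a minimal illustration, assuming SHA-1 and the common `algorithm:value` labelling; the helper name `warc_block_digest` is hypothetical, and the digest covers the full record block exactly as supplied.

```python
import base64
import hashlib


def warc_block_digest(block: bytes) -> str:
    """Return a labelled digest of the full record block, e.g. 'sha1:<base32>'."""
    raw = hashlib.sha1(block).digest()  # 20 bytes = 160 bits
    # 160 bits encode to exactly 32 Base32 characters, so no '=' padding appears.
    return "sha1:" + base64.b32encode(raw).decode("ascii")


# A hypothetical application/http block: HTTP headers plus entity-body.
block = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"
block_digest = warc_block_digest(block)
```

Because the block digest is computed over everything in the block, the HTTP headers contribute to the value; this is what distinguishes it from WARC-Payload-Digest.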

WARC-Concurrent-To

https://github.com/DOWARC/dowarc#WARC-Concurrent-To

The WARC-Record-IDs of any records created as part of the same capture event as the current record. A capture event comprises the information automatically gathered by a retrieval against a single target-URI; for example, it might be represented by a "response" or "revisit" record plus its associated "request" record. This field may be used to associate records of types "request", "response", "resource", "metadata", and "revisit" with one another when they arise from a single capture event. (When so used, any WARC-Concurrent-To association shall be considered bidirectional even if the header only appears on one record.) As an exception to the general rule, several WARC-Concurrent-To fields may be repeated within the same WARC record. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)

has super-classes
WARCrecordNamedField
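Because a WARC-Concurrent-To association counts in both directions even when only one record declares it, a reader has to symmetrise the links before grouping records into capture events. The sketch below does that with a plain adjacency map. It additionally assumes that linked records chain transitively into one capture event, which is an interpretation rather than a rule stated above; the function name and the record IDs are hypothetical.

```python
def concurrent_groups(records):
    """Group WARC-Record-IDs linked by WARC-Concurrent-To into capture events.

    `records` maps each WARC-Record-ID to the list of WARC-Concurrent-To
    values in that record's header (the field may repeat). Links are made
    bidirectional, and linked records are assumed to chain transitively.
    """
    graph = {}
    for rec_id, targets in records.items():
        graph.setdefault(rec_id, set())
        for target in targets:
            # The association holds both ways even if only one record declares it.
            graph[rec_id].add(target)
            graph.setdefault(target, set()).add(rec_id)

    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # depth-first walk over one connected component
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(graph[node] - seen)
        groups.append(group)
    return groups


records = {
    "<urn:uuid:response-1>": [],
    "<urn:uuid:request-1>": ["<urn:uuid:response-1>"],
    "<urn:uuid:metadata-1>": ["<urn:uuid:response-1>"],
    "<urn:uuid:response-2>": [],
}
events = concurrent_groups(records)
```

Here the response record declares no links itself, yet it still ends up in the same group as its request and metadata records.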

WARC-Date

https://github.com/DOWARC/dowarc#WARC-Date

A 14-digit UTC timestamp formatted according to YYYY-MM-DDThh:mm:ssZ, described in the W3C profile of ISO8601 [W3CDTF]. The timestamp shall represent the instant that data capture for record creation began. Multiple records written as part of a single capture event (see section 5.7 of the specification, WARC-Concurrent-To) use the same WARC-Date, even though the times of their writing will not be exactly synchronized. All records have a WARC-Date field. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)

has super-classes
WARCrecordNamedField

WARC-Filename

https://github.com/DOWARC/dowarc#WARC-Filename

The filename containing the current "warcinfo" record. The WARC-Filename field might be used in "warcinfo" type records and is not used for other record types. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)

has super-classes
WARCrecordNamedField

WARC-IP-Address

https://github.com/DOWARC/dowarc#WARC-IP-Address

The numeric Internet address contacted to retrieve any included content. An IPv4 address is written as a "dotted quad"; an IPv6 address is written as per [RFC1884]. For an HTTP retrieval, this is the IP address used at retrieval time corresponding to the hostname in the record's target-URI. The WARC-IP-Address field is used on "response", "resource", "request", "metadata", and "revisit" records, but is not used on "warcinfo", "conversion" or "continuation" records. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)

has super-classes
WARCrecordNamedField

WARC-Identified-Payload-Type

https://github.com/DOWARC/dowarc#WARC-Identified-Payload-Type

The content-type of the record's payload as determined by an independent check. The WARC-Identified-Payload-Type field might be used on WARC records with a well-defined payload and is not used on records without a well-defined payload. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)

has super-classes
WARCrecordNamedField

WARC-Payload-Digest

https://github.com/DOWARC/dowarc#WARC-Payload-Digest

An optional parameter indicating the algorithm name and calculated value of a digest applied to the payload referred to or contained by the record, which is not necessarily equivalent to the record block. The payload of an application/http block is its "entity-body" (per [RFC2616]). In contrast to WARC-Block-Digest, the WARC-Payload-Digest field may also be used for data not actually present in the current record block, for example when a block is left off in accordance with a "revisit" profile (see "revisit"), or when a record is segmented (the WARC-Payload-Digest recorded in the first segment of a segmented record shall be the digest of the payload of the logical record). The WARC-Payload-Digest field might be used on WARC records with a well-defined payload and is not used on records without a well-defined payload. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)

has super-classes
WARCrecordNamedField
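For an application/http block, the payload digest is taken over the entity-body only, so the HTTP headers must be split off first. The following is a simplified sketch: it assumes the entity-body is everything after the first CRLF CRLF and does not undo any transfer- or content-coding (for example chunked encoding), which real tools may handle differently. The helper name is hypothetical.

```python
import base64
import hashlib


def http_payload_digest(http_block: bytes) -> str:
    """Return a 'sha1:<base32>' digest of the entity-body of an application/http block.

    Simplifying assumptions: the entity-body is everything after the first
    CRLF CRLF, and no transfer- or content-coding is undone.
    """
    _headers, sep, body = http_block.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("end of HTTP headers (CRLF CRLF) not found")
    digest = hashlib.sha1(body).digest()
    return "sha1:" + base64.b32encode(digest).decode("ascii")


block = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"
payload_digest = http_payload_digest(block)
```

Two responses with identical bodies but different headers yield the same payload digest, which is what makes this field useful for detecting unchanged content in "revisit" records.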

WARC-Profile

https://github.com/DOWARC/dowarc#WARC-Profile

A URI signifying the kind of analysis and handling applied in a "revisit" record. (Like an XML namespace, the URI may, but need not, return human-readable or machine-readable documentation.) If reading software does not recognize the given URI as a supported kind of handling, it does not attempt to interpret the associated record block. The section "revisit" defines two initial profile options for the WARC-Profile header for "revisit" records. The WARC-Profile field is mandatory on "revisit" type records and undefined for other record types. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)

has super-classes
WARCrecordNamedField

WARC-Record-ID

https://github.com/DOWARC/dowarc#WARC-Record-ID

An identifier assigned to the current record that is globally unique for its period of intended use. No identifier scheme is mandated by this specification, but each record-id shall be a legal URI and clearly indicate a documented and registered scheme to which it conforms (e.g., via a URI scheme prefix such as "http:" or "urn:"). All records have a WARC-Record-ID field. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)

has super-classes
WARCrecordNamedField
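Since no identifier scheme is mandated, writers are free to choose one; a random UUID in the registered urn:uuid scheme is a common choice. A minimal sketch, assuming that scheme (the function name is hypothetical); in WARC headers the URI is written inside angle brackets.

```python
import uuid


def new_warc_record_id() -> str:
    """Mint a WARC-Record-ID value using the urn:uuid scheme (one common choice)."""
    # WARC headers write the URI inside angle brackets, e.g. <urn:uuid:...>.
    return f"<urn:uuid:{uuid.uuid4()}>"


record_id = new_warc_record_id()
```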

WARC-Refers-To

https://github.com/DOWARC/dowarc#WARC-Refers-To

The WARC-Record-ID of a single record for which the present record holds additional content. The WARC-Refers-To field might be used to associate a "metadata" record to another record it describes. The WARC-Refers-To field might also be used to associate a record of type "revisit" or "conversion" with the preceding record which helped determine the present record content. The WARC-Refers-To field is not used in "warcinfo", "response", "resource", "request", and "continuation" records. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)

has super-classes
WARCrecordNamedField

WARC-Segment-Number

https://github.com/DOWARC/dowarc#WARC-Segment-Number

Reports the current record's relative ordering in a sequence of segmented records. In the first segment of any record that is completed in one or more later "continuation" WARC records, this parameter is mandatory. Its value there is "1". In a "continuation" record, this parameter is also mandatory. Its value is the sequence number of the current segment in the logical whole record, increasing by 1 in each next segment. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)

has super-classes
WARCrecordNamedField

WARC-Segment-Origin-ID

https://github.com/DOWARC/dowarc#WARC-Segment-Origin-ID

Identifies the starting record in a series of segmented records whose content blocks are reassembled to obtain a logically complete content block. This field is mandatory on all "continuation" records, and is not used in other records. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)

has super-classes
WARCrecordNamedField

WARC-Segment-Total-Length

https://github.com/DOWARC/dowarc#WARC-Segment-Total-Length

In the final record of a segmented series, reports the total length of all segment content blocks when concatenated together. This field is mandatory on the last "continuation" record of a series, and is not used elsewhere. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)

has super-classes
WARCrecordNamedField
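The three segmentation fields (WARC-Segment-Number, WARC-Segment-Origin-ID and WARC-Segment-Total-Length) together allow a reader to check and reassemble a logical record. The sketch below works on already-parsed records represented as dicts; the dict keys and the function name are hypothetical, and it validates only the constraints described above: numbering from 1 without gaps, continuations pointing back to the first segment, and a matching total length.

```python
def reassemble_segments(segments):
    """Concatenate the content blocks of a segmented record.

    Each item is a dict with hypothetical keys: 'number' (WARC-Segment-Number),
    'block' (content bytes), 'id' (WARC-Record-ID, first segment only),
    'origin' (WARC-Segment-Origin-ID, continuations only) and 'total_length'
    (WARC-Segment-Total-Length, last continuation only).
    """
    if not segments:
        raise ValueError("no segments given")
    ordered = sorted(segments, key=lambda s: s["number"])
    numbers = [s["number"] for s in ordered]
    if numbers != list(range(1, len(ordered) + 1)):
        raise ValueError(f"segment numbers must run 1..n without gaps: {numbers}")
    first_id = ordered[0].get("id")
    for seg in ordered[1:]:
        if seg.get("origin") != first_id:
            raise ValueError("continuation does not point back to the first segment")
    data = b"".join(s["block"] for s in ordered)
    declared = ordered[-1].get("total_length")
    if declared is not None and declared != len(data):
        raise ValueError(f"declared total length {declared} != {len(data)}")
    return data


segments = [
    {"number": 2, "block": b"world", "origin": "<urn:uuid:seg-1>", "total_length": 10},
    {"number": 1, "block": b"hello", "id": "<urn:uuid:seg-1>"},
]
content = reassemble_segments(segments)
```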

WARC-Target-URI

https://github.com/DOWARC/dowarc#WARC-Target-URI

The original URI whose capture gave rise to the information content in this record. In the context of web harvesting, this is the URI that was the target of a crawler’s retrieval request. For a ‘revisit’ record, it is the URI that was the target of a retrieval request. Indirectly, such as for a ‘metadata’, or ‘conversion’ record, it is a copy of the WARC-Target-URI appearing in the original record to which the newer record pertains. The URI in this value shall be properly escaped according to [RFC3986] and written with no internal whitespace. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)

has super-classes
WARCrecordNamedField

WARC-Truncated

https://github.com/DOWARC/dowarc#WARC-Truncated

For practical reasons, writers of the WARC format may place limits on the time or storage allocated to archiving a single resource. As a result, only a truncated portion of the original resource may be available for saving into a WARC record. Any record might indicate that truncation of its content block has occurred and give the reason with a "WARC-Truncated"field. For example, if the capture of what appeared to be a multi-gigabyte resource was cut short after a transfer time limit was reached, the partial resource could be saved to a WARC record with this field. The WARC-Truncated field might be used on any WARC record. The WARC field Content-Length still reports the actual truncated size of the record block. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)

has super-classes
WARCrecordNamedField

WARC-Type

https://github.com/DOWARC/dowarc#WARC-Type

The type of WARC record: one of "warcinfo", "response", "resource", "request", "metadata", "revisit", "conversion", or "continuation". Other types of WARC records may be defined in extensions of the core format. Types are further described in WARC Record Types. All records have a WARC-Type field. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)

has super-classes
WARCrecordNamedField

WARC-Warcinfo-ID

https://github.com/DOWARC/dowarc#WARC-Warcinfo-ID

When present, indicates the WARC-Record-ID of the associated "warcinfo" record for this record. Typically, the Warcinfo-ID parameter is used when the context of the applicable "warcinfo" record is unavailable, such as after distributing single records into separate WARC files. WARC writing applications (such as web crawlers) might choose to always record this parameter. The WARC-Warcinfo-ID field value overrides any association with a previously occurring (in the WARC) "warcinfo" record, thus providing a way to protect the true association when records are combined from different WARCs. The WARC-Warcinfo-ID field might be used in any record type except "warcinfo". (Source: ISO 28500; https://iipc.github.io/warc-specifications/)

has super-classes
WARCrecordNamedField
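The override rule can be read as: use the explicit WARC-Warcinfo-ID if present, otherwise fall back to the most recent "warcinfo" record seen earlier in the same file. A minimal sketch of that resolution, assuming records are given as header dicts in file order (the function name and IDs are hypothetical):

```python
def resolve_warcinfo(headers):
    """Map each non-warcinfo record to the WARC-Record-ID of its warcinfo record.

    `headers` is a list of header dicts in file order. An explicit
    WARC-Warcinfo-ID wins; otherwise the most recent preceding "warcinfo"
    record in the same file is used (None if there is none).
    """
    current = None
    links = {}
    for header in headers:
        record_id = header["WARC-Record-ID"]
        if header["WARC-Type"] == "warcinfo":
            current = record_id  # later records inherit this by position
            continue
        links[record_id] = header.get("WARC-Warcinfo-ID", current)
    return links


headers = [
    {"WARC-Type": "warcinfo", "WARC-Record-ID": "<urn:uuid:info-a>"},
    {"WARC-Type": "response", "WARC-Record-ID": "<urn:uuid:r1>"},
    {"WARC-Type": "response", "WARC-Record-ID": "<urn:uuid:r2>",
     "WARC-Warcinfo-ID": "<urn:uuid:info-b>"},  # e.g. copied in from another WARC
]
links = resolve_warcinfo(headers)
```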

WARCcdxField

https://github.com/DOWARC/dowarc#WARCcdxField

The fields contained within a CDX or CDXJ file. Source: https://iipc.github.io/warc-specifications/specifications/cdx-format/cdx-2015/

has super-classes
Entity
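A classic space-separated CDX file starts with a header line whose letter codes name the fields of every following line. The sketch below parses such a file using the common 11-field header ` CDX N b a m s k r M S V g` and maps each letter to one of the DOWARC named individuals listed under Named Individuals. That letter-to-individual mapping is an assumption made for illustration, not something the ontology defines, and the parser does not handle JSON-based CDXJ lines. The sample line and its checksum are hypothetical.

```python
# Assumed mapping from CDX legend letters to DOWARC named individuals
# (covers only the letters of the common 11-field header).
CDX_FIELDS = {
    "N": "MassagedURL",
    "b": "Date",
    "a": "OriginalURL",
    "m": "MimeType",
    "s": "ResponseCode",
    "k": "NewStyleChecksum",
    "r": "Redirect",
    "M": "MetaTagsAIF",
    "S": "CompressedRecordSize",
    "V": "CompressedARCFileOffset",
    "g": "FileName",
}


def parse_cdx(lines):
    """Yield one dict per CDX line, keyed by DOWARC individual names; '-' becomes None."""
    it = iter(lines)
    header = next(it).split()
    if not header or header[0] != "CDX":
        raise ValueError("expected a ' CDX <letters>' header line")
    letters = header[1:]
    for line in it:
        if not line.strip():
            continue  # skip blank lines
        values = line.split()
        if len(values) != len(letters):
            raise ValueError(f"expected {len(letters)} fields, got {len(values)}")
        yield {
            CDX_FIELDS.get(letter, letter): (None if value == "-" else value)
            for letter, value in zip(letters, values)
        }


cdx_lines = [
    " CDX N b a m s k r M S V g",
    "com,example)/ 20250329120000 http://example.com/ text/html 200 "
    "EXAMPLECHECKSUM - - 1043 0 example.warc.gz",
]
rows = list(parse_cdx(cdx_lines))
```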

WARCcdxFile

https://github.com/DOWARC/dowarc#WARCcdxFile

An index (normally a CDX or CDXJ) file of a WARC file. Source: https://pywb.readthedocs.io/en/master/manual/indexing.html

has super-classes
Aggregation
Entity

WARCcoll

https://github.com/DOWARC/dowarc#WARCcoll

A collection of WARC records aggregating all the resources harvested by web crawling activities, which, taken together, provide an exhaustive representation of a live web resource (e.g., a web site, a web page). Subclass of ore:Aggregation.

has super-classes
Aggregation
Entity
is in domain of
identifiedBy
is in range of
identifies

WARCevent

https://github.com/DOWARC/dowarc#WARCevent

An activity which relates to or involves creating, indexing, modifying or replaying a WARC file.

has sub-classes
Crawling
Extracting
Indexing
Publishing
QA
Replaying
has super-classes
Activity

WARCfile

https://github.com/DOWARC/dowarc#WARCfile

A container file that aggregates a sequence of structured information chunks (WARC records), storing data and metadata. WARC files are the output of automated web crawling activities. A WARC file can contain one or more digital objects with multiple media types (i.e., MIME types). Subclass of ore:Aggregation.

has super-classes
Aggregation
Entity
is in domain of
identifiedBy
is in range of
identifies

WARCgraph

https://github.com/DOWARC/dowarc#WARCgraph

An RDF graph that represents and describes the relationships between the WARC file and the resources it aggregates, as well as the relationships between the aggregated resources. A WARCgraph can be used to represent the relationship between the resources stored in a WARC collection (WARCcoll) and the web page(s) or web site(s) from which the resources were harvested. In this respect, a WARCgraph can provide a representation of the live web object being archived. A WARCgraph can be a subgraph of another WARCgraph (hasPart/isPartOf).

has super-classes
ResourceMap
Graph
Entity
is in domain of
identifiedBy

WARCrecord

https://github.com/DOWARC/dowarc#WARCrecord

A WARC record is a component of a WARC file. A WARC record includes data retrieved from a source on the live World Wide Web, together with synthetic (meta)data providing information about the retrieved content. A WARC record has a header and a content block carrying the payload for the live web data object that was harvested over HTTP. A WARC record is both an aggregated resource (a component of a WARC file) and an aggregation (it includes sub-elements of its own).

has super-classes
AggregatedResource
Aggregation
Entity
is in domain of
identifiedBy
is in range of
identifies

WARCrecordContentBlock

https://github.com/DOWARC/dowarc#WARCrecordContentBlock

Part (zero or more octets) of a WARC record that follows the header and that forms the main body of a WARC record. (Source: https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.0/#terms-and-definitions)

has super-classes
WARCrecordElement

WARCrecordElement

https://github.com/DOWARC/dowarc#WARCrecordElement

A resource aggregated in a WARC record.

WARCrecordHeader

https://github.com/DOWARC/dowarc#WARCrecordHeader

Beginning of a WARC record, consisting of one first line declaring the record to be in the WARC format with a given version number, followed by lines of named fields up to a blank line. (Source: https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.0/#terms-and-definitions)

has super-classes
WARCrecordElement

WARCrecordNamedField

https://github.com/DOWARC/dowarc#WARCrecordNamedField

Set of elements consisting of a name, a colon, and a value, with long values continued on indented lines. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)
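A WARC record header is a version line followed by named fields up to a blank line, and a long value may continue on lines that start with whitespace. The sketch below parses such a header from a string, keeping fields as an ordered list of (name, value) pairs so that repeatable fields such as WARC-Concurrent-To are preserved. The function name and the `X-Note` field are hypothetical, and field-name case folding is not attempted.

```python
def parse_warc_header(text: str):
    """Parse a WARC record header into (version_line, [(name, value), ...]).

    Continuation lines (starting with a space or tab) are appended to the
    previous field's value; parsing stops at the first blank line.
    """
    lines = text.split("\r\n")
    version = lines[0]
    if not version.startswith("WARC/"):
        raise ValueError("header must start with a WARC/<version> line")
    fields = []
    for line in lines[1:]:
        if line == "":
            break  # a blank line ends the header
        if line[0] in " \t" and fields:
            name, value = fields[-1]
            fields[-1] = (name, value + " " + line.strip())
        else:
            name, _, value = line.partition(":")
            fields.append((name.strip(), value.strip()))
    return version, fields


header_text = (
    "WARC/1.0\r\n"
    "WARC-Type: response\r\n"
    "WARC-Concurrent-To: <urn:uuid:a>\r\n"
    "WARC-Concurrent-To: <urn:uuid:b>\r\n"
    "X-Note: a long value\r\n"
    "  continued here\r\n"
    "\r\n"
)
version, fields = parse_warc_header(header_text)
```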

WARCrecordPayload

https://github.com/DOWARC/dowarc#WARCrecordPayload

Data object referred to, or contained by a WARC record as a meaningful subset of the content block. (Source: https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.0/#terms-and-definitions)

has super-classes
WARCrecordElement

WebArchivingAgent

https://github.com/DOWARC/dowarc#WebArchivingAgent

Subclass of "foaf:Agent" and "prov:Agent", specified in a web archiving environment. "Class: foaf:Agent. An agent (eg. person, group, software or physical artifact)." (Source: http://xmlns.com/foaf/spec/#term_Agent) - "Class: prov:Agent. An agent is something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent's activity." (Source: https://www.w3.org/TR/prov-o/#Agent)

Object Properties

identifiedBy

https://github.com/DOWARC/dowarc#identifiedBy

identifies

https://github.com/DOWARC/dowarc#identifies

Named Individuals

ArcDocumentLength

https://github.com/DOWARC/dowarc#ArcDocumentLength

CanonizedFrame

https://github.com/DOWARC/dowarc#CanonizedFrame

CanonizedHost

https://github.com/DOWARC/dowarc#CanonizedHost

CanonizedImage

https://github.com/DOWARC/dowarc#CanonizedImage

CanonizedJumpPoint

https://github.com/DOWARC/dowarc#CanonizedJumpPoint

CanonizedLink

https://github.com/DOWARC/dowarc#CanonizedLink

CanonizedPath

https://github.com/DOWARC/dowarc#CanonizedPath

CanonizedRedirect

https://github.com/DOWARC/dowarc#CanonizedRedirect

CanonizedUrl

https://github.com/DOWARC/dowarc#CanonizedUrl

Canonicalization is applied to URIs to remove trivial differences that do not indicate that the URIs reference different resources. Examples include removing session ID parameters and unnecessary port declarations (e.g., :80 when crawling over HTTP). (Source: https://nlevitt.github.io/warc-specifications/specifications/cdx-format/openwayback-cdxj/#appendix-a--canonicalization)
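The two examples given above (dropping session ID parameters and default ports) can be sketched with `urllib.parse`. This is an illustrative subset only: the set of session parameter names is a hypothetical rule set, the sketch also lowercases the host and drops any fragment or user info, and it is not the canonicalizer used by any particular indexer. Re-encoding the query with `urlencode` may also change percent-encoding details.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical rule set: parameter names treated as session IDs.
SESSION_PARAMS = {"jsessionid", "phpsessid", "sid", "sessionid"}
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(url: str) -> str:
    """Apply a few illustrative canonicalization rules to a URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Drop the port when it is the scheme's default (e.g. :80 for http).
    port = parts.port
    netloc = host if port is None or port == DEFAULT_PORTS.get(scheme) else f"{host}:{port}"
    # Drop query parameters that look like session IDs; keep the rest in order.
    query = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in SESSION_PARAMS
    ]
    return urlunsplit((scheme, netloc, parts.path or "/", urlencode(query), ""))


canonical = canonicalize("HTTP://Example.COM:80/page?id=3&PHPSESSID=abc")
```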

CanonizedUrlFoundInScript

https://github.com/DOWARC/dowarc#CanonizedUrlFoundInScript

CanonizedUrlInOtherHrefTags

https://github.com/DOWARC/dowarc#CanonizedUrlInOtherHrefTags

CanonizedUrlInOtherSrcTags

https://github.com/DOWARC/dowarc#CanonizedUrlInOtherSrcTags

CdxFrame

https://github.com/DOWARC/dowarc#CdxFrame

CdxImage

https://github.com/DOWARC/dowarc#CdxImage

CdxLink

https://github.com/DOWARC/dowarc#CdxLink

CdxTitle

https://github.com/DOWARC/dowarc#CdxTitle

CompressedARCFileOffset

https://github.com/DOWARC/dowarc#CompressedARCFileOffset

The "offset" is the number of bytes from the start of the WARC file at which the relevant record begins. (Source: https://nlevitt.github.io/warc-specifications/specifications/cdx-format/openwayback-cdxj/)

CompressedDatFileOffset

https://github.com/DOWARC/dowarc#CompressedDatFileOffset

CompressedRecordSize

https://github.com/DOWARC/dowarc#CompressedRecordSize

The number of octets in the compressed record in the WARC file. (Source: https://metacpan.org/pod/WARC::Index::File::CDX)
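Together, the compressed offset and the compressed record size let a reader pull one record out of a .warc.gz file without decompressing the rest, because each record is stored as its own gzip member. A minimal sketch, assuming that per-record gzip layout and working on in-memory bytes (with a real file one would `seek(offset)` and `read(length)`); the function name and the abbreviated record contents are hypothetical.

```python
import gzip
import zlib


def read_record_at(data: bytes, offset: int, length: int) -> bytes:
    """Decompress one gzip member, given a CDX-style offset and compressed size."""
    member = data[offset:offset + length]
    # wbits=31 (16 + 15) tells zlib to expect a gzip wrapper around the deflate stream.
    d = zlib.decompressobj(wbits=31)
    out = d.decompress(member)
    if not d.eof:
        raise ValueError("offset/length do not cover one complete gzip member")
    return out


# Two abbreviated "records" compressed as separate gzip members, as in a .warc.gz file.
first = gzip.compress(b"first record")
second = gzip.compress(b"second record")
warc_gz = first + second
record = read_record_at(warc_gz, offset=len(first), length=len(second))
```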

Content-TypeType

https://github.com/DOWARC/dowarc#Content-TypeType

Date

https://github.com/DOWARC/dowarc#Date

A timestamp for this record, stored as the 14 digits from the text form of a WARC::Date without the associated marker characters, i.e. YYYYmmddHHMMSS instead of YYYY-mm-ddTHH:MM:SSZ. (Source: https://metacpan.org/pod/WARC::Index::File::CDX)
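Converting between the WARC-Date form and the 14-digit CDX form only removes or restores the marker characters; both are UTC, so no time-zone arithmetic is needed. A minimal sketch with hypothetical helper names, assuming second precision (fractional seconds, if present in a WARC-Date, are not handled here):

```python
from datetime import datetime

WARC_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # e.g. 2025-03-29T12:34:56Z (UTC)
CDX_DATE_FORMAT = "%Y%m%d%H%M%S"         # e.g. 20250329123456 (UTC)


def warc_date_to_cdx(warc_date: str) -> str:
    """Strip the marker characters: YYYY-mm-ddTHH:MM:SSZ -> YYYYmmddHHMMSS."""
    return datetime.strptime(warc_date, WARC_DATE_FORMAT).strftime(CDX_DATE_FORMAT)


def cdx_to_warc_date(cdx_date: str) -> str:
    """Restore the marker characters: YYYYmmddHHMMSS -> YYYY-mm-ddTHH:MM:SSZ."""
    return datetime.strptime(cdx_date, CDX_DATE_FORMAT).strftime(WARC_DATE_FORMAT)
```

Parsing through `datetime` rather than deleting characters with string operations also rejects malformed or out-of-range timestamps.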

FileName

https://github.com/DOWARC/dowarc#FileName

The name of the WARC volume containing this record. This value is assumed to be relative to the directory containing the CDX file and to always be a Unix-style filename, regardless of the local file name conventions. (Source: https://metacpan.org/pod/WARC::Index::File::CDX)

IP

https://github.com/DOWARC/dowarc#IP

LanguageString

https://github.com/DOWARC/dowarc#LanguageString

MassagedURL

https://github.com/DOWARC/dowarc#MassagedURL

The URL used in the request, as for the "a" field, but with the hostname translated to SURT form and the scheme component removed so that the value starts with the TLD. (Source: https://metacpan.org/pod/WARC::Index::File::CDX)
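The SURT translation reverses the host labels so that the key starts with the TLD, which makes lexically sorted CDX files group URLs by domain. The sketch below is a simplified version: it lowercases the host, drops a leading "www." (a common indexer convention, assumed here rather than stated above), drops the scheme, port and fragment, and applies no further canonicalization. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def massaged_url(url: str) -> str:
    """Simplified SURT-style key: reversed host labels, comma-joined, then ')' and the path."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # assumed convention; not every indexer does this
    key = ",".join(reversed(host.split("."))) + ")" + (parts.path or "/")
    if parts.query:
        key += "?" + parts.query
    return key


key = massaged_url("http://www.example.com/a/b?x=1")
```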

MetaTagsAIF

https://github.com/DOWARC/dowarc#MetaTagsAIF

This field seems to be very common in CDX files, but no values have been observed and this field appears to be completely undocumented. Ignored on read, but written as "-" if included when building an index. (Source: https://metacpan.org/pod/WARC::Index::File::CDX)

MimeType

https://github.com/DOWARC/dowarc#MimeType

The MIME type reported in the Content-Type header of the response. Written as "-" if the record does not contain an HTTP response with an entity body. (Source: https://metacpan.org/pod/WARC::Index::File::CDX)

MimeTypeOfOriginalDocument

https://github.com/DOWARC/dowarc#MimeTypeOfOriginalDocument

Content-Type of the resource record, as recorded in Alexa-made DAT files. (Source: https://archive.org/web/researcher/dat_file_format.php)

MultiColumnLanguageDescription

https://github.com/DOWARC/dowarc#MultiColumnLanguageDescription

NewStyleChecksum

https://github.com/DOWARC/dowarc#NewStyleChecksum

Typically the base32-encoded SHA1 digest of the response payload. This value is copied from the WARC-Payload-Digest header of a record if available, otherwise "-" is written. (Source: https://metacpan.org/pod/WARC::Index::File::CDX)
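Deriving this CDX value from a record's headers is a copy with a fallback. A minimal sketch, assuming the header uses the `sha1:<base32>` labelling and that the CDX field stores the bare Base32 value without the algorithm label (the function name is hypothetical):

```python
def cdx_checksum(headers: dict) -> str:
    """Derive the CDX checksum from a record's WARC-Payload-Digest header.

    The 'algorithm:' label is stripped; '-' is returned when the header is absent.
    """
    digest = headers.get("WARC-Payload-Digest")
    if not digest:
        return "-"
    _algorithm, sep, value = digest.partition(":")
    return value if sep else digest


checksum = cdx_checksum({"WARC-Payload-Digest": "sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ"})
```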

NewsGroup

https://github.com/DOWARC/dowarc#NewsGroup

OldStyleChecksum

https://github.com/DOWARC/dowarc#OldStyleChecksum

OriginalHost

https://github.com/DOWARC/dowarc#OriginalHost

OriginalJumpPoint

https://github.com/DOWARC/dowarc#OriginalJumpPoint

OriginalPath

https://github.com/DOWARC/dowarc#OriginalPath

OriginalURL

https://github.com/DOWARC/dowarc#OriginalURL

The URL that was used in the request that produced this response. The url_prefix search key matches a prefix of this value. This value is copied from the WARC-Target-URI header and is written as "-" if the record does not have that header. (Source: https://metacpan.org/pod/WARC::Index::File::CDX)

Port

https://github.com/DOWARC/dowarc#Port

Redirect

https://github.com/DOWARC/dowarc#Redirect

The contents of the Location header of the response, URL-escaped. Some implementations do not properly set this field. Written as "-" if not present or if the record does not contain an HTTP response. (Source: https://metacpan.org/pod/WARC::Index::File::CDX)

ResponseCode

https://github.com/DOWARC/dowarc#ResponseCode

The HTTP status code used in the response. Written as "-" if the record does not contain an HTTP response. (Source: https://metacpan.org/pod/WARC::Index::File::CDX)

RulespaceCategory

https://github.com/DOWARC/dowarc#RulespaceCategory

UncompressedArcFileOffset

https://github.com/DOWARC/dowarc#UncompressedArcFileOffset

UncompressedDatFileOffset

https://github.com/DOWARC/dowarc#UncompressedDatFileOffset

Uniqueness

https://github.com/DOWARC/dowarc#Uniqueness

UrlFoundInScript

https://github.com/DOWARC/dowarc#UrlFoundInScript

UrlInOtherHrefTages

https://github.com/DOWARC/dowarc#UrlInOtherHrefTages

UrlInOtherSrcTags

https://github.com/DOWARC/dowarc#UrlInOtherSrcTags

WARC-Payload-Digest

https://github.com/DOWARC/dowarc#WARC-Payload-Digest

An optional parameter indicating the algorithm name and calculated value of a digest applied to the payload referred to or contained by the record, which is not necessarily equivalent to the record block. The payload of an application/http block is its "entity-body" (per [RFC2616]). In contrast to WARC-Block-Digest, the WARC-Payload-Digest field may also be used for data not actually present in the current record block, for example when a block is left off in accordance with a "revisit" profile (see "revisit"), or when a record is segmented (the WARC-Payload-Digest recorded in the first segment of a segmented record shall be the digest of the payload of the logical record). The WARC-Payload-Digest field might be used on WARC records with a well-defined payload and is not used on records without a well-defined payload. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)

has super-classes
WARCrecordNamedField

WARC-Refers-To-Date

https://github.com/DOWARC/dowarc#WARC-Refers-To-Date

The WARC-Refers-To-Date field might be used in "revisit" records and is not used in "warcinfo", "response", "metadata", "conversion", "resource", "request", and "continuation" records. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)

WARC-Refers-To-Target-URI

https://github.com/DOWARC/dowarc#WARC-Refers-To-Target-URI

The WARC-Refers-To-Target-URI field might be used in "revisit" records and is not used in "warcinfo", "response", "metadata", "conversion", "resource", "request", and "continuation" records. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)

WARC-Target-URI

https://github.com/DOWARC/dowarc#WARC-Target-URI

The original URI whose capture gave rise to the information content in this record. In the context of web harvesting, this is the URI that was the target of a crawler’s retrieval request. For a ‘revisit’ record, it is the URI that was the target of a retrieval request. Indirectly, such as for a ‘metadata’, or ‘conversion’ record, it is a copy of the WARC-Target-URI appearing in the original record to which the newer record pertains. The URI in this value shall be properly escaped according to [RFC3986] and written with no internal whitespace. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)

has super-classes
WARCrecordNamedField