DOWARC: a domain ontology for web archiving and web preservation
DOWARC is a domain ontology created to support the semantic modelling of WARC files. RDF representations of web archiving data objects can play a critical role in addressing sustainable versioning practices in web archiving and web preservation.
- Version
- Current version, draft status; 2025-03-29
- Creators
- Pallotto Strickland, Manuela; Storrar, Tom
Classes: 45; RDF Properties: 0; Object Properties: 2; Datatype Properties: 0; Named Individuals: 50
Classes
CrawlLog
https://github.com/DOWARC/dowarc#CrawlLog
Crawling
https://github.com/DOWARC/dowarc#Crawling
The act of creating an archive through the process of capturing data into a WARC file.
- has super-classes
- WARCevent
Derivative
https://github.com/DOWARC/dowarc#Derivative
An object created as a derivative of a web archiving data entity. During the derivation process, modification of the information contained in the original object is allowed; however, some of the original data must be maintained, at a representational and/or content level. An example of a Derivative is a list of URLs extracted from a WARC file. This class must be used only when describing web archiving data objects. Subclass of prov-o:Entity.
- has super-classes
- Entity
- is in domain of
- identifiedBy
Extracting
https://github.com/DOWARC/dowarc#Extracting
HardwareAgent
https://github.com/DOWARC/dowarc#HardwareAgent
Identifier
https://github.com/DOWARC/dowarc#Identifier
An unambiguous reference (character string or numeric) to a data object.
- has super-classes
- Entity
- is in range of
- identifiedBy
Indexing
https://github.com/DOWARC/dowarc#Indexing
The creation of an index (CDX or CDXJ) file of a WARC file. Source: https://pywb.readthedocs.io/en/master/manual/indexing.html
- has super-classes
- WARCevent
OrganizationAgent
https://github.com/DOWARC/dowarc#OrganizationAgent
PersonAgent
https://github.com/DOWARC/dowarc#PersonAgent
PreservationEvent
https://github.com/DOWARC/dowarc#PreservationEvent
An action undertaken, or an activity occurring, within or outside the repository that influences its capability to preserve web archiving data objects. Subclass of premis:Event.
- has super-classes
- Event
PreservationObject
https://github.com/DOWARC/dowarc#PreservationObject
Web archiving data object subject to digital preservation. Subclass of premis:Object.
- has super-classes
- Object
Publishing
https://github.com/DOWARC/dowarc#Publishing
The act of making a WARC file, or resources within a WARC file, available for use.
- has super-classes
- WARCevent
QA
https://github.com/DOWARC/dowarc#QA
The act of checking the quality of a web archive using a combination of data-driven and manual checks. This process includes the act of patching, during which missing content is added to the archive to improve its completeness or quality.
- has super-classes
- WARCevent
Replaying
https://github.com/DOWARC/dowarc#Replaying
Retrieval and rendering of the data stored in a WARC file over HTTP. This may be through a Wayback-style system via a browser, or programmatically replayed in some other way.
- has super-classes
- WARCevent
SoftwareAgent
https://github.com/DOWARC/dowarc#SoftwareAgent
WARC-Block-Digest
https://github.com/DOWARC/dowarc#WARC-Block-Digest
An optional parameter indicating the algorithm name and calculated value of a digest applied to the full block of the record. An example is a SHA-1 labelled Base32 ([RFC3548]) value. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)
- has super-classes
- WARCrecordNamedField
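As an illustration of the "algorithm name and calculated value" form described for WARC-Block-Digest, the following minimal sketch computes a SHA-1 digest of a record block and labels it with a Base32 value. The function name `labelled_sha1_base32` and the lowercase `sha1:` label are assumptions made for this example; other algorithms and encodings are permitted.

```python
import base64
import hashlib


def labelled_sha1_base32(block: bytes) -> str:
    """Return an 'algorithm:value' digest string for a record block."""
    raw = hashlib.sha1(block).digest()            # 20 raw bytes
    value = base64.b32encode(raw).decode("ascii")  # 32 Base32 characters
    return f"sha1:{value}"


if __name__ == "__main__":
    print(labelled_sha1_base32(b"example record block"))
```

Because a SHA-1 digest is exactly 160 bits, its Base32 form is always 32 characters long and carries no padding.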
WARC-Concurrent-To
https://github.com/DOWARC/dowarc#WARC-Concurrent-To
The WARC-Record-IDs of any records created as part of the same capture event as the current record. A capture event comprises the information automatically gathered by a retrieval against a single target-URI; for example, it might be represented by a "response" or "revisit" record plus its associated "request" record. This field may be used to associate records of types "request", "response", "resource", "metadata", and "revisit" with one another when they arise from a single capture event. (When so used, any WARC-Concurrent-To association shall be considered bidirectional even if the header only appears on one record.) As an exception to the general rule, it is allowed to repeat several WARC-Concurrent-To fields within the same WARC record. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)
- has super-classes
- WARCrecordNamedField
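The bidirectional reading of WARC-Concurrent-To can be sketched as follows: given the associations that records actually declare, build a symmetric lookup so that each record also lists the records that point to it. The function name `concurrent_links` and the example record IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def concurrent_links(declared: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Make declared WARC-Concurrent-To associations symmetric.

    `declared` maps a WARC-Record-ID to the IDs named in that record's
    WARC-Concurrent-To fields (the field may repeat, hence a list).
    """
    linked = defaultdict(set)
    for record_id, targets in declared.items():
        for target in targets:
            linked[record_id].add(target)
            linked[target].add(record_id)  # association holds in both directions
    return {record_id: sorted(ids) for record_id, ids in linked.items()}


if __name__ == "__main__":
    # Only the request declares the association; the response gains it too.
    print(concurrent_links({"urn:uuid:req-1": ["urn:uuid:resp-1"]}))
```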
WARC-Date
https://github.com/DOWARC/dowarc#WARC-Date
A 14-digit UTC timestamp formatted according to YYYY-MM-DDThh:mm:ssZ, described in the W3C profile of ISO8601 [W3CDTF]. The timestamp shall represent the instant that data capture for record creation began. Multiple records written as part of a single capture event (see section 5.7) use the same WARC-Date, even though the times of their writing will not be exactly synchronized. All records have a WARC-Date field. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)
- has super-classes
- WARCrecordNamedField
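A minimal sketch of producing and reading a WARC-Date value in the YYYY-MM-DDThh:mm:ssZ form, using only the Python standard library. The helper names are assumptions for illustration; the sketch expects timezone-aware datetimes and converts them to UTC.

```python
from datetime import datetime, timezone

WARC_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def format_warc_date(moment: datetime) -> str:
    """Format a timezone-aware datetime as a WARC-Date value in UTC."""
    return moment.astimezone(timezone.utc).strftime(WARC_DATE_FORMAT)


def parse_warc_date(value: str) -> datetime:
    """Parse a WARC-Date value into a UTC-aware datetime."""
    return datetime.strptime(value, WARC_DATE_FORMAT).replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    capture_start = datetime(2025, 3, 29, 12, 0, 5, tzinfo=timezone.utc)
    print(format_warc_date(capture_start))
```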
WARC-Filename
https://github.com/DOWARC/dowarc#WARC-Filename
The filename containing the current "warcinfo" record. The WARC-Filename field might be used in "warcinfo" type records and is not used for other record types. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)
- has super-classes
- WARCrecordNamedField
WARC-IP-Address
https://github.com/DOWARC/dowarc#WARC-IP-Address
The numeric Internet address contacted to retrieve any included content. An IPv4 address is written as a "dotted quad"; an IPv6 address is written as per [RFC1884]. For an HTTP retrieval, this is the IP address used at retrieval time corresponding to the hostname in the record's target-URI. The WARC-IP-Address field is used on "response", "resource", "request", "metadata", and "revisit" records, but is not used on "warcinfo", "conversion" or "continuation" records. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)
- has super-classes
- WARCrecordNamedField
WARC-Identified-Payload-Type
https://github.com/DOWARC/dowarc#WARC-Identified-Payload-Type
The content-type of the record's payload as determined by an independent check. The WARC-Identified-Payload-Type field might be used on WARC records with a well-defined payload and is not used on records without a well-defined payload. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)
- has super-classes
- WARCrecordNamedField
WARC-Payload-Digest
https://github.com/DOWARC/dowarc#WARC-Payload-Digest
An optional parameter indicating the algorithm name and calculated value of a digest applied to the payload referred to or contained by the record - which is not necessarily equivalent to the record block. The payload of an application/http block is its ‘entity-body’ (per [RFC2616]). In contrast to WARC-Block-Digest, the WARC-Payload-Digest field may also be used for data not actually present in the current record block, for example when a block is left off in accordance with a ‘revisit’ profile (see ‘revisit’), or when a record is segmented (the WARC-Payload-Digest recorded in the first segment of a segmented record shall be the digest of the payload of the logical record). The WARC-Payload-Digest field might be used on WARC records with a well-defined payload and is not used on records without a well-defined payload.
- has super-classes
- WARCrecordNamedField
WARC-Profile
https://github.com/DOWARC/dowarc#WARC-Profile
A URI signifying the kind of analysis and handling applied in a "revisit" record. (Like an XML namespace, the URI may, but need not, return human-readable or machine-readable documentation.) If reading software does not recognize the given URI as a supported kind of handling, it does not attempt to interpret the associated record block. The section "revisit" defines two initial profile options for the WARC-Profile header for "revisit" records. The WARC-Profile field is mandatory on "revisit" type records and undefined for other record types. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)
- has super-classes
- WARCrecordNamedField
WARC-Record-ID
https://github.com/DOWARC/dowarc#WARC-Record-ID
An identifier assigned to the current record that is globally unique for its period of intended use. No identifier scheme is mandated by this specification, but each record-id shall be a legal URI and clearly indicate a documented and registered scheme to which it conforms (e.g., via a URI scheme prefix such as "http:" or "urn:"). All records have a WARC-Record-ID field. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)
- has super-classes
- WARCrecordNamedField
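One way to obtain an identifier that carries a registered URI scheme prefix is a UUID URN. The sketch below is an assumption about how a writer might mint such IDs, not a scheme mandated by the specification; the helper names are hypothetical.

```python
import uuid
from urllib.parse import urlsplit


def new_record_id() -> str:
    """Mint a record identifier as a 'urn:uuid:' URI."""
    return uuid.uuid4().urn


def has_uri_scheme(record_id: str) -> bool:
    """Check that the identifier starts with a URI scheme prefix."""
    return bool(urlsplit(record_id).scheme)


if __name__ == "__main__":
    record_id = new_record_id()
    print(record_id, has_uri_scheme(record_id))
```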
WARC-Refers-To
https://github.com/DOWARC/dowarc#WARC-Refers-To
The WARC-Record-ID of a single record for which the present record holds additional content. The WARC-Refers-To field might be used to associate a "metadata" record to another record it describes. The WARC-Refers-To field might also be used to associate a record of type "revisit" or "conversion" with the preceding record which helped determine the present record content. The WARC-Refers-To field is not used in "warcinfo", "response", "resource", "request", and "continuation" records. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)
- has super-classes
- WARCrecordNamedField
WARC-Segment-Number
https://github.com/DOWARC/dowarc#WARC-Segment-Number
Reports the current record's relative ordering in a sequence of segmented records. In the first segment of any record that is completed in one or more later "continuation" WARC records, this parameter is mandatory. Its value there is "1". In a "continuation" record, this parameter is also mandatory. Its value is the sequence number of the current segment in the logical whole record, increasing by 1 in each next segment. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)
- has super-classes
- WARCrecordNamedField
WARC-Segment-Origin-ID
https://github.com/DOWARC/dowarc#WARC-Segment-Origin-ID
Identifies the starting record in a series of segmented records whose content blocks are reassembled to obtain a logically complete content block. This field is mandatory on all "continuation" records, and is not used in other records. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)
- has super-classes
- WARCrecordNamedField
WARC-Segment-Total-Length
https://github.com/DOWARC/dowarc#WARC-Segment-Total-Length
In the final record of a segmented series, reports the total length of all segment content blocks when concatenated together. This field is mandatory on the last "continuation" record of a series, and is not used elsewhere. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)
- has super-classes
- WARCrecordNamedField
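The three segmentation fields (WARC-Segment-Number, WARC-Segment-Origin-ID, WARC-Segment-Total-Length) work together when a logical record is rebuilt. The following sketch, under the assumption that the segments of one series have already been grouped by their origin ID, orders them by segment number and checks the reported total length. The `Segment` type and `reassemble` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    number: int                          # WARC-Segment-Number
    block: bytes                         # content block of this segment
    total_length: Optional[int] = None   # WARC-Segment-Total-Length (last segment)


def reassemble(segments: List[Segment]) -> bytes:
    """Concatenate segment blocks in order and verify the total length."""
    if not segments:
        raise ValueError("no segments supplied")
    ordered = sorted(segments, key=lambda s: s.number)
    numbers = [s.number for s in ordered]
    # Segment numbers start at 1 and increase by 1 with each segment.
    if numbers != list(range(1, len(ordered) + 1)):
        raise ValueError(f"missing or duplicate segments: {numbers}")
    content = b"".join(s.block for s in ordered)
    expected = ordered[-1].total_length
    if expected is not None and expected != len(content):
        raise ValueError(f"expected {expected} octets, got {len(content)}")
    return content


if __name__ == "__main__":
    parts = [Segment(2, b"world", total_length=10), Segment(1, b"hello")]
    print(reassemble(parts))
```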
WARC-Target-URI
https://github.com/DOWARC/dowarc#WARC-Target-URI
The original URI whose capture gave rise to the information content in this record. In the context of web harvesting, this is the URI that was the target of a crawler’s retrieval request. For a ‘revisit’ record, it is the URI that was the target of a retrieval request. Indirectly, such as for a ‘metadata’, or ‘conversion’ record, it is a copy of the WARC-Target-URI appearing in the original record to which the newer record pertains. The URI in this value shall be properly escaped according to [RFC3986] and written with no internal whitespace. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)
- has super-classes
- WARCrecordNamedField
WARC-Truncated
https://github.com/DOWARC/dowarc#WARC-Truncated
For practical reasons, writers of the WARC format may place limits on the time or storage allocated to archiving a single resource. As a result, only a truncated portion of the original resource may be available for saving into a WARC record. Any record might indicate that truncation of its content block has occurred and give the reason with a "WARC-Truncated" field. For example, if the capture of what appeared to be a multi-gigabyte resource was cut short after a transfer time limit was reached, the partial resource could be saved to a WARC record with this field. The WARC-Truncated field might be used on any WARC record. The WARC field Content-Length still reports the actual truncated size of the record block. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)
- has super-classes
- WARCrecordNamedField
WARC-Type
https://github.com/DOWARC/dowarc#WARC-Type
The type of WARC record: one of "warcinfo", "response", "resource", "request", "metadata", "revisit", "conversion", or "continuation". Other types of WARC records may be defined in extensions of the core format. Types are further described in WARC Record Types. All records have a WARC-Type field. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)
- has super-classes
- WARCrecordNamedField
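Since the eight listed types are fixed but extensions may define others, a reader should probably distinguish core types from extension types rather than reject unknown values. A minimal sketch, with the hypothetical helper `classify_warc_type`:

```python
CORE_WARC_TYPES = frozenset({
    "warcinfo", "response", "resource", "request",
    "metadata", "revisit", "conversion", "continuation",
})


def classify_warc_type(value: str) -> str:
    """Return 'core' for the eight listed types and 'extension' otherwise."""
    if not value:
        # All records have a WARC-Type field, so an empty value is an error.
        raise ValueError("WARC-Type is missing")
    return "core" if value in CORE_WARC_TYPES else "extension"


if __name__ == "__main__":
    print(classify_warc_type("revisit"), classify_warc_type("x-custom"))
```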
WARC-Warcinfo-ID
https://github.com/DOWARC/dowarc#WARC-Warcinfo-ID
When present, indicates the WARC-Record-ID of the associated "warcinfo" record for this record. Typically, the Warcinfo-ID parameter is used when the context of the applicable "warcinfo" record is unavailable, such as after distributing single records into separate WARC files. WARC writing applications (such as web crawlers) might choose to always record this parameter. The WARC-Warcinfo-ID field value overrides any association with a previously occurring (in the WARC) "warcinfo" record, thus providing a way to protect the true association when records are combined from different WARCs. The WARC-Warcinfo-ID field might be used in any record type except "warcinfo". (Source: ISO 28500; https://iipc.github.io/warc-specifications/)
- has super-classes
- WARCrecordNamedField
WARCcdxField
https://github.com/DOWARC/dowarc#WARCcdxField
The fields contained within a CDX or CDXJ file. Source: https://iipc.github.io/warc-specifications/specifications/cdx-format/cdx-2015/
- has super-classes
- Entity
WARCcdxFile
https://github.com/DOWARC/dowarc#WARCcdxFile
An index (normally a CDX or CDXJ) file of a WARC file. Source: https://pywb.readthedocs.io/en/master/manual/indexing.html
- has super-classes
- Aggregation
- Entity
WARCcoll
https://github.com/DOWARC/dowarc#WARCcoll
A collection of WARC records aggregating all the resources harvested by web crawling activities, which, taken together, provide an exhaustive representation of a live web resource (e.g., a web site, a web page). Subclass of ore:Aggregation.
- has super-classes
- Aggregation
- Entity
- is in domain of
- identifiedBy
- is in range of
- identifies
WARCevent
https://github.com/DOWARC/dowarc#WARCevent
An activity which relates to or involves creating, indexing, modifying or replaying a WARC file.
- has sub-classes
- Crawling
- Extracting
- Indexing
- Publishing
- QA
- Replaying
- has super-classes
- Activity
WARCfile
https://github.com/DOWARC/dowarc#WARCfile
A container file that aggregates a sequence of structured information chunks (WARC records), storing data and metadata. WARC files are the output of automated web crawling activities. A WARC file can contain one or more digital objects with multiple media types (i.e., MIME). Subclass of ore:Aggregation.
- has super-classes
- Aggregation
- Entity
- is in domain of
- identifiedBy
- is in range of
- identifies
WARCgraph
https://github.com/DOWARC/dowarc#WARCgraph
An RDF graph that represents and describes the relationships between the WARC file and the resources it aggregates, as well as the relationships between the aggregated resources. A WARC graph can be used to represent the relationship between the resources stored in a WARCcollection and the web page(s) or web site(s) from which the resources were harvested. In this respect, a WARCgraph can provide a representation of the live web object being archived. A WARCgraph can be a subgraph of another WARCgraph (hasPart/isPartOf).
- has super-classes
- ResourceMap
- Graph
- Entity
- is in domain of
- identifiedBy
WARCrecord
https://github.com/DOWARC/dowarc#WARCrecord
A WARC record is a component of a WARC file. A WARC record includes data retrieved from a source in the live World Wide Web, together with synthetic (meta)data providing information about the retrieved content. A WARC record has a header and a content block carrying the payload for the live web data object that was harvested over HTTP. A WARC record is both an aggregated resource (component of a WARC file) and an aggregation (it includes sub-elements of its own).
- has super-classes
- AggregatedResource
- Aggregation
- Entity
- is in domain of
- identifiedBy
- is in range of
- identifies
WARCrecordContentBlock
https://github.com/DOWARC/dowarc#WARCrecordContentBlock
Part (zero or more octets) of a WARC record that follows the header and that forms the main body of a WARC record. (Source: https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.0/#terms-and-definitions)
- has super-classes
- WARCrecordElement
WARCrecordElement
https://github.com/DOWARC/dowarc#WARCrecordElement
A resource aggregated in a WARC record.
- has sub-classes
- WARCrecordContentBlock
- WARCrecordHeader
- WARCrecordNamedField
- WARCrecordPayload
- has super-classes
- AggregatedResource
- Entity
- is in domain of
- identifiedBy
- is in range of
- identifies
WARCrecordHeader
https://github.com/DOWARC/dowarc#WARCrecordHeader
Beginning of a WARC record, consisting of one first line declaring the record to be in the WARC format with a given version number, followed by lines of named fields up to a blank line. (Source: https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.0/#terms-and-definitions)
- has super-classes
- WARCrecordElement
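The header layout described above (a version line, then named fields up to a blank line) can be sketched with a small parser. This is a simplified illustration that assumes CRLF line endings and already-decoded text; `parse_warc_header` is a hypothetical name. Fields are kept as an ordered list of pairs because some fields, such as WARC-Concurrent-To, may repeat.

```python
from typing import List, Tuple


def parse_warc_header(text: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split a WARC record header into its version line and named fields."""
    lines = text.split("\r\n")
    version = lines[0]
    if not version.startswith("WARC/"):
        raise ValueError(f"not a WARC version line: {version!r}")
    fields: List[Tuple[str, str]] = []
    for line in lines[1:]:
        if line == "":
            break  # a blank line ends the header
        if line[0] in " \t" and fields:
            # An indented line continues the previous field's value.
            name, value = fields[-1]
            fields[-1] = (name, value + " " + line.strip())
            continue
        # Split on the first colon only, so URIs and dates stay intact.
        name, _, value = line.partition(":")
        fields.append((name.strip(), value.strip()))
    return version, fields


if __name__ == "__main__":
    sample = (
        "WARC/1.0\r\n"
        "WARC-Type: response\r\n"
        "WARC-Target-URI: http://example.com/\r\n"
        "\r\n"
    )
    print(parse_warc_header(sample))
```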
WARCrecordNamedField
https://github.com/DOWARC/dowarc#WARCrecordNamedField
Set of elements consisting of a name, a colon, and a value, with long values continued on indented lines. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)
- has sub-classes
- WARC-Block-Digest
- WARC-Concurrent-To
- WARC-Date
- WARC-Filename
- WARC-IP-Address
- WARC-Identified-Payload-Type
- WARC-Payload-Digest
- WARC-Profile
- WARC-Record-ID
- WARC-Refers-To
- WARC-Segment-Number
- WARC-Segment-Origin-ID
- WARC-Segment-Total-Length
- WARC-Target-URI
- WARC-Truncated
- WARC-Type
- WARC-Warcinfo-ID
- has super-classes
- WARCrecordElement
WARCrecordPayload
https://github.com/DOWARC/dowarc#WARCrecordPayload
Data object referred to, or contained by a WARC record as a meaningful subset of the content block. (Source: https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.0/#terms-and-definitions)
- has super-classes
- WARCrecordElement
WebArchivingAgent
https://github.com/DOWARC/dowarc#WebArchivingAgent
Subclass of "foaf:Agent" and "prov:Agent", specified in a web archiving environment. "Class: foaf:Agent. An agent (eg. person, group, software or physical artifact)." (Source: http://xmlns.com/foaf/spec/#term_Agent) - "Class: prov:Agent. An agent is something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent's activity." (Source: https://www.w3.org/TR/prov-o/#Agent)
- has sub-classes
- HardwareAgent
- OrganizationAgent
- PersonAgent
- SoftwareAgent
- has super-classes
- Agent
- Agent
Object Properties
identifiedBy
https://github.com/DOWARC/dowarc#identifiedBy
- has domain
- Agent
- Derivative
- WARCcoll
- WARCfile
- WARCgraph
- WARCrecord
- WARCrecordElement
- has range
- Identifier
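To show how identifiedBy connects a resource in its domain to an Identifier, the sketch below writes three N-Triples statements by hand, using only the standard library. The DOWARC term IRIs are those listed in this document; the `https://example.org/...` IRIs and the function names are hypothetical placeholders.

```python
from typing import List

DOWARC = "https://github.com/DOWARC/dowarc#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def triple(subject: str, predicate: str, obj: str) -> str:
    """Serialise one IRI-only statement as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def describe_identified_file(file_iri: str, identifier_iri: str) -> List[str]:
    """Type a WARC file and its identifier, then link them with identifiedBy."""
    return [
        triple(file_iri, RDF_TYPE, DOWARC + "WARCfile"),
        triple(identifier_iri, RDF_TYPE, DOWARC + "Identifier"),
        triple(file_iri, DOWARC + "identifiedBy", identifier_iri),
    ]


if __name__ == "__main__":
    for line in describe_identified_file(
        "https://example.org/warc/crawl-001",   # hypothetical file IRI
        "https://example.org/id/crawl-001",     # hypothetical identifier IRI
    ):
        print(line)
```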
identifies
https://github.com/DOWARC/dowarc#identifies
Named Individuals
ArcDocumentLength
https://github.com/DOWARC/dowarc#ArcDocumentLength
CanonizedFrame
https://github.com/DOWARC/dowarc#CanonizedFrame
CanonizedHost
https://github.com/DOWARC/dowarc#CanonizedHost
CanonizedImage
https://github.com/DOWARC/dowarc#CanonizedImage
CanonizedJumpPoint
https://github.com/DOWARC/dowarc#CanonizedJumpPoint
CanonizedLink
https://github.com/DOWARC/dowarc#CanonizedLink
CanonizedPath
https://github.com/DOWARC/dowarc#CanonizedPath
CanonizedRedirect
https://github.com/DOWARC/dowarc#CanonizedRedirect
CanonizedUrl
https://github.com/DOWARC/dowarc#CanonizedUrl
Canonicalization is applied to URIs to remove trivial differences that do not indicate that the URIs reference different resources. Examples include removing session ID parameters and unnecessary port declarations (e.g. :80 when crawling HTTP). (Source: https://nlevitt.github.io/warc-specifications/specifications/cdx-format/openwayback-cdxj/#appendix-a--canonicalization)
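The two examples given above, dropping session ID parameters and default ports, can be sketched as follows. This is a deliberately small illustration, not the canonicalization rules of any particular tool: the list of session parameter names is an assumption, and user info, IPv6 literals and other rules are not handled.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}
# Assumed parameter names; real canonicalizers use their own lists.
SESSION_PARAMS = {"jsessionid", "phpsessid", "sid"}


def canonicalize(url: str) -> str:
    """Lowercase scheme and host, drop default ports and session parameters."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    # The fragment is discarded: it is not sent to the server.
    return urlunsplit((scheme, netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    print(canonicalize("HTTP://Example.com:80/a?PHPSESSID=abc&x=1"))
```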
CanonizedUrlFoundInScript
https://github.com/DOWARC/dowarc#CanonizedUrlFoundInScript
CanonizedUrlInOtherHrefTags
https://github.com/DOWARC/dowarc#CanonizedUrlInOtherHrefTags
CanonizedUrlInOtherSrcTags
https://github.com/DOWARC/dowarc#CanonizedUrlInOtherSrcTags
CdxFrame
https://github.com/DOWARC/dowarc#CdxFrame
CdxImage
https://github.com/DOWARC/dowarc#CdxImage
CdxLink
https://github.com/DOWARC/dowarc#CdxLink
CdxTitle
https://github.com/DOWARC/dowarc#CdxTitle
CompressedARCFileOffset
https://github.com/DOWARC/dowarc#CompressedARCFileOffset
The "offset" is the number of bytes from the start of the WARC at which the relevant record begins. (Source: https://nlevitt.github.io/warc-specifications/specifications/cdx-format/openwayback-cdxj/)
CompressedDatFileOffset
https://github.com/DOWARC/dowarc#CompressedDatFileOffset
CompressedRecordSize
https://github.com/DOWARC/dowarc#CompressedRecordSize
The number of octets in the compressed record in the WARC file. (Source: https://metacpan.org/pod/WARC::Index::File::CDX)
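An index entry's offset and compressed record size are what allow a single record to be read without scanning the whole file. The sketch below assumes the common practice of compressing each record as its own gzip member; `read_record_at` is a hypothetical helper, and the in-memory stream stands in for a real WARC file.

```python
import gzip
import io
import zlib
from typing import BinaryIO, Optional


def read_record_at(stream: BinaryIO, offset: int, size: Optional[int] = None) -> bytes:
    """Decompress the single gzip member that starts at `offset`.

    `size` is the compressed record size from the index; when it is not
    known, the rest of the stream is read and decoding stops at the end
    of the first gzip member.
    """
    stream.seek(offset)
    data = stream.read() if size is None else stream.read(size)
    decompressor = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)  # gzip framing
    return decompressor.decompress(data)


if __name__ == "__main__":
    first = gzip.compress(b"record one")
    second = gzip.compress(b"record two")
    warc = io.BytesIO(first + second)
    print(read_record_at(warc, len(first), len(second)))
```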
Content-TypeType
https://github.com/DOWARC/dowarc#Content-TypeType
Date
https://github.com/DOWARC/dowarc#Date
A timestamp for this record, stored as the 14 digits from the text form of a WARC::Date without the associated marker characters, i.e. YYYYmmddHHMMSS instead of YYYY-mm-ddTHH:MM:SSZ. (Source: https://metacpan.org/pod/WARC::Index::File::CDX)
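The conversion from the WARC-Date text form to the 14-digit CDX form amounts to dropping the marker characters. A minimal sketch, with the hypothetical helper name `cdx_timestamp`:

```python
def cdx_timestamp(warc_date: str) -> str:
    """Reduce YYYY-mm-ddTHH:MM:SSZ to the 14 digits YYYYmmddHHMMSS."""
    digits = "".join(ch for ch in warc_date if ch.isdigit())
    if len(digits) != 14:
        raise ValueError(f"expected 14 digits, got {len(digits)}: {warc_date!r}")
    return digits


if __name__ == "__main__":
    print(cdx_timestamp("2025-03-29T12:00:05Z"))
```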
FileName
https://github.com/DOWARC/dowarc#FileName
The name of the WARC volume containing this record. This value is assumed to be relative to the directory containing the CDX file and to always be a Unix-style filename, regardless of the local file name conventions. (Source: https://metacpan.org/pod/WARC::Index::File::CDX)
IP
https://github.com/DOWARC/dowarc#IP
LanguageString
https://github.com/DOWARC/dowarc#LanguageString
MassagedURL
https://github.com/DOWARC/dowarc#MassagedURL
The URL used in the request, as for the "a" field, but with the hostname translated to SURT form and the scheme component removed so that the value starts with the TLD. (Source: https://metacpan.org/pod/WARC::Index::File::CDX)
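A rough sketch of the transformation described above: the host labels are reversed and joined with commas so the value starts with the TLD, and the scheme is dropped. The `)` separator between host and path follows the form commonly seen in CDX files; this simplified helper, `massaged_url`, is an assumption for illustration and applies no further canonicalization (ports, "www" prefixes and user info are ignored).

```python
from urllib.parse import urlsplit


def massaged_url(url: str) -> str:
    """Rewrite a URL with a SURT-style host and no scheme."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    surt_host = ",".join(reversed(host.split(".")))  # www.example.com -> com,example,www
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{surt_host}){path}{query}"


if __name__ == "__main__":
    print(massaged_url("http://www.example.com/index.html?a=1"))
```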
MetaTagsAIF
https://github.com/DOWARC/dowarc#MetaTagsAIF
This field seems to be very common in CDX files, but no values have been observed and this field appears to be completely undocumented. Ignored on read, but written as "-" if included when building an index. (Source: https://metacpan.org/pod/WARC::Index::File::CDX)
MimeType
https://github.com/DOWARC/dowarc#MimeType
The MIME type reported in the Content-Type header of the response. Written as "-" if the record does not contain an HTTP response with an entity body. (Source: https://metacpan.org/pod/WARC::Index::File::CDX)
MimeTypeOfOriginalDocument
https://github.com/DOWARC/dowarc#MimeTypeOfOriginalDocument
Content-Type of the resource record, as found in an Alexa-made DAT file. (Source: https://archive.org/web/researcher/dat_file_format.php)
MultiColumnLanguageDescription
https://github.com/DOWARC/dowarc#MultiColumnLanguageDescription
NewStyleChecksum
https://github.com/DOWARC/dowarc#NewStyleChecksum
Typically the base32-encoded SHA1 digest of the response payload. This value is copied from the WARC-Payload-Digest header of a record if available, otherwise "-" is written. (Source: https://metacpan.org/pod/WARC::Index::File::CDX)
NewsGroup
https://github.com/DOWARC/dowarc#NewsGroup
OldStyleChecksum
https://github.com/DOWARC/dowarc#OldStyleChecksum
OriginalHost
https://github.com/DOWARC/dowarc#OriginalHost
OriginalJumpPoint
https://github.com/DOWARC/dowarc#OriginalJumpPoint
OriginalPath
https://github.com/DOWARC/dowarc#OriginalPath
OriginalURL
https://github.com/DOWARC/dowarc#OriginalURL
The URL that was used in the request that produced this response. The url_prefix search key matches a prefix of this value. This value is copied from the WARC-Target-URI header and is written as "-" if the record does not have that header. (Source: https://metacpan.org/pod/WARC::Index::File::CDX)
Port
https://github.com/DOWARC/dowarc#Port
Redirect
https://github.com/DOWARC/dowarc#Redirect
The contents of the Location header of the response, URL-escaped. Some implementations do not properly set this field. Written as "-" if not present or if the record does not contain an HTTP response. (Source: https://metacpan.org/pod/WARC::Index::File::CDX)
ResponseCode
https://github.com/DOWARC/dowarc#ResponseCode
The HTTP status code used in the response. Written as "-" if the record does not contain an HTTP response. (Source: https://metacpan.org/pod/WARC::Index::File::CDX)
RulespaceCategory
https://github.com/DOWARC/dowarc#RulespaceCategory
UncompressedArcFileOffset
https://github.com/DOWARC/dowarc#UncompressedArcFileOffset
UncompressedDatFileOffset
https://github.com/DOWARC/dowarc#UncompressedDatFileOffset
Uniqueness
https://github.com/DOWARC/dowarc#Uniqueness
UrlFoundInScript
https://github.com/DOWARC/dowarc#UrlFoundInScript
UrlInOtherHrefTages
https://github.com/DOWARC/dowarc#UrlInOtherHrefTages
UrlInOtherSrcTags
https://github.com/DOWARC/dowarc#UrlInOtherSrcTags
WARC-Payload-Digest
https://github.com/DOWARC/dowarc#WARC-Payload-Digest
An optional parameter indicating the algorithm name and calculated value of a digest applied to the payload referred to or contained by the record - which is not necessarily equivalent to the record block. The payload of an application/http block is its ‘entity-body’ (per [RFC2616]). In contrast to WARC-Block-Digest, the WARC-Payload-Digest field may also be used for data not actually present in the current record block, for example when a block is left off in accordance with a ‘revisit’ profile (see ‘revisit’), or when a record is segmented (the WARC-Payload-Digest recorded in the first segment of a segmented record shall be the digest of the payload of the logical record). The WARC-Payload-Digest field might be used on WARC records with a well-defined payload and is not used on records without a well-defined payload.
- has super-classes
- WARCrecordNamedField
WARC-Refers-To-Date
https://github.com/DOWARC/dowarc#WARC-Refers-To-Date
The WARC-Refers-To-Date field might be used in "revisit" records and is not used in "warcinfo", "response", "metadata", "conversion", "resource", "request", and "continuation" records. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)
WARC-Refers-To-Target-URI
https://github.com/DOWARC/dowarc#WARC-Refers-To-Target-URI
The WARC-Refers-To-Target-URI field might be used in "revisit" records and is not used in "warcinfo", "response", "metadata", "conversion", "resource", "request", and "continuation" records. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)
WARC-Target-URI
https://github.com/DOWARC/dowarc#WARC-Target-URI
The original URI whose capture gave rise to the information content in this record. In the context of web harvesting, this is the URI that was the target of a crawler’s retrieval request. For a ‘revisit’ record, it is the URI that was the target of a retrieval request. Indirectly, such as for a ‘metadata’, or ‘conversion’ record, it is a copy of the WARC-Target-URI appearing in the original record to which the newer record pertains. The URI in this value shall be properly escaped according to [RFC3986] and written with no internal whitespace. (Source: ISO 28500; https://iipc.github.io/warc-specifications/)
- has super-classes
- WARCrecordNamedField