class: center, middle, inverse, title-slide

# Processing MARC

## … with open source tools

### Johann Rolschewski

### ELAG 2022

---

## Links

Slides: https://jorol.github.io/2022-elag/slides

Exercises: https://jorol.github.io/2022-elag/#/Exercises

Files: https://jorol.github.io/2022-elag/files/processing-marc.zip

Software: https://jorol.github.io/2022-elag/#/Software

VM: https://jorol.github.io/2022-elag/#/VM

---

class: middle

## "When MARC was created, the Beatles were a hot new group ..."

---

## MARC Must Die

In 2002 Roy Tennant declared ["MARC Must Die"](https://www.libraryjournal.com/?detailStory=marc-must-die). Today the [MARC 21](https://www.loc.gov/marc/) format is still the workhorse of library metadata: even our "Next Generation Library Systems" heavily rely on this standard from the '60s. Since we will continue to work with MARC 21 in the coming years, this tutorial gives an introduction to MARC 21.

---

## Agenda

- MARC 21
  - Introduction
  - Record elements
  - Serializations
- Tools
  - Validation of MARC 21 records and common errors
  - Statistical analysis of MARC 21 data sets
  - Conversion of MARC 21 records
  - Metadata extraction from MARC 21 records

---

## MARC 21 Format for Bibliographic Data

[MARC 21 format for Bibliographic Data](https://www.loc.gov/marc/bibliographic/) is a standard designed to be a carrier for bibliographic information about printed and manuscript textual materials, computer files, maps, music, continuing resources, visual materials, and mixed materials. Bibliographic data commonly includes titles, names, subjects, notes, publication data, and information about the physical description of an item.

The standard defines [formats](https://www.loc.gov/marc/marcdocz.html) for the representation and exchange of [bibliographic](https://www.loc.gov/marc/bibliographic/), [authority](https://www.loc.gov/marc/authority/ecadhome.html), [holdings](https://www.loc.gov/marc/holdings/echdhome.html), [classification](https://www.loc.gov/marc/classification/eccdhome.html) and [community information](https://www.loc.gov/marc/community/eccihome.html) data in machine-readable form.

---

## A MARC record is composed of three elements:

* *Record structure*: an implementation of the international standard Format for Information Exchange (ISO 2709) and its American counterpart, Bibliographic Information Interchange (ANSI/NISO Z39.2).
* *Content designation*: the codes and conventions established explicitly to identify and further characterize the data elements within a record.
* *Data content of the record*: the content of the data elements that comprise a MARC record is usually defined by standards outside the formats (e.g. [ISBD](https://www.ifla.org/publications/international-standard-bibliographic-description), [AACR2](http://www.aacr2.org/), [RDA](http://www.rda-jsc.org/archivedsite/rdaprospectus.html)).

---

## Code lists

The MARC 21 standard also provides [lists of source codes](https://www.loc.gov/standards/sourcelist/index.html) for vocabularies, rules and schemes.

---

## Agency

The MARC 21 standard is maintained by the [Network Development and MARC Standards Office](https://www.loc.gov/marc/ndmso.html) and documented in detail at https://www.loc.gov/marc/marcdocz.html.

---

## Introduction

For a short introduction to MARC 21 see OCLC's ["Introduction"](https://www.oclc.org/bibformats/en/introduction.html); for a more detailed one, see ["Understanding MARC Bibliographic: Machine-Readable Cataloging"](https://www.loc.gov/marc/umb/).

The history of MARC is documented in ["MARC, its history and implications"](https://babel.hathitrust.org/cgi/pt?id=mdp.39015034388556).

---

class: middle

## MARC 21 serializations

---

## MARC (ISO 2709)

A "MARC (ISO 2709)" record ([ISO 2709:2008](https://www.iso.org/standard/41319.html) & [ANSI/NISO Z39.2-1994](https://www.niso.org/publications/ansiniso-z392-1994-r2016)) consists of three parts:

* leader
* directory
* variable fields

---

## Leader

The [leader](https://www.loc.gov/marc/specifications/specrecstruc.html#leader) has a fixed length of 24 ASCII characters which provide some basic information for processing the record. Data elements are positionally defined, see https://www.loc.gov/marc/bibliographic/bdleader.html.

Leader positions 00-04 hold the length of the record; the total length of a "MARC (2709)" record is limited to 99999 bytes. Position 09 defines the "character coding scheme" ([MARC-8](https://www.loc.gov/marc/specifications/specchartables.html) or [Unicode](https://www.iso.org/standard/69119.html)).

---

## Directory

The [directory](https://www.loc.gov/marc/specifications/specrecstruc.html#direct) is a variable-length sequence of entries describing the tag, the length and the starting position of each field. Each directory entry has a length of 12 characters:

* tag: 00-02
* length of field: 03-06
* starting position: 07-11

The length of a "MARC (2709)" record field is limited to 9999 bytes.

---

## Variable fields

The [variable fields](https://www.loc.gov/marc/specifications/specrecstruc.html#varifields) are [control fields](https://www.loc.gov/marc/bibliographic/bd00x.html) followed by data fields. Data fields consist of two indicators and a sequence of subfields. Indicators can be used to interpret or supplement the data found in the field; their meaning varies by field. Each subfield consists of a subfield code and the corresponding value. Data fields and subfields can be repeatable.

---

## Separators

A MARC record is terminated with a record terminator (Unicode character 'INFORMATION SEPARATOR THREE' [U+001D](https://www.fileformat.info/info/unicode/char/001d/index.htm)). Each field of a record is terminated with a field terminator (Unicode character 'INFORMATION SEPARATOR TWO' [U+001E](https://www.fileformat.info/info/unicode/char/001e/index.htm)). Each subfield within a data field is introduced by a subfield delimiter (Unicode character 'INFORMATION SEPARATOR ONE' [U+001F](https://www.fileformat.info/info/unicode/char/001f/index.htm)).

---

## Example "MARC (ISO 2709)" record

```no-highlight
00998nas a2200325 c 4500001001000000003000700010005001700017 007001500034008004100049016002200090016002200112022001400134 035002500148035002100173040002800194041000800222082002400230 245002700254246000900281264001800290300002100308336002600329 337003200355338003700387362001300424363001900437655009900456 856005300555856006400608^^987874829^^DE-101^^20171201121143. 0^^cr||||||||||||^^080311c20079999|||u||p|o ||| 0||||1eng c^ ^7 ^_2DE-101^_a987874829^^7 ^_2DE-600^_a2415107-5^^ ^_a1940 -5758^^ ^_a(DE-599)ZDB2415107-5^^ ^_a(OCoLC)502377032^^ ^ _a8999^_bger^_cDE-101^_d9999^^ ^_aeng^^74^_a020^_qDE-600^_2 22sdnb^^00^_aCode4Lib journal^_bC4LJ^^3 ^_aC4LJ^^31^_a[S.l.]
^_c2007-^^ ^_aOnline-Ressource^^ ^_aText^_btxt^_2rdaconten t^^ ^_aComputermedien^_bc^_2rdamedia^^ ^_aOnline-Ressource ^_bcr^_2rdacarrier^^0 ^_a1.2007 -^^01^_81.1\x^_a1^_i2007^^ 7 ^_0(DE-588)4067488-5^_0http://d-nb.info/gnd/4067488-5^_0(DE- 101)040674886^_aZeitschrift^_2gnd-content^^4 ^_uhttp://journ al.code4lib.org/^_xVerlag^_zkostenfrei^^4 ^_uhttp://www.bibl iothek.uni-regensburg.de/ezeit/?2415107^_xEZB^^^]
```

---

## Leader, directory and fields

```no-highlight
00251nas a2200121 c 4500
```

```no-highlight
001001000000
007001500010
022001400025
041000800039
245002700047
246000900074
362001300083
856003300096^^
```

```no-highlight
987874829^^
cr||||||||||||^^
 ^_a1940-5758^^
 ^_aeng^^
00^_aCode4Lib journal^_bC4LJ^^
3 ^_aC4LJ^^
0 ^_a1.2007 -^^
4 ^_uhttp://journal.code4lib.org/^^
^]
```

---

## MARC XML

The Library of Congress provides a [framework](https://www.loc.gov/standards/marcxml/) for working with MARC data in XML environments. The framework consists of an XML schema for MARC data ([XSD](https://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd), [XSD illustration](https://www.loc.gov/standards/marcxml/xml/spy/spy.html)), [XSL stylesheets](https://www.loc.gov/standards/marcxml/#stylesheets) and some [tools](https://www.loc.gov/standards/marcxml/marcxml.zip) for transformation and validation of "MARC XML" data. "MARC XML" is often used to provide MARC data via APIs like [SRU](https://www.loc.gov/standards/sru/index.html) & [OAI](https://www.openarchives.org/pmh/).

"MARC XML" follows several ["MARC XML design considerations"](https://www.loc.gov/standards/marcxml/marcxml-design.html); one of them is the "roundtripability" from XML back to MARC. The schema doesn't limit the length of records and fields, so many data providers use "MARC XML" to circumvent the length restrictions of "MARC (2709)".

---

## Example "MARC XML" record

```xml
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>00251nas a2200121 c 4500</leader>
    <controlfield tag="001">987874829</controlfield>
    <controlfield tag="007">cr||||||||||||</controlfield>
    <datafield tag="022" ind1=" " ind2=" ">
      <subfield code="a">1940-5758</subfield>
    </datafield>
    <datafield tag="041" ind1=" " ind2=" ">
      <subfield code="a">eng</subfield>
    </datafield>
    <datafield tag="245" ind1="0" ind2="0">
      <subfield code="a">Code4Lib journal</subfield>
      <subfield code="b">C4LJ</subfield>
    </datafield>
    ...
  </record>
</collection>
```

---

## Turbomarc

[Index Data](https://www.indexdata.com/) developed "Turbomarc", another XML serialization for MARC data. The primary development goal of "Turbomarc" was [to speed up](https://www.indexdata.com/turbomarc-faster-xml-marc-records/) the processing of MARC data.

---

## Example "Turbomarc" record

```xml
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.indexdata.com/turbomarc">
  <r>
    <l>00251nas a2200121 c 4500</l>
    <c001>987874829</c001>
    <c007>cr||||||||||||</c007>
    <d022 i1=" " i2=" ">
      <sa>1940-5758</sa>
    </d022>
    <d041 i1=" " i2=" ">
      <sa>eng</sa>
    </d041>
    <d245 i1="0" i2="0">
      <sa>Code4Lib journal</sa>
      <sb>C4LJ</sb>
    </d245>
    ...
  </r>
</collection>
```

---

## Line-based MARC formats

There are several line-based MARC formats. These formats offer a more human-readable serialization of MARC records and are often used to examine, create or update MARC records. Records are separated by a blank line. The formats differ slightly in the representation of MARC tags, indicators and subfields.

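---

## Tip: a quick line-based view

`yaz-marcdump` prints such a line-oriented view by default, which makes it handy for a first look at binary MARC files (the tool itself is introduced in the software section below):

```bash
# no options needed: input defaults to "MARC (ISO 2709)",
# output defaults to the "MARC Line" format
$ yaz-marcdump code4lib.mrc | less
```
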
---

## MARC Line

"MARC Line" is a simple line-by-line format also developed by Index Data. It is suitable for display but not recommended for further (machine) processing.

---

## Example "MARC Line" record

```no-highlight
00251nas a2200121 c 4500
001 987874829
007 cr||||||||||||
022 $a 1940-5758
041 $a eng
245 00 $a Code4Lib journal $b C4LJ
246 3 $a C4LJ
362 0 $a 1.2007 -
856 4 $u http://journal.code4lib.org/
```

---

## MARCMaker

This format was developed to create MARC records without having to use a MARC-based system. It is the most widely used line-based format and is supported by several software tools (e.g. Catmandu, MarcEdit) and libraries (e.g. marc4j, pymarc).

---

## Example "MARCMaker" record

```no-highlight
=LDR 00251nas a2200121 c 4500
=001 987874829
=007 cr||||||||||||
=022 \\$a1940-5758
=041 \\$aeng
=245 00$aCode4Lib journal$bC4LJ
=246 3\$aC4LJ
=362 0\$a1.2007 -
=856 4\$uhttp://journal.code4lib.org/
```

---

## MicroLIF

"[MicroLIF](http://web.sonoma.edu/users/h/huangp/MARC_MicroLIF.htm)" is a MARC-compatible record format created by a group of publishers and vendors in the '80s.

---

## Example "MicroLIF" record

```no-highlight
LDR00251nas a2200121 c 4500^
001987874829^
007cr||||||||||||^
022 _a1940-5758^
041 _aeng^
24500_aCode4Lib journal_bC4LJ^
2463 _aC4LJ^
3620 _a1.2007 -^
8564 _uhttp://journal.code4lib.org/^
```

---

## Aleph Sequential

"Aleph Sequential" is a line-based serialization format used by Ex Libris Ltd.'s integrated library system "[Aleph](https://exlibrisgroup.com/products/aleph-integrated-library-system/)".

---

## Example "Aleph Sequential" record

```no-highlight
987874829 FMT L BK
987874829 LDR L 00251nas^a2200121^c^4500
987874829 001 L 987874829
987874829 007 L cr||||||||||||
987874829 022 L $$a1940-5758
987874829 041 L $$aeng
987874829 24500 L $$aCode4Lib journal$$bC4LJ
987874829 2463 L $$aC4LJ
987874829 3620 L $$a1.2007 -
987874829 8564 L $$uhttp://journal.code4lib.org/
```

---

## MARC in JSON (MiJ)

[JSON](https://www.json.org/) is a common lightweight data-interchange format which is also easy for humans to read and write. "MARC in JSON" (MiJ) defines a standard for storing MARC data as JSON objects.

---

## Example "MARC in JSON" record

```json
{
  "leader": "00251nas a2200121 c 4500",
  "fields": [
    { "001": "987874829" },
    {
      "245": {
        "subfields": [
          { "a": "Code4Lib journal" },
          { "b": "C4LJ" }
        ],
        "ind1": "0",
        "ind2": "0"
      }
    }
  ]
}
```

---

## Catmandu JSON

The [Catmandu](http://librecat.org/Catmandu/) data toolkit represents MARC records internally as an "[array of arrays](https://metacpan.org/pod/Catmandu::Importer::MARC#EXAMPLE-ITEM)", which can be exported as JSON or YAML objects.

---

## Example "Catmandu JSON" record

```json
{
  "_id": "987874829",
  "record": [
    [ "LDR", " ", " ", "_", "00251nas a2200121 c 4500" ],
    [ "245", "0", "0", "a", "Code4Lib journal", "b", "C4LJ" ]
  ]
}
```

---

class: middle

## Software

---

## Software

For this tutorial we need some command-line tools to process MARC data. You can download a [VirtualBox](https://www.virtualbox.org/) image containing most of the required tools: [elag.ova](https://jorol.de/elag/elag.ova). Please follow the installation instructions at the [Catmandu](https://librecatproject.wordpress.com/get-catmandu/) project site. Alternatively, install all tools on your own system; all necessary steps are described for a [Debian](https://jorol.github.io/2022-elag/#/Software)-based system.

---

## Perl

Some of these tools require a [Perl](https://www.perl.org/) interpreter.

I recommend installing a local Perl environment on your system using [`perlbrew`](https://perlbrew.pl/):

```bash
# install perlbrew
$ \curl -L https://install.perlbrew.pl | bash

# edit .bashrc
$ echo -e '\nsource ~/perl5/perlbrew/etc/bashrc\n' >> ~/.bashrc
$ source ~/.bashrc

# initialize
$ perlbrew init

# see what versions are available
$ perlbrew available

# install a Perl version
$ perlbrew install -j 2 -n perl-5.34.1

# see installed versions
$ perlbrew list

# switch to an installation and set it as default
$ perlbrew switch perl-5.34.1

# install cpanm
$ perlbrew install-cpanm
```

---

## Catmandu

[Catmandu](https://librecat.org/Catmandu) is a data toolkit which can be used for [ETL](https://en.wikipedia.org/wiki/Extract,_transform,_load) processes. The project website provides detailed [instructions](https://librecat.org/Catmandu/#installation) on how to install `catmandu` on different systems.

```bash
# install dependencies
$ sudo apt install autoconf build-essential dconf-cli \
    libexpat1-dev libgdbm-dev libssl-dev libxml2-dev libxslt1-dev \
    libyaz-dev parallel perl-doc yaz zlib1g zlib1g-dev

# install Catmandu modules
$ cpanm Catmandu Catmandu::Breaker Catmandu::Exporter::Table \
    Catmandu::Identifier Catmandu::Importer::getJSON Catmandu::MARC \
    Catmandu::OAI Catmandu::PICA Catmandu::PNX Catmandu::RDF \
    Catmandu::SRU Catmandu::Stat Catmandu::Template Catmandu::VIAF \
    Catmandu::Validator::JSONSchema Catmandu::Wikidata Catmandu::XLS \
    Catmandu::XSD Catmandu::Z3950
```

---

## marcvalidate

[MARC::Schema](https://metacpan.org/pod/MARC::Schema) provides the command-line utility `marcvalidate` to validate MARC records.

```bash
$ cpanm MARC::Schema
```

---

## marcstats.pl

[MARC::Record::Stats](https://metacpan.org/pod/MARC::Record::Stats) provides the command-line utility `marcstats.pl` to generate statistics for your MARC records.

```bash
$ cpanm MARC::Record::Stats
```

---

## uconv

For [Unicode](https://home.unicode.org/) [normalizations](https://en.wikipedia.org/wiki/Unicode_equivalence) we need the command-line utility `uconv`.

```bash
$ sudo apt install libicu-dev
```

---

## YAZ

[YAZ](https://www.indexdata.com/resources/software/yaz/) is a free open source toolkit from [Index Data](https://www.indexdata.com/) that includes command-line utilities like `yaz-client` and `yaz-marcdump`.

```bash
$ sudo apt install yaz
```

---

## xmllint

`xmllint` is a command-line tool to process XML data.

```bash
$ sudo apt install libxml2-utils
```

---

## xsltproc

For the transformation of XML data with XSL stylesheets we need an [XSLT](https://en.wikipedia.org/wiki/XSLT) processor.

```bash
$ sudo apt install xsltproc
```

---

## Documentation

For more information about these tools, read their `man` pages or `--help` output, e.g.:

```bash
$ man yaz-marcdump
$ xmllint --help
```

---

class: middle

## Get MARC 21 data

---

## Open Data

Several libraries and library networks publish their data as "[open data](https://en.wikipedia.org/wiki/Open_data)". [Péter Király](https://github.com/pkiraly) compiled a list of international open MARC 21 data sets at <https://github.com/pkiraly/metadata-qa-marc#datasources>. The Internet Archive's [Open Library](http://openlibrary.org/) project is making thousands of library records freely available for anyone's use, see <https://archive.org/details/ol_data>.

You can download the data sets via the command line, e.g.:

```bash
$ wget http://ered.library.upenn.edu/data/opendata/pau.zip
$ unzip pau.zip
```

---

## API

Many libraries offer MARC 21 data via public [APIs](https://en.wikipedia.org/wiki/API) like Z39.50, SRU and OAI-PMH.

---

## Z39.50

Z39.50 is a standard ([ANSI/NISO Z39.50-2003](https://www.loc.gov/z3950/agency/Z39-50-2003.pdf)) that defines a client/server based service and protocol for information retrieval. Like MARC 21, Z39.50 has quite a long history ([Lynch, 1997](http://www.dlib.org/dlib/april97/04lynch.html)) and is maintained by the Library of Congress. Many libraries offer access to their Online Public Access Catalogues (OPAC) via a Z39.50 server, e.g. the [Library of Congress](https://www.loc.gov/z3950/lcserver.html) or [kobv](https://www.kobv.de/services/recherche/z39-50/). See the "[Bath Profile](http://www.ukoln.ac.uk/interop-focus/activities/z3950/int_profile/bath/draft/stable1.html#5.A.1.%20Functional%20Area%20A:%20Level%201%20Basic%20Bibliographic%20Search%20and%20Retrieval%20Emphasizing%20Precision)" for common search and retrieval operations and attribute sets.

---

## Z39.50

To retrieve data from Z39.50 servers you need client software like `yaz-client` from [Index Data](https://www.indexdata.com/), which is part of the free open source toolkit "[YAZ](https://www.indexdata.com/resources/software/yaz/)":

```bash
# open client
$ yaz-client
# connect to database
Z> open lx2.loc.gov/LCDB
# set format to MARC
Z> format 1.2.840.10003.5.10
# set element set
Z> element F
# append retrieved records to file
Z> set_marcdump loc.z3950.mrc
# find record for subject
Z> find @attr 5=100 @attr 1=21 "Perl"
# get first 50 records
Z> show 1+50
# close client
Z> exit
```

---

## Z39.50

The Catmandu toolkit provides a Z39.50 client "[Catmandu::Importer::Z3950](https://metacpan.org/pod/Catmandu::Importer::Z3950)":

```bash
$ catmandu convert -v Z3950 \
    --host z3950.kobv.de \
    --port 210 \
    --databaseName k2 \
    --preferredRecordSyntax usmarc \
    --queryType PQF \
    --query '@attr 1=1016 code4lib' \
    --handler USMARC \
    to MARC > code4lib.mrc
```

---

## SRU

[SRU](https://www.loc.gov/standards/sru/) (Search/Retrieve via URL) is another standard protocol for information retrieval. It uses HTTP as the application layer protocol and XML for data serialization. Search queries are expressed in [CQL](https://www.loc.gov/standards/sru/cql/index.html) (Contextual Query Language), a formal language for representing queries.

---

## SRU

You can use the `yaz-client` to search and retrieve data from an SRU server:

```bash
# open client
$ yaz-client
# connect to database
Z> open http://sru.k10plus.de/gvk
# append retrieved records to file
Z> set_marcdump gvk.sru.xml
# find record for subject
Z> find pica.sw=Perl
# get first 50 records
Z> show 1+50
# close client
Z> exit
```

---

## SRU

The Catmandu toolkit also provides an SRU client "[Catmandu::Importer::SRU](https://metacpan.org/pod/Catmandu::Importer::SRU)":

```bash
$ catmandu convert -v SRU \
    --base https://services.dnb.de/sru/zdb \
    --recordSchema MARC21-xml \
    --query 'dnb.iss = 1940-5758' \
    --parser marcxml \
    to MARC --type XML > code4lib.sru.xml
```

---

## OAI-PMH

[OAI-PMH](https://www.openarchives.org/OAI/openarchivesprotocol.html) (Open Archives Initiative Protocol for Metadata Harvesting) is a protocol for metadata replication and distribution. _Data providers_ host metadata records and their changes over time, so _service providers_ can harvest them. Like SRU, it uses HTTP as the application layer protocol and XML for data serialization.

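---

## OAI-PMH

Since OAI-PMH is plain HTTP and XML, you can inspect a data provider's responses directly in the browser or with `curl`. A minimal sketch, reusing the endpoint and metadata prefix from the Catmandu example on the next slide:

```bash
# request a first batch of records as "MARC XML"
# ("ListRecords" is one of the six standard OAI-PMH verbs)
$ curl 'https://lib.ugent.be/oai?verb=ListRecords&metadataPrefix=marcxml&from=2022-02-01&until=2022-02-01'
```
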
---

## OAI-PMH

The Catmandu toolkit provides an OAI-PMH harvester "[Catmandu::Importer::OAI](https://metacpan.org/pod/Catmandu::Importer::OAI)":

```bash
$ catmandu convert -v OAI \
    --url https://lib.ugent.be/oai \
    --metadataPrefix marcxml \
    --from 2022-02-01 \
    --until 2022-02-01 \
    --handler marcxml \
    to MARC > gent.oai.mrc
```

---

class: middle

## MARC 21 validation

---

## ... with yaz-marcdump

The command-line tool `yaz-marcdump` can be used for several MARC-related tasks. To validate the structure of "MARC (ISO 2709)" records use the option `-n`, which will omit any other output:

```bash
$ yaz-marcdump -n loc.mrc
```

---

If you want to validate records in other MARC formats, you have to specify the format with option `-i`:

```bash
$ yaz-marcdump -n -i marcxml loc.mrc.xml
```

---

If `yaz-marcdump` finds any errors it will output an error message:

```bash
$ yaz-marcdump -n bad_hathi_records.mrc
<!-- EOF while searching for RS -->
```

---

To narrow down the error use option `-p`, which will print the record numbers and offsets:

```bash
$ yaz-marcdump -np bad_hathi_records.mrc
<!-- Record 1 offset 0 (0x0) -->
<!-- Record 2 offset 1293 (0x50d) -->
<!-- Record 3 offset 5259 (0x148b) -->
<!-- Record 4 offset 6343 (0x18c7) -->
<!-- EOF while searching for RS -->
```

---

class: middle

## [Common structural problems](https://bibwild.wordpress.com/2010/02/02/structural-marc-problems-you-may-encounter/) in MARC records

---

## Invalid leader bytes

```bash
$ yaz-marcdump -np bad_leaders_10_11.mrc
<!-- Record 1 offset 0 (0x0) -->
Indicator length at offset 10 should hold a number 1-9. Assuming 2
Identifier length at offset 11 should hold a number 1-9. Assuming 2
Length data entry at offset 20 should hold a number 3-9. Assuming 4
Length starting at offset 21 should hold a number 4-9. Assuming 5
Length implementation at offset 22 should hold a number. Assuming 0
```

---

## Record exceeds the maximum length

```bash
$ yaz-marcdump -np bad_hathi_records.mrc
<!-- Record 1 offset 0 (0x0) -->
<!-- Record 2 offset 1293 (0x50d) -->
<!-- Record 3 offset 5259 (0x148b) -->
<!-- Record 4 offset 6343 (0x18c7) -->
<!-- EOF while searching for RS -->
```

---

## Field exceeds the maximum length

```bash
$ yaz-marcdump -np bad_oversize_field_bad_directory.mrc
<!-- Record 1 offset 0 (0x0) -->
<!-- Record 2 offset 1571 (0x623) -->
Directory offset 240: Bad value for data length and/or length starting (0\x1Edrd-348919)
Base address not at end of directory, base 242, end 241
Directory offset 216: Data out of bounds 21398 >= 11833
<!-- Record 3 offset 13404 (0x345c) -->
<!-- Record 4 offset 16133 (0x3f05) -->
<!-- Record 5 offset 18823 (0x4987) -->
```

---

## Invalid subfield element

```bash
$ yaz-marcdump -np -i marcxml chabon-bad-subfields-element.xml
yaz_marc_read_xml failed
```

---

## MARC control character in internal data value

```bash
$ yaz-marcdump -np bad_data_value.mrc
<!-- Record 1 offset 0 (0x0) -->
Separator but not at end of field length=37
```

---

## Wrongly encoded character

```bash
$ yaz-marcdump -np bad_encoding.mrc
<!-- Record 1 offset 0 (0x0) -->
No separator at end of field length=53
No separator at end of field length=64
```

---

## ... with xmllint

Use `xmllint` to validate "MARC XML" data against the MARC [XSD schema](https://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd).

If you just want to validate the structure of "MARC XML" records, use the options `--noout` (which will omit any other output) and `--schema` (path to the XSD file):

```bash
$ xmllint --noout \
    --schema http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd \
    loc.mrc.xml
loc.mrc.xml validates

$ xmllint --noout \
    --schema http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd \
    chabon-bad-subfields-element.xml
chabon-bad-subfields-element.xml:8: element subfields: Schemas validity error :
Element '{http://www.loc.gov/MARC21/slim}subfields': This element is not expected.
Expected is ( {http://www.loc.gov/MARC21/slim}subfield ).
chabon-bad-subfields-element.xml fails to validate
```

---

## ... with marcvalidate

While `yaz-marcdump` and `xmllint` are useful to identify structural problems within MARC records, `marcvalidate` can be used to validate MARC tags and subfields against an [Avram](https://format.gbv.de/schema/avram/specification) specification. The default specification was built by [Péter Király](https://pkiraly.github.io/2018/01/28/marc21-in-json/) based on the MARC documentation of the Library of Congress. The specification can be enhanced with locally defined fields.

By default `marcvalidate` expects "MARC (ISO 2709)" records:

```bash
$ marcvalidate loc.mrc
12360325 906 unknown field
1180649 035 unknown subfield 9
...
```

---

To validate "MARC XML" data use option `--type`:

```bash
$ marcvalidate --type XML loc.mrc.xml
12360325 906 unknown field
1180649 035 unknown subfield 9
...
```

---

To validate against a local Avram schema use option `--schema`:

```bash
$ marcvalidate --schema my_schema.json loc.mrc
```

---

## Avram schema for MARC

```
...
"022": {
  "historical-subfields": {
    "b": { "label": "Form of issue [OBSOLETE] [CAN/MARC only]" },
    "c": { "label": "Price [OBSOLETE] [CAN/MARC only]" }
  },
  "indicator1": {
    "codes": {
      " ": { "label": "No level specified" },
      "0": { "label": "Continuing resource of international interest" },
      "1": { "label": "Continuing resource not of international interest" }
    },
    "label": "Level of international interest"
  },
  "indicator2": null,
  "label": "International Standard Serial Number",
  "repeatable": true,
  "subfields": {
    "2": { "label": "Source", "repeatable": false },
    "6": { "label": "Linkage", "repeatable": false },
    "8": { "label": "Field link and sequence number", "repeatable": true },
    "a": { "label": "International Standard Serial Number", "repeatable": false },
    "l": { "label": "ISSN-L", "repeatable": false },
    "m": { "label": "Canceled ISSN-L", "repeatable": true },
    "y": { "label": "Incorrect ISSN", "repeatable": true },
    "z": { "label": "Canceled ISSN", "repeatable": true }
  },
  "tag": "022",
  "url": "https://www.loc.gov/marc/bibliographic/bd022.html"
},
...
```

---

## QA catalogue

If you want to run more detailed analyses, check "[QA catalogue - a metadata quality assessment tool for MARC records](https://github.com/pkiraly/metadata-qa-marc)".

---

class: middle

## MARC statistics

---

## ... with marcstats.pl

To generate statistics for tags and subfield codes of "MARC (ISO 2709)" records use `marcstats.pl`:

```bash
$ marcstats.pl loc.mrc
Statistics for 50 records
Tag  Rep.  Occ.,%
001        100.00
005        100.00
006          2.00
020         76.00
  a         76.00
  q          2.00
035  [Y]    48.00
  9  [Y]    18.00
  a  [Y]    30.00
...
```

---

## ... with Catmandu

If you want to generate statistics for other MARC serializations use [Catmandu::Breaker](https://metacpan.org/pod/Catmandu::Breaker). First you need to "break" the MARC records into pieces. Afterwards you can calculate statistics for MARC tags and subfield codes.

```bash
$ catmandu convert MARC --type XML to Breaker --handler marc \
    < loc.mrc.xml > loc.breaker
$ catmandu breaker loc.breaker
```

---

With option `--fields` you can calculate statistics for specific tags and subfield codes:

```bash
$ catmandu breaker --fields 245a,020a loc.breaker
| name | count | zeros | zeros% | min | max | mean | variance | stdev | uniq~ | uniq% | entropy |
|------|-------|-------|--------|-----|-----|------|----------|-------|-------|-------|---------|
| #    | 50    |       |        |     |     |      |          |       |       |       |         |
| 245a | 50    | 0     | 0.0    | 1   | 1   | 1    | 0.0      | 0.0   | 45    | 90.1  | 5.4/5.6 |
| 020a | 52    | 12    | 24.0   | 0   | 4   | 1.04 | 0.8      | 0.9   | 51    | 98.2  | 5.3/6.0 |
```

---

Use option `--as` to specify a tabular output format (CSV, TSV, XLS(X)):

```bash
$ catmandu breaker --as XLSX loc.breaker > loc.xlsx
```

---

class: middle

## Unicode

---

## MARC-8 and Unicode

"MARC (ISO 2709)" records can be encoded in two different character coding schemes: [MARC-8](https://www.loc.gov/marc/specifications/specchartables.html) or [UCS/Unicode](https://www.iso.org/standard/69119.html). Use `yaz-marcdump` to convert the encoding of MARC records: specify the source and target encodings with options `-f` and `-t`. With option `-l` you can set the character coding scheme in MARC leader position 09.

```bash
$ yaz-marcdump -f MARC-8 -t UTF-8 -o marc -l 9=97 marc21.raw \
    > marc21.utf8.raw
```

A conversion from UTF-8 to MARC-8 is not recommended, because it could be "lossy".

---

## Unicode normalization

Unicode provides single code points for many characters that could also be viewed as combinations of two or more characters, e.g. German umlauts:

| Composed/NFC | Decomposed/NFD |
|--------------|----------------|
| ä ([Latin Small Letter A with Diaeresis](https://www.compart.com/en/unicode/U+00E4) U+00E4) | a ([Latin Small Letter A](https://www.compart.com/en/unicode/U+0061) U+0061) + ◌̈ ([Combining Diaeresis](https://www.compart.com/en/unicode/U+0308) U+0308) |

---

## uconv

With the command-line utility `uconv` you can transliterate data between different Unicode [normalization forms](https://unicode.org/reports/tr15/#Norm_Forms):

```bash
$ uconv -x NFC marc21.nfd.xml > marc21.nfc.xml
$ uconv -x NFD marc21.nfc.xml > marc21.nfd.xml
```

You should only normalize "MARC XML" data: normalizing "MARC (ISO 2709)" records would corrupt them, because the field lengths in the directory would no longer match.

---

Use option `-x Any-Name` to show the Unicode names of characters:

```bash
$ echo -en 'ÅÅ' | uconv -x Any-Name
\N{ANGSTROM SIGN}\N{LATIN CAPITAL LETTER A WITH RING ABOVE}
```

---

class: middle

## Transformation of MARC data

---

## ... with yaz-marcdump

`yaz-marcdump` can be used to transform MARC data between different serializations. Use options `-i` and `-o` to specify the input and output formats.

---

## "MARC (ISO 2709)" to "MARC XML"

```bash
$ yaz-marcdump -i marc -o marcxml code4lib.mrc > code4lib.xml
```

---

## "MARC (ISO 2709)" to "Turbomarc"

```bash
$ yaz-marcdump -i marc -o turbomarc code4lib.mrc > code4lib.turbo.xml
```

---

## "MARC (ISO 2709)" to "MARC Line"

```bash
$ yaz-marcdump -i marc -o line code4lib.mrc > code4lib.line
```

---

## "MARC XML" to "MARC-in-JSON"

```bash
$ yaz-marcdump -i marcxml -o json code4lib.mrc.xml > code4lib.json
```

---

## ... with Catmandu

The command-line interface of the Catmandu toolkit also offers several transformations of MARC data. The default MARC serialization is "MARC (ISO 2709)".

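---

## Transform and "fix" in one step

Conversions can be combined with "fixes" (covered in detail later in this tutorial) via the `--fix` option. A small sketch that drops local 9XX fields while converting to "MARC XML":

```bash
# convert to "MARC XML" and remove all 9XX fields on the way
$ catmandu convert MARC to MARC --type XML \
    --fix 'marc_remove(9..)' < code4lib.mrc > code4lib.xml
```
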
--- ## "MARC (ISO 2709)" to "MARC XML" ```bash $ catmandu convert MARC to MARC --type XML < code4lib.mrc \ > code4lib.xml ``` --- ## "MARC XML" to "MARC (ISO 2709)" ```bash $ catmandu convert MARC --type XML to MARC < code4lib.xml \ > code4lib.mrc ``` --- ## "MARC (ISO 2709)" to "MARCMaker" ```bash $ catmandu convert MARC to MARC --type MARCMaker < code4lib.mrc \ > code4lib.mrk ``` --- ## "MARC XML" to "MARC-in-JSON" ```bash $ catmandu convert MARC --type XML to MARC --type MiJ \ < code4lib.xml > code4lib.json ``` --- ## "MARC XML" to YAML ```bash $ catmandu convert MARC to YAML < code4lib.mrc \ > code4lib.yml ``` --- ## Breaker The [Catmandu::Breaker](https://metacpan.org/pod/Catmandu::Breaker) module "breaks" data into smaller components and exports them line by line: ```bash $ catmandu convert MARC to Breaker --handler marc < code4lib.mrc 987874829 LDR 01031nas a2200337 c 4500 987874829 001 987874829 987874829 003 DE-101 987874829 005 20200306093601.0 987874829 007 cr|||||||||||| 987874829 008 080311c20079999|||u||p|o ||| 0||||1eng c 987874829 0162 DE-101 987874829 016a 987874829 987874829 0162 DE-600 987874829 016a 2415107-5 987874829 022a 1940-5758 987874829 035a (DE-599)ZDB2415107-5 ... ``` --- You can process this output with other command-line utilities like `grep`, `sort` and `uniq`. For example, to extract all ISBN from a MARC data sets, we can build a command-line [pipeline](https://en.wikipedia.org/wiki/Pipeline_(Unix)) like this: ```bash $ catmandu convert MARC to Breaker --handler marc < loc.mrc \ | grep -P '\t020a' | cut -f 3 | grep -oP '^[\dX]+' | sort | uniq -c 1 0072123397 1 0130284181 1 0201422190 1 0470176431 2 0596002270 ... ``` --- ## Generic file formats With Catmandu you can export data to generic data formats like CSV, JSON, TSV, XLSX and YAML. MARC serializations are "complex/nested data structures" which cannot be stored in flat data structures like tables. You can export MARC records to nested formats like JSON and YAML: ```bash $ catmandu convert MARC to YAML < code4lib.mrc $ catmandu convert MARC to JSON < code4lib.mrc ``` --- This will **not** work: ```bash $ catmandu convert MARC to CSV < code4lib.mrc $ catmandu convert MARC to TSV < code4lib.mrc $ catmandu convert MARC to XLSX < code4lib.mrc ``` You need to use "[Catmandu::Fix](https://metacpan.org/pod/Catmandu::Fix)" to extract and map your data to a tabular data structure: ```bash $ catmandu convert MARC to CSV \ --fix 'marc_map(245abc,dc_title,join:" ");retain_field(dc_title)' \ < code4lib.mrc ``` --- ## ... with XLST If you want transform MARC records to other formats, you have to map MARC (sub)fields to corresponding fields of the other format. The Libary of Congress provides several crosswalks: * [MARC to MODS](https://www.loc.gov/standards/mods/mods-mapping.html) * [MODS to MARC](https://www.loc.gov/standards/mods/v3/mods2marc-mapping.html) * [MARC to Dublin Core](https://www.loc.gov/marc/marc2dc.html) * [Dublin Core to MARC](https://www.loc.gov/marc/dccross.html) * [ONIX to MARC](https://www.loc.gov/marc/onix2marc.html) Based on these crosswalks the Library of Congress published several [XLS stylesheets](https://www.loc.gov/standards/marcxml/#stylesheets), which can be used with a XSLT processor to transform "MARC XML" records to other formats like BIBFRAME, HTML, MODS, OAI-DC and RDF. 
--- ## "MARC XML" to MODS ```bash $ xsltproc MARC21slim2MODS3-7.xsl loc.mrc.xml > loc.mods.xml ``` --- ## "MARC XML" to HTML ```bash $ xsltproc MARC21slim2HTML.xsl loc.mrc.xml > loc.html ``` --- ## "MARC XML" to "OAI-DC" ```bash $ xsltproc MARC21slim2OAIDC.xsl loc.mrc.xml > loc.oaidc.xml ``` --- ## "MARC XML" to "RDF-DC" ```bash $ xsltproc MARC21slim2RDFDC.xsl loc.mrc.xml > loc.rdfdc.xml ``` --- ## "MARC XML" to "[BIBFRAME](https://github.com/lcnetdev/marc2bibframe2)" ```bash $ xsltproc bibframe-xsl/marc2bibframe2.xsl loc.mrc.xml \ > loc.bibframe.xml ``` --- class: middle ## Extract data from MARC records --- ## ... with xmllint First check if a [XML namespace](https://www.w3.org/TR/xml-names/) is declared in the document: ```bash $ head loc.mrc.xml <collection xmlns="http://www.loc.gov/MARC21/slim"> <record> <leader>01227cam a22002894a 4500</leader> <controlfield tag="001">12360325</controlfield> <controlfield tag="005">20070126075126.0</controlfield> <controlfield tag="008">010327s2001 nyua 001 0 eng </controlfield> <datafield tag="906" ind1=" " ind2=" "> <subfield code="a">7</subfield> <subfield code="b">cbc</subfield> <subfield code="c">orignew</subfield> ``` --- If a namespace is set use the "local" XML element name in the [XPATH](https://www.w3.org/TR/2017/REC-xpath-31-20170321/) expression: ```bash $ xmllint --xpath '//*[local-name()="controlfield"]/@tag' \ loc.mrc.xml ``` --- ## Extract all tags and count them ```bash $ xmllint --xpath '//@tag' loc.mrc.xml | sort | uniq -c ``` --- ## Extract all IDs from MARC 001: ```bash $ xmllint --xpath '//*[local-name()="controlfield"][@tag="001"]/text()' loc.mrc.xml ``` --- ## Extract all subfields from MARC 245 fields ```bash $ xmllint --xpath '//*[local-name()="datafield"][@tag="245"]' loc.mrc.xml ``` --- ## Extract subfield "a" from MARC 245 fields ```bash $ xmllint --xpath '//*[local-name()="datafield"][@tag="245"]/*[local-name()="subfield"][@code="a"]/' loc.mrc.xml ``` --- ## Extract content from subfield "a" from MARC 245 fields ```bash $ xmllint --xpath '//*[local-name()="datafield"][@tag="245"]/*[local-name()="subfield"][@code="a"]/text()' loc.mrc.xml ``` --- ## Extraxt all ISBNs ```bash $ xmllint --xpath '//*[local-name()="datafield"][@tag="020"]/*[local-name()="subfield"][@code="a"]/text()' loc.mrc.xml ``` --- ## Extract all DDC numbers ```bash $ xmllint --xpath '//*[local-name()="datafield"][@tag="082"]/*[local-name()="subfield"][@code="a"]/text()' loc.mrc.xml ``` --- ## ... Catmandu Catmandu uses a [domain specific language](https://en.wikipedia.org/wiki/Domain-specific_language) (DSL) called "fix" to extract, map and tranform data. Several "fixes" for library specifc data format like [MARC](https://metacpan.org/pod/Catmandu::MARC) and [PICA](https://metacpan.org/pod/Catmandu::PICA) are available. Most common "fixes" are documented on [cheat sheet](https://librecat.org/assets/catmandu_cheat_sheet.pdf). "Fixes" can be used as command-line options or stored in a "fix" file: ```bash $ catmandu convert MARC to CSV \ --fix 'marc_map(001,id); retain_field(id)' < loc.mrc $ catmandu convert MARC to YAML --fix marc2dc.fix < loc.mrc ``` --- ## marc_map With [`marc_map`](https://metacpan.org/pod/Catmandu::Fix::marc_map) you can extract (sub)fields from MARC records and map them to your own data model: ```no-highlight marc_map(001,dc_identifier) # {"dc_identifier":"12360325"} ``` --- ## Extract part of field MARC uses several "fixed-length" fields, where data elements are positionally defined. E.g. 
if you want to extract the language code from MARC 008, specify the positions with `/35-37`:

```no-highlight
marc_map(008/35-37,dc_language)
# {"dc_language":"eng"}
```

---

## Extract fields with specific indicators

If you want to extract fields with certain indicators, specify them within square brackets `[1,4]`:

```no-highlight
marc_map("246[1,4]",marc_varyingFormOfTitle)
# {"marc_varyingFormOfTitle":"Games, diversions & Perl culture"}
```

---

## Extract subfields

To extract certain subfields from a MARC data field use the subfield codes. By default all extracted subfields will be joined into one string; use option `join` to join them with another string. With option `split:1` you can split the subfields into a list. Use option `pluck` if you want to extract the subfields in a certain order.

```no-highlight
marc_map(245ab,dc_title,join:' ')
# {"dc_title":"Perl : the complete reference /"}

marc_map(245ab,dc_title,split:1)
# {"dc_title":["Perl :","the complete reference /"]}

marc_map(245ba,dc_title,split:1,pluck:1)
# {"dc_title":["the complete reference /","Perl :"]}
```

---

## Extract repeatable fields

MARC data fields can be repeatable. Use option `split:1` to create a list from all fields:

```no-highlight
marc_map(650a,dc_subject,split:1)
# {"dc_subject":["Data mining.","Text processing (Computer science)","Perl (Computer program language)"]}
```

---

## Extract repeatable subfields

MARC subfields can be repeatable within a MARC data field. Use option `split:1` to create one list from all fields. To create a list per data field use option `nested_arrays:1`, which will return a "list of lists" of subfields, one list for each data field:

```no-highlight
marc_map(655ay,marc_indexTermGenre,split:1)
# {"marc_indexTermGenre":["Portrait photographs","1910-1920.","Photographic prints","1910-1920."]}

marc_map(655ay,marc_indexTermGenre,split:1,nested_arrays:1)
# {"marc_indexTermGenre":[["Portrait photographs","1910-1920."],["Photographic prints","1910-1920."]]}
```

---

## Extract subfields by value

To extract a subfield only if another subfield in the same data field has a certain value, use a [loop](https://metacpan.org/pod/Catmandu::Fix::Bind::marc_each) with a [condition](https://metacpan.org/pod/Catmandu::Fix::Condition):

```no-highlight
=856 4\$uhttp://journal.code4lib.org/$xVerlag$zkostenfrei
=856 4\$uhttp://www.bibliothek.uni-regensburg.de/ezeit/?2415107$xEZB
```

```no-highlight
do marc_each()
  if marc_match(856x,EZB)
    marc_map(856u,ezb_uri)
  end
end
# {"ezb_uri":"http://www.bibliothek.uni-regensburg.de/ezeit/?2415107"}
```

---

## Conditions

Use the conditions [`marc_has`](https://metacpan.org/pod/Catmandu::Fix::Condition::marc_has), [`marc_has_many`](https://metacpan.org/pod/Catmandu::Fix::Condition::marc_has_many) or [`marc_match`](https://metacpan.org/pod/Catmandu::Fix::Condition::marc_match) to check if a record has certain fields or matches certain conditions:

```no-highlight
set_array(errors)

# Check if a 245 field is present
unless marc_has('245')
  set_field(errors.$append,"no 245 field")
end

# Check if there is more than one 245 field
if marc_has_many('245')
  set_field(errors.$append,"more than one 245 field?")
end

# Check if 008 positions 7 to 10 contain a
# 4-digit number ('\d' means digit)
unless marc_match('008/07-10','\d{4}')
  set_field(errors.$append,"no 4-digit year in 008 position 7->10")
end
```

---

## Add fields to a record

You can add fields to MARC records with [`marc_add`](https://metacpan.org/pod/Catmandu::Fix::marc_add).

```no-highlight
marc_add(999,a,my,b,local,c,field)
marc_add(900,a,$.my.field)
```

---

## Append values to (sub)fields

Use [`marc_append`](https://metacpan.org/pod/Catmandu::Fix::marc_append) to append values to a (sub)field.

```no-highlight
marc_append(001,'-X')
marc_append(100a,' [author]')
```

---

## Assign a value to (sub)fields

Assign a new value to a MARC field with [`marc_set`](https://metacpan.org/pod/Catmandu::Fix::marc_set).

```no-highlight
marc_set(001,123456789)
marc_set(245a,'Perl - battle tested.')
```

---

## Remove (sub)fields

Use [`marc_remove`](https://metacpan.org/pod/Catmandu::Fix::marc_remove) to remove (sub)fields from MARC records.

```no-highlight
marc_remove(991)
marc_remove(9..)
marc_remove(0359)
```

---

## Replace strings in (sub)fields

Use [`marc_replace_all`](https://metacpan.org/pod/Catmandu::Fix::marc_replace_all) to replace a string in MARC (sub)fields.

```no-highlight
marc_replace_all(001,1,X)
marc_replace_all(245a,Perl,"Perl [programming language]")
```

---

## Filter MARC records

You can filter MARC records from a data set with [`reject`](https://metacpan.org/pod/Catmandu::Fix::reject) or `select`.

```no-highlight
reject marc_has_many(245)
select marc_match(245a,Perl)
```

---

## Validate MARC records

You can [`validate`](https://metacpan.org/pod/Catmandu::Fix::validate) MARC records and collect the error messages, or filter [`valid`](https://metacpan.org/pod/Catmandu::Fix::Condition::valid) records.

```no-highlight
validate(.,MARC,error_field: errors)
select valid(.,MARC)
```

---

## Dictionaries

MARC uses codes for [languages](https://www.loc.gov/marc/languages/language_code.html) and [countries](https://www.loc.gov/marc/countries/countries_code.html). You can build dictionaries based on these lists and use [`lookup`](https://metacpan.org/pod/Catmandu::Fix::lookup) to map the codes to names.

```csv
$ less languages.csv
eng,English
enm,"English, Middle (1100-1500)"
epo,Esperanto
esk,Eskimo languages
est,Estonian
...
```

```no-highlight
# { "dc_language": "eng" }
lookup(dc_language,languages.csv)
lookup(dc_language,languages.csv,default:English)
lookup(dc_language,languages.csv,delete:1)
# { "dc_language": "English" }
```

---

## Normalize ISBNs and ISSNs

Use [`issn`](https://metacpan.org/pod/Catmandu::Fix::issn), [`isbn10`](https://metacpan.org/pod/Catmandu::Fix::isbn10) or [`isbn13`](https://metacpan.org/pod/Catmandu::Fix::isbn13) to normalize international identifiers.

```no-highlight
# { "issn" : "1553667x" }
issn(issn)
# { "issn" : "1553-667X" }

# { "isbn" : "1565922573" }
isbn10(isbn)
# { "isbn" : "1-56592-257-3" }
isbn13(isbn)
# { "isbn" : "978-1-56592-257-0" }
```

---

## Links

- [Avram schema for MARC 21](https://pkiraly.github.io/2018/01/28/marc21-in-json/)
- [Catmandu cheat sheet](http://librecat.org/assets/catmandu_cheat_sheet.pdf)
- [Catmandu mapping rules](https://github.com/LibreCat/Catmandu-MARC/wiki/Mapping-rules)
- [Catmandu::MARC::Tutorial](https://metacpan.org/dist/Catmandu-MARC/view/lib/Catmandu/MARC/Tutorial.pod)
- [MARC Standards](https://www.loc.gov/marc/)
- [MARC 21 format for Bibliographic Data](https://www.loc.gov/marc/bibliographic/)
- [Tutorial "Processing MARC ... with open source tools"](https://jorol.github.io/processing-marc/#/)

---

## Literature

- Henriette Avram (1975): *MARC, its History and Implications.*
- Bernhard Eversberg (1999): *Was sind und was sollen Bibliothekarische Datenformate* ("What are library data formats, and what are they for") [urn:nbn:de:gbv:084-1103231323](https://nbn-resolving.org/urn%3Anbn%3Ade%3Agbv%3A084-11032313237)
- Roy Tennant (2002): *MARC Must Die.*
- William E. Moen, Penelope Benardino (2003): *Assessing Metadata Utilization: An Analysis of MARC Content Designation Use*
- Karen Smith-Yoshimura, Catherine Argus, Timothy J. Dickey, Chew Chiat Naun, Lisa Rowlinson de Ortiz & Hugh Taylor (2010): *Implications of MARC Tag Usage on Library Metadata Practices*
- Roy Tennant (2013-2018): *MARC Usage in WorldCat*
(no longer available)
- Péter Király (2019): *Validating 126 million MARC records* [10.1145/3322905.3322929](https://doi.org/10.1145/3322905.3322929)
- Péter Király (2019): *Measuring Metadata Quality* [10.13140/RG.2.2.33177.77920](https://doi.org/10.13140/RG.2.2.33177.77920)

---

## Thanks

... to all open source developers and the Catmandu community for creating the tools

... to Roy Tennant and Péter Király for their research on MARC data

... to Jakob Voß for creating the tutorial "[Einführung in die Verarbeitung von PICA-Daten](https://pro4bib.github.io/pica/#/)" ("Introduction to processing PICA data"), which I used as a template for "[Processing MARC](https://jorol.github.io/processing-marc/#/get_records)"

---

## My contact details

Johann Rolschewski

Email: johann.rolschewski@sbb.spk-berlin.de

GitHub: jorol

CPAN: JOROL