AMICO  

AMICO Technology Committee

AMICO Contribution Management System
Import/Export Procedures
6/15/2001

1) Import Procedures

This section outlines the procedures used by the AMICO staff to gather and process all of the data to be distributed annually as part of The AMICO Library.

Unless otherwise noted, scripts are found in /amico/bin/

1.1) Receiving/Validating Images

AMICO member institutions contributing works for the coming AMICO Library year are instructed to send, by a fixed deadline, media containing the image and multimedia files for those works. CDs are the preferred medium, but other media are acceptable if first approved by the AMICO technical staff.

When media is received by AMICO, the following procedure is followed:

  • An acknowledgement email is sent to the contacts for the member institution
  • The media is physically labeled according to that member’s AMICO member prefix, in the following form: MEMB.YEAR.MEDIA# (e.g. LACM.2001.01). The date received is also written on the media itself.
  • The media and the date received are logged on paper.
  • As early as possible, the files from the media are loaded onto the server into a dir named after the media in /amico/scratch
  • Command for mounting a cd to an empty dir “DIR”:

mount /dev/cdrom DIR

  • The files are then examined to be sure that they are:

Named correctly: files must begin with the member prefix and must not contain any illegal characters

Correct format: files must be TIFFs

  • acknowledgeMedia.pl is run on the files. This script creates a row in the media table for this media, creates a row in the file table for each TIFF file on the media, and thumbnails the images. Thumbnails are stored automatically in /home/httpd/html/amico/apw/thumbnails/MEMB.
  • The script is invoked as follows:

acknowledgeMedia.pl MEMB MEDIANAME SOURCEDIR "comments"

MEMB is the 4-char member prefix

MEDIANAME is the label given to the media as outlined above

SOURCEDIR is the working dir in /amico/scratch where the media is currently stored

“comments” must be in quotes and contains any special notes regarding that media (e.g. the media is a replacement for an earlier submission, filenames were changed, etc.)

Once the acknowledgeMedia script is run, there will be a media log entry for that media in the CMS, under View Files->View Media Log. Members are asked to view these logs to verify that image orientation and quality meet expectations.

  • If metadata creation has been requested by the member, a custom script for that member is made based on the mkmeta.base.pl template script, with the XPU, XRI, and XRS set to the values supplied by the member. The script is then run in the dir for that media on the server:

mkmeta.MEMB.pl > MEDIANAME.mkmeta.log

The log shown above (MEDIANAME.mkmeta.log) indicates whether the script was successful. The resulting output.meta.txt file is then given a name appropriate to that media, moved to /amico/work-space/incoming/MEMB, and validated through the CMS.

This procedure is repeated for all media received until it is time for export processing.
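The labeling and file checks described in this section could be sketched as follows. This is a minimal illustration in Python, not the code used by AMICO staff. The set of "legal" filename characters and the exact widths of the label parts are assumptions, since the procedure does not define them; the TIFF check relies only on the standard four-byte TIFF signature.

```python
import re

# Label format from the procedure: MEMB.YEAR.MEDIA#, e.g. LACM.2001.01.
# Assumption: MEMB is four characters (capitals, digits, or underscore),
# YEAR is four digits, and MEDIA# is two digits.
MEDIA_LABEL = re.compile(r"^(?P<memb>[A-Z0-9_]{4})\.(?P<year>\d{4})\.(?P<num>\d{2})$")

# Assumption: "legal" characters are letters, digits, dot, underscore, hyphen.
LEGAL_NAME = re.compile(r"^[A-Za-z0-9._-]+$")

# TIFF files begin with one of these byte-order signatures.
TIFF_MAGIC = (b"II*\x00", b"MM\x00*")


def parse_media_label(label):
    """Return (memb, year, number), or None if the label is malformed."""
    m = MEDIA_LABEL.match(label)
    if m is None:
        return None
    return m.group("memb"), int(m.group("year")), int(m.group("num"))


def filename_problems(filename, memb):
    """List the naming problems found for one file on a member's media."""
    problems = []
    if not filename.startswith(memb):
        problems.append("missing member prefix")
    if not LEGAL_NAME.match(filename):
        problems.append("illegal characters")
    return problems


def looks_like_tiff(header_bytes):
    """Check the first four bytes of a file against the TIFF signatures."""
    return header_bytes[:4] in TIFF_MAGIC
```

A file would pass when `filename_problems` returns an empty list and `looks_like_tiff` is true for the first bytes read from it.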

1.2) Receiving/Validating Multimedia

1.2.1) Member-Submitted Multimedia

Multimedia is received from members in the same form as images, but the procedure is different as there is no program in place to validate or thumbnail multimedia files. When a multimedia file is found on submitted media, that file is played to verify that the file is not truncated or corrupted. The file is then copied to /amico/data/multimedia. When the time for export arrives, all of the multimedia files are checked for valid metadata records and those with valid metadata are moved to /amico/data/totape.

1.2.2) Antenna Audio Files

Antenna Audio files are sent on media directly from Antenna to AMICO. When the media are received, they are labeled similarly to the other media, but with added notation to indicate that they are Antenna media (e.g. MIA_.AA.2001.01). Files are matched to catalog records based on lists provided by Antenna and annotated by the members. Next, the files are sampled, appropriately named, recorded in the file database, and saved in /amico/work-space/incoming/MEMB/audio/. At this point the sampling process creates metadata, populated with data from the sampled file and the Antenna/member list. The files can then be reviewed by members with the Antenna tool in the CMS. This tool allows the member to listen to the sampled file, view the catalog record, and create the related multimedia description. Once members have reviewed all of these elements, they must accept or reject each link.

If approved, a Related Multimedia Group (RMG) is added to the catalog record using the values entered by the member. The status row in the file table for that audio file is changed to ‘Approved’ and the file is moved from /amico/work-space/incoming/MEMB/audio/ to /amico/data/totape.

If rejected, no changes are made to the catalog record. The status row in the file table is changed to ‘Deleted’, and the file is immediately moved to /amico/data/MEMB/DELETED and no longer appears in the Antenna linking tool.
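The accept/reject outcome above amounts to a small decision table. The sketch below, in Python, only computes the new status and the source and destination paths named in the text; it does not touch a database or move files, and the function name is hypothetical.

```python
import posixpath

INCOMING = "/amico/work-space/incoming"


def review_outcome(memb, filename, approved):
    """Return (new_status, source_path, destination_dir) for one reviewed audio file."""
    source = posixpath.join(INCOMING, memb, "audio", filename)
    if approved:
        # Approved links: an RMG is also added to the catalog record (not shown here).
        return "Approved", source, "/amico/data/totape"
    # Rejected links: the catalog record is left unchanged.
    return "Deleted", source, posixpath.join("/amico/data", memb, "DELETED")
```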

1.3) Receiving/Validating Data

Data for submissions can be received in one of two forms:

  • AMICO data text records following the AMICO data specification
  • AMICOlite tab-delimited text files

The proper submission procedure for either of these file types is to upload via the File Submission menu of the CMS, using either browser upload or ftp. Browser upload is preferred to ftp because it ensures that the files are placed in the correct directory.

1.3.1) AMICO data files

Data files, as described in the AMICO data specification on the AMICO website (http://www.amico.org/AMICOlibrary/dataspec.html), can be validated by the member via the View Files or Validate Text files options in the CMS. If AMICO staff choose to validate or revalidate these files, the CMS routines are also used.

The browser calls the following script:

The script pulls tags from the data file, separated by the }~ delimiter. Before doing anything, the script checks whether or not the AID tag matches any existing AID in the database. If there is a match, it is assumed to be a record that is being updated. Data pulled from the data file will overwrite any preexisting data in the database for a given tag in that record. AIDs that do not match any existing AID in the database are assumed to be new data and new rows will be created in the database as the data from the tags is loaded.
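The update-or-create rule described above can be illustrated with a short Python sketch. It is not the CMS validation script. The field layout ('TAG value' terminated by }~) is an assumption made for illustration; the real layout is defined by the AMICO data specification. A plain dictionary keyed by AID stands in for the catalog table, and repeated tags within one record are not handled.

```python
DELIM = "}~"


def parse_record(text):
    """Split one record into {tag: value}.

    Assumption: each field is written as 'TAG value' and ends with the
    '}~' delimiter. A repeated tag would overwrite the earlier value.
    """
    fields = {}
    for chunk in text.split(DELIM):
        parts = chunk.strip().split(None, 1)
        if not parts:
            continue  # empty text after the final delimiter
        tag = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        fields[tag] = value
    return fields


def load_record(database, fields):
    """Update the row whose AID matches, or create a new row.

    'database' is a dict keyed by AID, standing in for the catalog table.
    Returns 'updated' or 'created'.
    """
    aid = fields["AID"]
    if aid in database:
        database[aid].update(fields)  # incoming tags overwrite existing values
        return "updated"
    database[aid] = dict(fields)
    return "created"
```

Note that tags absent from a resubmitted record keep their previous values in this sketch, which matches the statement that data is overwritten "for a given tag".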

Records that are loaded have the following information added to them.

Warning/Error Messages – APD field

Catalogstatus - user and timestamp

Validated files are logged in the file table with the following entries:

filename
datevalidated (timestamp)
size (in bytes)
type (Text in this case)
status

Resubmitted files must be renamed to prevent confusion with database records of previously validated files, even if the previous file did not validate successfully.

If a member is having problems validating a file that are not attributable to the data, the file is examined for problems. Some common problems are missing newline characters, characters from other filesystems (e.g. DOS and Mac newlines), and missing or broken delimiters. Once the problems are corrected, the file is revalidated and the member is informed via email of the changes made.

Files not yet validated are stored in /amico/work-space/incoming/MEMB/ and validated files are stored in /amico/work-space/saved/MEMB.
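One of the common problems mentioned in this section, DOS and Mac newlines, can be detected and repaired mechanically. The following Python sketch is offered as an illustration only; the function names are hypothetical and it works on the raw bytes of a file.

```python
def describe_newlines(data):
    """Count each newline convention found in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "dos": crlf,                       # CR LF pairs
        "mac": data.count(b"\r") - crlf,   # bare CR (classic Mac OS)
        "unix": data.count(b"\n") - crlf,  # bare LF
    }


def normalize_newlines(data):
    """Convert DOS and classic Mac line endings to Unix LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

A file reporting non-zero "dos" or "mac" counts would be a candidate for normalization before it is revalidated.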

Validated records are stored in the database and can subsequently be viewed and edited with the CMS editing tools.

1.3.2) AMICOlite data files

AMICOlite files are tab-delimited text files constructed as described in the AMICOlite specification on the AMICO members website (http://members.amico.org/comm/tech/AMICOlite.htm). The specification must be followed closely, as field order is very important in these files. Members must inform AMICO staff when they submit AMICOlite files, as these cannot be validated interactively. Staff must run these files through the parseAMICOlite.pl script to convert the tab-delimited files to the AMICO Data Specification:

parseAMICOlite.pl FILE > NEWFILE

The new file, which is an AMICO data text file, is then examined to be sure that the tags and data have been aligned correctly. It is then moved to the member’s incoming dir and validated through the CMS.
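The reason field order matters can be seen in a small Python sketch of a positional conversion. This is not parseAMICOlite.pl. The column order below is hypothetical and chosen only for illustration (the real order is fixed by the AMICOlite specification), and the 'TAG value}~' output layout is likewise an assumption.

```python
# Hypothetical column order, for illustration only.
EXAMPLE_COLUMNS = ["AID", "OTY", "ALY"]
DELIM = "}~"


def amicolite_line_to_record(line, columns=EXAMPLE_COLUMNS):
    """Convert one tab-delimited line to tagged text, matching values to tags by position."""
    values = line.rstrip("\r\n").split("\t")
    if len(values) != len(columns):
        # A missing or extra tab would shift every later value onto the wrong tag.
        raise ValueError("expected %d fields, found %d" % (len(columns), len(values)))
    out = []
    for tag, value in zip(columns, values):
        value = value.strip()
        if value:  # empty columns produce no tag
            out.append(tag + " " + value + DELIM + "\n")
    return "".join(out)
```

Because values are matched to tags purely by position, a single misplaced column yields a record that is well formed but wrong, which is why the converted file is inspected by hand before validation.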

1.4) Data Processing

At this point the AMICO editorial and technical staff work together and with the members to correct as many data problems and invalid records as possible before the time arrives for the annual export to distributors.

Editorial tools used in this process include:

otyNormalize.cgi – helps members fix invalid Object Types.

dateNormalize.cgi – helps members index dates.

fieldFormat.pl – removes leading and trailing spaces, tabs, and single/double quotes, and standardizes capitalization for a specified data value.
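The kind of cleanup attributed to fieldFormat.pl might look like the following Python sketch. It is not the actual script. The capitalization rule (first letter of each word upper case, the rest lower case) is an assumption, since the text does not say which standard is applied; this version also collapses runs of internal whitespace as a side effect.

```python
def field_format(value):
    """Strip surrounding spaces, tabs, and straight quotes, then capitalize each word."""
    cleaned = value.strip(" \t'\"")
    words = cleaned.split()
    return " ".join(w[:1].upper() + w[1:].lower() for w in words)
```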

2) Export Procedures

AMICO Library records are exported from the CMS to send to AMICO Distributors and to release via the AMICO Thumbnail Catalog.

2.1) Image/Multimedia Export

Moving one by one through the dirs in the /amico/scratch tree for media submitted for the current library year (the dir naming scheme contains the year, so this is simple), the following script is run in each dir:

writeTiffHeaders.pl /amico/scratch/MEDIANAME > /amico/data/log/MEMB.writeTiffHeaders.log

This will write AMICO rights and record data to the header of each tiff file and place the resulting file in /amico/data/totape. This script also verifies that the file has a metadata record and is linked to by a valid record. Files that fail this validation test are moved by the script to /amico/data/unlinked. Files that fail this procedure due to factors unrelated to the database will be noted in the log.

Files from /amico/data/totape are written to tape. This can be done at any point in the process of moving through the media dirs but should be done before the combined size of the files in /amico/data/totape exceeds the size of a single tape (less than 70GB at this time on our DLT7000 drive). A blank tape should be loaded and labeled at this point and the density set on the tape drive to ensure maximum size will be used, unless a distributor has requested a specific tape density.
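Before writing, the combined size of the staged files can be checked against the tape capacity. The Python sketch below is an illustration under stated assumptions: the 70 GB figure is taken from the text and interpreted as decimal gigabytes, and only regular files directly inside the directory are counted.

```python
import os

# "Less than 70GB" per the text; treating GB as 10**9 bytes is an assumption.
TAPE_LIMIT_BYTES = 70 * 10**9


def staged_bytes(directory):
    """Total size in bytes of the regular files directly inside 'directory'."""
    return sum(
        entry.stat().st_size
        for entry in os.scandir(directory)
        if entry.is_file()
    )


def fits_on_tape(directory, limit=TAPE_LIMIT_BYTES):
    """True if the staged files are strictly below the tape limit."""
    return staged_bytes(directory) < limit
```

Running `fits_on_tape("/amico/data/totape")` after each media dir is processed would indicate when a tape should be written.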

The tape write command is:

cd /amico/data/totape

tar cvvf /dev/nst0 * >> /amico/data/log/TAPENAME.log

Once all of the files in the directory have been written to tape, the files are moved into /amico/data/MEMB, MEMB being the member prefix for any given file. Overwrites should be confirmed in this move to allow for replacement images from members. Once a tape is full, a log sheet should be made for that tape, recording contents, density, and date completed.
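The move into per-member directories depends on recovering the member prefix from each filename. A minimal Python sketch, assuming the prefix is always the first four characters of the filename (consistent with the 4-char member prefix and the naming rule in section 1.1), could compute the destination as follows. The function name is hypothetical and the sketch does not move any files.

```python
import posixpath


def archive_destination(filename, data_root="/amico/data"):
    """Return the per-member path a file should be moved to after the tape write."""
    memb = filename[:4]  # assumption: the member prefix is the first four characters
    return posixpath.join(data_root, memb, filename)
```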

These tapes are shipped to distributors as a set for a given library year.

2.2) Data Export

In a file, write an SQL command to get a list of AIDs for the records to be exported. The following example selects all valid 2001 records for export:

SELECT aid FROM catalog WHERE aid not like 'TEST%' and ALY = '2001' and avv > '0' and aly != '0' and del not like 'Y' order by AID

With that command placed in a file called AID2001.sql, run the following command:

sqsh -U amico -i AID2001.sql -o AID2001.txt

AID2001.txt will contain the AIDs for the records to be exported, and should be edited to remove the extraneous SQL output at the beginning and end of the file.
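The manual cleanup of the AID list could be approximated with a filter like the one below. This Python sketch assumes the extraneous output consists of a column header, a row of dashes, blank lines, and a "rows affected" footer; the actual output depends on how sqsh is configured, so the result should still be checked by eye.

```python
import re

ROWS_AFFECTED = re.compile(r"^\(\d+ rows? affected\)$")


def clean_aid_list(lines, column_name="aid"):
    """Keep only the lines that look like data rows from the query output."""
    aids = []
    for line in lines:
        text = line.strip()
        if not text:
            continue                      # blank line
        if text.lower() == column_name:
            continue                      # column header
        if set(text) <= {"-", " "}:
            continue                      # header underline
        if ROWS_AFFECTED.match(text):
            continue                      # row-count footer
        aids.append(text)
    return aids
```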

Next run the following script:

exportFullCatMet.pl AID2001.txt > AMICO.2001.full.txt

AMICO.2001.full.txt will contain the full export in AMICO data format, containing both data and metadata.

To export to XML, run:

xml.pl <full text datafile> <list of AIDs>

The script will create an XML directory with a folder for each member containing all corresponding XML files. Parsing errors are placed in a file called parsing_errors.txt in the same directory as the XML folder.

3) Data Distribution

AMICO data exports, including the XML export, are placed in /home/export and may be downloaded at any time by distributors by using their individual ftp logins. These distributor accounts are ftp-only and all point to /home/export as their home.

The exports are named to indicate the date of export and will all be available indefinitely to allow complete and accurate reconstruction of the data.

4) Weekly Procedures

4.1) Backups

Backups of the update.amico.org server are automated and run weekly. A backup tape should be placed in the DLT drive for the weekend and rewound. A cron process drops a database dump in /extra and backs up the following directory trees to tape late on Saturday night:

/extra
/home
/amico/work-space

/amico/data is backed up once for a set of local masters following the completion of a year’s library export.

4.2) Data Export

On Thursdays a text file containing data changes from the prior week is prepared for distributors. The following script is run:

exportChanges.pl MM/DD/YY > /home/export/AMICO.update.YYYYMMDD.txt

where MM/DD/YY is the date when the last update file was created.
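The two date formats in this command are easy to mix up, so a small Python helper that builds the command line may be useful. It is a sketch only: it assumes the YYYYMMDD in the output filename is the date on which the export is run, and it returns the command as text rather than executing it.

```python
from datetime import date


def export_changes_command(last_update, today):
    """Build the exportChanges.pl command line for this week's update file."""
    since = last_update.strftime("%m/%d/%y")  # MM/DD/YY: date of the previous update file
    stamp = today.strftime("%Y%m%d")          # YYYYMMDD: assumed to be the run date
    return "exportChanges.pl %s > /home/export/AMICO.update.%s.txt" % (since, stamp)
```

For example, `export_changes_command(date(2001, 6, 7), date(2001, 6, 14))` yields a command whose argument is 06/07/01 and whose output file is AMICO.update.20010614.txt.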

The update file will then be available for all distributors via ftp on update.amico.org using their respective ftp logins.

The AMICO public web database (APW) must also be updated at this point. Tar the thumbnails in /home/httpd/html/amico/thumbnails and ftp them to search.amico.org. Then extract this tar file to /home/httpd/html/amico/apw/thumbnails on search.amico.org. Next, run the following set of commands on update.amico.org:

cd /home/export
grep -v DELY AMICO.update.YYYYMMDD.txt > updates.YYYYMMDD.txt
grep DELY AMICO.update.YYYYMMDD.txt > deletions.YYYYMMDD.txt

Use ftp to upload those two files to /home/updates on search.amico.org.

Log in to search.amico.org and run the following commands:

cd /home/httpd/html/amico/apw/search/admin/cmd
importAPWbatch.pl /home/updates/updates.YYYYMMDD.txt
deleteAPWbatch.pl /home/updates/deletions.YYYYMMDD.txt

The APW will now be up to date for the week.
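The two grep commands in this procedure split the weekly update file into lines that contain DELY and lines that do not. An equivalent split in Python, shown here only to make the logic explicit, might look like this; the function name is hypothetical and the sample lines in the usage note are invented.

```python
def split_updates(lines, marker="DELY"):
    """Separate lines containing the marker (deletions) from all other lines (updates)."""
    updates, deletions = [], []
    for line in lines:
        if marker in line:
            deletions.append(line)
        else:
            updates.append(line)
    return updates, deletions
```

Like grep, this works line by line: a line is treated as a deletion only if the marker appears on that line, so any other line in the file is routed to the updates list regardless of which record it belongs to.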


5) Development Proposals

The CMS needs to be picked apart and checked for accuracy at all levels given the statistical inconsistencies we’ve been seeing lately.

Additional enhancements need to be made to the script that validates records:

Date indexing
Check member prefix
Deny overwrites for specified protected fields
Autocorrect duplicate file names, warn of change
More to come

Automate weekly data exports, including APW

Not surprisingly, times required for any routine involving database access are growing noticeably longer as the size of the library grows. Certain scripts involved in import, search and export would benefit from a rewrite in C.

The record linking reports are broken in some cases and, even when working, are unclear as to the nature of a problem with a given record.

Distribution tape writes should be automated.

Many ADP field entries could probably be removed from the database, first saved externally of course. This would cut dump file sizes and improve db performance when editing.

The load scripts and record editor scripts need to be updated to purge redundant entries.

There is a definite need for a set of tools for simplifying certain large-scale direct database changes we commonly see, like rights updates, preferably accessible through the CMS.

CMS browser issues should be definitively addressed.

An entry should be added to the database to note when a given file has been sent to a distributor, preferably as part of the tape writing automation mentioned above.

(Low priority) A Win32 client for record editing would be very useful.


Last modified on October 10, 2001