BL hackathon - WW1
Saturday, 29 June 2013 from 10:00 to 16:30
London, United Kingdom
On Saturday 29 June 2013, the British Library will host a workshop to start making some of their material from the Europeana Collections 1914-1918 project publicly available. There will be a public talk in the morning, to introduce the collections and the project, and in the afternoon we will have a chance to work with the material.
The collections are (mostly) not yet published online, but a significant number of items will be made available on the day, ahead of their public release. The project covers about 10,000 items published during and after the war, including a large number of books now out of print and difficult to obtain, as well as wartime ephemera from the UK and elsewhere.
Schedule (currently provisional)
- 10.00 - doors open
- 10.30 - talk outlining the work done on the Europeana project, followed by a discussion
- 12.00 - break for lunch (not provided)
- 13.00 - afternoon workshop begins
- 16.30 - afternoon workshop closes
We're hoping to give people a first chance to work with the digitised collections. Only a small fraction of these are currently online, and most won't be posted until some point in 2014. However, we will have the scans on-site to work with!
Things we have had interest in:
- This is Wikipedia, after all, and these are digitised books! In particular, the collection includes a large number of unit histories; for an example of what can be done with them, have a look at 15th (Imperial Service) Cavalry Brigade, built around digitised material from the India Office
- Image selection
- The books themselves represent a very large and diverse image collection; as well as the text, most have several maps or plates, scanned in high quality and usually well-labelled - and often not published elsewhere. We could look at extracting some of these images and uploading them individually to Commons.
- OCR/Text analysis
- While we don't expect to have a large amount of the OCRed text ready for the workshop, the scans themselves will be available for running through (e.g.) Tesseract, and we plan to OCR some in advance. There has been some interest in mining the text for names or places - perhaps to build indexes, or to get a sense of geographic coverage?
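As a rough illustration of the OCR and place-mining idea above, here is a minimal Python sketch. It assumes the `tesseract` command-line tool is installed and on the PATH, and that scans are available as individual page images. The gazetteer of place names, the function names and the page-numbering scheme are all hypothetical, made up for this example rather than taken from the project.

```python
import re
import subprocess
from collections import defaultdict
from pathlib import Path

# Hypothetical gazetteer: a real one would be much larger and would
# need to cope with period spellings and OCR errors.
GAZETTEER = {"Ypres", "Mons", "Gallipoli", "Baghdad", "Neuve Chapelle"}


def ocr_page(image_path: Path) -> str:
    """OCR one page image by calling the tesseract CLI.

    Passing "stdout" as the output base makes tesseract print the
    recognised text instead of writing it to a file.
    """
    result = subprocess.run(
        ["tesseract", str(image_path), "stdout"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def build_place_index(pages, gazetteer=GAZETTEER):
    """Map each gazetteer place name to the pages that mention it.

    `pages` is a dict of page number -> OCRed text.
    """
    index = defaultdict(list)
    for page_no, text in sorted(pages.items()):
        # Collapse whitespace so that names split across a line
        # break ("Neuve\nChapelle") still match.
        flat = " ".join(text.split())
        for place in gazetteer:
            if re.search(r"\b" + re.escape(place) + r"\b", flat):
                index[place].append(page_no)
    return dict(index)
```

In use, one might OCR each page of a unit history with `ocr_page`, collect the results into a dict keyed by page number, and pass that to `build_place_index` to get a first-pass index of places. Exact string matching like this will miss misrecognised names, so the output is a starting point for manual checking rather than a finished index.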
Any other suggestions appreciated!
The overall planned collection consists of about 10,000 items, of which most are English-language books, pamphlets and journals. There are also large amounts of sheet music and manuscript items, around 1,000 non-English published books, and a small collection of photographs and maps. Not all of this will be available on the day - some items have not been digitised yet, and some material has a complex copyright situation that limits its availability.
A detailed list of the collections available is being worked out, but will not be ready for a short while. However, here are some samples of what is likely to be available:
- India Office Records: outlines the material digitised (now all available online through the BL's Digitised Manuscripts site)
- A photograph collection from the India Office Records is now on Commons as the Girdwood Collection
- Canadian Collections: outlines the material to be digitised
Some samples of titles from the main printed collection:
- 07942.a.2 - Charles Herman Senn (d. 1934) : Senn's War Time Cooking Guide; 94pp
- 09083.dd.17 - Sir Henry Mortimer Durand (d. 1924) : The Thirteenth Hussars in the Great War; W. Blackwood & Sons; 392pp; 1921
- 09084.cc.38 - Everard Wyrall, etc : The History of the 62nd [amalgamated with the 49th] - West Riding - Division, 1914-1919 ; 2 vol. John Lane: London, [1924-25.]
- T 35177 - Frederick Arthur Hook (d. 1930) : Merchant Adventurers, 1914-1918; [war records of the P&O, British India, and Associated shipping lines]; A&C Black; 319pp; 1920