
Big changes for CC Search beta: updates released today!

About CC, Technology

Today, we’ve released a significant update to our working beta of the CC Search product. We launched the project in February 2017 to provide a new “front door” to the Commons, with the ultimate goal of finding and indexing all 1.4 billion+ CC-licensed works on the web. Since then, our newly formed tech team – myself, Alden Page, Sophine Clachar, and Steven Bellamy – has been working to move this project toward its next iteration, which I am proud to share today.

More providers, better metadata


This is a work in progress: it has great new features and also a few bugs, which we’re working on as we go (you can leave feedback here or file issues on GitHub). This iteration of CC Search integrates access to more than 10 million images across 13 content providers. The data was obtained by processing 36 months of web crawl data from the Common Crawl corpus (an open repository of web crawl data maintained by the Common Crawl Foundation).
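To give a feel for what finding CC licensed works in crawled pages can involve, here is a minimal sketch in Python. It is not our actual pipeline: it simply assumes that a page signals its license by linking to a creativecommons.org license or public domain URL, and the function name `find_cc_licenses` is made up for illustration.

```python
import re

# Assumption: a page signals its license by linking to a URL such as
# https://creativecommons.org/licenses/by-sa/4.0/ or
# https://creativecommons.org/publicdomain/zero/1.0/
CC_URL = re.compile(
    r"https?://creativecommons\.org/"
    r"(?:licenses/(?P<code>[a-z\-]+)/(?P<version>\d\.\d)"
    r"|publicdomain/(?P<pd>zero|mark)/(?P<pdversion>\d\.\d))",
    re.IGNORECASE,
)


def find_cc_licenses(html):
    """Return (license, version) pairs for each distinct CC URL found, in page order."""
    found = []
    for match in CC_URL.finditer(html):
        if match.group("code"):
            pair = (match.group("code").lower(), match.group("version"))
        else:
            # "zero" is the CC0 dedication, "mark" is the Public Domain Mark.
            name = "cc0" if match.group("pd").lower() == "zero" else "pdm"
            pair = (name, match.group("pdversion"))
        if pair not in found:
            found.append(pair)
    return found
```

A real crawl processor would also need to work out which image on the page the license applies to, which this sketch does not attempt.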

The full list of providers:

Provider | Domain | # CC Licensed Works
--- | --- | ---
Animal Diversity Web | https://animaldiversity.org/ | 14,839
Behance | https://www.behance.net/ | 5,245,785
Deviantart | https://www.deviantart.com/ | 206,506
Digitalt Museum | https://digitaltmuseum.org/ | 88,970
Encyclopedia of Life | http://eol.org/ | 547,488
Flickr | https://www.flickr.com/ | 426,214
Flora-On | http://flora-on.pt/ | 26,498
Geograph UK | http://www.geograph.org.uk/ | 1,018,560
IHA Holiday Ads | http://www.iha.com/ | 2,058,272
McCord Museum | http://www.musee-mccord.qc.ca/en/ | 108,800
The Metropolitan Museum of Art | https://www.metmuseum.org/ | 96,260
Museums Victoria | https://collections.museumvictoria.com.au/ | 64,719
Science Museum – UK | https://www.sciencemuseum.org.uk/ | 14,280

In addition, the new release contains several new features, including AI image tags generated by our collaborator, Clarifai. Clarifai is best-in-class image classification software that provides tagging support and visual recognition. We integrated Clarifai’s API into our processing workflow to automatically generate tags for new and existing images. This means that CC Search has machine-generated tags, user-defined tags, and platform-defined tags obtained from the web crawl data. Collectively, these will enhance the user’s search experience and improve the quality of the results. Currently, 10.3 million images have Clarifai tags, and the remaining images will be tagged on an ongoing basis. Thank you to Clarifai for their support.
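As a rough illustration of how tags from these three sources could sit side by side, here is a small Python sketch. It is an assumption about one reasonable approach, not a description of how CC Search stores tags; the function name `merge_tags` and the source labels are hypothetical.

```python
from collections import defaultdict


def merge_tags(machine, user, platform):
    """Combine tags from three sources into a mapping of tag -> sorted list of sources."""
    sources = defaultdict(set)
    labelled = (("machine", machine), ("user", user), ("platform", platform))
    for source_name, tags in labelled:
        for tag in tags:
            # Normalize so that "Dog" and "dog " count as the same tag.
            normalized = tag.strip().lower()
            if normalized:
                sources[normalized].add(source_name)
    return {tag: sorted(names) for tag, names in sorted(sources.items())}
```

Keeping the source next to each tag would let a search interface treat a machine-generated tag differently from one a person added, for example when filtering or ranking.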


A New Look


The new design allows users to search by category, see popular images, and search more accurately across a wide range of content.

Users can now also share content and create public lists of images without an account, using an anonymous authentication scheme. Shares.cc is a new link shortening system that makes it easy to share cool stuff you find on our platform on social media – users can share both images and lists, no login required. In addition, the new platform provides the ability to filter by provider, license, creator, tag (including those generated by Clarifai), or title.
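For readers curious about what filtering on these fields amounts to, here is a minimal Python sketch over in-memory records. The `ImageRecord` class and `filter_images` function are hypothetical names used only for illustration; the real search backend is not shown here and may work quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    # Hypothetical record shape; field names mirror the filters listed above.
    title: str
    creator: str
    provider: str
    license: str
    tags: list = field(default_factory=list)


def filter_images(images, provider=None, license=None, creator=None, tag=None, title=None):
    """Keep images that match every filter that is set; unset filters are ignored."""
    def matches(img):
        return (
            (provider is None or img.provider.lower() == provider.lower())
            and (license is None or img.license.lower() == license.lower())
            and (creator is None or img.creator.lower() == creator.lower())
            and (tag is None or tag.lower() in [t.lower() for t in img.tags])
            and (title is None or title.lower() in img.title.lower())
        )
    return [img for img in images if matches(img)]
```

In this sketch provider, license, creator, and tag are exact matches that ignore case, while title is a substring match, which is one plausible choice among several.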

(Please note: If you made private lists in the previous system, they will not carry over to this release. We’re sorry for any inconvenience this may cause. If there is a list you would like us to recover, please email us at info@creativecommons.org.)

With gratitude

CC Search is made possible by a number of institutional and individual sponsors. Specifically, we would like to thank Arcadia (a charitable fund of Lisbet Rausing and Peter Baldwin), Mozilla, and the Brin Wojcicki Foundation for their support. With the generous support of our funders, Creative Commons is able to significantly advance its work in pursuit of a more open and sharing world that illuminates the Commons and recognizes the major potential of transformative human knowledge.

Full release notes available here.

 

Posted 24 September 2018
