groovy

Browsing iTunesU without intalling iTunes

groovy, itunes, itunesU, proxy, StreamingMarkupBuilder, XmlSlurper / Wednesday, 13 May, 2009

Updated version of the script

On some pages the location-bar information is available (third-level pages), but 1st and 2nd level pages do not have a location-bar. Even in iTunes it has been removed in favor of a location-bar managed by iTunes. I suppose this is done in order to make use of a cached version of the page and speed up loading. Having no location-bar at 2nd level pages is an awkward design choice from Apple. Therefore location-bar has to be managed by the script and 3th level pages need also to be re-adapted to be displayed correctly.

Fixed version can be tested here: http://1.latest.unil-podcast.appspot.com/index.groovy?handle=[itunesU url]

the new source files are available here:
complete source code at github http://github.com/bfritscher/www-itunesu

index_fixed.groovy – new version with navigation fixed
proxy.groovy – proxy class because iTunes checks user-agent to link to some video files behind a redirect.

The problem

More and more universities publish courses as podcasts on iTunesU and neglect to give a direct access to their content. This hinders sharing of knowledge, because of the obligatory Appleâ€™s iTunes application needed to access iTunesU.

The solution

Emulating the iTunes client: the script can be tested at

http://www2.unil.ch/itunesu/index.groovy?handle=[itunesU url]

It only needs the url which would open in iTunes and converts it to a displayable webpage.

For example for mit.edu: http://deimos.apple.com/WebObjects/Core.woa/Browsev2/mit.edu

http://www2.unil.ch/itunesu/index.groovy?handle=http://deimos.apple.com/WebObjects/Core.woa/Browsev2/mit.edu

Source download

index.groovy version 1 iTunesU xml to html groovy script

Why groovy?

Having experience with Java and Grails development using Groovy especially due to its powerful closures, xml navigation and html building was a obvious choice. Also this project started as a Google AppEngine prototype, but hit the no-user-agent change limitation.

Limitations

Of the three types of layout used in iTunes U only the two column standard university page view and detailed course view (track list) are supported. The iTunes main site listing and main page three column layout is not supported.

Solution explained

iTunes uses a custom xml page to receive and render its content. This file can be access when switching the user-agent to â€œiTunesâ€.
Switching user agent is easy under Firefox with the user-agent switcher add-on
Part of the xml file

<FontStyleSet setName="basic22" normalStyle="helvetica22"
	linkStyle="link" linkPressedStyle="linkPressed" linkRolloverStyle="linkRollover" />
<FontStyleSet setName="normal40" normalStyle="lucida40"
	linkStyle="link" linkPressedStyle="linkPressed" linkRolloverStyle="linkRollover" />
<FontStyleSet setName="normal40Bold" normalStyle="lucida40Bold"
	linkStyle="link" linkPressedStyle="linkPressed" linkRolloverStyle="linkRollover" />
<MatrixView rightInset="0" bottomInset="0" leftInset="0"
	topInset="0" rowFormat="100%,*" viewName="Form">
	<View rightInset="0" bottomInset="0" leftInset="0" topInset="0"
		minWidth="948" minHeight="400">
		<VBoxView>
			<!-- Layer 1 = Banner Image -->
			<!-- BEGIN ReplaceableImageXMLView -->
			<View alt="" height="400" width="600">
				<PictureView proportional="1" verticalAlignment="top"
					shouldBeVisible="true" shadowHeight="0"
					alt="MIT offers a selection of video and audio from several of our groundbreaking projects: MIT OpenCourseWare, MIT World, MIT TechTV, and Visualizing Cultures."
					shadowWidth="0" height="400" width="600"
					url="http://deimos3.apple.com/indigo//2c/c5/f3/93/2cc5f393018ecb9b1a3991063e10376b093cd1918f43b3847af32a8a9ba35103-1502158938.jpg"
					addShadowSizes="false" />
			</View>
			<!-- END ReplaceableImageXMLView -->
		</VBoxView>
		<VBoxView>
			<!-- Layer 2 = link box area -->
			<MatrixView leftInset="0" rightInset="0" topInset="25"
				columnFormat="65%,25,35%">
				<!--
					This ensures we leave enough room for the banner image in the 3
					column view
				-->
				<View minHeight="400" />
				<View />
				<VBoxView rightInset="25" bottomInset="0" leftInset="0"
					topInset="0">
					<VBoxView>
						<VBoxView leftInset="0" rightInset="0" topInset="0"
							bottomInset="0">
							<!-- BEGIN description box-->
							<!-- END description box-->
							<!-- START ListBoxStack -->
							<View>
								<FontStyle name="outlineTitleFontStyle" color="ffffff" />
								<FontStyle name="outlineTextFontStyle" color="ffffff" />
								<!-- BEGIN RoundedBox -->
								<View rightInset="0" bottomInset="0" leftInset="0"
									topInset="0">
									<Test value="7.0.0" comparison="greater or equal"
										property="iTunes version">
										<!-- BEGIN MaskedView -->
										<View rightInset="0" bottomInset="0" leftInset="0"
											topInset="0">
											<PictureButtonView rightInset="0" topInset="0"
												bottomInset="0" leftInset="0" alt="" color="rgba(0,0,0,0.75)"
												mask="http://deimos3.apple.com/rsrc/Images/masks/rounded_box.png"
												cap="4" />

										</View>
									</Test>
								</View>
							</View>
						</VBoxView>
					</VBoxView>
				</VBoxView>
			</MatrixView>
		</VBoxView>
	</View>
</MatrixView>

The xml file is quite verbose and it takes some time to identify the interesting content.
The hard part of this project was to identify the right way to identify and select the wanted data from the xml tree.

Following are code snippets of the source file with their most important or interesting aspects explained.

First task of the script is to retrieve the xml page by settings the right user-agent [3] and then use the powerful groovy xml parser api [13] to have an easily navigable xml object.

def xml
def conn = new URL(handle).openConnection()
conn.setRequestProperty ( "User-Agent", "iTunes/8.1" )
def putBackTogether = new StringBuffer()
def r = new InputStreamReader ( conn.getInputStream(), "UTF-8" )
char [  ]  cb = new char [ 2048 ]
int amtRead = r.read ( cb )
while  ( amtRead  >  0 )   {
	putBackTogether.append ( cb, 0, amtRead )
	amtRead = r.read ( cb )
}
xml = putBackTogether.toString()
xml = new XmlSlurper().parseText(xml)

To work around default namespace bugs when recreating a xml string from an xml object the Markupbuilder has to be set to the apple namespace[6].

import groovy.xml.StreamingMarkupBuilder
def getXml(item){
	def outputBuilder = new StreamingMarkupBuilder()
	outputBuilder.encoding = "UTF-8"
	String result = outputBuilder.bind{
		mkp.declareNamespace("":"http://www.apple.com/itms/")
		mkp.yield item
	}
}

Example of the powerful way to generate html code with the groovy MarkupBuilder.

import groovy.xml.StreamingMarkupBuilder
html.html(xmlns:"http://www.w3.org/1999/xhtml",lang:"en",'xml:lang':"en") {
	head {
		title xml.Path.PathElement.collect{it.'@displayName'}.join(" > ")
		meta('http-equiv':"X-UA-Compatible", content:"IE=8")
		meta('http-equiv':"content-type",content:"application/xhtml+xml; charset=UTF-8")
		//...

Identifying the type of layout the xml file is meant to build is done by looking at a reflection attribute from the first image found in the file.

def topimg = xml.'**'.find{ it.name() == "PictureView"}
def reflect = topimg?.'@reflection'==1 //true = course page

Getting page colors from FontStyle tags

def titleFontStyle = xml.'**'.find{it.name() == "FontStyle" && it.'@name' == "normalTitleFontStyle"}?.'@color'
if(titleFontStyle == null){
	titleFontStyle = xml.'**'.find{it.name() == "FontStyle" && it.'@name' == "outlineTitleFontStyle"}?.'@color'
}

Some more examples of the groovy MarkupBuilder: especially interesting is the mkp.yieldUnescaped [10] method to allow usage of html otherwise automatically converted symbols like &.

body(style:"background: #${xml.'*'.find{ it.'@backColor' != ''}.'@backColor'}"){
	div(class:"container"){
		div(class:"span-24 last nav"){
			def list = xml.Path.PathElement
			def last = list.size()-1
			list.eachWithIndex{ pe,i ->
				if(i == 0){
					a(class:"first",href:"index.groovy?handle=${pe.text().trim()}"){
						span(){
							mkp.yieldUnescaped "&nbsp;"
						}
					}
				}else{
					a(class:"${i == last ? 'last' : ''}",href:"index.groovy?handle=${pe.text().trim()}", pe.'@displayName')
				}
			}
		}
	}
}

Being able to identify the right element is sometimes a bit tricky, especially that not all pages seem to follow the same structure rules.

div(class:"description"){
	xml.ScrollView.View.MatrixView.View?.'**'.find{it?.name()== "VBoxView" && it?.parent()?.name()== "VBoxView" && it?.parent()?.parent()?.name()== "VBoxView" && it?.parent()?.parent()?.parent()?.name()== "VBoxView"}?.'*'.findAll{ it?.name() == "TextView"}.each{ text -> p text }
	mkp.yieldUnescaped "&nbsp;"
}

Usage of the xmlwise library[15] to parse the standard plist part of the xml containing the track list .

import xmlwise.*
div(class:"span-24 last tracklist"){
	table(cellspacing:"1", class:"tablesorter"){
		thead{
			tr{
				th ""
				th "Name"
				th "Time"
				th "Artist"
				th "Release Date"
				th ""
			}
		}
		tbody{
			def plist = Plist.fromXml(getXml(xml.TrackList.plist))
			plist.items.eachWithIndex{ track, i ->
				int seconds = (track["duration"] /1000)
				int minutes = seconds % 3600
				int hours = (seconds - minutes) / 3600
				seconds = minutes % 60
				minutes = (minutes - seconds) / 60
				
				tr(class:i % 2 == 0 ? "even" : "odd"){
					td track["rank"]
					td track["songName"]
					td ((hours > 0 ? String.format("%02d",hours)  + ":" : "")+ String.format("%02d",minutes) + ":" + String.format("%02d",seconds))
					td track["artistName"]
					td track["releaseDate"][0..9]
					td{
						a(href:track["previewURL"],rel:"lightbox[set 480 380]",title:track["songName"], "view")
						mkp.yieldUnescaped("&nbsp;")
						a(href:track["previewURL"],title:track["songName"], "download")
					}
				}
			}
		}
	}
}

Resources used for html design

http://www.blueprintcss.org/
http://jquery.com/ (thankfully, compatible with mootools)
http://www.digitalia.be/software/reflectionjs-for-jquery
http://mootools.net/ (needed for mediabox advanced)
http://iaian7.com/webcode/mediaboxAdvanced
http://tablesorter.com/docs/
http://comparenetworks.com/developers/jqueryplugins/jbreadcrumb.html (only used images)

Browsing iTunesU without intalling iTunes Read More »

Bonita Studio tutoriel vidéo

Ressources supplémentaires