Considering other backends for IMSLP

Posted: Sun Aug 16, 2009 12:15 am
by Mazin
IMSLP has been using MediaWiki to serve its 30,000+ scores, and it works. Feldmahler has spent a huge amount of time getting it that way, and thanks to him, we have a website to browse and download sheet music. However, it's apparent that we're pushing the bounds of what MediaWiki was designed for. MediaWiki is great for collaborative editing of (text-based) documents, but using it as a digital archive/library is a stretch, as is evident from our custom-designed upload forms and the other odd extensions that we've hacked onto MediaWiki. Not everything is necessarily semantically complete, and that's inhibiting things like the search functionality and offering an API.

So, of course, we should take a look at some of the other systems that libraries and archives are currently using. We don't have to adopt any of them, but we should look at 1) how they work and are organized, 2) what they offer that we don't, 3) what we have that they don't, and 4) the big question of "should we use it?". This will give us ideas for expanding or fixing our existing system, and ideas for what we want the technological platform to look like in the future.

So, my two suggestions to look at are:

Fedora
Fedora is a general-purpose, open-source digital object repository system.
Features
  • Store all types of content and its metadata
  • Scale to millions of objects
  • Access data via Web APIs (REST/SOAP)
  • Provide RDF search (SPARQL)
  • Rebuilder Utility (for disaster recovery and data migration)
An example of Fedora used specifically for sheet music is IN Harmony, a database of mostly-PD music available to the public, created as a partnership between four Indiana institutions (one of which is my school, Indiana University Bloomington).

Variations
Variations is/was a project to create a digital music library by Indiana University, and their Cook Music Library has been using it since 1996. As a research project, it's gone through a couple stages, and at least some software is available open-source (see http://variations.sourceforge.net/). Unfortunately, the main repository itself is closed off to people outside IU (I'll see what I can raid from it someday), so it's kind of hard to demo it. See the following links:

Re: Considering other backends for IMSLP

Posted: Sun Aug 16, 2009 1:06 am
by KGill
The problem with these is that they're not wikis. I fully concede (how could I not?) that there are many people here who have been here waaay longer than me and know the ins and outs much more intimately, but as far as I can see, the IMSLP is fully, wholly, and utterly collaborative, and that means...a wiki. It is inherently so- no other system could possibly work anywhere close to the way it has. Now, if we tried to create our own version of MediaWiki itself...

And the larger problem is that it's a huge deal just to make upgrades from one version of MW to the next- hard enough when it's already set up for MW. Imagine the truly herculean task of converting all of our (currently) 17351 work pages into an alien system! Not to mention all the other pages! First off, there's the issue that MW uses its own markup language. That could conceivably be gotten around with some sort of bot, but I wouldn't bet on perfection. And...so on and so forth, you know what all the issues would be.
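To give a rough idea of what such a conversion bot would have to do, here is a minimal sketch in Python. The template name `ExampleWorkInfo` and its fields are invented for illustration and are not the actual IMSLP templates; the parser assumes one `|key = value` field per line and ignores nested templates, which is exactly the kind of case where perfection can't be guaranteed.

```python
import re

# Matches one "|Key = value" line of a template call (one field per line assumed).
FIELD_RE = re.compile(r"^\|\s*([^=|]+?)\s*=\s*(.*)$")


def parse_template_fields(wikitext):
    """Collect the |key = value fields of a simple template into a dict.

    Sketch only: multi-line values and nested templates are not handled.
    """
    fields = {}
    for line in wikitext.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


# Hypothetical page content, invented for this example.
sample = """{{ExampleWorkInfo
|Composer = Beethoven
|Publisher = Peters
|Notes = free text with [[a link]]
}}"""

if __name__ == "__main__":
    print(parse_template_fields(sample))
```

Anything that lives outside such fields (free-form notes, discussion, page history) would need separate handling.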
One of the other things I think sets IMSLP apart from other online music libraries is that all the typos, inaccuracies, etc. are cleaned up- everything is standardized and easy to read, truly well-organized. I have never seen another music site anywhere close to being as easy to use, and I think that if we changed the system to pretty much anything else, we would lose that (to an extent) and degenerate kind of quickly. I looked at your examples, and have been looking at others such as Sibley, and none of them were nearly as elegant as what we have now. Especially Variations- who wants to have to DL extra software just to access a certain website?
I understand the problems that have always been associated with MW, but is it really all that drastic? If a search function were written from scratch, wouldn't that stave off a lot of objections? We've accomplished so much by now with what we have (specifically referring to the organization of the site), it would need to be redone, and that's...a lot of work.
In short, would any change be worth it in the slightest?

Re: Considering other backends for IMSLP

Posted: Sun Aug 16, 2009 9:09 am
by Yagan Kiely
The content is collaborative, right? The content of IMSLP is the scores, and the scores have nothing to do with a wiki on their own. Yes, there is other content, but if a way can be found to allow collaborative additions of it, then I see no need for a wiki. A wiki isn't the definition of collaboration; there can be other ways to collaborate, and if those are found, what is the worry?
KGill wrote:In short, would any change be worth it in the slightest?
It would probably depend on what IMSLP's future holds. What of a (recorded) music repository as well as 100,000+ scores in Petrucci? If these pose challenges to the wiki as a wiki, it may well be worth the effort.

Re: Considering other backends for IMSLP

Posted: Sun Aug 16, 2009 9:31 am
by Leonard Vertighel
Wikimedia Commons currently hosts nearly 5 million media files, so in terms of scalability I don't think we are going to hit the limitations of MediaWiki anytime soon.

The search functionality will certainly have to be improved. This needs to be linked to an improvement of the categorization system. The way I see it, we should look specifically at the cataloging systems and search functionalities of other sites, figure out what is useful and why, and use this info to devise a system of our own. There is some partial discussion going on about this, but it looks like it's going to take a while before a complete proposal for a cataloging system will be developed.

Re: Considering other backends for IMSLP

Posted: Sun Aug 16, 2009 10:18 am
by steltz
I don't know the computing end of these things, but in terms of searchability, it seems to me that any system will be dependent on what users insert into the categories.

An example: my current bugbear, that I need to ask about in the correct forum (not this one), is that songs for 1 singer and piano are currently being uploaded, in a fair quantity, as "duets". While part of me sees that this is a duet, I also know that as a searchable category, it is better to keep duets as instrumental, and voice + piano as lieder, chanson, art song, whatever. The reason is that singers won't want to search through gallons of violin duets to find songs, and violinists won't want to wade through songs to get to duets that include them.

I don't see that changing from a wiki to something else will change this, as it is dependent on the uploader.

There may well be other reasons to go the non-wiki route, but my gut feeling is that the searchability function has to be solved another way.

Re: Considering other backends for IMSLP

Posted: Sun Aug 16, 2009 2:43 pm
by imslp
My response to this is pretty much what I was saying before. The current main obstacle for the IMSLP is the lack of a decent categorization system. This has slowly become more and more urgent over the past few months, and I do wish there were an admin willing to take up the task of organizing a team to work seriously on this. Like I said before, the main problem is not really the programming side, which is rather flexible, but actually getting a categorization system that is useful in the first place.

If we actually get a decent categorization system, I don't think the current IMSLP structure is in any way inferior in use to a database-based library. On the other hand, even a database-based library without a decent categorization system won't be much of an improvement over IMSLP.

I thank Mazin for bringing this up. However, I also contend that:

1) I have never regretted choosing MediaWiki, even though I knew from the start that it is not completely suitable for a library setting. The reason is collaboration: IMSLP is not only collaborative for score submissions, but also for translations, site design, etc etc. Essentially, almost every aspect of IMSLP (except for the underlying software, which is not collaborative /yet/) is fully collaborative. This, I believe, gives IMSLP the edge it needs over other more "dictative" solutions such as a full database-based library. The cost of accessibility in a database-based library is its growth potential, because the bottleneck becomes both the programmer and the programmer-user communication channel. In MW, if you have an idea, you do it yourself.

2) It is not too hard to transfer the information on IMSLP to a database-based library, but much non-submission information (piece history, etc) may be lost. If I wanted, I could transfer the entire IMSLP to a database-based solution in a few days (I already have the programming to do this, as a result of something else). Therefore I don't see a need to hurry and transfer IMSLP to anywhere else.

3) The extensions are actually surprisingly well suited to MediaWiki. This may be in part because of the very nice coding standards of MW (unmatched by any other PHP project I know), but in any case, the IMSLP extension is not really a "hack". If you have seen the IMSLP code, you will agree with me. It may also be due to my own programming style of using as much existing code as possible.

To answer KGill's suggestion about IMSLP's version of MediaWiki: IMSLP already has its own version of MW :-) Even besides the large chunk of code that is the IMSLP MediaWiki Extensions (IMSLP MWE), IMSLP does *not* run on vanilla MW; there are many patches to the code, a few for feature extensions (category intersections and file caching), but mostly for obscure bugs that the MW team probably won't feel the urge to fix until 2020. IMSLP's configuration of MW is highly customized, which is why many bugs surface in MW.
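For anyone wondering what a category intersection amounts to, here is a minimal sketch in Python. It is not the IMSLP patch itself; the index layout, the category names and the page titles are all made up for illustration.

```python
def intersect_categories(index, *categories):
    """Return the sorted page titles that appear in every named category.

    `index` maps a category name to the page titles filed under it
    (a hypothetical in-memory structure, not how MW stores categories).
    """
    if not categories:
        return []
    page_sets = [set(index.get(name, ())) for name in categories]
    return sorted(set.intersection(*page_sets))


# Invented sample data.
index = {
    "Beethoven": ["Piano Sonata X", "String Quartet Y"],
    "For piano": ["Piano Sonata X", "Nocturne Z"],
}

if __name__ == "__main__":
    print(intersect_categories(index, "Beethoven", "For piano"))
```

The result is only as good as the categories that pages are filed under, which is why the categorization system matters more than the code.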

My 3 cents...

Re: Considering other backends for IMSLP

Posted: Wed Aug 19, 2009 1:40 am
by Mazin
KGill wrote:The problem with these is that they're not wikis. I fully concede (how could I not?) that there are many people here who have been here waaay longer than me and know the ins and outs much more intimately, but as far as I can see, the IMSLP is fully, wholly, and utterly collaborative, and that means...a wiki. It is inherently so- no other system could possibly work anywhere close to the way it has. Now, if we tried to create our own version of MediaWiki itself...
And what about it requires a wiki specifically? We have a smattering of guidelines and such that make sense as a wiki, but really, nothing else does. It's not as if we're collaboratively editing scores themselves (some kind of Lilypond-based wiki would be quite odd), so it's not too far from a straight database with liberal permissions.
KGill wrote:Imagine the truly herculean task of converting all of our (currently) 17351 work pages into an alien system! Not to mention all the other pages! First off, there's the issue that MW uses its own markup language. That could conceivably be gotten around with some sort of bot, but I wouldn't bet on perfection. And...so on and so forth, you know what all the issues would be.
I was worried that the way our data is being stored isn't flexible enough. MW markup was always meant for human consumption, while a database or other "well-defined" data system would have its own pros and cons. For example, how do I do queries like:

find scores where composer="Beethoven" and publisher="Peters" and rating>7.0
find scores where scannedby="Sibley Music Library" and instrumentation="Piano"

... you get my idea.
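To make that concrete, here is a rough sketch of how such queries could look if the metadata sat in a relational table. It uses Python's built-in sqlite3 module; the table layout, column names and sample rows are all assumptions invented for the example, not anything IMSLP currently has.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with invented sample rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE scores (
               title TEXT, composer TEXT, publisher TEXT,
               scanned_by TEXT, instrumentation TEXT, rating REAL)"""
    )
    rows = [
        ("Sonata A", "Beethoven", "Peters", "Sibley Music Library", "Piano", 8.5),
        ("Sonata B", "Beethoven", "Peters", "Other Scanner", "Piano", 6.0),
        ("Trio C", "Brahms", "Simrock", "Sibley Music Library", "Piano Trio", 9.0),
    ]
    conn.executemany("INSERT INTO scores VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def find_scores(conn, min_rating=None, **equals):
    """Return titles matching every field=value pair and, optionally, rating > min_rating."""
    allowed = {"composer", "publisher", "scanned_by", "instrumentation"}
    clauses, params = [], []
    for field, value in equals.items():
        if field not in allowed:  # never interpolate unknown column names into SQL
            raise ValueError(f"unknown field: {field}")
        clauses.append(f"{field} = ?")
        params.append(value)
    if min_rating is not None:
        clauses.append("rating > ?")
        params.append(min_rating)
    where = " AND ".join(clauses) if clauses else "1"
    sql = f"SELECT title FROM scores WHERE {where} ORDER BY title"
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = build_demo_db()
    print(find_scores(conn, composer="Beethoven", publisher="Peters", min_rating=7.0))
    print(find_scores(conn, scanned_by="Sibley Music Library", instrumentation="Piano"))
```

The point is only that each field is a separate, typed column, so combining conditions is trivial; getting the same answer out of free-form wiki markup is much harder.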
imslp wrote:However, I also contend that:
1) Thank you,
2) Thank you, and
3) Thank you, I apologize, since it is a fact that I haven't seen the code itself. :)

The search system is lacking in a few ways, some trivial, some probably unresolvable.

Re: Considering other backends for IMSLP

Posted: Wed Aug 19, 2009 2:36 am
by KGill
Mazin wrote:And what about it requires a wiki specifically? We have a smattering of guidelines and such that make sense as a wiki, but really, nothing else does. It's not as if we're collaboratively editing scores themselves (some kind of Lilypond-based wiki would be quite odd), so it's not too far from a straight database with liberal permissions.
Feldmahler wrote:IMSLP is not only collaborative for score submissions, but also for translations, site design, etc etc. Essentially, almost every aspect of IMSLP (except for the underlying software, which is not collaborative /yet/) is fully collaborative. This, I believe, gives IMSLP the edge it needs over other more "dictative" solutions such as a full database-based library. The cost of accessibility in a database-based library is its growth potential, because the bottleneck becomes both the programmer and the programmer-user communication channel. In MW, if you have an idea, you do it yourself.
While it's not as if MW is the only possible program that could ever do this, it's the big one that's already there, and designed for this purpose. It's the wiki principle that sets us apart from other libraries, online and offline alike. Collaboration of this nature produces much more accuracy, detail, and quantity, as we can see demonstrated by the fact that WP has something like three times the number of articles as Britannica (and a lot of them are probably longer in WP than their counterparts, as well as being editable by anyone who finds some sort of new insight, or an error; exactly the same as here, in the description and classification of and extra information about works).
And after all, if we're to continue having 'liberal permissions' even if we did switch systems, wouldn't that be the same as if we continued with MW (already designed for that sort of thing) with a database-like extension?

Re: Considering other backends for IMSLP

Posted: Thu Aug 20, 2009 12:20 am
by Mazin
We aren't Wikipedia. I think such comparisons are frivolous. In fact, here's another collaborative database that is not a wiki: MusicBrainz, which I've probably had more experience with than IMSLP, actually.

Again, it might be easy for us to add data, but is it easy for us to retrieve?

Re: Considering other backends for IMSLP

Posted: Thu Aug 20, 2009 5:08 pm
by KGill
I don't see how such a comparison is necessarily frivolous; my point is not that our non-PDF submissions are the main focus, my point is that we expand at an enormous rate, which means not just more works but more detail in each work page. So yes, in that regard, we are very similar to Wikipedia (except much smaller).
As for retrieving data, there is a large discussion going on at the moment about how to improve our categorization system.