Web 2.0


Very busy period on both my main work projects: the Gaia mission and the Vista public surveys.

For the Gaia project we have been experimenting a bit with HBase and Hadoop to deal with the bulk of the observations. Unfortunately we don’t really have the right hardware to carry out decent stress tests, but we should still be able to get a rough enough picture to decide what to do next. We have quite a big hardware purchase coming up and we need to get it right, because the budget is tight and the time before launch (Feb/Mar 2012) is closing in. Hopefully by the end of next week we should be able to complete a few test runs and compare the results with similar tests where all the data was handled by an Oracle 10g database.
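To give a flavour of what we are playing with: the tricky part of a wide-column store like HBase is designing the row key so that writes spread across the cluster while a single source’s observations still scan in time order. The sketch below is purely illustrative, in Python for brevity (the salt width, id packing and function names are made up, not our actual schema):

```python
# Hypothetical sketch of a salted, time-ordered row key for per-source
# observations. A short hash prefix spreads writes across regions; the
# big-endian packed timestamp keeps a source's rows sorted by time.
# All names and field widths here are illustrative assumptions.
import hashlib
import struct

def make_row_key(source_id: int, obs_time: float) -> bytes:
    """Build a row key: 2-byte salt derived from the source id, the id
    itself, then the observation time packed big-endian so that keys for
    the same source sort chronologically."""
    salt = hashlib.md5(str(source_id).encode()).digest()[:2]
    return salt + struct.pack(">Q", source_id) + struct.pack(">d", obs_time)

# Two observations of the same source share a prefix and sort by time:
k1 = make_row_key(42, 1000.0)
k2 = make_row_key(42, 2000.0)
assert k1[:10] == k2[:10] and k1 < k2
```

Whether this kind of layout actually pays off is exactly what the stress tests should tell us, once we have hardware worth testing on.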

For the Vista project I really need to finalize the relational model for the quality control database schema. The main structure is there, but unfortunately some important details of the processing software are not well defined yet, so I am struggling a little bit. The implementation uses Hibernate for the ORM and a custom-made framework to configure what information needs to be extracted from the astronomical images (FITS files). The design is quite sound, although there are a few loose ends I would like to improve, given an infinite amount of time and patience.
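For readers who have never met a FITS file: the metadata lives in fixed 80-character header “cards” of the form `KEYWORD = value / comment`, and extraction boils down to pulling out the keywords you care about. A toy illustration in Python (our actual framework is Java/Hibernate based, and this ignores string values, blank cards and the END card):

```python
# Toy illustration of FITS header parsing: each card is an 80-character
# record with the keyword in columns 1-8 and "= " in columns 9-10.
# Not our framework, just the shape of the problem.

def parse_card(card: str):
    """Split a single FITS header card into (keyword, value, comment)."""
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        # Commentary cards (COMMENT, HISTORY) carry no value.
        return keyword, None, card[8:].strip()
    value, _, comment = card[10:].partition("/")
    return keyword, value.strip(), comment.strip()

card = "EXPTIME =                 10.0 / integration time in seconds".ljust(80)
key, value, comment = parse_card(card)
assert key == "EXPTIME" and float(value) == 10.0
```

The real configuration problem is deciding, per survey and per processing stage, which of these keywords end up as columns in the quality control schema, which is why the loose ends in the processing software hurt.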

As a side project (given that I haven’t got enough on my plate) I’ve been playing with some very popular Web 2.0 apps in an unconventional way. I don’t find them particularly useful or interesting for their original purpose, but I think I can get something useful out of them. And, if you are curious, no, it has nothing to do with useless flashy crap and bodged Apple-wanna-be cover-flow heavy-weight rubbish. Oops, I’m about to get into rant mode… but it is really too late for that now. I simply get pissed off when I hear people waffle about Web 2.0 apps and then, when you really look at it, all they have done is display a bunch of images in a fancy and rather user-unfriendly way. Ok, enough ranting for today: I’ve finished my wee dram and it is about time to get some sleep!

One of the daily rituals of every astronomer is to skim through the arXiv for the latest pre-prints. Usually you pick up this habit in grad school but then, as time goes by, it gets harder and harder to find the time for it. It is not really a matter of procrastinating; it is just that you have more urgent things on your list requiring attention. Actually, the daily ritual becomes the “oh crap” cry when you open your email inbox and find out that half of your nice (and preciously rare) British sunny morning will go into fixing some boring thing that you can’t really ignore.

Spreadsheets are probably very popular in the business world, but not in science. Astronomers love to write their own little programs or scripts to perform the calculations they need, be it for a sophisticated simulation code or for daily data analysis tasks. The point is that spreadsheets are not really a convenient way of analysing data and performing calculations, except for smallish data samples and basic computations. So why am I writing about spreadsheets then? Well, because I would never have thought they could make my life so much easier…

A part of my job consists of taking care of the data reduction and quality control monitoring for the observations obtained with WFCAM at the UK InfraRed Telescope (UKIRT), Hawaii. Most of the observations are taken for the UKIRT InfraRed Deep Sky Survey (UKIDSS), which represents the state of the art for the field or, “the next generation near-infrared sky survey, the successor to 2MASS”, as it describes itself. I will probably come back to this project in other posts, so let’s leave the details for another time (just have a look at the linked web sites if you are really curious).
Usually we get the data on LTO tapes via FedEx every other week or so. The first task for me is to make sure we are getting the data for all the nights we are supposed to. The next steps consist of validating, processing and performing some quality control checks on each single night. The whole thing has been running quite smoothly for a couple of years now, and we’ve dealt with a few tens of terabytes of data.
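That first bookkeeping step is essentially a set comparison between the nights Brad says were shipped and the nights actually found on the tapes. A minimal sketch (the night labels and function name are made up for illustration):

```python
# Minimal sketch of the "did we get every night we expected?" check.
# Night labels are made-up YYYYMMDD strings, not real shipments.

def missing_nights(expected, received):
    """Return expected observing nights with no data on the tapes,
    sorted so the report is stable and easy to email back."""
    return sorted(set(expected) - set(received))

expected = ["20070501", "20070502", "20070503", "20070504"]
received = ["20070501", "20070503"]
assert missing_nights(expected, received) == ["20070502", "20070504"]
```

The real check also has to worry about nights that are present but incomplete (missing detectors), which is exactly the kind of state that is painful to track over email.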
To keep things running I need to interact quite frequently with Brad, the guy who takes care of the tapes on the other side of the world. I need to know from him which nights I should expect data for, when the tapes were shipped and which detectors they contain for each night. You can easily get fed up pretty soon managing all this via email, and that’s probably not surprising either. A few weeks ago, Brad came up with a very smart idea: he prepared a Google spreadsheet to keep track of all the relevant information and “invited” me to collaborate on that document. In this way the spreadsheet is visible from my Google Docs home, and I can see when Brad last made changes and check them out. I prepared another spreadsheet to keep track of what I am finding on the tapes and to mark nights for which I did not receive all the data and, of course, I enabled collaboration for Brad. In this way we keep each other updated on the status of the data transfers in a clean, effective and spam-free way.
Thanks Google!