surviving wikipedia blackout

January 18th, 2012 § 3 comments § permalink

Wikipedia blackout - January 18, 2012

Image by captsolo via Flickr

a couple of weeks ago i downloaded a large part of wikipedia into my chrome. it felt like a bit of an excessive experiment at that point, but today, as we are experiencing the wikipedia blackout, it actually came in extremely handy. feels weird to live in a world where having a local copy of a website is the only way to do business. feels like the nineties all over again. fuck you SOPA.


the world in xhtml/css

June 8th, 2011 § Comments Off on the world in xhtml/css § permalink

World map showing countries which have adopted...

Image via Wikipedia

Every now and then I have to code something to see if i still remember how. also, lately i’ve become very passionate about infographics and tools that make it easy to visualize large datasets. Gapminder’s bubbles are great, google maps can be great, but i feel they are also too heavy for some use cases.

incidentally I was browsing through an old encyclopedia and saw a great simplified map of the world assembled from squares and rectangles in place of countries. so i thought i’d try coding a world map in pure xhtml/css, using colored div elements. it’s pretty easy actually. check out the lightweight map of the world.

 

i took data about the central points, width and length of all countries from the CIA world factbook, calculated screen pixels from the coordinates, and applied a length correction towards the poles. the output is not w3c valid yet, but it’s not far from it 🙂 if anyone is interested in the script, leave a comment or ping me…
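a rough sketch of the pixel calculation described above, in python rather than the original script. everything here is an assumption made for illustration: the equirectangular projection, the 720x360 canvas, the function names, and reading the "length correction" as widening rectangles by 1/cos(latitude), because a degree of longitude covers fewer kilometres near the poles.

```python
import math

MAP_WIDTH_PX = 720   # assumed canvas: 2 px per degree of longitude
MAP_HEIGHT_PX = 360  # assumed canvas: 2 px per degree of latitude
KM_PER_DEG_LAT = 111.32  # roughly constant everywhere on the globe


def country_box(lat, lon, width_km, height_km):
    """Return (left, top, width, height) in pixels for one country rectangle.

    lat/lon is the country's central point in degrees; width_km/height_km
    are its east-west and north-south extents.
    """
    # A degree of longitude shrinks with cos(latitude), so the same width in
    # km spans more degrees (and more pixels) closer to the poles.
    km_per_deg_lon = KM_PER_DEG_LAT * max(math.cos(math.radians(lat)), 0.01)
    px_per_deg_x = MAP_WIDTH_PX / 360.0
    px_per_deg_y = MAP_HEIGHT_PX / 180.0
    w = width_km / km_per_deg_lon * px_per_deg_x
    h = height_km / KM_PER_DEG_LAT * px_per_deg_y
    cx = (lon + 180.0) * px_per_deg_x   # x grows eastwards
    cy = (90.0 - lat) * px_per_deg_y    # y grows southwards
    return (round(cx - w / 2), round(cy - h / 2), max(1, round(w)), max(1, round(h)))


def country_div(name, box, color="#69c"):
    """Render one absolutely positioned, colored div for a country."""
    left, top, w, h = box
    return (f'<div title="{name}" style="position:absolute;left:{left}px;'
            f'top:{top}px;width:{w}px;height:{h}px;background:{color}"></div>')


# Made-up sample values, not factbook data.
print(country_div("Sampleland", country_box(46.0, 15.0, 250.0, 160.0)))
```

the whole map is then just one such div per country inside a relatively positioned container.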

i was pretty disappointed by wikipedia and freebase; they are both non-comprehensive, which is a serious problem for visualizations.


2 simple tweaks that would make email useful again

August 4th, 2010 § 3 comments § permalink

Major telegraph lines in 1891
Image via Wikipedia

It’s been ages since I last blogged here, mainly because I write too many emails these days. I actually wrote an IMAP client to analyse this phenomenon, and it turned out I process something like 1000 emails per week. By process I mean reading incoming and writing outgoing messages. I’m sure most people doing business online these days are even worse off.

So being a startup product guy, I was thinking what’s wrong with it and how to make it useful again.

I realized that there is only one thing missing in the whole email protocol. One simple concept that has been around since human beings started communicating – the ability to flag messages with a level of importance. Sure, the ‘urgent’ flag exists, but I believe it is not used because of a design mistake.

Think old-fashioned mail: there we have a three-part structure of the service:

  1. Sender can choose to send the package as ‘normal’ or as ‘priority’, depending on how important it is for him that it is delivered timely. Yes this was due to logistical limitations of the medium, but it also protected receivers from getting overwhelmed.
  2. At the same time, the receiver doesn’t have to pick the mail up immediately.
  3. And for truly urgent things we have telegrams

These ensure that all possible situations are covered – ability to send, control of own time, emergency situations.

Now, in the digital world we can flag messages as ‘urgent’, and the only people really using this are PR spammers. So what went wrong? At the same time we completely dropped (2), with the explanation that the receiver can choose to read at will. I believe this should be handled appropriately based on social relationships.

So here are the two things the email protocol lacks:

  • Social flagging: If my wife sends me email, I want to know about it immediately, and I’m ok if I read everything else only every hour or so. Right now, because of the way email works, I can only choose to see everything all the time. It’s great that the web can be real-time, but if it is all-time it starts to cause serious productivity problems.
  • Different urgent: Sometimes my wife will send me a message she knows is not urgent and she wouldn’t want to bother me with it while I’m at work. Because of the way email works she can only choose to send it now or wait and remember to send it later. The founding fathers of e-mail screwed up severely – in a real-time medium, which the internet is today, we need a NOT-urgent flag. Most email programs don’t make it easy to do delayed sending, and it wouldn’t really solve this issue anyway, because I might be interested in reading non-urgent email during my lunch break.

We need an email client that tells standard emails, which it ships once per hour, apart from priority ones, which it delivers immediately. We need it on both ends – the sender’s and the reader’s. And I should be able to state whose mail I want to see as soon as it arrives, and whose should be delivered every hour.
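To make the idea a bit more concrete, here is a minimal sketch of the reader-side rule in Python. Nothing in it is part of any existing email protocol or client: the `Message` class, the `low_priority` field (standing in for the missing NOT-urgent flag) and the `immediate_senders` set are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Message:
    sender: str
    subject: str
    low_priority: bool = False  # hypothetical sender-side NOT-urgent flag


def delivery_time(msg, received_at, immediate_senders):
    """Decide when the reader's client should surface a message.

    Mail from people on the reader's immediate list is shown right away,
    unless the sender marked it NOT-urgent. Everything else waits for the
    next full hour.
    """
    if msg.sender in immediate_senders and not msg.low_priority:
        return received_at
    top_of_hour = received_at.replace(minute=0, second=0, microsecond=0)
    return top_of_hour + timedelta(hours=1)


# Usage with made-up addresses:
now = datetime(2010, 8, 4, 10, 20)
vip = {"wife@example.com"}
print(delivery_time(Message("wife@example.com", "call me"), now, vip))
print(delivery_time(Message("wife@example.com", "groceries", low_priority=True), now, vip))
print(delivery_time(Message("pr@example.com", "URGENT press release"), now, vip))
```

Note that the decision depends on both ends: the sender sets the flag, the reader owns the list. A lunch-break mode would just be a second, manually triggered batch.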


my networks…

February 23rd, 2010 § Comments Off on my networks… § permalink

Dvorak vs. Qwerty performance test

January 23rd, 2010 § 5 comments § permalink

Yesterday, @tadej sent me an article that called the Dvorak keyboard layout a myth, an urban legend, a lie made up to retain funding for Dvorak’s research.

I have been typing dvorak for two years now, and know only one other person crazy enough to do it (he even has both layouts printed on the keys). I’ve never really been a vocal proponent of the layout – it took me roughly two months to learn, it doesn’t seem faster, I’ve developed some new typical typing mistakes. It does however feel a bit more ergonomic. Definitely not enough to bother and I’ve been actively discouraging people from switching, but since I already know how to use it I wouldn’t go back to Qwerty.

So this article was a very interesting read. I can buy the theory that the whole story is a scientific fabrication, but the feeling of comfort I have is also a fact. I decided to test it statistically – design a simple model of typing, count the finger movement overhead for both layouts and let the numbers speak for themselves.

It took me a couple of hours, so it wasn’t really hard. It also isn’t very detailed – I tried to capture the main points that are used whenever keyboard layout efficiency is being discussed, and I’m totally open for corrections / suggestions / …

The model consists of the following concepts:

  • the key: arranged in 4 rows and 12 columns, just like you’d find them on any PC keyboard
  • three hands: one with 4 fingers for each half of the keyboard and the thumbs as separate ‘hand’ for pressing space
  • the finger: each finger is assigned 3-6 keys it can press at any point
  • simplified text: consisting of words and spaces only, stripped of non-alphabet characters. Also no caps.

The rules for counting overhead are:

  • always look at pairs of: previous key – current key
  • if we are at the beginning of the word, use ‘space’ as previous key
  • if we switched hands, movement overhead is 1
  • if we switched finger on same hand, movement overhead is 1.5
  • if we pressed the key with same finger as previous one, the overhead is the vector-distance between the keys

So, for example, if I type ‘oh’ on dvorak layout, I hit ‘o’ with my left ring-finger, then hit ‘h’ with my right index-finger, making it a very simple word that ‘costs’ 1 + 1 = 2 moves (two hand switches, counting the leading space).

If I write ‘oh’ on qwerty, I’d move my right ring-finger up, and then my right index-finger to the left, accounting for 1 + 1.5 = 2.5 moves.
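Here is how the model and the rules above could look in code. This is a Python sketch written for illustration, not the original script, and it simplifies a few things on purpose: it only knows the three letter rows (10 columns) instead of the full 4 x 12 grid, the column-to-finger mapping is the usual touch-typing one and is my assumption, and it ignores the row stagger of a real keyboard.

```python
import math

# Three letter rows per layout, left to right (simplified: no number row).
QWERTY = ["qwertyuiop", "asdfghjkl;", "zxcvbnm,./"]
DVORAK = ["',.pyfgcrl", "aoeuidhtns", ";qjkxbmwvz"]

# Column index -> (hand, finger), assuming standard touch typing.
FINGERS = [("L", "pinky"), ("L", "ring"), ("L", "middle"), ("L", "index"), ("L", "index"),
           ("R", "index"), ("R", "index"), ("R", "middle"), ("R", "ring"), ("R", "pinky")]


def build(layout):
    """Map each character to ((hand, finger), (row, column))."""
    # Space belongs to the separate 'thumb hand'; its position is never used
    # for distances because every letter sits on a different hand.
    keys = {" ": (("T", "thumb"), (3, 4.5))}
    for r, row in enumerate(layout):
        for c, ch in enumerate(row):
            keys[ch] = (FINGERS[c], (r, c))
    return keys


def word_cost(word, keys):
    """Movement overhead of one lowercase word, starting from the space key."""
    cost = 0.0
    prev = " "
    for ch in word:
        (hand, finger), (r, c) = keys[ch]
        (phand, pfinger), (pr, pc) = keys[prev]
        if hand != phand:
            cost += 1            # switched hands
        elif finger != pfinger:
            cost += 1.5          # same hand, different finger
        else:
            cost += math.hypot(r - pr, c - pc)  # same finger: distance travelled
        prev = ch
    return cost


print(word_cost("oh", build(DVORAK)))  # 2.0
print(word_cost("oh", build(QWERTY)))  # 2.5
```

Summing `word_cost` over every word of a text gives the per-layout totals; the exact numbers will of course depend on the grid and finger mapping chosen.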

Yes, these rules are somewhat arbitrary, but the idea is to follow the assumptions:

  • it’s easy to hit the first key
  • it’s a bit harder to hit the second one with same hand while retracting the first one
  • it’s hardest to hit the second key with same finger, increasing with the distance the finger has to travel.

For instance, typing ‘ny’ in qwerty is really hard, because the right index-finger has to do some funky acrobatics. Try hitting “qz” on qwerty now… 😉

Now, I’m sure you’re all curious about the results already.

I tested the layouts on that very article from the beginning. The article had 35304 characters (36125 with commas and dots included) in 5865 words, 1610 of them distinct. Here’s a bird’s eye view of the performance of the two layouts:

|                                  | dvorak | qwerty |
|----------------------------------|--------|--------|
| strokes needed (alphabet only)   | 32927  | 33560  |
| strokes needed (commas and dots) | 33615  | 34340  |
| better at words                  | 2375   | 1301   |
| better at distinct words         | 654    | 529    |

If each stroke was worth 10ms, the dvorak layout would win by 1 minute in a 1-hour typing match. Dismissible?
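The one-minute figure can be checked with a quick back-of-the-envelope calculation. It assumes typing time scales linearly with the movement overhead, which is exactly the simplification the model makes; the variable names are mine.

```python
# Alphabet-only totals from the table above.
dvorak_overhead = 32927
qwerty_overhead = 33560

# Share of movement dvorak saves relative to qwerty.
relative_saving = (qwerty_overhead - dvorak_overhead) / qwerty_overhead

# If an hour of qwerty typing shrinks by the same share:
minutes_saved_per_hour = relative_saving * 60

print(f"{relative_saving:.1%} less movement, "
      f"about {minutes_saved_per_hour:.1f} min saved per hour")
```

The per-stroke duration cancels out of the ratio, so the result is the same whether a stroke is worth 10ms or 200ms.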

So much for the most important metric – the two layouts seem to be of roughly the same efficiency. It also seems they are similarly efficient across the distinct words. However, since the total number of words each layout excels at differs noticeably, we can hypothesize that dvorak is more efficient at more frequent words.

I’ve calculated the difference between layouts’ performances for each distinct word in the article, and the number of times each word repeated. The product of these two indicators is an interesting ‘score’, indicating the impact the winning layout had on that particular word. Here are top-30 lists:

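| dvorak: word | length | gain | frequency | score | qwerty: word | length | gain | frequency | score |
|--------------|--------|------|-----------|-------|--------------|--------|------|-----------|-------|
| of           | 2      | 0.5  | 222       | 111   | and          | 3      | 0.5  | 91        | 45.5  |
| to           | 2      | 0.5  | 167       | 83.5  | it           | 2      | 1    | 40        | 40    |
| for          | 3      | 1    | 71        | 71    | typists      | 7      | 1    | 37        | 37    |
| that         | 4      | 0.5  | 128       | 64    | typing       | 6      | 1    | 28        | 28    |
| in           | 2      | 0.5  | 121       | 60.5  | evidence     | 8      | 2    | 13        | 26    |
| keyboard     | 8      | 1    | 54        | 54    | study        | 5      | 1    | 25        | 25    |
| but          | 3      | 2.5  | 19        | 47.5  | which        | 5      | 2    | 12        | 24    |
| not          | 3      | 1    | 41        | 41    | since        | 5      | 2    | 12        | 24    |
| example      | 7      | 2    | 20        | 40    | dependence   | 10     | 2    | 9         | 18    |
| was          | 3      | 1    | 34        | 34    | can          | 3      | 1    | 17        | 17    |
| qwerty       | 6      | 0.5  | 57        | 28.5  | with         | 4      | 0.5  | 34        | 17    |
| dvorak       | 6      | 0.5  | 52        | 26    | cincinnati   | 10     | 3    | 5         | 15    |
| by           | 2      | 1.5  | 17        | 25.5  | academic     | 8      | 3    | 5         | 15    |
| these        | 5      | 1    | 25        | 25    | machine      | 7      | 2    | 7         | 14    |
| only         | 4      | 1.5  | 16        | 24    | choice       | 6      | 2    | 7         | 14    |
| would        | 5      | 1    | 22        | 22    | article      | 7      | 1.5  | 9         | 13.5  |
| on           | 2      | 0.5  | 44        | 22    | luck         | 4      | 1.5  | 9         | 13.5  |
| more         | 4      | 1    | 21        | 21    | scientific   | 10     | 3.5  | 3         | 10.5  |
| published    | 9      | 4    | 5         | 20    | such         | 4      | 1.5  | 7         | 10.5  |
| it           | 2      | 0.5  | 40        | 20    | mcgurrin     | 8      | 1.5  | 7         | 10.5  |
| minute       | 6      | 1.5  | 13        | 19.5  | success      | 7      | 1.5  | 7         | 10.5  |
| were         | 4      | 0.5  | 37        | 18.5  | conducted    | 9      | 2.5  | 4         | 10    |
| we           | 2      | 0.5  | 36        | 18    | switch       | 6      | 2    | 5         | 10    |
| as           | 2      | 0.5  | 34        | 17    | standard     | 8      | 1    | 10        | 10    |
| keyboards    | 9      | 1.5  | 11        | 16.5  | studies      | 7      | 1    | 10        | 10    |
| results      | 7      | 1.5  | 11        | 16.5  | chance       | 6      | 3    | 3         | 9     |
| found        | 5      | 2    | 8         | 16    | lockin       | 6      | 1.5  | 6         | 9     |
| story        | 5      | 1    | 16        | 16    | just         | 4      | 1    | 9         | 9     |
| although     | 8      | 2.5  | 6         | 15    | so           | 2      | 0.5  | 18        | 9     |
| group        | 5      | 1.5  | 9         | 13.5  | speed        | 5      | 0.5  | 17        | 8.5   |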
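For reference, the score in the lists above is simply gain times frequency. A small Python sketch of that ranking follows; the function name and the toy cost tables are made up for illustration, and any per-word cost function can be plugged in.

```python
from collections import Counter


def scores(words, cost_a, cost_b):
    """Rank the distinct words where layout A beats layout B.

    The score of a word is its gain (how much cheaper it is on A than on B)
    multiplied by how often it occurs in the text.
    """
    result = {}
    for word, frequency in Counter(words).items():
        gain = cost_b(word) - cost_a(word)
        if gain > 0:
            result[word] = gain * frequency
    return sorted(result.items(), key=lambda item: item[1], reverse=True)


# Toy example with made-up per-word costs for two layouts.
text = ["of"] * 4 + ["and"] * 2 + ["for"] * 3
layout_a = {"of": 2.0, "and": 4.0, "for": 3.0}
layout_b = {"of": 2.5, "and": 3.5, "for": 4.0}

print(scores(text, layout_a.get, layout_b.get))  # [('for', 3.0), ('of', 2.0)]
```

Swapping the two cost arguments gives the list for the other layout.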

This table gives us clear insight that dvorak layout performed better at often-used shorter words. Let’s compare graphs of frequency X gain for both of them:

Dvorak:

Dvorak frequency

Qwerty:

Qwerty frequencies

We can see what is going on – while the majority of words behave roughly the same, dvorak wins over most of the frequent ones. Overall averages were:

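|                                | dvorak | qwerty | document |
|--------------------------------|--------|--------|----------|
| avg repeats of a word          | 3.63   | 2.46   | 3.64     |
| avg length of a word           | 7.03   | 7.47   | 6.85     |
| avg gain over the other layout | 1      | 1.14   |          |
| avg score                      | 6.52   | 5.41   |          |
| avg score with dots and commas | 10.78  | 7.21   |          |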

The average score was calculated as an average of typing improvements for all words where the layout was superior. It is very interesting however that in the end, both layouts level out. Interesting enough to try it with another text, this time shorter and more mundane – an email to a friend. Here are the results:

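|                                      | dvorak | qwerty | document |
|--------------------------------------|--------|--------|----------|
| characters / strokes (alphabet only) | 3162   | 3278   | 3518     |
| strokes (with commas and dots)       | 3335   | 3403   | 3658     |
| better at words                      | 260    | 116    | 632      |
| better at distinct words             | 136    | 80     | 308      |
| avg repeats of a word                | 1.91   | 1.45   | 2.05     |
| avg length of a word                 | 5.93   | 6.35   | 5.72     |
| avg gain                             | 0.96   | 1.05   |          |
| avg score                            | 2.74   | 1.11   |          |
| avg score with dots and commas       | 4.07   | 1.87   |          |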

This e-mail would take 5min to write and dvorak would save me 7sec had I been using it back then. Dvorak’s gain per word is even lower here, but again it wins on more of the words that count. And way more if I count the dots and commas.

Now, this approach is not language-specific, so it made sense to test the final dvorak myth – it’s supposed to be designed for the English language. Here is the table for a journalistic-type text in Slovene:

|                                      | dvorak | qwerty | document |
|--------------------------------------|--------|--------|----------|
| characters / strokes (alphabet only) | 20709  | 20783  | 21168    |
| strokes (with commas and dots)       | 21203  | 21339  | 22699    |
| better at words                      | 1266   | 842    | 3210     |
| better at distinct words             | 705    | 437    | 1476     |
| avg repeats of a word                | 1.76   | 1.93   | 2.18     |
| avg length of a word                 | 7.7    | 7.82   | 7.38     |
| avg gain                             | 1.11   | 1.38   |          |
| avg score                            | 14.77  | 8.82   |          |
| avg score with dots and commas       | 18.05  | 11.42  |          |

This document would take 36min to write and dvorak would save almost no time. Slovene inflects all word types, so the number of repeated words is lower, yet the ratio of success in distinct words is the same as for the English documents.

The source code (keyboardlayouttest.pl) is available, feel free to abuse it. It would be very interesting to create a more generic word-count tool, that would calculate the time wasted for not using dvorak. 😛


making sense out of life, one by one

October 20th, 2009 § Comments Off on making sense out of life, one by one § permalink

Spider web early in the morning
Image via Wikipedia

It’s been ages since I last wrote a blog post, but the good news is, the list of ideas is constantly growing and I hope I’ll be able to focus on them eventually. It’s all a matter of organizing one’s life and being on top of the everyday chaos of details and little disturbances.

I’ve learned to love the web 2.0 (is this name starting to sound really old, or is it just me?) in this respect. In the last few months, I’ve successfully organized some of the more funky parts of my life. I’ll just list them for now, and write posts describing each of the steps specifically later.

First there was Dopplr, that finally made it easy to track _all_ my travels. So easy in fact, that I went and filled in every trip I’ve made in the last 5 years. I now know exactly where I was since high school. And what good is that you ask? Well, for one, I’ve learned that in December 2004, I was in Barcelona, at the same time as the girl who is now my wife was, even though we didn’t know that then. Interesting, and previously undiscoverable.

Then there is Ancestry.com, where I finally made sense out of all the stories my parents have to say about our family history. Now I have a chart outlining _all_ knowledge that is available to me about my ancestors. And through smart algorithms, I’m discovering links with people around the world.

But for more serious stuff, everybody has a bank account or two. Or several. Luckily, we also have Wesabe, a great online tool that enabled me to structure _all_ my personal banking history. Every transaction I have ever made is now indexed by Wesabe (with a lot of help from my coding skills, but still), which really made me feel more at ease. And enabled me to discover a credit card fraud in two days rather than next month!

Then I switched my phone again and realized that not only is my contacts list split between two phones now, there are also contacts lists in every social network I am part of, but I have no access to them where I need them – in the cellphone’s address book. After much research, I decided that the best thing to do was to use Gmail Contacts to store the combined list of everybody I know from my existing address books, Facebook friends and Linkedin contacts. Imagine the power of communication when you have all that info with you at all times!

… so much for the appetizer. more details and examples shall follow… 🙂


permanently broken software… banshee

July 1st, 2009 § 2 comments § permalink

Banshees on the Wind
Image by Dead Air via Flickr

there are applications that I really like in theory and try to start using every couple of months. I give the application an opportunity: i install it and start using it in my workflow without thinking about it. if after an hour it still works, I give it a couple of days to educate me on its specifics. if after that I feel comfortable, I start using it instead of some others…
so today I tried Banshee again. I liked it since day one, some four years ago, but it was crashing all the time back then. so i tried it some seven times in between, and guess what, it crashed after playing one song again today. I’ll try again next year.


slow editor in wordpress?

July 1st, 2009 § 1 comment § permalink

BEIJING, CHINA - JANUARY 10:  Labourers build ...
Image by Getty Images via Daylife

argh! I went to write a blog post about some broken software and was forced to use notepad, because of wordpress editor lag. there is one thing that irritates me the most in modern software – text editors that lag behind my typing. hate it. and it’s frustrating, because you can’t really do anything about it. grrr
the good news is, zemanta had nothing to do with it – after I disabled it, wordpress was just as slow as before. I don’t know what they are doing in the background, but am considering switching platforms now.


geometrics of the cloudy sky…

June 14th, 2009 § Comments Off on geometrics of the cloudy sky… § permalink

Cloud
Image via Wikipedia

“The principle definition of “partly cloudy” is opaque cloud covering 3-5 eighths (or oktas) of the sky as seen from a specific location. This can fall into either of your latter two definitions, as a snapshot in which one looks up and sees the sky 35-60 percent covered with opaque clouds at a given moment (like your number 3) or, if one totals up the amount of cloudiness over the course of a day and on average the amount of cloud cover is 3-5 oktas (like your number 2).” (from http://www.wral.com/weather/blogpost/3089051/)

… so my question is… how do I measure ‘oktas’? what is the easiest way to take the 360 degrees of the sky and divide it geometrically into eighths?

possible answer: http://www.weborix.com/8.htm

question remains, why use eighths in the first place?


Guess where I am today

April 30th, 2009 § Comments Off on Guess where I am today § permalink




Guess where I am today

Originally uploaded by igzebedze

Where Am I?

You are currently browsing the My Projects category at Rational Idealist.