Friday, November 11, 2016

Internet connecting the wrong masses

I used to think that the Internet will serve as a major facilitator for the advancement of free thought and civil liberties. I thought that being able to connect to other people and access to information from across the globe will provide an opportunity to learn about new and progressive ideas, raise awareness of different cultures, and develop an appreciation of how different yet so alike we are as human beings. How naive, how wrong I was!

In my defense, human history presents a constant progress. There are of course some discontinuities in this progress. Up until now, I never felt like my time on this planet was overlapping with one. With all that has happened lately in my home country and given the latest events in the US, I am now convinced that we are witnessing a global regression in human progress. I feel unlucky to be witnessing this. It is quite distressing, especially given how short our lifetimes really are. This is how people must have felt during other tragic times, such as the rise of fascism in Europe before WWII.

This regression is hard to accept, as I tend to think that the new is always better. However, in order for the new generations to reach their full potential, we must leave them a world better than the one we inherited. I'm saddened to see that this may not be at all possible in our case. I hope I am wrong. Perhaps you are thinking that this assessment is highly exaggerated. Yet, the things that are happening in my home country are out of the pages of George Orwell's 1984. You may wake up tomorrow and read on the news that the sky is not blue but red, and whoever claims it is blue would be jailed. Of course what's happening here cannot be used to draw such a bleak picture for the world. Nevertheless, if the US, considered to be the leader of the free world, can elect a racist con artist who has no respect for science and reason as their president, then what else?

I am starting to think that the Internet and the social media have caused all this. The average person suffers from confirmation bias. They use the social media to find more ideas like theirs and connect to people who think like themselves. This will further strengthen their often incorrect convictions. Intellectual people will of course use the Internet for broadening their perspectives. These intellectuals, such as scientists, artists, thought leaders, etc., are people who were already connected even before the Internet era. They always had communities enabling them to keep connected. The conservative masses, on the other hand, did not know about each other. With the advent of the Internet and the social media, now these masses are getting connected and becoming even more conservative. I believe that the Internet has become a bigger booster for conservatism, than for liberal thought.

If the Internet is not going to promote a culture of tolerance, free thought, and personal freedom, then what will? I believe the answer is mobility. Visiting foreign places, meeting people from other cultures, and sharing experiences with them. Obviously this is not easy. I live in a country where a large portion of the population is anti-Semitic, yet never had a single Jew friend; homophobic yet never had a gay or lesbian friend; and even xenophobic yet never even met a foreigner.




Monday, October 28, 2013

Скопје & Prizren

We decided to take some time off this year during the Feast of the Sacrifice holiday in Turkey. Republic of Macedonia does not require a visa from us, so we took a 3-day trip there, with the aim of visiting the neighboring Kosovo as well. Given the limited time, we planned to see three cities: Skopje and Tetovo in Macedonia, and Prizren in Kosovo.

Skopje, which we call Üsküp in Turkish, is the capital of the Republic of Macedonia. It is written as Скопје in Macedonian using the Cyril alphabet. The city has been settled since Neolithic times. It came under the control of the Roman Empire, later changed hands between the Byzantine Empire and the Bulgarian Empire, served as the capital of the Serbian Empire, and in 1392, it was conquered by the Ottoman Empire and served as the capital of the Kosova Province until 1912. It is noteworthy that Skopje became part of the Ottomon Empire almost 60 years before İstanbul was conquered. It was under Ottoman control for more than 500 years. It has been part of Yugoslavia until Republic of Macedonia separated from Yugoslavia in 1991.

From Turkey, you can fly to Skopje via İstanbul. The flight takes around 1 hour and 20 minutes. Skopje has an international airport called Alexandre the Great Airport. Despite the name, the airport is humble in size. The naming of the airport, and more importantly, the naming of the country itself is rather controversial. The ancient Kingdom of Macedonia was of Greek origin and centered around the current day city of Thessaloniki (Selanik in Turkish) in the Region of Macedonia within Greece. The famous leader and military commander Alexandre the Great, who lived around 350 BC, is from ancient Macedonia. The Greeks are quite opposed to the name of the current day Republic of Macedonia, as it is geographically located on the northern parts of the ancient Macedonia and ethnically current day Macedonians are a different group of people than the ancient Macedons. My understanding is that Macedonians might have historic ties to the ancient Macedons, but they came under heavy influence of Slavs. As a result, they are more of a Slavic ethnic group today. They speak a Slavic language as well. I personally find it strange that they want to own the ancient Macedon figures like Alexandre the Great and Philip II. Apparently that really upsets the Greeks, who consider these figures as part of their history. Due to the naming controversy, today you might see the words FYROM in some maps, which means The Former Yugoslavian Republic of Macedonia, which is the official UN name due to Greece's protests over the name Republic of Macedonia.

Skopje, the capital of Macedonia, is a city that houses half a million people. Macedonians constitute 65% of the population, whereas Albanians form 25%-30%. There is ~4% Turkish population, and ~3%of Romanis (Gypsies). Close to two thirds of the population is Christians (mostly Macedonian) and one third is Muslims. There does not seem to be any religious tensions. You can see both churches and mosques around. Call for prayers are heard five times a day. The official language is Macedonian in Macedonia, but in places where there is a sizable Albanian population, Albanian Language, written using the Latin alphabet, is co-official, as in Skopje. The money used in Macedonia is Macedonian Denar. In Skopje, some places also take Euros.

Stone bridge in downtown Skopje
From the airport, we take a taxi to the city center of Skopje. It is around 20kms from the airport. As we enter the city, it surprises us to see so many Halkbank branches and ads. Halkbank is a Turkish bank. We check-in at a hotel close to the main square in the city center, aptly named the Square Hotel. The hotel is at the 6th floor of a building. Doesn't look too promising from outside, but it is clean and tidy inside. The Vardar River passes through downtown Skopje. The city is under construction. There was a big earthquake in 1963 in Skopje, which has destroyed the city and it seems the city is truly recovering from that destruction only now.  Macedonia is a candidate country for the European Union and with support from the EU, they have started a downtown renewal project. As part of this effort many new buildings and statues are being built.

Warrior on a Horse
However, we feel that the statue thing is a little overdone. There are too many of them and at weird places as well. Some of them are nicely placed though. One such example is the Alexandre the Great statue shown on the left, which is the central attraction in the main square. It is best viewed from the old bridge, as it is placed quite high. From up close, you are more likely to see the underbelly of the horse, rather than the Alexandre sitting on top of it. The funny thing is, Macedonians are not allowed to name this statue as Alexandre the Great, due to the longstanding naming dispute with Greece. As a result, it is named Warrior on a Horse. On the other side of the old bridge, there is another statute, this time of Philip II, standing up with his fist pointed in the air. Again, it cannot be named Philip II, and instead it is named The Warrior. There is also a triumphal arch, again built as part of the renewal project. It is said that the nationalist government has spent more than 200 million for this effort so far, which is striking considering the rather poor country. The unemployment rate is as high as 35%, which is frighteningly high. If you walk a little outside of downtown, you can start seeing graffiti over unkempt and rundown buildings.

Kale Fortress
Across the bridge, you can find the old city, where the Ottoman impact is highly visible. There are Ottoman mosques, such as the Mustafa Pasha Mosque, Hamams, Inns, and a Bazaar. Also on that side of the river, located on top of a hill and overlooking the city center is the Fortress of Skopje, called the Kale Fortress. Kale is the Turkish word for Fortress. So it is the fortress fortress :). The fortress was originally  built by the Byzantine emperor Justinian I, but has been partially reconstructed many times. With the lighting at night, it looks very impressive, as seen on the right. The old city has many stores and restaurants with Turkish names and it is easy to find local folks who can speak Turkish. For lunch we had meatballs at a restaurant, and two of the waiters knew Turkish. In fact, one of them was a local Turk. 

The next day we took a bus to neighboring Kosovo. The two important cities in Kosovo are Prishtine, the capital, and Prizren. We decided to visit Prizren, as the Turkish connection is stronger there. For those who do not know, Kosovo is a country. However, it is not internationally recognized. Among many who recognize it are the USA and Turkey. It was separated from Yugoslavia as a result of the Kosovo War in 1999, which was fought by the Kosovo Liberation Army with aerial bombardment support from NATO. Turkey has participated in the NATO operations as well. Almost a decade after the war, it declared its independence from Serbia in 2008. However, Serbia still does not recognize Kosovo and has territorial claims on it. Today Albanians form around 90% of the population in Kosovo. Serbian forces have committed war crimes during the war and many Serbians have left during or after the war due to the resulting ethnic conflict. Kosovo is not part of EU yet, given its mixed status as an independent country. But there is a EU mission in Kosovo providing rule of law, such as police officers, prosecutors, and judges. The euro is the official currency in Kosovo.

Namazgah where Fatih has prayed
Prizren is located on the slopes of the Sar Mountains and the road from Skopje to Prizren has nice views. It takes around 3 hours by bus, but half an hour of that is spent waiting for the passport control on the Macedonia-Kosovo border. Prizren is a very attractive little city, full of history. Perhaps the right description for it is 'boutique'. As we leave the bus station, on the other side of the street, we find the Namazgah. This is the place where Fatih Sultan Mehmed has stopped to pray when Prizren was conquered by the Ottoman Empire in 1455, that is 2 years after the Fall of Constantinople. The remains from the Namazgah are partially restored, as seen on the left. It does not look pretty, but it does have historical significance for us.

Stone bridge over the Bistrica
The city houses less than 200 thousand people, which are predominantly Muslim Albanians. There is also a Turkish minority, which seems quite active in the political scene. Turkish is a co-official language in Prizren, Albanian being the main language spoken by the population. Everyone seem to know a little Turkish though. They speak it with a cute accent, somewhat resembling the Black Sea coast accent from Turkey. Bistrica river passes through the city. On the right, you can see an old stone bridge over the river. You can walk along both sides of the river and pass through nice little cafes, restaurants and other local stores. The city is quite lively. Historical sites are very well preserved. The city is very clean and tidy.

As you walk along the river towards the city center, you can catch the view of the Fortress of Prizren, which is located on a hill overlooking the city center. The fortress was built by the Byzantines, was expanded by Serbians, and was given its final shape by the Ottomans. Unfortunately, we did not have the chance to climb the fortress to get a view of the city from up there, because it started to rain heavily an hour after we reached Prizren.

In the city center, you can find the town square, which is called the Shadervan, meaning Fountain in Turkish. There is indeed a fountain in the middle of the town square. The square hosts many small restaurants. We had lunch in one of them. The cabbage salad and the thin sliced beef steak were delicious. Located next to the town square is the Sinan Pasha Mosque, built in 1615. There is also a Hamam close to the city center. There are also Eastern Orthodox Churches. One of them is the Our Lady of Ljevis Church, which we saw from outside, but were not allowed to get in or take pictures. It seems many churches were burned during the ethnic clashes and even today there is animosity against Serbians and thus against the Churches. We hang around the cafes at night, stayed at a very central hotel close to the main square, and took the bus next morning back to Skopje. Before boarding the bus, we tasted local Böreks at a pastry store. Böreks are filled pastries made of a thin flaky dough (known as phyllo in the US). They were delicious.

As soon as we arrived in Skopje, we took a bus to the nearby city of Tetovo. During the Ottoman times, this place was named Kalkandelen. The city is the political capital of the Albanian minority within the country and Albanians form the relative majority here. In 2001, there was a minor civil war within Macedonia, and Tetovo played a central role during that time. The conflict was resolved by granting more rights and better representation to the Albanian minority. An example is the right to have universities that teach in the Albanian language.

Painted Mosque
The Pena river passes through the city and you can find an Ottoman Hamam next to it as you walk towards the city center. You can also find, located very centrally, the Painted Mosque. This mosque, shown on the right, was originally built in 1438, and later reconstructed in 1833. The mosque is unique because it does not follow the traditional Ottoman architecture. It has floral paintings. Inside the mosque, under the dome, you can see panoramic drawings of trees, plants, and buildings. This is the first time we see an interior mosque decoration like that. From the outside, it also looks interesting. The back side of it looks as if it is made out of playing cards :). The mosque has a nice small courtyard, which also houses an Ottoman mausoleum (called a Türbe), which houses the tombs of the financers of the mosque. Tetovo also houses an Ottoman fortress overlooking the city and a historical Bektashi Tekke, that is a Bektashi Dervish Lodge.

We had a nice dinner in Tetovo and returned back to Skopje at night. There was a concert in the main square which continued until midnight. Early next day, we went back to the airport for our return flight to İstanbul.







Sunday, October 27, 2013

Macho Grande

One of the most hilarious scenes from the movie "Airplane 2: The Sequel":



Unnamed crew member (witness):
  • Flew with Striker during the war
  • He never forgets the night they bombed Macho Grande
  • How they survived was a miracle
  • He won't ever get over Macho Grande, those wounds run pretty deep
Striker:
  • Was the squadron leader
  • He brought them real low, but he couldn't handle it
Buddy:
  • Was the bombadier
  • He went into pieces
  • It was awful how he came unglued
  • He bailed out
Howie:
  • Was a rock, best tail gunner in the outfit
  • They lost Howie the next day
Andy:
  • Was the navigator
  • He was alright
  • He hung tough




Sunday, October 20, 2013

Ideal occupations

Here are 10 occupations I wish I had pursued, in no particular order.

1. Architect: The kind that designs major buildings. Nothing is higher than architect.

2. Poet: I do this as an amateur and I think I am good at it. However, all those moments will be lost in time, like tears in the rain :)

3. Mechanical engineer: As a kid I always admired my uncle who is a top-notch mechanical engineer. If I ever win the lottery, I'll do a masters in this field.

4. Game designer and programmer: I wish I was the person who led the development of Diablo series. This is perhaps the closest to my actual occupation, but the mixture of programming, story telling, art and graphics makes game design unique.

5. Fighter jet pilot: I am afraid of heights, so this is quite far fetched. But I am fascinated by fighter jets. I used to build and paint plastic model kits of them. It seems in the near future fighter jets will be commanded from land stations. So it seems I missed this opportunity by 50 years or so.

6. Professional tennis player: I am too much of a couch potato to do a sports professionally. Tennis is the only sports activity I enjoy performing, despite enjoying watching any kind of sports on TV (I can even watch a curling game). I was a big Agassi fan back in the day. Before that I was a fan of Boris Becker. Nowadays I am more interested in female players.

7. Philosopher: I do have a doctorate of philosophy. I wish I was doing my qualified job :) 

8. Singer: When I was young, I saw Joe Cocker in Unchain My Heart's video clip and was impressed. I think that is why I enjoy growing my beard. He has that rugged voice that I like. If I were to drink and smoke, I could bring my voice to that level.

9. Resistance leader (like that's a job): Not sure where this came from. The only resistance I lead today is against my inner urges...

10. Iron chef: I like cooking. I am a big fan of Iron Chef Rokusaburo Michiba from the Japanese series. What a name! Iron Chef Michiba always writes his menu in calligraphy before starting to cook. Planning is key to any activity.


Sunday, April 28, 2013

Iterating visitables


A common design pattern in programming is the 'visitor design pattern' (see here). The idea is that, the caller passes a visitor object to the callee, and the callee uses this object to make calls as it visits some internal structure that it maintains, passing the info about the visited objects as a parameter to the calls made on the visitor object.

Here is a real-world use case I run across recently: I was using a persistent R-tree library written in C++. R-tree is a tree used for indexing spatial data, such as 2-d and 3-d objects. This particular implementation provided a visitor interface, where querying the R-tree index was done by passing a visitor object to its query method. Every time a node of the tree is visited during the answering of the query, the query method makes a call to the visitor object, passing in the node being visited as a parameter.

An alternative approach, typically used with indexes like this is iteration (cursors in DB lingo). An iterator can be used to iterate through the results. In my use case, I needed to provide an iterator interface over the R-tree index. Doing this kind of adaptation requires use of non-traditional abstractions, which we will get to soon. Take a few minutes to think about a solution, and you will see that it is not at all straightforward.

Btw, I should note that the problem of providing an iterator interface on top of a visitor one is quite real, and something you can run across in real life programming, as I did. See here for a StackOverflow question about the same issue (which I tried to answer). It is also worth noting that the reverse problem of providing a visitor interface over an iterator one is trivial.

Here I will give a simple example that involves a counter to illustrate the problem. Let us first introduce abstract classes that define the concepts of visitor and visitable.

template<typename Data>
class Visitor
{
public:
  virtual ~Visitor() { }
  virtual bool visit(Data const & data) = 0;
};

template <typename Data>
class Visitable
{
public:
    virtual ~Visitable() {}
    virtual void performVisits(Visitor<Data> & visitor) = 0;
};

Let us illustrate their use with a simple counter that is visitable:

class Counter : public Visitable<int>
{
public:
    Counter(int start, int end)
        : start_(start), end_(end) {}
    void performVisits(Visitor<int> & visitor)
    {
        bool terminated = false;
        for (int current=start_; !terminated && current<=end_; ++current)
            terminated = visitor.visit(current);
    }
private:
    int start_;
    int end_;
};

Here, we have a Counter class that provides a Visitable interface for visiting numbers between 1 and n. Using it is rather straightforward:

class CounterVisitor : public Visitor<int>
{
public:
    bool visit(int const & data)
    {
        std::cerr << data << std::endl;
        return false; // not terminated
    }
};

int main(void)
{
    Counter counter(1, 100);
    CounterVisitor visitor;
    counter.performVisits(visitor);
    return EXIT_SUCCESS;
}

How can we convert this code to use iteration, rather than visitation? Let's first define the concept of an iterator:

template<typename Data>
class Iterator
{
public:
    virtual ~Iterator() {}
    virtual bool isValid()=0;
    virtual void moveToNext()=0;
    virtual Data const & getData()=0;
};

It is worth noting that I am not using STL style code here, to make the exposition more general. I am following a mainly OO approach. Here is what we want to do ideally:

int main(void)
{
    Counter counter(1, 100);
    Iterator<int> & it = magicFunction(counter);
    for (; it.isValid(); it.moveToNext()) {
        int const & data = it.getData();
        std::cerr << data << std::endl;
    }
    return EXIT_SUCCESS;
}

Below are two ideas on how to achieve this, both with serious shortcomings.

Idea 1) We write a wrapper that collects all data items into a collection and then we can provide the usual iteration interfaces.

Unfortunately, this idea has a critical shortcoming. It requires space in the order of number of items visited. In some applications this is unacceptable. A good example is the R-tree example I gave earlier, where the number of query results might be too large. If the only thing you want to do is to compute some aggregate over the results, or perhaps just look at the first k of the results, this approach will be very costly in some cases. So we rule it out as a general solution.

Idea 2) Another idea is to use threads.

We can spawn a new thread that will execute the performVisits method.  We can then start waiting on a conditional variable until the spawned thread signals us about a result. When a visit call is made on the spawned thread, the current data item will be saved into a shared variable, the spawned thread will signal us about the readiness of the result, and it will start waiting on another conditional variable until we signal it for the next request. This way our iterator can return the item stored in the shared variable when the getData method is called. When the moveToNext is called for the next step of the iteration, we could signal the spawned thread's conditional variable, so that it can do the next visit. Meanwhile we can wait on our conditional variable for the results. This could continue until the the visit is completed.

Unfortunately, this switching between threads is really costly if we end up doing it on a per-item basis. Alternatively, we can buffer some data items to improve performance. In general, threads are somewhat heavyweight, and also this kind of implementation is somewhat dirty. It can be made to work, however.

The second idea is actually moving us in the right direction: Threads have their own stack! So what we need is a different stack. We do not need concurrent execution, however. Instead we need alternating execution between two contexts with their own stacks. An abstraction that can provide this is 'coroutines'. See here, if you are not familiar with coroutines. Coroutines enable two functions to suspend and resume each other. That's where the name 'co' comes from. I won't describe them in detail here.

C++, being the flexible language it is, can support coroutines via a library. boost::coroutines is one such library and surprisingly it is very easy to use (assuming 'easy' can be applied to anything in boost).

Below, I will provide the code for a class that can convert a visitable into an iterator. Fasten your seatbelt:


template<typename Data>
class VisitableIterator : public Iterator<Data>
{
private:
    typedef boost::coroutines::coroutine<void()> coro_t;
    typedef coro_t::caller_type caller_t;
public:
    VisitableIterator(Visitable<Data> & visitable)
        : valid_(true), visitable_(visitable)
    {
        coro_ = coro_t(boost::bind(&VisitableIterator::visitCoro, this, _1));
    }
    inline bool isValid()
    {
        return valid_;
    }
    inline Data const & getData()
    {
        return visitor_.getData();
    }
    inline void moveToNext()
    {
        if(valid_)
            coro_();
    }
private:
    class InternalVisitor : public Visitor<Data>
    {
    public:
        InternalVisitor() {}
        inline bool visit(Data const & data)
        {
            data_ = &data;
            (*caller_)(); // return back to caller
            return false;
        }
        inline void setCaller(caller_t & caller)
        {
            caller_ = &caller;
        }
        inline Data const & getData()
        {
            return *data_;
        }
    private:
        Data const * data_;
        caller_t * caller_;
    };
    void visitCoro(coro_t::caller_type & caller)
    {
        visitor_.setCaller(caller);
        visitable_.performVisits(static_cast<Visitor<Data> &>(visitor_));
        valid_ = false;
    }
private:
    bool valid_;
    coro_t coro_;
    InternalVisitor visitor_;
    Visitable<Data> & visitable_;
};

In the code above, we see a class called VisitableIterator. The class provides a constructor that takes a Visitable object as parameter. Since it extends an Iterator, it provides the appropriate methods to perform iteration over the visitable object.

The code makes use of a coroutine. The coroutine is stored in the variable coro_. The code for the coroutine is specified in the visitCoro function. The visitCoro function takes as a parameter a caller object. The caller object represents the caller of the coroutine and is used to transfer the control back to the caller when needed. The visitCoro function makes use of a visitor object of type InternalVisitor, named visitor_. This is the object that implements the visit function. After performing each visit, it saves the current data item and transfers the control back to the caller. Since it needs to perform this transfer, it needs access to the caller object, which is kept in the caller_ variable. There are a few other details that should be obvious from the code.

Here is how we use this:

int main(void)
{
    Counter counter(1, 100);
    VisitableIterator<int> iter(static_cast<Visitable<int>&>(counter));
    for (; iter.isValid(); iter.moveToNext()) {
        int data = iter.getData();
        std::cerr << data << std::endl;
    }
    return EXIT_SUCCESS;
}

That was not too hard!

I have not yet investigated performance implications. Would update once I do.

Saturday, February 2, 2013

Varints

Variable length encoding of integers results in using less space for small integers and more for big ones. This comes handy in a few places. For instance, if you are encoding a string, you have the choice of using a null-terminated string or a length-encoded string. In the former case, you only use 1 extra byte, but end up with the restriction of not being able to include the 0 byte in your string. If you go the length-encoded way, then you can use 4-bytes to encode the string length. In this alternative, the 0 byte can appear in the string without any issue. However, for short strings, using 4 bytes for the length is very wasteful. To solve this, one can use variable length encoding for the string length, which makes it possible to use up only 1 byte for the encoding for a small string.

Another useful scenario is when you have a single integer type in your system, say a 64 bit signed integer, but most of the time, the integers you represent have a much smaller range. Again, the variable length encoding will help minimize the size of the encoding, in case you want to write things to disk or send data over the network. In fact, Google Protocol buffers uses a variable length encoding for a similar purpose. Wikipedia has an article on variable length encoding of integers here. Apparently it was popularized by the MIDI file format.  IBM's SPL language uses a simpler version of variable length encoding for serializing its string types (see here).

I want to open a parenthesis here and talk about type names in programming languages briefly. Until C99, there was a lot of confusion in C in terms of the sizes of the types. How big is a long? Well, it depends on the system. Historically, not all systems supported all sizes, so the size for different types were not fixed. With C99, inttypes.h now defines types such as int64_t and uint64_t, which remove the ambiguity (there is also the craziness of int_leastX_t and int_fastX_t stuff, and in fact uintX_t is optional, but let's not get into that). What surprises me is that, Java has decided to call its types byte, short, int, long, etc. I much prefer int8, int16, int32, int64. If short is defined as a signed integer with a size of 16 bits (which means the size is not an implementation detail), why not call it int16, which makes it easier to remember as well. 

Going back to variable length encoding, the idea is to divide the integer into 7-bit blocks, and use 1-byte (8-bits) to encode each block. The trailing 0-only bytes need not be encoded, saving space for small valued integers. When encoding, we go from the least significant block to most significant block, and set the most significant bit of each block to 1, except for the last one, which will have 0 as its most significant bit. As a result, a most significant bit value of 1 for a block is an indicator that the block is a continuation block, whereas a 0 value means that the block is the termination block. Here is an example:

89657 = 10101111000111001

Divide into 7-bit blocks

 101 - 0111100 - 0111001

Set the most significant bit to 1 for all but the leftmost block, which is padded with 0's instead:

00000101 - 10111100 - 10111001

Recall that we encode starting from the least significant block, so we get:

10111001 - 10111100 - 00000101

To decode, we apply the following procedure. If the first block starts with a 0, we have a single block. Otherwise, we collect all the consecutive blocks that start with 1, until we reach the block that starts with a 0 - the termination block. We then strip the most significant bits from each block and concatenate. In the running example, we have:

Strip the most significant bit:

0111001 - 0111100 - 0000101

Reverse block order:

0000101 - 0111100 - 0111001

And concatenate:

10101111000111001= 89657

This works for unsigned types. For singed types, the negative numbers will have their most significant bit set. So -1 will take a lot of space. If you have an application where you want the encoding to take up less space if the absolute value of the number is small, then you can apply a mapping like the following:

0 -> 0
-1 -> 1
1 -> 2
-2 -> 3
2 -> 4
...
-2^(n-1)+1 -> 2^n-3
2^(n-1)-1 -> 2^n-2 
-2^(n-1) -> 2^n-1 

Code not reusable is not worth writing. So, below is the C++ code for this, which I hope you will find reusable:

#include <inttypes.h>
#include <cstdlib>


class VarintUtils {
private:
    template <typename T>
    static char * encodeUnsigned(char * buf, T value) {
        do {
            uint8_t byte = static_cast<uint8_t>(value) & 0x7F;
            value >>= 7;
            if (value)
                byte |= 0x80;
            *(buf++) = byte;
        } while (value!=0);
        return buf;
    }

    template <typename T>
    static char * encodeSigned(char * buf, T value) {
        size_t const size = sizeof(T)<<3;
        T newValue = (value << 1) ^ (value >> (size-1));
        return encodeUnsigned(buf, toUnsigned(newValue));
    }

    template <typename T>
    static char * decodeUnsigned(char * buf, T & value) {
        value = 0;
        size_t shift = 0;
        uint8_t byte;
        do {
            byte = *(buf++);
            value |= (static_cast<T>(byte & 0x7F) << shift);
            shift += 7;
        } while ((byte&0x80)!=0);
        return buf;
    }

    template <typename T>
    static char * decodeSigned(char * buf, T & value) {
        buf = decodeUnsigned(buf, toUnsignedRef(value));
        T half = toUnsigned(value) >> 1;
        value = (value & 1) ? ~half : half;
        return buf;
    }

    // conversion from signed to unsigned
    inline static uint32_t toUnsigned(int32_t value) { return value; }
    inline static uint64_t toUnsigned(int64_t value) { return value; }
    inline static uint32_t & toUnsignedRef(int32_t & value)
      { return reinterpret_cast<uint32_t &>(value); }
    inline static uint64_t & toUnsignedRef(int64_t & value)
      { return reinterpret_cast<uint64_t &>(value); }

public:
    inline static char * encode(char * buf, uint32_t value) {
        return encodeUnsigned(buf, value);
    }
    inline static char * encode(char * buf, uint64_t value) {
        return encodeUnsigned(buf, value);
    }
    inline static char * encode(char * buf, int32_t value) {
        return encodeSigned(buf, value);
    }
    inline static char * encode(char * buf, int64_t value) {
        return encodeSigned(buf, value);
    }
    inline static char * decode(char * buf, uint32_t & value) {
        return decodeUnsigned(buf, value);
    }
    inline static char * decode(char * buf, uint64_t & value) {
        return decodeUnsigned(buf, value);
    }
    inline static char * decode(char * buf, int32_t & value) {
        return decodeSigned(buf, value);
    }
    inline static char * decode(char * buf, int64_t & value) {
        return decodeSigned(buf, value);
    }
};

Interestingly enough, you can perform binary search on a variable length encoded buffer. You might wonder why this is useful. Well, imagine you have a large graph stored using the adjacency list layout. Further assume that you have remapped the vertex ids so that the ones with the highest degree have the lowest ids. Now you can get significant compression if you encode adjacency lists using the variable length encoding of vertex ids. However, once you do that, how are you going to find if a given vertex is a neighbor of another vertex? You would have to resort to a linear search. Well, it is relatively easy to implement binary search on a variable length encoded list (of course, assuming it is sorted).

The idea is to readjust the location of the mid byte so that it aligns with the start of a variable length encoded value. Here is the code:


class VarintUtils {
    ...
private:
    template <typename T>
    static char * binarySearchGeneric(char * start, char * end, T value) {
        char * origin = start;
        T refval;
        while (start < end) {
           char * mid = adjust(start + (end-start) / 2, origin);
           char * midNext = decode(mid, refval);
           if (refval==value)
               return mid;
           if (refval>value)
               end = mid;
           else
               start = midNext;
        }
        return NULL;
    }

    static char * adjust(char * bufCurr, char * bufStart) {
        if (isFinalByte(*bufCurr)) {
            if (bufCurr==bufStart || isFinalByte(*(bufCurr-1)))
                return bufCurr;
            bufCurr--;
        }
        while (bufCurr!=bufStart && !isFinalByte(*(bufCurr-1)))
            bufCurr--;
        return bufCurr;
    }

    inline static bool isFinalByte(char c) {
        return (0x80 & c)==0;
    }
public:
    inline static char * binarySearch(char * bufferStart, char * bufferEnd, uint32_t value) {
        return binarySearchGeneric(bufferStart, bufferEnd, value);
    }
    inline static char * binarySearch(char * bufferStart, char * bufferEnd, uint64_t value) {
        return binarySearchGeneric(bufferStart, bufferEnd, value);
    }
    inline static char * binarySearch(char * bufferStart, char * bufferEnd, int32_t value) {
        return binarySearchGeneric(bufferStart, bufferEnd, value);
    }
    inline static char * binarySearch(char * bufferStart, char * bufferEnd, int64_t value) {
        return binarySearchGeneric(bufferStart, bufferEnd, value);
    }
};




Well, one weakness of the above implementation is that, mid point is determined by considering the number of bytes between the start and the end. However, towards the end of the list, the numbers are larger and thus may take more bytes. Still, for long lists, this provides good speedup compared to a linear scan.

That's all for today...









Sunday, October 7, 2012

I had a dream...

It was 12 years or so ago. At the time there was no such thing as a smart phone. I had a dream one night, where I was sitting in front of a computer and was looking at the world map. I was zooming in and in and in, and eventually I was able to see all the details down to roads, buildings, trees, etc. The only difference from today's map applications with satellite view support was that, it was all live! Looking at Apple's new map application with the 3D turned on over Manhattan suddenly reminded me this long forgotten dream.

It seems so many people have complained about Apple's map app. Unlike what most people say, I like it better than Google Maps on iPhone. First of all, it supports vector graphics, so the maps are crystal clear and do not require downloading different images for different zoom levels. 3D views with buildings is also great. Most importantly, it brings turn-by-turn directions with voice support. There are two missing things though. First is the public transportation data. Google Maps was great for finding subway routes and times in NYC. Second is the inaccuracies in the map data. This latter thing is only a matter of time to fix. I don't criticize Apple for releasing the app. They have to start from somewhere. I am sure within a year all inaccuracies would be fixed in the map data.


What I hate about Apple is their DRM stuff though. I had 30 episodes of Twin Peaks and 14 episodes of Firefly bought from iTunes. I wanted to move them to my external disk so that I can play them on my TV without hooking up my computer to the TV. Guess what, you can't do that without buying an Apple TV. After some searching around and with the suggestion of a friend, I found the software called Requiem. It is the only free DRM removal tool that can wipe out the DRM from your movies without playing and re-recording them (there are many paid tools that do the latter). So you get the same original content, just without the DRM on it.

The tool works like a charm, but it is somewhat hard to get to. Due to legality issues, it is hidden behind the Tor network (which uses onion routing to hide the web end points). It can be reached from here: https://tag3ulp55xczs3pn.onion.to/ and works for the latest version of iTunes. It took around 10 hours to remove the DRM from 52GB worth of video. There were no problems.