MultiLevelMenu

Dec 8, 2015

Persistent 'undo' in ViM

I love ViM. Its universality, simplicity and keyboard-only control appeal to me tremendously. I hear Emacs is better, but I can't be bothered to learn it. ViM suits me just fine.
However, there's one ViM shortcoming that has annoyed me endlessly: its inability to maintain edit history across buffer switches. To put it simply, it maintains the edit history that allows undos and redos only for the buffer currently being worked on. The moment you switch to another buffer, the history is cleaned up and a blank one is given to the new file.
Well, it turns out that ViM has the capability to preserve edit history for each file across buffer switches. Like many other features of the editor, you just gotta turn it on from the rc file. Here's how you do it:
set undofile
set undodir=$HOME/.vim/undo
set undolevels=1000
set undoreload=10000
With this in your rc file, ViM creates a temporary file for each file being edited in the folder pointed to by undodir. Make sure you create this folder before you start ViM. ViM won't create it for you if it doesn't already exist.
Note that undoreload=10000 causes the entire file being edited to be duplicated to the undodir. So if you plan to edit large files with ViM, beware!
Lastly, since the edit history is stored in a file, it'll be available across program sessions. Isn't it awesome?
Thanks to this wonderful answer on SO for the tip.

Dec 4, 2015

Practices to reduce memory leaks in C++

One of the fundamental differences between C++ and the more modern and, dare I say, more popular programming languages such as Java and C# is the lack of a garbage collector in C++. Often this is used as an example to portray how C++ has not kept up with the times and how code written in the language is a potential minefield for memory leaks.
While the debate about the merits of one language over the other continues (and in all likelihood will continue for many more years), there are a few steps a programmer can take that can pretty much minimize, if not eliminate, the chances of a memory leak.
I would like to list some of the techniques that I have employed over the years and that have worked pretty well for me.

Use RAII

Use the RAII (Resource Acquisition Is Initialization) programming paradigm: perhaps the best technique one can employ to minimize memory leaks.
RAII is a term coined by Stroustrup to refer to the technique where the behavior of the constructor and automatic destructor of objects is employed to manage resources. In C++, there are two ways to allocate objects -- from the stack and from the heap. Unlike Java and C#, where all objects are created by allocating memory from the heap, objects in C++ declared with local scope (within a function or method) are allocated from the stack, and when these objects go out of scope, as happens when a function returns, they are automatically destroyed.
This behavior can be exploited to make local resource management very simple. For example:
void foo()
{
    FILE* fp = fopen(...);
    if (!fp)
        return;
    // ...
    // do some other operation
    bakeacake();
    // call bar() giving the file as argument
    bar(fp);
    fclose(fp); // close the file
}
In this example, consider what happens if either bakeacake() or bar() throws an exception. Control would immediately be transferred to the caller of foo() (and if it doesn't handle the exception, to a higher level caller in the call chain that hopefully traps the exception), leaving the file resource dangling.
Given the above scenario, you can always write this function as
void foo()
{
    FILE* fp = fopen(...);
    if (!fp)
        return;
    try {
        // ...
        // do some other operation
        bakeacake();
        // call bar() giving the file as argument
        bar(fp);
    } catch (...) {     // catch the error
        fclose(fp); // remember to close
        throw;
    }
    fclose(fp); // close the file
}
and solve the problem of a potential resource leak. However, this has repetitive code, something that is, firstly, hard to maintain (as any change in resource management would require updating two different places in the code) and, secondly, requires the programmer to exercise discipline while writing code (to remember or find out the various control paths and all the resources allocated so that they can all be released).
Another way to write this function would be to use a local object (allocated on the stack of foo()) that provides the required file operations. Consider this example:
void foo()
{
    File f("a.out");
    // ...
    // do some other operation
    bakeacake();
    // call bar() giving the file as argument
    bar(f);
} 
The above approach will only work if you have access to the code of bar() and can make the necessary changes to it so that it accepts a reference to a File object rather than a FILE*. On the downside, now you have to change the implementation of another function to fit your new programming style.
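The File class used above isn't shown in the post; the sketch below is one possible minimal wrapper, where the name File, its constructor arguments and the isOpen() helper are assumptions made purely for illustration.
#include <cstdio>

// Hypothetical RAII wrapper around FILE* -- a sketch, not a complete library class.
class File {
    std::FILE* fp_;
    File(const File&);            // non-copyable: copying would double-close fp_
    File& operator=(const File&);
public:
    explicit File(const char* name, const char* mode = "r")
        : fp_(std::fopen(name, mode)) {}
    ~File() { if (fp_) std::fclose(fp_); }   // closed automatically on scope exit
    bool isOpen() const { return fp_ != 0; }
};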
Now consider the scenario where such a File object does not exist (and for some reason it is difficult to build one yourself), or you're not at liberty to modify bar() because that function is used in many other modules whose code you do not have access to. In that case, you may try this approach:
class FilePtr {
    FilePtr(const FilePtr&);
    FilePtr& operator=(const FilePtr&);
public:
    FilePtr(const char* name, const char* mode) : fp_(0)
    { fp_ = fopen(name, mode); }
    FilePtr(FILE* fp) : fp_(fp)
    {}
    ~FilePtr()
    { 
        if (fp_) fclose(fp_);
    }
    operator FILE*() // crucial to making this work!
    { 
        return fp_;
    }
protected:
    FILE* fp_;
};
In this example, we first create a small wrapper class to which we delegate the responsibility of managing the FILE* resource. When foo() returns, since the FilePtr object is allocated from the stack, it is automatically destroyed, resulting in its destructor being called, which closes the FILE* gracefully.
There are two key points here that merit highlighting.
1. The bar() function does not have to be modified. This is because FilePtr declares a conversion operator to FILE*, which the compiler automatically invokes when it cannot find an overload of bar() that takes FilePtr (or FilePtr&) as its argument.
2. The FILE* automatically gets released when the function returns, whether through an error and consequent exception or when bar() returns normally.
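For completeness, here is a sketch of foo() rewritten to use FilePtr; the file name "a.out", the mode and the helper calls are carried over from the earlier examples and are purely illustrative.
void foo()
{
    FilePtr f("a.out", "r");    // fopen() happens in the constructor
    if (!f)                     // tests the handle via operator FILE*()
        return;
    // ...
    // do some other operation
    bakeacake();
    // call bar() giving the file as argument; FilePtr converts to FILE* automatically
    bar(f);
}   // fclose() runs here -- on normal return or if an exception propagates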

Use objects to abstract concepts

Use objects to abstract concepts as much as possible rather than using procedural style code.
One of the strengths of C++ is also its weakness. C++ allows you to mix the C coding style (where program logic is expressed purely as procedural functions) into C++ code. While this provides immense flexibility and backward compatibility, it also tempts you to keep the same coding style you're used to (assuming that you spent considerable time coding in C and eventually migrated to C++), preventing you from exploiting the language to its fullest.
Coupled with the RAII technique explained before, this approach will yield excellent results. Following are a few examples of scenarios where this approach will pay rich dividends:
1. Synchronisation locks
2. Files as in the previous example
3. Temporary buffers (we’ll examine this soon)
4. Any other form of OS resources that have local scope
A few examples:
void a::foo()
{
    Guard g(mutex_);
    // code below, until the function exits, is
    // guaranteed to be safe across all threads
    // that access the resources protected by
    // the mutex mutex_.
    // update shared resources
    status_ = 1;
}
Here the constructor of the Guard class acquires the mutex (waiting for it indefinitely) and its destructor automatically releases the same mutex.
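The Guard class itself isn't shown; a minimal sketch, assuming mutex_ is a std::mutex (if you have a C++11 compiler, std::lock_guard gives you exactly this behavior out of the box), could look like this:
#include <mutex>

class Guard {
    std::mutex& m_;
    Guard(const Guard&);            // non-copyable
    Guard& operator=(const Guard&);
public:
    explicit Guard(std::mutex& m) : m_(m) { m_.lock(); }   // acquire in the constructor
    ~Guard() { m_.unlock(); }                              // release in the destructor
};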
void b::OnPaint()
{
    CPaintDC dc(this);    // RAII
    dc.SetBkColor(RGB(127, 127, 127));
    CRect rc; GetClientRect(&rc);
    dc.FillSolidRect(&rc, RGB(0, 0, 0));
}
This is classic MFC code. In Windows, drawing to a window in response to a WM_PAINT message requires one to call the BeginPaint API and, once the drawing has been completed, the EndPaint API. Both these calls are encapsulated in the constructor and destructor of the CPaintDC class respectively, insulating the programmer from the explicit calls to these APIs.

Use std::auto_ptr

Despite its destructive copy semantics, auto_ptr is still a very useful template that can be effectively employed to automatically release dynamic objects.
An example scenario where auto_ptr can be effective is in UI widgets that implement optional behaviors. For instance, imagine a window widget class that allows the user to set a background image as an option. Until the user sets this option, the widget draws the background using a solid color.
class Window {
    ...
    std::auto_ptr<Background> background_;
    ...
public:
    ...
    void setBackground(const wchar_t* imagefilename) throw(std::exception)
    {
        background_.reset(new Background(imagefilename));
    }
    void Paint()
    {
        ...
        if (background_.get()) {
            // paint the background
        }
    }
};
Objects of the Window class would, by default, have an empty embedded background_ pointer. However, when a user sets the background, a Background object would be created, which will be tested for NULLness in the window paint routine. If the member variable points to a valid object (and hence is not NULL), the image is painted as the background of the window.
The key point here is that the programmer does not have to remember to do anything explicit to manage the Background object. When the window is destroyed, resulting in the deletion of the corresponding Window object, the associated Background object's destructor is automatically called, where all the resources used to paint the background image can be released.
Caveat 1: As mentioned in the beginning, auto_ptr has destructive copy semantics, making any object containing a member variable of this type unsafe for copy construction or assignment. So if your object requires either of these two constructs, you should stay away from auto_ptr. An alternative is boost::shared_ptr<>.
Caveat 2: For the above reason, C++11 deprecates auto_ptr and instead provides a new template, std::unique_ptr.
Caveat 3: std::auto_ptr is also not capable of managing array types, whereas std::unique_ptr can (via std::unique_ptr<T[]>).
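For compilers with C++11 support, a sketch of the same Window member rewritten with std::unique_ptr might look like the following; the Background stub here merely stands in for the class used in the example above.
#include <memory>

// Minimal stand-in for the Background class from the example above.
class Background {
public:
    explicit Background(const wchar_t* /*imagefilename*/) {}
};

class Window {
    std::unique_ptr<Background> background_;   // replaces std::auto_ptr
public:
    void setBackground(const wchar_t* imagefilename)
    {
        background_.reset(new Background(imagefilename));
    }
    void Paint()
    {
        if (background_) {   // unique_ptr is directly testable in a condition
            // paint the background
        }
    }
};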

Turn on memory tagging

Turn on tagging of heap-allocated memory blocks in debug builds and compare heap memory snapshots between program startup and termination to ensure that all memory blocks are being released.
In Windows one can use the _CrtMemCheckpoint() and _CrtMemDifference() CRT calls to check if a function leaks any memory. Visual C++ also provides other Heap State Reporting Functions that can be used to monitor memory leaks.
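As a sketch of how these calls are typically wired up (Windows, debug builds only; the code being exercised between the snapshots is hypothetical):
#include <crtdbg.h>

void checkForLeaks()
{
    _CrtMemState before, after, diff;

    _CrtMemCheckpoint(&before);                     // snapshot the debug heap
    // ... exercise the code suspected of leaking ...
    _CrtMemCheckpoint(&after);                      // snapshot it again

    if (_CrtMemDifference(&diff, &before, &after))  // non-zero if the states differ
        _CrtMemDumpStatistics(&diff);               // dump what changed
}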
I believe Linux, or more specifically glibc, provides mtrace(), which logs all memory allocation requests to a log file that can then be parsed by the mtrace utility to report if there are any leaks.
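A sketch of how that looks with glibc; the MALLOC_TRACE environment variable must point to a writable log file before the program runs.
#include <mcheck.h>
#include <cstdlib>

int main()
{
    mtrace();                            // start logging allocations to $MALLOC_TRACE
    void* leaked = std::malloc(64);      // deliberately never freed
    (void)leaked;
    muntrace();                          // stop logging
    return 0;
}
// Afterwards, run:  mtrace ./a.out "$MALLOC_TRACE"  to list any leaked blocks.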

Use std::vector<char> instead of malloc.

std::vector<> uses RAII to acquire its memory and its destructor releases the memory. This way you don’t have to remember to release the memory you allocated at all of your return points.
For example:
void foo()
{
    char* buf = new char[1024];
    if (!buf)
        return;

    // dummy frame buffer
    volatile unsigned long* fb = (volatile unsigned long*)0x00040000;
    memcpy((void*)buf, (const void*)fb, 1024);

    if (cond1) {
        dosomething(buf);
        delete[] buf;
        return;
    }
    if (cond2) {
        dosomethingelse(buf);
        delete[] buf;
        return;
    }
    // do some other processing
    doyetanotherthing(buf);
    delete[] buf;
}
You'll notice that there are three different locations from which the function returns to the caller and hence three different calls to release the memory allocated at the beginning of the function. As mentioned in the first point, this requires the programmer to exercise discipline to remember to release the memory on all return paths. Secondly, if any of dosomething(), dosomethingelse() or doyetanotherthing() throws an exception, the memory would never be released.
Now let's look at the same function, but using std::vector for its memory allocation.
void foo() throw(std::exception)
{
    std::vector<char> buf(1024);

    // dummy frame buffer
    volatile unsigned long* fb = (volatile unsigned long*)0x00040000;
    memcpy((void*)&buf[0], (const void*)fb, 1024);

    if (cond1) {
        dosomething(buf);
        return;
    }
    if (cond2) {
        dosomethingelse(buf);
        return;
    }

    doyetanotherthing(buf);
}
Admittedly this is not such a great example as the control paths can be combined to eventually lead to a single point of return where the delete can be placed. But I hope you get the idea.

Declare base class destructors as virtual.

This is an often repeated mistake, especially as the project's scope expands and classes that were designed and implemented in earlier versions are reused by inheriting from them to alter their behavior.
If you derive from a class whose destructor is not declared virtual, deleting an object of the derived class through a pointer to the base class invokes only the destructor of the base class and not that of the derived class. Thus, the derived class's resources would not be cleaned up.
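A minimal illustration of the point; the class names here are made up for the example.
#include <cstdio>

class Base {
public:
    virtual ~Base() { std::printf("~Base\n"); }   // virtual: the derived destructor runs too
};

class Derived : public Base {
    char* buf_;
public:
    Derived() : buf_(new char[1024]) {}
    ~Derived() { delete[] buf_; std::printf("~Derived\n"); }
};

int main()
{
    Base* p = new Derived;
    delete p;   // calls ~Derived, then ~Base; without 'virtual' above, ~Derived is skipped
    return 0;
}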

Nov 10, 2015

Django Localization Issues and Solutions

Django has excellent support for building a multi-language, multi-culture application. However, there are a few interesting bits about its implementation of the internationalization feature that can prove to be minor annoyances in truly localizing your web-based application. This post aims to discuss these issues and highlight the solutions for them.

It matters where you run 'makemessages' command from

The core command that prepares the application for localization, makemessages, can be launched from two locations, and it behaves slightly differently depending on where it's launched. The command can be launched from the project root folder or from an application folder.
If started from the project root, it attempts to extract all the localizable strings from all installed apps and put them in the *.po file. However, if started from an application subfolder, it only extracts the strings from that application's files.
The key issue here is that makemessages works differently compared to other commands like makemigrations, which accepts an app name as an optional argument to generate migrations for that specific app. If you want to extract messages for a specific app, issue the command from the application folder.

The locale folder is optional for an application

Conventional wisdom will tell you that each application stores its localizable artifacts in its own private folder. It turns out that this is optional. makemessages will look for a locale subfolder under the app folder and, if it finds one, it will place the .po file underneath it in the corresponding locale-name subfolder (more about this below). If the locale folder is not found, the localizable strings for the app are collected in a .po file placed in the first path specified in the LOCALE_PATHS setting.

Language and locale names for Chinese have changed in 1.8

Earlier versions of Django used to refer to Chinese (Traditional) and Chinese (Simplified) as zh-tw and zh-cn. This has given way to zh-hant and zh-hans respectively. Apparently, the change was brought about by the pan-national usage of the language (and hence the related locales) that goes beyond Taiwan and China to countries such as Hong Kong, Malaysia and Singapore.
The interesting bit here is that makemessages would still accept the old language codes as its argument and would write the messages file to a locale subfolder with that name. However, when you try out some localized strings, they won't show up.

Language names and locale names are different

Language names specified in the LANGUAGES tuple in settings are typically all lowercase. English (United States) is specified as 'en-us' and English (Great Britain) is specified as 'en-gb'. However, locale names follow the ISO 3166 convention for country codes, which are uppercased. So 'en-us' becomes en_US and 'en-gb' becomes en_GB.
There's an exception to this as well. If the country/region subtag has more than 2 characters, it is written in title case (only the first letter is uppercased).
Finally, locale names use an underscore to separate the language and country codes, whereas the Django language specification uses a hyphen and all lowercase. So the locale name for English (United States), specified as 'en-us' in LANGUAGES, gets translated to the locale name en_US. Chinese (Traditional), specified as zh-hant, gets translated to the locale name zh_Hant, and Chinese (Simplified), specified as zh-hans, gets translated to the locale name zh_Hans.

Beware of 'fuzzy' in PO files

Often, as you tweak your website, you will invariably change some text. The change could be as simple as fixing a typo or adding a missing comma. After you change this, you would also regenerate the PO file using the makemessages command. This is where things get a little, shall we say, 'fuzzy' (pun intended). When you regenerate the PO file, the GNU gettext utility parses the source code and starts extracting each string from it while looking for a matching entry in the PO file. On finding an exact match, it doesn't do anything, as there could already be a translated msgstr entry for that string which it shouldn't mess up.
However, if it doesn't find a string in the PO file that is an exact match, it doesn't just create a brand new entry for the new string. Rather, it has a concept called a close match. That is, the new string has a corresponding string in the PO file that, though not an exact match, is very close to the original string. In this case, makemessages just updates the relevant string entry in the PO file with a line marked #, fuzzy followed by the new string, while commenting out the old, closely matching string.
Such strings are not, by default, compiled into the MO file, and consequently the localized text for the newly modified string won't show up. The system expects you to review the slight change and decide whether the localized text needs to be updated accordingly. As part of this review, you have to remove the #, fuzzy line and the commented original string, which marks this message and its equivalent localized text as properly reviewed.
This mechanism is documented in detail here.