Thursday, May 19, 2016

i18nify any word with this Python utility

By Vasudev Ram

I18Nify

While I was browsing some web pages, reading a word triggered a chain of thoughts. The word had to do with internationalization (often shortened to i18n by developers, because there are 18 letters between the first i and the last n). That's how I thought of writing this small program that "i18nifies" a given word - not in the original sense, but in the way shown below - making a numeronym out of the word.

Here is i18nify.py:
'''
Utility to "i18nify" any word given as argument.

You Heard It Here First (TM):
"i18nify" signifies making a numeronym of the given word, in the
same manner that "i18n" is a numeronym for "internationalization"
- because there are 18 letters between the starting "i" and the
ending "n". Another example is "l10n" for "localization".
Also see a16z.

Author: Vasudev Ram
Copyright 2016 Vasudev Ram - https://vasudevram.github.io
'''
from __future__ import print_function

def i18nify(word):
    # If word is too short, don't bother, return as is.
    if len(word) < 4:
        return word
    # Return (the first letter) plus (the string form of the 
    # number of intervening letters) plus (the last letter).
    return word[0] + str(len(word) - 2) + word[-1]

def get_words():
    # Line continuations are implicit inside brackets, so no backslashes.
    for words in [
        ['a', 'bc', 'def', 'ghij', 'klmno', 'pqrstu', 'vwxyz'],
        ['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the',
         'lazy', 'dog'],
        ['all', 'that', 'glitters', 'is', 'not', 'gold'],
        ['often', 'have', 'you', 'heard', 'that', 'told'],
        ['jack', 'and', 'jill', 'went', 'up', 'the', 'hill',
         'to', 'fetch', 'a', 'pail', 'of', 'water'],
    ]:
        yield words

def test_i18nify(words):
    print("\n")
    print(' '.join(words))
    print(' '.join([i18nify(word) for word in words]))

def main():
    for words in get_words():
        # test_i18nify() already prints a blank separator before each group.
        test_i18nify(words)

if __name__ == "__main__":
    main()
Running it with:
$ python i18nify.py
gives this output:
a bc def ghij klmno pqrstu vwxyz
a bc def g2j k3o p4u v3z

the quick brown fox jumped over the lazy dog
the q3k b3n fox j4d o2r the l2y dog

all that glitters is not gold
all t2t g6s is not g2d

often have you heard that told
o3n h2e you h3d t2t t2d

jack and jill went up the hill to fetch a pail of water
j2k and j2l w2t up the h2l to f3h a p2l of w3r
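One thing the output above hints at is that a numeronym is lossy: different words can collapse to the same short form. Here is a small sketch of that, assuming Python 3. The helper group_by_numeronym and the sample words are my own illustration, not part of i18nify.py.

```python
from collections import defaultdict

def i18nify(word):
    # Same rule as in i18nify.py: short words are returned unchanged.
    if len(word) < 4:
        return word
    return word[0] + str(len(word) - 2) + word[-1]

def group_by_numeronym(words):
    """Hypothetical helper: map each numeronym to the words that produce it."""
    groups = defaultdict(list)
    for word in words:
        groups[i18nify(word)].append(word)
    return dict(groups)

groups = group_by_numeronym(
    ['that', 'tart', 'told', 'tend', 'internationalization'])
for numeronym, originals in sorted(groups.items()):
    print(numeronym, '<-', ', '.join(originals))
```

So a numeronym only works as an abbreviation when the reader already knows which word is meant.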

Notes:

- The use of yield makes get_words a generator function. It is not strictly needed, but I left it in. I could instead have returned the whole list of word lists with a single return statement; the for loop in main would work the same way.

- Speaking of generators, also see this post: Python generators are pluggable.

- The article on numeronyms (link near top of post) reminded me of run-length encoding.
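To illustrate the first note above, here is a minimal sketch of the two ways of supplying the word lists: a generator function and a plain function that returns the whole list. The names WORD_LISTS, get_words_gen and get_words_list are made up for this example, and the data is shortened.

```python
WORD_LISTS = [
    ['a', 'bc', 'def'],
    ['all', 'that', 'glitters'],
]

def get_words_gen():
    """Generator version: hands back one word list per iteration."""
    for words in WORD_LISTS:
        yield words

def get_words_list():
    """Plain version: returns the whole list of word lists at once."""
    return WORD_LISTS

# Either function can drive the same for loop.
for words in get_words_gen():
    print(' '.join(words))
for words in get_words_list():
    print(' '.join(words))
```

For a handful of small lists the difference does not matter; a generator starts to pay off when the items are expensive to produce or too many to hold in memory at once.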

Anyway, e3y :)

- Vasudev Ram - Online Python training and consulting

Signup to hear about my new courses and products.

My Python posts     Subscribe to my blog by email

My ActiveState recipes

Sunday, May 15, 2016

[DLang]: A simple file download utility in D

By Vasudev Ram




Hi readers,

Here is another in my recently started series of posts about D (DLang / the D language).

This is a simple download utility written in D, to download a file from a URL.

It uses a high-level function named download, from the std.net.curl module, so the code is very simple. However, it only handles basic cases; for example, it has no support for authentication or HTTPS.
/*
File: download.d
Version: 0.1
Purpose: A simple program to download a file from a given URL 
to a given local file. Only handles simple cases. Does not  
support authentication, HTTPS or other features.
Author: Vasudev Ram - https://vasudevram.github.io
Copyright 2016 Vasudev Ram
*/

import std.stdio;
import std.net.curl;

void usage(string program)
{
    writeln("Usage: ", program, " URL file");
    writeln("Downloads the contents of URL to file.");
}

int main(string[] args)
{
    if (args.length != 3)
    {
        usage(args[0]);
        return 1;
    }
    try
    {
        writeln("Trying to download (URL) ", args[1], " to (file) ", args[2]);
        download(args[1], args[2]);
        writeln("Item at URL ", args[1], " downloaded to file ", args[2]);
        return 0;
    }
    catch(Exception e)
    {
        writeln("Error: ", e.msg);
        debug(1) writeln("Exception info:\n", e);
        return 1;
    }
}
A makefile to build it in both debug and release versions:
$ type Makefile

release: download.d
        dmd -ofdownload.exe download.d

debug: download.d
        dmd -debug -ofdownload.exe download.d
To build the release version:
make release
To build the debug version:
make debug
A test run: download the current HN home page:
download news.ycombinator.com nyc.html
Output is in nyc.html; see screenshot below:


The line:
debug(1) writeln("Exception info:\n", e);
gets compiled into the binary / EXE only if you build the debug version.
That line prints a bit more information about the exception than the release version does.

You can also read a few posts about downloading using Python tools.
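For comparison, here is a rough Python sketch of the same utility. It assumes Python 3 and uses urlretrieve from the standard library's urllib.request module; it is my own approximation of download.d, not code from those posts, and like the D version it only covers the simple case.

```python
from urllib.request import urlretrieve

def usage(program):
    print("Usage: {} URL file".format(program))
    print("Downloads the contents of URL to file.")

def main(args):
    """Mirror of main() in download.d: returns 0 on success, 1 on failure."""
    if len(args) != 3:
        usage(args[0])
        return 1
    url, filename = args[1], args[2]
    try:
        print("Trying to download (URL)", url, "to (file)", filename)
        urlretrieve(url, filename)
        print("Item at URL", url, "downloaded to file", filename)
        return 0
    except Exception as e:
        # Covers network errors as well as malformed URLs.
        print("Error:", e)
        return 1
```

To use it as a script, call sys.exit(main(sys.argv)) at the bottom of the file. Note that urlretrieve needs a URL with an explicit scheme, such as http://, so a bare host name will be reported as an error.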

- Enjoy.




Thursday, May 12, 2016

Getting CPU info with D (the D language)

By Vasudev Ram



I have been using the D programming language for some time.

From the D home page:

[ D is a systems programming language with C-like syntax and static typing. It combines efficiency, control and modeling power with safety and programmer productivity. ]

[ Some information about D:

- D Language home page
- D Overview
- D on Wikipedia ]

Here is a D program, cpu_info.d, that gets some information about the CPU of your machine.
// cpu_info.d
// Author: Vasudev Ram - https://vasudevram.github.io 
// http://jugad2.blogspot.com
import std.stdio;
import core.cpuid;
void main()
{
    writeln("processor: ", processor());
    writeln("vendor: ", vendor());
    writeln("hyperThreading: ", hyperThreading());
    writeln("threadsPerCPU: ", threadsPerCPU());
    writeln("coresPerCPU: ", coresPerCPU());
}
The program can be compiled and run with:
$ dmd cpu_info.d
dmd is the D compiler (the name stands for Digital Mars D). It creates cpu_info.exe which I can then run:
$ cpu_info
processor: Intel(R) Core(TM) i3-2328M CPU @ 2.20GHz
vendor: GenuineIntel
hyperThreading: true
threadsPerCPU: 4
coresPerCPU: 2



Sunday, May 8, 2016

Calling C from Python with ctypes

By Vasudev Ram

Python => C

ctypes is a module in the Python standard library. It is "a foreign function library for Python". Such libraries help with calling code written in language B, from language A. In the case of ctypes it helps with calling C code from Python code.

[ Note: There are various other methods of linking between Python code and code written in other languages, such as writing Python extensions using the Python C API, SWIG, cffi, etc. I am only looking at ctypes in this post, since it is one of the simpler methods. I may look at the others later. ]

Here is a small example of using ctypes, to call the time() function in the C runtime library on Windows:

# libc_time.py
# Example of calling C library functions from Python
# using the Python ctypes module.
# Author: Vasudev Ram
# Copyright 2016 Vasudev Ram

from __future__ import print_function
from ctypes import cdll
import time

libc = cdll.msvcrt

def test_libc_time(n_secs):
    t1 = libc.time(None)
    time.sleep(n_secs)
    t2 = libc.time(None)
    print("n_secs = {}, int(t2 - t1) = {}".format(n_secs, int(t2 - t1)))
    
print("Calling the C standard library's time() function via ctypes:")
for i in range(1, 6):
    test_libc_time(i)
And here is the output:
$ python libc_time.py
Calling the C standard library's time() function via ctypes:
n_secs = 1, int(t2 - t1) = 1
n_secs = 2, int(t2 - t1) = 2
n_secs = 3, int(t2 - t1) = 3
n_secs = 4, int(t2 - t1) = 4
n_secs = 5, int(t2 - t1) = 5
Note: libc.time() is the Python interface [1] to the C time() function, and time.sleep() is the sleep function in the time module of the Python standard library.

[1] We obtain that interface using ctypes; see the program above.

I use a call to time.sleep() sandwiched between two calls to libc.time(), to verify that the calls to libc.time() are returning the correct result; as you can see from the output, they are doing so.
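The program above is tied to Windows because it loads msvcrt. Here is a hedged sketch of a more portable variant, assuming Python 3: it falls back to ctypes.util.find_library on other platforms and declares the argument and return types of time() explicitly. The helper names load_libc and c_time are my own, and the choice of c_long for the return type is an assumption that suits common platforms.

```python
import ctypes
import ctypes.util
import sys
import time

def load_libc():
    """Load the C runtime: msvcrt on Windows, the platform libc elsewhere."""
    if sys.platform.startswith("win"):
        return ctypes.cdll.msvcrt
    # find_library may return None; CDLL(None) then exposes the symbols
    # already loaded into the Python process, which include libc's.
    return ctypes.CDLL(ctypes.util.find_library("c"))

libc = load_libc()
# Assumption: the current time value fits in a C long on this platform.
libc.time.argtypes = [ctypes.c_void_p]
libc.time.restype = ctypes.c_long

def c_time():
    """Return the current time in seconds, as reported by C's time()."""
    return libc.time(None)  # None is passed as a NULL pointer

print("C time():", c_time())
print("Python time.time():", int(time.time()))
```

Declaring argtypes and restype is optional for a call this simple, but it makes ctypes check the arguments instead of guessing how to convert them.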




Thursday, April 28, 2016

Exploring sizes of data types in Python

By Vasudev Ram

I was doing some experiments in Python to see how much of various data types could fit into the memory of my machine. Things like creating successively larger lists of integers (ints), to see at what point it ran out of memory.

At one point, I got a MemoryError while trying to create a list of ints that I thought should fit into memory. Sample code:
>>> lis = range(10 ** 9)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
After thinking a bit, I realized that the error was to be expected, since data types in dynamic languages such as Python tend to take more space than they do in static languages such as C, due to metadata, pre-allocation (for some types) and interpreter book-keeping overhead.

And I remembered the sys.getsizeof() function, which shows the number of bytes used by its argument. So I wrote this code to display the types and sizes of some commonly used types in Python:
from __future__ import print_function
import sys

# data_type_sizes_w_list_comp.py
# A program to show the sizes in bytes, of values of various 
# Python data types.

# Author: Vasudev Ram
# Copyright 2016 Vasudev Ram - https://vasudevram.github.io

#class Foo:
class Foo(object):
    pass

def gen_func():
    yield 1

def setup_data():
    a_bool = bool(0)
    an_int = 0
    a_long = long(0)
    a_float = float(0)
    a_complex = complex(0, 0)
    a_str = ''
    a_tuple = ()
    a_list = []
    a_dict = {}
    a_set = set()
    an_iterator = iter([1, 2, 3])
    a_function = gen_func
    a_generator = gen_func()
    an_instance = Foo()

    data = (a_bool, an_int, a_long, a_float, a_complex,
        a_str, a_tuple, a_list, a_dict, a_set,
        an_iterator, a_function, a_generator, an_instance)
    return data

data = setup_data()

print("\nPython data type sizes:\n")

header = "{} {} {}".format(\
    "Data".center(10), "Type".center(15), "Length".center(10))
print(header)
print('-' * 40)

rows = [ "{} {} {}".format(\
    repr(item).center(10), str(type(item)).center(15), \
    str(sys.getsizeof(item)).center(10)) for item in data[:-4] ]
print('\n'.join(rows))
print('-' * 70)

rows = [ "{} {} {}".format(\
    repr(item).center(10), str(type(item)).center(15), \
    str(sys.getsizeof(item)).center(10)) for item in data[-4:] ]
print('\n'.join(rows))
print('-' * 70)
(I broke out the last 4 objects above into a separate section/table, since the output for them is wider than for the ones above them.)

Although iterators, functions, generators and instances (of classes) are not traditionally considered as data types, I included them as well, since they are all objects (see: almost everything in Python is an object), so they are data in a sense too, at least in the sense that programs can manipulate them. And while one is not likely to create tens of thousands or more of objects of these types (except maybe class instances [1]), it's interesting to have an idea of how much space instances of them take in memory.

[1] As an aside, if you have to create thousands of class instances, the flyweight design pattern might be of help.

Here is the output of running the program with:
$ python data_type_sizes.py

Python data type sizes:
----------------------------------------
   Data          Type        Length  
----------------------------------------
  False     <type 'bool'>      12    
    0        <type 'int'>      12    
    0L      <type 'long'>      12    
   0.0      <type 'float'>     16    
    0j     <type 'complex'>     24    
    ''       <type 'str'>      21    
    ()      <type 'tuple'>     28    
    []      <type 'list'>      36    
    {}      <type 'dict'>     140    
 set([])     <type 'set'>     116    
----------------------------------------------------------------------

----------------------------------------------------------------------
<listiterator object at 0x021F0FF0> <type 'listiterator'>     32    
<function gen_func at 0x021EBF30> <type 'function'>     60    
<generator object gen_func at 0x021F6C60> <type 'generator'>     40    
<__main__.Foo object at 0x022E6290> <class '__main__.Foo'>     32
----------------------------------------------------------------------

[ When I used the old-style Python class definition for Foo (see the comment near the class keyword in the code), the output for an_instance was this instead:
<__main__.Foo instance at 0x021F6C88> <type 'instance'> 36
So old-style class instances actually take 36 bytes vs. new-style ones taking 32.
]

We can draw a few deductions from the above output.

- bool is a subclass of int, so a bool takes the same space as an int - 12 bytes.
- float takes a bit more space than long.
- complex takes even more.
- Strings, and the data types below them in the first table above, have a fair amount of overhead.
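One caveat when applying these numbers to the list-of-ints experiment: sys.getsizeof() on a container reports only the container object itself, not the objects it refers to. Here is a rough sketch, assuming Python 3, of estimating the combined size of a list and its elements. The helper approx_list_size is hypothetical, and the byte counts it gives will differ between Python versions and platforms.

```python
import sys

def approx_list_size(items):
    """Rough size in bytes: the list object plus each distinct element."""
    seen = set()
    total = sys.getsizeof(items)
    for item in items:
        # Count an object once even if the list refers to it many times.
        if id(item) not in seen:
            seen.add(id(item))
            total += sys.getsizeof(item)
    return total

numbers = list(range(1000))
print("list object alone:", sys.getsizeof(numbers))
print("list plus elements:", approx_list_size(numbers))
```

This goes only one level deep, so nested containers would need a recursive version, but it is enough to show why a list of a billion ints needs far more memory than a billion times the size of a C int.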

Finally, I first wrote the program with two for loops, then changed (and slightly shortened) it by using the two list comprehensions that you see above - hence the file name data_type_sizes_w_list_comp.py :)

- Enjoy.
