Command line python script to get context lines on a search string

grep’s -B and -A flags don’t work when grep is used on the command line with readline support, so I created this little script that does work on the command line. Use the -h flag to see its options. Here’s how I’ve used it:

ls -al | ./printcontext.py -b 1 -a 1 -d test.txt

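This finds the line containing test.txt in the directory listing and prints it together with one line before and one line after it (-b 1 -a 1), i.e. the two files around it. The -d flag adds debug output.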


#!/usr/bin/python
# Print context lines around each line of standard input that contains a
# search string, for use in command-line pipes (similar to grep -B/-A).
# by inderpreetsingh.com

from __future__ import print_function  # print() works under Python 2.6+ and 3

import sys
import re  # only needed for the regular-expression variant below
from optparse import OptionParser

def main():
    usage = "usage: %prog [options] needle"
    parser = OptionParser(usage)
    parser.add_option("-b", "--before", type="int", dest="before", default=0,
            help='Number of lines to print before each match (a la grep -B)')
    parser.add_option("-a", "--after", type="int", dest="after", default=0,
            help='Number of lines to print after each match (a la grep -A)')
    parser.add_option("-d", "--debug", action="store_true", dest="debug", default=False,
            help='Print debug information.')

    # Not implemented
    #parser.add_option("-o", "--output", type="string", dest="output")

    (options, args) = parser.parse_args()

    if len(args) != 1:
        parser.error("Specify what you want to search")

    needle = args[0]
    if options.debug:
        print("\nNeedle: %s\nBefore context lines: %s\nAfter context lines: %s\n" % (needle, options.before, options.after))

    # Read all of stdin; drop only the line endings so indentation is preserved.
    lines = [x.rstrip("\r\n") for x in sys.stdin.readlines()]

    for i, line in enumerate(lines):
        # To match a regular expression instead, use:
        #     if re.search(needle, line):
        if needle in line:
            # Clamp the context window to the start and end of the input.
            first = max(0, i - options.before)
            last = min(len(lines), i + options.after + 1)

            if options.debug:
                print("Found '%s' on line %d, printing lines %d to %d" % (needle, i, first, last - 1))

            for println in lines[first:last]:
                print(println)
            print("")

if __name__ == "__main__":
    main()
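The heart of the script is the window arithmetic around each match. As a minimal Python 3 sketch (the function name context_blocks is my own, not part of the script), here is the same logic pulled out into a pure function so it can be tried without a pipe:

```python
def context_blocks(lines, needle, before=0, after=0):
    """Return one list per match: the matching line plus its context lines.

    Mirrors the window arithmetic in printcontext.py: windows are clamped to
    the start and end of the input and are not merged when they overlap.
    """
    blocks = []
    for i, line in enumerate(lines):
        if needle in line:
            first = max(0, i - before)             # never before the first line
            last = min(len(lines), i + after + 1)  # slice end is exclusive
            blocks.append(lines[first:last])
    return blocks


listing = ["a.txt", "b.txt", "test.txt", "c.txt", "d.txt"]
print(context_blocks(listing, "test.txt", before=1, after=1))
# [['b.txt', 'test.txt', 'c.txt']]
```

One behaviour worth knowing: unlike grep, overlapping windows are not merged, so two matches on adjacent lines print the shared lines twice.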

Virtualenvwrapper on CentOS/RHEL with Virtualmin

I used the IUS Community and EPEL repositories to install Python 2.6 on my RHEL 5.6 Tikanga box. However, I ran into some errors that I didn’t see fully documented online, so I figured writing them down would come in handy for myself and for whoever else tries something similar.

Problem 1: Error on mkvirtualenv and other commands: No module named virtualenvwrapper.hook_loader
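Resolution: After looking at the source code, I found that virtualenvwrapper can’t find our special Python installation: by default it uses whichever python comes first on the PATH, which on RHEL 5 is the system Python 2.4 that doesn’t have virtualenvwrapper installed. Put the following line in your .bashrc: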

VIRTUALENVWRAPPER_PYTHON=/usr/bin/python26

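along with the other two lines that everybody tells you to put in (keep the VIRTUALENVWRAPPER_PYTHON line above the source line, because virtualenvwrapper.sh reads it when it is sourced):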

export WORKON_HOME=$HOME/.virtualenvs
source /usr/bin/virtualenvwrapper.sh

(Note that virtualenvwrapper was installed to the above location, which differs from the one used in most other write-ups of this procedure. I’m not sure whether this is a recent change in virtualenvwrapper or a result of our special python26 installation, so use locate virtualenvwrapper.sh to find the correct location on your system.)
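To confirm which interpreter actually has virtualenvwrapper installed, you can ask each candidate to import the module named in the error. A small Python 3 sketch (can_import is a hypothetical helper; the interpreter paths are the ones from this post and may differ on your box):

```python
import subprocess
import sys


def can_import(python_path, module):
    """Return True if the interpreter at python_path can import module.

    Runs `python_path -c "import module"` and checks the exit status, so the
    import is tested in that interpreter, not in the one running this code.
    """
    try:
        status = subprocess.call([python_path, "-c", "import " + module],
                                 stdout=subprocess.DEVNULL,
                                 stderr=subprocess.DEVNULL)
    except OSError:  # interpreter path does not exist or is not executable
        return False
    return status == 0


if __name__ == "__main__":
    for python in ("/usr/bin/python", "/usr/bin/python26"):
        print(python, can_import(python, "virtualenvwrapper.hook_loader"))
```

On the setup described here I’d expect /usr/bin/python26 to report True and the system python to report False; whichever reports True is the one to put in VIRTUALENVWRAPPER_PYTHON.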

Problem 2: Virtualenvwrapper commands do not auto-complete, or cannot be found at all, unless .bashrc is sourced manually.
Reason: .bashrc is not executed when logging in to the box, even though it should be, since it is sourced from .bash_profile, which should itself run at login.
Solution: This happens because Virtualmin sets the default login shell ($SHELL) for each non-root user to /bin/sh, and a login shell started as sh reads ~/.profile rather than ~/.bash_profile. To fix this for one user, open /etc/passwd, find the user you are interested in, and change the /bin/sh part to /bin/bash (running chsh -s /bin/bash username as root does the same thing). To fix the default for every Virtualmin-created user, go to Virtualmin’s admin page, open System Customization > Custom Shells, and choose the /bin/bash custom shell.
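To check which login shell a user actually has without opening /etc/passwd, Python’s standard pwd module reads the same password database. A Unix-only Python 3 sketch (the helper names login_shell and starts_as_bash are hypothetical):

```python
import os
import pwd  # Unix-only standard library module


def login_shell(username):
    """Return the login shell recorded for username in the password database."""
    return pwd.getpwnam(username).pw_shell


def starts_as_bash(username):
    """True if the user's login shell is bash under its own name.

    A login shell started as /bin/sh reads ~/.profile instead of
    ~/.bash_profile, so a .bashrc sourced from .bash_profile never runs.
    """
    return os.path.basename(login_shell(username)) == "bash"
```

For example, starts_as_bash("someuser") returning False for a Virtualmin-created user points to the /bin/sh problem described above.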

Django/Python: UnicodeDecodeError when printing YouTube Unicode data

I was having a problem printing YouTube’s Unicode data using my print statement:

print "<p>Video: desc=%s</p>" % (vid.desc)

I’m not well versed in Unicode, so I just brute-forced my way out of this problem and got rid of the UnicodeDecodeError (“ordinal not in range(128)”) by doing the following:

print "<p>Video: desc=%s</p>" % (unicode(vid.desc,'iso-8859-1'))

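PS: My database information is in utf-8 format. Note that unicode(vid.desc, 'iso-8859-1') does not convert UTF-8 data into iso-8859-1; it decodes the raw bytes as if they were iso-8859-1 (Latin-1). Because Latin-1 assigns a character to every possible byte, this never raises an error, but any non-ASCII character stored as UTF-8 comes out garbled. If vid.desc holds UTF-8 bytes, unicode(vid.desc, 'utf-8') is the more accurate fix.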
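The difference between the two decodings is easy to see with a single accented character. A Python 3 sketch, where raw stands in for a UTF-8 byte string such as vid.desc (in Python 2 the corresponding calls would be unicode(vid.desc, 'utf-8') or vid.desc.decode('utf-8')). The ASCII failure shown is the same kind Python 2 typically raises when it implicitly decodes a byte string while combining it with Unicode text:

```python
# The bytes a UTF-8 database would hand back for "café".
raw = "caf\u00e9".encode("utf-8")   # b'caf\xc3\xa9'

# Decoding as ASCII is what Python 2 did implicitly when mixing str and unicode.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = str(exc)          # ends with "ordinal not in range(128)"

# Latin-1 (iso-8859-1) maps every byte to a character, so it never raises,
# but the two UTF-8 bytes of "é" come out as two separate characters.
as_latin1 = raw.decode("iso-8859-1")

# Decoding with the encoding the data was actually written in round-trips.
as_utf8 = raw.decode("utf-8")

print(ascii_error)
print(as_latin1)  # cafÃ©
print(as_utf8)    # café
```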

Django on CentOS Python 2.6 VirtualEnv Using GeekyMedia RPMs

Django on centos geekymedia

for setuptools (easy_install):

wget http://pypi.python.org/packages/2.6/s/setuptools/setuptools-0.6c11-py2.6.egg#md5=bfa92100bd772d5a213eedd356d64086
easy_install *setuptools*

use it to install pip:
easy_install pip

download MySQLdb and install by:
python26 setup.py build
python26 setup.py install

download virtualenv
mkdir ~/.virtualenvs

add to .bashrc
VIRTUALENVWRAPPER_PYTHON=/usr/bin/python26
source /usr/bin/virtualenvwrapper.sh

initialize virtualenv
mkdir dev
virtualenv dev

Start virtualenv for current session
source dev/bin/activate

now install packages; they will go inside the virtualenv (since it is activated)
pip install django
pip install south
pip install pil

Create django project and app
cd dev/
django-admin.py startproject myproj
cd myproj
python manage.py startapp polls
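To double-check the earlier note that packages go inside the virtualenv once it is activated, a small Python 3 check can be run with the environment’s python. This is a best-effort sketch; the attribute names are what I understand the two flavours of virtual environment to set:

```python
import sys


def in_virtualenv():
    """Best-effort check for an active virtual environment.

    Older virtualenv releases set sys.real_prefix; the venv module (and, as
    far as I know, virtualenv 20+) instead leave sys.base_prefix pointing at
    the base interpreter while sys.prefix points at the environment.
    """
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtualenv active:", in_virtualenv())
print("sys.prefix:", sys.prefix)
```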

Install Python 2.6 on CentOS 5.x

Steps to success:

  1. Download all the Python 2.6 rpms for your CentOS architecture (i386 or x86_64) from geekymedia.
  2. Install tcl, tk, tix (required dependencies): yum -y install tcl tk tix
  3. Install the geekymedia rpms.

    Note that python26-libs-2.6-geekymedia1.*.rpm and python26-2.6-geekymedia1.*.rpm must be installed together, like this: rpm -Uvh python26-libs-2.6-geekymedia1.*.rpm python26-2.6-geekymedia1.*.rpm. Similarly, the *tools* and *tkinter* rpms must be installed together: rpm -Uvh *tools*rpm *tkinter*rpm.

Python Scraping: Scrapy and BeautifulSoup

When I look for solutions to my problems, I often search the internet for “compare and contrast” or analytical posts on the best tools for the job, which help me make an informed decision.

Recently, my problem was scraping a website for data using Python. I searched online and a lot of users recommended Scrapy over BeautifulSoup. Well, that was easy, I naively said. Scrapy probably is the better option for most people (it supports XPath right out of the box). As Scrapy’s docs put it:

comparing BeautifulSoup (or lxml) to Scrapy is like comparing jinja2 to Django.

But Scrapy didn’t sit well with my CentOS platform (or Google App Engine). For one, I had a whole lot of problems trying to install Scrapy in my virtualenv (an isolated Python environment) because of its dependency on libxml2/libxslt and their Python bindings. Examples:


etree.so "undefined symbol: libiconv"
Version 2.6.26 found. You need at least libxml2 2.6.27 for this version of libxslt
ImportError: /pyenv/test/lib/python2.6/site-packages/libxml2mod.so: undefined symbol: xmlTextReaderSetup
No module named libxml2
Failed to find headers. "update includes_dir"

Note: This may look overly dramatic, and maybe it is a little, because many of these errors do have solutions, and most of them can be found with a Google search.

I endlessly chased solutions for integrating libxml2, the libxml2 Python bindings, libxslt and lxml in a virtualenv (with Python 2.6; note that CentOS/RHEL 5 only ship Python 2.4 in their repositories). I eventually grew tired of working out which library was linking to which shared library, and which one was the missing culprit. So I figured I’d just give BeautifulSoup a try, and spend the extra time learning a library (BeautifulSoup) rather than a “framework” (Scrapy).

In the end, BeautifulSoup was not that hard. It doesn’t support XPath, but I could easily rewrite the XPath expressions I had using BeautifulSoup’s search syntax.
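As an illustration of translating an XPath into BeautifulSoup calls, here is a Python 3 sketch using the current beautifulsoup4 package (the post predates it and would have used BeautifulSoup 3, whose import differs). The HTML and class names are made up. It uses the standard library’s html.parser backend, which avoids the libxml2/lxml dependency entirely:

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the scraped page.
HTML = """
<div class="item"><span class="price">10</span></div>
<div class="item"><span class="price">25</span></div>
<p><span class="price">99</span></p>
"""

# html.parser ships with Python, so no libxml2/lxml build is needed.
soup = BeautifulSoup(HTML, "html.parser")

# XPath: //div[@class="item"]/span[@class="price"]/text()
prices = [span.get_text()
          for div in soup.find_all("div", class_="item")
          for span in div.find_all("span", class_="price", recursive=False)]

# The same query as a CSS selector, which bs4's select() also understands.
css_prices = [span.get_text() for span in soup.select("div.item > span.price")]

print(prices)      # ['10', '25']
print(css_prices)  # ['10', '25']
```

One subtlety: class_="item" matches any element whose class list contains item, whereas the XPath @class="item" only matches that exact attribute value.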

Lesson: Don’t let your ego get into it. Save time by going for fairly efficient solutions that can be implemented in fairly optimal time (as my Algorithms professor used to say).

Convert Django MySQL Database Tables to Unicode

When I created a Django application, I hadn’t noticed that my MySQL server defaulted to the latin1 character set (probably because of Virtualmin’s or CentOS’s default MySQL settings). I didn’t want to delete my current project and start again, so here are the commands to convert an existing database to Unicode (UTF-8):

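for the database (this only changes the default character set for tables created from now on):

ALTER DATABASE djangodb CHARACTER SET utf8 COLLATE utf8_general_ci;

then on each existing table (this converts the table’s text columns and the data in them):

ALTER TABLE djangotablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;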
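With many tables, typing each ALTER TABLE by hand gets tedious. A Python 3 sketch that builds the statements from a list of table names (utf8_conversion_sql is a hypothetical helper, and the table names in the example are placeholders; get the real ones from SHOW TABLES):

```python
def utf8_conversion_sql(database, tables, charset="utf8", collation="utf8_general_ci"):
    """Build the ALTER statements to switch a database and its tables to UTF-8.

    ALTER DATABASE only changes the default for tables created later; each
    existing table needs its own CONVERT TO, which also re-encodes its data.
    """
    statements = [
        "ALTER DATABASE `%s` CHARACTER SET %s COLLATE %s;" % (database, charset, collation)
    ]
    for table in tables:
        statements.append(
            "ALTER TABLE `%s` CONVERT TO CHARACTER SET %s COLLATE %s;"
            % (table, charset, collation)
        )
    return statements


for stmt in utf8_conversion_sql("djangodb", ["auth_user", "django_session"]):
    print(stmt)
```

Back up the database before running the output: if UTF-8 bytes were already stored in the latin1 columns, CONVERT TO will convert them a second time and garble them.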

UnicodeDecodeError (unexpected code byte) on a template

I started receiving this error after pasting some template code from a WordPress blog (it could also happen with text pasted from a word processor such as Microsoft Word). To fix it, I had to hunt through my template for the following “smart” quote characters and replace them with their plain ASCII equivalents:
” ‘ ’
Replaced with (respectively):
" ' '

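The replacement can be scripted. A Python 3 sketch (replace_smart_quotes is a hypothetical helper); I’ve also included the left double quote “, which usually comes paired with ”. My understanding is that these characters are only a problem because the pasted text ended up saved in a non-UTF-8 encoding (such as Windows-1252) while Django reads templates as UTF-8, so re-saving the file as UTF-8 is an alternative fix:

```python
# Map the typographic quotes a word processor or blog editor inserts to ASCII.
SMART_QUOTES = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}
_TABLE = str.maketrans(SMART_QUOTES)


def replace_smart_quotes(text):
    """Return text with curly quotes replaced by straight ASCII quotes."""
    return text.translate(_TABLE)


print(replace_smart_quotes("{% if name == \u201cadmin\u201d %}It\u2019s you{% endif %}"))
# {% if name == "admin" %}It's you{% endif %}
```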
Django: “Error importing authentication backend”

This is probably a rare error in Django, but I’m sharing it here because it may save about an hour for anybody else who runs into it.

Problem
Exception Type: ImproperlyConfigured at /
Error importing authentication backend

Probable Cause
I was desperate to change the name of an app inside my Django project. I renamed the folder and every mention of the application name I could find in the code and the database tables (please note: this is not recommended; there is probably a better way to do this). When the error appeared with no clear indication of what I was doing wrong, I searched everywhere in the code and the database. After going into panic mode, I desperately changed and removed anything that might be causing it. In the end, I ran out of places to look for the application name, but the error persisted.

Solution
Looking at my cookies, I noticed that I still had a cookie from my old session, which meant that every time I connected to the server, my browser was passing along that “delicious” cookie. But just deleting your own cookies won’t do it: the user’s session data is stored in the database, in the django_session table, and among other things it records which of the AUTHENTICATION_BACKENDS last logged the user in, presumably still pointing at the old app name. So, truncate the table with TRUNCATE TABLE django_session; to finally get rid of this nasty problem (note that this logs out every user).
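Django’s auth app stores the dotted path of the backend that logged the user in under the session key _auth_user_backend and imports it again on later requests, which is why a stale session survives an app rename. The Python 3 sketch below uses hypothetical helpers and plain dicts standing in for decoded sessions (in a real project, Session.get_decoded() returns such a dict). It finds sessions whose stored backend no longer imports, as a more targeted alternative to truncating the whole table:

```python
import importlib

# Session key under which django.contrib.auth stores the backend's dotted path.
BACKEND_SESSION_KEY = "_auth_user_backend"


def can_import_path(dotted_path):
    """True if 'package.module.Name' imports and the module defines Name."""
    module_path, _, name = dotted_path.rpartition(".")
    if not module_path:
        return False
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        return False
    return hasattr(module, name)


def stale_sessions(sessions):
    """Return the keys of decoded sessions whose stored backend no longer imports."""
    return [key for key, data in sessions.items()
            if BACKEND_SESSION_KEY in data
            and not can_import_path(data[BACKEND_SESSION_KEY])]


# Hypothetical decoded session data; "oldapp" stands in for the renamed app.
sessions = {
    "abc": {BACKEND_SESSION_KEY: "oldapp.backends.EmailBackend"},
    "def": {BACKEND_SESSION_KEY: "json.JSONDecoder"},  # any importable class works here
    "ghi": {},                                          # anonymous session
}
print(stale_sessions(sessions))  # ['abc']
```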