I am trying to read in files for text processing and run them through a Hadoop pseudo-distributed setup on my virtual machine, using map-reduce code I am writing. The environment is Ubuntu Linux, and the installation runs Python 2.6. I need to use sys.stdin to read the files in, and sys.stdout to pass data from the mapper to the reducer. Here is my test code for the mapper:
#!/usr/bin/env python
import sys
import string
import glob
import os

files = glob.glob(sys.stdin)
for file in files:
    with open(file) as infile:
        txt = infile.read()
        txt = txt.split()
        print(txt)
I'm not sure how glob is supposed to work with sys.stdin, but this is not working. Testing it with a pipe:
[training@localhost data]$ cat test | ./mapper.py
I get this:
cat: test: Is a directory
Traceback (most recent call last):
File "./mapper.py", line 8, in <module>
files = glob.glob(sys.stdin)
File "/usr/lib64/python2.6/glob.py", line 16, in glob
return list(iglob(pathname))
File "/usr/lib64/python2.6/glob.py", line 24, in iglob
if not has_magic(pathname):
File "/usr/lib64/python2.6/glob.py", line 78, in has_magic
return magic_check.search(s) is not None
TypeError: expected string or buffer
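If I read the traceback right, glob.glob() wants a string pattern like '*.txt', and sys.stdin is a file object, which is why it raises TypeError. A minimal sketch of what I think would at least avoid that error, assuming the pattern itself is piped in as text (e.g. echo '*.txt' | ./mapper.py); the function name expand_pattern is just mine:

```python
import glob
import sys

def expand_pattern(stream):
    """Read a glob pattern (a string) from a text stream and expand it.

    glob.glob() expects a string such as '*.txt', not a file object,
    which is why glob.glob(sys.stdin) raises TypeError.
    """
    pattern = stream.read().strip()
    return glob.glob(pattern)

if __name__ == "__main__":
    # e.g.  echo '*.txt' | ./mapper.py
    for name in expand_pattern(sys.stdin):
        print(name)
```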
For the moment I am just trying to read in three small .txt files in one directory.
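From what I understand of Hadoop Streaming, the mapper receives the contents of the input files on stdin, one line at a time; the filenames never appear, so glob may not be needed at all. A minimal mapper sketch along those lines (the tab-separated key/value output is the streaming convention; the word-count style (word, 1) pairs are my guess at the intent):

```python
#!/usr/bin/env python
import sys

def map_lines(lines):
    """Yield a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield word, 1

if __name__ == "__main__":
    # Hadoop Streaming feeds file contents on stdin and expects
    # tab-separated key/value pairs on stdout for the reducer.
    for word, count in map_lines(sys.stdin):
        print("%s\t%d" % (word, count))
```

This can be tested locally the same way, with cat data/*.txt | ./mapper.py.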
Thanks!
#python #bash #hadoop