Proper way of reading in files from a directory using Python 2.6 in bash shell

I am trying to read in files for text processing, with the idea of running them through a Hadoop pseudo-distributed file system on my virtual machine, using map-reduce code I am writing. The environment is Ubuntu Linux, and I am running Python 2.6. I need to use sys.stdin to read the files in and sys.stdout to pass data from the mapper to the reducer. So here is my test code for the mapper:

#!/usr/bin/env python

import sys
import string
import glob
import os

files = glob.glob(sys.stdin)
for file in files:
    with open(file) as infile:
        txt = infile.read()
        txt = txt.split()
        print(txt)

I’m not sure how glob works with sys.stdin, but it is not working. Here is what happens when I test it with a pipe:

[training@localhost data]$ cat test | ./mapper.py

I get this:

cat: test: Is a directory
Traceback (most recent call last):
  File "./mapper.py", line 8, in <module>
    files = glob.glob(sys.stdin)
  File "/usr/lib64/python2.6/glob.py", line 16, in glob
    return list(iglob(pathname))
  File "/usr/lib64/python2.6/glob.py", line 24, in iglob
    if not has_magic(pathname):
  File "/usr/lib64/python2.6/glob.py", line 78, in has_magic
    return magic_check.search(s) is not None
TypeError: expected string or buffer

For the moment I am just trying to read in three small .txt files in one directory.

Thanks!

#python #bash #hadoop