File Handling
General Information
Submitting to the Auto-Grader
For auto-grading homework assignments in the course, we use Gradescope.
General Submission Info
When submitting files to Gradescope, you need to submit all the files you want graded at once.
You can do this in several ways:
- Select all files you want to submit and drag and drop them in the submission area when prompted.
- Browse your files when prompted by Gradescope and select all files you want to submit.
- Directly compress all the files you want to submit into a zip file (zip file name does not matter), and submit that zip file via drag-and-drop or browsing for it.
- Compress a folder containing the files you want to submit into a zip file (zip file name does not matter), and submit that zip file via drag-and-drop or browsing for it.
Gradecope will look for files with specific names, and those names are case-sensitive.
So if in the instructions we ask for a file named hello.py
,
then Gradescope will not recognize Hello.py
, hello.txt
, or hello.py.txt
.
Group Submissions
When a group of people is submitting an assignment, only one person should submit the solution.
They should then add the other members to that submission. On the Gradescope page for a submission there should be a "+ Add Group Member" button by the name in the upper-right.
This is important because if multiple people submit, our similarity checker will see identical submissions and flag that.
Unlimited attempts
You have unlimited attempts for our auto-graded assignments.
This means that you should submit early and often in order to be confident that code you are writing as you go passes all of our tests.
Academic Honesty
All work that is completed in this assignment is your own group's. You may talk to other students about the problems you are to solve, however, you may not share code in any way, except with your partner(s). What you submit must be your own group's work.
You may not use any code that is posted on the internet. If you are not sure it is in your best interest to contact the course staff. We will be using software that will compare your code to other students in the course as well as online resources. It is very easy for us to detect similar submissions and will result in a failure for the exercise or possibly a failure for the course. Please, do not do this. It is important to be academically honest and submit your work only. Please review the UMass Academic Honesty Policy and Procedures so you are aware of what this means.
Copying partial or whole solutions, obtained from other students or elsewhere, is academic dishonesty. Do not share your code with your classmates, and do not use your classmates' code. If you are confused about what constitutes academic dishonesty you should re-read the course policies. We assume you have read the course policies in detail and by submitting this project you have provided your virtual signature in agreement with these policies.
About
The aim of this homework assignment is to work with files in Python. Even though we implement functions in Python in this class, the skills developed in this assignment will be relevant for any modern programming language.
A file is a collection of data stored on a disk in one unit identified by a name. The process of file handling pertains to how we store the available data in a file with the help of a computer program. Data can be retrieved from these files and used in any program written in a modern programming language. Files are an essential part of computer systems. File Handling includes operations such as opening existing files and editing them, creating new files, removing existing files among several other operations.
Even though we work with files in this assignment, be aware that you will be required to use the knowledge and skills you have acquired so far to succeed in this assignment. This includes defining functions, evaluating boolean expressions, evaluating arithmetic expressions, taking user input, working with f-strings, working with sequence types including lists and strings, branching, looping, working with modules, and handling exceptions. You may want to review them before you proceed.
Learning Objectives
- Practice working with files in Python.
- Practice writing modular code in Python.
- Practice throwing and catching exceptions in Python.
Estimated Size
This assignment consists of problems. Unlike previous assignments, there is required problem in this assignment and you cannot attempt remaining problems without successfully completing that problem.
- Each problem specifies a few functions to implement.
- Each function can be reasonably implemented in at most lines of code, though you would find yourself attempting most in lines of code.
- Many functions have simple computations, similar to the previous assignments.
- Some functions require you to use other functions implemented by you in the same file.
- Some functions may require you to create new variables, lists, strings, etc.
Note
- You do not need to use any assert statements in this assignment.
0. Setup
Make a folder on your computer and open it in VSCode.
You must create files in this folder. Make sure that you do not rename any of these files.
basic_io.py
text_io.py
file_info.py
csv_parse.py
csv_split.py
csv_stats.py
Remember that for every file you need to have comments indicating the authors at the top. For example:
# Authors : Jared Yeager, Tim Richards
# Emails : jyeager@cs.umass.edu, richards@cs.umass.edu
# Spire IDs : 31415926, 27182818
The # Author
/# Authors
, # Email
/# Emails
, and # Spire ID
/# Spire IDs
are the only really necessary formatting details.
Some copying and pasting here will save you some time and sanity.
Note that these details are necessary for the auto-grader to recognize your solution.
1. Implement basic_io.py
(REQUIRED)
Note: You must implement basic_io.py correctly to be able to work on other problems in this assignment. Without correctly implementing all functions in this file, you cannot solve other problems in this assignment since they depend on importing functions from basic_io.py.
It can be somewhat overwhelming to start working with files. This problem is intended to be a gentle introduction to file handling in Python. The focus is on getting used to the syntax, basic error handling with files and reading as well as writing to and from files.
In this problem, you will implement some simple functions that will help you get comfortable with file handling in Python.
Note that every part of this problem will require an informative error message if there is an exception while opening a file. If you call a function from an earlier part that already catches exceptions, then you should not have to implement an exception handler in the current function. For example, if your read_string
function handles an exception and you call read_string
from read_lines
, then your read_lines
function won't require an exception check. Also note that the auto-grader will be looking for the exact error message Error occurred when opening FNAME to read
, where FNAME
should be replaced with the actual name of the file.
1a. Implement read_string
(1 point)
One of the most fundamental tasks in file handling is to open a file and read its contents as a string.
Your task is to implement a function that takes as argument the name of a file fname
. The function should first attempt to open the file fname
. If an exception occurs while opening the file, the function should display the informative error message Error occurred when opening FNAME to read
, where FNAME
should be replaced with the actual name of the file, and the function should then return None
. If an exception does not occur while opening, your code should read the contents of the file and return the contents.
The function should have the following signature:
def read_string(fname)
Example:
Assume a file cics110.txt
with the following contents:
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
If you run the following code statement,
print(read_string('cics110.txt'))
you should get as output the following:
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
1b. Implement read_lines
(1 point)
A quick variation is to open a file and read its contents as a string line by line i.e. every line in the file is read one by one. This is in contrast to 1a., where we read the entire file into one string.
Your task is to implement a function that takes as argument the name of a file fname
. This function should use read_string
to read the contents of fname
. If an exception occurs while opening the file, the function should display the informative error message Error occurred when opening FNAME to read
, where FNAME
should be replaced with the actual name of the file, and the function should then return None
. If an exception does not occur while opening, it should return a list containing the lines in the fname
one by one.
The function should have the following signature:
def read_lines(fname)
Example:
Assume a file cics110.txt
with the following contents:
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
If you run the following code statement,
print(read_lines('cics110.txt'))
you should get as output the following:
['CICS 110 students will succeed in their future endeavors.', 'Allison is an instructor for CICS 110.', 'Manan is a TA for CICS 110.', 'Ben is a UCA for CICS 110.']
Hint: Using read_string
, you can get a string that stores the contents of the file fname
. Consider splitting this string into a list of strings using a function you have seen in this class before.
1c. Implement write_string
(1 point)
Another fundamental task in file handling is to open a file and write text
into the file, where text
is some string.
Your task is to implement a function that takes as argument the name of a file fname
and a string text
. The function should first attempt to open the file fname
. If an exception occurs while opening the file, the function should display the informative error message Error occurred when opening FNAME to write
, where FNAME
should be replaced with the actual name of the file, and the function should then return None
. If an exception does not occur while opening, your code should write into the file text
.
The function should have the following signature:
def write_string(fname, text)
Example:
Assume a file cics110.txt
with the following contents:
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
If you run the following code statement,
write_string('cics110.txt', 'CICS 110 is an introductory python class.')
cics110.txt
should contain only the following:
CICS 110 is an introductory python class.
1d. Implement write_lines
(1 point)
A quick variation is to open a file and write to it line by line i.e. every string in a provided list is written into the file line by line.
Your task is to implement a function that takes as argument the name of a file fname
and a list of strings text
. If an exception occurs while opening the file, the function should display the informative error message Error occurred when opening FNAME to write
, where FNAME
should be replaced with the actual name of the file, and the function should then return None
. If an exception does not occur while opening, this function should use write_string
to write the elements of text
into the file fname
line by line.
The function should have the following signature:
def write_lines(fname, text)
Example:
Assume a file cics110.txt
with the following contents:
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
If you run the following code statement,
write_lines('cics110.txt', ['Hello,', 'World'])
cics110.txt
should contain only the following:
Hello,
World
Hint: The newline character "\n"
will be useful in this problem. One possible approach could be to use list comprehension to add a "\n"
to the end of each line in text
, save in another list, and use "".join()
to create a string from this new list which can be then written into the file using write_string
.
1e. Implement append_string
(1 point)
A third fundamental task in file handling is to open a file and write contents into it without erasing the existing contents of the file i.e. add new content at the end of the file.
Your task is to implement a function that takes as argument the name of a file fname
and a string text
. If an exception occurs while opening the file, the function should display the informative error message Error occurred when opening FNAME to append
, where FNAME
should be replaced with the actual name of the file, and the function should then return None
. If an exception does not occur while opening, your code should write into the file text
after the existing content of the file.
The function should have the following signature:
def append_string(fname, text)
Example:
Assume a file cics110.txt
with the following contents (note the newline at the end):
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
If you run the following code statement,
append_string('cics110.txt', 'Tanush is also a UCA.')
cics110.txt
should contain the following:
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
Tanush is also a UCA.
Hint: This is essentially the same as write_string
but instead of opening the file in write mode, you need to open it in append mode.
1f. Implement append_lines
(1 point)
A quick variation is to open a file and append to it line by line i.e. every string in a provided list is written into the file line by line after the existing content in the file.
Your task is to implement a function that takes as argument the name of a file fname
and a list of strings lines
. If an exception occurs while opening the file, the function should display the informative error message Error occurred when opening FNAME to append
, where FNAME
should be replaced with the actual name of the file, and the function should then return None
. If an exception does not occur while opening, the function should use append_string
to append the elements of lines
into the file fname
line by line after the existing content of the file.
The function should have the following signature:
def append_lines(fname, lines)
Example:
Assume a file cics110.txt
with the following contents (note the newline at the end):
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
If you run the following code statement,
append_lines('cics110.txt', ['Tanush is a UCA.', 'Lana is a UCA.', 'Dev is a UCA.'])
cics110.txt
should contain the following:
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
Tanush is a UCA.
Lana is a UCA.
Dev is a UCA.
Hint: This is essentially the same as 1d. but instead of write mode, you should open the file in append mode. The hint in 1d. applies for this problem as well.
2. Implement text_io.py
Now that we are comfortable working with basic file handling operations, we will move to some more interesting file operations. Note that you will need to use the functions you implemented in basic_io.py
so don't forget to import it into text_io.py
!
2a. Implement echo_to_file
(2 points)
Remember the echo_until_quit
function we implemented in the previous homework? Here is a slight variation of that problem with some file handling.
Your task is to implement a function that takes as arguments the name of a file fname
and a boolean append
which is True
if the user wants to append the content to the file fname
and False
when the user wants to rewrite the content of the file fname
. This function should repeatedly ask the user for an input string until the user types the string quit
and append these strings to the file fname
if append
is True
and rewrite the file fname
with only the strings entered by the user if append
is False
.
Note: quit
should not be appended/written into fname
.
Note: Your code should handle quit
in any case such as QUIT
, quit
and QuiT
.
The function should have the following signature:
def echo_to_file(fname, append)
Example 1:
Assume a file cics110.txt
with the following contents:
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
If you run the following code statement,
echo_to_file('cics110.txt', True)
and enter the following sequence of strings in the terminal:
Enter line of text or "quit": Hello,
Enter line of text or "quit": World!
Enter line of text or "quit": This
Enter line of text or "quit": is
Enter line of text or "quit": a python program
Enter line of text or "quit": quit
cics110.txt
should contain the following:
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
Hello,
World!
This
is
a python program
Example 2:
Assume a file cics110.txt
with the following contents:
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
If you run the following code statement,
echo_to_file('cics110.txt', False)
and enter the following sequence of strings in the terminal:
Enter line of text or "quit": Hello,
Enter line of text or "quit": World!
Enter line of text or "quit": This
Enter line of text or "quit": is
Enter line of text or "quit": a python program
Enter line of text or "quit": quit
cics110.txt
should contain the following:
Hello,
World!
This
is
a python program
Hint: Store all inputs in a variable in Python and then use append_lines
and write_lines
appropriately.
2b. Implement print_file
(1 point)
It can be important in some situations to display the content of a file.
Your task is to implement a function that takes as argument the name of a file fname
. If an exception occurs while opening the file, the function should display the informative error message Error occurred when opening FNAME to read
, where FNAME
should be replaced with the actual name of the file, and the function should then return None
. If an exception does not occur while opening, the function should print (not return!) the content of the file if the file is non-empty.
The function should have the following signature:
def print_file(fname)
Example:
Assume a file cics110.txt
with the following contents:
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
If you run the following code statement,
print_file('cics110.txt')
you should get the following output:
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
2c. Implement copy_file
(1 point)
It can be important in some situations to duplicate the content from one file into another file.
Your task is to implement a function that takes as arguments the names of two files: from_fname
, the file whose content you need to copy, and to_fname
, the file into which you should duplicate the content of the file from_fname
. This function should read content from the file from_fname
and write into the file to_fname
.
The function should have the following signature:
def copy_file(from_name, to_fname)
Example:
Assume a file cics110.txt
with the following contents:
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
and another file temp.txt
with the following contents:
Kobi is an instructor.
Jared is an instructor.
Tim is the course chair.
If you run the following code statement,
copy_file('cics110.txt', 'temp.txt')
then temp.txt
should contain the following contents:
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
and cics110.txt
should remain unchanged.
2d. Implement erase_file
(1 point)
It can be important in some situations to remove all the content in a file.
Your task is to implement a function that takes as arguments the name of a file fname
. The function should then remove all the content in the file i.e. the file should be empty after a call to this function. If the file does not exist, the function should create an empty file.
The function should have the following signature:
def erase_file(fname)
Example:
Assume a file cics110.txt
with the following contents:
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
If you run the following code statement,
erase_file('cics110.txt')
then cics110.txt
should be an empty file with no content.
hint: If a file does not exist, then you are making a file. If a file does exist, then you are truncating the file's content. What mode for opening will allow for this logic to happen?
3. Implement file_info.py
Let's move on to some cool applications of files! In this problem, you will implement functions that will analyze the words and characters in a file and report cool statistics such as word counts and character counts.
For the purpose of this problem, we make assumptions:
- Every sentence ends with a
.
- Sentences contain only words (all alphabetic characters) or numbers (all digits).
Additionally, you will find the following useful for this problem:
string.digits
is the string'0123456789'
.string.ascii_letters
is the concatenation of'abcdefghijklmnopqrstuvwxyz'
and'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
.
You must import both of these from string
module.
3a. Implement get_info
(3 points)
Analyzing a file can help us understand the contents of the file better. In this problem, we will report some interesting file statistics.
Your task is to implement a function that takes as argument the name of a file fname
.
If an exception occurs while opening the file,
the function should display the informative error message Error occurred when opening FNAME to read
,
where FNAME
should be replaced with the actual name of the file,
and the function should then return None
.
If an exception does not occur while opening,
the function should return a dictionary formatted as under reporting the following statistics:
{
lines: LINE_COUNT,
sentences: SENTENCE_COUNT,
words: WORD_COUNT,
numbers: NUMBERS_COUNT,
letters: LETTERS_COUNT,
digits: DIGIT_COUNT
}
where LINE_COUNT represents the total number of lines of text in the file, SENTENCE_COUNT represents the total number of sentences in the file, WORD_COUNT represents the total number of words in the file, NUMBERS_COUNT represents the total number of numbers in the file, LETTERS_COUNT represents the total number of alphabetic characters in the file, and DIGITS_COUNT represents the total number of digits in the file.
The function should have the following signature:
def get_info(fname)
Example:
Assume a file cics110.txt
with the following contents:
CICS 110 is an introduction to computer programming and problem solving using computers.
This course teaches you how problems in the real world can be solved computationally using programming constructs and data abstractions of a modern programming language.
Concepts and techniques covered range from variables and expressions to logic and control flow to data structures and classes.
We will also cover how to translate problems into a sequence of instructions as well as how to trace though and debug programs.
No previous programming experience required. This is a 4 credit class. There are 6 sections being offered in Spring 2023.
If you run the following code statement,
print(get_info('cics110.txt'))
then you should get the following output:
{
'lines': 5,
'sentences': 7,
'words': 96,
'numbers': 4,
'letters': 520,
'digits': 9
}
Hint: You can use read_string
from part 1 to read the contents of the file and store it in a variable, say text
.
To count the number of lines, you can get the length of the list returned by the .splitlines()
string method.
To count the number of sentences, you can simply count the occurrences of "."
(there is a .count()
string method).
You can then get a list of "tokens" (words and numbers) by taking text
,
replacing the periods ("."
)s with nothing ""
(there is a .replace()
string method),
replacing the newlines ("\n"
) with spaces (" "
), and splitting on the spaces (there is a .split()
method).
To count occurrences of words, numbers, characters, and digits, you can iterate over all words,
maintaining and updating counters for each of these.
You can identify if a character is a digit using string.digits
.
Likewise, you can identify if a character is a letter using string.ascii_letters
.
3b. Implement get_info_files
(2 points)
In the previous problem, we analyzed a single file for statistics such as sentence count, word count, etc. In this problem, we will analyze the same statistics for multiple files.
Your task is to implement a function that takes as argument a list fname_list
containing names of certain files. If a file in the list cannot be opened, it should be ignored (and an error message should be displayed, but if you're calling get_info
from this function then you shouldn't need to reimplement the error message logic). This function should return a dictionary formatted as under reporting the following statistics:
{
FILE_1_NAME:
{
lines: LINE_COUNT,
sentences: SENTENCE_COUNT,
words: WORD_COUNT,
numbers: NUMBERS_COUNT,
letters: LETTERS_COUNT,
digits: DIGIT_COUNT
}
FILE_2_NAME:
{
lines: LINE_COUNT,
sentences: SENTENCE_COUNT,
words: WORD_COUNT,
numbers: NUMBERS_COUNT,
letters: LETTERS_COUNT,
digits: DIGIT_COUNT
}
.....
}
where LINE_COUNT represents the total number of lines of text in the file, SENTENCE_COUNT represents the total number of sentences in the file, WORD_COUNT represents the total number of words in the file, NUMBERS_COUNT represents the total number of numbers in the file, LETTERS_COUNT represents the total number of alphabetic characters in the file, and DIGITS_COUNT represents the total number of digits in the file, FILE_1_NAME refers to the name of the first file in the list, and so on.
The function should have the following signature:
def get_info_files(fname_list)
Example:
Assume a file cics110.txt
with the following contents:
CICS 110 is an introduction to computer programming and problem solving using computers.
This course teaches you how problems in the real world can be solved computationally using programming constructs and data abstractions of a modern programming language.
Concepts and techniques covered range from variables and expressions to logic and control flow to data structures and classes.
We will also cover how to translate problems into a sequence of instructions as well as how to trace though and debug programs.
No previous programming experience required. This is a 4 credit class. There are 6 sections being offered in Spring 2023.
Assume another file temp.text
with the following contents:
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
If you run the following code statement,
print(get_info_files(['cics110.txt', 'temp.txt']))
then you should get the following output:
{
'cics110.txt': {
'lines': 5,
'sentences': 7,
'words': 96,
'numbers': 4,
'letters': 520,
'digits': 9
},
'temp.txt': {
'lines': 4,
'sentences': 4,
'words': 26,
'numbers': 4,
'letters': 106,
'digits': 12
}
}
hint: Iterate over the parameter list, call 3a., and add the filename and dictionary to the dictionary if the return of 3a is not None)
3c. Implement word_and_number_counts
(2 points)
Analyzing a file can help us understand the contents of the file better. In this problem, you will count the occurrences of words in a given file.
Your task is to implement a function that takes as argument the name of a file fname
. If an exception occurs while opening the file, the function should display the informative error message Error occurred when opening FNAME to read
, where FNAME
should be replaced with the actual name of the file, and the function should then return None
. If an exception does not occur while opening, the function returns a dictionary that displays each word and number in the file as key and its count in the file as the value.
The function should have the following signature:
def word_and_number_counts(fname)
Note: 'HELLO' and 'hello' are the same words. To deal with this issue, convert the text into lowercase inside the function.
Note: You must strip off the .
wherever present. Consider using .replace()
to do so.
Example:
Assume a file cics110.txt
with the following contents:
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
If you run the following code statement,
print(word_and_number_counts('cics110.txt'))
then you should get the following output:
{
'cics': 4,
'110': 4,
'students': 1,
'will': 1,
'succeed': 1,
'in': 1,
'their': 1,
'future': 1,
'endeavors': 1,
'allison': 1,
'is': 3,
'an': 1,
'instructor': 1,
'for': 3,
'manan': 1,
'a': 2,
'ta': 1,
'ben': 1,
'uca': 1
}
Notice that in context of this problem, 110
is considered to be a word and its occurrences are counted. That is, numbers are left in string form so the function is technically counting the occurrences of words and numbers.
Hint: By now you probably know how to get a list of words after reading a file. If not, take a look at the hint in 3a. Once you have this list, iterate over all the words in this list, adding the word as a key to the dictionary with value 1 if the word is initially not present in the dictionary. Otherwise, if the word is present in the dictionary, increment the value by 1. This is similar to what you did in Lab 7.
3d. Implement word_and_number_counts_files
(1 point)
In the previous problem, we analyzed a single file for reporting word counts. In this problem, we will analyze the count of each unique word for multiple files.
Your task is to implement a function that takes as argument a list fname_list
containing names of certain files. This function returns a nested dictionary that displays the file name as the key and the dictionary representing word counts for that file (the dictionary you computed in 3c.) as the value.
The function should have the following signature:
def word_and_number_counts_files(fname)
Example:
Assume a file cics110.txt
with the following contents:
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
Tim is the course chair.
Students in this class are awesome.
and another file temp.txt
with the following contents:
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
If you run the following code statement,
print(word_and_number_counts_files(['temp.txt', 'cics110.txt']))
then you should get the following output:
{
'temp.txt': {
'cics': 4,
'110': 4,
'students': 1,
'will': 1,
'succeed': 1,
'in': 1,
'their': 1,
'future': 1,
'endeavors': 1,
'allison': 1,
'is': 3,
'an': 1,
'instructor': 1,
'for': 3,
'manan': 1,
'a': 2,
'ta': 1,
'ben': 1,
'uca': 1
}
'cics110.txt': {
'cics': 4,
'110': 4,
'students': 2,
'will': 1,
'succeed': 1,
'in': 2,
'their': 1,
'future': 1,
'endeavors': 1,
'allison': 1,
'is': 4,
'an': 1,
'instructor': 1,
'for': 3,
'manan': 1,
'a': 2,
'ta': 1,
'ben': 1,
'uca': 1,
'tim': 1,
'the': 1,
'course': 1,
'chair': 1,
'this': 1,
'class': 1,
'are': 1,
'awesome': 1
}
}
3e. Implement letter_and_digit_counts
(2 points)
Analyzing a file can help us understand the contents of the file better. In this problem, will count the occurrences of characters in a given file.
Your task is to implement a function that takes as argument the name of a file fname
. If an exception occurs while opening the file, the function should display the informative error message Error occurred when opening FNAME to read
, where FNAME
should be replaced with the actual name of the file, and the function should then return None
. If an exception does not occur while opening, the function returns a dictionary that displays each character in the file as key and its count in the file as the value. It should return None
if the file is empty.
The function should have the following signature:
def letter_and_digit_counts(fname)
Note: 'HELLO' and 'hello' have the same characters. To deal with this issue, convert the text into lowercase inside the function.
Example:
Assume a file cics110.txt
with the following contents:
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
If you run the following code statement,
print(letter_and_digit_counts('cics110.txt'))
then you should get the following output:
{
'c': 12,
'i': 12,
's': 13,
'1': 8,
'0': 4,
't': 7,
'u': 6,
'd': 3,
'e': 8,
'n': 9,
'w': 1,
'l': 4,
'h': 1,
'r': 8,
'f': 4,
'a': 9,
'v': 1,
'o': 6,
'm': 1,
'b': 1
}
Hint: By now you probably know how to get a list of words after reading a file. If not, take a look at the hint in 3a. Once you have this list, iterate over all the words in this list character-by-character, adding the character as a key to the dictionary with value 1 if the word is initially not present in the dictionary. Otherwise, if the character is present in the dictionary, increment the value by 1. This is similar to what you did in Lab 7. Another way could be to split every word into characters and simply iterating over these characters maintaining the dictionary in the same way as mentioned above.
3f. Implement letter_and_digit_counts_files
(1 point)
In the previous problem, we analyzed a single file for reporting character counts. In this problem, we will analyze the count of each unique character for multiple files.
Your task is to implement a function that takes as argument a list fname_list
containing names of certain files. This function returns a nested dictionary that displays the file name as the key and the dictionary representing character counts for that file (the dictionary you computed in 3e.) as the value.
The function should have the following signature:
def letter_and_digit_counts_files(fname)
Example:
Assume a file cics110.txt
with the following contents:
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
Tim is the course chair.
Students in this class are awesome.
and another file temp.txt
with the following contents:
CICS 110 students will succeed in their future endeavors.
Allison is an instructor for CICS 110.
Manan is a TA for CICS 110.
Ben is a UCA for CICS 110.
If you run the following code statement,
print(letter_and_digit_counts_files(['temp.txt', 'cics110.txt']))
then you should get the following output:
{
'temp.txt': {
'c': 12,
'i': 12,
's': 13,
'1': 8,
'0': 4,
't': 7,
'u': 6,
'd': 3,
'e': 8,
'n': 9,
'w': 1,
'l': 4,
'h': 1,
'r': 8,
'f': 4,
'a': 9,
'v': 1,
'o': 6,
'm': 1,
'b': 1
},
'cics110.txt': {
'c': 15,
'i': 17,
's': 21,
'1': 8,
'0': 4,
't': 12,
'u': 8,
'd': 4,
'e': 14,
'n': 11,
'w': 2,
'l': 5,
'h': 4,
'r': 11,
'f': 4,
'a': 13,
'v': 1,
'o': 8,
'm': 3,
'b': 1
}
}
4. Implement csv_parse.py
In this problem, you will learn how to parse a CSV file. A CSV file, short for Comma-Separated Values, is a plain text file that contains data in a tabular format, with each row representing a record and each column representing a field. The values in each field are separated by a comma. CSV files can be opened and edited by many different software applications, including spreadsheet programs like Microsoft Excel, and are commonly used for data storage, transfer, and analysis.
Below is an example of students.csv
. Each row represents a record for a different student. Each record has 12 values (separated by commas) that correspond to 12 different fields: student name, section, score1, score2, ..., score10. We will work with the following students.csv
for this problem (note that the auto-grader will use a version of students.csv
with more students and sections).
Noa Marijus,A,91.4,82.53,86.52,84.7,81.69,84.11,86.29,91.63,92.42,94.7
Christa Maple,B,97.37,80.58,88.78,68.53,88.3,73.72,74.44,79.22,78.09,74.82
Laurelle Cecil,B,92.4,84.53,86.52,84.7,81.69,84.11,86.29,91.63,92.42,94.5
Ryanne Rusty,A,96.2,80.3,88.78,68.53,88.3,73.72,74.44,79.22,78.09,75.32
You will need to use the functions you implemented in basic_io.py
here, so be sure to import basic_io
at the top of csv_parse.py
.
4a. Implement read_csv
(2 points)
In order to read a CSV file, we need to read its contents line-by-line. We already created a function that will read a file line-by-line (read_lines
in basic_io.py
) and we will use this function here. We also want to store the file's contents into a list such that each record of the CSV file is a dictionary element in this list.
The function should have the following signature:
def read_csv(fname)
fname
is the filename (e.g., 'students.csv'). The function should call read_lines
from basic_io.py
and store the result into a string (let's call it lines). For each line in lines, you need to create a dictionary with the following key-value pairs:
- key = name, value = name of a student (this is always the 1st value in a record)
- key = section, value = section letter (this is always the 2nd value in a record)
- key = scores, value = list of 10 scores (this is always the 3rd to last values in a record and the values are floats)
- key = average, value = average of all scores, rounded to 3 decimal places (this is not a value in the CSV file; you need to calculate this)
Your function should return the list of dictionaries.
Example of returned list of dictionaries:
[{'name': 'Noa Marijus', 'section': 'A', 'scores': [91.4, 82.53, 86.52, 84.7, 81.69, 84.11, 86.29, 91.63, 92.42, 94.7], 'average': 87.599},
{'name': 'Christa Maple', 'section': 'B', 'scores': [97.37, 80.58, 88.78, 68.53, 88.3, 73.72, 74.44, 79.22, 78.09, 74.82], 'average': 80.385},
{'name': 'Laurelle Cecil', 'section': 'B', 'scores': [92.4, 84.53, 86.52, 84.7, 81.69, 84.11, 86.29, 91.63, 92.42, 94.5], 'average': 87.879},
{'name': 'Ryanne Rusty', 'section': 'A', 'scores': [96.2, 80.3, 88.78, 68.53, 88.3, 73.72, 74.44, 79.22, 78.09, 75.32], 'average': 80.29}]
If read_lines
returns None
, then an error occurred when opening the file. It's best to catch this error with an if statement that simply returns None
if lines
is None
. This check should happen before creating the dictionary.
Hint: Since each record of the CSV is a comma-separated string, you can use .split(',')
to split the string into a list. You can then index this list to retrieve specific values. For example, given the string txt = "apple,banana,carrot,donut"
, I can split the string by calling txt.split(',')
which will return ["apple", "banana", "carrot", "donut"]
. I can now get the 1st value by indexing the list at index 0: list[0]
. Splitting and indexing can also be done in one line: txt.split(',')[0]
.
4b. Implement write_csv
(1 point)
We want to write the contents of the CSV file to another file. We already created a function that will write to a file (write_lines
in basic_io.py
) and we will use this function here.
The function should have the following signature:
def write_csv(fname, student_list)
fname
is the filename (e.g., 'output.txt') and student_list
is the contents of the CSV file in a list data structure (i.e., the output of read_csv
). For each dictionary in students_list, you need to create a string that contains the student name, section, and 10 scores with commas separating each value. That is, the string should be identical to one record in the original CSV file.
Example of output.txt
:
Noa Marijus,A,91.4,82.53,86.52,84.7,81.69,84.11,86.29,91.63,92.42,94.7
Christa Maple,B,97.37,80.58,88.78,68.53,88.3,73.72,74.44,79.22,78.09,74.82
Laurelle Cecil,B,92.4,84.53,86.52,84.7,81.69,84.11,86.29,91.63,92.42,94.5
Ryanne Rusty,A,96.2,80.3,88.78,68.53,88.3,73.72,74.44,79.22,78.09,75.32
Since write_lines
takes a list as the second parameter, we want to store each of these strings in a single list and pass this list to write_lines
.
Hint: Although it can be done, you are highly encouraged to NOT implement this function using only one line of code. Rather, it will be easier to tackle this problem using a loop (e.g., for student in student_list
) and using variables for the multiple pieces (e.g., name = student['name']
). For getting the scores to print in a comma-separated string, you can use python's map
function to map the scores to a string, and then join
the mapping into a string using a comma as a separator (e.g., ','.join(...)
).
5. Implement csv_split.py
CSV files can hold a lot of data. For this reason, it is common to want to filter or split on values. In this problem, you will practice both filtering and splitting on section and average. We will continue to work with students.csv
for this problem.
You will need to use the functions you implemented in basic_io.py
and csv_parse.py
here, so be sure to import both at the top of csv_split.py
.
5a. Implement filter_section
(1 point)
We want to be able to filter the CSV records such that only students of a specific section (e.g., section A) are shown. We can use list comprehensions to do this.
The function should have the following signature:
filter_section(student_info, section_string)
student_info
is the list of all students returned by read_csv
in csv_parse
, and section_string
is a string with the section name (e.g., "A"). Using list comprehensions, this function should return every record in which section is equal to section_string
.
Example:
filter_section(csv_parse.read_csv("students.csv"), "A")
could return:
[{'name': 'Noa Marijus', 'section': 'A', 'scores': [91.4, 82.53, 86.52, 84.7, 81.69, 84.11, 86.29, 91.63, 92.42, 94.7], 'average': 87.599},
{'name': 'Ryanne Rusty', 'section': 'A', 'scores': [96.2, 80.3, 88.78, 68.53, 88.3, 73.72, 74.44, 79.22, 78.09, 75.32], 'average': 80.29}]
5b. Implement filter_average
(1 point)
Similar to filter_section
, we want to be able to filter the CSV records such that only students with an average score within some threshold (e.g., 80 <= average < 90) are shown.
The function should have the following signature:
filter_average(student_info, min_inc, max_exc)
student_info
is the list of all students returned by read_csv
in csv_parse
, min_inc
is the minimum value of the threshold (it is inclusive), and max_exc
is the maximum value of the threshold (it is exclusive). Using list comprehensions, this function should return every record in which min_inc <= average < max_exc
.
Example:
filter_average(csv_parse.read_csv("students.csv"), 80, 85)
could return:
[{'name': 'Christa Maple', 'section': 'B', 'scores': [97.37, 80.58, 88.78, 68.53, 88.3, 73.72, 74.44, 79.22, 78.09, 74.82], 'average': 80.385},
{'name': 'Ryanne Rusty', 'section': 'A', 'scores': [96.2, 80.3, 88.78, 68.53, 88.3, 73.72, 74.44, 79.22, 78.09, 75.32], 'average': 80.29}]
5c. Implement split_section
(3 point)
Splitting refers to splitting the file into multiple, smaller files based on some criteria. We wish to split students.csv
into multiple files, one for each section: students_section_A.csv
and students_section_B.csv
(note that the students.csv
used by the auto-grader will have an unknown number of sections). Each file will contain only records for students belonging to each respective section.
The function should have the following signature:
def split_section(fname)
fname
is the original filename (e.g., 'students.csv'
). Your function should do the following in order:
- Read the CSV file by calling
read_csv
incsv_parse
. Let's call the output ofread_csv
students
. - If
read_csv
returnsNone
, then an error occurred when opening the file. It's best to catch this error with an if statement that simply returnsNone
ifstudents
isNone
. - Get a set of sections. DO NOT hardcode a set. Rather, use set comprehensions to find all sections in
students
. Let's call this setsections
. - For each section in
sections
, filter thestudents
bysection
and usewrite_csv
incsv_parse
to write the contents to a file who's name is of the formFROMFILE_section_SECTION.csv
, whereFROMFILE
isfname
(the file we are splitting) and whereSECTION
is the actual section (e.g.,students_section_A.csv
).
Example of students_section_A.csv
:
Noa Marijus,A,91.4,82.53,86.52,84.7,81.69,84.11,86.29,91.63,92.42,94.7
Ryanne Rusty,A,96.2,80.3,88.78,68.53,88.3,73.72,74.44,79.22,78.09,75.32
5d. Implement split_average
(2 points)
Similar to split_section
, we wish to split students.csv
into multiple files, one for each average score range:
- score 4: 85 <= average
- score 3: 75 <= average < 85
- score 2: 60 <= average < 75
- score 1: average < 60
The function should have the following signature:
def split_average(fname)
fname
is the original filename (e.g., 'students.csv'). You should follow the same steps as in split_section
. Instead of a set of sections, however, you will need a dictionary of scores. You CAN hardcode this dictionary as follows:
scores = {"1": (-1,60), "2": (60,75), "3": (75,85), "4": (85,101)}
The output file should have a name of the form FROMFILE_score_SCORE.csv
,
where FROMFILE
is fname
(the file we are splitting) and
where SCORE
is the score category (either 1, 2, 3, or 4).
Example of students_score_4.csv
:
Noa Marijus,A,91.4,82.53,86.52,84.7,81.69,84.11,86.29,91.63,92.42,94.7
Laurelle Cecil,B,92.4,84.53,86.52,84.7,81.69,84.11,86.29,91.63,92.42,94.5
In the case that no students belong to a particular score category (e.g., in students.csv
, there are no students that have a score of 1), you can do one of two things: (1) do not create a file for that score, or (2) create an empty file for that score category.
*Hint: You'll want to use filter_average
to help you find the the students who belong in each score category.
6. Implement csv_stats.py
When working with data, it is common to calculate statistics (e.g., mean, min, max, range, standard deviation) on this data. In this problem, you will practice calculating these stats on the data as well as writing these stats to files. We will continue to work with students.csv
for this problem.
You will need to use the functions you implemented in basic_io.py
, csv_parse.py
, and csv_split
here, so be sure to import all three at the top of csv_stats.py
.
Note: 6a and 6b are recommended helper functions that will not be tested by the auto-grader. You are encouraged to test them locally.
6a. Implement get_stats
The first thing we need to do is create a function that will calculate all of the stats we wish to work with. Given a list of numbers called num
, we can calculate the mean, min, max, range, and standard deviation by using several built-in python functions:
mean = sum(nums) / len(nums)
min = min(nums)
max = max(nums)
range = max - min
std_dev = [(num-mean)**2 for num in nums])/len(nums))**(1/2)
Write a function that returns these stats as a dictionary. The function should have the following signature:
def get_stats(nums)
nums
is a list of numbers.
Example:
get_stats([69.15, 85.71, 94.65, 93.49, 75.05])
should return:
{'mean': 83.61, 'std_dev': 10.062516583837263, 'min': 69.15, 'max': 94.65, 'range': 25.5}
6b. Implement get_assignment_stats
We will now create a function to extract the stats per a field/column of students.csv
. More specifically, we will calculate the stats for 11 columns: each of the 10 score columns and the average column.
The function should have the following signature:
def get_assignment_stats(student_info)
student_info
is the list of all students returned by read_csv
in csv_parse
. We can implement this function in three steps:
- Step 1: Map over
student_info
to create a list of score lists (let's call itscores_per_student
). Each list inscores_per_student
will contain the average value followed by the 10 score values (average must be at index 0, followed by score 1, score 2, ..., score 10). The length ofscores_per_student
should equal the length ofstudent_info
. Each list inscores_per_student
should have a length of 11. - Step 2: Transpose this list. Transposing a list of lists essentially means we are 'flipping' the data structure such that the rows become columns and the columns become rows. For example, if Step 1 creates 33 lists, each of length 11, inside
scores_per_student
, transposingscores_per_student
will flip it such that there are 11 lists, each of length 33. You can transpose the list by creating a new list of 11 empty lists (let's call itscores_per_assignment
) and looping overscores_per_student
, appending values intoscores_per_assignment
(hint: think nested loops). - Step 3: Map
get_stats
overscores_per_assignment
and return this mapping.
Example:
get_assignment_stats(csv_parse.read_csv("students.csv"))
should return:
[{'mean': 84.03825, 'std_dev': 3.702226179949031, 'min': 80.29, 'max': 87.879, 'range': 7.588999999999999},
{'mean': 94.3425, 'std_dev': 2.502382614629505, 'min': 91.4, 'max': 97.37, 'range': 5.969999999999999},
{'mean': 81.985, 'std_dev': 1.7020061692015118, 'min': 80.3, 'max': 84.53, 'range': 4.230000000000004},
{'mean': 87.65, 'std_dev': 1.1300000000000026, 'min': 86.52, 'max': 88.78, 'range': 2.260000000000005},
{'mean': 76.61500000000001, 'std_dev': 8.085, 'min': 68.53, 'max': 84.7, 'range': 16.17},
{'mean': 84.995, 'std_dev': 3.3049999999999997, 'min': 81.69, 'max': 88.3, 'range': 6.609999999999999},
{'mean': 78.91499999999999, 'std_dev': 5.195, 'min': 73.72, 'max': 84.11, 'range': 10.39},
{'mean': 80.36500000000001, 'std_dev': 5.925000000000004, 'min': 74.44, 'max': 86.29, 'range': 11.850000000000009},
{'mean': 85.42500000000001, 'std_dev': 6.204999999999998, 'min': 79.22, 'max': 91.63, 'range': 12.409999999999997},
{'mean': 85.255, 'std_dev': 7.164999999999999, 'min': 78.09, 'max': 92.42, 'range': 14.329999999999998},
{'mean': 84.835, 'std_dev': 9.76685594242078, 'min': 74.82, 'max': 94.7, 'range': 19.88000000000001}]
6c. Implement write_assignment_stats
(3 points)
We are now ready to write the stats for each assignment to a separate file. The function will look similar to split_section
and split_average
in csv_split
, however we are outputting all stats to a single file called students_stats.csv
.
The function should have the following signature:
def write_assignment_stats(fname)
fname
is the original filename (e.g., 'students.csv'). Your function should do the following in order:
- Read the CSV file by calling
read_csv
incsv_parse
. Let's call the output ofread_csv
students
. - If
read_csv
returnsNone
, then an error occurred when opening the file. It's best to catch this error with an if statement that simply returnsNone
ifstudents
isNone
. - Get the average stats and assignment stats by calling
get_assignment_stats
. - Output each dictionary of stats as a single line to a file
who's name is of the form
FROMFILE_stats.csv
, whereFROMFILE
isfname
. The stats should output in the following order: min, max, range, mean, and std_dev. The first line should be the stats for average, and the remaining lines should each be the stats for one of the assignments (in order). Thus, this file should have 11 lines.
Example of the first 3 lines of students_stats.csv
:
80.29,87.879,7.588999999999999,84.03825,3.702226179949031
91.4,97.37,5.969999999999999,94.3425,2.502382614629505
80.3,84.53,4.230000000000004,81.985,1.7020061692015118
6d. Implement write_section_assignment_stats
(3 points)
Similar to write_assginment_stats
,
we will write the assignment stats to a separate file.
This time, we will write to multiple files,
namely one for each section.
Note that the input file (e.g., students.csv
)
used by the auto-grader will have an unknown number of sections.
The function should have the following signature:
def write_section_assignment_stats(fname)
fname
is the original filename (e.g., 'students.csv').
Your function should be similar to write_assginment_stats
.
However, you should loop through a set of sections (as in split_section
)
and only include those records with a matching section letter in your stats calculation
(hint: Use filter_section
from csv_split
).
The stats from each section should be written to a file who's name is of the from
FROMFILE_section_SECTION_stats.csv
,
where FROMFILE
is fname
and where SECTION
is the actual section.
Each output file should still have 11 lines.
Example of the first 3 lines of students_section_A_stats.csv
:
80.29,87.599,7.3089999999999975,83.9445,3.6544999999999987
91.4,96.2,4.799999999999997,93.80000000000001,2.3999999999999986
80.3,82.53,2.230000000000004,81.41499999999999,1.115000000000002
Grading Scale
Note that points may be obtained from anywhere, you do not need to try to complete work in any particular order.
There is also an abundance of possible points to obtain, giving you options in what you want to work on in order to get your total points within the desired range.