?

Log in

No account? Create an account

Previous Entry | Next Entry

You think I used my time wisely?

No...I don't...I just spent the past 2 hours coding...UGH...I know...I know...

I wrote an absolutely worthless program...

You know what it does? The program will read in any file that the user wishes to input and read in all the words and find out the amount of times a unique word shows up. It will store this information into a BST (Binary Search Tree) sorting by string. It will then ask the user which of the top words they would like to
see and it will use an array to store the words in numerical order. However, because I am lazy...the words "the" and "The" are not the same word...I didn't feel like writing an extra function *shrug* So sue me...


/***Libraries Included***/
#include // Allows input and output
#include // Allows the ability to read a file in or out
#include // This includes the string library
#include
[Error: Irreparable invalid markup ('<time.h>') in entry. Owner must fix manually. Raw contents below.]

No...I don't...I just spent the past 2 hours coding...UGH...I know...I know...

I wrote an absolutely worthless program...

You know what it does? The program will read in any file that the user wishes to input and read in all the words and find out the amount of times a unique word shows up. It will store this information into a BST (Binary Search Tree) sorting by string. It will then ask the user which of the top words they would like to
see and it will use an array to store the words in numerical order. However, because I am lazy...the words "the" and "The" are not the same word...I didn't feel like writing an extra function *shrug* So sue me...

<lj-cut text="A very elementary program I know...">
/***Libraries Included***/
#include <iostream> // Allows input and output
#include <fstream> // Allows the ability to read a file in or out
#include <string> // This includes the string library
#include <time.h> // This includes the time library
using namespace std;

/***Typedef for an Unsigned Long integer***/
typedef unsigned long Uli;

/***GLOBAL VARIABLES***/
const int FORMAT = 10; // This is format to clear the screen
const Uli LOAD_FACTOR = 1; // This is the maximum loadfactor. This
// means that this is the maximum amount of
// nodes per indice.

/***STRUCTS***/
struct BucketNode
{
string word;
int count;
Uli hashValue;
BucketNode *nextBucket;

BucketNode (string w, Uli hValue)
{
word = w;
hashValue = hValue;
count = 1;
nextBucket = NULL;
}
};
typedef BucketNode* Bucket;
const Bucket empty = NULL;

struct Hash
{
Uli hashSize;
Bucket* table;
Uli uniqueWordCount;

Hash (Uli size)
{
table = new Bucket[size];

hashSize = size;

for (Uli i=0; i < size; i++)
table[i] = NULL;

}
};
const Bucket empty = NULL;

struct Hash
{
Uli hashSize;
Bucket* table;
Uli uniqueWordCount;

Hash (Uli size)
{
table = new Bucket[size];

hashSize = size;

for (Uli i=0; i < size; i++)
table[i] = NULL;

}
};
struct HeapArray
{
string word;
int wordCount;

HeapArray()
{
word = " ";
wordCount = 0;
}
};

/***FUNCTION DEFINITI
void clrScreen( );
// Clears screen by 1
// pre: None
// post: None

void welcome ( );
// Welcome message
// pre: None
// post: None

Uli hashFunction (const char *v);
// This will find the numerical value of the string and return it
// pre: There must be a string to find the numerical value for
// post: This will return the value of the string

Uli tableIndex (Uli value, Uli tableSize);
// This will return the index of where the word should go in the array
// pre: There must be a hash value and the size of the table available
// post: This will read in the value of the string that was returned by
// the hashFunction. It will then % that value by the size of the
// table. This will return the index to where that word should be
// placed.

void insertToFront (string text, Hash* someHT);
// This will insert a new string into the hash table
// pre: There must be a string and a pointer to a table read in
// post: This will take the string and call the hashFunction, then it will
// take the value returned from the hashFunction and call the tableIndex
// function. This way the index of where the word should go is found.
// after that is done, it will call the insert( ) function to insert
// the word.

void insert (string text, Uli hV, Uli index, Hash* someHT);
// This will insert the text hashValue into the correct bucket.
// pre: There must be a string, hashvalue, index number, and a HashTable
// post: This will be called by the insertToFront function, the first thing
// this will do is check to see if the word is in the array. To do this
// there is an if statement that will test to see if the find() function
// is true. If it is, in the find function it will increment the word
// count if this statement is proven to be false it will create a new
// node, fill the values, the text and the hashValue, and link it to the
// first object in the Bucket. After completing that it will link the
// pointer at the index in the array to the newly created Node. Also,
// everything a new Node is created the uniqueWordCount variable is
// incremented.

void insert2 (string text, int count, Uli hV, Uli index, Hash* someHT);
// This will insert the text, count, and hashValue into the correct bucket.
// pre: There must be a string, count, hashValue, index number, and Hashtable
// post: This does the same thing as the insert function, however, the
// difference is this is not taking in new values, it is copying the
// values from the old hash table to the new one. This function also
// does not increment the uniqueWordCount, because it not creating new
// information, it is just copying old information. It also does not
// check to see if the word is already in the list.

bool find (Uli section, Uli someValue, string someWord, Hash* someHT);
// This returns a true or false statement to test to see if a word is already
// in the hash table.
// pre: This reads in an index, a hash value, a word, and the hash table
// post: This is called by the insert function. It takes the index of where the
// word is supposed to go and checks that list to see if the word is there
// or not. If it is not there, it returns a false to the insert functions
// If it is there it will increment the word count at that point in the
// linked list

void rehash (Hash* oldHT, Hash* newHT);
// This function rehashs the values at every linked list to place them into
// different places
// pre: This must take two hash tables
// post: This function is called by the expand function. It will walk down
// the linked list and take the hashValue from the word it is going to
// and then % that value by the new size of the hash table. It will then
// call the insert2 function, which will copy the old values into the new
// hash table. After copying the information it will delete that node that
// was copied and move onto the next one

Hash* expand (Hash* someHT);
// This will increase the size of the hash and return a new one
// pre: This will read in the hash table that needs to be increased in size
// post: This will create a new hash table and the size will be HashSize*2. It
// will then fill the new table with NULL values. Then the rehash
// function will be called, which will copy the values from the old hash
// table to the new hash table while "re-hashing" the values by the new
// size of the new hash table. Then it will set the HashSize value to
// HashSize*2, delete the old hash table and make the old hash table
// point to the new hash table.

void swap (HeapArray &x, HeapArray &y);
// This will swap two HeapArray types
// pre: There must be an input for x and y
// post: When this function is called it will swap the position of x and y in
// the array.

void fixDown (HeapArray heap[], int element, int size);
// This will make sure that the object is heap[1] will always be the smallest value
// pre: There must be an array, an integer, and the size of the array inputted
// post: This will compare the values in the heap to assure that the value in heap
// remains the smallest, if the value in the heap[1] is not the smallest
// the values until it is the smallest.

void insertToHeap (HeapArray a[], int length, Bucket ptr);
// This will insert the values from the tree into the array
// pre: There must be a tree and an array
// post: This will fill a sorted heap with the information that's in the
// tree while keeping a[1] the smallest element in the heap

void fillHeap(HeapArray theHeap[], int length, Hash* someHT);
// This will fill the heap with all the information from the Hash
// pre: There must be a heap, the length of the heap, and a hash table inputted
// post: This will start at the index 0 in the Hash table and begin to walk the
// links. While doing this it will call the insertToHeap function. After
// calling this it will move to the next link in that index. As long as
// the nextBucket does not equal NULL, it will continue to walk down the
// list. When it comes out of that while loop it will increment to the
// next indice in the hash

void sortHeap (HeapArray theHeap[], int size);
// This will sort the heap from big to small
// pre: There neads to be a heap and the size of the heap inputted into the function
// post: It will take the heap calling the fixDown functions to sort it

void printHeap (HeapArray theArray[], int size);
// This will display the array
// pre: There must be an array and a size for the array
// post: An array will be displayed when this function is called

void DeleteNodes (Hash* someHT);
// This will delete the nodes in the hash
// pre: There must be a Hash table
// post: This will walk every indice in the hash table and delete the nodes
// from it

void byeBye ();
// This is a function to say good-bye to the user
// pre: None
// post: None


int main ( )
{
/***Variables in use***/
string fileName; // Variable for the name of the file
string wordInFile; // Variable for the word that will be read in
// from the file
char ans = 'y';
ifstream infile;
int topWords = 0; // The top words the user wishes to see
int limit = 9999; // This is a condition for the print statement.$
// It will print as long as the user does not
// ask for words that are bigger than this value
double hashTime; // Store the time for the hash
double heapTime; // Store the time for the heap

/***Function to clear the screen by 10 lines***/
clrScreen ();

/***Function to welcome user***/
welcome ();

/***This loop checks to see if the user wishes to continue looking at files***/
do
{
/***Initialize value for the size of the array***/
Uli theSize=1;

/***Create a dynamic hash table***/
Hash* myHash = new Hash(theSize);

/***Intialize Value for uniqueWord Count***/
myHash->uniqueWordCount = 0;

/***Prompt user for file name***/
cout << "Please enter the name of the file that you wish to do \n"
<< "the word count on: ";
cin >> fileName;
cout << endl << endl;

/***Open the file***/
infile.open(fileName.c_str());

/***Test to make sure the file exists***/
if (infile.fail())
{
cerr << " ***ERROR*** Requested file does not exist"
<< endl;
return EXIT_FAILURE;
}
clock_t hashStartTime = clock();

while (infile >> wordInFile)
{
if ((myHash->uniqueWordCount/myHash->hashSize) >= LOAD_FACTOR)
{
myHash = expand (myHash);
}
insertToFront (wordInFile, myHash);
}

clock_t hashEndTime = clock();

/***Find the average time that it took to fill the hash with all the words***/
hashTime = (hashEndTime - hashStartTime)/(double) CLOCKS_PER_SEC;

/***Close the file***/
infile.close();

/***This is a test if they choose to pick a different file. If the user
chooses to do another file, they must hit 0, if not they choose 1 to
continue with the same file***/
int endFile = 1;

/***This loop is to check if the user wishes to use the same file***/
while (endFile != 0)
{
/***Ask user how many of the top words they would like to see***/
cout << "How many top words would you like to see?: ";
cin >> topWords;
cout << endl << endl;

/***Create dynamic array with the size of the topWords, the +1 is for the
sentinel***/
HeapArray *displayHeap = new HeapArray [(topWords + 1)];

clock_t heapStartTime = clock();

/***Fill the array with the information from the hash***/
fillHeap(displayHeap, (topWords + 1), myHash);

sortHeap (displayHeap, topWords);

clock_t heapEndTime = clock(); // Get the ending time of the program

/***Display the heap***/
if (topWords < limit)
{
cout << " " << fileName << endl << endl;
printHeap(displayHeap, topWords);
}

clrScreen();

/***Calculate the time it took from the start of the program until
the end***/
heapTime = (heapEndTime - heapStartTime)/(double) CLOCKS_PER_SEC;
cout << "The total time it took to fill the hash: " << hashTime
<< " seconds. \n"
<< "The total time it took to fill the array: " << heapTime
<< " seconds. "
<< endl << endl;

heapTime = heapEndTime = heapStartTime = 0;

/***Delete dynamic memory***/
delete [] displayHeap;

/***Ask user if they would like to continue viewing the same file***/
cout << "Would you like to continue viewing this file for more top \n"
<< "words?(Press 0 for no and any other number for yes): ";
cin >> endFile;
cout << endl << endl;

}
/***Delete memory***/
DeleteNodes (myHash);
delete [] myHash;

/***Ask if user would like to continue***/
cout << "Would you like to look at another file?: (Y/N) ";
cin >> ans;
cout << endl << endl;

hashTime = hashEndTime = hashStartTime = 0;
}
while (ans == 'y' || ans == 'Y');


clrScreen();


byeBye();

return 0;
}

/***Functions being used***/

void clrScreen ()
{
for (int i=0; i < FORMAT; i++)
cout << endl;
}

void welcome ()
{
cout << "This program will take an input file and will give \n"
<< "the user the ability to look at the top number of words \n"
<< "in the file."
<< endl << endl;
}

Uli hashFunction (const char *v)
{
Uli value = 0;
int a = 128;

for ( ; *v != 0; v++)
{
value = (a * value + *v);
}
return value;
}

Uli tableIndex (Uli value, Uli tableSize)
{
return value % tableSize;
}

void insertToFront (string text, Hash* someHT)
{
Uli hV = hashFunction(text.c_str());
Uli index = tableIndex(hV, someHT->hashSize);

insert (text, hV, index, someHT);
}

void insert (string text, Uli hV, Uli index, Hash* someHT)
{
Bucket temp = someHT->table[index];
if(find(index, hV, text, someHT))
return;
temp = new BucketNode (text, hV);
temp -> nextBucket = someHT->table[index];
someHT->table[index] = temp;
someHT->uniqueWordCount++;
}

void insert2 (string text, int count, Uli hV, Uli index, Hash* someHT)
{
Bucket temp = someHT->table[index];

temp = new BucketNode (text, hV);
temp -> count = count;
temp -> nextBucket = someHT->table[index];
someHT->table[index] = temp;
}

bool find (Uli section, Uli someValue, string someWord, Hash* someHT)
{
bool answer = false;
Bucket ptr = someHT->table[section];

while (ptr != empty)
{
if (ptr -> hashValue == someValue && ptr -> word == someWord)
{
ptr -> count++;
answer = true;
break;
}
ptr = ptr -> nextBucket;
}
return answer;
}

void rehash (Hash* oldHT, Hash* newHT)
{
for (Uli i=0; i < oldHT->hashSize; i++)
{
Bucket oldStart = oldHT->table[i];
while (oldStart != empty)
{
Bucket oldNext = oldStart -> nextBucket;

Uli newIndex = (oldStart -> hashValue % (oldHT->hashSize * 2));

insert2 (oldStart -> word, oldStart -> count, oldStart -> hashValue,
newIndex, newHT);

delete oldStart;

oldStart = oldNext;
}
}
}

Hash* expand (Hash* someHT)
{
Hash* tempHT = new Hash(someHT->hashSize * 2);

rehash (someHT, tempHT);

someHT->hashSize = someHT->hashSize * 2;

delete [] someHT->table;

return tempHT;
}

void swap (HeapArray &x, HeapArray &y)
{
HeapArray temp = x;
x = y;
y = temp;
}

void fixDown (HeapArray heap[], int element, int size)
{
while (2*element <= size)
{
int maxChild = 2*element;
if (maxChild < size &&
heap[maxChild].wordCount > heap[maxChild + 1].wordCount)
{
maxChild++;
}
if(heap[element].wordCount < heap[maxChild].wordCount)
break;
swap(heap[element], heap[maxChild]);
element = maxChild;
}
}

void insertToHeap (HeapArray a[], int length, Bucket ptr)
{
Uli min = 1;
if (a[min].wordCount < ptr -> count)
{
a[min].word = ptr -> word;
a[min].wordCount = ptr -> count;
fixDown (a, min, length - 1);
}
return;
}

void fillHeap(HeapArray theHeap[], int length, Hash* someHT)
{
for (Uli i =0; i < someHT->hashSize; i++)
{
if (someHT->table[i] != empty)
{
Bucket temp = someHT->table[i];
while (temp != empty)
{
insertToHeap (theHeap, length, temp);
temp = temp -> nextBucket;
}
}
}
}

void sortHeap (HeapArray theHeap[], int size)
{
for (int k = size/2; k >= 1; k--)
{
fixDown (theHeap, k, size);
}
while (size > 1)
{
swap (theHeap[1], theHeap[size]);
fixDown (theHeap, 1, --size);
}
}

void printHeap (HeapArray theArray[], int size)
{
cout << "\t" << "Word" << "\t\t" << "Amount" << endl;
cout << "\t" << "-----" << "\t\t" << "-------"
<< endl << endl;

for (int i=1; i < (size + 1); i++)
{
cout << "\t" << theArray[i].word
<< "\t\t" << theArray[i].wordCount << endl;
}
}

void DeleteNodes (Hash* someHT)
{
for (Uli i=0; i < someHT->hashSize; i++)
{
Bucket ptr = someHT->table[i];

while (ptr != empty)
{
Bucket nextPtr = ptr -> nextBucket;

delete ptr;

ptr = nextPtr;
}
}
}
void byeBye ()
{
cout << "Thanks for using this program, have a nice day! :) "
<< endl;

clrScreen ();
}

</lj-cut>
The results you ask? Welll... ;) Here are the output results. It is the combination of the Kenneth Star Report and the Bible...and the Kenneth Star Report again...lol...

There are the top 10 words and the amount of time it took:

<table><tr><td>Word</td><td>Amount</td></tr>
<tr><td>-----</td><td>-------</td></tr>
<tr><td>the</td><td>49046</td></tr>
<tr><td>and</td><td>26972</td></tr>
<tr><td>of</td><td>26481</td></tr>
<tr><td>to</td><td>15313</td></tr>
<tr><td>in</td><td>8717</td></tr>
<tr><td>a</td><td>6815</td></tr>
<tr><td>you</td><td>6470</td></tr>
<tr><td>his</td><td>6340</td></tr>
<tr><td>shall</td><td>6164</td></tr>
<tr><td>he</td><td>6100</td></tr>
</table>

The total time it took to fill the hash: 5.87 seconds.
The total time it took to fill the array: .01 seconds.

I know this is not really well laid out...but you know...that's ok. Not the most efficient program but it kept me occupied for the past 2.5 hours...

Comments

( 4 comments — Leave a comment )
cozmo
Nov. 11th, 2003 08:21 pm (UTC)
wow, I'm impressed hehe. How much memory does it use to hold all those words?
natalie516
Nov. 12th, 2003 04:26 pm (UTC)
hehe...you are easily impressed...it is not even the most efficient program in the world :)

As for the memory...I don't know.
ronalum
Nov. 12th, 2003 08:54 am (UTC)
Does this mean that you are going to check your commentators for plagiarism?
natalie516
Nov. 12th, 2003 04:25 pm (UTC)
Nope
( 4 comments — Leave a comment )