Computer Systems Lab, TJHSST
Current Projects 2004-2005, First Period
Current Projects, 1st Period
• Caroline Bauer: Archival of Articles via RSS and Datamining Performed on Stored Articles
• Susan Ditmore: Construction and Application of a Pentium II Beowulf Cluster
• Michael Druker: Universal Problem Solving Contest Grader
Current Projects, 1st Period
• Matt Fifer: The Study of Microevolution Using Agent-Based Modeling
• Jason Ji: Natural Language Processing: Using Machine Translation in Creation of a German-English Translator
• Anthony Kim: A Study of Balanced Search Trees: Brainforming a New Balanced Search Tree
• John Livingston: Kernel Debugging User-Space API Library (KDUAL)
Current Projects, 1st Period
• Jack McKay: Saber-what? An Analysis of the Use of Sabermetric Statistics in Baseball
• Peden Nichols: An Investigation into Implementations of DNA Sequence Pattern Matching Algorithms
• Robert Staubs: Part-of-Speech Tagging with Limited Training Corpora
• Alex Volkovitsky: Benchmarking of Cryptographic Algorithms
Archival of Articles via RSS and Datamining Performed on Stored Articles

RSS (Really Simple Syndication, encompassing Rich Site Summary and RDF Site Summary) is a web syndication protocol used by many blogs and news websites to distribute information; it saves people having to visit several sites repeatedly to check for new content. At this point in time there are many RSS newsfeed aggregators available to the public, but none of them perform any sort of archival of information beyond the RSS metadata. The purpose of this project is to create an RSS aggregator that will archive the text of the actual articles linked to in the RSS feeds in some kind of linkable, searchable database, and, if all goes well, implement some sort of datamining capability as well.
Archival of Articles via RSS, and Datamining Performed on Stored Articles
Caroline Bauer

Abstract: RSS (Really Simple Syndication, encompassing Rich Site Summary and RDF Site Summary) is a web syndication protocol used by many blogs and news websites to distribute information; it saves people having to visit several sites repeatedly to check for new content. At this point in time there are many RSS newsfeed aggregators available to the public, but none of them perform any sort of archival of information beyond the RSS metadata. As the articles linked may move or be eliminated at some time in the future, if one wants to be sure one can access them later one has to archive them oneself; furthermore, should one want to link such collected articles, it is far easier to do if one has them archived. The purpose of this project is to create an RSS aggregator that will archive the text of the actual articles linked to in the RSS feeds in some kind of linkable, searchable database, and, if all goes well, implement some sort of datamining capability as well.
Introduction

This paper is intended to be a detailed summary of all of the author's findings regarding the archival of articles in a linkable, searchable database via RSS.

Background

RSS

RSS stands for Really Simple Syndication, a syndication protocol often used by weblogs and news sites. Technically, RSS is an XML-based communication standard that encompasses Rich Site Summary (RSS 0.9x and RSS 2.0) and RDF Site Summary (RSS 0.9 and 1.0). It enables people to gather new information by using an RSS aggregator (or "feed reader") to poll RSS-enabled sites for new information, so the user does not have to manually check each site. RSS aggregators are often extensions of browsers or email programs, or standalone programs; alternately, they can be web-based, so the user can view their "feeds" from any computer with Web access.

Archival Options Available in Existing RSS Aggregators

Data Mining

Data mining is the searching out of information based on patterns present in large amounts of data. //more will be here.
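Before any article text can be archived, the aggregator has to pull the article URLs out of each polled feed. The following is a minimal sketch of that step; the function name and the plain string-search approach are illustrative only (a real implementation would use a proper XML parser and would also account for the channel-level <link> element in RSS 2.0):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: collect the text inside each <link>...</link>
// element of a raw RSS document. Each result is an article URL that the
// aggregator would then fetch and archive in the database.
std::vector<std::string> extractLinks(const std::string& rss)
{
    std::vector<std::string> links;
    std::string::size_type pos = 0;
    while ((pos = rss.find("<link>", pos)) != std::string::npos) {
        pos += 6;                                    // skip past "<link>"
        std::string::size_type end = rss.find("</link>", pos);
        if (end == std::string::npos)
            break;                                   // malformed tail; stop
        links.push_back(rss.substr(pos, end - pos)); // the article URL
        pos = end;
    }
    return links;
}
```

Each URL returned would be fetched and its article text stored alongside the RSS metadata, which is exactly the step existing aggregators omit.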
Purpose

The purpose of this project is to create an RSS aggregator that, in addition to serving as a feed reader, obtains the text of the documents linked in the RSS feeds and places it into a database that is both searchable and linkable. In addition to this, the database is intended to reach an implementation wherein it performs some manner of data mining on the information contained therein; the specifics on this have yet to be determined.

Development

Results

Conclusions

Summary

References

1. "RSS (protocol)." Wikipedia. 8 Jan. 2005. Accessed 11 Jan. 2005. <http://en.wikipedia.org/wiki/RSS_%28protocol%29>
2. "Data mining." Wikipedia. 7 Jan. 2005. Accessed 12 Jan. 2005. <http://en.wikipedia.org/wiki/Data_mining>
Construction and Application of a Pentium II Beowulf Cluster

I plan to construct a supercomputing cluster of about 15-20 or more Pentium II computers with the OpenMosix kernel patch. Once constructed, the cluster could be configured to transparently aid workstations with computationally expensive jobs run in the lab. This project would not only increase the computing power of the lab, but it would also be an experiment in building a low-level, low-cost cluster with a stripped-down version of Linux, useful to any facility with old computers it would otherwise deem outdated.
Construction and Application of a Pentium II Beowulf Cluster
Susan Ditmore
Universal Problem Solving Contest Grader
Michael Druker
Steps so far:
- Creation of directory structure for the grader, the contests, the users, the users' submissions, and the test cases.
- Starting of the main grading script itself.
- Refinement of the directory structure for the grader.
- Reading of material on the bash scripting language to be able to write the various scripts that will be necessary.
Current program:

#!/bin/bash
CONDIR="/afs/csl.tjhsst.edu/user/mdruker/techlab/code/new/"

#syntax is "grade contest user program"
contest=$1
user=$2
program=$3

echo "contest name is " $1
echo "user's name is " $2
echo "program name is " $3
Current program continued:

#get the location of the program and the test data
#make sure that the contest, user, program are valid
PROGDIR=${CONDIR}"contests/"${contest}"/users/"${user}
echo "user's directory is" $PROGDIR
if [ -d ${PROGDIR} ]
then
    echo "good input"
else
    echo "bad input, directory doesn't exist"
    exit 1
fi
exit 0
Study of Microevolution Using Agent-Based Modeling in C++

The goal of the project is to create a program that uses an agent-environment structure to imitate a very simple natural ecosystem: one that includes a single type of species that can move, reproduce, kill, etc. The "organisms" will contain genomes (libraries of genetic data) that can be passed from parents to offspring in a way similar to that of animal reproduction in nature. As the agents interact with each other, the ones with the characteristics most favorable to survival in the artificial ecosystem will produce more children, and over time, the mean characteristics of the system should start to gravitate towards the traits that would be most beneficial. This process, the optimization of physical traits of a single species through passing on heritable advantageous genes, is known as microevolution.
The Study of Microevolution Using Agent-Based Modeling
Matt Fifer

Abstract

The goal of the project is to create a program that uses an agent-environment structure to imitate a very simple natural ecosystem: one that includes a single type of species that can move, reproduce, kill, etc. The "organisms" will contain genomes (libraries of genetic data) that can be passed from parents to offspring in a way similar to that of animal reproduction in nature. As the agents interact with each other, the ones with the characteristics most favorable to survival in the artificial ecosystem will produce more children, and over time, the mean characteristics of the system should start to gravitate towards the traits that would be most beneficial. This process, the optimization of physical traits of a single species through passing on heritable advantageous genes, is known as microevolution.
Purpose

One of the most controversial topics in science today is the debate of creationism vs. Darwinism. Advocates for creationism believe that the world was created according to the description detailed in the first chapter of the book of Genesis in the Bible: the Earth is approximately 6,000 years old, and it was created by God, followed by the creation of animals and finally the creation of humans, Adam and Eve. Darwin and his followers believe that from the moment the universe was created, all the objects in that universe have been in competition. Everything - from the organisms that make up the global population, to the cells that make up those organisms, to the molecules that make up those cells - has beaten all of its competitors in the struggle for resources commonly known as life.
This project will attempt to model the day-to-day war between organisms of the same species. Organisms, or agents, that can move, kill, and reproduce will be created and placed in an ecosystem. Each agent will include a genome that codes for its various characteristics. Organisms that are more successful at surviving or more successful at reproducing will pass their genes to their children, making future generations better suited to the environment. The competition will continue, generation after generation, until the simulation terminates. If evolution has occurred, the characteristics of the population at the end of the simulation should be markedly different than at the beginning.
Background

Two of the main goals of this project are the study of microevolution and the effects of biological mechanisms on this process. Meiosis, the formation of gametes, controls how genes are passed from parents to their offspring. In the first stage of meiosis, prophase I, the strands of DNA floating around the nucleus of the cell are wrapped around histone proteins to form chromosomes. Chromosomes are easier to work with than the strands of chromatin, as they are packaged tightly into an "X" structure (two ">"s connected at the centromere). In the second phase, metaphase I, chromosomes pair up along the equator of the cell, with homologous chromosomes being directly across from each other. (Homologous chromosomes code for the same traits, but come from different parents, and thus code for different versions of the same trait.) The pairs of chromosomes, called tetrads, are connected and exchange genetic material.
This process, called crossing over, results in both of the chromosomes being a combination of genes from the mother and the father. Whole genes swap places, not individual nucleotides. In the third phase, anaphase I, fibers from within the cell pull the pair apart. When the pairs are pulled apart, the two chromosomes are put on either side of the cell. Each pair is split randomly, so for each pair, there are two possible outcomes. For instance, the paternal chromosome can either move to the left or right side of the cell, with the maternal chromosome moving to the opposite end. In telophase I, the two sides of the cell split into two individual cells. Thus, for each cell undergoing meiosis with n chromosome pairs, there are 2^n possible gametes. With crossing over, there is an almost infinite number of combinations of genes in the gametes. This large number of combinations is the reason for the genetic biodiversity that exists in the world today, even among species. For example, there are 6 billion humans on the planet, and none of them is exactly the same as another one.
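The 2^n count of possible gametes comes directly from independent assortment: each homologous pair independently contributes either its maternal or its paternal chromosome. A small sketch (names invented for illustration, not taken from the project's code):

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// One random gamete under independent assortment (no crossing over): for
// each of the n pairs, record 0 for the maternal copy or 1 for the
// paternal copy.
std::vector<int> assort(int pairs)
{
    std::vector<int> gamete(pairs);
    for (int i = 0; i < pairs; i++)
        gamete[i] = rand() % 2;   // coin flip per homologous pair
    return gamete;
}

// Number of distinct gametes possible from n pairs: 2^n.
long long gameteCount(int pairs)
{
    return 1LL << pairs;
}
```

For humans, with 23 chromosome pairs, gameteCount(23) gives 8,388,608 possible assortments before crossing over is even considered.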
Procedure

This project will be implemented with a matrix of agents. The matrix, initialized with only empty spaces, will be seeded with organisms by an Ecosystem class. Each agent in the matrix will have a genome, which will determine how it interacts with the Ecosystem. During every step of the simulation, an organism will have a choice whether to:
1. do nothing,
2. move to an empty adjacent space,
3. kill an organism in a surrounding space, or
4. reproduce with an organism in an adjacent space.
The likelihood of the organism performing any of these tasks is determined by the organism's personal variables, which will be coded for by the organism's genome. While the simulation is running, the average characteristics of the population will be measured. In theory, the mean value of each of the traits (speed, agility, strength, etc.) should either increase with time or gravitate towards a particular, optimum value.
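One plausible way to make the genome-derived variables act as likelihoods is a roulette-wheel draw over the four choices. This sketch is hypothetical (the project's actual weighting scheme is not shown); it only illustrates that a larger trait value makes the corresponding action proportionally more likely:

```cpp
#include <cassert>
#include <cstdlib>

// The four per-step choices described above.
enum Action { IDLE, MOVE, KILL, REPRODUCE };

// Roulette-wheel selection: pick a random point on a wheel whose slice
// sizes are the organism's trait values, then walk through the slices.
// Trait names mirror the Organism accessors, but the values here are
// plain integers supplied by the caller.
Action chooseAction(int laziness, int activity, int rage, int sexdrive)
{
    int total = laziness + activity + rage + sexdrive;
    int r = rand() % total;              // point on the wheel, 0..total-1
    if ((r -= laziness) < 0) return IDLE;
    if ((r -= activity) < 0) return MOVE;
    if ((r -= rage)     < 0) return KILL;
    return REPRODUCE;                    // whatever slice remains
}
```

With this scheme an organism whose sexdrive slice dominates the wheel reproduces in most steps, which is the behavior the Results section later observes emerging through selection.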
At its most basic level, the program written to model microevolution is an agent-environment program. The agents, or members of the Organism class, contain a genome and have abilities that are dependent upon the genome. Here is the declaration of the Organism class:

class Organism
{
public:
    Organism();    //constructors
    Organism(int ident, int row2, int col2);
    Organism(Nucleotide* mDNA, Nucleotide* dDNA, int ident, bool malefemale, int row2, int col2);
    ~Organism();   //destructor
    void printGenome();
    void meiosis(Nucleotide* gamete);
    Organism* reproduce(Organism* mate, int ident, int r, int c);
    int Interact(Organism* neighbors, int nlen);
    ...
    //assigns a gene a numeric value
    int Laziness();    //accessor functions
    int Rage();
    int SexDrive();
    int Activity();
    int DeathRate();
    int ClausIndex();
    int Age();
    int Speed();
    int Row();
    int Col();
    int PIN();
    bool Interacted();
    bool Gender();
    void setPos(int row2, int col2);
    void setInteracted(bool interacted);
private:
    void randSpawn(Nucleotide* DNA, int size);    //randomly generates a genome
    Nucleotide *mom, *dad;    //genome
    int ID, row, col, laziness, rage, sexdrive, activity, deathrate, clausindex, speed;    //personal characteristics
    double age;
    bool male, doneStuff;
    ...
The agents are managed by the environment class, known as Ecosystem. The Ecosystem contains a matrix of Organisms. Here is the declaration of the Ecosystem class:

class Ecosystem
{
public:
    Ecosystem();    //constructors
    Ecosystem(double oseed);
    ~Ecosystem();   //destructor
    void Run(int steps);    //the simulation
    void printMap();
    void print(int r, int c);
    void surrSpaces(Organism* neighbors, int r, int c, int &friends);    //the neighbors of any cell
private:
    Organism ** Population;    //the matrix of Organisms
};
The simulation runs for a predetermined number of steps within the Ecosystem class. During every step of the simulation, the environment class cycles through the matrix of agents, telling each one to interact with its neighbors. To aid in the interaction, the environment sends the agent an array of the neighbors that it can affect. Once the agent has changed (or not changed) the array of neighbors, it sends the array back to the environment, which then updates the matrix of agents. Here is the code for the Organism's function that enables it to interact with its neighbors:
int Organism::Interact(Organism* neighbors, int nlen)
//returns 0 if the organism hasn't moved & 1 if it has
{
    fout << row << " " << col << " ";
    if(!ID)    //This Organism is not an organism
    {
        fout << "Not an organism, cannot interact!" << endl;
        return 0;
    }
    if(doneStuff)    //This Organism has already interacted once this step
    {
        fout << "This organism has already interacted!" << endl;
        return 0;
    }
    doneStuff = true;
    int loop;
    for(loop = 0; loop < GENES * CHROMOSOMES * GENE_LENGTH; loop++)
    {
        if(rand() % RATE_MAX < MUTATION_RATE)
            mom[loop] = (Nucleotide)(rand() % 4);
        if(rand() % RATE_MAX < MUTATION_RATE)
            ...
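The environment's side of this exchange (the Run loop that visits each cell and drives the Interact calls) is not shown in the excerpt. Schematically, with a simplified Cell standing in for Organism, one simulation step might look like the following sketch; it is an assumption about the structure, not the project's actual code:

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for an Organism in the grid; id == 0 means an
// empty space, and acted mirrors the doneStuff flag above.
struct Cell { int id; bool acted; };

// One schematic simulation step: visit every occupied cell once, let it
// act on its neighborhood, then clear the per-step flags.
void step(std::vector<std::vector<Cell> >& grid)
{
    for (std::size_t r = 0; r < grid.size(); r++)
        for (std::size_t c = 0; c < grid[r].size(); c++) {
            Cell& cur = grid[r][c];
            if (cur.id == 0 || cur.acted)
                continue;        // empty space, or already interacted
            cur.acted = true;    // each agent acts at most once per step
            // ... gather the adjacent cells, pass them to the occupant's
            // Interact(), and write any moves/kills/births back to grid
        }
    for (std::size_t r = 0; r < grid.size(); r++)    // reset for next step
        for (std::size_t c = 0; c < grid[r].size(); c++)
            grid[r][c].acted = false;
}
```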
The Organisms, during any simulation step, can either move, kill a neighbor, remain idle, reproduce, or die. The fourth option, reproduction, is the most relevant to the project. As explained before, organisms that are better at reproducing or surviving will pass their genes to future generations. The most critical function in reproduction is the meiosis function, which determines what traits are passed down to offspring. The process is completely random, but an organism with a "good" gene has about a 50% chance of passing that gene on to its child. Here is the meiosis function, which determines what genes each organism sends to its offspring:

void Organism::meiosis(Nucleotide *gamete)
{
    int x, genect, chromct, crossover;
    Nucleotide * chromo = new Nucleotide[GENES * GENE_LENGTH], *chromo2 = new Nucleotide[GENES * GENE_LENGTH];
    Nucleotide * gene = new Nucleotide[GENE_LENGTH], *gene2 = new Nucleotide[GENE_LENGTH];
    ... (more code)
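The body of meiosis is elided above. Following the Background description (whole genes swap, split at a single random point), the crossing-over it performs on one chromosome could be sketched as below; the types are illustrative ints rather than the project's Nucleotide arrays:

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Single-point crossing over for one chromosome: the gamete takes the
// maternal genes before a random crossover point and the paternal genes
// after it. Each gene therefore has roughly a 50% chance of coming from
// either parent, matching the "about 50%" figure in the text.
std::vector<int> crossOver(const std::vector<int>& mom,
                           const std::vector<int>& dad)
{
    int n = (int)mom.size();
    int cross = rand() % (n + 1);    // crossover point, 0..n inclusive
    std::vector<int> gamete(n);
    for (int i = 0; i < n; i++)
        gamete[i] = (i < cross) ? mom[i] : dad[i];
    return gamete;
}
```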
The functions and structures above are the most essential to the running of the program and the actual study of microevolution. At the end of each simulation step, the environment class records the statistics for the agents in the matrix and puts the numbers into a spreadsheet for analysis. The spreadsheet can be used to observe trends in the mean characteristics of the system over time. Using the spreadsheet created by the environment class, I was able to create charts that would help me analyze the evolution of the Organisms over the course of the simulation.
The first time I ran the simulation, I set the program so that there was no mutation in the agents' genomes. Genes were strictly created at the outset of the program, and those genes were passed down to future generations. If microevolution were to take place, a gene that coded for a beneficial characteristic would have a higher chance of being passed down to a later generation. Without mutation, if one organism possessed a characteristic that was far superior to the comparable characteristics of other organisms, that gene should theoretically allow that organism to "dominate" the other organisms and pass its genetic material to many children, in effect exterminating the genes that code for less beneficial characteristics.
For example, if an organism was created that had a 95% chance of reproducing in a given simulation step, it would quickly pass its genetic material to a lot of offspring, until its gene was the only one left coding for reproductive tendency, or libido.
As you can see from Figure 1, the average tendency to reproduce increases during the simulation. The tendency to die decreases to almost nonexistence. The tendency to remain still, since it has virtually no effect on anything, stays almost constant. The tendency to move to adjacent spaces, thereby spreading one's genes throughout the ecosystem, increases to be almost as likely as reproduction. The tendency to kill one's neighbor decreases drastically, probably because it does not positively benefit the murdering organism. In Figure 2, we can see that the population seems to stabilize at about the same time as the average characteristics. This would suggest that there was a large amount of competition among the organisms early in the simulation, but the competition quieted down as one dominant set of genes took over the ecosystem.
Figure 4

These figures show the results from the second run of the program, when mutation was turned on. As you can see, many of the same trends exist, with reproductive tendency skyrocketing and tendency to kill plummeting. Upon reevaluation, it seems that perhaps the tendencies to move and remain idle do not really affect an agent's ability to survive, and thus their trends are more subject to fluctuations that occur in the beginning of the simulation. One thing to note about the mutation simulation is the larger degree of fluctuation in both characteristics and population. The population stabilizes at about the same number, but swings between simulation steps are more pronounced. In Figure 3, the stabilization that had occurred in Figure 1 is largely not present.
Conclusion

The goal of this project at the outset was to create a system that modeled trends and processes from the natural world, using the same mechanisms that occur in that natural world. While this project by no means definitively proves the correctness of Darwin's theory of evolution over the creationist theory, it demonstrates some of the basic principles that Darwin addressed in his book, The Origin of Species. Darwin addresses two distinct processes - natural selection and artificial selection. Artificial selection, or selective breeding, was not present in this project at all. There was no point in the program where the user was allowed to pick organisms that survived. Natural selection, though it is a stretch because nature was the inside of a computer, was simulated. Natural selection, described as the "survival of the fittest," is when an organism's characteristics enable it to survive and pass those traits to its offspring.
In this program, "nature" was allowed to run its course, and at the end of the simulation, the organisms with the best combination of characteristics had triumphed over their predecessors. "Natural" selection occurred as predicted. *All of the information in this report was either taught last year in A.P. Biology or drawn, to a small degree, from Charles Darwin's The Origin of Species. I created all of the code and all of the charts in this paper. For my next draft, I will be sure to include more outside information that I have found in the course of my research.*
Using Machine Translation in a German-English Translator

This project attempts to take the beginning steps towards the goal of creating a translator program that operates within a limited scope to translate between English and German.
Natural Language Processing: Using Machine Translation in Creation of a German-English Translator
Jason Ji

Abstract: The field of machine translation - using computers to provide translations between human languages - has been around for decades. And the dream of an ideal machine providing a perfect translation between languages has been around still longer. This project attempts to take the beginning steps towards that goal, creating a translator program that operates within an extremely limited scope to translate between English and German. There are several different strategies to machine translation, and this project will look into them - but the strategy taken in this project will be the researcher's own, with the general guideline of "thinking as a human."
For if humans can translate between languages, there must be something to how we do it, and hopefully that something - that thought process - can be transferred to the machine and provide quality translations.

Background

There are several methods of varying difficulty and success to machine translation. The best method to use depends on what sort of system is being created. A bilingual system translates between one pair of languages; a multilingual system translates between more than two languages.
The easiest translation method to code, yet probably the least successful, is known as the direct approach. The direct approach does what it sounds like it does: it takes the input language (known as the "source language"), performs morphological analysis - whereby words are broken down and analyzed for things such as prefixes and past-tense endings - performs a bilingual dictionary look-up to determine the words' meanings in the target language, performs a local reordering to fit the grammar structure of the target language, and produces the target language output. The problem with this approach is that it is essentially a word-for-word translation with some reordering, resulting often in mistranslations and incorrect grammar structures.
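To make the weakness concrete, here is a toy translator in the direct style; the dictionary entries and function name are invented for this sketch, and no morphological analysis or reordering is attempted, which is precisely what the approach gets wrong:

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Word-for-word "direct approach": split on spaces, look each word up in
// a bilingual dictionary, and pass unknown words through unchanged.
std::string directTranslate(const std::string& english,
                            const std::map<std::string, std::string>& dict)
{
    std::istringstream in(english);
    std::string word, out;
    while (in >> word) {
        std::map<std::string, std::string>::const_iterator it = dict.find(word);
        if (!out.empty())
            out += " ";
        out += (it != dict.end()) ? it->second : word;  // unknown: keep as-is
    }
    return out;
}
```

Even when every word is in the dictionary, the output inherits English word order and ignores German case and gender, which is why the indirect approaches below were developed.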
Furthermore, when creating a multilingual system, the direct approach would require several different translation algorithms - one or two for each language pair. The indirect approach involves some sort of intermediate representation of the source language before translating into the target language. In this way, linguistic analysis of the source language can be performed on the intermediate representation. Translating to the intermediary also enables semantic analysis, as the source language input can be analyzed more carefully to detect idioms, etc., which can be stored in the intermediary and then appropriately used to translate into the target language.
The transfer method is similar, except that the transfer is language-dependent - that is to say, the French-English intermediary transfer would be different from the English-German transfer. An interlingua intermediary can be used for multilingual systems.

Theory

Humans fluent in two or more languages are at the moment better translators than the best machine translators in the world. Indeed, a person with three years of experience in learning a second language will already be a better translator than the best machine translators in the world as well.
Yet for humans and machines alike, translation is a process, a series of steps that must be followed in order to produce a successful translation. It is interesting to note, however, that the various methods of translation for machines - the various processes - become less and less like the process for humans as they become more complicated. Furthermore, it was interesting to notice that as the method of machine translation becomes more complicated, the results are sometimes less accurate than the results of simpler methods that better model the human rationale for translation.
Therefore, the theory is, an algorithm that attempts to model the human translation process would be more successful than other, more complicated methods currently in development today. This theory is not entirely plausible for full-scale translators because of the sheer magnitude of data that would be required. Humans are better translators than computers in part because they have the ability to perform semantic analysis: they have the necessary semantic information to, for example, determine the difference in a word's definition based on its usage in context. Creating a translator with a limited scope of vocabulary would require less data, leaving more room for semantic information to be stored along with definitions.
A limited-scope translator may seem of little use at first glance, but even humans fluent in any language, including their native language, don't know the entire vocabulary of the language. A language has hundreds of thousands of words, and no human knows even half of them. A computer with a vocabulary of commonly used words that most people know, along with information to avoid semantic problems, would therefore still be useful for nonprofessional work.

Development

On the most superficial level, a translator is more user-friendly for an average person if it is GUI-based, rather than simply text-based. This part of the development is finished. The program presents a GUI for the user.
A JFrame opens up with two text areas and a translate button. The text areas are labeled "English" and "German". The input text is typed into the English window, the "Translate" button is clicked, and the translator, once finished, outputs the translated text into the German text area. Although typing into the German text area is possible, the text in the German text area does not affect the translator process. The first problem to deal with in creating a machine translator is to be able to recognize the words that are inputted into the system. A sentence or multiple sentences are input into the translator, and a string consisting of that entire sentence (or sentences) is passed to the translate() function.
The system loops through the string, finding all space (' ') characters and punctuation characters (comma, period, etc.) and records their positions. (It is important to note the position of each punctuation mark, as well as what kind of a punctuation mark it is, because the existence and position of punctuation marks alter the meaning of a sentence.) The number of words in the sentence is determined to be the number of spaces plus one. By recording the position of each space, the string can then be broken up into the words. The start position of each word is the position of each space, plus one, and the end position is the position of the next space. This means that punctuation at the end of any given word is placed into the String with that word, but this is not a problem: the location of each punctuation mark is already recorded, and the dictionary look-up of each word will first check to ensure that the last character of each word is a letter; if not, it will simply disregard the last character.

The next problem is the biggest problem of all: the problem of actual translation itself. Here there is no code yet written, but development of pseudocode has begun already. As previously mentioned, translation is a process. In order to write a translator program that follows the human translation process, the human process must first be recognized and broken down into programmable steps. This is no easy task. Humans with five years of experience
in learning a language may already translate any given text quickly enough - save the time spent looking up unfamiliar words - that the process goes by too quickly to fully take note of. The basic process is not entirely determined yet, but there is some progress on it. The process to determine the process has been as follows: given a random sentence to translate, the sentence is first translated by a human, then the process is noted. Each sentence given has ever-increasing difficulty to translate.
For example, the sentence "I ate an apple" is translated via the following process:
1) Find the subject and the verb. (I; ate)
2) Determine the tense and form of the verb. (ate = past, Imperfekt form)
2a) Translate the subject and verb. (Ich; ass) (note - "ass" is a real German verb form.)
3) Determine what the verb requires. (ate -> eat; requires a direct object)
4) Find what the verb requires in the sentence. (direct object comes after verb and article; apple)
5) Translate the article and the direct object. (ein; Apfel)
6) Consider the gender of the direct object, and change the article if necessary. (der Apfel; ein -> einen)
Ich ass einen Apfel.
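Hard-coding those six steps for this one sentence shows the shape the pseudocode is aiming at. Every table entry below is supplied by hand for illustration; a real implementation would need a lexicon, verb morphology, and gender/case rules:

```cpp
#include <cassert>
#include <string>

// The six translation steps above, fixed to the sentence "I ate an
// apple". Each assignment corresponds to one step of the process.
std::string translateExample()
{
    std::string subject = "Ich";    // steps 1, 2a: subject "I" -> "Ich"
    std::string verb    = "ass";    // step 2: "ate" = past, Imperfekt -> "ass"
    std::string object  = "Apfel";  // steps 3-5: direct object "apple" -> "Apfel"
    // step 6: "Apfel" is masculine (der), and a masculine direct object
    // takes the accusative article, so "ein" becomes "einen"
    std::string article = "einen";
    return subject + " " + verb + " " + article + " " + object + ".";
}
```

Generalizing each hard-coded assignment into a rule plus a dictionary look-up is exactly the remaining work described above.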
References
(I'll put these in proper bibliomumbo jumbographical order later!)
1. http://dict.leo.org (dictionary)
2. "An Introduction To Machine Translation" (available online at http://ourworld.compuserve.com/homepages/WJHutchins/IntroMT-TOC.htm)
3. http://www.comp.leeds.ac.uk/ugadmit/cogsci/spchlan/machtran.htm (some info on machine translation)
A Study of Balanced Search Trees

This project investigates four different balanced search trees for their advantages and disadvantages, and thus ultimately their efficiency. Runtime and memory space management are the two main aspects under study. Statistical analysis is provided to distinguish subtle differences if there are any. A new balanced search tree is suggested and compared with the four balanced search trees.