PDA

View Full Version : Programming Exercise #3

Tolnaftate2004
03-26-2008, 02:44 AM
In a string of N characters, calculate the frequencies of repeated patterns, e.g., given "AUGCCCGTAUACGTA", one would get CGTA:2, CGT:2, GTA:2, CG:2, GT:2, TA:2, AU:2, CC:2. You do not need to display the information, only get the data in a format from which it can be extracted (hopefully in a pattern that is of order less than the algorithm). You may assume that only the letters A-Z will be used to make the given string.

It may help to review Programming Exercises #1 (http://forums.graalonline.com/forums/showthread.php?t=77627) and #2 (http://forums.graalonline.com/forums/showthread.php?t=78351).

Remember, the goal is to make this as efficient as possible with graal script.
Your algorithm will be ranked first by big-O notation, then by runtime speed.

cbk1994
03-26-2008, 02:52 AM
Do we need to find every pattern possible, or just ones specified?

For example,

findAllPatterns( n )

or something like

findNumber( n, pattern )

DrakilorP2P
03-26-2008, 03:25 AM
Assuming that the algorithm should find the number of occurrences of a string in another string. Which it shouldn't (http://forums.graalonline.com/forums/showpost.php?p=1382434&postcount=5).

function findFrequencyOfPattern(haystack, needle)
{
temp.occurrences = 0;
for (temp.i=0; temp.i<haystack.length() - needle.length() + 1; temp.i++) {
if (haystack.substring(temp.i, needle.length()) == needle) temp.occurrences++;
}
return temp.occurrences;
}

temp.haystack = "AUGCCCGTAUACGTA";
temp.needles = {"CGTA", "CGT", "GTA", "CG", "GT", "TA", "AU", "CC"};
sendrpgmessage(temp.haystack);
for (temp.needle: temp.needles) {
sendrpgmessage(temp.needle @ ": " @ findFrequencyOfPattern(temp.haystack, temp.needle));
}
Outputs:
AUGCCCGTAUACGTA
CGTA: 2
CGT: 2
GTA: 2
CG: 2
GT: 2
TA: 2
AU: 2
CC: 2

cbk1994
03-26-2008, 05:05 AM
Okay, I have mine finished. It takes about 5 seconds to run (go figure!) but it gets the right answer, at least from what I have seen.

String: AU, Occurances: 2
String: CC, Occurances: 2
String: CG, Occurances: 2
String: CGT, Occurances: 2
String: CGTA, Occurances: 2
String: GT, Occurances: 2
String: GTA, Occurances: 2
String: TA, Occurances: 2

This is what it outputs in RC.

The script is very inefficient and slow, but it works ...

NOTE: You will need to change the [] in the format to a percent sign, it won't let me post without giving me a bad request error. Look for this line:
echo( format( "String: []s, Occurances: []s", rc[0], rc[1] ) );

function onCreated()
{
temp.includeSingle = false; // Include ones that only appear once?

temp.n = "AUGCCCGTAUACGTA";
temp.na = stringToArrayList( n );

temp.occ = {};

for ( temp.r : na )
{
temp.fr = findFrequency( n, r );
if ( temp.fr == 1 && includeSingle == false )
{
continue;
}
occ.add( { r, temp.fr } );
}

for ( temp.rc : occ )
{
echo( format( "String: []s, Occurances: []s", rc[0], rc[1] ) );
}
}
function stringToArrayList( str )
{
temp.p = {};
for ( temp.char = 0; char < str.length(); char ++ )
{
for ( temp.charlen = 2; charlen < str.length() + 1 - char; charlen ++ )
{
temp.count ++;
temp.newText = str.substring( char, charlen );
if ( newText != str && p.index( @ newText ) == -1 )
{
}
if ( count >= 100 )
{
sleep( .05 );
count = 0;
}
}
}
return p;
}
function findFrequency( str, phrase )
{
temp.o = 0;
for ( temp.char = 0; char < str.length(); char ++ )
{
for ( temp.charlen = 1; charlen < str.length() + 1 - char; charlen ++ )
{
temp.count ++;
temp.newText = str.substring( char, charlen );
if ( newText != str )
{
if ( newText == phrase )
{
o ++;
}
}
if ( count >= 100 )
{
sleep( .05 );
count = 0;
}
}
}
return o;
}

This looks for all possible combinations, then finds how often they occur.

Also, if a mod wants to delete my last post in this thread they can, it won't let me edit it or delete it anymore for some reason, gives a Bad Request error.

Tolnaftate2004
03-26-2008, 07:34 AM
Assuming that the algorithm should find the number of occurrences of a string in another string.

Nooooooot quite.

Okay, I have mine finished. It takes about 5 seconds to run (go figure!) but it gets the right answer, at least from what I have seen.

This is closer (:)), but no doubt we can do better. ~O(n³)

Inverness
03-26-2008, 08:16 AM
function findpatterns(str) {
temp.outc = {};
temp.out = {};
temp.sub = "";
temp.p = 0;
temp.i = 0;
temp.e = 0;

for (i = 2; i < str.length(); i ++) {
for (e = 0; e < str.length(); e ++) {
if (e + i > str.length() - 1) {
continue;
}
sub = str.substring(e, i);
p = str.positions(sub);
if (p.size() > 1) {
if (outc.index(sub) < 0) {
}
}
}
}
return out;
}

Tolnaftate2004
03-26-2008, 09:13 AM
I have updated the guidelines slightly.

DrakilorP2P
03-26-2008, 09:59 AM
I have updated the guidelines slightly.

Why isn't strings of length one considered valid patterns?

Tolnaftate2004
03-26-2008, 12:11 PM
Why isn't strings of length one considered valid patterns?
A solid color does not a pattern make.

Inverness
03-26-2008, 01:28 PM
No comment on what I posted?

You've irritated me.

xXziroXx
03-26-2008, 02:05 PM
How about making Programming Exercise #4 something that we normal mortals stand a chance at? :whatever:

DrakilorP2P
03-26-2008, 05:13 PM
Not much to comment. The code and output dump should speak for themselves.
function findPatterns(theString)
{
sendrpgmessage("String: \"" @ theString @ "\"");
sendrpgmessage("Init at approximately " @ timevar2);
temp.oldTime = timevar2;

temp.patterns = {};
for (temp.length=2; temp.length<temp.theString.length(); temp.length++) {
for (temp.position=0; temp.position<temp.theString.length() - temp.length + 1; temp.position++) {
temp.pattern = temp.theString.substring(temp.position, temp.length);
temp.("pattern_" @ temp.pattern)++;

//Makes iteration through the variables much easier.
if (temp.("pattern_" @ temp.pattern) == 1) temp.patterns.add(temp.pattern);
}
}

//Output
temp.output = {};
for (temp.pattern: temp.patterns) {
if (temp.("pattern_" @ temp.pattern) > 1) {
sendrpgmessage(temp.pattern @ ": " @ temp.("pattern_" @ temp.pattern));
}
}

temp.deltaTime = timevar2 - temp.oldTime;
sendrpgmessage("Finished at " @ timevar2 @ "\nDelta time is " @ temp.deltaTime);

return temp.output;
}
String: "AUGCCCGTAUACGTA"
Init at approximately 1206544145.230022907
AU: 2
CC: 2
CG: 2
GT: 2
TA: 2
CGT: 2
GTA: 2
CGTA: 2
Finished at 1206544145.231662034
Delta time is 0.001600027

Crow
03-26-2008, 05:26 PM
NOTE: You will need to change the [] in the format to a percent sign, it won't let me post without giving me a bad request error. Look for this line:
echo( format( "String: []s, Occurances: []s", rc[0], rc[1] ) );

Tiny tip: If you post a "normal" post without any % signs, you can just edit it with quick edit afterwards and add them. Won't give an error.

Tolnaftate2004
03-26-2008, 07:26 PM
No comment on what I posted?

You've irritated me.
Yours is still O(n³).

e: The new standard set by DrakilorP2P, O(n²)
@Ziro: Will keep that in mind. ;)

Inverness
03-26-2008, 09:40 PM
Yours is still O(n³).

e: The new standard set by DrakilorP2P, O(n²)
What the hell are you talking about?

Increased speed of script:
function findpatterns(str) {
temp.time = timevar2;
temp.outc = {};
temp.out = {};
for (temp.i = 2; temp.i < temp.str.length(); temp.i ++) {
for (temp.e = 0; temp.e < temp.str.length(); temp.e ++) {
if (temp.e + temp.i > temp.str.length() - 1) {
continue;
}
temp.sub = temp.str.substring(temp.e, temp.i);
temp.p = temp.str.positions(temp.sub);
if (temp.p.size() > 1) {
if (temp.outc.index(temp.sub) < 0) {
}
}
}
}
echo("Time: " @ timevar2 - temp.time);
return temp.out;
}

DustyPorViva
03-26-2008, 09:43 PM
Dunno but it seems to show up in every single one of these exercises :/

Chompy
03-26-2008, 09:44 PM
Dunno but it seems to show up in every single one of these exercises :/

*points at HR and pfa*

hehe

Tolnaftate2004
03-26-2008, 09:49 PM
What the hell are you talking about?

This is big O notation, it is how we measure the efficiency of algorithms, without being hardware-specific. The O stands for order, which can take 2 meanings:
In mathematics, O(f) is for all terms order f and above. f is usually a function where the parameter is much less than 1 (error terms).
In computer science O(f) is the major factor in determining algorithm efficiency. f is a function of n, which is a whole number.

For example, the quicksort algorithm is O(n log n) [for every item to sort, it takes log base 2 of n loops to find its place].

cbk1994
03-26-2008, 10:18 PM
How about making Programming Exercise #4 something that we normal mortals stand a chance at? :whatever:

This isn't really that hard, you could do this! Think about it ;)

EDIT:

I fixed mine, it wins now. It goes something like this:

function onCreated()
{
echo( "time going in: 12412412412.1241023" );
findpatterns();
echo( "finished: 12412412412.124104" );
echo( "elapsed time: .000001" );
}

Inverness
03-26-2008, 10:26 PM
Well, after having this explained to me I'm kinda ticked that this wasn't even pointed out from the start.

If I were to optimize my script I would have used Drakilor's method which for me to do now would just be copying.

Chompy
03-27-2008, 12:02 AM
function findpatterns(string) {
temp.time = timevar2;
temp.out = 0;
temp.check = 0;

for (temp.i = 2; i < string.length()/2; i ++) {
for (temp.j = 0; j < string.length()/2; j ++) {
if (i+j+2 > string.length()-2) continue;
temp.letters = temp.string.substring(j, i);

if (check.index(letters) > -1) continue;
if (letters.length() < 2) continue;

temp.pos = string.positions(letters);
if (pos.size() < 2) continue;

}
}
echo("Delta Time: "@ timevar2-time);
return out;
}

Example1:

function onCreated() {
echo(findpatterns("TAPEREDATAPER"));
}
/*
output:

Delta Time: 0.000427007
"TA,2","AP,2","PE,2","ER,2","TAP,2","APE,2","PER,2","TAPE,2","APER,2","TAPER,2"
*/

Example2:

function onCreated() {
echo(findpatterns("AUGCCCGTAUACGTA"));
}
/*
output:

Delta Time: 0.000540971
"AU,2","CC,2","CG,2","GT,2","TA,2","CGT,2","GTA,2","CGTA,2"
*/

DrakilorP2P
03-27-2008, 01:36 AM
for (temp.i = 2; i < string.length()/2; i ++) {
Won't that miss the "AAAAAAAAA" pattern that occurs twice in "AAAAAAAAAA"?

Tolnaftate2004
03-27-2008, 02:04 AM
Well, after having this explained to me I'm kinda ticked that this wasn't even pointed out from the start.

This has always been the point of the Programming Exercises. Perhaps you need to think differently to solve this problem by better methods, rather than the obvious...

@Chris, you started your algorithm 355 years into the future? Also you fail at subtracting. :p

e: Here are exact (well, kinda) loop count functions for the algorithms (n=str.length()):
Inverness: http://img258.imageshack.us/img258/2690/d44f542b6de3d5d2b9bf3c1nt8.png
Where epsilon (the E) is a number representative of the other loops that lead to 'continue;'. The script could be adjusted slightly to eliminate this term.
DrakilorP2P: http://img87.imageshack.us/img87/3536/1c74ccc9db9c8d0566d612ekl4.png
It should be pretty obvious where I get the O(n^2) from...

cbk1994
03-27-2008, 03:39 AM
Chris, you started your algorithm 355 years into the future? Also you fail at subtracting. :p

PSHAW

Inverness
03-27-2008, 05:17 AM
This has always been the point of the Programming Exercises.Very nice of you GSTs to give programming exercises without explaining the point.

You also have yet to point out to others what functions they should be avoiding and why.
Perhaps you need to think differently to solve this problem by better methods, rather than the obvious...Like I've said, if your ranking system/method of measuring had been pointed out in the first place then maybe I would have made a script thats better on your ranking system instead of on Graal's.

You need to come down to the level of us mortals.

Tolnaftate2004
03-27-2008, 05:34 AM
Very nice of you GSTs to give programming exercises without explaining the point.

So, there is a lot of help around here dealing with syntax of scripting, but I don't see much discussion on best algorithms. Be this the case, I am going to every so often (when I feel like it) post some mathematically cool problem for you all to help each other solve and break some mental barriers.

...
Remember, the goal is to make this as efficient as possible with graal script.

You also have yet to point out to others what functions they should be avoiding and why.
Do you want me to rank them or tell you how efficient I think they are? x-x

Like I've said, if your ranking system/method of measuring had been pointed out in the first place then maybe I would have made a script thats better on your ranking system instead of on Graal's.

The theory behind big-O notation is that it should be an accurate measure of code efficiency regardless of language or hardware. Different servers run the same code in different time. Additionally, though big-O is only an approximation, it is most accurate for LARGE n.

I don't see what you're so upset about, you made a good contribution.

cbk1994
03-27-2008, 05:45 AM
I bet I could beat my previous time of 12 seconds by triggering my web server which would use awesome fast PHP to find patterns, then return them to Graal.

Slower than some, but surely faster than 12 seconds?

Inverness
03-27-2008, 06:41 AM
I'm upset because you're evaluating based on this big-O thing but didn't state that in the first place and explain it.

Anyhow, if the goal is "to make this as efficient as possible with graal script" then you should be measuring based on time to complete instead of your big-O thing since the hard-coded functions do have a notable speed difference.

DustyPorViva
03-27-2008, 06:46 AM
If it's going to be based on time they would all have to be done on one computer instead of us each executing them as our computer speeds can change the time.

Inverness
03-27-2008, 06:59 AM
If it's going to be based on time they would all have to be done on one computer instead of us each executing them as our computer speeds can change the time.I was assuming pfa would do that?

And could you "pfa" get yourself a new account or put the proper way to address you in your signature or something? I can't go calling you Tolnaftate2004 or a shortened version of it and I'm not sure if I should be calling you "pfa" either and didn't even know about that until the announcement that you became GST.

Tolnaftate2004
03-27-2008, 07:47 AM
Anyhow, if the goal is "to make this as efficient as possible with graal script" then you should be measuring based on time to complete instead of your big-O thing since the hard-coded functions do have a notable speed difference.

The hard-coded for-loop is some multiple constant faster than the graal script for-loop. This does not change the big-O.

If I pass a large enough string, an O(n^2) algorithm will always be faster than an O(n^3).

Here are some hints:
linear search: searching by iteration.
obj.index, obj.pos, obj.positions are most likely using linear search. What this means is that there is a 'hidden' for loop within script that use these functions. If N is obj.length() or obj.size(), obj.positions takes about N loops to find all occurences of your substring. obj.pos and obj.index on average take N/2 loops.

The 'in' operator when used with arrays is also using linear search (most likely). This again takes on average N/2 loops. Other uses of it are likely O(1).
obj.remove uses linear search as well.

Maybe more later... ;x

Chompy
03-27-2008, 06:34 PM
Won't that miss the "AAAAAAAAA" pattern that occurs twice in "AAAAAAAAAA"?

bah, didn't think about that :o

Inverness
03-27-2008, 09:04 PM
I bet I could beat my previous time of 12 seconds by triggering my web server which would use awesome fast PHP to find patterns, then return them to Graal.

Slower than some, but surely faster than 12 seconds?You need to like completely redo your script. My thing runs in 0.00138 seconds or so on average.

Tolnaftate2004
03-27-2008, 09:44 PM
You need to like completely redo your script.

Yes, this is why I did not bother calculating the loops in your script. :p

e: Just made the realization that obj.substring() also contains a hidden for loop. Which makes all big-O calculations I did wrong (you're all 1 order higher). :3

napo_p2p
03-27-2008, 10:25 PM
function findpatterns(str) {
temp.loc = 0;
temp.l = temp.str.length();
temp.results = new[0];

while (temp.loc < temp.l) {
temp.len = 2;
while (temp.len <= (temp.l - temp.loc)) {
temp.seq = temp.str.substring(temp.loc, temp.len);
temp.("count_" @ temp.seq)++;
if (temp.("count_" @ temp.seq) == 2) {
}
temp.len++;
}
temp.loc++;
}

for (temp.r: temp.results) {
echo(temp.r @ ": " @ temp.("count_" @ temp.r));
}
}

cbk1994
03-27-2008, 10:52 PM
You need to like completely redo your script. My thing runs in 0.00138 seconds or so on average.

I know, I'm not going to bother though since I couldn't beat anyone.

Inverness
03-27-2008, 11:00 PM
(you're all 1 order higher). :3So are you.

DustyPorViva
03-27-2008, 11:16 PM
function findpatterns(str) {
instances=oompaloompa(str);
return instances;
}

Tolnaftate2004
03-27-2008, 11:21 PM
So are you.

Now that's just uncalled for. :cry:

cbk1994
03-28-2008, 02:28 AM
function findpatterns(str) {
instances=oompaloompa(str);
return instances;
}

You so win.

zokemon
03-28-2008, 05:32 PM
You guys seriously having this much trouble? It's not that hard... Just boring; which is why I won't do it!

Also,
It's funny how many people here don't know what big-O is.

Tolnaftate2004
03-28-2008, 08:20 PM
You guys seriously having this much trouble? It's not that hard... Just boring; which is why I won't do it!

Also,
It's funny how many people here don't know what big-O is.

Give us pseudocode, then.

Rapidwolve24
03-28-2008, 08:55 PM
how many people here don't know what big-O is.

*Raises hand in embarrasment*

xXziroXx
03-29-2008, 08:01 AM
*Raises hand in embarrasment*

x2

Horrified
04-06-2008, 02:29 PM
Pffft!

3/0

Dumb noobs.

Chompy
04-06-2008, 02:35 PM
Pffft!

3/0

Dumb noobs.

o.o hmm?

oh, and when is exercise 4 coming? And could 4 have two difficults? One simplier and one harder?

04-11-2008, 01:59 PM
Give us pseudocode, then.

I got it working O(1) but forget sharing with you guys!

the previous is a lie but I'm tempted to make something ridiculous like O(n!) and declare it O(1) and laugh @ noobs who believe it

Tolnaftate2004
04-22-2008, 09:53 AM
Okay, well this thread has been dead for sufficiently long time. Here is what I came up with; it takes quadratic time.

function onCreated() {
temp.p = getpatterns("THISISATESTSTRING");
for (temp.s: temp.p) {
if (this.(@"h_"@temp.s) >= 2)
echo(temp.s @ ": "@ this.(@"h_"@temp.s));
this.(@"h_"@temp.s) = 0; /* clean-up */
}
}

function getpatterns(s) {
temp.l = s.length();
for (temp.i=0; temp.i<temp.l-1; temp.i++) {
temp.c = s.charat(temp.i);
for (temp.j=temp.i+1;temp.j<temp.l;temp.j++) {
temp.c @= s.charat(temp.j);
this.(@"h_"@temp.c) ++;
}
}
return temp.pat;
}

Perhaps there are still improvements to be made.

Inverness
04-22-2008, 10:35 AM
Maybe try doing it in a language that doesn't allow you to dynamically create new variables.

Tolnaftate2004
04-22-2008, 06:35 PM
Maybe try doing it in a language that doesn't allow you to dynamically create new variables.

So I would use a hash, and it would still run in quadratic time. Also, the instructions say we're making this as efficient as possible in graal script.

cbk1994
04-22-2008, 10:04 PM
Okay, well this thread has been dead for sufficiently long time. Here is what I came up with; it takes quadratic time.

function onCreated() {
temp.p = getpatterns("THISISATESTSTRING");
for (temp.s: temp.p) {
if (this.(@"h_"@temp.s) >= 2)
echo(temp.s @ ": "@ this.(@"h_"@temp.s));
this.(@"h_"@temp.s) = 0; /* clean-up */
}
}

function getpatterns(s) {
temp.l = s.length();
temp.pat.clear();
for (temp.i=0; temp.i<temp.l-1; temp.i++) {
temp.c = s.charat(temp.i);
for (temp.j=temp.i+1;temp.j<temp.l;temp.j++) {
temp.c @= s.charat(temp.j);
this.(@"h_"@temp.c) ++;
}
}
return temp.pat;
}

Perhaps there are still improvements to be made.

There are always improvements to be made, unless Stefan makes it, in which case it must be perfect. :rolleyes:

Programmer
04-22-2008, 10:28 PM
There are always improvements to be made, unless Stefan makes it, in which case it must be perfect. :rolleyes:

Even Stefan makes mistakes. You should see Kingdoms Debug and Zone Debug xD

Tolnaftate2004
04-23-2008, 02:12 AM
There are always improvements to be made...

This can be disproved.

Inverness
04-23-2008, 07:28 AM
There are always improvements to be madeMaybe for you, considering your level of skill, but the same does not go for others. :D

cbk1994
04-23-2008, 01:08 PM
Maybe for you, considering your level of skill, but the same does not go for others. :D

Yes, I have so much skill, it cannot be controlled.

THE POWER OF THE SUN IN THE CHRIS.

Tolnaftate2004
04-23-2008, 11:56 PM
I think it is feasible to shrink time to ~ aN + O(N log N). How might we go about doing this? How do we get a logarithmic factor in our order? How can this help us speed up our algorithm?

This post (http://forums.graalonline.com/forums/showpost.php?p=1382741&postcount=34) points out a pitfall in computing the algorithm efficiency. The questions above and this problem are related. How can we minimize the effects of substring()?