Monday, August 15, 2011

Some information clarification concerning new changes to the program...

Hello all,

Well I've been doing some research and asking a lot of questions hoping to find a way to incorporate new changes to the program.  These changes cover inserting transportation tiles, transfering data back and forth from Basic to assembly, tracking movement from assembly within Basic, and being able to save data from the assembly program while in BASIC.

I've been conversing with Beth and the next few blog will show what has transpired through email between Beth and myself.

Enjoy...
____________________________________________________________________

Hi Beth,

I wanted to give you a update on what I've been doing and how the program works so far.
A short time ago I told you that I figured out the movement part for the game and it was a lot faster than before. Here is the YouTube link to the video I made for the second movement test.  It’s pretty cool!!!!!


As you can see it is faster than before but I know I can make it faster if I can only figure out the memory shift part of the code. I have an idea but I need to really sit and think about it to get it right.

The assembly code for this video works as follows:
1.  I have a subroutine that initializes the graphics on the screen and clears it to black, leaving the four text lines on the bottom.
2. The next subroutine initializes all the variables and the row and column values for where I want the upper left corner of the screen to start.
3. I setup data string names for the tile reference values that correspond to each map line, such as, MAP1, MAP2, etc...  These hold the tile values such as 3 for trees, 4 for plains, 9 for mountains, etc..  These are all set in a serial fashion with one tile reference after the other.  Since the map is 180 tiles wide I had to setup each tiles value one after the other till I had all 180 tiles referenced in each data MAP line.
4. Now I run a subroutine that reads down to the row value, which I start at MAP7, and then it read to a column value, which is a 10. This gets me the starting upper left corner tile reference point.
5.  Now I run a subroutine that loops the first 20 tile values in the data string and places them in a MAPSTORE array.  It then drops down to the next MAP data string and reads the next 20 tiles values and adds them to the array.  When all 200 tiles are in the array I run the next subroutine to add the center character tile, which I do by replacing the 105 tile with the shape reference of the troll.
6.  Next, I loop through each tile from 0 to 199 and setup the shape data for that individual tile. So I create two 160 byte arrays called TEMP and TEMP1.  These two arrays hold the data for the 10 tiles that go from top to bottom, starting on the left side of the screen.  They are 160 byte because they represent each graphic line from the top of the screen to the bottom of the screen.  This leave the lower portion of the screen for the 4 text lines.  Also I use two temp arrays because this lets me draw both A and B bytes of the tiles at one time instead of just the 1 byte at a time.
7.  Now I draw the first 10 tiles on the screen and when they are complete I read the next 10 tiles into the TEMP arrays and draw them next.  I loop this for the 20 tiles across the screen.
8.  Once all the tiles are drawn I have four subroutines that are used for the movement of UP, DOWN, LEFT, and RIGHT.  These are called from a basic program using their memory location with a CALL statement.
9. The movement is just incrementing or decrementing the ROW or COLUMN value and running through the initial subroutines for the read and setup parts.  I know I have it redrawing all 200 tiles again from scratch but I haven’t figured out how to do the 90% yet, which I know will be faster.  Within these subroutines I also do a comparison check for the mountain tiles and the ocean tiles to make sure I can’t walk through them!
This is a very quick and dirty write up of the process and I probably could code the game cleaner but I know you get the idea.  If you have any input as to the process I’m using and can think of anything that would make it faster or cleaner.
If was funny, because I had just figured out how to move around the screen after a LOT of trial and error programming and boy was I on cloud nine! You cannot believe how frustrating it is to change your program, wait 10 minutes to compile it, and then make changes to the basic program to call the correct memory location for the movement keys, and BAM… the program shows scattered tiles all over the screen! As you can imagine this is a very slow and grueling process!   Once I thought I was ready to start entering in all the data lines for the world map, I realized this was going to take up a lot of space!  I had to use a single byte for each tile reference, which as you said is a huge waste of space!!!  So due to this process I was having to enter line after line of data strings each map line starting with all ocean tiles on the top of the screen and going down map line by map line 20 data bytes per data string.  The entire map is made up of 4 main maps that are side by side and 180 tiles wide give or take a few ocean tiles.  Still on my programming HIGH from finally figuring out to movement process, I only got down to about the 24th MAP data line and it gives me a memory space error.  CRAP!  I can’t believe my eyes!  I’m thinking to myself, “yeah, I figured it out and now all I have to do is enter all the map data and test is out!”  WRONG!  I was still 60 lines away from just getting the entire map data entered!
So that brings me to the point I’m at right now which is reprogramming a big part of my code to be able to now read compressed data strings!  I think I understand how to do it and I’ve been testing some new code.  I know I need to use the ASL and RLS functions.  I also read that you can do each 4 times to get the 4 bit shift!
I want to thank you again for your wisdom and advice!  I like how you give me just the right information to get me going in the right direction yet you don’t just give me the code.  This works out great because your email updated are like puzzles that I need to figure out and I am learning more about programming assembly when I have to sit down and really think about what the program is doing and what I need to do to get it to work correctly.
God bless you!  You are an incredible help and I just hope you stick it out with me because I am going to have more questions as I overcome each breakthrough and move on to the next section of the game.
I will let you know when I figure out the new code to read the compressed data strings.  I’m close, very close and the amazing part is I think I really understand how to program in assembly language.  I’m able to code right on the screen and see in my head what the program should do once it is assembled.   Sometimes it does wacky things but more and more it does exactly what I thought it would do!  This is awesome!
Thanks again,
Joe
+++++++++++++++++++++++++++++++++++++++

Joe,

Nice video! It’s pretty cool to see the game running like that, brings back memories for me! J

And I’m glad I’ve been able to point you in the right directions to figure things out. It’s super-satisfying to crack your head against something, finally have the lightbulb go off and then see the results of your understanding, isn’t it? There are things I love about programming!

One of the great thing about the Apple 2 is just how accessible and understandable it all really is. And all of the concepts learned there are still tremendously relevant in programming today. I often feel bad when mentoring kids coming out of school, that have no idea of what bits really are. They don’t understand having to keep track of memory, limited resources, how to optimize or solve these types of puzzles. They have to learn that stuff on the job, and it’s hard to do. Having learned it in the trenches on an Apple…well…there’s not really a better learning environment, imho.

Another thought I’ll leave you with – optimization. It’s something you’re obviously thinking about. Compression is a form of size optimization. Making the screen draw faster is a form of speed optimization. Usually you can optimize for size or speed, but not both. Then there are algorithmic optimizations – like the 90% thing. That’s a high-level, write a different, faster approach to solve the entire problem. Those are always worth doing before trying anything else.

However, eventually you’ll get to the point where you’ve optimized for speed vs. size as best you can, and you’re working with the best algorithm you can. At that point if you want it faster, you’ll have to start counting cycles. I’m only saying this to put it in the back of your head, for your awareness. You may never need to go there on this project, but it would be educational for you to do it at some point with at least a little chunk of code. What I mean by counting cycles is you take your tight loops of assembly and figure out how many clock cycles that loop takes to execute. You can look that up, each opCode has a fixed cost.


So, looking there I see things like this:

LSR (Logical Shift Right)

Affects Flags: S Z C
MODE           SYNTAX       HEX LEN TIM
Accumulator   LSR A         $4A  1   2
Zero Page     LSR $44       $46  2   5
Zero Page,X   LSR $44,X     $56  2   6
Absolute      LSR $4400     $4E  3   6
Absolute,X    LSR $4400,X   $5E  3   7

See how shifting left on the Accumulator is WAY faster (TIM) than doing it anywhere else? Good to consider when writing that algorithm.

A quick example: how long does it take to divide by 16 (right shift 4 times) vs. dividing by 10? I am using this example since you asked about it earlier in regard to the compression algorithm. Well, you count up the opCodes. That LSR will take 4x2 = 8 cycles. Pretty good. However…there’s no way to divide in 6502. You have to write a program to do it. I looked one up:

A SIMPLE DIVISION PROGRAM

10 ;
20 ;DIVISION.SRC 
30 ;
40  *=$0600 
50 ;
60 DVDL=$C0 ;LOW PART OF DIVIDEND
70 DVDH=$C1 ;HIGH PART OF DIVIDEND 
80 QUOT=$C2 ;QUOTIENT
90 DIVS=$C3 DIVISOR
100 RMDR=$C4 ;REMAINDER 
110 ;
120  LDA #$1C ;JUST A SAMPLE VALUE 
130  STA DVDL
140  LDA #$02 ;THE DIVIDEND IS NOW $021C 
150  STA DVDH
160  LDA #$05 ;ANOTHER SAMPLE VALUE 
170  STA DIVS ;WE'RE DIVIDING BY 5
180  ;
190  LDA DVDH ;ACCUMULATOR WILL HOLD DVDH 
200  LDX #08 ;FOR AN 8-BIT DIVISOR 
210  SEC 
220  SBC DIVS 
230 DLOOP PHP ;THE LOOP THAT DIVIDES 
240  ROL QUOT 
250  ASL DVDL 
260  ROL A 
270  PLP 
280  BCC ADOIT 
290  SBC DIVS 
300  JMP NEXT 
310 ADDIT ADC DIVS 
320 NEXT DEX 
330  BNE DLOOP 
340  BCS FINI 
350  ADC DIVS 
360  CLC 
370 FINI ROL QUOT 
380 STA RMDR 
390 RTS ;ENDIT

Okay…So I’m not even going to try to add up the cycles there. Maybe 100 to divide by 10? Division & Multiplication are both VERY expensive on the 6502, so it’s always best to work with shifts and base 16 (that’s an important lesson right there!).

Anyway, to pedantically finish out my original thought on optimizing – so the first divide by 16 takes 8 cycles, the divide by 10 takes 100. Then consider that you’re in a loop, where you are going to do that shift or divide to uncompress 10 full rows (or 1800 tiles). That’s 14400 cycles vs. 180000 cycles. Let’s say you’re shooting for 5 frames per second. That’s 72k cycles/sec of cost vs. 900k cycles/sec. And a 6502 can only do about 500k cycles/second! So right there you can see the difference that little optimization makes in what you can actually do – and you can directly quantify it to real performance gains.

All of this is to say – once you have a good high-level algorithm working, figure out where you can speed up loops, copies, divides, shifts and other places that are done many times per second. And figure out the cost the way you currently have it implemented vs. what you are planning on changing it to. You may find the tradeoff not worth your time to implement – or you may even discover your cool new way of doing something will actually be slower in total cycles!

Anyway, best wishes and keep it rolling, it’s great to see it coming together!

-          Beth

No comments:

Post a Comment