c# webcrawling basics

There are several ways to download the metadata and the content of a web page. One of them looks like this:

using System.IO;
using System.Net;
string htmlStr;
HttpWebRequest httpWebRequest = (HttpWebRequest)WebRequest.Create(url);
// Dispose the response and the reader so the connection is released.
using (WebResponse webResponse = httpWebRequest.GetResponse())
using (StreamReader sr = new StreamReader(webResponse.GetResponseStream()))
{
    htmlStr = sr.ReadToEnd(); // the response body (the HTML) as one string
}

This creates an HTTP web request and obtains the web response for it. The response headers (the metadata) are available through webResponse.Headers, and the StreamReader reads the body of the response (the content), which you can then store in a string.
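The WebRequest API is marked obsolete in .NET 6 and later, so here is a minimal sketch of the same download with HttpClient. It assumes a project with top-level statements (C# 9 or newer). The helper name FetchAsync and the idea of passing the URL as a command-line argument are my own choices for illustration, not part of any library.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: downloads one page and returns its status code,
// its response headers (the metadata) and its body (the content).
static async Task<(int Status, string Headers, string Body)> FetchAsync(HttpClient client, string url)
{
    using HttpResponseMessage response = await client.GetAsync(url);
    string headers = string.Join(Environment.NewLine,
        response.Headers.Select(h => h.Key + ": " + string.Join(", ", h.Value)));
    string body = await response.Content.ReadAsStringAsync();
    return ((int)response.StatusCode, headers, body);
}

using var client = new HttpClient();

// Only touch the network when a URL is passed on the command line.
if (args.Length > 0)
{
    var (status, headers, body) = await FetchAsync(client, args[0]);
    Console.WriteLine($"Status: {status}");
    Console.WriteLine(headers);
    Console.WriteLine($"Body length: {body.Length} characters");
}
```

In a real crawler you would probably reuse one HttpClient instance for all requests instead of creating a new one per page.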

ACID

To understand a transaction in a database, think of it as a single unit of database processing made up of operations like updating, deleting, …

You start a transaction and complete it with a commit. If a problem occurs in between, you can roll back, and everything is restored to the state at the beginning. The ACID properties are the properties of such a transaction. ACID stands for:

A = atomicity
Atomicity means that a transaction happens completely or not at all. Either the whole transaction is applied, or a rollback returns everything to the state at the start of the transaction. So we can only have all operations or none. This is maintained by the transaction management component.

C = consistency
Consistency means nothing other than correctness: a transaction takes the database from one valid state to another. The best example is a financial transfer from account A to account B. We decrease the balance of account A by the value X, then we add X to the balance of account B. Consistency says that we must never end up with a decreased account A without an increased account B, so the sum of both balances is the same before and after the transfer. The programmer is responsible for ensuring consistency.
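The transfer from account A to account B can be sketched in memory like this. It is only an illustration of "all or nothing" plus an unchanged total, not how a database engine implements transactions. The Transfer helper, the account names and the starting balances are assumptions made up for the example.

```csharp
using System;
using System.Collections.Generic;

var balances = new Dictionary<string, decimal> { ["A"] = 100m, ["B"] = 50m };

// Hypothetical helper: moves `amount` from one account to another.
// Returns true on commit and false when the transfer was rolled back.
bool Transfer(string from, string to, decimal amount)
{
    // Remember the starting state so we can roll back.
    var snapshot = new Dictionary<string, decimal>(balances);
    try
    {
        balances[from] -= amount;
        if (balances[from] < 0)
            throw new InvalidOperationException("Insufficient funds.");
        balances[to] += amount;   // throws KeyNotFoundException for an unknown account
        return true;              // "commit"
    }
    catch (Exception)
    {
        // "rollback": restore every balance to its value at the start.
        balances.Clear();
        foreach (var entry in snapshot) balances[entry.Key] = entry.Value;
        return false;
    }
}

Console.WriteLine(Transfer("A", "B", 30m));   // True
Console.WriteLine(Transfer("A", "B", 500m));  // False, A would go negative
Console.WriteLine(Transfer("A", "C", 10m));   // False, account C does not exist
Console.WriteLine($"A={balances["A"]}, B={balances["B"]}, total={balances["A"] + balances["B"]}");
// A=70, B=80, total=150
```

The third call is the interesting one: account A has already been decreased when the failure happens, and the rollback undoes that partial change.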

BTW: Some newer databases, like the ones Facebook uses for example, relax database consistency in order to run queries faster.

I = isolation
Isolation means that if two transactions (1 and 2) run in parallel, the result should be the same as if they had run one after the other, either transaction 1 first and then transaction 2, or the other way around. So no transaction should be affected by another transaction. This task is handled by the concurrency control management.
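As a rough analogy in application code, the following sketch runs two "transactions" in parallel against a shared balance. The lock is only a stand-in for the concurrency control of a database; the Deposit helper and the numbers are made up for the example.

```csharp
using System;
using System.Threading.Tasks;

var gate = new object();
int balance = 0;

// Hypothetical "transaction": adds `amount` to the shared balance `times` times.
void Deposit(int amount, int times)
{
    for (int i = 0; i < times; i++)
    {
        // The lock makes each read-modify-write step run in isolation.
        lock (gate)
        {
            balance += amount;
        }
    }
}

// Run transaction 1 and transaction 2 in parallel.
await Task.WhenAll(
    Task.Run(() => Deposit(1, 100_000)),
    Task.Run(() => Deposit(2, 100_000)));

Console.WriteLine(balance); // 300000, the same as running them one after the other
```

Without the lock, the two tasks could overwrite each other's updates and the final balance could come out lower than 300000.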

D = durability
This property says that the changes of a committed transaction must not be lost due to a database failure. So whatever happens, the database will not lose these changes. The recovery management is responsible for this.

process models

To structure the work on projects, there are some basic process models that are used very often:

waterfall model
The waterfall model splits all tasks into the following phases: requirements, design, implementation, verification and maintenance.

These phases are probably the most widely used standard pattern for realizing a project. You start by gathering the requirements and go on by creating the first design for the implementation. Once that is done, you hand the design over to someone who implements it. To manage the quality, you verify that everything works as required and that the product is ready to be used by the customer. Your task is not done yet, though, because you still have to maintain the product. For me this is the standard of all standards for implementing a development project.

spiral model
The spiral model works in small cycles to give you more agility while developing.

The model shows the different releases as cycles and how you work within every cycle. In every cycle you review the required fixes and the changed requirements again and adapt your development to these changes. This makes sure that the final version is the one the customer actually wants. Because of this repeated re-planning you will need more time, but the result is worth it.

You can find the original here: Special Report CMU/SEI-2000-SR-008, July 2000.

v model
The V-model pairs every implementation step with its own test. This concept makes sure that every single task is done well, so that mistakes do not surface while the customers use the product. The development phases form the left side of the V and the matching tests form the right side.

So basically you try to control every phase that is done on the left side by using different test methods. Personally, I do not think that you should always go with something like this, because testing every single level is probably overhead. In some cases it is a good idea, but in general I would not recommend it because of the high testing effort.

kanban
The last one that I would like to show you is Kanban. Kanban is also a great system to work with and is something like an alternative to Scrum. Here you have a backlog, a development state, a test state and of course a done state. Many teams use a board that shows which task is in which state. With most software systems for this you can simply drag and drop a task into the different states. This always gives you a great view of what your team is doing right now and of the state of every task. It is pretty useful if you work in a team of 3 to 8 developers. If you want to implement this, I would invite you to check out Scrum as well.

basic diagrams for developer

Today I want to describe the three most used basic diagrams that every developer should know:

# flow chart
The flow chart is easy to read and does not need much further explanation. It is important to know the symbols (compare with the example: start -> input -> output -> process -> case -> subprogram -> loop with process between -> end). The example was made with “PapDesigner”. I can recommend this one. 🙂

(figure: coding_flowchart)

# Nassi–Shneiderman diagram
The Nassi-Shneiderman diagram is another way to visualize code. Compared to the flow chart, you cannot jump back if “Code is shit”. Instead you have to handle this with a bottom-controlled loop. Like the flow chart, you read it from top to bottom. Here is an example that I did with this tool:
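In code, a bottom-controlled loop corresponds to a do-while loop. The following small sketch shows the idea: the body always runs once, and the quality check at the bottom decides whether to repeat it. The check itself (passing on the third attempt) is a made-up placeholder.

```csharp
using System;

int attempt = 0;
bool codeIsGood;

// Bottom-controlled loop: the body runs first, the condition is checked at the end.
do
{
    attempt++;
    Console.WriteLine($"Writing code, attempt {attempt}");
    codeIsGood = attempt >= 3; // hypothetical quality check, passes on the third try
}
while (!codeIsGood);

Console.WriteLine($"Done after {attempt} attempts"); // Done after 3 attempts
```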

To model a database you usually use an entity-relationship diagram (or entity-relationship model). Here I want to describe the two most common notations.

# class diagram
A class diagram is one of the most used modeling tools when we talk about object-oriented programming. You draw a rectangle for every class and write the name of the class into the head of the rectangle. After this you define the variables and then the methods. All of them need a type, methods and constructors also need their parameters, and every member needs a visibility marker. There are 3 important markers:

  • - for private things
  • # for protected things
  • + for public things

After this we have to describe the relations. We distinguish between the following types:

  • association: This means that you have a relation between two classes. You mark this with a plain line. Use this if you can say: class A has class B.
  • aggregation: This says that class B is part of class A. Use this if you can say: class A consists of class B. For this, draw a hollow diamond on the relation at the end of class A.
  • composition: This is like an aggregation, but here the part depends on the existence of the whole. So if there is no instance of class A, there is no instance of class B. For this, fill the diamond black.

In any case you always have to annotate the relation with a multiplicity such as 1, 0..1, 0..*, 1..*, 1..6, …
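To connect the notation with code, here is a small sketch of how the visibility markers and two of the relation types could map to C# classes. The classes Customer, Order and OrderLine are invented for this illustration, and the UML text in the comments is only one possible way to write it.

```csharp
using System;
using System.Collections.Generic;

var ada = new Customer("Ada");
var order = new Order(ada);      // association: the order refers to an existing customer
order.AddLine("keyboard", 2);    // composition: the lines are created inside the order
order.AddLine("mouse", 1);
Console.WriteLine($"{order.Customer.Name} ordered {order.LineCount} lines"); // Ada ordered 2 lines

public class Customer
{
    public string Name { get; }                        // + Name : string
    public Customer(string name) => Name = name;       // + Customer(name : string)
}

public class OrderLine
{
    public string Product { get; }                     // + Product : string
    public int Quantity { get; }                       // + Quantity : int
    public OrderLine(string product, int quantity)
    {
        Product = product;
        Quantity = quantity;
    }
}

public class Order
{
    private readonly List<OrderLine> lines = new List<OrderLine>(); // - lines : OrderLine [0..*]
    protected DateTime CreatedAt { get; } = DateTime.UtcNow;        // # CreatedAt : DateTime
    public Customer Customer { get; }                               // + Customer : Customer [1]
    public int LineCount => lines.Count;                            // + LineCount : int

    public Order(Customer customer) => Customer = customer;

    // The order owns its lines: nobody else holds a reference to them.
    public void AddLine(string product, int quantity) =>
        lines.Add(new OrderLine(product, quantity));
}
```

The customer is created outside the order and keeps existing without it, while the order lines only exist as part of their order, which is the difference between a plain association and a composition.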

For more check out Wikipedia.

osi-model

Again and again I read something about the OSI model without really knowing it, which means that I cannot understand things correctly. So I will describe the OSI model and its layers here, hoping to remember them:

The user:
– layer 8: the user (this layer is more or less a joke and does not exist)

The application:
– layer 7: application layer (http, ftp, …)
– layer 6: presentation layer (http, ftp, …)
– layer 5: session layer (http, ftp, …)

The transport:
– layer 4: transport layer (TCP/UDP, …)
– layer 3: network layer (IP, …)
– layer 2: data link layer (MAC, …)
– layer 1: physical layer (ethernet, …)

Layers 1 to 4 manage the transport of all packets. Layers 5 to 7 provide the application data flow. Layer 8, which does not really exist, is the user. If it is a layer 8 problem, then it is the fault of the user.

visual studio shortcuts for debugging

First of all, check whether you really want to debug while running the code. Whenever I watch a developer start an application, I see them using the full debug mode. The full debug mode is slower and does work that you do not need if you do not actually want to debug. So the first thing I recommend is using [Ctrl + F5] to run the code without the debugger. Also, do not use the mouse. You are faster if you work with your keyboard as much as possible. If you really need the debug mode, you can use [F5] to start it. To set breakpoints you can use [F9].

While debugging you have three basic ways to go through the code. The first one is [F10], which steps over the next statement. To step into a function or a method you can use [F11]. To step out of a function or a method you can use [Shift + F11].

One more thing: to continue to the next breakpoint you can use [F5] as well.

most important ports #tcp/udp

Today I want to describe the most important ports that a developer should probably know about. So check this out:

  • 20: ftp-data (file transfer; default data)
  • 21: ftp (file transfer; control)
  • 22: secure shell
  • 23: telnet
  • 25: simple mail transfer
  • 53: domain name system (DNS)
  • 67: bootstrap protocol server (dhcp server)
  • 68: bootstrap protocol client (dhcp client)
  • 69: tftp (trivial file transfer protocol)
  • 80: http
  • 110: post office protocol – version 3 (pop 3)
  • 119: network news transfer protocol
  • 123: network time protocol
  • 137/138/139: NetBIOS
  • 143: internet message access protocol (imap)
  • 161/162: simple network management protocol (snmp)
  • 443: https
  • 445: Microsoft DS
  • 465: smtps (smtp over tls/ssl)
  • 500: isakmp/ike (key exchange for ip-security vpn)
  • 587: mail message submission (smtp)
  • 993: imaps (imap4 over tls/ssl)
  • 995: pop3 over tls/ssl
  • 989/990: ftps (ftp over tls/ssl; 989 data, 990 control)
  • 3389: remote desktop
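If you want to use such a list in code, for example to label ports in a log, a simple lookup table is enough. This is just a sketch with a handful of the ports from the list; the helper name DescribePort and the short service names are my own choices.

```csharp
using System;
using System.Collections.Generic;

// A few of the ports from the list above, keyed by port number.
var wellKnownPorts = new Dictionary<int, string>
{
    [22] = "ssh",
    [25] = "smtp",
    [53] = "dns",
    [80] = "http",
    [443] = "https",
    [3389] = "rdp",
};

// Hypothetical helper: returns the service name, or "unknown" for ports not in the table.
string DescribePort(int port) =>
    wellKnownPorts.TryGetValue(port, out var name) ? name : "unknown";

Console.WriteLine(DescribePort(443));   // https
Console.WriteLine(DescribePort(8080));  // unknown
```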