Our products can be used in ways that don't require much knowledge about the internet. You can just type in the address of the server you're connecting to, open an SFTP window and start transferring files. However, if you will be using the more advanced features of our products, such as tunneling, you will need to understand the basics of how the internet is structured. This guide is an attempt at relaying some of that understanding.

This guide is composed of the following sections:

IP addresses

Every computer connected to the internet has an Internet Protocol (IP) address, which identifies the computer on the internet. In the currently most widely used version of the Internet Protocol - version 4 - IP addresses are 4 bytes long and are expressed in the form nn.nn.nn.nn, where each nn is a number between 0 and 255.

When you connect to a web server to browse a web page, the DNS name of the web server, e.g. www.bitvise.com, is automatically translated by the software in your machine to an IP address in the nn.nn.nn.nn form. This address is then used to connect to the actual web server. For example, the IP address of the server hosting fogbugz.bitvise.com at the time of this writing is 70.85.217.69. Our primary website, on the other hand, is hosted on several servers, and their IP addresses are 207.155.248.18, 207.155.248.31, 207.155.248.122 and 207.155.252.47.

In a Windows Command Prompt session, you can discover the IP addresses associated with DNS names using the nslookup command, e.g. 'nslookup www.bitvise.com'.

DNS names

IP addresses are difficult to remember, so the internet provides a service which translates memorable names into their associated IP addresses. This facility is called the Domain Name System, or DNS. You use DNS implicitly every time you type in an address such as 'www.bitvise.com': your browser asks your operating system to translate the name into an IP address, and the operating system either returns a cached result or queries a DNS server operated by your ISP. This server in turn either returns a cached result or queries another DNS server.

Subnets

No computer is directly connected to every other computer on the internet. Instead, each computer is a member of one or more subnets. Subnets, in turn, are connected to each other by machines called routers or gateways, which belong to multiple subnets and forward internet traffic between them in both directions.
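
To make the role of subnets and gateways more concrete, here is a minimal Python sketch of the decision a computer makes for each destination: deliver directly if the destination is in the local subnet, otherwise hand the traffic to the gateway. It treats a subnet as a block of consecutive addresses (11.22.33.0 to 11.22.33.255, written 11.22.33.0/24). The function name next_hop and the gateway address 11.22.33.1 are assumptions made up for this illustration, not something our products define.

```python
import ipaddress


def next_hop(destination: str, local_subnet: str, gateway: str) -> str:
    """Return the address that traffic for `destination` is handed to first.

    Hypothetical helper for illustration only: real operating systems
    consult a routing table that may contain many subnets and gateways.
    """
    subnet = ipaddress.ip_network(local_subnet)
    dest = ipaddress.ip_address(destination)
    if dest in subnet:
        # Destination is inside the local subnet: deliver directly.
        return str(dest)
    # Destination is outside the local subnet: relay through the gateway.
    return gateway


# 11.22.33.0/24 covers the 256 consecutive addresses 11.22.33.0 - 11.22.33.255.
subnet = ipaddress.ip_network("11.22.33.0/24")
print(subnet[0], subnet[-1], subnet.num_addresses)  # 11.22.33.0 11.22.33.255 256

# The gateway address 11.22.33.1 is an assumed example value.
print(next_hop("11.22.33.77", "11.22.33.0/24", "11.22.33.1"))   # 11.22.33.77
print(next_hop("70.85.217.69", "11.22.33.0/24", "11.22.33.1"))  # 11.22.33.1
```

The sketch only covers a machine with a single subnet and a single gateway, which is the common case for a desktop computer.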

In order to communicate successfully with other computers throughout the internet, your computer must know what subnet it is part of, so that it knows which IP addresses lie outside its local subnet and must be relayed through the gateway. Your computer must, of course, also know the IP address of the gateway. Typically, a subnet is a group of consecutive IP addresses, such as all IP addresses from 11.22.33.0 to 11.22.33.255.

Types of IP addresses and subnets

There are three major types of IP addresses (or subnets) that you need to be aware of.

TCP and UDP

The Internet Protocol itself is a relatively rudimentary protocol which provides only the capability of delivering small chunks of data to other computers. The Internet Protocol does not provide reliability: chunks of data that are sent using the Internet Protocol may be lost. They may also arrive in an order different from the order in which they were sent.

For some types of data transfer, the (un)reliability afforded by the Internet Protocol is fine. When streaming video, for example, it does not matter if chunks that make up intermediate frames of the video are lost. What matters is that most of the data arrives relatively quickly, allowing the video to be played with reasonable quality and on the fly. The User Datagram Protocol, or UDP, is a simple protocol layered on top of the Internet Protocol that provides this level of reliability. UDP is used for purposes such as relaying video and audio streams, as well as for networked games: all environments where responsiveness and fast delivery are more important than perfect reliability.

For other types of data transfer, however, this level of reliability is not enough. When transferring a file, for example, you want to transfer all of its contents in perfect order and integrity; you don't want any chunks of it to be accidentally lost. When accessing a web page, likewise, you want all the text to be transferred without error. Data transfers that require this higher level of reliability use the Transmission Control Protocol, or TCP. Like UDP, TCP is a protocol layered on top of the Internet Protocol, but it is more complex than UDP: it contains mechanisms to ensure that data is received in order and that, if any chunks are lost, they are resent.

The reliability provided by TCP has costs in terms of responsiveness. Before any data can be sent using TCP, the two computers must engage in a short back-and-forth to establish a TCP connection.
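
The difference in setup between the two protocols can be seen in a small Python sketch that runs entirely on the local machine (address 127.0.0.1). With UDP, a chunk of data is simply sent; with TCP, one side must listen and the other must connect before any data flows. The function names udp_roundtrip and tcp_roundtrip are hypothetical, chosen for this illustration, and the operating system picks the port numbers.

```python
import socket


def udp_roundtrip(payload: bytes) -> bytes:
    """Send one UDP datagram over loopback; no connection is set up first."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(2.0)          # UDP gives no delivery guarantee
        sender.sendto(payload, receiver.getsockname())
        data, _sender_addr = receiver.recvfrom(4096)
        return data


def tcp_roundtrip(payload: bytes) -> bytes:
    """Send bytes over loopback TCP; a connection must be established first."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        # create_connection performs the connection setup with the listener.
        with socket.create_connection(listener.getsockname(), timeout=2.0) as client:
            server_side, _client_addr = listener.accept()
            with server_side:
                server_side.settimeout(2.0)
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)  # signal "no more data"
                chunks = []
                while True:
                    chunk = server_side.recv(4096)
                    if not chunk:                # empty read: sender finished
                        break
                    chunks.append(chunk)
                return b"".join(chunks)


print(udp_roundtrip(b"hello over UDP"))
print(tcp_roundtrip(b"hello over TCP"))
```

On loopback, UDP datagrams practically always arrive, so this sketch does not demonstrate loss; it only shows that TCP needs the extra listen, connect and accept steps that UDP does without.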
If any data is lost during transmission, delivery of subsequent data waits until the lost data has been retransmitted and delivered. When there is a high rate of data loss on a connection, this may cause transmission to be jerky.

The majority of widely known protocols used on the internet are layered on top of TCP. These include:

Direction of TCP connections

TCP connections are like phone calls: they are always initiated by one party and accepted (or not) by the other. The computer that originates the TCP connection is usually the client, and the computer that accepts it is usually the server. Sometimes, notably in the FTP protocol, a secondary TCP connection will be established in the reverse direction, from the server to the client. In protocols other than FTP, however, connections are almost always initiated by the client.

Regardless of the direction in which a TCP connection is established, data can always flow both ways. However, the direction of the TCP connection matters because it determines who the initiating party is, and because network components use it to impose rules on whether a connection can be established.

Ports

In order to handle multiple simultaneous connections with the same computer, your computer must be able to distinguish them. To do so, each connection is assigned two port numbers, one at each end point of the connection. A connection is then uniquely identified by four pieces of information: (1) local address, (2) local port, (3) remote address, (4) remote port. Valid port numbers are between 1 and 65535.

The party that originates a TCP connection usually selects a local port number at random. The port number of the party that accepts the connection, on the other hand, must be known in advance by the party that originates the connection.

You can confirm this by executing 'netstat -n' from a Windows Command Prompt just after loading a web page in your browser. For example, this excerpt from 'netstat -n' output was taken just after opening www.bitvise.com in a browser.

    Proto  Local Address         Foreign Address        State
    TCP    10.10.10.123:21681    207.155.248.122:80     ESTABLISHED

The above output indicates an established TCP connection with local address 10.10.10.123, local port 21681, remote address 207.155.248.122 and remote port 80. The connection was initiated by the local machine, so the local port number 21681 was randomly selected, whereas the remote port number 80 is the well-known HTTP port. This is the port on which the vast majority of web servers accept connections, so even when access to other ports is blocked, connections to port 80 will very likely be permitted.

Other well-known destination ports are:

On Windows, a more exhaustive list of well-known ports can be found in the file \Windows\System32\Drivers\etc\services (open it with Notepad).

Connecting to the internet from office

In an office environment, your computer will most likely be connected to a subnet in one of the private address ranges. This means that your computer will have an IP address that is not unique throughout the internet, so it cannot communicate with other computers on the internet directly. However, the network administrators at your office have most likely applied one of the following solutions to allow you to access the internet.

There are also a number of office environments where each computer has its own public IP address. These setups are simple and involve no NAT or proxy servers as outlined above.

Connecting to the internet from home

From home, you usually connect to the internet through a modem, whether it is phone, cable, ISDN or DSL. In any case, you can either hook the modem directly to your computer or, if you have multiple computers, buy a router, connect the router to your modem and connect your computers to the router.

In most cases, you will be provided a single public IP address by your internet provider. Sometimes this IP address will be fixed; this is called a static IP address. In other situations, the IP address will periodically change; this is called a dynamic IP address. With dial-up modems, you will get a different public IP address every time you dial up. With DSL and cable modems, your IP address may change at a predefined time every day or night.

Dynamic IP address issues

The following issues arise with a continuously changing IP address.

Whenever your public IP address changes, all ongoing TCP connections to and from your machine are terminated and must be reestablished using the new IP address.

Since the IP address of your computer is unpredictable, it is difficult for others to connect to it. If you want to host any kind of network-accessible service on your machine, you need to either use a dynamic DNS service, which works by allocating you a DNS name that is regularly updated to reflect your changing IP address, or implement a more pedestrian solution, such as configuring a program on your computer to periodically connect to another server and store your current IP address there, making it available for retrieval.

If you want to host a service on your home machine and find that your IP address changes periodically, the best way around this problem is to ask your ISP to grant you a static IP address. They will frequently agree to do this free of charge. If this is unavailable, you can use a dynamic DNS service.

Virtual servers - port forwarding at the router

If you want to make a server accessible from the internet, but the computer on which the server will be based has only a private subnet IP address, there is a solution. Usually, the router which connects the private subnet to the internet can be configured to forward all incoming connections on a certain port to one of the computers inside the private network.
This is called port forwarding (not the same thing as SSH port forwarding) or a 'virtual server' facility (although the server is quite real; it's just its IP address that is not).

This setup generally works just fine, but there is one thing to remember: the IP address by which the server is known to internet clients is not the IP address that the server machine actually has. This distinction between the public IP address at the router and the private IP address of the actual server machine inside frequently arises in SSH connection tunneling, and leads to incorrect configuration if not properly understood.

Firewalls

Modern computers run a large number of local services (such as Windows file and printer sharing) which accept connections on various port numbers, but are meant to be accessible only from locally trusted subnets. Preventing the wider internet from accessing these services in possibly malicious ways is the purpose of ingress firewalls.

In organizations, gateways that connect the local subnet to the internet usually feature an ingress firewall. This firewall should normally be configured to allow no connections into the subnet, except connections to servers that must accept connections from the internet.

At home, your ISP will usually not protect your PC from malicious access from the internet. Instead, this task must be performed by a firewall installed on your home router or, if your computer is connected to the internet directly, by a software firewall on your machine. Windows XP comes equipped with such a firewall; you should use it. Software firewall solutions are available for earlier versions of Windows.

There is another type of firewall called an egress firewall, which filters outbound connections from your machine to the internet. This is generally software which tries to control which programs on your machine access the internet. It is intended to block malicious software from doing too much damage after it has already infected your computer. However, cleverly written malware can fool an egress firewall like this with fairly simple and straightforward deceptions. The only real medicine against malware is therefore to prevent it from infecting your computer in the first place.