Friday 12 September 2014

How I Solved: SPOJ IITKWPCC

You are given some points on a 2-D plane and must determine the number of isosceles right-angled triangles formed by these points. Here is a direct link to the problem.

At first the problem looked a bit difficult. A brute-force approach would be to check all triplets of points and see if they form such a triangle, resulting in $O(N^3)$ time.

The number of points is about $10^5$, which implies you need a much better solution. This is my approach: divide all the points into groups based on X-coordinate and Y-coordinate. This means each point belongs to two groups, one pertaining to its X- and the other to its Y-coordinate.

For each point, consider its X-coordinate group and Y-coordinate group. We will count all the triangles whose $90^{\circ}$ angle is at the current point. Clearly, the other two points must come from the X- and Y-coordinate groups, one from each.

Now the question boils down to how efficiently we can count such triangles. Although it may look like $O(N^2)$ for a moment, it can actually be done in $O(N)$! We have two perpendicular lines (the X- and Y-coordinate groups) meeting at the current point. Consider an imaginary hypotenuse moving away from the current point, making an angle of $45^{\circ}$ with both lines. For every integer point where it intersects the perpendicular lines, we check whether both points exist, one in each group, and if so count a valid triangle.
Now, instead of moving the hypotenuse by a single unit, we move it directly to the next nearest valid point on either line, whichever comes first, and check whether the corresponding point on the other line exists. This drastically reduces the time complexity. For a clearer understanding, see the example below:

As you can see, we initially move the hypotenuse directly to point 1. This gives a valid triangle, since point 2 also exists. Next we move the hypotenuse to the nearest point after 1 or 2, whichever comes first, which is 3 in this case. But since 3 doesn't have a corresponding point on the other line, we move to the next nearest point, which is 6; this doesn't have its corresponding point either. Next we move to 4 (or 5), and they make up a valid triangle.

In this way, no matter what the coordinates are, this whole operation takes no more than $O(N)$. And remember, since the plane is split into four quadrants around the current point, the hypotenuse has to be moved in four directions. This procedure has to be applied for every given point.
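To make the idea concrete, here is a minimal Python sketch of the counting step. It is not my exact submission: it replaces the explicit hypotenuse sweep with hash-set lookups, which achieves the same effect of jumping straight to valid points; the function name and input format are just for illustration.

from collections import defaultdict

def count_isosceles_right(points):
    # points: iterable of (x, y) integer pairs
    point_set = set(points)
    by_x = defaultdict(list)          # X-coordinate groups (vertical lines)
    for x, y in points:
        by_x[x].append(y)
    total = 0
    for x, y in point_set:            # right angle assumed at (x, y)
        for y2 in by_x[x]:
            if y2 == y:
                continue
            d = abs(y2 - y)           # leg length along the vertical line
            # the matching leg lies on the horizontal line, d away on
            # either side; each existing endpoint is one valid triangle
            if (x + d, y) in point_set:
                total += 1
            if (x - d, y) in point_set:
                total += 1
    return total

Each triangle is counted exactly once, at its right-angle vertex, and the two set lookups per candidate leg cover all four quadrants.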

Time-complexity:

Now, coming to the time complexity: although at first glance it may look like $O(N^2)$, it's actually less than that. It takes $O(N \log N)$ to sort the points into groups, and an amortized $O(k \cdot N)$ for the counting process, where $k$ is the maximum number of points on a single line, horizontal or vertical. This may not be a tight bound, but it suffices. The worst case happens when the points form a complete square grid.
For a square grid of side $n$, there are as many as $$\frac{2n(n+1)(2n+1)}{3}$$ right-angled isosceles triangles, which is $O(n^3)$; but since $n = \sqrt{10^5}$ (remember, it's a square), we can have at most about $3 \times 10^7$ triangles. There does appear to be such a test case, since my code took $\approx 1.5$ seconds to execute. I hope you have understood my idea :)
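As a quick sanity check of that formula, here is a small brute-force comparison against the counting sketch above (grid_formula is my own helper name, not from the original solution):

def grid_formula(n):
    # closed-form count for a square grid of side n
    return 2 * n * (n + 1) * (2 * n + 1) // 3

for n in range(1, 8):
    pts = [(x, y) for x in range(n + 1) for y in range(n + 1)]
    assert count_isosceles_right(pts) == grid_formula(n)
print 'formula verified for grids of side 1..7'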

Tuesday 11 February 2014

How I downloaded large files from a proxy server that imposed a file-size limit

In this post, I will describe one of the methods I discovered to download large files from behind a proxy server that imposed a file-size limit. So, intuitively, what do you think I am going to do? I will download the file in parts. But how?

At first, I started testing how the proxy server actually detects such requests. I found that it merely checks the length field of the incoming response and throws an error if its size exceeds the maximum allowed value.

Then I went through the HTTP request protocol. At some stage I learned that there is a special header field called Range. With it, we can request a specific range of bytes of the file, specified with zero-based start and end offsets.

For example, suppose you want to download a file of, say, 50 bytes (that's tiny nowadays), and you only want the 34th through 43rd bytes. Then the HTTP request looks as follows:

GET /file_name.extension HTTP/1.1
....
....
Range: bytes=33-42
...
...
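(For reference, sending this request from Python takes only a couple of lines with urllib2, which the script below also uses; the URL is a placeholder.)

import urllib2

req = urllib2.Request('http://example.com/file_name.extension')
req.headers['Range'] = 'bytes=33-42'   # the 34th..43rd bytes, zero-based
part = urllib2.urlopen(req).read()     # should be exactly 10 bytes
print len(part), 'bytes received'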
The range starts at 33 because it is zero-based: the first byte of the file is byte 0, and so on. Now I was able to figure out how to download a file in parts. But another question remained: what is the size of the entire file? How do we figure this out?

This problem was solved when I looked at the response from the server. For instance, the response to the above request looks as follows:

HTTP/1.1 206 Partial Content
....
....
Content-Range: bytes 33-42/50
Content-Length: 10
...
...
I think you should be able to figure it out from the above response. The total length of the file is sent in the Content-Range field, after the "/". So I first request only one byte of data, which gives me the length of the file, and then proceed to download the rest in parts.
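A minimal sketch of that one-byte probe (again with urllib2 and a placeholder URL):

import urllib2

req = urllib2.Request('http://example.com/file_name.extension')
req.headers['Range'] = 'bytes=0-0'     # ask for just the first byte
resp = urllib2.urlopen(req)
# Content-Range comes back as e.g. 'bytes 0-0/50'; the size follows the '/'
total_size = int(resp.info()['Content-Range'].split('/')[1])
print 'File-size:', total_size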

We are ready with the idea, and it's just a matter of coding it. I used Python's urllib2 module, since I was too lazy to code it in C.

As an extension to this, I used threads, which increased the speed to a large extent; it was as if I were downloading from my Local Area Network rather than the Internet.

You may argue: what's so special about this? There are many download accelerators that employ this method. But wait a minute: none of them employs the exact method I have described above. Indeed, well-known programs such as axel, wget and others failed to download the file when this constraint was in place. So I guess my idea is a bit better under these circumstances :).

Here is the code for my idea in Python:

import urllib2, sys, thread, time, tempfile, os

data = []  # (chunk_id, temp_file) pairs; list.append is atomic under CPython

def partial_download(url, st, en, idv):
    # Download bytes st..en (inclusive, zero-based) of url into a temp file.
    req = urllib2.Request(url)
    req.headers['Range'] = 'bytes=' + str(st) + '-' + str(en)
    f = urllib2.urlopen(req)
    fd = tempfile.NamedTemporaryFile(delete=False)
    while True:
        block = f.read(8192)
        if not block:
            break
        fd.write(block)
    fd.close()
    data.append((idv, fd))
    print 'Thread:', idv, 'finished getting', en - st + 1, 'bytes to', fd.name

if len(sys.argv) < 3:
    print 'Format: [url] [parallel_download_count]'
    sys.exit()

link = sys.argv[1]
parallel_download_count = int(sys.argv[2])

# Route everything through the proxy that imposes the file-size limit.
proxy = urllib2.ProxyHandler({'http': 'http://172.30.0.19:3128'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)

file_name = link.split('/')[-1]
print file_name

# First we need to know the content length: request a single byte and
# read the total size after the '/' in the Content-Range header.
req = urllib2.Request(link)
req.headers['Range'] = 'bytes=0-0'
f = urllib2.urlopen(req)
meta = f.info()
content_length = int(meta['Content-Range'].split('/')[1])
print 'File-size:', content_length

# At least one byte per chunk, so the loop below always terminates.
chunk_size = max(1, content_length // parallel_download_count)

# Spawn one thread per full chunk.
curr_count = 0
idc = 0
while curr_count + chunk_size <= content_length:
    thread.start_new_thread(partial_download,
                            (link, curr_count, curr_count + chunk_size - 1, idc))
    idc += 1
    curr_count += chunk_size

# Leftover bytes when the division isn't exact go into one last chunk.
if curr_count < content_length:
    thread.start_new_thread(partial_download,
                            (link, curr_count, content_length - 1, idc))
    idc += 1

# Crude join: poll until every chunk has reported back.
while len(data) < idc:
    time.sleep(1)

print 'Merging into single file...'
data.sort()  # order the chunks by their id
fd = open(file_name, 'wb')  # binary mode so non-text files survive intact

for chunk_id, tmp in data:
    tmp_fd = open(tmp.name, 'rb')
    tmps = tmp_fd.read()
    fd.write(tmps)
    print 'Wrote', len(tmps), 'bytes!'
    tmp_fd.close()
    os.unlink(tmp.name)
fd.close()
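To run it, assuming the script is saved as downloader.py, something like python downloader.py http://example.com/big.iso 8 fetches the file in eight parallel chunks (both names are placeholders). The chunk count is the key trade-off here: each extra thread means one more ranged request, and every chunk must stay under the proxy's size limit.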