Blog About Me Posts By Tags Subscribe Elastic Search My Sessions Terraform

Pavan Kumar Aryasomayajulu


Interesting Stackoverflow stats and a place to learn SQL Querying

Hi,
Did you ever think like it would be nice to learn and execute SQL queries on 381GB+ data?

https://data.stackexchange.com/ is the place where you can execute an unlimited number of queries and find out interesting stats on Stackoverflow dataset?

I did write a few queries to find out interesting stats on users from my city (Visakhapatnam). You can also play with the site and learn SQL querying and at the same time find out nice stats.

Some of my queries: https://data.stackexchange.com/users/32567/pavan-kumar-aryasomayajulu

1. How to use this dataset for learning purposes?

https://data.stackexchange.com/stackoverflow/query/1283225/get-top-questions-by-score
For example, I want to find out all questions with a particular tag. So how does this help me, So I’ll sort this query based on “score” count in desc order. Score count is nothing but a number of upvotes for a question. So Questions with more upvotes are common problems related to that tech stack or they are some must-know basics in that area.

This way for sure I’ll try to learn in and outs about these top questions

select id,score,title,FavoriteCount,body,viewcount from posts where tags like '%elasticsearch%' and score>1 order by score desc


2. I am trying to get active users with more questions/answers in my city.

https://data.stackexchange.com/stackoverflow/query/1283632/get-top-active-stackoverflow-contributors-in-your-city

Select * from (select (select count(id) from posts where owneruserid=u.id) as TotalPosts,*
from users u where lower(location) like '%visak%' and Year(LastAccessDate)=2020 ) a
where TotalPosts>0
order by TotalPosts desc,LastAccessDate desc

/*
select p.totalPosts,*
from users u inner join ( select distinct owneruserid,count(id) over(partition by owneruserid) as TotalPosts from posts) p
on u.id=p.owneruserid
where p.totalPosts>0 and lower(location) like '%visak%' and Year(LastAccessDate)=2020
order by p.TotalPosts desc,LastAccessDate desc


select * from (select distinct t.*,count(p1.id) over(partition by p1.owneruserid) as totalPosts from users t 
inner join posts p1
on t.Id=p1.owneruserid
where lower(location) like '%visak%' and Year(LastAccessDate)=2020) a
where totalPosts>0 
order by TotalPosts desc,LastAccessDate desc

*/


3. What is your current position in terms of number of questions answered in your city

with test1 as (select distinct
u.id,DisplayName,reputation, location,
count(p.id) over(partition by u.id) as [questions answered]
from users u left join posts p on u.id=p.owneruserid
where lower(location) like '%visak%'
)


/* Select row_number() over(order by [questions answered] desc) as rownum,*,sum(reputation) over() as [Total Stackoverflow reputation In your City]
from test1 order by [questions answered] desc

*/

select * from (Select row_number() over(order by [questions answered] desc) as [your position],*,sum([questions answered]) over() as [Total Stackoverflow questions answered In your City]
from test1 ) as t1 where id=1415739



Apart from these you can also find more queries that are already present and written by others

If you have any interesting stats or other purposes please do share with me in comments section.

#stack exchange data science
#stack exchange data explorer



Thanks,
Pavan Kumar Aryasomayajulu