Secret Debian Internals




Enrico Zini enrico@debian.org 25 February 2007

BTS: Where to find it

Source code: bzr branch http://bugs.debian.org/debbugs-source/mainline/
Data on merkel at /org/bugs.debian.org/spool/
Data rsyncable at merkel.debian.org::bts-spool-db/

Directory structure:
  /            various data files (which I haven't explored)
  archive/nn   archived bugs (nn is the last 2 digits of the bug number)
  db-h/nn      active bugs (nn is the last 2 digits of the bug number)
  user/        usertags data

Files:
  *.log        all raw bug activity, including the archived messages
  *.report     the mail that opened the bug
  *.summary    some summary bug information
  *.status     obsolete, superseded by summary
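The spool layout maps a bug number to a file through its last two digits. A minimal Python sketch of that mapping follows. It assumes files are named <bug>.<extension> inside the nn directory, as the example script in these slides does for .log files. The function name is hypothetical, and zero-padding single-digit bug numbers is an assumption.

```python
import posixpath

def bts_spool_path(bug, kind="log", archived=False):
    """Return the spool-relative path of a bug file.

    Assumes files are named <bug>.<kind> under archive/nn or db-h/nn,
    where nn is the last two digits of the bug number.
    """
    bug = str(bug)
    if not bug.isdigit():
        raise ValueError("%r is not a bug number" % bug)
    if kind not in ("log", "report", "summary", "status"):
        raise ValueError("unknown file kind: %r" % kind)
    top = "archive" if archived else "db-h"
    # Assumption: single-digit bug numbers are zero-padded to two digits.
    nn = bug[-2:].zfill(2)
    return posixpath.join(top, nn, "%s.%s" % (bug, kind))

print(bts_spool_path(377520))
print(bts_spool_path(377520, "summary", archived=True))
```

The result is relative, so it can be appended to either the local spool directory or the rsync location.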

BTS: Other access methods

LDAP query:

    ldapsearch -p 10101 -h bts2ldap.debian.net -x -b \
        dc=current,dc=bugs,dc=debian,dc=org \
        "(&(debbugssourcepackage=$srcpkg)(debbugsstate=open))" \
        debbugsid | grep ^debbugsID | sed 's/^debbugsID: //'

SOAP interface (see http://bugs.debian.org/377520, ask dondelelcaro for more info)
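The grep and sed stages of the LDAP query only pull the bug numbers out of the ldapsearch output. The same step can be sketched in Python as below. The sample text is made up for illustration, including the shape of the dn lines, and the function name is hypothetical.

```python
def extract_bug_ids(ldif_text):
    """Return the values of all debbugsID attribute lines as integers."""
    ids = []
    for line in ldif_text.splitlines():
        name, sep, value = line.partition(":")
        # LDAP attribute names are case-insensitive.
        if sep and name.strip().lower() == "debbugsid":
            ids.append(int(value.strip()))
    return ids

# Hypothetical ldapsearch output, for illustration only.
sample = """\
dn: debbugsID=377520,dc=current,dc=bugs,dc=debian,dc=org
debbugsID: 377520

dn: debbugsID=123456,dc=current,dc=bugs,dc=debian,dc=org
debbugsID: 123456
"""
print(extract_bug_ids(sample))
```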

BTS: Example code

    #!/usr/bin/perl -w
    # Prints the e-mail of the sender of the last message for the given bug
    use Debbugs::Log;
    use IO::File;
    use Mail::Header;
    use Mail::Address;
    # ... (further "use" lines are elided in the original slide)

    my $CACHEDIR = "./cache";
    my $MERKELPATH = "/org/bugs.debian.org/spool/db-h/";
    my $RSYNCPATH = "merkel.debian.org::bts-spool-db/";

    my $bug = shift(@ARGV);
    die "$bug is not a bug number" if ($bug !~ /^\d+$/);

    # Logs live in a directory named after the last 2 digits of the bug number
    my $log = substr($bug, -2)."/".$bug.".log";
    if ( -d $MERKELPATH ) {
        # We are on merkel
        $log = $MERKELPATH.$log;
    } else {
        # We are elsewhere: rsync the bug log from merkel
        my $cmd = "rsync -q $RSYNCPATH$log $CACHEDIR/";
        system($cmd) and die "Cannot fetch bug log from merkel: $cmd failed with status $?";
        $log = "$CACHEDIR/$bug.log";
    }

    # Keep the last record of type incoming-recv
    my $in = IO::File->new($log);
    my $reader = Debbugs::Log->new($in);
    my $lastrec = undef;
    while (my $rec = $reader->read_record()) {
        $lastrec = $rec if $rec->{type} eq 'incoming-recv';
    }
    die "No incoming-recv records found" if not defined $lastrec;
    $in->close();

    # Parse the mail headers from the record text
    open (IN, "<", \$lastrec->{text});
    my $h = Mail::Header->new(\*IN);
    my $from = $h->get("from");
    close(IN);
    die "No From address in the last mail" if not defined $from;

    for my $f (Mail::Address->parse($from)) {
        print $f->address(), "\n";
    }
    exit 0;

Mole

Big index of data periodically mined from the archive, by Jeroen van Wolffelaar.

Info: http://wiki.debian.org/mole
Source: merkel:/org/qa.debian.org/mole/db/
Public source: http://qa.debian.org/data/mole/db

Databases I used:
  desktopfiles: all .desktop files in the archive
  dscfiles-control: all debian/control files

More databases: dscfiles-watch, lintian-version, packages-debian-suite-bin, packages-debian-suite-src

Mole: Example code

    #!/usr/bin/python
    import bsddb
    import re

    DB = "/org/qa.debian.org/data/mole/db/dscfiles-control.moledb"
    db = bsddb.btopen(DB, "r")

    # Match the Package and Tag fields of each debian/control file
    re_pkg = re.compile(r"^Package:\s+(\S+)\s*$", re.M)
    re_tag = re.compile(r"^Tag: +([^\n]+?)(?:,\s)*$", re.M)

    for k, v in db.iteritems():
        m_pkg = re_pkg.search(v)
        if not m_pkg: continue
        m_tag = re_tag.search(v)
        if not m_tag: continue
        print "%s: %s" % (m_pkg.groups()[0], m_tag.groups()[0])

db.debian.org LDAP interface

To access it, from any Debian machine:

    ldapsearch -x -h db.debian.org -b dc=debian,dc=org "$@"

Example code:

    # Count developers:
    ldapsearch -x -h db.debian.org -b dc=debian,dc=org \
        '(&(keyfingerprint=*)(gidnumber=800))' | grep ^uid: | wc
    # Stats by nationality:
    ldapsearch -x -h db.debian.org -b ou=users,dc=debian,dc=org c \
        | grep ^c: | sort | uniq -c | sort -n | tail
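The nationality statistics are a plain frequency count over the c: lines of the ldapsearch output. A rough Python equivalent of the grep / sort / uniq -c / sort -n / tail chain is sketched below. It reads the output as text, and the function name is hypothetical.

```python
from collections import Counter

def country_stats(ldif_text, top=10):
    """Count 'c: <country>' lines and return the most frequent ones."""
    counts = Counter()
    for line in ldif_text.splitlines():
        if line.startswith("c: "):
            counts[line[3:].strip()] += 1
    # Like "sort -n | tail", but listed from most to least frequent.
    return counts.most_common(top)

print(country_stats("c: it\nuid: foo\nc: de\nc: it\n"))
```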

Debian Developer's Packages Overview

Besides developer.php there is a repository with raw data at http://qa.debian.org/data/ddpo/.

How to read maintainer / comaintainer information:
  Location: http://qa.debian.org/data/ddpo/results/ddpo_maintainers
  passwd-like format, one maintainer per line.
  Comaintained packages are marked with a #:

    ;enrico@debian.org;noid;Enrico Zini;buffy cnf dballe debtags debtags-edit festival-it# guessnet launchtool libapt-front# libbuffy libdebtags-perl libept# libwibble# openoffice.org-thesaurus-it polygen python-debian# tagcoll tagcoll2 tagcolledit thescoder;;;;;
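A line of ddpo_maintainers can be split as sketched below. The field positions (e-mail in field 1, name in field 3, package list in field 4) are assumptions read off the sample line above, not a documented format, and the function name is hypothetical.

```python
def parse_ddpo_line(line):
    """Split one ddpo_maintainers line into maintainer data.

    Assumption: fields are ';'-separated, with the e-mail in field 1,
    the name in field 3 and a space-separated package list in field 4.
    """
    fields = line.rstrip("\n").split(";")
    maintained, comaintained = [], []
    for pkg in fields[4].split():
        if pkg.endswith("#"):
            # A trailing '#' marks a comaintained package.
            comaintained.append(pkg[:-1])
        else:
            maintained.append(pkg)
    return {
        "email": fields[1],
        "name": fields[3],
        "maintained": maintained,
        "comaintained": comaintained,
    }

# Shortened version of the sample line.
sample = ";enrico@debian.org;noid;Enrico Zini;buffy cnf festival-it# libept#;;;;;"
print(parse_ddpo_line(sample))
```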

Aggregated package descriptions

All package descriptions of all architectures of sid and experimental:
  http://people.debian.org/~enrico/allpackages.gz
Same, but sid only:
  http://people.debian.org/~enrico/allpackages-nonexperimental.gz
In your system only:
  grep-aptavail -sPackage,Description .

Indexing and searching package descriptions

Creating the index:

    #!/usr/bin/python
    "Create the package description index"
    import xapian, re, gzip, deb822

    tokenizer = re.compile("[^A-Za-z0-9_-]+")

    # How we normalize tokens before indexing
    stemmer = xapian.Stem("english")
    def normalise(word):
        return stemmer.stem_word(word.lower())

    # Index all packages
    # ( wget -c http://people.debian.org/~enrico/allpackages.gz )
    database = xapian.WritableDatabase( \
        "descindex", xapian.DB_CREATE_OR_OPEN)
    input = gzip.GzipFile("allpackages.gz")
    for p in deb822.Packages.iter_paragraphs(input):
        idx = 1
        doc = xapian.Document()
        doc.set_data(p["Package"])
        doc.add_posting(normalise(p["Package"]), idx); idx += 1
        for tok in tokenizer.split(p["Description"]):
            if len(tok) == 0: continue
            doc.add_posting(normalise(tok), idx); idx += 1
        database.add_document(doc)
    database.flush()

Searching the index:

    #!/usr/bin/python
    "Search the package description index"
    import xapian, sys

    # Open the database
    database = xapian.Database("descindex")

    # We need to stem search terms as well
    stemmer = xapian.Stem("english")
    def normalise(word):
        return stemmer.stem_word(word.lower())

    # Perform the query
    enquire = xapian.Enquire(database)
    query = xapian.Query(xapian.Query.OP_OR, \
        map(normalise, sys.argv[1:]))
    enquire.set_query(query)

    # Show the matching packages
    matches = enquire.get_mset(0, 30)
    for match in matches:
        print "%3d%%: %s" % ( \
            match[xapian.MSET_PERCENT], \
            match[xapian.MSET_DOCUMENT].get_data())

Aggregated popcon frequencies

http://people.debian.org/~enrico/popcon-frequencies.gz

    #!/usr/bin/python
    "Print the most representative packages in the system"
    import gzip, math

    freqs, local = {}, {}

    # Read global frequency data
    for line in gzip.GzipFile("popcon-frequencies.gz"):
        key, val = line[:-1].split()
        freqs[key] = float(val)
    doccount = freqs.pop("NDOCS")

    # Read local popcon data
    for line in open("/var/log/popularity-contest"):
        if line.startswith("POPULARITY"): continue
        if line.startswith("END-POPULARITY"): continue
        data = line[:-1].split(" ")
        if len(data) < 4: continue
        if data[3] == "<NOFILES>":
            # Empty/virtual
            local[data[2]] = 0.1
        elif len(data) == 4:
            # In use
            local[data[2]] = 1.
        elif data[4] == "<OLD>":
            # Unused
            local[data[2]] = 0.3
        elif data[4] == "<RECENT-CTIME>":
            # Recently installed
            local[data[2]] = 0.8

    # TFIDF package scoring function
    def score(pkg):
        if not pkg in freqs: return 0
        return local[pkg] * math.log(doccount / freqs[pkg])

    # Sort the package list by TFIDF score
    packages = local.keys()
    packages.sort(key=score, reverse=True)

    # Output the sorted package list
    for idx, pkg in enumerate(packages):
        print "%2d) %s" % (idx+1, pkg)
        if idx > 30: break
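The scoring function multiplies a local usage weight by the logarithm of the inverse global frequency, so a package installed almost everywhere scores close to zero however much it is used locally. A small worked sketch with invented numbers (the package counts below are not real popcon data):

```python
import math

# Invented figures, for illustration only.
doccount = 1000.0                                # number of popcon submissions
freqs = {"libc6": 1000.0, "guessnet": 10.0}      # submissions listing the package
local = {"libc6": 1.0, "guessnet": 0.3}          # local weights: in use, unused

def score(pkg):
    """TFIDF score: local weight times log of inverse global frequency."""
    if pkg not in freqs:
        return 0
    return local[pkg] * math.log(doccount / freqs[pkg])

ranking = sorted(local, key=score, reverse=True)
for pkg in ranking:
    print("%-10s %.3f" % (pkg, score(pkg)))
```

With these numbers the rarely installed guessnet ranks above libc6 even though it carries the lower local weight, because log(1000/1000) is zero.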

Popcon-based suggestions

1. Submit /var/log/popularity-contest as a file form field called scan to http://people.debian.org/~enrico/anapop
2. Get a text/plain answer with a token
3. Get statistics with http://people.debian.org/~enrico/anapop/stats/token
4. Get package suggestions with http://people.debian.org/~enrico/anapop/xposquery/token
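The four steps above could be scripted roughly as follows, using only the Python standard library. This is a sketch under assumptions: that the form is sent as a multipart/form-data POST, and that the text/plain reply body is the token itself. The function names are hypothetical.

```python
import urllib.request
import uuid

# Service URL taken from the slide.
BASE = "http://people.debian.org/~enrico/anapop"

def encode_scan_upload(data, boundary=None):
    """Encode popcon data as a multipart body with one file field, 'scan'."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        "--%s\r\n"
        'Content-Disposition: form-data; name="scan"; '
        'filename="popularity-contest"\r\n'
        "Content-Type: text/plain\r\n"
        "\r\n" % boundary
    ).encode("ascii")
    tail = ("\r\n--%s--\r\n" % boundary).encode("ascii")
    return "multipart/form-data; boundary=%s" % boundary, head + data + tail

def submit_scan(path="/var/log/popularity-contest"):
    """Steps 1 and 2: upload the file and return the token.

    Assumption: the reply body, stripped of whitespace, is the token.
    """
    with open(path, "rb") as f:
        content_type, body = encode_scan_upload(f.read())
    req = urllib.request.Request(
        BASE, data=body, headers={"Content-Type": content_type})
    with urllib.request.urlopen(req) as reply:
        return reply.read().decode("ascii").strip()

def stats_url(token):
    """Step 3: URL of the statistics for a token."""
    return "%s/stats/%s" % (BASE, token)

def suggestions_url(token):
    """Step 4: URL of the package suggestions for a token."""
    return "%s/xposquery/%s" % (BASE, token)
```

A run would call submit_scan() once and then fetch stats_url(token) and suggestions_url(token) with urllib.request.urlopen.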

debtags data

Locally installed data sources:
  Package tag mapping in /var/lib/debtags/package-tags (merges all configured tag sources)
  Facet and tag descriptions in /var/lib/debtags/vocabulary (merges all configured tag sources)
  Tags in the packages file: grep-aptavail -sPackage,Tag .

On the internet:
  http://debtags.alioth.debian.org/tags/tags-current.gz
  http://debtags.alioth.debian.org/tags/vocabulary.gz
  Other tag sources can be available (e.g. http://www.iterating.org/tags/{tags-current,vocabulary}.gz)

tagcoll grep - tagcoll reverse - debtags search - debtags tagsearch - debtags dumpavail - debtags tag [add,rm,ls] - debtags smartsearch - ...
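The package tag mapping can also be read directly. The sketch below assumes the file uses the tagcoll text layout, one "pkg1, pkg2: tag1, tag2" entry per line. That layout is not stated in these slides, so treat it as an assumption. Both function names are hypothetical, and the second one mimics what tagcoll reverse is for.

```python
def parse_package_tags(lines):
    """Map each package to its set of tags.

    Assumption: each line reads 'pkg1, pkg2: tag1, tag2'.
    """
    tags_by_pkg = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split at the first ':' only, since tags contain '::'.
        pkgs, _, tags = line.partition(":")
        tagset = {t.strip() for t in tags.split(",") if t.strip()}
        for pkg in pkgs.split(","):
            if pkg.strip():
                tags_by_pkg.setdefault(pkg.strip(), set()).update(tagset)
    return tags_by_pkg

def reverse_mapping(tags_by_pkg):
    """Map each tag to the set of packages carrying it."""
    pkgs_by_tag = {}
    for pkg, tags in tags_by_pkg.items():
        for tag in tags:
            pkgs_by_tag.setdefault(tag, set()).add(pkg)
    return pkgs_by_tag

# Invented sample data, for illustration only.
sample = [
    "buffy: interface::x11, role::program\n",
    "guessnet: interface::commandline, role::program\n",
]
tags = parse_package_tags(sample)
print(sorted(reverse_mapping(tags)["role::program"]))
```

On a Debian system the input would be open("/var/lib/debtags/package-tags").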

debtags web queries

Suite of CGIs at http://debtags.alioth.debian.org/cgi-bin/...
  fts/key1[/key2...]     Full text search
  stags/key1[/key2...]   Smart tag search
  sugg/pkg               Suggests tags for the package
  unt[/num]              Packages with no tags, or no more than num tags
  maint/email            Package list for a given maintainer
  pkgs/tag1[/tag2...]    Packages with at least all these tags
  pkg/pkgname            Package tags and description
  qstats                 Quick tag database statistics
  patch                  Submit an unreviewed patch

http://debtags.alioth.debian.org/js/debtags.js
  Functions to easily interface to all these CGIs
http://debtags.alioth.debian.org/js/vocabulary.js
  Javascript-accessible facet and tag descriptions
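The CGI paths in the table are built from an endpoint name followed by slash-separated arguments. A small sketch of a URL builder is given below. It assumes the endpoint is appended directly after cgi-bin/ and that the colons in tag names can be sent unescaped. Neither point is confirmed by the slides, and the function name is hypothetical.

```python
from urllib.parse import quote

# Assumption: endpoints sit directly under this prefix.
CGI_BASE = "http://debtags.alioth.debian.org/cgi-bin"

def debtags_cgi_url(endpoint, *args):
    """Build the URL for a debtags CGI endpoint and its path arguments."""
    # Keep ':' unescaped so that tags such as role::program stay readable.
    parts = [endpoint] + [quote(str(a), safe=":") for a in args]
    return CGI_BASE + "/" + "/".join(parts)

print(debtags_cgi_url("pkgs", "role::program", "interface::x11"))
print(debtags_cgi_url("unt", 3))
print(debtags_cgi_url("qstats"))
```

The function only builds the URL string. Fetching it is left to the caller.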

In need of more documentation

What else do you know?

Conclusion

There is no conclusion.
Add more examples in the wiki.
Post more examples in debian-devel-announce.
Blog more examples.
This can only be fun.